JP2009295103A

JP2009295103A - Motion planning device, method, and program for autonomously moving robot, recording medium, and motion control device and method for autonomously moving robot

Info

Publication number: JP2009295103A
Application number: JP2008150729A
Authority: JP
Inventors: Hiroshi Kawano; 洋川野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-06-09
Filing date: 2008-06-09
Publication date: 2009-12-17
Anticipated expiration: 2028-06-09
Also published as: JP5079602B2

Abstract

PROBLEM TO BE SOLVED: To produce a motion plan for an autonomous mobile robot that is safer and less likely to fail.
SOLUTION: A first state transition probability is calculated under a flow velocity faster than the assumed one in order to improve safety. The first state transition probability and a prescribed first reward are used to obtain a state value function V^π(s) by dynamic programming in a Markov decision process. A second state transition probability is calculated under the assumed flow velocity. A second reward is defined according to a first index for the transition-destination state. The second state transition probability and the second reward are used to obtain an action value function Q^π(s, a) and the state value function V^π(s) by dynamic programming in the Markov decision process. An action a that maximizes the action value function Q^π(s, a) and the state value function V^π(s) is selected.
COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a motion planning device, method, program, and recording medium for an autonomous mobile robot, and to a motion control device and method for an autonomous mobile robot.

As a technique for planning the motion of an underactuated autonomous mobile robot with high inertia, a technique that uses a motion planning method based on a Markov decision process is known (see, for example, Patent Document 1 and Non-Patent Document 1).

In this technique, two quantities are determined first: the state transition probability P^a_{ss'} with which an autonomous mobile robot in each state s ∈ {s_1, …, s_N} transitions to each state s' ∈ {s_1, …, s_N} when it takes each action a ∈ {a_1, …, a_M} under the assumed flow velocity, and the reward R^a_{ss'} obtained at that time. For example, the reward for transitioning to a state s' containing the destination is 1, the reward for transitioning to a state s' containing an obstacle is −1, and the reward R^a_{ss'} for transitioning to a state s' containing no obstacle is uniformly 0.

Then, using the state transition probability P^a_{ss'} and the reward R^a_{ss'} obtained at that time, the state value function V^π(s) is obtained by dynamic programming in the Markov decision process. The action a that maximizes the state value function V^π(s) is then selected while taking into account the difference between the assumed flow velocity and the actual flow velocity, and the autonomous mobile robot is controlled according to the selected action a.

Patent Document 1: JP 2007-317165 A
Non-Patent Document 1: H. Kawano, “Three Dimensional Obstacle Avoidance of Autonomous Blimp Flying in Unknown Disturbance”, Proceeding of 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.123-130, October, 2006

In Non-Patent Document 1 and Patent Document 1, the reward R^a_{ss'} given on transitioning to a state s' that contains no obstacle is uniformly 0, so the likelihood of colliding with an obstacle from an obstacle-free transition-destination state s' is not taken into account. Consequently, the motion plan tends to fail when the flow velocity is faster than assumed.

In view of the above problem, an object of the present invention is to provide a motion planning device, method, program, and recording medium for an autonomous mobile robot, and a motion control device and method for an autonomous mobile robot, with which the motion plan is less likely to fail.

To plan motions that bring an autonomous mobile robot located at a starting point to a destination in a fluid whose flow velocity is indeterminate, a first state transition probability, with which the autonomous mobile robot in each state transitions to each state when it takes each action, is calculated under a flow velocity faster than the assumed flow velocity. A first reward, obtained when the autonomous mobile robot in each state takes each action and transitions to each state, is defined so that the first reward for transitioning to a state containing an obstacle is lower or higher than the first reward for transitioning to a state containing no obstacle. Using the first state transition probability and the first reward, a first index representing how likely the autonomous mobile robot is to collide with an obstacle is obtained for each state by dynamic programming in a Markov decision process. A second state transition probability, with which the autonomous mobile robot in each state transitions to each state when it takes each action, is calculated under the assumed flow velocity. A second reward, obtained when the autonomous mobile robot in each state takes each action and transitions to each state, is defined according to the first index for the transition-destination state, and so that the second reward for transitioning to a state containing the destination is the highest or the lowest. Using the second state transition probability and the second reward, a second index representing how easily the autonomous mobile robot reaches the destination is obtained for each state by dynamic programming in the Markov decision process.

To control the autonomous mobile robot located at the starting point so that it reaches the destination under an indeterminate flow velocity on the basis of the motion plan, a flow velocity difference, which is the difference between the assumed flow velocity and the measured flow velocity, is obtained. The transition-destination state for each action the autonomous mobile robot may take is obtained by shifting the position of the autonomous mobile robot by the amount corresponding to the flow velocity difference. The second indices for the transition-destination states obtained by the transition destination prediction unit are compared with one another to determine the action most likely to reach the destination. The autonomous mobile robot is then controlled so that it moves according to the determined action.
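The control step summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state is reduced to a two-dimensional (X, Y) cell, the flow velocity difference is taken as measured minus assumed, and the names `choose_action`, `planned_moves`, and `second_index` are hypothetical and do not appear in the original text.

```python
def choose_action(xy, planned_moves, flow_assumed, flow_measured, T, cell, second_index):
    """Pick the action whose predicted destination has the best second index.

    xy            : current (x, y) position of the robot
    planned_moves : {action: (dx, dy)} displacement expected under the assumed flow
    flow_assumed  : (fx, fy) flow velocity used when the plan was computed
    flow_measured : (fx, fy) flow velocity measured during operation
    T             : action unit time
    cell          : edge length of a grid cell (hypothetical 2-D simplification)
    second_index  : {(ix, iy): value} reachability index from the second DP pass
    """
    # Flow velocity difference; the sign convention (measured minus assumed)
    # is an assumption of this sketch.
    dfx = flow_measured[0] - flow_assumed[0]
    dfy = flow_measured[1] - flow_assumed[1]

    best_action, best_value = None, float("-inf")
    for action, (dx, dy) in planned_moves.items():
        # Predicted position: planned move shifted by the unmodelled drift.
        px = xy[0] + dx + dfx * T
        py = xy[1] + dy + dfy * T
        state = (int(px // cell), int(py // cell))
        value = second_index.get(state, float("-inf"))
        if value > best_value:
            best_action, best_value = action, value
    return best_action


moves = {"ahead": (1.0, 0.0), "stay": (0.0, 0.0)}
index = {(0, 0): 0.1, (1, 0): 0.9, (2, 0): 0.2}
no_drift = choose_action((0.5, 0.5), moves, (0.0, 0.0), (0.0, 0.0), 1.0, 1.0, index)
with_drift = choose_action((0.5, 0.5), moves, (0.0, 0.0), (1.0, 0.0), 1.0, 1.0, index)
```

In this toy case the unmodelled drift carries the robot one cell further, so the best action changes from "ahead" to "stay" once the measured flow is taken into account.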

The reward is determined in consideration of the likelihood of colliding with an obstacle from an obstacle-free transition-destination state s', and a flow velocity faster than the assumed flow velocity is used for that purpose. The motion plan is therefore less likely to fail.

[Markov decision process]
First, an outline of the Markov decision process in reinforcement learning is given, as the background needed to understand the technical significance of the present invention.

Let S = {s_1, s_2, …, s_N} denote the set of discrete states that constitute the environment, and A = {a_1, a_2, …, a_M} the set of actions the agent can take. When the agent executes an action a ∈ A in a state s ∈ S of the environment, the environment transitions stochastically to a state s' ∈ S. The transition probability is expressed as
P^a_{ss'} = Pr{ s_{t+1} = s' | s_t = s, a_t = a }.
At this time a reward r is given stochastically from the environment to the agent, and its expected value is
R^a_{ss'} = E{ r_t | s_t = s, a_t = a, s_{t+1} = s' }.

The prime symbol ' attached to the state s' serves to distinguish it from the state s. The prime is also used to denote a time derivative, but the two meanings are easily told apart by whether the primed object is a state of the Markov state transition model, so this notation is used in the rest of the description.

To evaluate how much an action taken at a time step t contributed to subsequent reward acquisition, consider the time series of rewards obtained afterwards. The evaluation of this reward time series is called the value. The goal of the agent is to maximize the value, or equivalently to find a policy π(s, a) that maximizes it. The policy π(s, a) means taking action a in state s and is defined for each combination of state s and action a. The value is the sum of the rewards discounted over time by a discount rate γ (0 ≤ γ < 1). That is, the state value function V^π(s), which is the value of state s under a policy π, is defined as follows, where E_π denotes the expected value.
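The defining equation is an image in the original publication and is not present in this text. As a hedged reconstruction, the standard discounted-return definition, written with the reward indexing used for R^a_{ss'} above (r_t is the reward for the transition made at step t, which is an assumption about the omitted equation), would read:

```latex
V^{\pi}(s) = E_{\pi}\left\{ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k} \;\middle|\; s_t = s \right\},
\qquad
Q^{\pi}(s, a) = \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma\, V^{\pi}(s') \right]
```

The second relation is the usual link between the action value function and the state value function; it is included only for orientation and is not quoted from the source.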

Here, the state value function V^π(s), the value of state s under the policy π, is adopted as the value function, but the action value function Q^π(s, a), the value of taking action a in state s under the policy π, can be adopted instead.

The goal of the agent is to find an optimal policy π, that is, a policy π whose value function (the state value function V^π(s) in the example above) is, for every state s, no worse than under any other policy. The search for this policy π is expressed by the Bellman equation. Once the values of P^a_{ss'} and R^a_{ss'} are fixed for each combination of state s, action a, and transition-destination state s', the optimal state value function V^π(s), action value function Q^π(s, a), and policy π can be computed by dynamic programming (see, for example, R. S. Sutton and A. G. Barto, "Reinforcement Learning", translated by Sadayoshi Mikami and Masaaki Minagawa, Morikita Publishing, 1998, pp. 94-118). Dynamic programming is a well-known technique, so its description is omitted.
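Since the text leaves the dynamic programming step to the literature, a minimal sketch of one common variant, value iteration, is given here for orientation. It assumes the transition probabilities and rewards are stored as NumPy arrays indexed `[a, s, s']`; the function name, array layout, and the two-state example are illustrative choices, not taken from the patent.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-9, max_iter=10_000):
    """Value iteration for a finite MDP.

    P[a, s, t] : probability of moving from state s to state t under action a
    R[a, s, t] : expected reward for that transition
    Returns (V, Q, policy), where policy[s] is the greedy action in state s.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = sum_t P[a, s, t] * (R[a, s, t] + gamma * V[t])
        Q = np.einsum("ast,ast->sa", P, R + gamma * V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Recompute Q from the final V so that the returned values are consistent.
    Q = np.einsum("ast,ast->sa", P, R + gamma * V)
    return V, Q, Q.argmax(axis=1)


# Two states: 0 = start, 1 = absorbing goal. Action 0 = stay, action 1 = move.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.zeros((2, 2, 2))
R[1, 0, 1] = 1.0  # reward for entering the goal from the start state
V, Q, policy = value_iteration(P, R, gamma=0.5)
```

In the example, moving into the goal is worth 1 and staying is worth only the discounted future value, so the greedy policy in state 0 is to move.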

[Motion planning device and method for autonomous mobile robot]
An embodiment of the motion planning device and method for an autonomous mobile robot is described below.

The present invention is characterized by performing the dynamic programming computation twice. The first computation determines, for states that contain no obstacle, how likely a future collision with an obstacle is (steps S1 to S3). That collision likelihood is then used in the second dynamic programming computation as the reward given on transitioning to that state (step S5). This makes it possible to take into account the likelihood of colliding with an obstacle from an obstacle-free transition-destination state s', yielding a motion plan that is less likely to fail.
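The two-pass idea can be illustrated on a toy one-dimensional corridor. Everything below is a sketch under assumptions that are not in the source: the corridor model, the drift probabilities, the use of value iteration in both passes, the terminal handling, and in particular the way the second reward is derived from the first index (here `-risk_weight * V1[s']`, with `risk_weight` a hypothetical scaling factor).

```python
import numpy as np


def solve_mdp(P, R, gamma=0.9, sweeps=500):
    """Plain value iteration; P[a, s, t] and R[a, s, t] as NumPy arrays."""
    V = np.zeros(P.shape[1])
    for _ in range(sweeps):
        Q = np.einsum("ast,ast->sa", P, R + gamma * V)
        V = Q.max(axis=1)
    return V, Q


def corridor_transitions(n, drift, terminal):
    """1-D corridor, actions 0 = left, 1 = right.

    With probability `drift` the flow pushes the robot one extra cell to the
    left. Terminal states (obstacle, goal) are absorbing.
    """
    P = np.zeros((2, n, n))
    for a, step in enumerate((-1, +1)):
        for s in range(n):
            if s in terminal:
                P[a, s, s] = 1.0
                continue
            nominal = min(max(s + step, 0), n - 1)
            pushed = min(max(s + step - 1, 0), n - 1)
            P[a, s, nominal] += 1.0 - drift
            P[a, s, pushed] += drift
    return P


n, obstacle, goal = 6, 0, 5
terminal = {obstacle, goal}
live = np.array([s not in terminal for s in range(n)])  # non-terminal states

# Pass 1: flow faster than assumed, reward 1 for entering the obstacle.
# V1 then acts as a pessimistic "how easily can a collision happen" index.
P1 = corridor_transitions(n, drift=0.4, terminal=terminal)
R1 = np.zeros((2, n, n))
R1[:, live, obstacle] = 1.0
V1, _ = solve_mdp(P1, R1)

# Pass 2: assumed flow; the reward for entering s' depends on V1[s'].
risk_weight = 0.1  # hypothetical scaling, not from the source
P2 = corridor_transitions(n, drift=0.1, terminal=terminal)
R2 = np.zeros((2, n, n))
R2[:, live, :] = -risk_weight * V1   # penalize risky destination cells
R2[:, live, obstacle] = -1.0         # entering the obstacle itself
R2[:, live, goal] = 1.0              # entering the goal gets the highest reward
V2, Q2 = solve_mdp(P2, R2)
policy = Q2.argmax(axis=1)
```

In this toy setting the first index decreases with distance from the obstacle, and the second pass steers the robot to the right, away from the obstacle and toward the goal.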

In this embodiment, the agent is the airship-type autonomous mobile robot illustrated in FIG. 4. The autonomous mobile robot has a rudder 2, a main thruster 3, a vertical thruster 4, a gondola 5, a flow velocity difference acquisition unit 21, and a position measurement unit 25. This robot cannot move directly sideways. Because the robot has more degrees of freedom of motion than can be controlled by its on-board actuators (the rudder 2, the main thruster 3, and the vertical thruster 4), it is an underactuated robot. Although an airship-type robot is used in this embodiment, an underwater robot such as an unmanned underwater vehicle may be used instead.

The autonomous mobile robot navigates a space filled with a fluid whose flow velocity is indeterminate. The space is modeled discretely by a Markov state transition model composed of four dimensions: the X coordinate and Y coordinate of the robot's horizontal position, the azimuth angle ψ, and the turning speed ψ'. Each dimension is discretized according to the resolution of the sensor that measures the corresponding physical quantity.

The autonomous mobile robot, located in the state containing a predetermined starting point, selects one action from a predetermined set of actions. It then moves according to that action for a predetermined action unit time T and arrives at a transition-destination state. In that state it again selects one action from the predetermined set, moves according to it for the action unit time T, and arrives at the next transition-destination state. By repeating this action selection and state transition, the robot, which starts in the state containing the starting point, attempts to reach the state containing a predetermined destination. The motion planning device for the autonomous mobile robot produces the motion plan for this purpose.

<Step S1 (FIG. 8)>
The first state transition probability calculation unit 10 (FIG. 1) calculates, under a flow velocity faster than the assumed flow velocity, the first state transition probability with which the autonomous mobile robot in each state transitions to each state when it takes each action (step S1). That is, it calculates the first state transition probability P^a_{ss'} for each combination of state s, action a, and transition-destination state s'. The calculated first state transition probability P^a_{ss'} is sent to the first dynamic programming unit 12.

An example of how the first state transition probability P^a_{ss'} is calculated is described next. In this example, the first state transition probability calculation unit 10 includes a target speed calculation unit 101, a displacement calculation unit 102, and a probability calculation unit 103, as illustrated in FIG. 2.

<<Step S11>>
The target speed calculation unit 101 of the first state transition probability calculation unit 10 determines the target speed for each action a taken by the autonomous mobile robot in each state s (step S11). The target speed is sent to the displacement calculation unit 102. For example, for each action a, the turning speed ψ'_τ(t) and the longitudinal speed v_xwτ(t) of the autonomous mobile robot are set as its target speed according to the following equations. Here (b_1, b_2) is the two-dimensional vector corresponding to the action a in each state s of the Markov state transition model, α is a predetermined turning acceleration, β is a predetermined longitudinal acceleration, t is the time elapsed since the start of the action a, ψ'_τ0 is the turning speed of the autonomous mobile robot at the start of the action a, and v_x0 is its longitudinal speed at the start of the action a.

Here, the turning acceleration α and the longitudinal acceleration β are set so as not to exceed the performance limits of the autonomous mobile robot. The longitudinal speed v_xwτ(t) and the longitudinal acceleration β are expressed as the speed and the acceleration of the airframe relative to the fluid, respectively.

For motion planning, the assumed flow velocity and a flow velocity faster than the assumed flow velocity are set in advance and stored in the storage unit 19. As described later in the section [Motion control method for autonomous mobile robot], a correction is applied whenever the assumed flow velocity differs from the measured flow velocity, so the assumed flow velocity need not be exact; an approximate value suffices. Nevertheless, the closer the assumed flow velocity is to the measured flow velocity, the more accurate the motion plan and the motion control based on it become.

<<Step S12>>
The displacement calculation unit 102 calculates how much the X coordinate and Y coordinate of the robot's position in the horizontal plane of the world coordinate system, the azimuth angle ψ, and the turning speed ψ' each change when the autonomous mobile robot in each state s moves according to each action a under a flow velocity faster than the assumed flow velocity (step S12). The calculated displacements are sent to the probability calculation unit 103.

Performing the calculation under a flow velocity faster than the assumed flow velocity yields a safer motion plan. The faster flow velocity is, for example, the fastest of the flow velocities that may be expected. Using the fastest flow velocity yields the safest motion plan.

Let D_X(ψ_0, a) denote the displacement of the X coordinate of the robot's position in the horizontal plane, D_Y(ψ_0, a) that of the Y coordinate, D_ψ(ψ_0, a) that of the azimuth angle ψ, and D_ψ'(ψ_0, a) that of the turning speed ψ'. The displacements are given by the following equations (see FIG. 5).

Here, ψ_0 is the azimuth angle at the start of each state s, T is the time taken to transition from the state s to the next state s' (hereinafter, the action unit time), f_mx is the X component of the flow velocity faster than the assumed flow velocity, and f_my is its Y component. No correction for the wind is applied to the displacement D_ψ(ψ_0, a) of the azimuth angle ψ or to the displacement D_ψ'(ψ_0, a) of the turning speed ψ', because the turning speed ψ' is itself controlled. The action unit time can be set to, for example, 15 seconds.

<<Step S13>>
The probability calculation unit 103 calculates the first state transition probability P^a_{ss'} from the displacement D_X(ψ_0, a) of the X coordinate of the robot's position in the horizontal plane, the displacement D_Y(ψ_0, a) of the Y coordinate, the displacement D_ψ(ψ_0, a) of the azimuth angle ψ, and the displacement D_ψ'(ψ_0, a) of the turning speed ψ' (step S13).

First, the state s is represented by a cell of the grid spanned by four dimensions, namely the X coordinate and Y coordinate of the robot's position in the horizontal plane, the azimuth angle ψ, and the turning speed ψ', and this cell is denoted R(s) (see FIG. 6). The cell R(s) translated by the displacement vector (D_X(ψ_0, a), D_Y(ψ_0, a), D_ψ(ψ_0, a), D_ψ'(ψ_0, a)) formed from the above displacements is denoted R_t(s).

When the autonomous mobile robot is in the state s, it is assumed to be located at any point of the four-dimensional cell R(s) representing that state with equal probability. Under this assumption, the first state transition probability P^a_{ss'} is proportional to the volume of the overlap between R_t(s) and each R(s'). Here R(s') is a cell that overlaps R_t(s); that is, R(s') is the four-dimensional cell corresponding to a candidate transition-destination state s' when the action a is taken in the state s. R_t(s) may overlap up to eight cells R(s').

Let V_0(s, s', a) denote the volume of the overlap between R_t(s) and a given R(s'), and Σ_{s'} V_0(s, s', a) the total volume of the overlaps between R_t(s) and all R(s'). The first state transition probability P^a_{ss'} is then obtained by the following equation.
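The equation itself is an image in the original and is missing here. Reading "proportional to the overlap volume" as P^a_{ss'} = V_0(s, s', a) / Σ_{s'} V_0(s, s', a), the computation can be sketched as follows. The function name, the integer cell indexing, and the neglect of grid boundaries and of azimuth wrap-around are assumptions of this sketch.

```python
import itertools
import math


def transition_probabilities(cell_index, displacement, cell_size):
    """Overlap-volume transition probabilities for one translated grid cell.

    cell_index   : integer index of the source cell along each dimension
    displacement : translation of the cell along each dimension
    cell_size    : edge length of a cell along each dimension
    Returns {target_cell_index: probability}.
    """
    per_axis = []  # for each axis: [(target index, overlap length), ...]
    for i, d, w in zip(cell_index, displacement, cell_size):
        lo = i * w + d                 # lower edge of the translated cell
        first = math.floor(lo / w)     # index of the cell containing that edge
        frac = lo / w - first          # share of the cell spilling into first + 1
        overlaps = [(first, (1.0 - frac) * w)]
        if frac > 0.0:
            overlaps.append((first + 1, frac * w))
        per_axis.append(overlaps)

    volumes = {}
    for combo in itertools.product(*per_axis):
        target = tuple(idx for idx, _ in combo)
        # Overlap volume is the product of the per-axis overlap lengths.
        volumes[target] = math.prod(length for _, length in combo)

    total = sum(volumes.values())
    return {target: v / total for target, v in volumes.items()}


# Cell (2, 3, 0, 1) in (X, Y, psi, psi') shifted by half a cell in X
# and a quarter of a cell in psi.
probs = transition_probabilities((2, 3, 0, 1), (0.5, 0.0, 0.25, 0.0), (1.0, 1.0, 1.0, 1.0))
```

Because the cells are axis-aligned boxes, the overlap volume factorizes into per-axis overlap lengths, which is why the sketch only needs one floor and one fractional part per dimension.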

By repeating steps S11 to S13 as needed, the first state transition probability P^a_{ss'} is obtained for each combination of state s, action a, and transition-destination state s'.

<Step S2>
The first reward determination unit 11 (FIG. 1) defines the first reward R^a_{ss'}, obtained when the autonomous mobile robot in each state s takes each action a and transitions to each state s', so that the first reward for transitioning to a state s' containing an obstacle is higher than the first reward for transitioning to a state s' containing no obstacle (step S2). The first reward R^a_{ss'} thus defined is sent to the first dynamic programming unit 12.

For example, the first reward R^a_{ss'} for transitioning to a state s' containing an obstacle is set to 1, and the first reward R^a_{ss'} for transitioning to a state s' containing no obstacle is set to 0.

Whether a state s contains an obstacle (a transition-destination state s' is itself a state s) is determined, for example, by either of the following two methods.

[First method]
The terrain model storage unit 13 stores a terrain model that includes information on whether each state s contains an obstacle. The first reward determination unit 11 refers to the terrain model stored in the terrain model storage unit 13 and determines whether the transition-destination state s' contains an obstacle.

[Second method]
In this method, as illustrated in FIG. 3, the first reward determination unit 11 includes an inclination angle difference calculation unit 111, a climb angle calculation unit 112, and an obstacle determination unit 113. The terrain model storage unit 13 stores inclination angle data for each combination of position (X, Y) and azimuth angle ψ.

The inclination angle difference calculation unit 111 in FIG. 3 calculates the absolute value dθ_steep(s', s) of the difference between the terrain inclination angle θ_steep(s) in the state s and the inclination angle θ_steep(s') in the transition-destination state s'. The absolute value dθ_steep(s', s) is defined by the following equation (see FIG. 7). The calculated value is sent to the obstacle determination unit 113.

The climb angle calculation unit 112 calculates the maximum climb angle dθ_max(s', s) of the autonomous mobile robot for the transition from the state s to the transition-destination state s'. The calculated maximum climb angle dθ_max(s', s) is sent to the obstacle determination unit 113.

Let v_z(s) be the rate of change of the pitch angle in the state s, a_h the maximum acceleration of the robot's pitch-angle change, and f_xb the longitudinal wind speed relative to the airframe. Assuming that the climb angle is sufficiently small and that no wind blows in the vertical direction, the maximum climb angle dθ_max(s', s) is defined as follows. The maximum climb angle dθ_max(s', s) represents how much the autonomous mobile robot can change its climb angle in a single action.

The obstacle determination unit 113 compares the absolute value dθ_steep(s', s) of the inclination angle difference with the maximum climb angle dθ_max(s', s), and determines that the transition-destination state s' contains an obstacle if dθ_steep(s', s) is the larger of the two.
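The comparison in the second method, combined with the example first-reward values of 1 and 0 given earlier, can be written as a short sketch. The equation for the maximum climb angle is an image in the original and is not reproduced in this text, so `dtheta_max` is simply taken as an input; the function names are hypothetical.

```python
def contains_obstacle(theta_steep_s, theta_steep_next, dtheta_max):
    """Second method: treat s' as an obstacle if the terrain slope changes
    by more than the robot can follow within one action.

    theta_steep_s    : terrain inclination angle in the current state s
    theta_steep_next : terrain inclination angle in the destination state s'
    dtheta_max       : maximum climb-angle change achievable in one action
                       (its formula is not available here, so it is an input)
    """
    dtheta_steep = abs(theta_steep_next - theta_steep_s)
    return dtheta_steep > dtheta_max


def first_reward(theta_steep_s, theta_steep_next, dtheta_max):
    """Example first reward: 1 for an obstacle state, 0 otherwise."""
    return 1.0 if contains_obstacle(theta_steep_s, theta_steep_next, dtheta_max) else 0.0


steep_jump = first_reward(0.1, 0.5, 0.2)    # slope change 0.4 rad exceeds 0.2 rad
gentle_slope = first_reward(0.1, 0.2, 0.2)  # slope change 0.1 rad is within reach
```

Angles are in radians in this example; any unit works as long as all three arguments use the same one.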

The first reward determination unit 11 determines the first reward on the basis of the determination, output by the obstacle determination unit 113, of whether the transition-destination state s' contains an obstacle.

<Step S3>
The first dynamic programming unit 12 uses the first state transition probability P^a_{ss'} and the first reward R^a_{ss'} to obtain, by dynamic programming in the Markov decision process, a first index representing how likely the autonomous mobile robot is to collide with an obstacle, for each state (step S3). The first index thus obtained is sent to the second reward determination unit 15. In this example, the state value function V^π(s) is used as the first index.

As described above, once the first state transition probability P^a_{ss′} and the first reward R^a_{ss′} have been calculated for every combination of state s, action a, and transition destination state s′, the state value function V^π(s) can be calculated by dynamic programming.
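As a rough illustration of such a dynamic-programming computation, the sketch below uses textbook value iteration over tabulated transition probabilities and rewards. The text only states that dynamic programming in a Markov decision process is used, so the greedy maximum over actions, the discount factor `gamma`, the stopping tolerance, and the dictionary layout of `P` and `R` are all assumptions made for this example.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-9, max_iter=10_000):
    """Compute a state value function by value iteration (illustrative sketch).

    P[s][a] is a dict {s_next: probability}; R[s][a] is a dict {s_next: reward}.
    Every state, including terminal ones, must appear as a key of P.
    A state with no actions (empty dict) is treated as terminal with value 0.
    """
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        delta = 0.0
        for s, actions in P.items():
            if not actions:  # terminal state: nothing to update
                continue
            # Expected reward plus discounted value of the successor,
            # maximized over the actions available in s.
            best = max(
                sum(p * (R[s][a][s2] + gamma * V[s2]) for s2, p in trans.items())
                for a, trans in actions.items()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # values have stopped changing
            break
    return V
```

The same routine could be applied to the second pass by supplying the second state transition probabilities and second rewards instead.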

<Step S4>
The second state transition probability calculation unit 14 calculates, under the assumed flow velocity, the second state transition probability P^a_{ss′} with which the autonomous mobile robot in each state transitions to each state when it takes each action. That is, it calculates the second state transition probability P^a_{ss′} for every combination of state s, action a, and transition destination state s′. The calculated second state transition probability P^a_{ss′} is sent to the second dynamic programming unit 16.

Whereas the first state transition probability calculation unit 10 calculates the first state transition probability P^a_{ss′} under a flow velocity faster than the assumed flow velocity, the second state transition probability calculation unit 14 calculates the second state transition probability P^a_{ss′} under the assumed flow velocity. In this second round of motion planning, it is desirable to assume a flow velocity with a high probability of occurrence so that the autonomous mobile robot can be guided to the arrival point with high accuracy. Apart from this difference, the second state transition probability P^a_{ss′} is calculated in the second state transition probability calculation unit 14 in the same way as the first state transition probability P^a_{ss′} is calculated in the first state transition probability calculation unit 10.

That is, for example, the second state transition probability calculation unit 14, like the first state transition probability calculation unit 10 (see FIG. 2), has a target speed calculation unit 101, a displacement calculation unit 102, and a probability calculation unit 103, and these units calculate the second state transition probability P^a_{ss′} under the assumed flow velocity. In this case, the target speed, the displacement D_ψ(ψ_0, a) of the azimuth angle ψ, and the displacement D_ψ′(ψ_0, a) of the turning speed ψ′ do not depend on the flow velocity. The results of these calculations in the first state transition probability calculation unit 10 may therefore be reused in the second state transition probability calculation unit 14 to avoid duplicate computation.

<Step S5>
The second reward determination unit 15 determines the second reward R^a_{ss′}, obtained when the autonomous mobile robot in each state s takes each action a and transitions to each state s′, according to the first index for the transition destination state s′, and in such a way that the second reward obtained on transition to a state containing the arrival point is the highest (step S5). The determined reward is sent to the second dynamic programming unit 16.

In this example, in step S2, the first reward determination unit 11 sets the first reward R^a_{ss′} obtained on transition to a state s′ containing an obstacle higher than the first reward R^a_{ss′} obtained on transition to a state s′ containing no obstacle. Consequently, the larger the value of the first index V^π(s), the more likely the autonomous mobile robot in state s is to hit an obstacle.

Therefore, for example, the state value function V^π(s′) serving as the first index, with its sign inverted, is used as the second reward R^a_{ss′}.
R^a_{ss′} = −V^π(s′) … (1)
In addition, the second reward R^a_{ss′} obtained on transition to a state s′ containing the arrival point is set to 1.
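The rule for the second reward can be sketched as below, assuming the first index has already been tabulated per state. The function name, the dictionary `first_index`, and the set `goal_states` are hypothetical names introduced only for this illustration.

```python
def second_reward(s_next, first_index, goal_states):
    """Second reward for a transition into s_next (illustrative sketch).

    first_index: dict mapping each state to its first index V(s), where a
                 larger value means a collision is more likely.
    goal_states: set of states that contain the arrival point.
    """
    if s_next in goal_states:
        return 1.0  # transitions into the arrival point get the highest reward
    # Equation (1): sign-inverted first index, so collision-prone states are penalized.
    return -first_index[s_next]
```

For example, with `first_index = {"safe": 0.1, "risky": 0.8, "goal": 0.0}` and `goal_states = {"goal"}`, a transition into `"risky"` is rewarded with -0.8 and a transition into `"goal"` with 1.0.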

<Step S6>
The second dynamic programming unit 16 uses the second state transition probability P^a_{ss′} and the second reward R^a_{ss′} to calculate a second index representing how easily the autonomous mobile robot reaches the arrival point, based on dynamic programming in a Markov decision process (step S6). The obtained second index is stored in the second index storage unit 17. In this example, the state value function V^π(s) is used as the second index.

As described above, once the second state transition probability P^a_{ss′} and the second reward R^a_{ss′} have been calculated for every combination of state s, action a, and transition destination state s′, the state value function V^π(s) can be calculated by dynamic programming.

In this way, the first dynamic-programming computation calculates how likely the robot is to hit an obstacle in the future from each state that contains no obstacle (steps S1 to S3), and that likelihood is used in the second dynamic-programming computation as the reward given on transition to that state (step S5). This makes it possible to take into account the likelihood of hitting an obstacle from a transition destination state s′ that itself contains no obstacle, yielding a motion plan that is less likely to fail.
This concludes the description of the embodiment of the motion planning device and method for an autonomous mobile robot.

[Motion control device and method for autonomous mobile robot]
An embodiment of the motion control device and method for an autonomous mobile robot is described below with reference to FIGS. 9 and 10.

<Step S21 (FIG. 10)>
The flow velocity difference acquisition unit 21 (FIG. 9) obtains the flow velocity difference, which is the difference between the assumed flow velocity predicted at the time of motion planning and the measured flow velocity (step S21). The obtained flow velocity difference is sent to the transition destination prediction unit 22. Let f_x and f_y be the X and Y components of the assumed flow velocity, and f_xa and f_ya the X and Y components of the actual flow velocity. The flow velocity differences d_fx and d_fy are then given as follows.

d_fx = f_x − f_xa
d_fy = f_y − f_ya
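The two difference equations translate directly into code. A minimal sketch, with a hypothetical function name and velocities passed as (x, y) tuples:

```python
def flow_velocity_difference(assumed, measured):
    """Return (d_fx, d_fy), the assumed flow velocity minus the measured one.

    assumed:  (f_x, f_y), the flow velocity assumed at planning time.
    measured: (f_xa, f_ya), the flow velocity actually measured.
    """
    f_x, f_y = assumed
    f_xa, f_ya = measured
    return (f_x - f_xa, f_y - f_ya)
```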

<Step S22>
The transition destination prediction unit 22 obtains the transition destination state s′ for each action taken by the autonomous mobile robot by shifting the position of the autonomous mobile robot by the flow velocity differences d_fx and d_fy (step S22). The obtained transition destination state s′ is sent to the action determination unit 23.

An example of how to obtain the transition destination state s′ is described below.
The displacement D_Xa(ψ_0, a) of the position of the autonomous mobile robot in the X-axis direction when the flow velocity difference d_fx is taken into account, and the displacement D_Ya(ψ_0, a) of its position in the Y-axis direction when the flow velocity difference d_fy is taken into account, are given as follows.

The transition destination prediction unit 22 first uses the above equations, that is, shifts the position of the autonomous mobile robot by the flow velocity differences d_fx and d_fy, to obtain the actual displacement D_Xa(ψ_0, a) of the position in the X-axis direction and the actual displacement D_Ya(ψ_0, a) of the position in the Y-axis direction.

The transition destination prediction unit 22 then obtains the transition destination state s′ for the case where action a is taken, using the following equations. Specifically, it obtains the transition destination state s′ by adding, to the position X(s) in the X-axis direction, the position Y(s) in the Y-axis direction, the azimuth angle ψ_0(s), and the turning speed ψ_0(s) at the start of action a, respectively, the actual displacement D_Xa(ψ_0, a) of the position in the X-axis direction, the actual displacement D_Ya(ψ_0, a) of the position in the Y-axis direction, the displacement D_ψ(ψ_0, a) of the azimuth angle, and the displacement D_ψ′(ψ_0, a) of the turning speed ψ′.

For the position X(s) in the X-axis direction, the position Y(s) in the Y-axis direction, the azimuth angle ψ_0(s), and the turning speed ψ_0(s), the values measured by the position measurement unit 25 are used. For D_X(ψ_0, a) and D_Y(ψ_0, a), the values calculated at the time of motion planning may be reused. In this case, D_X(ψ_0, a) and D_Y(ψ_0, a) are stored in a storage unit (not shown), and the transition destination prediction unit 22 reads them as needed. The transition destination prediction unit 22 may, of course, calculate them again instead.
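The component-wise addition described for the transition destination can be sketched as follows. The `State` type and function name are hypothetical, and the corrected displacements are taken as inputs because the equations that derive D_Xa and D_Ya from the flow velocity difference are not reproduced here. Mapping the resulting continuous values back onto a discrete state of the Markov model is also left out.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Continuous robot state: position, azimuth angle, and turning speed."""
    x: float
    y: float
    psi: float
    psi_dot: float


def predict_next_state(s: State, d_xa: float, d_ya: float,
                       d_psi: float, d_psi_dot: float) -> State:
    """Add the displacements for one action to the measured state s.

    d_xa, d_ya: position displacements already corrected for the flow
                velocity difference (assumed to be computed elsewhere).
    d_psi, d_psi_dot: displacements of the azimuth angle and turning speed.
    """
    return State(s.x + d_xa, s.y + d_ya, s.psi + d_psi, s.psi_dot + d_psi_dot)
```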

<Step S23>
The action determination unit 23 compares the second indices for the transition destination states s′, obtained by the transition destination prediction unit 22, that result from moving according to each action a available in state s, and determines the action a most likely to reach the arrival point (step S23). The determined action a is sent to the control unit 24.

In this example, the state value function V^π(s) is used as the second index, and the second reward is determined so that the reward obtained on transition to a state containing the arrival point is the highest. The action a that maximizes the value of the state value function V^π(s) is therefore the action most likely to reach the arrival point.

Accordingly, the action determination unit 23 refers to the second index storage unit 17, obtains the state value function V^π(s′) in the transition destination state s′ for each action a available in state s, and compares these values to determine the action a that maximizes the value of the state value function V^π(s′).
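This lookup-and-compare step amounts to an argmax over the available actions. A minimal sketch, assuming the predicted transition destination for each action and the stored second index are available as dictionaries (both names hypothetical):

```python
def choose_action(predicted_next_state, second_index):
    """Pick the action whose predicted destination has the largest second index.

    predicted_next_state: dict {action: s_next} for the current state s.
    second_index: dict {state: V(state)} read from the second index storage.
    """
    return max(predicted_next_state,
               key=lambda a: second_index[predicted_next_state[a]])
```

For example, with destinations `{"left": "s1", "straight": "s2", "right": "s3"}` and stored values `{"s1": 0.2, "s2": 0.7, "s3": 0.4}`, the selected action is `"straight"`.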

<Step S24>
The control unit 24 controls the autonomous mobile robot so that it moves according to the determined action a (step S24). Specifically, it controls the main propulsion unit 3 and the rudder 2 of the autonomous mobile robot so that the target speed corresponding to action a can be maintained.

[Modifications, etc.]
When there is an obstacle between the starting point and the arrival point, and the flow velocity in the direction from the starting point toward the arrival point is faster than the assumed flow velocity, a collision with the obstacle is likely to occur. This is because the robot is carried closer to the obstacle by the difference between the assumed flow velocity and the measured flow velocity. Conversely, when the flow velocity is fast in the opposite direction (from the arrival point toward the starting point), a collision with the obstacle is unlikely, because the robot is carried away from the obstacle by the difference between the assumed flow velocity and the measured flow velocity.

Therefore, as the flow velocity faster than the assumed flow velocity, a flow velocity that is faster than the assumed flow velocity in the direction from the starting point toward the arrival point may be selected. This allows a safer motion plan to be made.

Here, a flow velocity being fast in the direction from the starting point toward the arrival point is equivalent to the inner product of the flow velocity vector and the vector from the starting point to the arrival point being greater than 0.
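The inner-product condition can be checked as in the following sketch for the two-dimensional case. The function name and the tuple representation of vectors are choices made for this example.

```python
def flows_toward_goal(flow, start, goal):
    """Return True if the flow has a positive component along start -> goal.

    flow:  (f_x, f_y) flow velocity vector.
    start: (x, y) of the starting point.
    goal:  (x, y) of the arrival point.
    """
    dx = goal[0] - start[0]
    dy = goal[1] - start[1]
    # Inner product of the flow vector and the start-to-goal vector.
    return flow[0] * dx + flow[1] * dy > 0.0
```

A flow perpendicular to the start-to-goal direction gives an inner product of exactly 0 and is therefore not counted as flowing toward the arrival point.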

In the above embodiment, in step S2 (FIG. 8) performed by the first reward determination unit 11 (FIG. 1), the first reward R^a_{ss′} obtained on transition to a state s′ containing an obstacle is set to 1 and the first reward R^a_{ss′} obtained on transition to a state s′ containing no obstacle is set to 0. However, the first reward R^a_{ss′} may be set to any values such that the first reward R^a_{ss′} obtained on transition to a state s′ containing an obstacle is greater than the first reward R^a_{ss′} obtained on transition to a state s′ containing no obstacle.

Alternatively, the first reward R^a_{ss′} may be set so that the first reward R^a_{ss′} obtained on transition to a state s′ containing an obstacle is less than the first reward R^a_{ss′} obtained on transition to a state s′ containing no obstacle. For example, the first reward R^a_{ss′} obtained on transition to a state s′ containing an obstacle is set to 0, and the first reward R^a_{ss′} obtained on transition to a state s′ containing no obstacle is set to 1. In this case, the smaller the value of the first index V^π(s) obtained by the first dynamic programming unit 12, the more likely the autonomous mobile robot in state s is to hit an obstacle. It is therefore unnecessary to invert the sign of the state value function V^π(s) when determining the second reward R^a_{ss′}. For example, as in the following equation, the value of the state value function V^π(s′) is used directly as the second reward R^a_{ss′}.

R^a_{ss′} = V^π(s′) … (2)
The second reward may also be determined from the first index using a monotonically increasing function f. In that case, the following equation is used in place of equation (1).

R^a_{ss′} = −f(V^π(s′))
Likewise, the following equation is used in place of equation (2).

R^a_{ss′} = f(V^π(s′))
In the above embodiment, in step S5 (FIG. 8) performed by the second reward determination unit 15 (FIG. 1), the second reward R^a_{ss′} is determined so that the second reward R^a_{ss′} obtained on transition to a state containing the arrival point is the highest. However, the second reward R^a_{ss′} may instead be determined so that the second reward R^a_{ss′} obtained on transition to a state containing the arrival point is the lowest. In this case, the smaller the value of the second index obtained by the second dynamic programming unit 16, the more easily the autonomous mobile robot located in that state reaches the arrival point. The action determination unit 23 (FIG. 9) therefore only needs to select, in step S23 (FIG. 10), the action that minimizes the second index.

The action value function Q^π(s, a) may be used as the second index. In this case, in step S6, the second dynamic programming unit 16 (FIG. 1) calculates the action value function Q^π(s, a) by dynamic programming, using the second state transition probability P^a_{ss′} and the second reward R^a_{ss′} for every combination of state s, action a, and transition destination state s′. Then, in step S23, the action determination unit 23 (FIG. 9) compares the values of the action value function Q^π(s, a) serving as the second index and determines the action most likely to reach the arrival point. Specifically, when the action value function Q^π(s, a) is defined so that a larger value indicates that the arrival point is easier to reach, let a be an action available in the pre-transition state s and a′ an action available in the transition destination state s′. The values max_{a′} Q^π(s′, a′) are compared, and the action a that maximizes max_{a′} Q^π(s′, a′) is selected.
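Selection based on the action value function can be sketched in the same way as selection based on the state value function. The nested-dictionary layout of `Q` and the function name are assumptions made for this illustration.

```python
def choose_action_q(predicted_next_state, Q):
    """Pick the action a that maximizes max over a' of Q(s', a').

    predicted_next_state: dict {action a: s_next} for the current state s.
    Q: dict {state: {action a': value}} holding the action value function.
    """
    def best_q(s_next):
        # Largest action value available in the transition destination state.
        return max(Q[s_next].values())

    return max(predicted_next_state,
               key=lambda a: best_q(predicted_next_state[a]))
```

If the second reward is defined so that smaller values indicate easier arrival, both `max` calls would be replaced with `min`.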

The processing functions of the motion planning device and the motion control device for an autonomous mobile robot described above can be implemented by a computer. In this case, the processing content of the functions that each device should have is described by a program, and executing this program on a computer implements each processing function of these devices on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. In this embodiment, these devices are configured by executing a predetermined program on a computer, but at least part of the processing content may instead be implemented in hardware.

The present invention is not limited to the embodiments described above and may be modified as appropriate without departing from the spirit of the invention.

FIG. 1 is a functional block diagram of an example of the motion planning device for an autonomous mobile robot.
FIG. 2 is a block diagram of an example of the first state transition probability calculation unit.
FIG. 3 is a block diagram of an example of the first reward determination unit.
FIG. 4 is a schematic diagram of the autonomous mobile robot.
FIG. 5 is a diagram for explaining the displacement of the position in the horizontal direction.
FIG. 6 is a diagram showing the inclination-angle difference between state s and transition destination state s′.
FIG. 7 is a diagram for explaining the calculation of the state transition probability.
FIG. 8 is a flowchart showing an example of the motion planning method for an autonomous mobile robot.
FIG. 9 is a functional block diagram of an example of the motion control device for an autonomous mobile robot.
FIG. 10 is a flowchart showing an example of the motion control method for an autonomous mobile robot.

Explanation of symbols

10 First state transition probability calculation unit
11 First reward determination unit
12 First dynamic programming unit
13 Terrain model storage unit
14 Second state transition probability calculation unit
15 Second reward determination unit
16 Second dynamic programming unit
17 Second index storage unit
21 Flow velocity difference acquisition unit
22 Transition destination prediction unit
23 Action determination unit
24 Control unit
25 Position measurement unit
111 Inclination-angle difference calculation unit
112 Climb angle calculation unit
113 Obstacle determination unit

Claims

A motion planning device for an autonomous mobile robot that plans motion for causing an autonomous mobile robot located at a starting point in a fluid whose flow has an indeterminate velocity to reach an arrival point, the motion planning device comprising:
a first state transition probability calculation unit that calculates, under a flow velocity faster than an assumed flow velocity, a first state transition probability with which the autonomous mobile robot in each state transitions to each state when it takes each action;
a first reward determination unit that determines a first reward, obtained when the autonomous mobile robot in each state takes each action and transitions to each state, such that the first reward obtained on transition to a state containing an obstacle is lower or higher than the first reward obtained on transition to a state containing no obstacle;
a first dynamic programming unit that uses the first state transition probability and the first reward to obtain, for each state, a first index representing how likely the autonomous mobile robot is to hit an obstacle, based on dynamic programming in a Markov decision process;
a second state transition probability calculation unit that calculates, under the assumed flow velocity, a second state transition probability with which the autonomous mobile robot in each state transitions to each state when it takes each action;
a second reward determination unit that determines a second reward, obtained when the autonomous mobile robot in each state takes each action and transitions to each state, according to the first index for the transition destination state and such that the second reward obtained on transition to a state containing the arrival point is the highest or the lowest; and
a second dynamic programming unit that uses the second state transition probability and the second reward to obtain, for each state, a second index representing how easily the autonomous mobile robot reaches the arrival point, based on dynamic programming in a Markov decision process.

The motion planning device for an autonomous mobile robot according to claim 1, further comprising:
an inclination-angle difference calculation unit that refers to a terrain model and calculates the absolute value of the difference between the inclination angle of the terrain in a certain state of the Markov state transition model and the inclination angle of the terrain in the transition destination state after a certain action is selected in that state;
a climb angle calculation unit that calculates, under a flow velocity faster than the assumed flow velocity, the maximum climb angle of the autonomous mobile robot when it transitions from the certain state to the transition destination state; and
an obstacle determination unit that compares the absolute value of the inclination-angle difference with the maximum climb angle and, if the absolute value of the inclination-angle difference is the larger, determines that the transition destination state contains an obstacle.

The motion planning device for an autonomous mobile robot according to claim 1 or 2,
wherein the flow velocity faster than the assumed flow velocity is the fastest of the assumed flow velocities.

The motion planning device for an autonomous mobile robot according to any one of claims 1 to 3,
wherein the flow velocity faster than the assumed flow velocity is faster than the assumed flow velocity in the direction from the starting point toward the arrival point.

A motion control device for an autonomous mobile robot that, based on the motion plan determined by the motion planning device for an autonomous mobile robot according to any one of claims 1 to 4, controls the autonomous mobile robot located at the starting point so that it reaches the arrival point under an indeterminate flow velocity, the motion control device comprising:
a flow velocity difference acquisition unit that obtains a flow velocity difference, which is the difference between the assumed flow velocity and a measured value of the flow velocity;
a transition destination prediction unit that obtains the transition destination state for each action taken by the autonomous mobile robot by shifting the position of the autonomous mobile robot by the flow velocity difference;
an action determination unit that compares the second indices for the transition destination states obtained by the transition destination prediction unit with one another and determines the action most likely to reach the arrival point; and
a control unit that controls the autonomous mobile robot so that it moves according to the determined action.

A motion planning method for an autonomous mobile robot, which plans motion for causing an autonomous mobile robot located at a starting point to reach a destination point in a fluid whose flow velocity is indefinite, the method comprising:
a first state transition probability calculation step in which a first state transition probability calculation unit calculates, under a flow velocity faster than an assumed flow velocity, a first state transition probability with which the autonomous mobile robot in each state transitions to each state when it takes each action;
a first reward determination step in which a first reward determination unit determines a first reward, obtained when the autonomous mobile robot in each state takes each action and transitions to each state, such that the first reward obtained on transition to a state containing an obstacle is lower or higher than the first reward obtained on transition to a state containing no obstacle;
a first dynamic programming step in which a first dynamic programming unit uses the first state transition probability and the first reward to obtain for each state, based on dynamic programming in a Markov decision process, a first index representing how likely the autonomous mobile robot is to collide with an obstacle;
a second state transition probability calculation step in which a second state transition probability calculation unit calculates, under the assumed flow velocity, a second state transition probability with which the autonomous mobile robot in each state transitions to each state when it takes each action;
a second reward determination step in which a second reward determination unit determines a second reward, obtained when the autonomous mobile robot in each state takes each action and transitions to each state, in accordance with the first index of the transition destination state, and such that the second reward obtained on transition to a state containing the destination point is the highest or the lowest; and
a second dynamic programming step in which a second dynamic programming unit uses the second state transition probability and the second reward to obtain for each state, based on dynamic programming in a Markov decision process, a second index representing how easily the autonomous mobile robot reaches the destination point.
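
The two planning stages recited above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the specification: a one-dimensional corridor of eight cells, a flow that carries the robot a fixed number of cells downstream with probability 0.5, value iteration as the dynamic programming procedure, a discount factor of 0.8, rewards of -1 for an obstacle and +10 for the destination, and the convention that a lower first index means a higher collision risk.

```python
GAMMA = 0.8              # discount factor (assumed)
N = 8                    # corridor cells 0..7; the flow pushes toward cell 0 (assumed)
OBSTACLE, GOAL = 0, N - 1
ACTIONS = (-1, 0, 1)     # thrust one cell downstream, hold, one cell upstream

def clamp(s):
    """Keep a cell index inside the corridor."""
    return min(max(s, 0), N - 1)

def transition_probs(drift_cells, drift_prob=0.5):
    """P[s][a] = {next_state: probability}. With probability drift_prob the
    flow carries the robot drift_cells cells downstream on top of its move."""
    P = {}
    for s in range(N):
        P[s] = {}
        for a in ACTIONS:
            dist = {}
            for s2, p in ((clamp(s + a), 1.0 - drift_prob),
                          (clamp(s + a - drift_cells), drift_prob)):
                dist[s2] = dist.get(s2, 0.0) + p
            P[s][a] = dist
    return P

def q_value(P, reward, V, s, a):
    """Expected reward plus discounted value of taking action a in state s."""
    return sum(p * (reward(s2) + GAMMA * V[s2]) for s2, p in P[s][a].items())

def value_iteration(P, reward, terminal, sweeps=200):
    """In-place value iteration; terminal states keep the value 0."""
    V = [0.0] * N
    for _ in range(sweeps):
        for s in range(N):
            if s not in terminal:
                V[s] = max(q_value(P, reward, V, s, a) for a in ACTIONS)
    return V

# Stage 1: plan against a flow FASTER than assumed (2 cells of drift).
# The resulting values act as the first index (collision risk, <= 0).
P1 = transition_probs(drift_cells=2)
risk = value_iteration(P1, lambda s2: -1.0 if s2 == OBSTACLE else 0.0, {OBSTACLE})
risk[OBSTACLE] = -1.0    # the obstacle cell itself gets the worst index

# Stage 2: plan against the ASSUMED flow (1 cell of drift). The second reward
# follows the first index of the destination cell and is highest at the goal.
P2 = transition_probs(drift_cells=1)

def reward2(s2):
    return 10.0 if s2 == GOAL else risk[s2]

V2 = value_iteration(P2, reward2, {OBSTACLE, GOAL})   # second index
policy = {s: max(ACTIONS, key=lambda a: q_value(P2, reward2, V2, s, a))
          for s in range(1, N - 1)}
```

Because the first stage is solved under the pessimistic flow, cells from which the robot could be swept into the obstacle carry a penalty into the second stage even when the assumed flow alone would make them look safe. The claim allows the opposite sign convention (higher reward for obstacles, lowest reward at the destination); the sketch fixes one convention only for readability.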

The motion planning method for an autonomous mobile robot according to claim 6, further comprising:
an inclination angle difference calculation step in which an inclination angle difference calculation unit refers to a terrain model and calculates the absolute value of the difference between the inclination angle of the terrain in a given state of the Markov state transition model and the inclination angle of the terrain in the transition destination state reached after a given action is selected in that state;
a climbing angle calculation step in which a climbing angle calculation unit calculates, under a flow velocity faster than the assumed flow velocity, the maximum climbing angle of the autonomous mobile robot when it transitions from the given state to the transition destination state; and
an obstacle determination step in which an obstacle determination unit compares the absolute value of the difference in inclination angle with the maximum climbing angle and, if the absolute value of the difference in inclination angle is larger, determines that the transition destination state contains an obstacle.
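
The obstacle test recited above reduces to a comparison of two angles, which the following sketch illustrates. The terrain model used here (a one-dimensional height profile with slopes estimated by finite differences) and the names `slope_angles` and `contains_obstacle` are assumptions for illustration. The claim does not state how the maximum climbing angle is computed under the faster flow, so the sketch takes it as a given input.

```python
import math

def slope_angles(heights, cell_size):
    """Terrain inclination angle (radians) for each cell of a 1-D height
    profile, using central differences (one-sided at the two ends)."""
    n = len(heights)
    angles = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        rise = heights[hi] - heights[lo]
        run = (hi - lo) * cell_size
        angles.append(math.atan2(rise, run))
    return angles

def contains_obstacle(angle_from, angle_to, max_climb_angle):
    """True if the change in inclination between the current state and the
    transition destination exceeds what the robot can climb."""
    return abs(angle_to - angle_from) > max_climb_angle

# Hypothetical terrain: flat, then a ramp, then flat again (metres).
heights = [0.0, 0.0, 0.0, 1.0, 2.0, 2.0]
angles = slope_angles(heights, cell_size=1.0)

# Assumed maximum climbing angle under the faster-than-assumed flow.
max_climb = math.radians(20.0)

# Obstacle flag for each move from cell i to cell i + 1.
blocked = [contains_obstacle(angles[i], angles[i + 1], max_climb)
           for i in range(len(angles) - 1)]
```

With these numbers the moves onto the foot of the ramp and off its top are flagged, because the inclination changes by about 26.6 degrees there, while the moves along the ramp (a change of about 18.4 degrees) are not.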

A motion control method for an autonomous mobile robot, which controls the autonomous mobile robot located at the starting point so that it reaches the destination point under an indefinite flow velocity, based on the motion plan determined by the motion planning method for an autonomous mobile robot according to claim 6 or 7, the method comprising:
a flow velocity difference acquisition step in which a flow velocity difference acquisition unit obtains a flow velocity difference, which is the difference between the assumed flow velocity and a measured value of the flow velocity;
a transition destination prediction step in which a transition destination prediction unit obtains the transition destination state for each action that the autonomous mobile robot may take, by displacing the position of the autonomous mobile robot by an amount corresponding to the flow velocity difference;
an action determination step in which an action determination unit compares with one another the second indices of the transition destination states obtained by the transition destination prediction unit and determines the action by which the destination point is most easily reached; and
a control step in which a control unit controls the autonomous mobile robot so that it moves in accordance with the determined action.
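
The control steps recited above can be sketched as a single lookup over the planned second index. The grid cells, the action set, the unit time step, the sign convention for the flow velocity difference (measured minus assumed), and the rule that a higher second index means the destination is easier to reach are all assumptions made for this illustration; `choose_action` is a hypothetical name.

```python
import math

def choose_action(position, actions, assumed_flow, measured_flow,
                  second_index, dt=1.0, cell_size=1.0):
    """Pick the action whose predicted destination cell has the best second
    index, after shifting the destination by the flow velocity difference."""
    # Flow velocity difference (sign convention assumed: measured - assumed).
    diff = (measured_flow[0] - assumed_flow[0],
            measured_flow[1] - assumed_flow[1])
    best_name, best_score = None, float("-inf")
    for name, (dx, dy) in actions.items():
        # Nominal displacement of the action plus the unplanned drift.
        x = position[0] + dx + diff[0] * dt
        y = position[1] + dy + diff[1] * dt
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        # Cells outside the planned area are treated as unreachable.
        score = second_index.get(cell, float("-inf"))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical second index produced by the planner, keyed by grid cell.
second_index = {(0, 0): 1.0, (1, 0): 5.0, (2, 0): -3.0,
                (0, 1): 0.0, (1, 1): 2.0, (2, 1): 4.0}
actions = {"east": (1.0, 0.0), "stay": (0.0, 0.0), "north": (0.0, 1.0)}
position = (0.5, 0.5)

# Measured flow equals the assumed flow: the planned choice is kept.
calm = choose_action(position, actions, (0.0, 0.0), (0.0, 0.0), second_index)

# Measured flow is 1 cell per step faster eastward than assumed.
fast = choose_action(position, actions, (0.0, 0.0), (1.0, 0.0), second_index)
```

In the second call the eastward thrust would carry the robot into the low-valued cell `(2, 0)`, so holding position and letting the extra flow do the work becomes the preferred action. No replanning is needed at run time, since only the lookup position changes.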

A motion planning program for an autonomous mobile robot, for causing a computer to function as each unit of the motion planning device for an autonomous mobile robot according to any one of claims 1 to 5.

A computer-readable recording medium on which the motion planning program for an autonomous mobile robot according to claim 9 is recorded.