JP2008052473A

JP2008052473A - Operation control method and device for underwater robot, program and its recording medium

Info

Publication number: JP2008052473A
Application number: JP2006227431A
Authority: JP
Inventors: Hiroshi Kawano (川野 洋)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-08-24
Filing date: 2006-08-24
Publication date: 2008-03-06
Anticipated expiration: 2026-08-24
Also published as: JP4495703B2

Abstract

PROBLEM TO BE SOLVED: To provide a control method for an underactuated autonomous robot that requires no advance information on the locations of obstacles.

SOLUTION: The operation control method for an underwater robot includes: generating a value function with the goal as the origin and without considering obstacle locations; searching for the obstacle location each time a new obstacle is detected; generating a target trajectory position for each time step until the underwater robot reaches the goal without colliding with an obstacle; evaluating numerically how closely each action brings the underwater robot to the target trajectory position; assigning a priority to each action according to its evaluation value; calculating, for each action, the probability that the underwater robot collides with an obstacle when it takes that action; examining the actions in descending order of priority and deciding whether the collision probability of each is smaller than a fixed threshold, repeating this until an action is selected; selecting the action whose collision probability is judged to be smaller than the threshold; and controlling the underwater robot to follow the selected action.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an operation control method, device, program, and recording medium for causing an underactuated autonomous robot to perform obstacle avoidance in real time in an environment containing unknown obstacles.

A motion planning method for autonomous robots that uses the Markov state transition model described below has been proposed (see, for example, Non-Patent Document 1).
Before that method is described, the Markov state transition model and motion planning based on it are explained as prerequisite knowledge.

A Markov state transition model represents the environment as follows. Let $S = \{s_1, s_2, \ldots, s_n\}$ be the set of discrete states the environment can take, and let $A = \{a_1, a_2, \ldots, a_l\}$ be the set of actions available to the agent. When the agent executes an action $a$ in a state $s \in S$, the environment transitions stochastically to a state $s' \in S$. The transition probability is written

$P^a_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s,\ a_t = a\}$

At this transition the environment gives the agent a reward $r$ stochastically; its expected value is

$R^a_{ss'} = E\{r_t \mid s_t = s,\ a_t = a,\ s_{t+1} = s'\}$

The agent's decision at each time step is represented by the policy function

$\pi(s, a) = \Pr\{a_t = a \mid s_t = s\}$

which is defined for every state $s$ and every action $a$. The policy function $\pi(s, a)$ is also referred to simply as the policy $\pi$.

To evaluate how much the action selected at a time step $t$ contributes to subsequent rewards, the time series of rewards obtained afterwards is considered. This evaluation of the reward time series is called the value $Q$, and the agent's goal is to maximize it. The value is the sum of the rewards, each discounted over time by the discount rate $\gamma$ ($0 \le \gamma < 1$).
If $P^a_{ss'}$ and $R^a_{ss'}$ are determined for every combination of state $s$, action $a$, and destination state $s'$, the value function $Q(s, a)$ and the policy $\pi$ can be calculated by dynamic programming (see, for example, R. S. Sutton and A. G. Barto, "Reinforcement Learning", translated by Sadayoshi Mikami and Masaaki Minagawa, Morikita Publishing, 1998, pp. 94-118).
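The discounting described above can be illustrated with a minimal Python sketch. The function name `discounted_return` and the convention that the first reward is undiscounted are assumptions made for illustration; the source only states that rewards are discounted over time by $\gamma$ and summed.

```python
def discounted_return(rewards, gamma):
    """Sum a reward sequence, discounting the k-th reward by gamma**k.

    Assumption: the reward received first is weighted by gamma**0 = 1.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total
```

With `gamma = 0.5`, a reward of 1 received two steps in the future contributes 0.25 to the value, so actions that reach a reward sooner are valued more highly.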
The agent in the conventional motion planning method is the underwater robot 1 illustrated in FIG. 17, which is a conceptual top view of the robot. The underwater robot 1 has a rudder 8, a main thruster 3, an ocean current difference measurement unit 6, and a position measurement unit 7. It cannot move directly sideways. That is, the underwater robot 1 is an underactuated robot, because it has more degrees of freedom of motion than can be controlled by its onboard actuators, the rudder 8 and the main thruster 3.

An example functional configuration of a device implementing the conventional operation control method for an underwater robot, and an example of its processing, are described below with reference to FIG. 13, which shows a configuration example of the conventional operation control device.
The underwater robot motion planning device 100 comprises a maximum acceleration setting unit 112, a target speed calculation unit 102, an assumed ocean current velocity value input unit 103, a displacement amount calculation unit 104, a state transition probability calculation unit 105, a terrain model storage unit 106, an inclination angle difference calculation unit 107, a climbing angle calculation unit 108, a reward determination unit 109, a motion planning unit 110, a value function storage unit 111, the ocean current difference measurement unit 6, the position measurement unit 7, a transition destination prediction unit 213, a value function value calculation unit 214, an action selection unit 215, and a feedback control unit 216.
The target speed calculation unit 102 determines the target speed for taking an action $a$ in each state $s$ of the Markov state transition model. In this example, the target speed of the underwater robot is expressed in body-fixed coordinates relative to the water and is set with the maximum acceleration of the underwater robot taken into account. The target speed consists of, for example, the turning speed $\psi'_\tau(t)$ of the underactuated underwater robot and its speed along the body axis $v_{xw\tau}(t)$ (hereinafter, the surge speed).

The space in which the underwater robot 1 navigates is modeled discretely by the Markov state transition model. A state $s$ is defined along four axes: the X coordinate and Y coordinate of the robot's position in the horizontal plane, the azimuth angle $\psi$, and the turning speed $\psi'$. Each axis is discretized in light of the accuracy of the sensors that can be mounted on the underwater robot. For example, the X and Y axes are discretized so that one side of a grid cell is 5 to 10 m; they may instead be discretized so that one side of a cell is roughly the size of the underwater robot. The azimuth angle $\psi$ is discretized every 5 to 10 degrees, and the turning speed $\psi'$ every 1 degree per second.
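The discretization of the four state axes can be sketched as follows. This is only an illustration: the function name `discretize_state`, the default step sizes (10 m, 10 degrees, 1 degree per second, taken from the ranges quoted above), and the choice of flooring for position and azimuth but rounding for turning speed are all assumptions, not details given in the source.

```python
import math

def discretize_state(x, y, psi_deg, psi_rate,
                     cell_m=10.0, psi_step_deg=10.0, rate_step=1.0):
    """Map a continuous (X, Y, azimuth, turning speed) to grid indices.

    Assumptions: positions and azimuth are binned by flooring, azimuth is
    wrapped into [0, 360), and turning speed is rounded to the nearest step.
    """
    ix = math.floor(x / cell_m)
    iy = math.floor(y / cell_m)
    ipsi = math.floor((psi_deg % 360.0) / psi_step_deg)
    irate = round(psi_rate / rate_step)
    return ix, iy, ipsi, irate
```

For instance, a robot at X = 23 m, Y = 7.5 m with azimuth 95 degrees and turning speed 2 degrees per second falls into the cell with indices `(2, 0, 9, 2)` under these defaults.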
For example, the target speed calculation unit 102 reads the turning acceleration $\alpha$ of the underwater robot 1 set in advance in the maximum acceleration setting unit 112, determines for each action $a$ the turning speed $\psi'_\tau(t)$ and the surge speed $v_{xw\tau}(t)$ as the target speed of the underwater robot 1 according to equation (2), and outputs them to the displacement amount calculation unit 104.

Here, $\psi'_\tau(t)$ is the target turning speed of the underwater robot 1, $a$ is the action in each state $s$ of the Markov state transition model, $\alpha$ is the turning acceleration of the underwater robot 1, $t$ is the time elapsed since the start of each action $a$, $\psi'_{\tau 0}$ is the turning speed of the underwater robot at the start of each state $s$, $v_{xw\tau}(t)$ is the target speed of the underwater robot 1 along its body axis relative to the water (hereinafter, the target surge speed), and $v_{x0}$ is a positive constant. $\alpha$ is set so as not to exceed the maximum acceleration of the underwater robot. In this example $v_{x0}$ is treated as a positive constant to reduce the amount of calculation, but the value of $v_{x0}$ may also be made selectable as part of the action for more accurate motion planning.
For simplicity, this example assumes that the action $a$ takes one of the values −1, 0, and 1. According to equation (2), when $a = 1$ the turning speed $\psi'$ of the underwater robot is accelerated at the constant acceleration $\alpha$; when $a = 0$ it is maintained; and when $a = -1$ it is decelerated at the constant acceleration $\alpha$.
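Equation (2) itself is not reproduced in this text. A form consistent with the definitions above (constant turning acceleration $\alpha$ scaled by the action, constant surge speed $v_{x0}$) can be sketched in Python as below. The linear form `psi_rate0 + action * alpha * t` is an assumption inferred from those definitions, and the function names are hypothetical.

```python
def target_speeds(action, t, psi_rate0, alpha, v_x0):
    """Target (turning speed, surge speed) at time t after an action starts.

    Assumed form: turning speed changes linearly at rate action * alpha,
    and the surge speed stays at the positive constant v_x0.
    """
    return psi_rate0 + action * alpha * t, v_x0

def turning_speed_profile(actions, T, alpha, v_x0, psi_rate0=0.0):
    """Turning speed at the end of each action of duration T."""
    rates = [psi_rate0]
    for a in actions:
        rate, _ = target_speeds(a, T, rates[-1], alpha, v_x0)
        rates.append(rate)
    return rates
```

Calling `turning_speed_profile([1, 0, -1], T=2.0, alpha=0.5, v_x0=1.0)` gives a turning speed that rises by 1.0, holds, and then falls back to its starting value, which matches the accelerate, maintain, decelerate meaning of the three actions.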

For example, suppose action 1 is selected in the state at time 0, action 0 at time T, action 1 at time 2T, action 1 at time 3T, action 0 at time 4T, action −1 at time 5T, action −1 at time 6T, and action 1 at time 7T. The turning speed $\psi'$ of the underwater robot then changes as shown in FIG. 12.
Note that these values of the action $a$ are only an example. $a$ may take the values −10, 0, and 10, or five values such as −1, −0.5, 0, 0.5, and 1. Actions during which the speed and acceleration of the underwater robot change within the action time may also be made selectable.
Giving the underwater robot 1 its target behavior as a target speed in body-fixed coordinates relative to the water, rather than as a target trajectory or a target speed in ground-fixed world coordinates, has the advantage that the motion plan does not break down even if an unknown current or other disturbance shifts the robot's position in the horizontal plane. If the target were set as a trajectory or as a speed in world coordinates, a strong unknown current in an arbitrary environment could overpower the robot's actuators, and the robot could deviate greatly from the target trajectory or from the target speed expressed in world coordinates. In the present invention, as described above, the target speed is described in body-fixed coordinates relative to the water and tracked, which minimizes the influence of the current on tracking control. As a result, the influence of current disturbances appears only as an error in the position change of the underwater robot 1.

In addition, an underwater robot has high inertia and its motion is easily affected by ocean currents. The step response of the turning speed to the rudder angle is slow, so the turning speed cannot be expected to reach the desired value immediately after the rudder is turned. Because equation (2) sets the target speed with the acceleration limit of the underwater robot 1 taken into account, a motion plan that the robot can actually follow can be produced.
The displacement amount calculation unit 104 calculates how far the X coordinate and Y coordinate of the position in the horizontal plane, the azimuth angle $\psi$, and the turning speed $\psi'$ of the underwater robot 1 in the state $s$ each change in the world coordinate system when the robot navigates according to each action $a$ under the assumed ocean current $f_x$, $f_y$. From these displacements it is possible, as described later, to calculate to which state $s'$, and with what probability, the underwater robot in the state $s$ transitions next when it selects the action $a$.
Let $D_X(\psi_0, a)$ be the displacement of the X coordinate of the position of the underwater robot 1 in the horizontal plane, $D_Y(\psi_0, a)$ the displacement of the Y coordinate, $D_\psi(\psi_0, a)$ the displacement of the azimuth angle $\psi$, and $D_{\psi'}(\psi_0, a)$ the displacement of the turning speed $\psi'$. These displacements are given by the following equations (see FIG. 14).

Here, $\psi_0$ is the azimuth angle at the start of each state $s$, $T$ is the time taken to transition from the state $s$ to the next state $s'$ (hereinafter, the action unit time), $f_x$ is the X-axis component of the assumed ocean current velocity in the world coordinate system, and $f_y$ is its Y-axis component. No correction for the current is applied to the azimuth displacement $D_\psi(\psi_0, a)$ or the turning speed displacement $D_{\psi'}(\psi_0, a)$, because the turning speed $\psi'$ is itself controlled. The action unit time can be set to 15 seconds, for example.
The displacement amount calculation unit 104 uses the turning speed $\psi'_\tau$ and surge speed $v_{xw\tau}$ of the underwater robot at each time $t$ output by the target speed calculation unit 102, together with $f_x$ and $f_y$ entered at the assumed ocean current velocity value input unit 103, to calculate the X-coordinate displacement $D_X(\psi_0, a)$, the Y-coordinate displacement $D_Y(\psi_0, a)$, the azimuth displacement $D_\psi(\psi_0, a)$, and the turning speed displacement $D_{\psi'}(\psi_0, a)$ from the above equations, and outputs the results to the state transition probability calculation unit 105.
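The displacement equations are not reproduced in this text. The sketch below shows one way such displacements could be computed numerically under a simple kinematic model. This model is an assumption: heading follows the constant angular acceleration `action * alpha`, the robot moves at surge speed `v_x0` along its heading, the current `(f_x, f_y)` is added only to the position, azimuth is measured from the X axis, and angles are in radians. The function name and the midpoint-rule integration are illustrative choices.

```python
import math

def integrate_displacements(psi0, psi_rate0, action, alpha,
                            v_x0, f_x, f_y, T, steps=1000):
    """Return (D_X, D_Y, D_psi, D_psi_rate) over one action unit time T."""
    dt = T / steps
    dx = dy = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt  # midpoint of the k-th sub-interval
        psi = psi0 + psi_rate0 * t + 0.5 * action * alpha * t * t
        # Body-axis speed projected onto world axes, plus the assumed current.
        dx += (v_x0 * math.cos(psi) + f_x) * dt
        dy += (v_x0 * math.sin(psi) + f_y) * dt
    # Heading and turning speed are not corrected for the current.
    d_psi = psi_rate0 * T + 0.5 * action * alpha * T * T
    d_rate = action * alpha * T
    return dx, dy, d_psi, d_rate
```

With a straight heading along X, a surge speed of 1, and an assumed current of 0.5 along X, ten seconds of travel yields an X displacement of about 15, of which 5 comes from the current alone.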

The state transition probability calculation unit 105 calculates the state transition probability $P^a_{ss'}$ from the X-coordinate displacement $D_X(\psi_0, a)$, the Y-coordinate displacement $D_Y(\psi_0, a)$, the azimuth displacement $D_\psi(\psi_0, a)$, and the turning speed displacement $D_{\psi'}(\psi_0, a)$ of the underwater robot 1. That is, for every combination of state $s$, state $s'$, and action $a$, it uses these displacements to calculate the probability that the underwater robot in the state $s$ transitions to the state $s'$ when it selects the action $a$.
First, the state $s$ is represented by a four-dimensional grid cell spanned by the four axes of X coordinate, Y coordinate, azimuth angle $\psi$, and turning speed $\psi'$ of the underwater robot 1, and this cell is denoted $R(s)$ (see FIG. 15). The cell obtained by translating $R(s)$ by the displacement vector $(D_X(\psi_0, a), D_Y(\psi_0, a), D_\psi(\psi_0, a), D_{\psi'}(\psi_0, a))$ is denoted $R_t(s)$.
When the underwater robot 1 is in the state $s$, it is assumed to be located at any point of the four-dimensional cell $R(s)$ with equal probability. Under this assumption, the state transition probability $P^a_{ss'}$ is obtained in proportion to the volume of the overlap between $R_t(s)$ and each $R(s')$. Here $R(s')$ is a cell that overlaps $R_t(s)$; that is, it is the four-dimensional cell corresponding to a candidate destination state $s'$ when the action $a$ is taken in the state $s$. $R_t(s)$ can overlap up to eight cells $R(s')$.

Let $V_0(s, s', a)$ be the volume of the overlap between $R_t(s)$ and a particular $R(s')$, and let $\sum_{s'} V_0(s, s', a)$ be the total volume of the overlap between $R_t(s)$ and all $R(s')$. The state transition probability $P^a_{ss'}$ is then obtained by the following equation.

$P^a_{ss'} = \dfrac{V_0(s, s', a)}{\sum_{s'} V_0(s, s', a)}$
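The overlap-volume calculation can be sketched in Python as follows. This is a minimal illustration, assuming axis-aligned cells of uniform size on each axis and ignoring azimuth wrap-around; the function name `transition_probabilities` and the index-tuple representation of states are hypothetical.

```python
import math
from itertools import product

def transition_probabilities(cell_index, shift, cell_size):
    """Probabilities of destination cells for a cell translated by `shift`.

    cell_index: integer index per axis of the source cell R(s)
    shift:      displacement per axis (same units as cell_size)
    cell_size:  cell edge length per axis
    Returns {destination index tuple: probability}.
    """
    per_axis = []
    for i, d, w in zip(cell_index, shift, cell_size):
        lo = i * w + d  # lower edge of the translated cell on this axis
        hi = lo + w
        overlaps = []
        for j in range(math.floor(lo / w), math.ceil(hi / w)):
            length = min(hi, (j + 1) * w) - max(lo, j * w)
            if length > 0:
                overlaps.append((j, length))
        per_axis.append(overlaps)
    volumes = {}
    for combo in product(*per_axis):
        idx = tuple(j for j, _ in combo)
        volumes[idx] = math.prod(length for _, length in combo)
    total = sum(volumes.values())
    return {idx: v / total for idx, v in volumes.items()}
```

A cell shifted by a quarter of its width along X only, for example, splits its probability 0.75 to 0.25 between the original cell and its X neighbor.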

Calculating the state transition probability $P^a_{ss'}$ in this way gives a nonzero probability only to at most eight destination states $s'$ and a probability of zero to all other states $s'$, which reduces the amount of calculation in the dynamic programming (DP) method described later.
The inclination angle difference calculation unit 107 refers to the terrain model stored in the terrain model storage unit 106 and obtains the absolute value $d\theta_{steep}(s', s)$ of the difference between the terrain inclination angle $\theta_{steep}(s)$ in the state $s$ and the terrain inclination angle $\theta_{steep}(s')$ in the destination state $s'$ reached when an action $a$ is taken in that state. The calculated $d\theta_{steep}(s', s)$ is output to the reward determination unit 109.
That is, the inclination angle difference between states $d\theta_{steep}(s', s)$ is defined by the following equation (see FIG. 16).

$d\theta_{steep}(s', s) = \left| \theta_{steep}(s') - \theta_{steep}(s) \right|$

The terrain model is a database that registers inclination angle data for all combinations of position (X, Y) and azimuth angle, together with obstacle positions. Before motion planning for the underwater robot is performed, a terrain model of the terrain in which the robot will actually navigate must be acquired in advance and stored in the terrain model storage unit 106.
The climbing angle calculation unit 108 calculates the maximum climbing angle $d\theta_{max}(s, s')$ of the underwater robot for a transition from a state $s$ to another state $s'$. The calculated maximum climbing angle $d\theta_{max}(s, s')$ is output to the reward determination unit 109.
Let $v_z(s)$ be the rate of change of the pitch angle in the state $s$, $a_h$ the maximum acceleration of the pitch angle change of the underwater robot, and $f_{xb}$ the speed of the ocean current along the body axis relative to the body. Assuming that the climbing angle is infinitesimally small and that there is no current in the vertical direction, the maximum climbing angle $d\theta_{max}(s, s')$ of the underwater robot is defined as follows.

The reward determination unit 109 determines the reward for each combination of state $s$, action $a$, and destination state $s'$ by comparing the maximum climbing angle $d\theta_{max}(s, s')$ of the underwater robot with the inclination angle difference between states $d\theta_{steep}(s', s)$.
Specifically, when the inclination angle difference $d\theta_{steep}(s', s)$ is the larger, the reward determination unit 109 sets the reward $R^a_{ss'}$ for that combination of state $s$, action $a$, and destination state $s'$ to −1. In that case the underwater robot 1 cannot move to the destination state $s'$, so that destination state can be regarded as an obstacle.
The reward determination unit 109 also sets the reward $R^a_{ss'}$ to 1 when the destination state $s'$ contains the goal, and to 0 in all other cases.
The reward determination unit 109 may also determine the reward without comparing $d\theta_{max}(s, s')$ with $d\theta_{steep}(s', s)$. Specifically, it refers to the terrain data stored in the terrain model storage unit 106 and sets the reward $R^a_{ss'}$ to 1 when the destination state $s'$ contains the goal, to −1 when the destination state $s'$ contains an obstacle, and to 0 in all other cases.
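The first reward rule above (comparison of the slope difference with the maximum climbing angle) can be written as a small Python function. The function name `reward`, the parameter names, and the choice to test the slope condition before the goal condition are assumptions; the source does not state which rule takes precedence when both apply.

```python
def reward(next_contains_goal, d_theta_steep, d_theta_max,
           r_goal=1.0, r_other=0.0, r_blocked=-1.0):
    """Reward for a transition into a destination state s'.

    Assumption: a slope difference exceeding the maximum climbing angle
    is checked first and marks the destination as an obstacle.
    """
    if d_theta_steep > d_theta_max:
        return r_blocked
    if next_contains_goal:
        return r_goal
    return r_other
```

The three reward values are parameters so that other values can be substituted, provided the goal reward stays above the default reward and the default reward stays above the obstacle reward.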

As noted above, giving one of the rewards {1, 0, −1} is only an example. The reward values may be anything as long as the relation (reward when the destination contains the goal) > (reward in other cases) > (reward for an obstacle) holds.
The motion planning unit 110 uses the state transition probability $P^a_{ss'}$ calculated by the state transition probability calculation unit 105 and the reward $R^a_{ss'}$ calculated by the reward determination unit 109 to calculate the value function $Q(s, a)$ by dynamic programming, and stores it in the value function storage unit 111.
As described above, if $P^a_{ss'}$ and $R^a_{ss'}$ are determined for every combination of state $s$, action $a$, and destination state $s'$, the value function $Q(s, a)$ can be calculated by dynamic programming.
$Q(s, a)$ is an estimate of the reward the underwater robot will receive in the future as a result of selecting the action $a$ in the state $s$. Selecting, in each state $s$, the action $a$ that maximizes $Q(s, a)$ is the optimal policy.
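The source does not spell out the dynamic programming procedure. One standard formulation is value iteration on $Q$, sketched below. The dictionary layout of `P` and `R`, the function name, the discount rate of 0.9, and the in-place update order are assumptions made for illustration, not details confirmed by the source.

```python
def q_value_iteration(states, actions, P, R, gamma=0.9,
                      tol=1e-9, max_iter=10_000):
    """Compute Q(s, a) by repeated Bellman optimality updates.

    P[(s, a)] maps each destination s2 to its transition probability.
    R[(s, a, s2)] is the expected reward (0.0 when absent).
    A state with no entries in P is treated as absorbing with value 0.
    """
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(max_iter):
        delta = 0.0
        for s in states:
            for a in actions:
                new_q = 0.0
                for s2, p in P.get((s, a), {}).items():
                    best_next = max(Q[(s2, a2)] for a2 in actions)
                    new_q += p * (R.get((s, a, s2), 0.0) + gamma * best_next)
                delta = max(delta, abs(new_q - Q[(s, a)]))
                Q[(s, a)] = new_q
        if delta < tol:
            break
    return Q
```

Because each state has only a few destinations with nonzero probability, the inner loop over `P[(s, a)]` stays short, which is where a sparse transition model saves computation.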

The autonomous underwater robot is controlled during its actual mission using the values of $Q(s, a)$ described above. That is, every action unit time $T$, the underwater robot selects the action $a$ that maximizes $Q(s, a)$. However, this method can be used only when the ocean current velocities $f_x$, $f_y$ in the mission environment are equal to those assumed at planning time. When the ocean current velocity in an arbitrary environment differs from the assumed value, the action selection method described below is used.
The ocean current difference measurement unit 6 measures, at every action unit time, the differences $df_x$, $df_y$ between the ocean current velocities $f_x$, $f_y$ predicted at planning time and the measured actual velocities $f_{xa}$, $f_{ya}$, where $df_x = f_x - f_{xa}$ and $df_y = f_y - f_{ya}$. The measured velocity differences $df_x$, $df_y$ are output to the transition destination prediction unit 213.
The position measurement unit 7 measures, at every action unit time, the position X, Y, the azimuth angle $\psi$, and the turning speed $\psi'$ of the underwater robot. The measurement results are output to the transition destination prediction unit 213.
The transition destination prediction unit 213 uses the velocity differences $df_x$, $df_y$ output by the ocean current difference measurement unit 6 and the position X, Y, azimuth angle $\psi$, and turning speed $\psi'$ output by the position measurement unit 7 to predict, for each action $a$, which state the underwater robot in the state $s$ transitions to next, and obtains the predicted destination state $s_e$ for each action. The predicted destination state $s_e$ obtained for each action $a$ is output to the value function value calculation unit 214.
Specifically, the transition destination prediction unit 213 proceeds as follows for each action $a$.

Using equation (3), the transition destination prediction unit 213 obtains the X-axis displacement $D_{Xa}(\psi_0, a)$ of the underwater robot with the ocean current velocity difference $df_x$ taken into account, and the Y-axis displacement $D_{Ya}(\psi_0, a)$ with the velocity difference $df_y$ taken into account. Then, using equation (4), it obtains the predicted destination state $s_e$ for that action $a$. Here $X_e(s, a)$, $Y_e(s, a)$, $\psi_e(s, a)$, and $\psi'_e(s, a)$ are the position, azimuth angle, and turning speed of the predicted destination state $s_e$ when the underwater robot in the state $s$ takes an action $a$. This is done for every action $a$, yielding a predicted destination state $s_e$ for each. The predicted destination state $s_e$ obtained for each action $a$ is output to the value function value calculation unit 214.
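Equations (3) and (4) are not reproduced in this text. A form consistent with the definitions $df_x = f_x - f_{xa}$ and $df_y = f_y - f_{ya}$ can be sketched as below. It assumes that the planned displacements already contain the assumed current acting for one action unit time $T$, so that the excess $df \cdot T$ is subtracted, and that the destination is the measured state plus the corrected displacements. The signs and the function name are assumptions.

```python
def predict_next_state(x, y, psi, psi_rate,
                       d_x, d_y, d_psi, d_rate,
                       df_x, df_y, T):
    """Predicted continuous destination (X_e, Y_e, psi_e, psi_rate_e).

    Assumed correction: D_Xa = D_X - df_x * T and D_Ya = D_Y - df_y * T,
    with no current correction for azimuth or turning speed.
    """
    d_xa = d_x - df_x * T
    d_ya = d_y - df_y * T
    return x + d_xa, y + d_ya, psi + d_psi, psi_rate + d_rate
```

If the plan assumed a 0.5 m/s current along X but only 0.25 m/s is measured, a planned 15 m displacement over 10 s becomes 12.5 m, and the predicted destination may fall in a different grid cell than the one planned for.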

The value function value calculation unit 214 refers to the value function storage unit 111 and, for each action a in the pre-transition state s, obtains Qmax(s_e(a)), the maximum of Q(s_e, a′) over the actions a′ in the post-transition state s_e. Qmax(s_e(a)) is output to the action selection unit 215.
The action selection unit 215 compares the values of Qmax(s_e(a)) calculated by the value function value calculation unit 214 and selects the action a in the pre-transition state s that maximizes Qmax(s_e(a)). The target speed determined by that action is then recalculated from equation (2), and the recalculated target speed is output to the feedback control unit 216.
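The selection rule described above (predict the transition destination for each action, look up the best value available there, and take the action with the largest such value) can be sketched as follows. This is only an illustrative sketch: the function name `select_action`, the `predict` callback standing in for equations (3) and (4), and the dictionary-based Q table are assumptions made for the example and do not appear in the original description.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable


def select_action(
    s: State,
    actions: Sequence[Action],
    predict: Callable[[State, Action], State],
    Q: Dict[Tuple[State, Action], float],
) -> Action:
    """Pick the action a whose predicted successor s_e has the largest Qmax.

    `predict` is a hypothetical stand-in for the transition destination
    prediction; `Q` plays the role of the stored value function.
    """

    def qmax(a: Action) -> float:
        s_e = predict(s, a)  # expected transition destination for action a
        return max(Q[(s_e, a2)] for a2 in actions)  # best value reachable from s_e

    return max(actions, key=qmax)


# Toy example (assumed data): states are integers on a line,
# and the two actions move one step left or right.
toy_Q = {(0, "L"): 0.1, (0, "R"): 0.2, (2, "L"): 0.5, (2, "R"): 0.9}
toy_predict = lambda s, a: s - 1 if a == "L" else s + 1
best = select_action(1, ["L", "R"], toy_predict, toy_Q)
```

In this toy setting, moving right leads to a state whose best value (0.9) exceeds that of the state reached by moving left (0.2), so the right-hand action is chosen.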

The feedback control unit 216 adjusts the main thruster force and the rudder angle so that the recalculated target speed is maintained.
The above describes the conventional operation control method for an autonomous robot (see, for example, Non-Patent Document 5).
T. Yamasaki and N. Goto, "Identification of Blimp Dynamics by Flight Tests", Transactions of JSASS, Vol. 43, pp. 195-205, 2003.
T. Yamasaki and N. Goto, "Identification of Blimp Dynamics by Flight Tests", Transactions of JSASS, Vol. 43, pp. 195-205, 2003.
Yoshihiko Nakamura, "Nonholonomic Robot Systems, Part 2: Motion Planning under Geometric Nonholonomic Constraints", Journal of the Robotics Society of Japan, Vol. 11, No. 5, pp. 655-662, 1993.
Hiroshi Kawano, "Motion Planning and Control of an Underactuated Underwater Robot Considering Navigation in Unknown and Non-Uniform Currents", JSAI2005, 19th Annual Conference of the Japanese Society for Artificial Intelligence, 1D1-04, 2005.
Hiroshi Kawano, "3D Obstacle Avoidance of an Autonomous Airship Navigating in an Unknown Disturbance", Proceedings of the JSME Robotics and Mechatronics Conference 2006, May 26, 2006.

In the prior art, information about obstacles in the environment in which the autonomous robot navigates must be known in advance. If that information is not available, neither motion planning nor motion control based on the resulting plan can be performed. In addition, because motion planning using a Markov state transition model is time-consuming, an autonomous robot cannot acquire obstacle information while navigating and then carry out such motion planning, and motion control based on it, in real time.

According to the present invention, a value function is generated by a motion planning method based on a Markov state transition model, with the origin as the target arrival position and with the reward given when the underwater robot reaches the origin set larger than the reward in all other cases; this value function is stored in the value function storage means. Each time the environment model generation means detects a new obstacle, it obtains the position of that obstacle and stores it in the obstacle information storage means. The trajectory generation means generates the target trajectory position at each time step until the target arrival position is reached without hitting any obstacle read from the obstacle information storage means, and stores these positions in the target trajectory position storage means. The trajectory tracking calculation means evaluates numerically how closely the underwater robot can approach the target trajectory positions read from the target trajectory position storage means when it takes each action, and assigns a priority to each action according to the evaluation value. The obstacle avoidance calculation means calculates the likelihood of hitting an obstacle when the underwater robot takes each action. The action selection means examines the actions in descending order of the priority assigned in the trajectory tracking calculation step and determines, for each in turn, whether the likelihood of hitting an obstacle obtained in the obstacle avoidance calculation step is smaller than a fixed threshold; it repeats this until it finds an action for which the likelihood is below the threshold, and selects that action. The feedback control means controls the underwater robot so that it operates according to the action selected in the action selection step.
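The final selection rule described above, in which actions are tried in priority order and the first one whose collision likelihood falls below the threshold is taken, can be sketched as follows. The names `choose_safe_action`, `ranked_actions`, and `collision_likelihood` are hypothetical, and returning `None` when no action qualifies is an assumption of this sketch; the text above does not state what happens in that case.

```python
from typing import Dict, Hashable, Optional, Sequence

Action = Hashable


def choose_safe_action(
    ranked_actions: Sequence[Action],
    collision_likelihood: Dict[Action, float],
    threshold: float,
) -> Optional[Action]:
    """Return the highest-priority action whose collision likelihood is below threshold.

    `ranked_actions` must already be sorted from highest to lowest priority.
    """
    for action in ranked_actions:
        if collision_likelihood[action] < threshold:
            return action  # first acceptable action in priority order
    return None  # assumption: no acceptable action was found


# Toy example (assumed numbers): the top-ranked action is too risky,
# so the second-ranked one is selected.
ranked = ["a1", "a2", "a3"]
risk = {"a1": 0.9, "a2": 0.2, "a3": 0.1}
chosen = choose_safe_action(ranked, risk, threshold=0.5)
```

Note that the rule does not pick the safest action overall ("a3" here); it picks the best trajectory-following action that is safe enough.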

Even in an environment containing unknown obstacles, the autonomous robot can be controlled so that it reaches the target position without colliding with them.

FIG. 11 illustrates an underwater robot 1′ to be controlled by the present invention. The underwater robot 1′ includes, for example, an ultrasonic distance measuring sensor 2, a main thruster 3, an ocean current difference measurement unit 6, a position measurement unit 7, a rudder 8, and an operation control unit 1000.
As shown in FIG. 1, for example, the operation control unit 1000 includes an environment model generation unit 10, a trajectory generation unit 20, a real-time control unit 90, a feedback control unit 60, a value function storage unit 70, and a dynamic programming unit 80.
As shown in FIG. 18, for example, the dynamic programming unit 80 includes a maximum acceleration setting unit 112, a target speed calculation unit 102, an assumed ocean current velocity value input unit 103, a displacement calculation unit 104, a state transition probability calculation unit 105, a reward determination unit 109′, and a motion planning unit 110.
As shown in FIG. 4, for example, the trajectory generation unit 20 includes a target trajectory position storage unit 201, an initial value setting unit 202, an existence probability calculation unit 203, a probability correction unit 204, an existence probability storage unit 205, a control unit 206, and a trajectory determination unit 207.
As shown in FIG. 8, for example, the trajectory tracking calculation unit 30 includes a relative position determination unit 301, a transition destination prediction unit 213, a maximum value extraction unit 303, an addition unit 304, an order determination unit 305, and an average value extraction unit 306.
As shown in FIG. 9, for example, the obstacle avoidance calculation unit 40 includes a relative position determination unit 401, a transition destination prediction unit 213, a maximum value extraction unit 403, a maximum value selection unit 404, a risk calculation unit 405, and an average value extraction unit 406.

<Step S0>
The dynamic programming unit 80 generates the value function Q(s, a) without considering the positions of obstacles, taking the origin (0, 0) as the target arrival position, 1 as the reward when the underwater robot reaches the target arrival position, and 0 as the reward in all other cases.
The dynamic programming unit 80 includes a maximum acceleration setting unit 112, a target speed calculation unit 102, an assumed ocean current velocity value input unit 103, a displacement calculation unit 104, a state transition probability calculation unit 105, a reward determination unit 109′, and a motion planning unit 110. The processing of the maximum acceleration setting unit 112, the target speed calculation unit 102, the assumed ocean current velocity value input unit 103, the displacement calculation unit 104, the state transition probability calculation unit 105, and the motion planning unit 110 is the same as in the prior art, so its description is omitted.
The reward determination unit 109′ disregards the presence or absence of obstacles, takes the origin (0, 0) as the target arrival position, and sets the reward given when the underwater robot reaches the target arrival position larger than the reward in all other cases. For example, the reward for reaching the target arrival position is 1 and the reward in all other cases is 0. The determined reward is output to the motion planning unit 110. Only the processing of the reward determination unit 109′ differs from the prior art; the rest of the processing of the dynamic programming unit 80 is the same as in the prior art.
The motion planning unit 110 obtains the value function Q(s, a) based on the reward determined by the reward determination unit 109′. The value function Q(s, a), calculated for each action a_1^(m) (m = 1, …, M) in the pre-transition state s, is stored in the value function storage unit 70.
Q(s, a) represents the estimated reward that the underwater robot will receive in the future as a result of selecting action a in state s. Selecting, in each state s, the action a that maximizes Q yields the optimal action policy.
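To make the role of the obstacle-free value function concrete, the following sketch computes Q(s, a) by value iteration for a reward of 1 on reaching the goal and 0 otherwise. It is a simplified illustration and not the dynamic programming unit itself: the one-dimensional chain of states, the deterministic `chain_transition` function, the discount factor `gamma`, and the fixed number of sweeps are all assumptions made for the example.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Tuple

State = Hashable
Action = Hashable


def q_value_iteration(
    states: Sequence[State],
    actions: Sequence[Action],
    transition: Callable[[State, Action], Iterable[Tuple[State, float]]],
    goal: State,
    gamma: float = 0.9,  # assumed discount factor, not given in the text
    sweeps: int = 100,
) -> Dict[Tuple[State, Action], float]:
    """Compute Q(s, a) with reward 1 for reaching `goal` and 0 otherwise."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(sweeps):
        for s in states:
            if s == goal:
                continue  # the goal is treated as terminal
            for a in actions:
                total = 0.0
                for s_next, prob in transition(s, a):
                    if s_next == goal:
                        total += prob * 1.0  # reward for arriving at the goal
                    else:
                        best_next = max(Q[(s_next, a2)] for a2 in actions)
                        total += prob * gamma * best_next  # reward 0 plus future value
                Q[(s, a)] = total
    return Q


def chain_transition(s: int, a: int) -> List[Tuple[int, float]]:
    """Deterministic toy dynamics on the states -3..3 (assumed for illustration)."""
    return [(max(-3, min(3, s + a)), 1.0)]


chain_Q = q_value_iteration(range(-3, 4), (-1, 1), chain_transition, goal=0)
```

Because no obstacle appears anywhere in this computation, the resulting table depends only on the position relative to the goal, which is what allows it to be reused later with the origin shifted to other reference points.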

Equation (2′) above assumes that, even when the direction and magnitude of the water flow velocity (fx, fy) and the turning speed Ψ′ differ, the position (X, Y) of the underwater robot changes only by the integral of the water flow velocity (fx, fy) over the action unit time T, and the azimuth angle Ψ changes only by the integral of the turning speed Ψ′ over the action unit time T. In the real world, however, this assumption does not necessarily hold once the shape of the underwater robot and other factors are taken into account. Therefore, instead of using equation (2′), the underwater robot may be placed in a water tank in which the assumed water flow is generated, and the values of Dx(Ψ_0, a), Dy(Ψ_0, a), DΨ, and DΨ′ may be measured directly, that is, by a so-called tank test or a similar technique.
Note that what matters in the present invention is that the processing described below is based on a value function Q(s, a) generated without considering the positions of obstacles, with the origin (0, 0) as the target arrival position and with the reward for reaching the target arrival position larger than the reward in all other cases. The dynamic programming unit, which is the means of generating the value function Q(s, a), is not an essential component.

<Step S1>
Each time the environment model generation unit 10 detects a new obstacle, it obtains the position of that obstacle and stores it in the obstacle information storage unit 101 of the environment model generation unit 10. Obstacles are detected and their positions calculated using, for example, ultrasonic distance measuring sensors. For example, about three to five ultrasonic distance measuring sensors can be provided so as to divide the range from −ω to +ω degrees relative to the traveling direction of the underwater robot at equal intervals, where ω = 45 to 60 degrees. FIG. 3 shows an example in which five ultrasonic distance measuring sensors are provided at the front of the underwater robot 1.
Let (X, Y) be the position of the underwater robot 1, Ψ its azimuth angle, rk the distance between the k-th ultrasonic distance measuring sensor 2 and the obstacle it detects, and θk the angle of that sensor relative to the traveling direction of the underwater robot. When the azimuth angle Ψ is defined as the angle from the positive direction of the X axis, the position (Xok, Yok) of the obstacle 9 detected by the k-th sensor is obtained as
Xok = X + rk × cos(Ψ + θk)
Yok = Y + rk × sin(Ψ + θk)
Here, the mounting position of the ultrasonic distance measuring sensor is assumed to be sufficiently close to the reference point that defines the position of the underwater robot.
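The conversion from a range reading to an obstacle position in world coordinates can be sketched as below. The function names are hypothetical, angles are passed to `obstacle_position` in radians, and the even spacing in `sensor_angles` includes both endpoints −ω and +ω; the last point is an assumption, since the text does not say whether the endpoints carry sensors.

```python
import math
from typing import List, Tuple


def sensor_angles(n_sensors: int, omega_deg: float) -> List[float]:
    """Mounting angles (radians) spaced evenly from -omega to +omega, endpoints included."""
    if n_sensors == 1:
        return [0.0]
    step = 2.0 * omega_deg / (n_sensors - 1)
    return [math.radians(-omega_deg + k * step) for k in range(n_sensors)]


def obstacle_position(
    x: float, y: float, psi: float, r_k: float, theta_k: float
) -> Tuple[float, float]:
    """World position of the obstacle seen by sensor k.

    (x, y): robot position; psi: azimuth measured from the positive X axis;
    r_k: measured range; theta_k: sensor angle relative to the heading.
    """
    return (
        x + r_k * math.cos(psi + theta_k),
        y + r_k * math.sin(psi + theta_k),
    )


# Toy example (assumed numbers): robot at (1, 1) heading along +Y,
# centre sensor reports an obstacle 3 units ahead.
angles = sensor_angles(5, 60.0)
ox, oy = obstacle_position(1.0, 1.0, math.pi / 2, 3.0, angles[2])
```

With the centre sensor (θk = 0) and a heading along the positive Y axis, the obstacle lands directly ahead of the robot at approximately (1, 4).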

The obstacle information can be updated, for example, every action unit time T, but this interval is not mandatory. For example, updates may be performed at intervals shorter than the action unit time T, which raises the obstacle detection rate and lowers the likelihood that the underwater robot hits an obstacle.

<Step S2>
Based on the obstacle position information read from the obstacle information storage unit 101, the trajectory generation unit 20 calculates target trajectory positions that lead from the current position of the underwater robot to the target position without contacting any obstacle. Many methods are already known for calculating target trajectory positions from obstacle position information alone. In the present embodiment, the target trajectory positions are obtained, for example, as follows.
FIG. 4 illustrates the functional configuration of the trajectory generation unit 20. FIG. 5 illustrates the processing of the trajectory generation unit 20.
First, let s0 be the grid cell containing the current position of the underwater robot, and let sd be the grid cell containing the target position. At every time step, that is, every action unit time T, the underwater robot is assumed to move to one of the adjacent grid cells with equal probability. Note that "adjacent" covers cells that share a vertex as well as cells that share an edge. In other words, the kinematic characteristics of the underwater robot are disregarded, and the robot is assumed to be able to move to any of the eight cells surrounding it.

For example, as shown in FIG. 6, when the underwater robot is in the grid cell marked with a black circle at a certain time step, it is assumed to move at the next time step to one of the cells marked with white circles, each with equal probability. Specifically, in the example of FIG. 6, the underwater robot is in the cell marked with the black circle with probability 1, so after the action unit time T has elapsed it is in each of the cells marked with white circles with probability 1/8.
Let P(s, t) be the probability that the underwater robot is in grid cell s at time step t. Then, at time step 0, the probability that the robot is in cell s0 is P(s0, 0) = 1, and the probability that it is in any cell other than s0 is P(s (s != s0), 0) = 0. Here, "s != s0" denotes a cell other than s0.
The initial value setting unit 202 of the trajectory generation unit 20 sets to 1 the probability P(s0, 0) that the underwater robot is, at time step 0, in the cell containing its current position, and sets to 0 the probability P(s (s != s0), 0) for every other cell (step S21).

P(s, t) at an arbitrary time step t can be calculated by the following equation.
P(s, t) = Σ_{s′} (1/8) × P(s′, t−1) … (5)
In this equation, s′ ranges over the eight grid cells adjacent to cell s, that is, the eight cells surrounding s, excluding s itself, and Σ denotes the sum over these s′. For example, as shown in FIG. 7, suppose that cell s is surrounded by the eight cells s1′, …, s8′ and that, at time step t−1, the underwater robot is in cell sp′ (p = 1, …, 8) with probability P(sp′, t−1). Under the assumption above, the robot moves from each cell sp′ (p = 1, …, 8) to cell s with probability 1/8, so the probability P(s, t) that the robot is in cell s at time step t is
P(s, t) = Σ_{p=1}^{8} (1/8) × P(sp′, t−1)
The existence probability calculation unit 203 of the trajectory generation unit 20 calculates the existence probability of the underwater robot at each time step based on equation (5) (step S22).
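A single application of equation (5) on a rectangular grid can be sketched as follows. The function name `propagate` and the list-of-lists grid representation are assumptions of this example, as is the treatment of cells outside the grid as having probability 0.

```python
from typing import List

Grid = List[List[float]]


def propagate(P_prev: Grid) -> Grid:
    """Apply equation (5) once: each cell receives 1/8 of each neighbour's probability."""
    rows, cols = len(P_prev), len(P_prev[0])
    P_next = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue  # the cell itself is not one of its eight neighbours
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        total += P_prev[ni][nj] / 8.0
            P_next[i][j] = total
    return P_next


# Toy example: the robot starts in the centre of a 3x3 grid with probability 1.
P0 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
P1 = propagate(P0)
```

After one step the centre cell holds probability 0 and each of the eight surrounding cells holds 1/8, matching the situation described for FIG. 6.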

If a grid cell contains an obstacle, the underwater robot cannot enter that cell. The probability correction unit 204 of the trajectory generation unit 20 therefore reads the obstacle position information from the obstacle information storage unit 101 and, among the existence probabilities obtained by the existence probability calculation unit 203, sets to 0 the existence probability of every cell in which an obstacle is present (step S23).
The calculated probability P(s, t) that the underwater robot is in each cell at each time step is stored in the existence probability storage unit 205. The probability P(s, t) at the next time step is calculated from the probability P(s, t−1) of the preceding time step, read from the existence probability storage unit 205.
The control unit 206 of the trajectory generation unit 20 causes the existence probability calculation unit 203 and the probability correction unit 204 to repeat their processing until the probability P(sd, t) that the underwater robot is in the cell sd containing the target arrival position becomes nonzero. When P(sd, t) becomes nonzero, the control unit 206 outputs the time step td at that moment to the trajectory determination unit 207 of the trajectory generation unit 20 (step S24).

The trajectory determination unit 207 of the trajectory generation unit 20 first sets the target trajectory position τ(td) for time step td to τ(td) = sd and stores it in the target trajectory position storage unit 201 in the trajectory generation unit 20 (step S25).
The trajectory determination unit 207 then reads from the existence probability storage unit 205 the probabilities that, at time step t−1, the underwater robot is in each of the cells surrounding the target trajectory position τ(t). It takes the cell with the highest probability as τ(t−1) and stores it in the target trajectory position storage unit 201 (step S26).
By repeating this processing from t = td down to t = 1, the trajectory determination unit 207 finally obtains the target trajectory positions τ(td), τ(td−1), …, τ(1), τ(0) for all time steps (step S27).
This calculation method imposes a lighter computational load than the existing methods, which makes it possible to keep the calculation real-time.
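Steps S21 to S27 can be put together in a short sketch: propagate the existence probability forward, force it to 0 in obstacle cells, stop when the goal cell first becomes reachable, and then trace back through the most probable neighbouring cells. The function name `generate_trajectory`, the dictionary representation of P(s, t), and the `max_steps` safeguard for an unreachable goal are assumptions of this example and are not taken from the original description.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]
NEIGHBOURS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]


def generate_trajectory(
    rows: int, cols: int, start: Cell, goal: Cell,
    obstacles: Set[Cell], max_steps: int = 1000,
) -> Optional[List[Cell]]:
    """Return tau(0), ..., tau(td) as grid cells, or None if the goal is not reached."""

    def step(P: Dict[Cell, float]) -> Dict[Cell, float]:
        P_next: Dict[Cell, float] = {}
        for i in range(rows):
            for j in range(cols):
                if (i, j) in obstacles:
                    continue  # S23: obstacle cells keep probability 0
                total = sum(P.get((i + di, j + dj), 0.0) for di, dj in NEIGHBOURS) / 8.0
                if total > 0.0:
                    P_next[(i, j)] = total  # S22: equation (5)
        return P_next

    history = [{start: 1.0}]  # S21: P(s0, 0) = 1, all other cells 0
    while history[-1].get(goal, 0.0) == 0.0:  # S24: repeat until P(sd, t) != 0
        if len(history) > max_steps:
            return None  # assumed safeguard for an unreachable goal
        history.append(step(history[-1]))

    td = len(history) - 1
    path = [goal]  # S25: tau(td) = sd
    for t in range(td, 0, -1):  # S26, S27: walk back from t = td to t = 1
        ci, cj = path[-1]
        candidates = [(ci + di, cj + dj) for di, dj in NEIGHBOURS]
        path.append(max(candidates, key=lambda c: history[t - 1].get(c, 0.0)))
    path.reverse()
    return path


# Toy example (assumed layout): a 5x5 grid with a short wall between start and goal.
wall = {(0, 2), (1, 2)}
demo_path = generate_trajectory(5, 5, start=(0, 0), goal=(0, 4), obstacles=wall)
```

Since obstacle cells never hold a nonzero probability, the backward trace cannot select them, so the returned cells go around the wall. Each forward step visits every cell once, which is consistent with the light computational load claimed for the method.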

In the embodiment above, the probability that the underwater robot is in a cell is first obtained for every cell, regardless of whether the cell contains an obstacle; it is then determined whether each cell contains an obstacle, and if it does, the probability for that cell is set to 0. Alternatively, for cells that contain an obstacle, the probability may simply be fixed at 0 without being calculated.
The trajectory generation unit 20 may also refer to the obstacle position information read from the obstacle information storage unit 101 and obtain the target trajectory positions by so-called dynamic programming over the grid space (spanned by X and Y) in which the obstacle positions are defined.

<Step S3>
The trajectory tracking calculation unit 30 evaluates numerically how closely the underwater robot can approach the target trajectory positions τ(i) (i = 0, 1, …, td) when it takes each action a_1^(m) (m = 1, …, M), and assigns a priority to each action a_1^(m) (m = 1, …, M) according to the evaluation value. FIG. 8 illustrates the functional configuration of the trajectory tracking calculation unit 30.
The relative position determination unit 301 of the trajectory tracking calculation unit 30 obtains the current position (Xτ(i), Yτ(i)) of the underwater robot relative to each target trajectory position τ(i) (i = 0, 1, …, td), that is, the current position of the underwater robot when the target trajectory position τ(i) is taken as the origin.

The transition destination prediction unit 213 of the trajectory tracking calculation unit 30 obtains the transition destination state s(τi, a_1^(m)) reached when the underwater robot located at (Xτ(i), Yτ(i)) takes each action a_1^(m) (m = 1, …, M). The action a_1^(m) (m = 1, …, M) is called the first action. That is, with Ψ the azimuth angle and Ψ′ the turning speed of the underwater robot at time t = 0, it obtains the state s(τi, a_1^(m)) that contains (Xτ(i) + Dxa(s, a), Yτ(i) + Dya(s, a), Ψ + DΨ, Ψ′ + DΨ′). The processing of the transition destination prediction unit 213, the ocean current difference measurement unit 6, and the position measurement unit 7 is the same as in the conventional example, so its description is omitted.
The maximum value extraction unit 303 of the trajectory tracking calculation unit 30 obtains Qmax(τi, a_1^(m)), the maximum of Q(s(τi, a_1^(m)), a_2^(n)) over the actions a_2^(n) (n = 1, …, N) that the underwater robot can take in the transition destination state s(τi, a_1^(m)).
The action a_2^(n) (n = 1, …, N) is called the second action. That is, in the transition destination state s(τi, a_1^(m)), a reward Q(s(τi, a_1^(m)), a_2^(n)) is determined for each action a_2^(n) that the underwater robot may take, and the largest of these rewards is Qmax(τi, a_1^(m)).

By performing the processing above, the relative position determination unit 301, the transition destination prediction unit 213, and the maximum value extraction unit 303 obtain Qmax(τi, a_1^(m)) for every pair of τi and a_1^(m).
The addition unit 304 of the trajectory tracking calculation unit 30 sums, over all τi, the values Qmax(τi, a_1^(m)) obtained for each pair of τi and a_1^(m). That is, it calculates
Qmax(a_1^(m)) = Σ_{i=0}^{td} Qmax(τi, a_1^(m))
Qmax(a_1^(m)) is an index of how likely the underwater robot that selects action a_1^(m) is to be on the target trajectory in the future. This is because, as described above, the value function Q(s, a) is designed so that a robot in state s approaches the origin most efficiently by selecting the action a that maximizes Q(s, a), and the processing of the relative position determination unit 301 replaces the origin with the target trajectory position τi.
The order determination unit 305 of the trajectory tracking calculation unit 30 ranks the actions a_1^(m). Specifically, it sorts the actions a_1^(m) (m = 1, …, M) in descending order of Qmax(a_1^(m)) and relabels them a_1′, a_2′, …, a_M′, thereby assigning priorities. The actions a_m′ (m = 1, …, M) are output to the action selection unit 50.
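The scoring and ranking performed in step S3 can be sketched as follows. The sketch reduces the state to a planar position and uses a made-up value function, so `rank_actions`, `toy_predict`, and `toy_value` are all hypothetical; only the structure (shift the origin to each τi, predict the successor of the first action, take the maximum over second actions, sum over i, sort in descending order) follows the description above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vec = Tuple[float, float]


def rank_actions(
    robot_xy: Vec,
    trajectory: Sequence[Vec],
    first_actions: Sequence[Vec],
    second_actions: Sequence[Vec],
    predict: Callable[[Vec, Vec], Vec],
    value: Callable[[Vec, Vec], float],
) -> Tuple[List[Vec], Dict[Vec, float]]:
    """Return first actions sorted by the summed Qmax, together with the scores."""
    scores: Dict[Vec, float] = {}
    for a1 in first_actions:
        total = 0.0
        for tx, ty in trajectory:
            rel = (robot_xy[0] - tx, robot_xy[1] - ty)  # position with tau(i) as origin
            s_next = predict(rel, a1)  # transition destination for the first action
            total += max(value(s_next, a2) for a2 in second_actions)  # Qmax(tau_i, a1)
        scores[a1] = total  # sum over all tau(i)
    ranked = sorted(first_actions, key=lambda a: scores[a], reverse=True)
    return ranked, scores


# Toy stand-ins (assumed): actions are unit moves, and the value is higher
# the closer the position after the second action is to the origin.
def toy_predict(s: Vec, a: Vec) -> Vec:
    return (s[0] + a[0], s[1] + a[1])


def toy_value(s: Vec, a2: Vec) -> float:
    x, y = s[0] + a2[0], s[1] + a2[1]
    return 1.0 / (1.0 + abs(x) + abs(y))


moves = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)]
ranked_moves, move_scores = rank_actions(
    (2.0, 0.0), [(0.0, 0.0)], moves, moves, toy_predict, toy_value
)
```

With the robot two units to the right of the single trajectory point, the move to the left receives the highest score and is ranked first.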

An average value extraction unit 306, described below, may be provided in place of the maximum value extraction unit 303.
The average value extraction unit 306 of the trajectory tracking calculation unit 30 obtains Qave(τi, a_1^(m)), the average of the rewards Q(s(τi, a_1^(m)), a_2^(n)) over the actions a_2^(n) (n = 1, …, N) that the underwater robot can take in the transition destination state s(τi, a_1^(m)). That is, in the transition destination state s(τi, a_1^(m)), a reward Q(s(τi, a_1^(m)), a_2^(n)) is determined for each action a_2^(n) that the underwater robot may take, and the average of these rewards is Qave(τi, a_1^(m)).
In this case, the addition unit 304 sums Qave(τi, a_1^(m)) instead of Qmax(τi, a_1^(m)). That is, it calculates
Qave(a_1^(m)) = Σ_{i=0}^{td} Qave(τi, a_1^(m))
The order determination unit 305 then sorts the actions a_1^(m) (m = 1, …, M) in descending order of Qave(a_1^(m)) and relabels them a_1′, a_2′, …, a_M′.

＜ステップＳ４＞
障害物回避計算部４０の相対位置決定部４０１は、環境モデル作成部の障害物情報記録部から読み出した障害物の位置ｏｂ（ｊ）（ｊ＝１，２，…，Ｊ）に対する現時点における水中ロボットの相対位置を（Ｘｏｂ（ｊ），Ｙｏｂ（ｊ））を求める。すなわち、障害物の位置ｏｂ（ｊ）（ｊ＝１，２，…，Ｊ）を原点としたときの、現時点における水中ロボットの位置を（Ｘｏｂ（ｊ），Ｙｏｂ（ｊ））を求める。
障害物回避計算部４０の遷移先予測部２１３は、（Ｘｏｂ（ｊ），Ｙｏｂ（ｊ））に位置する水中ロボットが、各行動ａ_１ ^（ｍ）（ｍ＝１，…，Ｍ）を取ったときの遷移先の状態ｓ（ｏｂｊ，ａ_１ ^（ｍ））を求める。行動ａ_１ ^（ｍ）（ｍ＝１，…，Ｍ）を第一の行動と呼ぶ。すなわち、時刻ｔ＝０における水中ロボットの方位角をΨ、旋回速度をΨ’とすると、位置（Ｘｏｂ（ｊ）＋Ｄｘａ（ｓ，ａ），Ｙｏｂ（ｊ）＋Ｄｙａ（ｓ，ａ），ＤΨ＋ＤΨ，ＤΨ’＋ＤΨ’）が含まれる状態ｓ（ｏｂｊ，ａ_１ ^（ｍ））を求める。遷移先予測部２１３、海流差計測部６、位置計測部７の処理は、従来例と同様であるため説明を省略する。 <Step S4>
The relative position determination unit 401 of the obstacle avoidance calculation unit 40 obtains the current relative position (Xob (j), Yob (j)) of the underwater robot with respect to each obstacle position ob (j) (j = 1, 2,..., J) read from the obstacle information recording unit of the environmental model creation unit. That is, it obtains the current position (Xob (j), Yob (j)) of the underwater robot when the obstacle position ob (j) (j = 1, 2,..., J) is taken as the origin.
The transition destination prediction unit 213 of the obstacle avoidance calculation unit 40 obtains the transition destination state s (obj, a ₁ ^(m) ) reached when the underwater robot located at (Xob (j), Yob (j)) takes each action a ₁ ^(m) (m = 1,..., M). The actions a ₁ ^(m) (m = 1,..., M) are called first actions. That is, letting Ψ be the azimuth angle and Ψ ′ the turning speed of the underwater robot at time t = 0, it obtains the state s (obj, a ₁ ^(m) ) that contains the position (Xob (j) + Dxa (s, a), Yob (j) + Dya (s, a), Ψ + DΨ, Ψ ′ + DΨ ′). The processes of the transition destination prediction unit 213, the ocean current difference measurement unit 6, and the position measurement unit 7 are the same as in the conventional example, so their description is omitted.

障害物回避計算部４０の最大値抽出部４０３は、遷移先の状態ｓ（ｏｂｊ，ａ_１ ^（ｍ））において、水中ロボットが各行動ａ_２ ^（ｎ）（ｎ＝１，…，Ｎ）を取ったときの報酬Ｑ（ｓ（ｏｂｊ，ａ_１ ^（ｍ）），ａ_２ ^（ｎ））の最大値Ｑｏｂｍａｘ（ｏｂｊ，ａ_１ ^（ｍ））を求める。行動ａ_２ ^（ｎ）（ｎ＝１，…，Ｎ）を第二の行動と呼ぶ。すなわち、遷移先の状況ｓ（ｏｂｊ，ａ_１ ^（ｍ））において、水中ロボットが各行動ａ_２ ^（ｎ）を取ったときに報酬Ｑ（ｓ（ｏｂｊ，ａ_１ ^（ｍ）），ａ_２ ^（ｎ））が定まるが、これらの各行動ａ_２ ^（ｎ）ごとに求まった報酬Ｑ（ｓ（ｏｂｊ，ａ_１ ^（ｍ）），ａ_２ ^（ｎ））のうち、最大のものＱｏｂｍａｘ（ｏｂｊ，ａ_１ ^（ｍ））を求める。Ｑｏｂｍａｘ（ｏｂｊ，ａ_１ ^（ｍ））は、最大値選択部４０４とリスク計算部４０５に出力される。
相対位置決定部４０１と遷移先予測部２１３と最大値抽出部４０３は、上記の処理を行うことにより、ｏｂｊとａ_１ ^（ｍ）のすべての組について、Ｑｏｂｍａｘ（ｏｂｊ，ａ_１ ^（ｍ））を求める。
障害物回避計算部４０の最大値選択部４０４は、各障害物の位置ｏｂ（ｊ）ごとに、Ｑｏｂｍａｘ（ｏｂｊ，ａ_１ ^（ｍ））を最大にする行動ａ_１ ^（ｍ）を選択する。選択されたａ_１ ^（ｍ）は、ａｍａｘ（ｊ）としてリスク計算部４０５に出力される。 The maximum value extraction unit 403 of the obstacle avoidance calculation unit 40 obtains the maximum value Qobmax (obj, a ₁ ^(m) ) of the rewards Q (s (obj, a ₁ ^(m) ), a ₂ ⁽ⁿ⁾ ) given when the underwater robot, in the transition destination state s (obj, a ₁ ^(m) ), takes each action a ₂ ⁽ⁿ⁾ (n = 1,..., N). The actions a ₂ ⁽ⁿ⁾ (n = 1,..., N) are called second actions. That is, in the transition destination state s (obj, a ₁ ^(m) ), a reward Q (s (obj, a ₁ ^(m) ), a ₂ ⁽ⁿ⁾ ) is determined for each action a ₂ ⁽ⁿ⁾ the underwater robot may take, and the largest of these rewards is taken as Qobmax (obj, a ₁ ^(m) ). Qobmax (obj, a ₁ ^(m) ) is output to the maximum value selection unit 404 and the risk calculation unit 405.
By performing the above processing, the relative position determination unit 401, the transition destination prediction unit 213, and the maximum value extraction unit 403 obtain Qobmax (obj, a ₁ ^(m) ) for every pair of obj and a ₁ ^(m) .
The maximum value selection unit 404 of the obstacle avoidance calculation unit 40 selects an action a ₁ ^(m) that maximizes Qobmax (obj, a ₁ ^(m) ) for each obstacle position ob (j). The selected a ₁ ^(m) is output to the risk calculation unit 405 as amax (j).

リスク計算部４０５は、障害物の位置ｏｂ（ｊ）（ｊ＝１，２，…，Ｊ）を、ａｍａｘ（ｊ）の値が同じもの同士でグループ分けする。そして、それぞれのグループ内における、Ｑｏｂｍａｘ（ｏｂｊ，ａｍａｘ（ｊ））の最大値をＱｒｉｓｋ（ｓ，ａ_１ ^（ｎ））とする。すなわち、
リスク計算部４０５は、
Ｑｒｉｓｋ（ｓ，ａ_１ ^（ｍ））＝ｍａｘ｛Ｑｏｂｍａｘ（ｏｂｊ，ａ_１ ^（ｍ））｜ａｍａｘ（ｊ）＝ａ_１ ^（ｍ）｝
を計算する。
Ｑｒｉｓｋ（ｓ，ａ_１ ^（ｍ））は、行動ａ_１ ^（ｍ）を選択した水中ロボットが、将来的に、障害物にぶつかる可能性の大きさを示す指標となる。なぜなら、上記したように、価値関数Ｑ（ｓ，ａ）は、状態ｓにある自機が、価値関数Ｑ（ｓ，ａ）を最大にする行動ａを選択することにより、原点に最も効率良く近づくことができるように設計されており、かつ、上記相対位置決定部４０１の処理により、原点が障害物の位置ｏｂ（ｊ）に置き換えられているためである。各行動ａ_１ ^（ｍ）ごとに求まったＱｒｉｓｋ（ｓ，ａ_１ ^（ｍ））は、行動選択部５０に出力される。
なお、最大値抽出部４０３に替えて、以下に説明する平均値抽出部４０６を設けても良い。 The risk calculation unit 405 groups the obstacle positions ob (j) (j = 1, 2,..., J) that share the same value of amax (j). The maximum value of Qobmax (obj, amax (j)) within each group is then defined as Qrisk (s, a ₁ ^(m) ). That is, the risk calculation unit 405 calculates
Qrisk (s, a ₁ ^(m) ) = max {Qobmax (obj, a ₁ ^(m) ) | amax (j) = a ₁ ^(m) }
Qrisk (s, a ₁ ^(m) ) is an index of how likely the underwater robot that has selected the action a ₁ ^(m) is to hit an obstacle in the future. This is because, as described above, the value function Q (s, a) is designed so that a robot in state s approaches the origin most efficiently by selecting the action a that maximizes Q (s, a), and because the processing of the relative position determination unit 401 replaces the origin with the obstacle position ob (j). The Qrisk (s, a ₁ ^(m) ) obtained for each action a ₁ ^(m) is output to the action selection unit 50.
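The grouping performed by the maximum value selection unit 404 and the risk calculation unit 405 can be sketched as follows. This is only an illustration under assumptions: the function name `compute_qrisk`, the table layout `qobmax[j][m]`, and the numbers are hypothetical, and the text does not say what Qrisk should be for an action that no obstacle selects as amax (j), so such actions are simply absent from the result.

```python
def compute_qrisk(qobmax):
    """Group obstacles by their most 'attractive' first action and keep the
    largest Qobmax in each group.

    qobmax[j][m] = Qobmax(ob_j, a1^(m)) (hypothetical layout for this sketch).
    Returns a dict {m: Qrisk(s, a1^(m))} containing only the actions that are
    amax(j) for at least one obstacle j.
    """
    qrisk = {}
    for row in qobmax:
        # amax(j): the first action that maximizes Qobmax for obstacle j.
        m_best = max(range(len(row)), key=lambda m: row[m])
        # Qrisk for that action is the maximum over the obstacles in its group.
        qrisk[m_best] = max(qrisk.get(m_best, float("-inf")), row[m_best])
    return qrisk


# Made-up values: 3 obstacles (rows), 3 first actions (columns).
qobmax = [
    [0.2, 0.9, 0.1],  # obstacle 0 is approached fastest by action 1
    [0.5, 0.7, 0.3],  # obstacle 1 is also in the group of action 1
    [0.8, 0.1, 0.4],  # obstacle 2 is in the group of action 0
]
print(compute_qrisk(qobmax))
```

Here action 1 receives the risk 0.9 (the larger of 0.9 and 0.7), action 0 receives 0.8, and action 2 has no entry because no obstacle is approached most efficiently by it.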
In place of the maximum value extraction unit 403, an average value extraction unit 406 described below may be provided.

障害物回避計算部４０の平均値抽出部４０６は、遷移先の状態ｓｏｂ（ｊ，ａ_１ ^（ｍ））において、水中ロボットが各行動ａ_２ ^（ｎ）（ｎ＝１，…，Ｎ）を取ったときの報酬Ｑ（ｓ（ｏｂｊ，ａ_１ ^（ｍ）），ａ_２ ^（ｎ））の平均値Ｑａｖｅ（ｏｂｊ，ａ_１ ^（ｍ））を求める。すなわち、遷移先の状況ｓｏｂ（ｊ，ａ_１ ^（ｍ））において、水中ロボットが各行動ａ_２ ^（ｎ）を取ったときに報酬Ｑ（ｓ（ｏｂｊ，ａ_１ ^（ｍ）），ａ_２ ^（ｎ））が定まるが、これらの各行動ａ_２ ^（ｎ）ごとに求まった報酬Ｑ（ｓ（ｏｂｊ，ａ_１ ^（ｍ）），ａ_２ ^（ｎ））の平均値Ｑａｖｅ（ｏｂｊ，ａ_１ ^（ｍ））を求める。計算された平均値Ｑａｖｅ（ｏｂｊ，ａ_１ ^（ｍ））は、最大値選択部４０４とリスク計算部４０５に出力される。
この場合には、最大値選択部４０４は、各障害物の位置ｏｂ（ｊ）ごとに、Ｑｏｂｍａｘ（ｏｂｊ，ａ_１ ^（ｍ））ではなく、Ｑａｖｅ（ｏｂｊ，ａ_１ ^（ｍ））を最大にする行動ａ_１ ^（ｍ）を選択する。選択されたａ_１ ^（ｍ）は、ａｍａｘ（ｊ）としてリスク計算部４０５に出力される。
リスク計算部４０５は、障害物の位置ｏｂ（ｊ）（ｊ＝１，２，…，Ｊ）を、ａｍａｘ（ｊ）の値が同じもの同士でグループ分けする。そして、それぞれのグループ内における、Ｑａｖｅ（ｏｂｊ，ａｍａｘ（ｊ））の最大値をＱｒｉｓｋ（ｓ，ａ_１ ^（ｎ））とする。すなわち、
リスク計算部４０５は、
Ｑｒｉｓｋ（ｓ，ａ_１ ^（ｍ））＝ｍａｘ｛Ｑｏｂａｖｅ（ｏｂｊ，ａ_１ ^（ｍ））｜ａｍａｘ（ｊ）＝ａ_１ ^（ｍ）｝
を計算する。 The average value extraction unit 406 of the obstacle avoidance calculation unit 40 obtains the average value Qave (obj, a ₁ ^(m) ) of the rewards Q (s (obj, a ₁ ^(m) ), a ₂ ⁽ⁿ⁾ ) given when the underwater robot, in the transition destination state s (obj, a ₁ ^(m) ), takes each action a ₂ ⁽ⁿ⁾ (n = 1,..., N). That is, in the transition destination state s (obj, a ₁ ^(m) ), a reward Q (s (obj, a ₁ ^(m) ), a ₂ ⁽ⁿ⁾ ) is determined for each action a ₂ ⁽ⁿ⁾ the underwater robot may take, and the average of these rewards is taken as Qave (obj, a ₁ ^(m) ). The calculated average value Qave (obj, a ₁ ^(m) ) is output to the maximum value selection unit 404 and the risk calculation unit 405.
In this case, the maximum value selection unit 404 selects, for each obstacle position ob (j), the action a ₁ ^(m) that maximizes Qave (obj, a ₁ ^(m) ) instead of Qobmax (obj, a ₁ ^(m) ). The selected a ₁ ^(m) is output to the risk calculation unit 405 as amax (j).
The risk calculation unit 405 groups the obstacle positions ob (j) (j = 1, 2,..., J) that share the same value of amax (j). The maximum value of Qave (obj, amax (j)) within each group is defined as Qrisk (s, a ₁ ^(m) ). That is, the risk calculation unit 405 calculates
Qrisk (s, a ₁ ^(m) ) = max {Qave (obj, a ₁ ^(m) ) | amax (j) = a ₁ ^(m) }

＜ステップＳ５＞
行動選択部５０は、軌道追従計算部３０が求めたａ_１’，ａ_２’，…，ａ_Ｍ’と、障害物回避計算部４０が求めたＱｒｉｓｋ（ｓ，ａ_１ ^（ｍ））を利用して、最適な行動を決定する。図１０は、行動選択部５０の処理を例示した図である。
ここで、Ｑｔｈｒｅｓｈを水中ロボットの安全性を保障する一定の閾値とする。すなわち、Ｑｒｉｓｋ（ｓ，ａ）＜Ｑｔｈｒｅｓｈであれば、その行動ａを取る水中ロボットが障害物にぶつからないことが保障される。例えば、ｒを水中ロボットが目標位置に着いたときに得られる報酬、γを割引率、ｎを行動ステップの数とすると、Ｑｔｈｒｅｓｈとしては、ｎ行動ステップ後にロボットが障害物にぶつかる場合のＱ値の値にすることができる。すなわち、Ｑｔｈｒｅｓｈ＝ｒ×γ^ｎにすることができる。ここで、ｎは、ロボットの旋回半径を考慮して、２〜４の値にすると望ましい。
行動選択部５０は、まず、Ｑｒｉｓｋ（ｓ，ａ_１’）とＱｔｈｒｅｓｈの大小関係を比較する（ステップＳ５１）。その結果、Ｑｒｉｓｋ（ｓ，ａ_１’）＜Ｑｔｈｒｅｓｈであれば、行動ａ_１’を最適な行動として選択する（ステップＳ５１’）。Ｑｒｉｓｋ（ｓ，ａ_１’）＞Ｑｔｈｒｅｓｈであれば、Ｑｒｉｓｋ（ｓ，ａ_２’）とＱｔｈｒｅｓｈの大小関係を比較する（ステップＳ５２）。その結果、Ｑｒｉｓｋ（ｓ，ａ_２’）＜Ｑｔｈｒｅｓｈであれば、行動ａ_２’を最適な行動として選択する（ステップＳ５２’）。Ｑｒｉｓｋ（ｓ，ａ_２’）＞Ｑｔｈｒｅｓｈであれば、Ｑｒｉｓｋ（ｓ，ａ_３’）とＱｔｈｒｅｓｈの大小関係を比較する（ステップＳ５３）。その結果、Ｑｒｉｓｋ（ｓ，ａ_３’）＜Ｑｔｈｒｅｓｈであれば、行動ａ_３’を最適な行動として選択する（ステップＳ５３’）。Ｑｒｉｓｋ（ｓ，ａ_３’）＞Ｑｔｈｒｅｓｈであれば、Ｑｒｉｓｋ（ｓ，ａ_４’）とＱｔｈｒｅｓｈの大小関係を比較する。 <Step S5>
The action selection unit 50 determines the optimum action using a ₁ ′, a ₂ ′,..., a _M ′ obtained by the trajectory tracking calculation unit 30 and Qrisk (s, a ₁ ^(m) ) obtained by the obstacle avoidance calculation unit 40. FIG. 10 is a diagram illustrating the processing of the action selection unit 50.
Here, let Qthresh be a fixed threshold that ensures the safety of the underwater robot. That is, if Qrisk (s, a) < Qthresh, the underwater robot taking the action a is guaranteed not to hit an obstacle. For example, letting r be the reward obtained when the underwater robot reaches the target position, γ the discount rate, and n a number of action steps, Qthresh can be set to the Q value for the case in which the robot hits an obstacle after n action steps, that is, Qthresh = r × γ ^n. Here, n is preferably set to a value of 2 to 4 in consideration of the turning radius of the robot.
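As a small worked example of the threshold formula Qthresh = r × γ ^n, the following Python sketch evaluates it for the recommended range of n. The values r = 1.0 and γ = 0.9 are assumptions chosen only for illustration; the text does not give concrete values.

```python
def q_threshold(r, gamma, n):
    """Qthresh = r * gamma**n: the discounted value of a state from which the
    origin (here, an obstacle) is reached after n action steps."""
    return r * gamma ** n


# Illustrative values only; r and gamma are not specified in the text.
r, gamma = 1.0, 0.9
for n in (2, 3, 4):
    print(n, round(q_threshold(r, gamma, n), 4))
```

Because 0 < γ < 1, a larger n gives a smaller threshold, so more actions are rejected as risky and the robot keeps a wider margin from obstacles.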
The action selection unit 50 first compares Qrisk (s, a ₁ ′) with Qthresh (step S51). If Qrisk (s, a ₁ ′) < Qthresh, the action a ₁ ′ is selected as the optimum action (step S51 ′). If Qrisk (s, a ₁ ′) > Qthresh, Qrisk (s, a ₂ ′) is compared with Qthresh (step S52). If Qrisk (s, a ₂ ′) < Qthresh, the action a ₂ ′ is selected as the optimum action (step S52 ′). If Qrisk (s, a ₂ ′) > Qthresh, Qrisk (s, a ₃ ′) is compared with Qthresh (step S53). If Qrisk (s, a ₃ ′) < Qthresh, the action a ₃ ′ is selected as the optimum action (step S53 ′). If Qrisk (s, a ₃ ′) > Qthresh, Qrisk (s, a ₄ ′) is compared with Qthresh.

上記の処理を、行動が選択されるまで、又は、Ｑｒｉｓｋ（ｓ，ａ_Ｍ’）＞Ｑｔｈｒｅｓｈと判断されるまで繰り返す（ステップＳ５Ｍ）。
このように、目標軌道に到達する可能性が高い行動ａ_１’，ａ_２’，…，ａ_Ｍ’の順番で、その行動を取ったときの障害物へのぶつかりやすさＱｒｉｓｋ（ｓ，ａ）が、水中ロボットの安全性を保障する一定の閾値を下回っているかどうかを検証することにより、障害物にぶつからない行動の中で最も軌道追従性の高い行動を選択することができる。 The above processing is repeated until an action is selected or until it is determined that Qrisk (s, a _M ′)> Qthresh (step S5M).
In this way, the actions a ₁ ′, a ₂ ′,..., a _M ′ are examined in descending order of their likelihood of reaching the target trajectory, and for each one it is checked whether the likelihood Qrisk (s, a) of hitting an obstacle when taking that action is below the fixed threshold that ensures the safety of the underwater robot. This makes it possible to select, among the actions that do not hit an obstacle, the one with the highest trajectory tracking ability.
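The selection loop of steps S51 to S5M can be sketched in Python as below. The sketch rests on several assumptions that the text leaves open: the name `select_action` is hypothetical, an action with no Qrisk entry is treated as risk-free, equality with the threshold is treated as unsafe, and `None` is returned when every action is rejected, since no fallback is described.

```python
def select_action(ranked_actions, qrisk, q_thresh):
    """Return the highest-priority action whose collision risk is below the
    threshold, or None if no action qualifies.

    ranked_actions: action identifiers in priority order a1', a2', ..., aM'.
    qrisk: dict mapping an action identifier to Qrisk(s, a).
    """
    for action in ranked_actions:
        # Assumption: actions without a Qrisk entry carry no known risk.
        risk = qrisk.get(action, float("-inf"))
        if risk < q_thresh:
            return action  # steps S51', S52', ...
    return None  # all M actions rejected (end of step S5M)


# Made-up example: action 2 tracks best but is too risky, so action 0 wins.
ranked = [2, 0, 1]
risks = {2: 0.9, 0: 0.5, 1: 0.1}
print(select_action(ranked, risks, q_thresh=0.729))
```

Because the loop stops at the first acceptable action, trajectory tracking decides the outcome whenever several actions are safe, and obstacle avoidance only acts as a veto.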

＜ステップＳ６＞
フィードバック制御部６０は、水中ロボットが、行動選択部５０が選択した行動ａに従った動作をするように、舵の切り角δや、推進気力器力Ｍｔｈを制御する。
以上の処理を、行動単位時間Ｔ単位ごとに繰り返すことにより、未知海流外乱と未知障害物が存在する中での水中ロボットの障害物回避制御が可能となる。
以上が、本発明による水中ロボットの動作制御装置の概要である。
水中ロボットの動作制御装置の処理機能をコンピュータによって実現することができる。この場合、水中ロボットの動作制御装置の処理機能の内容はプログラムによって記述される。そして、このプログラムを、図１９に示すようなコンピュータで実行することにより、例えば、図１に示す水中ロボットの動作制御装置の各処理機能がコンピュータ上で実現される。
この処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、例えば、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリ等どのようなものでもよい。具体的には、例えば、磁気記録装置として、ハードディスク装置、フレキシブルディスク、磁気テープ等を、光ディスクとして、ＤＶＤ（Digital Versatile Disc）、ＤＶＤ−ＲＡＭ（Random Access Memory）、ＣＤ−ＲＯＭ（Compact Disc Read Only Memory）、ＣＤ−Ｒ（Recordable）／ＲＷ（ReWritable）等を、光磁気記録媒体として、ＭＯ（Magneto-Optical disc）等を、半導体メモリとしてＥＥＰ−ＲＯＭ（Electronically Erasable and Programmable-Read Only Memory）等を用いることができる。 <Step S6>
The feedback control unit 60 controls the rudder angle δ and the thruster force Mth so that the underwater robot operates according to the action a selected by the action selection unit 50.
By repeating the above processing every action unit time T, obstacle avoidance control of the underwater robot becomes possible in the presence of unknown ocean current disturbances and unknown obstacles.
The above is the outline of the motion control device for the underwater robot according to the present invention.
The processing function of the motion control device of the underwater robot can be realized by a computer. In this case, the content of the processing function of the motion control device of the underwater robot is described by a program. Then, by executing this program on a computer as shown in FIG. 19, for example, each processing function of the motion control device for the underwater robot shown in FIG. 1 is realized on the computer.
The program describing this processing can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example a magnetic recording device, an optical disk, a magneto-optical recording medium, or a semiconductor memory. Specifically, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable) / RW (ReWritable) as the optical disk; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

また、このプログラムの流通は、例えば、そのプログラムを記録したＤＶＤ、ＣＤ−ＲＯＭ等の可搬型記録媒体を販売、譲渡、貸与等することによって行う。さらに、このプログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することにより、このプログラムを流通させる構成としてもよい。
このようなプログラムを実行するコンピュータは、例えば、まず、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、一旦、自己の記憶装置に格納する。そして、処理の実行時、このコンピュータは、自己の記録媒体に格納されたプログラムを読み取り、読み取ったプログラムに従った処理を実行する。また、このプログラムの別の実行形態として、コンピュータが可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行することとしてもよく、さらに、このコンピュータにサーバコンピュータからプログラムが転送されるたびに、逐次、受け取ったプログラムに従った処理を実行することとしてもよい。また、サーバコンピュータから、このコンピュータへのプログラムの転送は行わず、その実行指示と結果取得のみによって処理機能を実現する、いわゆるＡＳＰ（Application Service Provider）型のサービスによって、上述の処理を実行する構成としてもよい。なお、本形態におけるプログラムには、電子計算機による処理用に供する情報であってプログラムに準ずるもの（コンピュータに対する直接の指令ではないがコンピュータの処理を規定する性質を有するデータ等）を含むものとする。 The program is distributed by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM in which the program is recorded. Furthermore, the program may be distributed by storing the program in a storage device of the server computer and transferring the program from the server computer to another computer via a network.
A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or the program transferred from a server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time the program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to this computer and the processing functions are realized only through execution instructions and acquisition of results. Note that the program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

また、この形態では、コンピュータ上で所定のプログラムを実行させることにより、水中ロボットの動作制御装置を構成することとしたが、これらの処理内容の少なくとも一部をハードウェア的に実現することとしてもよい。
以上の各実施形態の他、本発明である水中ロボットの動作制御方法、装置、プログラム及びその記録媒体は上述の実施形態に限定されるものではなく、本発明の趣旨を逸脱しない範囲で適宜変更が可能である。 In this embodiment, the motion control device for the underwater robot is configured by executing a predetermined program on a computer, but at least part of this processing may instead be realized in hardware.
In addition, the motion control method, device, program, and recording medium for the underwater robot according to the present invention are not limited to the embodiments described above and may be modified as appropriate without departing from the spirit of the present invention.

FIG. 1 is a diagram illustrating the functional configuration of the motion control unit 1000 of the underwater robot according to the present invention.
FIG. 2 is a diagram illustrating the processing of the motion control unit 1000 of the underwater robot according to the present invention.
FIG. 3 is a diagram that assists in explaining how the position of an obstacle is obtained.
FIG. 4 is a diagram illustrating the functional configuration of the trajectory generation unit 20.
FIG. 5 is a diagram illustrating the processing of the trajectory generation unit 20.
FIG. 6 is a diagram that assists in explaining how the target trajectory position is obtained.
FIG. 7 is a diagram that assists in explaining the calculation of the probability P (s, t).
FIG. 8 is a diagram illustrating the functional configuration of the trajectory tracking calculation unit 30.
FIG. 9 is a diagram illustrating the functional configuration of the obstacle avoidance calculation unit 40.
FIG. 10 is a diagram illustrating the processing of the action selection unit 50.
FIG. 11 is a diagram illustrating a robot subject to motion control according to the present invention.
FIG. 12 is a diagram illustrating how the target turning speed changes with time.
FIG. 13 is a diagram illustrating the functional configuration of a motion control device for an autonomous robot according to a conventional example.
FIG. 14 is a diagram that assists in explaining the displacement of the robot's position.
FIG. 15 is a diagram that assists in explaining the calculation of the state transition probability.
FIG. 16 is a diagram showing the inclination angle difference between a state s and a transition destination state s ′.
FIG. 17 is a diagram illustrating a robot subject to motion control according to a conventional example.
FIG. 18 is a diagram illustrating the functional configuration of the dynamic programming unit 80.
FIG. 19 is a diagram illustrating the functional configuration when the motion control device for the underwater robot according to the present invention is implemented on a computer.

Claims

原点を目標到達位置とし、水中ロボットが原点に到達するときの報酬＞その他の場合の報酬として、マルコフ状態遷移モデルにおける動作計画法に基づいて生成された価値関数が価値関数保存手段に保存されており、
環境モデル生成手段が、新たな障害物を検出するごとに、その障害物の位置を求めて、障害物情報保存手段に格納する環境モデル生成ステップと、
軌道生成手段が、上記障害物情報保存手段から読み出した障害物にぶつからずに目標到達位置に到達するまでの、各時刻ステップにおける目標軌道位置を生成して、目標軌道位置保存手段に格納する軌道生成ステップと、
軌道追従計算手段が、水中ロボットが各行動を取ったときに、上記目標軌道位置保存手段から読み出した目標軌道位置にどの程度近づくことができるのかを数値で評価し、その評価値により、各行動に優先順位を付ける軌道追従計算ステップと、
障害物回避計算手段が、水中ロボットが各行動を取ったときの障害物へのぶつかりやすさを計算する障害物回避計算ステップと、
行動選択手段が、軌道追従計算ステップで付けられた優先順位が高い行動の順番で、上記障害物回避計算ステップで求められた、その行動を取ったときの障害物へのぶつかりやすさが一定の閾値よりも小さいかどうかを順次判定し、小さいと判定された場合にはその行動を選択する処理を、小さいと判定される行動が見つかるまで繰り返す行動選択ステップと、
フィードバック制御手段が、行動選択ステップで選択された行動に従って、水中ロボットが動作をするように制御するフィードバック制御ステップと、
を有する水中ロボットの動作制御方法。 A motion control method for an underwater robot, wherein a value function, generated based on a motion planning method in a Markov state transition model with the origin as the target arrival position and with the reward given when the underwater robot reaches the origin set greater than the reward in all other cases, is stored in value function storage means, the method comprising:
an environment model generation step in which environment model generation means, each time a new obstacle is detected, obtains the position of that obstacle and stores it in obstacle information storage means;
a trajectory generation step in which trajectory generation means generates a target trajectory position at each time step until the target arrival position is reached without hitting the obstacles read from the obstacle information storage means, and stores the target trajectory positions in target trajectory position storage means;
a trajectory tracking calculation step in which trajectory tracking calculation means evaluates numerically how closely the underwater robot can approach the target trajectory positions read from the target trajectory position storage means when it takes each action, and prioritizes the actions according to the evaluation values;
an obstacle avoidance calculation step in which obstacle avoidance calculation means calculates the likelihood of hitting an obstacle when the underwater robot takes each action;
an action selection step in which action selection means determines sequentially, in descending order of the priority given in the trajectory tracking calculation step, whether the likelihood of hitting an obstacle obtained in the obstacle avoidance calculation step for that action is smaller than a fixed threshold, selects the action when the likelihood is determined to be smaller, and repeats this processing until such an action is found; and
a feedback control step in which feedback control means controls the underwater robot so that it operates according to the action selected in the action selection step.

請求項１に記載の水中ロボットの動作制御方法において、
上記軌道生成ステップは、
初期値設定手段が、水中ロボットの現在位置が含まれる格子に水中ロボットが時刻ステップ０において存在する確率を１とし、その他の格子に存在する確率を０とする初期値設定ステップと、
存在確率計算手段が、水中ロボットが時刻ステップｔ−１において格子ｓの周りに存在する格子ｓ’に存在する確率に１／８を乗算したものを各格子ｓ’ごとに求め、この格子ｓ’ごとに求まった乗算結果を加算することによって、水中ロボットが時刻ステップｔにおいて格子ｓに存在する確率を計算する存在確率計算ステップと、
確率補正手段が、上記存在確率計算ステップで求まった、上記障害物情報保存手段から読み出した各障害物が存在している各格子についての存在確率を０とする確率補正ステップと、
制御手段が、上記存在確率計算ステップで求まった、目標到達位置を含む格子についての存在確率が０でない値になるまで、上記存在確率計算ステップと上記確率補正ステップの処理を繰り返すように制御する制御ステップと、
軌道決定手段が、上記存在確率計算ステップで求まった、目標到達位置を含む格子についての存在確率が０でない値になったときの時刻ステップをｔｄとし、時刻ステップｔｄにおける目標軌道位置τ（ｔｄ）とし、水中ロボットが時刻ステップｔ−１において目標到達位置τ（ｔ）の周りの格子に存在する確率を最も大きくする格子を目標軌道位置（τ−１）とする処理を繰り返すことによって、各時刻ステップにおける目標軌道位置を求めて目標軌道位置保存手段に格納する軌道決定ステップと、
から構成される、
ことを特徴とする水中ロボットの動作制御方法。 The motion control method for an underwater robot according to claim 1, wherein
the trajectory generation step comprises:
an initial value setting step in which initial value setting means sets to 1 the probability that the underwater robot exists at time step 0 in the lattice containing its current position, and sets to 0 the probability that it exists in every other lattice;
an existence probability calculation step in which existence probability calculation means multiplies by 1/8 the probability that the underwater robot exists at time step t−1 in each lattice s ′ surrounding a lattice s, and adds the products obtained for the lattices s ′ to calculate the probability that the underwater robot exists in the lattice s at time step t;
a probability correction step in which probability correction means sets to 0 the existence probability obtained in the existence probability calculation step for each lattice containing an obstacle read from the obstacle information storage means;
a control step in which control means causes the existence probability calculation step and the probability correction step to be repeated until the existence probability obtained for the lattice containing the target arrival position becomes non-zero; and
a trajectory determination step in which trajectory determination means takes as td the time step at which the existence probability for the lattice containing the target arrival position became non-zero, takes the target arrival position as the target trajectory position τ (td) at time step td, and repeatedly takes as the target trajectory position τ (t−1) the lattice around the target trajectory position τ (t) in which the probability that the underwater robot exists at time step t−1 is largest, thereby obtaining the target trajectory position at each time step and storing it in the target trajectory position storage means.
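The trajectory generation steps recited above can be illustrated with a minimal Python sketch. It is a sketch under assumptions rather than the claimed implementation: the lattice is taken to be a bounded rectangular grid, "surrounding lattices" is read as the 8-neighbourhood (consistent with the factor 1/8), and the names `generate_trajectory`, `NEIGHBOURS`, and `max_steps` are hypothetical.

```python
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def generate_trajectory(width, height, start, goal, obstacles, max_steps=1000):
    """Return the cells tau(0), ..., tau(td) from start to goal, or None."""

    def neighbours(cell):
        x, y = cell
        for dx, dy in NEIGHBOURS:
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                yield (nx, ny)

    # Initial value setting: probability 1 at the current cell, 0 elsewhere.
    history = [{start: 1.0}]
    # Control step: repeat until the goal cell has non-zero probability.
    while history[-1].get(goal, 0.0) == 0.0:
        if len(history) > max_steps:
            return None  # goal assumed unreachable (guard added for the sketch)
        current = {}
        for cell, p in history[-1].items():
            for nb in neighbours(cell):
                # Existence probability: each neighbour contributes p / 8.
                # Probability correction: obstacle cells stay at 0.
                if nb not in obstacles:
                    current[nb] = current.get(nb, 0.0) + p / 8.0
        if not current:
            return None
        history.append(current)

    # Trajectory determination: walk back from tau(td) = goal, picking the
    # neighbouring cell with the largest probability at the previous step.
    path = [goal]
    for t in range(len(history) - 1, 0, -1):
        previous = history[t - 1]
        path.append(max(neighbours(path[-1]), key=lambda c: previous.get(c, 0.0)))
    path.reverse()
    return path


# Made-up 5 x 5 grid with a two-cell wall between start and goal.
print(generate_trajectory(5, 5, (0, 0), (4, 0), {(2, 0), (2, 1)}))
```

In this example the wall at (2, 0) and (2, 1) forces the trajectory through (2, 2), so the goal is first reached at td = 4 and the result contains five cells.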

請求項１又は２に記載の水中ロボットの動作制御方法において、
上記軌道追従計算ステップは、
相対位置決定手段が、上記目標軌道位置保存手段から読み出した各目標軌道位置に対する水中ロボットの相対位置を各目標軌道位置ごとに求める相対位置決定ステップと、
遷移先予測手段が、上記相対位置決定ステップで求まった各相対位置に位置する水中ロボットが、第一の行動を取ったときの遷移先の状態を第一の各行動ごとに求める遷移先予測ステップと、
最大値抽出手段が、価値関数保存手段を参照して、上記遷移先予測ステップで求まった遷移先の状態にある水中ロボットが、第二の各行動を取ったときに与えられる価値関数の値の最大値を求める最大値抽出ステップと、
加算手段が、上記最大値抽出ステップで求まった最大値の、上記各目標軌道位置についての和を取る加算ステップと、
順序決定手段が、上記加算ステップで求まった加算値が大きい順に、上記第一の各行動に優先順位を付ける順序決定ステップと、
から構成される、
ことを特徴とする水中ロボットの動作制御方法。 The motion control method for an underwater robot according to claim 1 or 2, wherein
the trajectory tracking calculation step comprises:
a relative position determination step in which relative position determination means obtains, for each target trajectory position read from the target trajectory position storage means, the relative position of the underwater robot with respect to that target trajectory position;
a transition destination prediction step in which transition destination prediction means obtains, for each first action, the transition destination state reached when the underwater robot located at each relative position obtained in the relative position determination step takes that first action;
a maximum value extraction step in which maximum value extraction means refers to the value function storage means and obtains the maximum of the value function values given when the underwater robot in the transition destination state obtained in the transition destination prediction step takes each second action;
an addition step in which addition means sums, over the target trajectory positions, the maximum values obtained in the maximum value extraction step; and
an order determination step in which order determination means prioritizes the first actions in descending order of the sums obtained in the addition step.

請求項１又は２に記載の水中ロボットの動作制御方法において、
上記軌道追従計算ステップは、
相対位置決定手段が、上記目標軌道位置保存手段から読み出した各目標軌道位置に対する水中ロボットの相対位置を各目標軌道位置ごとに求める相対位置決定ステップと、
遷移先予測手段が、上記相対位置決定ステップで求まった各相対位置に位置する水中ロボットが、第一の行動を取ったときの遷移先の状態を第一の各行動ごとに求める遷移先予測ステップと、
平均値抽出手段が、価値関数保存手段を参照して、上記遷移先予測ステップで求まった遷移先の状態にある水中ロボットが、第二の各行動を取ったときに与えられる価値関数の値の平均値を求める平均値抽出ステップと、
加算手段が、上記平均値抽出ステップで求まった平均値の、上記各目標軌道位置についての和を取る加算ステップと、
順序決定手段が、上記加算ステップで求まった加算値が大きい順に、上記第一の各行動に優先順位を付ける順序決定ステップと、
から構成される、
ことを特徴とする水中ロボットの動作制御方法。 The motion control method for an underwater robot according to claim 1 or 2, wherein
the trajectory tracking calculation step comprises:
a relative position determination step in which relative position determination means obtains, for each target trajectory position read from the target trajectory position storage means, the relative position of the underwater robot with respect to that target trajectory position;
a transition destination prediction step in which transition destination prediction means obtains, for each first action, the transition destination state reached when the underwater robot located at each relative position obtained in the relative position determination step takes that first action;
an average value extraction step in which average value extraction means refers to the value function storage means and obtains the average of the value function values given when the underwater robot in the transition destination state obtained in the transition destination prediction step takes each second action;
an addition step in which addition means sums, over the target trajectory positions, the average values obtained in the average value extraction step; and
an order determination step in which order determination means prioritizes the first actions in descending order of the sums obtained in the addition step.

請求項１から４の何れかに記載の水中ロボットの動作制御方法において、
上記障害物回避計算ステップは、
相対位置決定手段が、各障害物の位置に対する水中ロボットの相対位置を各障害物の位置ごとに求める相対位置決定ステップと、
遷移先予測手段が、上記相対位置決定ステップで求まった各相対位置に位置する水中ロボットが、第一の各行動を取ったときの遷移先の状態を第一の各行動ごとに求める遷移先予測ステップと、
最大値抽出手段が、価値関数保存手段を参照して、上記遷移先予測ステップで求まった遷移先の状態にある水中ロボットが、第二の各行動を取ったときに与えられる価値関数の値の最大値を求める最大値抽出ステップと、
最大値選択手段が、上記最大値抽出ステップで求まった最大値を最大にする第一の行動を各障害物の位置ごとに求める最大値選択ステップと、
リスク計算手段が、各障害物の位置を、最大値選択ステップで求まった第一の行動が同じもの同士でグループ分けし、上記各グループごとに最大値抽出ステップで求まった最大値の中で最も大きいものを選ぶことにより、水中ロボットが各行動を取ったときの障害物へのぶつかりやすさを計算するリスク計算ステップと、
から構成される、
ことを特徴とする水中ロボットの動作制御方法。 The motion control method for an underwater robot according to any one of claims 1 to 4, wherein
the obstacle avoidance calculation step comprises:
a relative position determination step in which relative position determination means obtains, for each obstacle position, the relative position of the underwater robot with respect to that obstacle position;
a transition destination prediction step in which transition destination prediction means obtains, for each first action, the transition destination state reached when the underwater robot located at each relative position obtained in the relative position determination step takes that first action;
a maximum value extraction step in which maximum value extraction means refers to the value function storage means and obtains the maximum of the value function values given when the underwater robot in the transition destination state obtained in the transition destination prediction step takes each second action;
a maximum value selection step in which maximum value selection means obtains, for each obstacle position, the first action that maximizes the maximum value obtained in the maximum value extraction step; and
a risk calculation step in which risk calculation means groups the obstacle positions that share the same first action obtained in the maximum value selection step and, for each group, selects the largest of the maximum values obtained in the maximum value extraction step, thereby calculating the likelihood of hitting an obstacle when the underwater robot takes each action.

請求項１から４の何れかに記載の水中ロボットの動作制御方法において、
上記障害物回避計算ステップは、
相対位置決定手段が、各障害物の位置に対する水中ロボットの相対位置を各障害物の位置ごとに求める相対位置決定ステップと、
遷移先予測手段が、上記相対位置決定ステップで求まった各相対位置に位置する水中ロボットが、第一の各行動を取ったときの遷移先の状態を第一の各行動ごとに求める遷移先予測ステップと、
平均値抽出手段が、価値関数保存手段を参照して、上記遷移先予測ステップで求まった遷移先の状態にある水中ロボットが、第二の各行動を取ったときに与えられる価値関数の値の平均値を求める平均値抽出ステップと、
最大値選択手段が、上記平均値値抽出ステップで求まった平均値を最大にする第一の行動を各障害物の位置ごとに求める最大値選択ステップと、
リスク計算手段が、各障害物の位置を、最大値選択ステップで求まった第一の行動が同じもの同士でグループ分けし、上記各グループごとに最大値抽出ステップで求まった最大値の中で最も大きいものを選ぶことにより、水中ロボットが各行動を取ったときの障害物へのぶつかりやすさを計算するリスク計算ステップと、
から構成される、
ことを特徴とする水中ロボットの動作制御方法。 The motion control method for an underwater robot according to any one of claims 1 to 4, wherein
the obstacle avoidance calculation step comprises:
a relative position determination step in which relative position determination means obtains, for each obstacle position, the relative position of the underwater robot with respect to that obstacle position;
a transition destination prediction step in which transition destination prediction means obtains, for each first action, the transition destination state reached when the underwater robot located at each relative position obtained in the relative position determination step takes that first action;
an average value extraction step in which average value extraction means refers to the value function storage means and obtains the average of the value function values given when the underwater robot in the transition destination state obtained in the transition destination prediction step takes each second action;
a maximum value selection step in which maximum value selection means obtains, for each obstacle position, the first action that maximizes the average value obtained in the average value extraction step; and
a risk calculation step in which risk calculation means groups the obstacle positions that share the same first action obtained in the maximum value selection step and, for each group, selects the largest of the maximum values obtained in the maximum value extraction step, thereby calculating the likelihood of hitting an obstacle when the underwater robot takes each action.

原点を目標到達位置とし、水中ロボットが原点に到達するときの報酬＞その他の場合の報酬として、マルコフ状態遷移モデルにおける動作計画法に基づいて生成された価値関数を保存する価値関数保存手段と、
新たな障害物を検出するごとに、その障害物の位置を求めて、障害物情報保存手段に格納する環境モデル生成手段と、
上記障害物情報保存手段から読み出した障害物にぶつからずに目標到達位置に到達するまでの、各時刻ステップにおける目標軌道位置を生成して、目標軌道位置保存手段に格納する軌道生成手段と、
水中ロボットが各行動を取ったときに、上記目標軌道位置保存手段から読み出した目標軌道位置にどの程度近づくことができるのかを数値で評価し、その評価値により、各行動に優先順位を付ける軌道追従計算手段と、
水中ロボットが各行動を取ったときの障害物へのぶつかりやすさを計算する障害物回避計算手段と、
軌道追従計算手段で付けられた優先順位の高い行動の順番で、上記障害物回避計算手段で求められた、その行動を取ったときの障害物へのぶつかりやすさが一定の閾値よりも小さいかどうかを判定し、小さいと判定された場合にはその行動を選択する処理を、小さいと判定される行動が見つかるまで繰り返す行動選択手段と、
行動選択手段で選択された行動に従って、水中ロボットが動作をするように制御するフィードバック制御手段と、
を有する水中ロボットの動作制御装置。 An operation control apparatus for an underwater robot, comprising:
value function storage means for storing a value function generated based on a motion planning method in a Markov state transition model, with the origin as the goal position and with the reward given when the underwater robot reaches the origin being greater than the reward in all other cases;
environment model generation means for obtaining, each time a new obstacle is detected, the position of that obstacle and storing it in obstacle information storage means;
trajectory generation means for generating a target trajectory position at each time step until the goal position is reached without colliding with the obstacles read from the obstacle information storage means, and storing the target trajectory positions in target trajectory position storage means;
trajectory tracking calculation means for numerically evaluating how close the underwater robot can get to the target trajectory position read from the target trajectory position storage means when it takes each action, and assigning a priority to each action according to the evaluation value;
obstacle avoidance calculation means for calculating the likelihood of colliding with an obstacle when the underwater robot takes each action;
action selection means for determining, in descending order of the priority assigned by the trajectory tracking calculation means, whether the likelihood of colliding with an obstacle when that action is taken, as obtained by the obstacle avoidance calculation means, is smaller than a fixed threshold, selecting the action when the likelihood is determined to be smaller, and repeating this process until an action determined to be smaller is found; and
feedback control means for controlling the underwater robot to operate according to the action selected by the action selection means.
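The interplay between the trajectory tracking calculation means and the action selection means can be sketched as follows. This is only an illustration under stated assumptions: `rank_actions`, `select_action`, `transition` and `default_risk` are hypothetical names, Euclidean distance to the target trajectory position stands in for the unspecified evaluation value, and returning `None` when no action passes the threshold is a choice made here because the claim does not describe that case.

```python
import math
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

State = Tuple[float, ...]   # assumption: a state is a tuple of coordinates
Action = Hashable


def rank_actions(
    state: State,
    target: State,
    actions: Sequence[Action],
    transition: Callable[[State, Action], State],  # hypothetical state-transition model
) -> List[Action]:
    """Order actions so that the one ending closest to the target comes first."""
    def distance_after(action: Action) -> float:
        return math.dist(transition(state, action), target)
    return sorted(actions, key=distance_after)


def select_action(
    priority_order: Sequence[Action],
    risk: Dict[Action, float],
    threshold: float,
    default_risk: float = 0.0,  # assumption: an action with no recorded risk counts as safe
) -> Optional[Action]:
    """Return the highest-priority action whose collision likelihood is below the threshold."""
    for action in priority_order:
        if risk.get(action, default_risk) < threshold:
            return action
    return None  # assumption: no action passed; the caller must handle this case
```

A usage example with made-up numbers: with candidate moves east, north and west ranked in that order and a risk of 0.9 for east, `select_action(["east", "north", "west"], {"east": 0.9, "north": 0.2}, threshold=0.5)` skips east and returns `"north"`.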

請求項１から６の何れかに記載の水中ロボットの動作制御方法の各ステップをコンピュータに実行させるための水中ロボットの動作制御プログラム。 An operation control program for an underwater robot, for causing a computer to execute each step of the operation control method for an underwater robot according to any one of claims 1 to 6.

請求項８記載の水中ロボットの動作制御プログラムを記録したコンピュータ読み取り可能な記録媒体。 A computer-readable recording medium on which the operation control program for the underwater robot according to claim 8 is recorded.