JP6644191B1

JP6644191B1 - Robot control device, robot control learning device, and robot control method

Info

Publication number: JP6644191B1
Application number: JP2019528938A
Authority: JP
Inventors: 高志南本; 佳太田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2018-12-26
Filing date: 2018-12-26
Publication date: 2020-02-12
Anticipated expiration: 2038-12-26
Also published as: JPWO2020136769A1; DE112018008159T5; WO2020136769A1; DE112018008159B4; TW202024832A

Abstract

A robot control device (100, 100a) includes: a current position acquisition unit (105) that acquires current position information indicating the current position of an arm (11) of a robot (10); a target position acquisition unit (106) that acquires target position information indicating a target position of the arm (11); and a control generation unit (107, 107a) that generates a control signal indicating control content for moving the arm (11) toward the target position indicated by the target position information, based on model information, the current position information acquired by the current position acquisition unit (105), and the target position information acquired by the target position acquisition unit (106). The model information indicates a model trained using an arithmetic expression for calculating a reward, the expression including a term that calculates the reward by referring to reference path information indicating a reference path and evaluating that the arm (11) is moving based on the reference path.

Description

The present invention relates to a robot control device, a robot control learning device, and a robot control method.

There are techniques for automatically generating the path along which a robot arm moves.

For example, Patent Literature 1 discloses an obstacle avoidance support device provided with an obstacle avoidance support unit that includes: a distance information storage unit that stores, in a memory, distance information on obstacles measured by a laser range sensor; a repulsive force calculation unit that calculates repulsive forces based on the distance information stored in the memory; a repulsive force component separation unit that separates the calculated repulsive forces into axial components; and an avoidance repulsive force generation unit that extracts, for each axis, the maximum value in the positive direction and the maximum value in the negative direction from the separated axial components, calculates for each axis the sum of the extracted positive and negative maximum values, and generates an avoidance repulsive force whose axial components are the calculated results.

Patent Literature 1: Japanese Patent No. 5510081

In the related art, combining the potentials that generate repulsive forces can leave the potential near an obstacle relatively low. An attractive force then arises between the obstacle and the robot arm, and the arm interferes with the obstacle.
To solve this problem, the path along which the robot arm moves must be searched for in every direction in a continuous space.
However, searching for the path in every direction in a continuous space increases the amount of computation, and it takes a long time to determine the path along which the robot arm moves.

The present invention has been made to solve the above problems. Its object is to provide a robot control device that can control a robot so that the robot arm does not move discontinuously, while reducing the amount of computation.

A robot control device according to the present invention includes: a current position acquisition unit that acquires current position information indicating the current position of an arm of a robot; a target position acquisition unit that acquires target position information indicating a target position of the arm; and a control generation unit that generates a control signal indicating control content for moving the arm toward the target position indicated by the target position information, based on model information, the current position information acquired by the current position acquisition unit, and the target position information acquired by the target position acquisition unit. The model information indicates a model trained using an arithmetic expression for calculating a reward, the expression including a term that calculates the reward by referring to reference path information indicating a reference path and evaluating that the arm is moving based on the reference path. As that term, the arithmetic expression includes a term that calculates the reward by evaluating the distance between the position of the robot arm and the reference path.
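To make the distance-based reward term concrete, the following is a minimal sketch and not the arithmetic expression of the invention. It assumes a 2D arm position, a reference path given as a polyline with at least two points, and a reward that is a weighted sum of two penalties (distance to the target and distance to the reference path). The function names and the weights `w_goal` and `w_path` are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_path(p, path):
    """Shortest distance from p to a polyline with at least two points."""
    return min(point_segment_distance(p, path[i], path[i + 1])
               for i in range(len(path) - 1))

def reward(arm_pos, target_pos, reference_path, w_goal=1.0, w_path=1.0):
    """Assumed reward: penalize distance to the target and to the reference path."""
    d_goal = math.dist(arm_pos, target_pos)
    d_path = distance_to_path(arm_pos, reference_path)
    return -w_goal * d_goal - w_path * d_path
```

Under these assumptions, an arm position on the reference path receives a higher reward than a position at the same distance from the target that lies off the path.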

According to the present invention, a robot can be controlled so that the robot arm does not move discontinuously, while the amount of computation is reduced.

FIG. 1 is a diagram illustrating an example of the configuration of a robot control system to which the robot control device according to Embodiment 1 is applied.
FIG. 2 is a block diagram illustrating an example of the configuration of the main part of the robot control device and the robot control system according to Embodiment 1.
FIG. 3 is a diagram illustrating an example of an image indicated by virtual space image information generated by the virtual space image generation unit according to Embodiment 1.
FIG. 4A and FIG. 4B are diagrams illustrating an example of the hardware configuration of the main part of the robot control device according to Embodiment 1.
FIG. 5 is a flowchart illustrating an example of the processing of the robot control device according to Embodiment 1.
FIG. 6 is a block diagram illustrating an example of the configuration of the main part of the robot control learning device and the robot control learning system according to Embodiment 1.
FIG. 7 is a diagram illustrating an example of selecting an action a^* from the actions a_t that the robot arm can take when the state of the robot arm according to Embodiment 1 is state S_t.
FIG. 8 is a flowchart illustrating an example of the processing of the robot control learning device according to Embodiment 1.
FIG. 9A, FIG. 9B, and FIG. 9C are diagrams illustrating examples of the path along which the arm moved before reaching the target position.
FIG. 10 is a block diagram illustrating an example of the configuration of the main part of the robot control device and the robot control system according to Embodiment 2.
FIG. 11 is a flowchart illustrating an example of the processing of the robot control device according to Embodiment 2.

Embodiments of the present invention are described in detail below with reference to the drawings.

Embodiment 1.
The configuration of the main part of the robot control device 100 according to Embodiment 1 is described with reference to FIGS. 1 and 2.
FIG. 1 is a diagram illustrating an example of the configuration of a robot control system 1 to which the robot control device 100 according to Embodiment 1 is applied.
FIG. 2 is a block diagram illustrating an example of the configuration of the main part of the robot control device 100 and the robot control system 1 according to Embodiment 1.
The robot control device 100 is applied to the robot control system 1.
The robot control system 1 includes the robot control device 100, a robot 10, a network 30, a storage device 40, and an imaging device 50.

The robot 10 is, for example, an arm-type robot device that performs predetermined work in a work environment 20 by moving its arm 11, whose joints are controlled by motors 12-1 and 12-2.
The robot 10 includes the arm 11, the motors 12-1 and 12-2, a motor control means 13, rotation sensors 14-1 and 14-2, and a contact sensor 15.

The arm 11 has a plurality of joints. In Embodiment 1, the arm 11 is described as having two joints 11-1 and 11-2. The number of joints is not limited to two, and the arm 11 may have three or more joints. For example, when the work is performed by moving the arm 11 in three dimensions, the arm 11 needs at least six joints from a robotics standpoint.

The motors 12-1 and 12-2 move the two joints 11-1 and 11-2 of the arm 11, respectively.
The motor control means 13 controls the motors 12-1 and 12-2 based on the control signal output by the robot control device 100. Specifically, the motor control means 13 generates electric signals for operating the motors 12-1 and 12-2 based on the control signal output by the robot control device 100, and controls the motors 12-1 and 12-2 by outputting the generated electric signals to them.
The rotation sensors 14-1 and 14-2 each output to the robot control device 100 a rotation status signal indicating the rotation status of the motors 12-1 and 12-2, respectively, such as the rotation amount or the rotation speed.
The contact sensor 15 outputs to the robot control device 100 a contact signal indicating whether the arm 11 has come into contact with an obstacle in the work environment 20.

The network 30 is a communication means configured by, for example, a wired network such as a CAN (Controller Area Network) or a LAN (Local Area Network), or a wireless network such as a wireless LAN or LTE (Long Term Evolution) (registered trademark).

The storage device 40 stores the information that the robot control device 100 needs in order to generate a control signal indicating control content for moving the arm 11 toward the target position. This information is, for example, model information. The storage device 40 has a nonvolatile storage medium such as a hard disk drive or an SD memory card, and stores the information in that medium.

The imaging device 50 is an image input device, such as a digital still camera or a digital video camera, that can output a captured image to the outside as image information. The imaging device 50 may instead be an image input device, such as a depth camera or a stereo camera, that generates image information from which the position or region of an object can be identified in three-dimensional space. The number of imaging devices 50 is not limited to one and may be two or more. The imaging device 50 may be installed at a position away from the robot 10 and the work environment 20, or may be fixed to any part of the robot 10. In Embodiment 1, the imaging device 50 is described as a single digital still camera installed at a position away from the robot 10 and the work environment 20. The imaging device 50 according to Embodiment 1 photographs the work environment 20 and outputs the captured image as image information. Using the image information output by the imaging device 50 and information indicating the position of the imaging device 50 relative to the robot 10, the robot control device 100 identifies the position of the work environment 20 relative to the robot 10, and in particular the position or region of obstacles in the work environment 20 relative to the robot 10.

The motor control means 13, the rotation sensors 14-1 and 14-2, and the contact sensor 15 provided in the robot 10, as well as the storage device 40, the imaging device 50, and the robot control device 100, are each connected to the network 30.

The robot control device 100 generates a control signal indicating control content for moving the arm 11 toward the target position indicated by target position information, based on the model information, current position information indicating the current position of the arm 11, and the target position information indicating the target position in the work environment 20. It outputs the generated control signal via the network 30 to the motor control means 13 provided in the robot 10.
The robot control device 100 includes an image acquisition unit 101, a virtual space image generation unit 102, a model acquisition unit 103, a rotation status acquisition unit 104, a current position acquisition unit 105, a target position acquisition unit 106, a control generation unit 107, and a control output unit 108. In addition to these, the robot control device 100 may include a control correction unit 111 and a control interpolation unit 112.

The image acquisition unit 101 acquires, via the network 30, the image information output by the imaging device 50.

The virtual space image generation unit 102 reproduces the work environment 20 as an image in a virtual space based on the image information acquired by the image acquisition unit 101, and generates virtual space image information indicating that image.
Specifically, for example, the virtual space image generation unit 102 extracts feature points from the image indicated by the image information using a well-known image analysis technique. Using index information prepared in advance for identifying object shapes from feature points, it identifies the position or region of objects or obstacles in the work environment 20 from the extracted feature points. The virtual space image generation unit 102 then uses a pinhole camera model or the like, based on the identified position or region of each object or obstacle, information indicating the position of the imaging device 50 relative to the robot 10, and information indicating the angle of view of the imaging device 50, to determine the position or region of each identified object or obstacle relative to the robot 10, and reproduces the work environment 20 in the virtual space. In Embodiment 1, the index information is described as being included in the model information.
When the imaging device 50 is an image input device, such as a depth camera or a stereo camera, that generates image information from which the position or region of an object can be identified in three-dimensional space, the virtual space image generation unit 102 uses the image information acquired by the image acquisition unit 101 to identify, in three-dimensional space, the position or region of each object or obstacle appearing in the image, and reproduces the work environment 20 in the virtual space.
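The pinhole-camera step can be illustrated with a minimal sketch. It assumes that the depth of the image point along the optical axis is already known (for example, from a depth camera or a known working plane), that the principal point is at the image centre, that pixels are square, and that the camera axes are aligned with the robot axes so that the relative position is a pure translation. All function and parameter names are hypothetical.

```python
import math

def focal_length_px(image_width_px, horizontal_fov_rad):
    """Focal length in pixels derived from the horizontal angle of view."""
    return (image_width_px / 2.0) / math.tan(horizontal_fov_rad / 2.0)

def pixel_to_robot_frame(u, v, depth, image_size, horizontal_fov_rad, camera_offset):
    """Back-project pixel (u, v) at a known depth and shift it into the robot frame.

    camera_offset is the camera position relative to the robot (x, y, z).
    Assumption: camera and robot axes are aligned, so no rotation is applied.
    """
    width, height = image_size
    f = focal_length_px(width, horizontal_fov_rad)
    cx, cy = width / 2.0, height / 2.0
    # Pinhole model: x = (u - cx) * Z / f, y = (v - cy) * Z / f.
    x_cam = (u - cx) * depth / f
    y_cam = (v - cy) * depth / f
    ox, oy, oz = camera_offset
    return (x_cam + ox, y_cam + oy, depth + oz)
```

In a real installation the camera is generally rotated relative to the robot, so a rotation matrix would have to be applied before the translation.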

The model acquisition unit 103 acquires the model information, for example by reading it from the storage device 40 via the network 30. In Embodiment 1, when the control generation unit 107 or another unit holds the model information in advance, the model acquisition unit 103 is not an essential component of the robot control device 100.

The rotation status acquisition unit 104 acquires, from the rotation sensors 14-1 and 14-2 provided in the robot 10, rotation status signals indicating the rotation status of the motors 12-1 and 12-2.

The current position acquisition unit 105 acquires current position information indicating the current position of the arm 11 of the robot 10.
Specifically, for example, the current position acquisition unit 105 acquires the current position information by identifying the current position of the arm 11 based on the rotation status signals acquired by the rotation status acquisition unit 104. The current position of the arm 11 is uniquely determined by identifying how far the joints 11-1 and 11-2 are bent, that is, the bending angles of the joints 11-1 and 11-2. These angles are controlled by rotating the motors 12-1 and 12-2 that correspond to the joints 11-1 and 11-2. Using the rotation amounts of the motors 12-1 and 12-2 indicated by the rotation status signals, the current position acquisition unit 105 determines the rotation amount of each motor from a predetermined reference position and thereby identifies the current position of the arm 11.
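The way the joint angles determine the arm position can be sketched as forward kinematics. This is a minimal illustration that assumes a planar arm, a fixed gear ratio between each motor and its joint, and known link lengths; none of these values appear in the description, and the function names are hypothetical.

```python
import math

def joint_angles_from_rotation(rotation_amounts_rad, reference_angles_rad, gear_ratios):
    """Joint angle = angle at the reference position + motor rotation / gear ratio."""
    return [ref + rot / ratio
            for rot, ref, ratio in zip(rotation_amounts_rad,
                                       reference_angles_rad,
                                       gear_ratios)]

def tip_position(joint_angles_rad, link_lengths):
    """Planar forward kinematics: position of the arm tip from the joint angles."""
    x = y = 0.0
    cumulative_angle = 0.0
    for angle, length in zip(joint_angles_rad, link_lengths):
        # Each joint angle is measured relative to the previous link.
        cumulative_angle += angle
        x += length * math.cos(cumulative_angle)
        y += length * math.sin(cumulative_angle)
    return (x, y)
```

For a two-joint arm such as the one in Embodiment 1, `tip_position` would be called with two angles and two link lengths, and the result corresponds to the position of the tip 11-3 in the plane of motion.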

The method by which the current position acquisition unit 105 acquires the current position information is not limited to identifying the current position of the arm 11 from the rotation status signals. For example, the current position acquisition unit 105 may identify the current position of the arm 11 based on the image information acquired by the image acquisition unit 101. Specifically, for example, it extracts feature points from the image indicated by the image information using a well-known image analysis technique, identifies the arm 11 appearing in the image from the extracted feature points, and identifies the current position of the arm 11.
As another example, the current position acquisition unit 105 may identify the current position of the arm 11 based on a sensor signal output by a sensor capable of detecting the position of an object, such as an ultrasonic sensor or a laser sensor.
In Embodiment 1, the current position acquisition unit 105 is described as acquiring the current position information by identifying the current position of the arm 11 of the robot 10 based on the rotation status signals acquired by the rotation status acquisition unit 104.

The virtual space image generation unit 102 may use the current position information acquired by the current position acquisition unit 105 to reproduce part or all of the arm 11 of the robot 10 in the virtual space together with the work environment 20, and generate the virtual space image information accordingly.
FIG. 3 is a diagram illustrating an example of an image indicated by the virtual space image information generated by the virtual space image generation unit 102 according to Embodiment 1.
FIG. 3 shows a case in which the virtual space image generation unit 102 has reproduced the entire arm 11 of the robot 10 in the virtual space together with the work environment 20.

The target position acquisition unit 106 acquires target position information indicating the target position to which the arm 11 is to be moved. The target position indicated by the target position information is, for example, the position to which an arbitrary part of the arm 11 is to be moved, and that part is, for example, the tip 11-3 of the arm 11.
The target position acquisition unit 106 acquires the target position information by, for example, accepting target position information entered by a user operating an input device (not shown). Specifically, the target position acquisition unit 106 causes a display device (not shown) to display the virtual space image information generated by the virtual space image generation unit 102, and accepts the target position information through a user operation on the input device that designates a position in the displayed virtual space image information.

The control generation unit 107 generates a control signal indicating control content for moving the arm 11 toward the target position indicated by the target position information, based on the model information acquired by the model acquisition unit 103, the current position information acquired by the current position acquisition unit 105, and the target position information acquired by the target position acquisition unit 106.
The model information indicates a model trained using an arithmetic expression for calculating a reward. The expression includes a term that calculates the reward by referring to reference path information indicating a reference path and evaluating that the arm 11 of the robot 10 is moving based on the reference path.

Specifically, for example, the model information includes correspondence information in which position information indicating a position of the arm 11 of the robot 10 is associated with a control signal indicating control content for moving the arm 11. For each of a plurality of mutually different target positions, the correspondence information is a set of multiple pieces of position information and the control signal corresponding to each piece. The model information includes a plurality of pieces of correspondence information, each associated with one of the mutually different target positions.
From the correspondence information included in the model information, the control generation unit 107 identifies the correspondence information for the target position indicated by the target position information acquired by the target position acquisition unit 106, and generates the control signal based on the identified correspondence information and the current position information acquired by the current position acquisition unit 105.
More specifically, the control generation unit 107 refers to the identified correspondence information and identifies the control signal corresponding to the current position of the arm 11 indicated by the current position information acquired by the current position acquisition unit 105, thereby generating the control signal indicating control content for moving the arm 11.
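The two-stage lookup described above can be sketched as follows. This is an illustration only: it assumes the model information is held as a nested dictionary keyed by target position and then by arm position, and that a query that does not exactly match a stored key is resolved to the nearest stored key. The nearest-neighbour rule, the data layout, and the names are assumptions, not taken from the description.

```python
import math

def nearest_key(keys, query):
    """Return the stored key (a coordinate tuple) closest to the query point."""
    return min(keys, key=lambda k: math.dist(k, query))

def generate_control(model_info, target_pos, current_pos):
    """Select the correspondence information for the target, then the control
    signal associated with the current arm position."""
    target_key = nearest_key(model_info.keys(), target_pos)
    correspondence = model_info[target_key]
    position_key = nearest_key(correspondence.keys(), current_pos)
    return correspondence[position_key]

# Hypothetical model information: target -> {arm position -> control signal}.
# Each control signal here is a pair of motor rotation commands.
example_model_info = {
    (1.0, 1.0): {(0.0, 0.0): (0.1, 0.1), (0.5, 0.5): (0.05, 0.05)},
    (2.0, 0.0): {(0.0, 0.0): (0.2, -0.1), (1.0, 0.0): (0.1, 0.0)},
}
```

For example, `generate_control(example_model_info, (2.0, 0.1), (0.9, 0.0))` resolves to the target `(2.0, 0.0)` and the stored position `(1.0, 0.0)`.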

The control output unit 108 outputs the control signal generated by the control generation unit 107 to the motor control means 13 provided in the robot 10 via the network 30.
The motor control means 13 receives the control signal output by the control output unit 108 via the network 30. As described above, it generates electric signals for operating the motors 12-1 and 12-2 based on the received control signal and outputs the generated electric signals to the motors 12-1 and 12-2.

The control correction unit 111 corrects the control signal generated by the control generation unit 107 (hereinafter the "first control signal") so that the change in its control content, compared with the control content of the control signal generated immediately before by the control generation unit 107 (hereinafter the "second control signal"), falls within a predetermined range.
In a control signal for controlling the rotation of the motor 12-1 or the motor 12-2 provided in the robot 10, if the rotation control indicated by the first control signal differs greatly from the rotation control indicated by the second control signal, the motor 12-1 or the motor 12-2 is forced into an abrupt torque change, and a malfunction such as step-out may occur. Furthermore, an electric signal exceeding the predetermined rated voltage may be input to the motor 12-1 or the motor 12-2, which may cause a failure or other malfunction.
The control correction unit 111 corrects the rotation control indicated by the first control signal so that, compared with the rotation control indicated by the second control signal, it stays within a range that does not constitute an abrupt change.
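One simple way to keep the change within a predetermined range is to clamp the difference between consecutive commands. The sketch below assumes each control signal is a list of per-motor command values and that the allowed range is a symmetric bound `max_delta`; both the representation and the name are hypothetical.

```python
def correct_control(first_signal, second_signal, max_delta):
    """Limit the change of each command in first_signal, relative to the
    immediately preceding second_signal, to the range [-max_delta, +max_delta]."""
    corrected = []
    for new_value, previous_value in zip(first_signal, second_signal):
        delta = new_value - previous_value
        # Clamp the change to the allowed range.
        delta = max(-max_delta, min(max_delta, delta))
        corrected.append(previous_value + delta)
    return corrected
```

With `max_delta = 2.0`, a requested jump from 0.0 to 10.0 is reduced to 2.0, while a change from 0.5 to 1.0 passes through unchanged.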

By including the control correction unit 111, the robot control device 100 can stably control the arm 11 of the robot 10 so that the control signals for controlling the rotation of the motor 12-1 or the motor 12-2 provided in the robot 10 do not cause malfunctions such as step-out or failure.
In the example above, the control correction unit 111 compares the first control signal with the second control signal. Alternatively, the control correction unit 111 may compare the first control signal with the rotation status signal acquired by the rotation status acquisition unit 104, and correct the first control signal so that the change in its control content, relative to the rotation status (such as the rotation speed) of the motor 12-1 or the motor 12-2 indicated by the rotation status signal, falls within a predetermined range.

When part or all of the control content indicated by the first control signal generated by the control generation unit 107 is missing, the control interpolation unit 112 corrects the first control signal by interpolating the missing control content on the basis of the control content indicated by the second control signal that the control generation unit 107 generated immediately before. When interpolating in this way, the control interpolation unit 112 interpolates so that the missing control content of the first control signal changes by no more than a predetermined amount relative to the control content indicated by the second control signal.

For example, when the control generation unit 107 generates a control signal periodically, once every predetermined period, to control the rotation of the motor 12-1 or the motor 12-2 provided in the robot 10, generation of the control signal by the control generation unit 107 may not be completed within that period. In such a case, the control signal generated by the control generation unit 107 is, for example, missing part or all of its control content. If, for example, the control signal specifies an absolute value rather than a relative value, and part or all of the control content of the control signal generated by the control generation unit 107 is missing, the motor 12-1 or the motor 12-2 is forced into an abrupt torque change, and a problem such as step-out may occur. Furthermore, an electric signal exceeding the predetermined rated voltage may be input to the motor 12-1 or the motor 12-2, causing a problem such as a failure.
By including the control interpolation unit 112, the robot control device 100 can stably control the arm 11 of the robot 10 so that the control signal for controlling the rotation of the motor 12-1 or the motor 12-2 provided in the robot 10 does not cause a problem such as step-out or failure.
Although an example has been described in which the control interpolation unit 112 interpolates the first control signal on the basis of the second control signal when interpolating the missing control content of the first control signal, the control correction unit 111 may instead compare the first control signal with the rotation status signal acquired by the rotation status acquisition unit 104, and interpolate and correct the first control signal so that the control content it indicates changes by no more than a predetermined amount relative to the rotation status, such as the rotation speed, of the motor 12-1 or the motor 12-2 indicated by the rotation status signal.
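The interpolation of missing control content can be sketched as follows. This is an illustrative assumption rather than the patented implementation: a control signal is modeled as a list with one entry per motor, a missing entry is represented by `None`, and the simplest possible interpolation is used, namely holding the previous value, which trivially keeps the change within any predetermined range. The function name `interpolate_missing` is hypothetical.

```python
from typing import List, Optional, Sequence


def interpolate_missing(
    first: Sequence[Optional[float]],
    second: Sequence[float],
) -> List[float]:
    """Fill entries that are missing (None) in `first` with the matching entry of `second`."""
    if len(first) != len(second):
        raise ValueError("signals must have the same number of entries")
    return [s if f is None else f for f, s in zip(first, second)]


previous = [100.0, 50.0]  # second control signal (generated immediately before)

# Part of the first control signal is missing.
print(interpolate_missing([None, 55.0], previous))  # [100.0, 55.0]

# All of the first control signal is missing.
print(interpolate_missing([None, None], previous))  # [100.0, 50.0]
```

Holding the previous value matters most for absolute-valued commands, where a missing entry could otherwise be interpreted as an abrupt jump.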

The hardware configuration of the main part of the robot control device 100 according to the first embodiment will be described with reference to FIGS. 4A and 4B.
FIGS. 4A and 4B are diagrams each illustrating an example of the hardware configuration of the main part of the robot control device 100 according to the first embodiment.

As shown in FIG. 4A, the robot control device 100 is configured by a computer, and the computer has a processor 201 and a memory 202. The memory 202 stores a program for causing the computer to function as the image acquisition unit 101, the virtual space image generation unit 102, the model acquisition unit 103, the rotation status acquisition unit 104, the target position acquisition unit 106, the current position acquisition unit 105, the control generation unit 107, the control output unit 108, the control correction unit 111, and the control interpolation unit 112. The processor 201 reads out and executes the program stored in the memory 202, whereby the image acquisition unit 101, the virtual space image generation unit 102, the model acquisition unit 103, the rotation status acquisition unit 104, the target position acquisition unit 106, the current position acquisition unit 105, the control generation unit 107, the control output unit 108, the control correction unit 111, and the control interpolation unit 112 are realized.

Alternatively, as shown in FIG. 4B, the robot control device 100 may be configured by a processing circuit 203. In this case, the functions of the image acquisition unit 101, the virtual space image generation unit 102, the model acquisition unit 103, the rotation status acquisition unit 104, the target position acquisition unit 106, the current position acquisition unit 105, the control generation unit 107, the control output unit 108, the control correction unit 111, and the control interpolation unit 112 may be realized by the processing circuit 203.

The robot control device 100 may also be configured by the processor 201, the memory 202, and the processing circuit 203 (not shown). In this case, some of the functions of the image acquisition unit 101, the virtual space image generation unit 102, the model acquisition unit 103, the rotation status acquisition unit 104, the target position acquisition unit 106, the current position acquisition unit 105, the control generation unit 107, the control output unit 108, the control correction unit 111, and the control interpolation unit 112 may be realized by the processor 201 and the memory 202, and the remaining functions may be realized by the processing circuit 203.

The processor 201 is, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a microprocessor, a microcontroller, or a DSP (Digital Signal Processor).

The memory 202 is, for example, a semiconductor memory or a magnetic disk. More specifically, the memory 202 is a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read-Only Memory), an SSD (Solid State Drive), an HDD (Hard Disk Drive), or the like.

The processing circuit 203 is, for example, an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), an FPGA (Field-Programmable Gate Array), an SoC (System-on-a-Chip), or a system LSI (Large-Scale Integration).

The operation of the robot control device 100 according to the first embodiment will be described with reference to FIG. 5.
FIG. 5 is a flowchart illustrating an example of the processing of the robot control device 100 according to the first embodiment.
The robot control device 100 repeatedly executes the processing of the flowchart, for example, each time a new target position is set.

First, in step ST501, the image acquisition unit 101 acquires image information.
Next, in step ST502, the model acquisition unit 103 acquires model information.
Next, in step ST503, the rotation status acquisition unit 104 acquires a rotation status signal.
Next, in step ST504, the current position acquisition unit 105 acquires current position information indicating the current position of the arm 11 of the robot 10.
Next, in step ST505, the virtual space image generation unit 102 generates virtual space image information.
Next, in step ST506, the target position acquisition unit 106 acquires target position information.
Next, in step ST507, the control generation unit 107 identifies, from among the correspondence information included in the model information, the correspondence information that corresponds to the target position indicated by the target position information.

Next, in step ST508, the control generation unit 107 determines whether the current position of the arm 11 of the robot 10 indicated by the current position information acquired by the current position acquisition unit 105 is the same as the target position indicated by the target position information. Here, "the same" is not limited to an exact match and includes being substantially the same.
When the control generation unit 107 determines in step ST508 that the current position of the arm 11 and the target position are the same, the robot control device 100 ends the processing of the flowchart.
When the control generation unit 107 determines in step ST508 that the current position of the arm 11 and the target position are not the same, in step ST511 the control generation unit 107 refers to the identified correspondence information and identifies the control signal corresponding to the current position of the arm 11 of the robot 10 indicated by the current position information acquired by the current position acquisition unit 105, thereby generating a control signal indicating the control content for moving the arm 11.

Next, in step ST512, the control correction unit 111 corrects the first control signal so that the control content indicated by the first control signal generated by the control generation unit 107 changes by no more than a predetermined amount relative to the control content indicated by the second control signal that the control generation unit 107 generated immediately before.
Next, in step ST513, when part or all of the control content indicated by the first control signal generated by the control generation unit 107 is missing, the control interpolation unit 112 corrects the first control signal by interpolating the missing control content on the basis of the control content indicated by the second control signal that the control generation unit 107 generated immediately before.
Next, in step ST514, the control output unit 108 outputs to the robot 10 the control signal generated by the control generation unit 107, or the control signal corrected by the control correction unit 111 or the control interpolation unit 112.
Next, in step ST515, the rotation status acquisition unit 104 acquires a rotation status signal.
Next, in step ST516, the current position acquisition unit 105 acquires current position information indicating the current position of the arm 11 of the robot 10.
Next, in step ST517, the virtual space image generation unit 102 generates virtual space image information.

After executing step ST517, the robot control device 100 returns to step ST508 and repeats the processing from step ST508 through step ST517 until the control generation unit 107 determines in step ST508 that the position of the arm 11 and the target position are the same.
In the processing of the flowchart, steps ST512, ST513, and ST517 are not essential to the robot control device 100. The order of steps ST501 and ST502 may be reversed, and the order of steps ST512 and ST513 may likewise be reversed.
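The loop from step ST508 through step ST517 can be sketched on a one-dimensional simulated arm. Everything here is an assumption made for illustration: the function names, the tolerance that stands in for "substantially the same", the use of the remaining displacement in place of the model lookup of step ST511, and an idealized arm that moves exactly as commanded. Because the correction of step ST512 limits how quickly the command can change, overshoot is possible in general, so the loop is also bounded by `max_iterations`.

```python
from typing import List


def clamp(value: float, low: float, high: float) -> float:
    """Restrict `value` to the closed interval [low, high]."""
    return max(low, min(high, value))


def run_to_target(
    current: float,
    target: float,
    tolerance: float = 1e-3,
    max_delta: float = 0.5,
    max_iterations: int = 1000,
) -> List[float]:
    """Drive a simulated 1-D arm toward `target`; return the visited positions."""
    positions = [current]
    previous_command = 0.0  # stands in for the second control signal
    for _ in range(max_iterations):
        # ST508: finish once the current and target positions are substantially the same.
        if abs(target - current) <= tolerance:
            break
        # ST511: generate the first control signal (here, the remaining displacement
        # replaces the lookup in the correspondence information).
        command = target - current
        # ST512: keep the change from the previous command within max_delta.
        command = clamp(command, previous_command - max_delta, previous_command + max_delta)
        # ST514: output the command. ST515/ST516: re-acquire the position,
        # simulated here by an arm that moves exactly as commanded.
        current += command
        positions.append(current)
        previous_command = command
    return positions


print(run_to_target(0.0, 2.0))  # [0.0, 0.5, 1.5, 2.0]
```

In the printed run the command ramps from 0.5 to 1.0 and back to 0.5, so no two consecutive commands differ by more than `max_delta`.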

A method for generating the model information will now be described.
The model information that the robot control device 100 uses when generating a control signal is generated by the robot control learning device 300.
The configuration of the main part of the robot control learning device 300 according to the first embodiment will be described with reference to FIG. 6.
FIG. 6 is a block diagram illustrating an example of the configuration of the main part of the robot control learning device 300 and the robot control learning system 3 according to the first embodiment.
The robot control learning device 300 is applied to the robot control learning system 3.
The robot control learning system 3 includes the robot control learning device 300, the robot 10, the network 30, the storage device 40, and the imaging device 50.
In the configuration of the robot control learning system 3, components that are the same as those of the robot control system 1 are denoted by the same reference numerals, and redundant description is omitted. That is, description is omitted for the components in FIG. 6 that bear the same reference numerals as those shown in FIG. 2.

The motor control means 13, the rotation sensors 14-1 and 14-2, and the contact sensor 15 provided in the robot 10, as well as the storage device 40, the imaging device 50, and the robot control learning device 300, are each connected to the network 30.
The robot control learning device 300 generates a control signal for controlling the rotation of the motor 12-1 or the motor 12-2 provided in the robot 10, controls the motor 12-1 or the motor 12-2 with that control signal, and thereby performs learning for controlling the arm 11 of the robot 10 and generates the model information that the robot control device 100 uses when controlling the arm 11 of the robot 10.

On the basis of current position information indicating the current position of the arm 11 of the robot 10, target position information indicating the target position of the arm 11, and reference path information indicating a reference path, the robot control learning device 300 generates the model information that the robot control device 100 uses when generating a control signal indicating the control content for moving the arm 11 toward the target position.
The robot control learning device 300 includes an image acquisition unit 301, a virtual space image generation unit 302, a rotation status acquisition unit 304, a target position acquisition unit 306, a current position acquisition unit 305, a control generation unit 307, a control output unit 308, a reference path acquisition unit 320, a reward calculation unit 321, a model generation unit 322, a model output unit 323, and a contact signal acquisition unit 324. In addition to the above configuration, the robot control learning device 300 may include a control correction unit 311 and a control interpolation unit 312.

Each function of the image acquisition unit 301, the virtual space image generation unit 302, the rotation status acquisition unit 304, the target position acquisition unit 306, the current position acquisition unit 305, the control generation unit 307, the control output unit 308, the reference path acquisition unit 320, the reward calculation unit 321, the model generation unit 322, the model output unit 323, the contact signal acquisition unit 324, the control correction unit 311, and the control interpolation unit 312 in the robot control learning device 300 according to the first embodiment may be realized by the processor 201 and the memory 202 in the hardware configuration shown by way of example in FIGS. 4A and 4B for the robot control device 100 according to the first embodiment, or may be realized by the processing circuit 203.

The image acquisition unit 301 acquires, via the network 30, the image information output by the imaging device 50.

The virtual space image generation unit 302 reproduces the work environment 20 as an image in a virtual space on the basis of the image information acquired by the image acquisition unit 301, and generates virtual space image information indicating that image.
The processing by which the virtual space image generation unit 302 generates the virtual space image information is the same as the processing performed by the virtual space image generation unit 102 in the robot control device 100 according to the first embodiment, and a detailed description is therefore omitted.
In the robot control learning device 300 according to the first embodiment, the index information that the virtual space image generation unit 302 needs in order to generate the virtual space image information is described as being acquired by the virtual space image generation unit 302 reading it from the storage device 40 via the network 30.

The rotation status acquisition unit 304 acquires, from the rotation sensors 14-1 and 14-2 provided in the robot 10, a rotation status signal indicating the rotation status of the motors 12-1 and 12-2.

The current position acquisition unit 305 acquires current position information indicating the current position of the arm 11 of the robot 10.
Specifically, for example, the current position acquisition unit 305 acquires the current position information by identifying the current position of the arm 11 of the robot 10 on the basis of the rotation status signal acquired by the rotation status acquisition unit 304.
The processing by which the current position acquisition unit 305 acquires the current position information is the same as the processing performed by the current position acquisition unit 105 in the robot control device 100 according to the first embodiment, and a detailed description is therefore omitted.

The target position acquisition unit 306 acquires target position information indicating the target position to which the arm 11 is to be moved.
The processing by which the target position acquisition unit 306 acquires the target position information is the same as the processing performed by the target position acquisition unit 106 in the robot control device 100 according to the first embodiment, and a detailed description is therefore omitted.

The reference path acquisition unit 320 acquires reference path information indicating a reference path that includes at least part of the path from the current position of the arm 11 of the robot 10, indicated by the current position information acquired by the current position acquisition unit 305, to the target position indicated by the target position information acquired by the target position acquisition unit 306.
For example, the reference path acquisition unit 320 causes a display device (not shown) to display the virtual space image information generated by the virtual space image generation unit 302, has an input device (not shown) accept input from a user, and acquires the reference path information thus input.

The method by which the reference path acquisition unit 320 acquires the reference path information is not limited to the method described above.
For example, the reference path acquisition unit 320 may generate the reference path information automatically by a predetermined calculation process. Specifically, for example, the reference path acquisition unit 320 may acquire the reference path information by executing a random search using an RRT (Rapidly-exploring Random Tree) or the like on the basis of the current position information, the target position information, and the virtual space image information, and generating the reference path information from the result of the random search.
By using the result of a random search when acquiring the reference path information, the reference path acquisition unit 320 can generate the reference path information automatically.
Methods of finding a path between two points by a random search using an RRT or the like are well known, and a description is therefore omitted.
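For readers unfamiliar with RRT, a minimal two-dimensional version is sketched below. It is a generic textbook-style illustration under stated assumptions, not the procedure used in this document: the workspace is a square, obstacles are hypothetical circles standing in for what would be derived from the virtual space image information, and only new nodes (not the connecting segments) are checked for collision. All names and parameters (`rrt`, `collides`, `step`, `goal_bias`, `seed`) are chosen for the example.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (center_x, center_y, radius)


def collides(p: Point, obstacles: List[Circle]) -> bool:
    """Return True if point `p` lies inside any circular obstacle."""
    return any(math.hypot(p[0] - cx, p[1] - cy) <= r for cx, cy, r in obstacles)


def rrt(
    start: Point,
    goal: Point,
    obstacles: List[Circle],
    bounds: Tuple[float, float] = (0.0, 10.0),
    step: float = 0.5,
    goal_bias: float = 0.1,
    max_iterations: int = 5000,
    seed: int = 0,
) -> Optional[List[Point]]:
    """Grow a tree from `start`; return a path to `goal`, or None if none is found."""
    rng = random.Random(seed)
    nodes: List[Point] = [start]
    parent: Dict[int, Optional[int]] = {0: None}
    for _ in range(max_iterations):
        # Sample a random point, occasionally the goal itself.
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.uniform(*bounds), rng.uniform(*bounds))
        # Extend the nearest tree node by at most `step` toward the sample.
        nearest = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        distance = math.dist(nodes[nearest], sample)
        if distance == 0.0:
            continue
        scale = min(step, distance) / distance
        nx, ny = nodes[nearest]
        new = (nx + (sample[0] - nx) * scale, ny + (sample[1] - ny) * scale)
        if collides(new, obstacles):
            continue  # simplification: only the new node is checked, not the segment
        nodes.append(new)
        parent[len(nodes) - 1] = nearest
        if math.dist(new, goal) <= step:
            # Walk back to the root to recover the path, then append the goal.
            path: List[Point] = []
            index: Optional[int] = len(nodes) - 1
            while index is not None:
                path.append(nodes[index])
                index = parent[index]
            path.reverse()
            if path[-1] != goal:
                path.append(goal)
            return path
    return None


obstacles = [(5.0, 5.0, 1.5)]
path = rrt((1.0, 1.0), (9.0, 9.0), obstacles)
print("waypoints:", None if path is None else len(path))
```

The returned list of waypoints plays the role of a reference path; a path found this way is feasible but generally not the shortest.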

As another example, the reference path acquisition unit 320 may acquire the reference path information by generating it from movement history information indicating a path along which the arm 11 moved in the past within the section from the current position of the arm 11 indicated by the current position information to the target position indicated by the target position information.
The movement history information is stored in the storage device 40 via the network 30, for example, when the arm 11 moved through that section in the past. The reference path acquisition unit 320 acquires the movement history information by reading it from the storage device 40.
By using the movement history information when acquiring the reference path information, the reference path acquisition unit 320 can generate the reference path information automatically.

The contact signal acquisition unit 324 receives, via the network 30, the contact signal output by the contact sensor 15.

The reward calculation unit 321 calculates a reward on the basis of the current position information acquired by the current position acquisition unit 305, the target position information acquired by the target position acquisition unit 306, and the reference path information acquired by the reference path acquisition unit 320, using an arithmetic expression that includes a term for calculating the reward by evaluating whether the arm 11 is moving along the reference path.
In addition to the term that evaluates whether the arm 11 is moving along the reference path, the arithmetic expression used by the reward calculation unit 321 may include a term for calculating the reward by evaluating the continuity of movement of the current position of the arm 11 indicated by the current position information acquired by the current position acquisition unit 305.
In addition to the term that evaluates whether the arm 11 is moving along the reference path, the arithmetic expression used by the reward calculation unit 321 may also include a term for calculating the reward by evaluating whether the arm 11 has come into contact with an obstacle in the work environment 20. The reward calculation unit 321 determines whether the arm 11 has come into contact with an obstacle in the work environment 20 on the basis of, for example, the contact signal that the contact signal acquisition unit 324 acquires from the contact sensor 15.
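To make the three kinds of term concrete, the sketch below combines them into one reward. It is not equation (1) of this document, which is not reproduced in this text; the functional forms (distance to the nearest reference waypoint, length of the last move, a fixed contact penalty) and the weights `w_ref`, `w_cont`, and `w_contact` are assumptions chosen purely for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def reward(
    prev_pos: Point,
    pos: Point,
    reference_path: Sequence[Point],
    contact: bool,
    w_ref: float = 1.0,
    w_cont: float = 0.5,
    w_contact: float = 10.0,
) -> float:
    """Return a reward that is higher the better the move follows the reference path."""
    # Term 1: deviation from the reference path (distance to the nearest waypoint).
    deviation = min(math.dist(pos, p) for p in reference_path)
    # Term 2: continuity of movement (a large jump between time steps is penalized).
    jump = math.dist(prev_pos, pos)
    # Term 3: contact with an obstacle, as reported by a contact signal.
    contact_penalty = 1.0 if contact else 0.0
    return -(w_ref * deviation + w_cont * jump + w_contact * contact_penalty)


reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
on_path = reward((0.0, 0.0), (1.0, 0.0), reference, contact=False)
off_path = reward((0.0, 0.0), (1.0, 1.0), reference, contact=False)
print(on_path)             # -0.5
print(on_path > off_path)  # True
```

A move that stays on the reference path scores higher than one that leaves it, and any contact lowers the reward by the contact weight.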

Specifically, the reward calculation unit 321 uses equation (1) to calculate the reward obtained when the arm 11 of the robot 10 acts on the basis of an arbitrary control signal between time t-1 and time t, moving from its position at time t-1 to its position at time t. The period from time t-1 to time t is, for example, the predetermined time interval at which the control generation unit 307 generates the control signal to be output to the motor control means 13 provided in the robot 10.

The model generation unit 322 generates a model by reinforcement learning, for example by a TD (Temporal Difference) learning method such as Q-learning, the Actor-Critic method, or the Sarsa method, or by the Monte Carlo method, and generates model information indicating the generated model.
Reinforcement learning defines, for the state $S_t$ of an agent at a time $t$, the value $Q(S_t, a_t)$ of an action $a_t$ that the agent selects and performs from among the one or more actions available to it, together with the reward $r_t$ for that action $a_t$, and proceeds so as to increase the value $Q(S_t, a_t)$ and the reward $r_t$.
In general, the update equation of the action-value function is given by equation (2) below.
$$Q(S_t, a_t) \leftarrow Q(S_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a_{t+1}} Q(S_{t+1}, a_{t+1}) - Q(S_t, a_t) \right) \qquad \text{(2)}$$

Here, $S_t$ is the state of the agent at a time $t$, $a_t$ is the action of the agent at time $t$, and $S_{t+1}$ is the state of the agent at time $t+1$, which is a predetermined time interval after time $t$. An agent that is in state $S_t$ at time $t$ transitions to state $S_{t+1}$ at time $t+1$ as a result of action $a_t$.
$Q(S_t, a_t)$ is the value of the action $a_t$ performed by the agent in state $S_t$.
$r_{t+1}$ is a value indicating the reward obtained when the agent transitions from state $S_t$ to state $S_{t+1}$.
$\max Q(S_{t+1}, a_{t+1})$ is the value $Q(S_{t+1}, a^*)$ obtained when, among the actions $a_{t+1}$ that the agent can take in state $S_{t+1}$, the agent selects the action $a^*$ for which $Q(S_{t+1}, a_{t+1})$ is largest.
$\gamma$ is a parameter that takes a positive value of 1 or less and is generally called the discount rate.
$\alpha$ is a learning coefficient that takes a positive value of 1 or less.
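As an illustration of Equation (2), the following is a minimal sketch of a tabular update in Python. It is not taken from the patent: the dictionary-based table, the function name `q_update`, the state and action labels, and the values chosen for $\alpha$ and $\gamma$ are all assumptions made for the example.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one Equation (2) update to the table q in place.

    q maps (state, action) pairs to values; unseen pairs default to 0.0.
    All names and default parameters here are illustrative assumptions.
    """
    # max over a_{t+1} of Q(S_{t+1}, a_{t+1}); 0.0 if no action is available.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)
q[("s1", "right")] = 2.0  # pretend this value was learned earlier
new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1",
                     next_actions=["left", "right"], alpha=0.5, gamma=0.5)
```

With these numbers the target is 1.0 + 0.5 × 2.0 = 2.0, and the stored value moves halfway from 0.0 toward it.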

Equation (2) updates the value $Q(S_t, a_t)$ of the action $a_t$ performed by the agent in state $S_t$, based on the reward $r_{t+1}$ for that action and on the value $Q(S_{t+1}, a^*)$ of the action $a^*$ performed by the agent in the state $S_{t+1}$ reached through action $a_t$.
Specifically, when the sum of the reward $r_{t+1}$ for action $a_t$ in state $S_t$ and the value $Q(S_{t+1}, a^*)$ of action $a^*$ in the resulting state $S_{t+1}$ is larger than the value $Q(S_t, a_t)$, Equation (2) updates $Q(S_t, a_t)$ so as to increase it. Conversely, when that sum is smaller than $Q(S_t, a_t)$, Equation (2) updates $Q(S_t, a_t)$ so as to decrease it.

In other words, when the agent is in a given state and performs a given action, Equation (2) updates the value of that action so that it approaches the sum of the reward for the action and the value of the best action in the state reached by the action.
Methods for determining the action $a^*$ that gives the largest value of $Q(S_{t+1}, a_{t+1})$ among the actions $a_{t+1}$ available to the agent in state $S_{t+1}$ include, for example, the ε-greedy method, the Softmax method, and a method using an RBF (Radial Basis Function). These methods are well known, so their description is omitted.
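The text names these selection methods without describing them. As a rough illustration only, the sketch below shows common textbook forms of ε-greedy and Softmax selection over a small table of action values. The function names, the value of `epsilon`, the `temperature` parameter, and the example values are assumptions and are not taken from the patent.

```python
import math
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore at random, otherwise pick the best action."""
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values[a])

def softmax_probabilities(q_values, temperature=1.0):
    """Return selection probabilities proportional to exp(Q / temperature)."""
    top = max(q_values.values())  # subtract the maximum for numerical stability
    weights = {a: math.exp((v - top) / temperature) for a, v in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical action values for one state.
q_values = {"a_i": 0.2, "a_j": 0.5, "a_star": 0.9}
greedy_choice = epsilon_greedy(q_values, epsilon=0.0, rng=random.Random(0))
probs = softmax_probabilities(q_values)
```

With `epsilon=0.0` the first function always returns the highest-valued action. Larger values of `epsilon`, or a higher `temperature` in the second function, trade some of that greediness for exploration.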

In the general Equation (2) above, the agent is the arm 11 according to Embodiment 1, the state of the agent is the position of the arm 11, and the action is a movement of the position of the arm 11.

The model generation unit 322 generates model information by applying Equation (1) to Equation (2).
Specifically, by applying Equation (1) to Equation (2), the model generation unit 322 generates correspondence information that associates current position information, which indicates the current position of the arm 11 of the robot 10 acquired by the current position acquisition unit 305, with a control signal indicating the control content for moving the arm 11. For each of a plurality of mutually different target positions, the correspondence information is a set of a plurality of positions and the control signal corresponding to each position. The model generation unit 322 generates model information that includes the plurality of pieces of correspondence information, each associated with one of the mutually different target positions.
As described above, the model generation unit 322 generates model information based on the current position information acquired by the current position acquisition unit 305, the target position information acquired by the target position acquisition unit 306, the reference path information acquired by the reference path acquisition unit 320, and the reward calculated by the reward calculation unit 321.
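One way to picture the correspondence information described above is as a nested lookup table: for each target position, a map from arm position to control signal. The sketch below is an assumption about a possible in-memory layout, not the patent's data format. The tuple encodings of positions and control signals and the helper names `register` and `lookup` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]            # hypothetical discretised arm position
ControlSignal = Tuple[float, float]   # hypothetical commands for motors 12-1 and 12-2

# model_info[target position][arm position] -> control signal
ModelInfo = Dict[Position, Dict[Position, ControlSignal]]

def register(model_info: ModelInfo, target: Position,
             position: Position, signal: ControlSignal) -> None:
    """Store the control signal to use at `position` when heading for `target`."""
    model_info.setdefault(target, {})[position] = signal

def lookup(model_info: ModelInfo, target: Position,
           position: Position) -> Optional[ControlSignal]:
    """Return the stored control signal, or None if the pair was never learned."""
    return model_info.get(target, {}).get(position)

model_info: ModelInfo = {}
register(model_info, target=(5, 5), position=(0, 0), signal=(0.1, 0.0))
register(model_info, target=(5, 5), position=(1, 0), signal=(0.1, 0.1))
register(model_info, target=(2, 7), position=(0, 0), signal=(0.0, 0.2))
signal = lookup(model_info, target=(5, 5), position=(1, 0))
```

Keying the outer table by target position mirrors the statement that the model information holds one set of correspondence information per target position.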

With reference to FIG. 7, a method of selecting the action $a^*$ from among the actions $a_t$ that the arm 11 according to Embodiment 1 can take when its state is $S_t$ will be described.
FIG. 7 is a diagram showing an example of selecting the action $a^*$ from among the actions $a_t$ that the arm 11 according to Embodiment 1 can take when its state is $S_t$.

In FIG. 7, $a_i$, $a_j$, and $a^*$ are actions that the arm 11 can take at time $t$ when its state is $S_t$. $Q(S_t, a_i)$, $Q(S_t, a_j)$, and $Q(S_t, a^*)$ are the values of the respective actions when the arm 11 performs action $a_i$, action $a_j$, and action $a^*$ in state $S_t$.
Because the model generation unit 322 generates model information by applying Equation (1) to Equation (2), the values $Q(S_t, a_i)$, $Q(S_t, a_j)$, and $Q(S_t, a^*)$ are evaluated by an arithmetic expression that includes the sixth and seventh terms of Equation (1). That is, each of these values becomes higher as the distance between the position of the arm 11 and the reference path becomes shorter, and as the distance that the arm 11 has moved along the reference path toward the target position becomes longer.
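The two qualities described here, closeness to the reference path and progress along it toward the target, can be illustrated with a small geometric sketch. The code below does not reproduce Equation (1) or its sixth and seventh terms. It only assumes a 2-D polyline reference path with at least two points, and the weights `w_dist` and `w_progress` and all function names are hypothetical.

```python
import math

def project_onto_path(point, path):
    """Return (distance to the path, arc length of the closest point on it)."""
    best_dist, best_arc, travelled = math.inf, 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip repeated points
        # Clamp the projection so the closest point stays on the segment.
        t = ((point[0] - x1) * dx + (point[1] - y1) * dy) / (seg_len ** 2)
        t = min(1.0, max(0.0, t))
        dist = math.hypot(point[0] - (x1 + t * dx), point[1] - (y1 + t * dy))
        if dist < best_dist:
            best_dist, best_arc = dist, travelled + t * seg_len
        travelled += seg_len
    return best_dist, best_arc

def path_reward(prev_pos, new_pos, path, w_dist=1.0, w_progress=1.0):
    """Penalise distance from the path and reward progress along it (assumed form)."""
    dist, arc_new = project_onto_path(new_pos, path)
    _, arc_prev = project_onto_path(prev_pos, path)
    return -w_dist * dist + w_progress * (arc_new - arc_prev)

reference_path = [(0.0, 0.0), (4.0, 0.0)]
on_path = path_reward((0.0, 0.0), (1.0, 0.0), reference_path)
off_path = path_reward((0.0, 0.0), (0.0, 1.0), reference_path)
```

In this toy setting a step along the path scores higher than a step of the same length away from it, which matches the ordering of values described for FIG. 7.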

Therefore, when the values $Q(S_t, a_i)$, $Q(S_t, a_j)$, and $Q(S_t, a^*)$ are compared, $Q(S_t, a^*)$ is the highest. Accordingly, when the state of the robot 10 is $S_t$, the model generation unit 322 selects the action $a^*$ and generates model information by associating the state $S_t$ with the control signal corresponding to the action $a^*$.
When generating model information, the model generation unit 322 preferably uses TD learning, which can reduce the number of trials needed to determine the action $a^*$ when an appropriate arithmetic expression for calculating the reward is adopted.

The control generation unit 307 generates a control signal corresponding to the action that the model generation unit 322 selected when generating the model information.

The control output unit 308 outputs the control signal generated by the control generation unit 307 to the motor control means 13 provided in the robot 10 via the network 30.
The motor control means 13 provided in the robot 10 receives, via the network 30, the control signal output by the control output unit 308 and, as described above, takes the received control signal as an input signal and controls the motors 12-1 and 12-2 based on it.
The model output unit 323 outputs the model information generated by the model generation unit 322 to the storage device 40 via the network 30 and causes the storage device 40 to store it.

The control correction unit 311 corrects the control signal generated by the control generation unit 307 (hereinafter, the "first control signal") so that the control content it indicates changes by an amount within a predetermined range relative to the control content indicated by the control signal that the control generation unit 307 generated immediately before (hereinafter, the "second control signal").
Although an example in which the control correction unit 311 compares the first control signal with the second control signal has been described, the control correction unit 311 may instead compare the first control signal with the rotation status signal acquired by the rotation status acquisition unit 304 and correct the first control signal so that the amount of change relative to the control currently being performed by the motor control means 13 in the robot 10 falls within a predetermined range.
The control correction unit 311 operates in the same way as the control correction unit 111 of the robot control device 100, so a detailed description is omitted.
Note that the model generation unit 322 may generate model information using the control signal corrected by the control correction unit 311.
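The correction described above amounts to limiting how far each new command may move from the previous one. A minimal sketch follows, assuming that a control signal is a list of per-motor values and that the predetermined range is a symmetric bound `max_delta`. Both are assumptions, since the patent does not specify the encoding.

```python
def correct_control(first, second, max_delta):
    """Clamp each component of `first` to within max_delta of `second`.

    `first` is the newly generated control signal and `second` the one
    generated immediately before. Representing them as lists of floats
    is an assumption made for this example.
    """
    return [min(prev + max_delta, max(prev - max_delta, cur))
            for cur, prev in zip(first, second)]

# The first component jumps by 6.0 and is limited to a change of 2.0.
# The second changes by only 0.5 and passes through unchanged.
corrected = correct_control([10.0, -3.0], [4.0, -2.5], max_delta=2.0)
```

The same function could take a value derived from the rotation status signal as `second`, which corresponds to the alternative comparison mentioned in the text.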

When part or all of the control content indicated by the first control signal generated by the control generation unit 307 is missing, the control interpolation unit 312 corrects the first control signal by interpolating the missing control content based on the control content indicated by the second control signal that the control generation unit 307 generated immediately before. When doing so, the control interpolation unit 312 interpolates so that the missing control content of the first control signal changes by an amount within a predetermined range relative to the control content indicated by the second control signal.
Although an example in which the control interpolation unit 312 interpolates the first control signal based on the second control signal has been described, the control interpolation unit 312 may instead interpolate and correct the first control signal based on the rotation status signal acquired by the rotation status acquisition unit 304, so that the amount of change relative to the control currently being performed by the motor control means 13 in the robot 10 falls within a predetermined range.
The control interpolation unit 312 operates in the same way as the control interpolation unit 112 of the robot control device 100, so a detailed description is omitted.
Note that the model generation unit 322 may generate model information using the control signal corrected by the control interpolation unit 312.
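A simple way to satisfy the interpolation condition is to reuse the previous value for any missing component, since that gives a change of zero. The sketch below takes that approach. Using `None` to mark missing content and representing a control signal as a list are assumptions, and the patent may intend a different interpolation rule.

```python
def interpolate_control(first, second):
    """Fill missing components of `first` from the previous signal `second`.

    A component equal to None, or a `first` that is None altogether,
    stands for missing control content (an assumed encoding).
    """
    if first is None:
        return list(second)  # everything is missing: reuse the previous signal
    return [prev if cur is None else cur for cur, prev in zip(first, second)]

partially_missing = interpolate_control([1.5, None], [1.0, 2.0])
fully_missing = interpolate_control(None, [1.0, 2.0])
```

Holding the previous value keeps the arm's commanded motion continuous. A less conservative rule could extrapolate from recent signals, as long as the result stays within the predetermined range.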

The operation of the robot control learning device 300 according to Embodiment 1 will be described with reference to FIG. 8.
FIG. 8 is a flowchart illustrating an example of the processing of the robot control learning device 300 according to Embodiment 1.
The robot control learning device 300, for example, executes the processing of this flowchart repeatedly.

First, in step ST801, the image acquisition unit 301 acquires image information.
Next, in step ST802, the rotation status acquisition unit 304 acquires a rotation status signal.
Next, in step ST803, the current position acquisition unit 305 acquires current position information indicating the current position of the arm 11 of the robot 10.
Next, in step ST804, the virtual space image generation unit 302 acquires index information. Next, in step ST805, the virtual space image generation unit 302 generates virtual space image information.
Next, in step ST806, the target position acquisition unit 306 acquires target position information.

In step ST807, the control generation unit 307 determines whether the current position of the arm 11 of the robot 10, indicated by the current position information acquired by the current position acquisition unit 305, is the same as the target position indicated by the target position information. Here, "the same" is not limited to an exact match and includes substantially the same.
If, in step ST807, the control generation unit 307 determines that the current position of the arm 11 and the target position are not the same, the robot control learning device 300 executes the processing from step ST811 onward.
In step ST811, the reward calculation unit 321 calculates a reward for each of the plurality of actions that the arm 11 can take.
Next, in step ST812, the model generation unit 322 selects the action to be taken, based on the reward that the reward calculation unit 321 calculated for each action, the value of each action, and the value of each of the actions that can be taken next after each action.
Next, in step ST813, the control generation unit 307 generates a control signal corresponding to the action selected by the model generation unit 322.

Next, in step ST814, the control correction unit 311 corrects the first control signal generated by the control generation unit 307 so that the control content it indicates changes by an amount within a predetermined range relative to the control content indicated by the second control signal that the control generation unit 307 generated immediately before.
Next, in step ST815, when part or all of the control content indicated by the first control signal generated by the control generation unit 307 is missing, the control interpolation unit 312 corrects the first control signal by interpolating the missing control content based on the control content indicated by the second control signal that the control generation unit 307 generated immediately before.
Next, in step ST816, the model generation unit 322 generates model information by generating correspondence information that associates the current position information acquired by the current position acquisition unit 305 with a control signal indicating the control content for moving the arm 11.

Next, in step ST817, the control output unit 308 outputs the control signal generated by the control generation unit 307, or the control signal corrected by the control correction unit 311 or the control interpolation unit 312, to the motor control means 13 provided in the robot 10.
Next, in step ST818, the rotation status acquisition unit 304 acquires a rotation status signal.
Next, in step ST819, the current position acquisition unit 305 acquires current position information indicating the current position of the arm 11 of the robot 10.
Next, in step ST820, the virtual space image generation unit 302 generates virtual space image information.

After executing the processing of step ST820, the robot control learning device 300 returns to step ST807 and repeats the processing from step ST807 to step ST820 until the control generation unit 307 determines in step ST807 that the current position of the arm 11 and the target position are the same.
If, in step ST807, the control generation unit 307 determines that the current position of the arm 11 and the target position are the same, the model output unit 323 outputs the model information generated by the model generation unit 322 in step ST821.
After executing the processing of step ST821, the robot control learning device 300 ends the processing of the flowchart.

Note that, in the processing of the flowchart, steps ST814, ST815, and ST820 are not essential to the robot control learning device 300. In addition, steps ST801 and ST802 may be executed in the reverse order, and steps ST814 and ST815 may also be executed in the reverse order.
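The loop from step ST807 to step ST821 can be sketched on a toy problem. The code below is only an illustration under strong assumptions: the arm position is a single integer, the actions are steps of -1 or +1, the reward is the negative distance to the target rather than Equation (1), and the optional steps ST814, ST815, and ST820 are left out. The function name `run_episode` and all parameters are hypothetical.

```python
def run_episode(start, target, alpha=0.5, gamma=0.9, max_steps=100):
    """Toy version of the ST807-ST821 loop on a 1-D integer position."""
    actions = (-1, +1)
    q = {}            # (position, action) -> value
    model_info = {}   # position -> selected action, standing in for a control signal
    position = start
    for _ in range(max_steps):          # guard so the sketch always terminates
        if position == target:          # ST807: target reached?
            break
        # ST811: reward for each available action (assumed reward, not Equation (1)).
        rewards = {a: -abs(position + a - target) for a in actions}

        # ST812: score each action by its reward plus the discounted best next value.
        def score(a):
            nxt = position + a
            return rewards[a] + gamma * max(q.get((nxt, b), 0.0) for b in actions)

        best = max(actions, key=score)
        old = q.get((position, best), 0.0)
        q[(position, best)] = old + alpha * (score(best) - old)  # Equation (2) form
        # ST813 and ST816: record the chosen action for this position.
        model_info[position] = best
        # ST817 to ST819: apply the action and observe the new position.
        position += best
    return position, model_info         # ST821: output the model information

final_position, learned = run_episode(start=0, target=3)
```

In this run the arm steps directly from 0 to 3, and the recorded table maps each visited position to the +1 action.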

FIG. 9 is a diagram showing examples of the path along which the arm 11 moved before reaching the target position. FIG. 9A shows the case where a reference path is set from the position of the arm 11 at a certain time all the way to the target position and the arithmetic expression of Equation (1) is used. FIG. 9B shows the case where a reference path is set only partway from the position of the arm 11 at a certain time to the target position and the arithmetic expression of Equation (1) is used. FIG. 9C shows the case where no reference path is set and an arithmetic expression obtained by removing the sixth and seventh terms from Equation (1) is used.
In FIG. 9A, the arm 11 can be seen to move along the set reference path until it reaches the target position. In FIG. 9B, the arm 11 can be seen to move along the reference path as far as the reference path extends and then move toward the target position. In FIG. 9C, by contrast, the arm 11 moves so as to avoid the obstacle as it heads for the target position and therefore cannot reach the target position. In other words, as shown in FIGS. 9A and 9B, the robot control learning device 300 can complete learning in a short period by setting a reference path and learning with the arithmetic expression of Equation (1).

As described above, the robot control device 100 includes: a current position acquisition unit 105 that acquires current position information indicating the current position of the arm 11 of the robot 10; a target position acquisition unit 106 that acquires target position information indicating the target position of the arm 11; and a control generation unit 107 that generates a control signal indicating the control content for moving the arm 11 toward the target position indicated by the target position information, based on the current position information acquired by the current position acquisition unit 105, the target position information acquired by the target position acquisition unit 106, and model information indicating a model trained using an arithmetic expression for calculating a reward, the arithmetic expression including a term that calculates the reward by referring to reference path information indicating a reference path and evaluating that the arm 11 is moving on the basis of the reference path.

With this configuration, the robot control device 100 can control the robot 10 so that the arm 11 of the robot 10 does not perform substantially discontinuous motion, while reducing the amount of computation.

Also as described above, the robot control learning device 300 includes: a current position acquisition unit 305 that acquires current position information indicating the current position of the arm 11 of the robot 10; a target position acquisition unit 306 that acquires target position information indicating the target position of the arm 11; a reference path acquisition unit 320 that acquires reference path information indicating a reference path; a reward calculation unit 321 that calculates a reward using an arithmetic expression that includes a term for calculating the reward by evaluating, based on the current position information acquired by the current position acquisition unit 305, the target position information acquired by the target position acquisition unit 306, and the reference path information acquired by the reference path acquisition unit 320, that the arm 11 is moving on the basis of the reference path; a control generation unit 307 that generates a control signal indicating the control content for moving the arm 11 toward the target position indicated by the target position information; and a model generation unit 322 that generates model information by evaluating the value of moving the arm 11 according to the control signal, based on the current position information acquired by the current position acquisition unit 305, the target position information acquired by the target position acquisition unit 306, the reference path information acquired by the reference path acquisition unit 320, and the reward calculated by the reward calculation unit 321.

With this configuration, the robot control learning device 300 can generate, in a short learning period, model information for controlling the robot 10 so that the arm 11 of the robot 10 does not perform substantially discontinuous motion.

Embodiment 2.
The robot control device 100a according to Embodiment 2 will be described with reference to FIG. 10.
FIG. 10 is a block diagram showing an example of the configuration of the main part of the robot control device 100a and the robot control system 1a according to Embodiment 2.
The robot control device 100a is applied to, for example, the robot control system 1a.
Like the robot control device 100, the robot control device 100a generates a control signal indicating the control content for moving the arm 11 toward the target position indicated by target position information, based on model information, current position information indicating the current position of the arm 11, and the target position information indicating the target position in the work environment 20, and outputs the generated control signal via the network 30 to the motor control means 13 provided in the robot 10. The model information that the robot control device 100a uses when generating the control signal is generated by the robot control learning device 300.

Compared with the robot control device 100 according to Embodiment 1, the robot control device 100a according to Embodiment 2 additionally includes a reference path acquisition unit 120, a reward calculation unit 121, a model update unit 122, a model output unit 123, and a contact signal acquisition unit 124, so that the learned model information output by the robot control learning device 300 can be updated.
In the configurations of the robot control device 100a and the robot control system 1a according to Embodiment 2, components that are the same as those of the robot control device 100 or the robot control system 1 according to Embodiment 1 are given the same reference numerals, and duplicate description is omitted. That is, description of the components in FIG. 10 that bear the same reference numerals as in FIG. 2 is omitted.

The robot control system 1a includes the robot control device 100a, the robot 10, the network 30, the storage device 40, and the imaging device 50.
The motor control means 13, the rotation sensors 14-1 and 14-2, and the contact sensor 15 provided in the robot 10, the storage device 40, the imaging device 50, and the robot control system 1a are each connected to the network 30.
The robot control device 100a includes an image acquisition unit 101, a virtual space image generation unit 102, a model acquisition unit 103, a rotation status acquisition unit 104, a target position acquisition unit 106, a current position acquisition unit 105, a control generation unit 107a, a control output unit 108a, a reference path acquisition unit 120, a reward calculation unit 121, a model update unit 122, a model output unit 123, and a contact signal acquisition unit 124. In addition to the above configuration, the robot control device 100a may include a control correction unit 111a and a control interpolation unit 112a.

Note that the functions of the image acquisition unit 101, the virtual space image generation unit 102, the model acquisition unit 103, the rotation status acquisition unit 104, the target position acquisition unit 106, the current position acquisition unit 105, the control generation unit 107a, the control output unit 108a, the reference path acquisition unit 120, the reward calculation unit 121, the model update unit 122, the model output unit 123, the contact signal acquisition unit 124, the control correction unit 111a, and the control interpolation unit 112a in the robot control device 100a according to Embodiment 2 may be realized by the processor 201 and the memory 202 in the hardware configuration shown as an example in FIGS. 2A and 2B of Embodiment 1, or may be realized by the processing circuit 203.

The reference path acquisition unit 120 acquires reference path information indicating a reference path. Specifically, for example, the reference path acquisition unit 120 acquires the reference path information by reading, from the model information acquired by the model acquisition unit 103, the reference path information that the robot control learning device 300 used when generating the model information.

The reward calculation unit 121 calculates a reward using an arithmetic expression that includes a term for calculating the reward by evaluating, based on the current position information acquired by the current position acquisition unit 105, the target position information acquired by the target position acquisition unit 106, and the reference path information acquired by the reference path acquisition unit 120, that the arm 11 is moving on the basis of the reference path.

In addition to the term for calculating a reward by evaluating that the arm 11 is moving based on the reference path, the arithmetic expression used by the reward calculation unit 121 may include a term for calculating a reward by evaluating the continuity of movement of the current position of the arm 11 indicated by the current position information acquired by the current position acquisition unit 105.
The arithmetic expression used by the reward calculation unit 121 may also include, in addition to the term for calculating a reward by evaluating that the arm 11 is moving based on the reference path, a term for calculating a reward by evaluating whether the arm 11 has come into contact with an obstacle in the work environment 20. The reward calculation unit 121 determines whether the arm 11 has come into contact with an obstacle in the work environment 20 on the basis of, for example, the contact signal that the contact signal acquisition unit 124 acquires from the contact sensor 15.
Specifically, the reward calculation unit 121 calculates the reward on the basis of Expression (1) shown in Embodiment 1.
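The three reward terms described above can be illustrated with a short sketch. Expression (1) is not reproduced in this part of the description, so the functional forms and the weights `w_path`, `w_step`, and `w_contact` below are illustrative assumptions, as are the function names and the choice of a 2-D polyline for the reference path. The sketch is not the expression used in the embodiment.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_to_path(p: Point, path: Sequence[Point]) -> float:
    """Shortest distance from p to a polyline with at least one point."""
    if len(path) == 1:
        return math.hypot(p[0] - path[0][0], p[1] - path[0][1])
    return min(distance_to_segment(p, a, b) for a, b in zip(path, path[1:]))


def reward(
    current: Point,
    previous: Point,
    reference_path: Sequence[Point],
    contact: bool,
    w_path: float = 1.0,      # assumed weight of the reference-path term
    w_step: float = 0.5,      # assumed weight of the continuity term
    w_contact: float = 10.0,  # assumed penalty for touching an obstacle
) -> float:
    """Hypothetical reward: each term is a penalty, so 0.0 is the best value."""
    # Term 1: the arm should stay close to the reference path.
    path_term = -w_path * distance_to_path(current, reference_path)
    # Term 2: large jumps between consecutive positions are discouraged.
    step = math.hypot(current[0] - previous[0], current[1] - previous[1])
    continuity_term = -w_step * step
    # Term 3: contact with an obstacle (from the contact signal) is penalized.
    contact_term = -w_contact if contact else 0.0
    return path_term + continuity_term + contact_term
```

With these assumed weights, a position on the reference path scores higher than one that has drifted away from it, and any contact lowers the reward by a fixed amount.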

The model update unit 122 updates the model information on the basis of the current position information acquired by the current position acquisition unit 105, the target position information acquired by the target position acquisition unit 106, the reference path information acquired by the reference path acquisition unit 120, and the reward calculated by the reward calculation unit 121.
By applying Expression (1) to Expression (2) shown in Embodiment 1, the model update unit 122 updates the correspondence information that associates the current position of the arm 11 of the robot 10, indicated by the current position information acquired by the current position acquisition unit 105, with a control signal indicating control content for moving the arm 11, and thereby updates the model information.
The model output unit 123 outputs the model information updated by the model update unit 122 to the storage device 40 via the network 30 and causes the storage device 40 to store the model information.
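As a rough illustration of updating correspondence information from a reward, the sketch below uses a generic tabular Q-learning update. Expression (2) is not reproduced in this part of the description, so this is a stand-in technique, not the embodiment's update rule. The table layout, the learning rate `alpha`, the discount factor `gamma`, and all function names are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

# Hypothetical correspondence information: (arm position, control) -> value.
QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def update_value(
    q: QTable,
    position: Hashable,
    control: Hashable,
    reward: float,
    next_position: Hashable,
    controls: Sequence[Hashable],
    alpha: float = 0.1,  # assumed learning rate
    gamma: float = 0.9,  # assumed discount factor
) -> float:
    """Apply one standard Q-learning update and return the new value."""
    # Best value reachable from the next position (0.0 if no controls given).
    best_next = max((q[(next_position, c)] for c in controls), default=0.0)
    td_target = reward + gamma * best_next
    q[(position, control)] += alpha * (td_target - q[(position, control)])
    return q[(position, control)]


def greedy_control(q: QTable, position: Hashable, controls: Sequence[Hashable]) -> Hashable:
    """Pick the control with the highest stored value for this position."""
    return max(controls, key=lambda c: q[(position, c)])


# Usage: a control that earned a negative reward becomes less preferred.
q_table: QTable = defaultdict(float)
update_value(q_table, "p0", "+x", -1.0, "p1", ["+x", "-x"])
preferred = greedy_control(q_table, "p0", ["+x", "-x"])
```

In this sketch the reward from the reward calculation step is the only learning signal, which mirrors the description's statement that the model is updated on the basis of the calculated reward.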

The control generation unit 107a generates a control signal indicating control content for moving the arm 11 toward the target position indicated by the target position information, on the basis of the model information acquired by the model acquisition unit 103, the current position information acquired by the current position acquisition unit 105, and the target position information acquired by the target position acquisition unit 106. The control generation unit 107a is the same as the control generation unit 107 described in Embodiment 1, except that it may generate the control signal on the basis of the model information updated by the model update unit 122 instead of the model information acquired by the model acquisition unit 103. A detailed description is therefore omitted.

The control correction unit 111a corrects the first control signal generated by the control generation unit 107a so that the control content indicated by the first control signal changes by an amount within a predetermined range relative to the control content indicated by the second control signal that the control generation unit 107a generated immediately before.
When part or all of the control content indicated by the first control signal generated by the control generation unit 107a is missing, the control interpolation unit 112a corrects the first control signal by interpolating the missing control content on the basis of the control content indicated by the second control signal that the control generation unit 107a generated immediately before.
Note that the operations of the control correction unit 111a and the control interpolation unit 112a are the same as those of the control correction unit 111 and the control interpolation unit 112 described in Embodiment 1, and a detailed description is therefore omitted.
The model update unit 122 may also update the model information using the control signal corrected by the control correction unit 111a or the control interpolation unit 112a.
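The correction and interpolation described above can be sketched as two small functions. The representation is an assumption made for illustration: a control signal is a list of per-joint values, a missing component is represented by `None`, and the predetermined range is a single bound `max_delta`. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def interpolate_missing(
    first: Sequence[Optional[float]], second: Sequence[float]
) -> List[float]:
    """Fill missing components of the first signal from the previous signal."""
    return [s if f is None else f for f, s in zip(first, second)]


def limit_change(
    first: Sequence[float], second: Sequence[float], max_delta: float
) -> List[float]:
    """Clamp each component so it differs from the previous one by at most max_delta."""
    return [s + max(-max_delta, min(max_delta, f - s)) for f, s in zip(first, second)]


# Usage: interpolate first so that no None reaches the arithmetic in limit_change.
previous_signal = [0.0, 0.3]
new_signal = [1.0, None]
filled = interpolate_missing(new_signal, previous_signal)
corrected = limit_change(filled, previous_signal, max_delta=0.5)
```

The description allows the two operations to run in either order. This sketch interpolates first only because its `None` representation would otherwise break the subtraction in `limit_change`.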

The control output unit 108a outputs the control signal generated by the control generation unit 107a, or the control signal corrected by the control correction unit 111a or the control interpolation unit 112a, to the motor control means 13 provided in the robot 10.

The operation of the robot control device 100a according to Embodiment 2 will be described with reference to FIG. 11.
FIG. 11 is a flowchart illustrating an example of the processing of the robot control device 100a according to Embodiment 2.
The robot control device 100a repeatedly executes the processing of the flowchart, for example, each time a new target position is set.

First, in step ST1101, the image acquisition unit 101 acquires image information.
Next, in step ST1102, the model acquisition unit 103 acquires model information.
Next, in step ST1103, the rotation status acquisition unit 104 acquires a rotation status signal.
Next, in step ST1104, the current position acquisition unit 105 acquires current position information indicating the current position of the arm 11 of the robot 10.
Next, in step ST1105, the virtual space image generation unit 102 generates virtual space image information.
Next, in step ST1106, the target position acquisition unit 106 acquires target position information.
Next, in step ST1107, the control generation unit 107a identifies, from among the correspondence information included in the model information, the correspondence information corresponding to the target position indicated by the target position information.

Next, in step ST1108, the control generation unit 107a determines whether the current position of the arm 11 of the robot 10, indicated by the current position information acquired by the current position acquisition unit 105, is the same as the target position indicated by the target position information. Here, "the same" is not limited to an exact match and includes substantially the same.

If the control generation unit 107a determines in step ST1108 that the current position of the arm 11 and the target position are not the same, the reward calculation unit 121 calculates a reward in step ST1111.
Next, in step ST1112, the model update unit 122 updates the model information by updating the correspondence information identified by the control generation unit 107a.
Next, in step ST1113, the control generation unit 107a refers to the correspondence information updated by the model update unit 122 and identifies the control signal corresponding to the current position of the arm 11 of the robot 10 indicated by the current position information acquired by the current position acquisition unit 105, thereby generating a control signal indicating control content for moving the arm 11.
Next, in step ST1114, the control correction unit 111a corrects the first control signal generated by the control generation unit 107a so that the control content indicated by the first control signal changes by an amount within a predetermined range relative to the control content indicated by the second control signal that the control generation unit 107a generated immediately before.

Next, in step ST1115, when part or all of the control content indicated by the first control signal generated by the control generation unit 107a is missing, the control interpolation unit 112a corrects the first control signal by interpolating the missing control content on the basis of the control content indicated by the second control signal that the control generation unit 107a generated immediately before.
Next, in step ST1116, the control output unit 108a outputs the control signal generated by the control generation unit 107a, or the control signal corrected by the control correction unit 111a or the control interpolation unit 112a, to the motor control means 13 provided in the robot 10.
Next, in step ST1117, the rotation status acquisition unit 104 acquires a rotation status signal.
Next, in step ST1118, the current position acquisition unit 105 acquires current position information indicating the current position of the arm 11 of the robot 10.
Next, in step ST1119, the virtual space image generation unit 102 generates virtual space image information.

After executing the processing of step ST1119, the robot control device 100a returns to step ST1108 and repeats the processing from step ST1108 to step ST1119 until the control generation unit 107a determines in step ST1108 that the current position of the arm 11 and the target position are the same.
If the control generation unit 107a determines in step ST1108 that the current position of the arm 11 and the target position are the same, the model output unit 123 outputs the model information updated by the model update unit 122 in step ST1121.
After executing the processing of step ST1121, the robot control device 100a ends the processing of the flowchart.
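The loop from step ST1108 to step ST1119 can be summarized in a short sketch. The callables stand in for the units described above and are hypothetical, as are the tolerance `tol` used for the "substantially the same" test and the `max_steps` guard, which the flowchart does not mention.

```python
import math
from typing import Callable, Tuple

Position = Tuple[float, ...]


def is_same(a: Position, b: Position, tol: float) -> bool:
    """'The same' includes 'substantially the same': compare within a tolerance."""
    return math.dist(a, b) <= tol


def control_loop(
    position: Position,
    target: Position,
    compute_reward: Callable[[Position], float],
    update_model: Callable[[Position, float], None],
    generate_control: Callable[[Position], float],
    send_to_motor: Callable[[Position, float], Position],
    tol: float = 1e-3,      # assumed tolerance
    max_steps: int = 1000,  # assumed safety limit, not part of the flowchart
) -> Tuple[Position, int]:
    """Repeat reward, model update, and control output until the target is reached."""
    steps = 0
    while not is_same(position, target, tol):        # ST1108
        if steps >= max_steps:
            break
        reward = compute_reward(position)            # ST1111
        update_model(position, reward)               # ST1112
        control = generate_control(position)         # ST1113
        # ST1116 to ST1118: output the control, then read back the new position.
        position = send_to_motor(position, control)
        steps += 1
    return position, steps                           # model output (ST1121) follows
```

A one-dimensional toy run, with a fixed step of 1.0 per iteration, reaches a target three units away in three iterations and calls the model update once per iteration.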

Note that steps ST1114, ST1115, and ST1119 are not essential processing in the robot control device 100a. The processing of steps ST1101 to ST1103 may be executed in any order as long as it is executed before the processing of step ST1104. The processing of steps ST1114 and ST1115 may also be executed in the reverse order.

As described above, the robot control device 100a includes: the current position acquisition unit 105 that acquires current position information indicating the current position of the arm 11 of the robot 10; the target position acquisition unit 106 that acquires target position information indicating the target position of the arm 11; and the control generation unit 107 that generates a control signal indicating control content for moving the arm 11 toward the target position indicated by the target position information, on the basis of (a) model information indicating a model trained using an arithmetic expression for calculating a reward, the arithmetic expression including a term for calculating a reward by evaluating, with reference to reference path information indicating a reference path, that the arm 11 is moving based on the reference path, (b) the current position information acquired by the current position acquisition unit 105, and (c) the target position information acquired by the target position acquisition unit 106.
In addition, the robot control device 100a includes: the reference path acquisition unit 120 that acquires the reference path information indicating the reference path; the reward calculation unit 121 that calculates a reward on the basis of the current position information acquired by the current position acquisition unit 105, the target position information acquired by the target position acquisition unit 106, and the reference path information acquired by the reference path acquisition unit 120, using an arithmetic expression for calculating a reward that includes a term for calculating a reward by evaluating that the arm 11 is moving based on the reference path; and the model update unit 122 that updates the model information by evaluating the value of moving the arm 11 with the control signal, on the basis of the current position information acquired by the current position acquisition unit 105, the target position information acquired by the target position acquisition unit 106, the reference path information acquired by the reference path acquisition unit 120, and the reward calculated by the reward calculation unit 121.

With this configuration, by evaluating, with reference to the reference path information indicating the reference path, that the arm 11 is moving based on the reference path, the robot control device 100a can update the model information generated by the robot control learning device 300 in a short time with a small amount of computation, while controlling the robot 10 with higher accuracy so that the arm 11 of the robot 10 does not perform substantially discontinuous motion.

Within the scope of the invention, the embodiments may be freely combined, any component of each embodiment may be modified, and any component may be omitted from each embodiment.

The robot control device according to the present invention can be applied to a robot control system. The robot control learning device can be applied to a robot control learning system.

1, 1a: robot control system; 3: robot control learning system; 10: robot; 11: arm; 11-1, 11-2: joint; 11-3: tip; 12-1, 12-2: motor; 13: motor control means; 14-1, 14-2: rotation sensor; 15: contact sensor; 20: work environment; 30: network; 40: storage device; 50: imaging device; 100, 100a: robot control device; 300: robot control learning device; 101, 301: image acquisition unit; 102, 302: virtual space image generation unit; 103: model acquisition unit; 104, 304: rotation status acquisition unit; 105, 305: current position acquisition unit; 106, 306: target position acquisition unit; 107, 107a, 307: control generation unit; 108, 108a, 308: control output unit; 111, 111a, 311: control correction unit; 112, 112a, 312: control interpolation unit; 120, 320: reference path acquisition unit; 121, 321: reward calculation unit; 122: model update unit; 123, 323: model output unit; 124, 324: contact signal acquisition unit; 322: model generation unit; 201: processor; 202: memory; 203: processing circuit.

Claims

A robot control device comprising:
a current position acquisition unit that acquires current position information indicating a current position of an arm of a robot;
a target position acquisition unit that acquires target position information indicating a target position of the arm; and
a control generation unit that generates a control signal indicating control content for moving the arm toward the target position indicated by the target position information, on the basis of model information indicating a model trained using an arithmetic expression for calculating a reward, the arithmetic expression including a term for calculating a reward by evaluating, with reference to reference path information indicating a reference path, that the arm is moving based on the reference path, the current position information acquired by the current position acquisition unit, and the target position information acquired by the target position acquisition unit,
wherein the arithmetic expression includes, as the term for calculating a reward by evaluating that the arm is moving based on the reference path, a term for calculating a reward by evaluating a distance between a position of the arm of the robot and the reference path.

The robot control device according to claim 1, wherein the arithmetic expression includes, as the term for calculating a reward by evaluating that the arm is moving based on the reference path, in addition to the term for calculating a reward by evaluating the distance between the position of the arm of the robot and the reference path, a term for calculating a reward by evaluating a distance moved along the reference path in the direction of the target position.

The robot control device according to claim 1 or claim 2, wherein the arithmetic expression includes, in addition to the term for calculating a reward by evaluating that the arm is moving based on the reference path, a term for calculating a reward by evaluating continuity of movement of the current position of the arm indicated by the current position information acquired by the current position acquisition unit.

The robot control device according to claim 1 or claim 2, wherein the arithmetic expression includes, in addition to the term for calculating a reward by evaluating that the arm is moving based on the reference path, a term for calculating a reward by evaluating whether a part of the arm has come into contact with an obstacle.

The robot control device according to claim 1 or claim 2, wherein the reference path information is automatically generated on the basis of predetermined calculation processing.

The robot control device according to claim 1 or claim 2, wherein the reference path information is generated on the basis of movement history information indicating a path along which the arm has moved in the past.

The robot control device according to claim 1 or claim 2, further comprising a control correction unit that corrects a first control signal generated by the control generation unit so that the control content indicated by the first control signal changes by an amount within a predetermined range relative to the control content indicated by a second control signal generated immediately before by the control generation unit.

The robot control device according to claim 1 or claim 2, further comprising a control interpolation unit that, when part or all of the control content indicated by a first control signal generated by the control generation unit is missing, corrects the first control signal by interpolating the missing control content on the basis of the control content indicated by a second control signal generated immediately before by the control generation unit, so that the change from the control content indicated by the second control signal is within a predetermined range.

The robot control device according to claim 1 or claim 2, further comprising:
a reference path acquisition unit that acquires the reference path information indicating the reference path;
a reward calculation unit that calculates a reward on the basis of the current position information acquired by the current position acquisition unit, the target position information acquired by the target position acquisition unit, and the reference path information acquired by the reference path acquisition unit, using an arithmetic expression for calculating a reward that includes a term for calculating a reward by evaluating that the arm is moving based on the reference path; and
a model update unit that updates the model information by evaluating a value of moving the arm with the control signal, on the basis of the current position information acquired by the current position acquisition unit, the target position information acquired by the target position acquisition unit, the reference path information acquired by the reference path acquisition unit, and the reward calculated by the reward calculation unit.

A robot control learning device comprising:
a current position acquisition unit that acquires current position information indicating a current position of an arm of a robot;
a target position acquisition unit that acquires target position information indicating a target position of the arm;
a reference path acquisition unit that acquires reference path information indicating a reference path;
a reward calculation unit that calculates a reward on the basis of the current position information acquired by the current position acquisition unit, the target position information acquired by the target position acquisition unit, and the reference path information acquired by the reference path acquisition unit, using an arithmetic expression for calculating a reward that includes a term for calculating a reward by evaluating that the arm is moving based on the reference path;
a control generation unit that generates a control signal indicating control content for moving the arm toward the target position indicated by the target position information; and
a model generation unit that generates model information by evaluating a value of moving the arm with the control signal, on the basis of the current position information acquired by the current position acquisition unit, the target position information acquired by the target position acquisition unit, the reference path information acquired by the reference path acquisition unit, and the reward calculated by the reward calculation unit,
wherein the arithmetic expression includes, as the term for calculating a reward by evaluating that the arm is moving based on the reference path, a term for calculating a reward by evaluating a distance between a position of the arm of the robot and the reference path.

The robot control learning device according to claim 10, wherein the arithmetic expression includes, as the term for calculating a reward by evaluating that the arm is moving based on the reference path, in addition to the term for calculating a reward by evaluating the distance between the position of the arm of the robot and the reference path, a term for calculating a reward by evaluating a distance moved along the reference path in the direction of the target position.

The robot control learning device according to claim 10 or claim 11, wherein the arithmetic expression includes, in addition to the term for calculating a reward by evaluating that the arm is moving based on the reference path, a term for calculating a reward by evaluating continuity of movement of the current position of the arm indicated by the current position information acquired by the current position acquisition unit.

The robot control learning device according to claim 10 or claim 11, wherein the arithmetic expression includes, in addition to the term for calculating a reward by evaluating that the arm is moving based on the reference path, a term for calculating a reward by evaluating whether a part of the arm has come into contact with an obstacle.

The robot control learning device according to claim 10 or claim 11, wherein the reference path information is automatically generated on the basis of predetermined calculation processing.

The robot control learning device according to claim 10 or claim 11, wherein the reference path information is generated on the basis of movement history information indicating a path along which the arm has moved in the past.

The robot control learning device according to claim 10 or claim 11, further comprising a control correction unit that corrects a first control signal generated by the control generation unit so that the control content indicated by the first control signal changes by an amount within a predetermined range relative to the control content indicated by a second control signal generated immediately before by the control generation unit.

The robot control learning device according to claim 10 or claim 11, further comprising a control interpolation unit that, when part or all of the control content indicated by a first control signal generated by the control generation unit is missing, corrects the first control signal by interpolating the missing control content on the basis of the control content indicated by a second control signal generated immediately before by the control generation unit, so that the change from the control content indicated by the second control signal is within a predetermined range.

A robot control method comprising:
acquiring, by a current position acquisition unit, current position information indicating a current position of an arm of a robot;
acquiring, by a target position acquisition unit, target position information indicating a target position of the arm; and
generating, by a control generation unit, a control signal indicating control content for moving the arm toward the target position indicated by the target position information, on the basis of model information indicating a model trained using an arithmetic expression for calculating a reward, the arithmetic expression including a term for calculating a reward by evaluating, with reference to reference path information indicating a reference path, that the arm is moving based on the reference path, the current position information acquired by the current position acquisition unit, and the target position information acquired by the target position acquisition unit,
wherein the arithmetic expression includes, as the term for calculating a reward by evaluating that the arm is moving based on the reference path, a term for calculating a reward by evaluating a distance between a position of the arm of the robot and the reference path.

The robot control method according to claim 18, wherein the arithmetic expression includes, as the term for calculating a reward by evaluating that the arm is moving based on the reference path, in addition to the term for calculating a reward by evaluating the distance between the position of the arm of the robot and the reference path, a term for calculating a reward by evaluating a distance moved along the reference path in the direction of the target position.