JP7179672B2

JP7179672B2 - Computer system and machine learning method

Info

Publication number: JP7179672B2
Application number: JP2019082488A
Authority: JP
Inventors: 崇弘三木; 亮坂井
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2019-04-24
Filing date: 2019-04-24
Publication date: 2022-11-29
Anticipated expiration: 2039-04-24
Also published as: JP2020179438A

Description

The present invention relates to a machine learning technique for generating a control model for a device that grips objects.

As machine learning technology has advanced, algorithms (models) for controlling automobiles, robots, and similar systems have been developed. For example, the technique described in Patent Document 1 is known as a technique for developing an algorithm that controls a device that grips objects (a gripping device).

Patent Document 1 describes a device in which "based on information in DB 1 on the success or failure of past pick-ups, a measurement feature identification unit 52 estimates how easy it is to grip a part of a workpiece measured by a sensor 2. A gripping operation calculation unit 53 and an opening/closing operation calculation unit 54 deprioritize workpieces that appear hard to grip, but when no other workpiece appears grippable, they adjust at least one of the opening/closing amount, operating speed, and gripping force of the hand to perform a more careful pick-up operation." Patent Document 1 also describes estimating how easy a workpiece is to grip by using a classifier such as a neural network.

In the prior art, a classifier is generated from training data on objects of known shapes. Such a classifier can therefore predict whether an object of known shape can be gripped, but not whether an object of unknown shape can be gripped.

Patent Document 1: JP 2013-52490 A

Non-Patent Document 1: Hiroharu Kato, Yoshitaka Ushiku, Tatsuya Harada, "Neural 3D Mesh Renderer", [online], November 20, 2017, [retrieved March 25, 2019], Internet <https://arxiv.org/abs/1711.07566>

Improving the accuracy of a classifier requires training it on learning data for objects of various shapes.

The present invention provides a technique for automatically generating learning data for objects of various shapes and using that data to improve classifier accuracy efficiently.

A representative example of the invention disclosed in the present application is as follows. It is a computer system that learns a control algorithm used by a gripping device to grip an object. The computer system comprises at least one computer having an arithmetic device and a storage device connected to the arithmetic device. The at least one computer executes a first learning process and a second learning process. The first learning process generates a first model that outputs the gripping success probability of an object, using learning data that includes object data containing information on the shape of the object, a gripping point that serves as the reference for the position at which the object is gripped, and information indicating whether gripping the object succeeded. The second learning process generates a second model that outputs the object data of an object obtained by deforming the shape of a reference object. In the second learning process, the at least one computer generates, based on the second model, the object data of a first object obtained by deforming the shape of the reference object; determines gripping points of the first object; inputs a plurality of pairs, each consisting of the object data of the first object and a gripping point of the first object, into the first model; and updates the second model based on the difference between a target probability and the maximum of the gripping success probabilities of the first object output by the first model.

According to the present invention, learning data for objects of various shapes is generated automatically in step with the learning state of the classifier (the first model), and a learning process is executed on that data, so that the accuracy of the classifier can be improved efficiently. Problems, configurations, and effects other than those described above will become clear from the following description of the examples.

FIG. 1 is a diagram illustrating a configuration example of the computer system of Example 1.
FIG. 2 is a diagram illustrating an example of a model to be learned in Example 1.
FIG. 3 is a diagram illustrating the hardware configuration of the computer of Example 1.
FIG. 4 is a diagram illustrating an example of a setting screen presented by the computer system of Example 1.
FIG. 5 is a flowchart illustrating an example of processing executed by the first learning unit of Example 1.
FIG. 6 is a flowchart illustrating an example of the first learning process executed by the first learning unit of Example 1.
FIG. 7 is a flowchart illustrating an example of processing executed by the second learning unit of Example 1.
FIG. 8 is a flowchart illustrating an example of the second learning process executed by the second learning unit of Example 1.
FIG. 9 is a flowchart illustrating an example of control of the gripping device of Example 1.
FIG. 10 is a flowchart illustrating an example of processing executed by the second learning unit of Example 2.
FIG. 11 is a flowchart illustrating an example of the second learning process executed by the second learning unit of Example 2.
FIG. 12 is a diagram illustrating a configuration example of the computer system of Example 3.
FIG. 13 is a flowchart illustrating an example of processing executed by the first learning unit of Example 3.
FIG. 14 is a flowchart illustrating an example of processing executed by the second learning unit of Example 3.

First, an overview of the present invention is given. The present invention applies GANs (Generative Adversarial Networks) to learn a control algorithm (model) for a gripping device. A GAN trains two models: a generator that generates images, and a discriminator that determines whether a given image was produced by the generator.

The computer system of the present invention trains two models: a probability model that calculates the probability that an object of arbitrary shape can be gripped at an arbitrary gripping point (the gripping success probability), and an object data generation model that generates object data containing information on the shape of an object of arbitrary shape.

In training the probability model, object data generated by the object data generation model is used, and the probability model is trained so that the maximum gripping success probability for an object of arbitrary shape exceeds a threshold (the target probability). This allows a probability model for gripping objects of various shapes to be learned efficiently.

In training the object data generation model, the model is trained so that the difference between the target probability and the maximum gripping success probability becomes small. In other words, the object data generation model is updated to keep pace with the training progress of the probability model. This makes it possible to efficiently generate object data whose gripping success probability is close to the target probability, which in turn makes training of the probability model more efficient.
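The update criterion for the object data generation model can be made concrete as a small loss function. This is a minimal Python sketch, not the patent's implementation: the name `generator_loss` is hypothetical, and measuring the "difference" as a squared error is an assumption (an absolute difference would fit the text equally well).

```python
from typing import Sequence


def generator_loss(grip_probs: Sequence[float], target_prob: float) -> float:
    """Hypothetical loss for the object data generation model.

    grip_probs: gripping success probabilities output by the probability
        model for several gripping points on one generated object.
    target_prob: the target probability chosen by the user.
    Assumption: the difference is measured as a squared error.
    """
    if not grip_probs:
        raise ValueError("at least one gripping point is required")
    p_max = max(grip_probs)  # probability at the best gripping point
    return (p_max - target_prob) ** 2
```

For example, `generator_loss([0.2, 0.9, 0.5], 0.6)` is approximately 0.09. Because the loss penalizes objects that are too easy (maximum probability above the target) as much as objects that are too hard, minimizing it steers generation toward shapes near the current limit of the probability model.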

Examples of the present invention are described below with reference to the drawings. However, the present invention should not be construed as limited to the content of the examples described below. Those skilled in the art will readily understand that the specific configuration can be changed without departing from the idea or spirit of the present invention.

In the configurations of the invention described below, identical or similar components and functions are given the same reference numerals, and redundant descriptions are omitted.

Designations such as "first", "second", and "third" in this specification are used to identify components and do not necessarily limit their number or order.

To make the invention easier to understand, the position, size, shape, range, and the like of each component shown in the drawings may not represent the actual position, size, shape, range, and the like. The present invention is therefore not limited to the positions, sizes, shapes, ranges, and the like disclosed in the drawings.

FIG. 1 is a diagram illustrating a configuration example of the computer system of Example 1. FIG. 2 is a diagram illustrating an example of a model to be learned in Example 1. FIG. 3 is a diagram illustrating the hardware configuration of a computer 100 of Example 1.

First, the model learned by the computer system of Example 1 is described with reference to FIG. 2.

A gripping device 200 grips an object 210 in a work space such as a warehouse or a factory. The gripping device 200 consists of a control device 201 and an arm 202. The gripping device 200 may also have sensors such as a camera, and a moving mechanism such as tires and motors.

The control device 201 controls the gripping device 200 as a whole and includes a controller, a drive device, and the like (not shown). The arm 202 grips an object under the control of the control device 201 and may have a force sensor, a tactile sensor, and the like. The present invention is not limited to a particular form of the arm 202; any form capable of gripping the object 210 may be used.

An overview of the processing performed by the gripping device 200 during a gripping operation is given here.

The control device 201 acquires an image of the space containing the object 210 from a sensor or the like. Based on the probability model, the control device 201 selects a gripping point 220 on the object 210. The gripping point 220 is, for example, a set of coordinates that serves as the reference for the position at which the arm 202 grips the object 210.
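A minimal sketch of the selection step, assuming the control device simply picks the candidate with the highest predicted gripping success probability (the text only says selection is based on the probability model). The names `select_gripping_point` and `prob_model` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]


def select_gripping_point(object_data: object,
                          candidates: Sequence[Point],
                          prob_model: Callable[[object, Point], float]) -> Point:
    """Return the candidate gripping point with the highest predicted
    gripping success probability (assumed selection rule)."""
    if not candidates:
        raise ValueError("no candidate gripping points")
    # Score every candidate with the probability model and keep the best one.
    return max(candidates, key=lambda p: prob_model(object_data, p))
```

For example, with a hypothetical toy model `toy = lambda obj, p: 1.0 / (1.0 + sum(c * c for c in p))`, which favors points near the origin, `select_gripping_point(None, [(1.0, 0.0, 0.0), (0.1, 0.0, 0.0)], toy)` returns `(0.1, 0.0, 0.0)`.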

The control device 201 generates a trajectory plan to the gripping point 220. The trajectory plan is information on the movement trajectories of the arm 202 and the gripping device 200 for moving the arm 202 to the gripping point 220.
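As an illustration only, the simplest possible trajectory plan is a sequence of evenly spaced straight-line waypoints from the arm's current position to the gripping point. The function below is a hypothetical sketch; a real planner would also account for joint limits, obstacles, and movement of the gripping device's base, none of which the text specifies.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def linear_trajectory(start: Point, goal: Point, steps: int) -> List[Point]:
    """Straight-line waypoints from start to goal, both endpoints included.

    Hypothetical stand-in for a trajectory plan: ignores joint limits,
    obstacles, and motion of the gripping device's base.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    # Waypoint i lies a fraction i/steps of the way from start to goal.
    return [tuple(s + (g - s) * i / steps for s, g in zip(start, goal))
            for i in range(steps + 1)]
```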

Based on the trajectory plan, the control device 201 generates and outputs control information for controlling the operation of the gripping device 200. Following the control information, the gripping device 200 moves so that the arm 202 grips the object 210 at the gripping point 220.

The computer system of Example 1 executes learning processing to generate the probability model, which is the control algorithm the gripping device 200 uses to grip an object. The description now returns to FIG. 1.

The computer system consists of multiple computers 100-1 and 100-2, which are connected to each other either directly or via a network.

The computer 100-1 executes learning processing to generate the probability model, which calculates the probability that an object 210 of arbitrary shape can be gripped at an arbitrary gripping point 220 (the gripping success probability).

The probability model calculates the gripping success probability from input data consisting of object data, which contains information on the shape of the object 210, and a gripping point 220. The computer 100-1 may input the object data itself into the probability model, or may input feature values of the object data extracted with a VAE (Variational AutoEncoder) or similar method.
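To make the model's input/output contract concrete, here is a toy stand-in for the probability model: logistic regression over the concatenation of an object feature vector (raw object data or, for example, VAE features) and the gripping point coordinates. Logistic regression is swapped in for brevity; the text allows a neural network, and all names here are hypothetical.

```python
import math
from typing import Sequence


def grip_probability(features: Sequence[float], point: Sequence[float],
                     weights: Sequence[float], bias: float) -> float:
    """Toy probability model: sigmoid(w . [features, point] + b).

    features: object data or a feature vector extracted from it.
    point: gripping point coordinates.
    """
    x = list(features) + list(point)  # model input = object features + gripping point
    if len(x) != len(weights):
        raise ValueError("weights must match len(features) + len(point)")
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # gripping success probability in (0, 1)
```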

The computer 100-2 executes learning processing to generate the object data generation model, which generates learning object data 170. Reference object data 101, used to generate the learning object data 170, is input to the computer 100-2. The reference object data 101 describes the object that serves as the reference. The reference object data 101 and the learning object data 170 are, for example, image data generated by rendering polygon meshes.

The object data generation model receives the reference object data 101 as input and generates the learning object data 170.
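One common way to derive a new shape from a reference polygon mesh is to move its vertices while keeping its faces. The sketch below assumes the object data generation model outputs one 3-D offset per vertex; that output format, the function name, and the omission of the rendering step are all assumptions, not details given in the text.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def deform_mesh(ref_vertices: Sequence[Vertex],
                offsets: Sequence[Vertex]) -> List[Vertex]:
    """Displace each reference vertex by a per-vertex offset.

    Assumption: the generation model outputs one 3-D offset per vertex,
    and the faces (connectivity) of the reference mesh are reused unchanged.
    """
    if len(ref_vertices) != len(offsets):
        raise ValueError("one offset per vertex is required")
    return [tuple(v + d for v, d in zip(vert, off))
            for vert, off in zip(ref_vertices, offsets)]
```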

The hardware configuration of the computers 100-1 and 100-2 is described here with reference to FIG. 3, which shows the hardware configuration of the computer 100-1. The computer 100-2 has the same hardware configuration as the computer 100-1.

The computer 100-1 has a processor 300, a main storage device 301, a secondary storage device 302, a network interface 303, and an IO interface 304. These hardware components are connected to each other via an internal bus.

The hardware configuration of the computer 100-1 shown in FIG. 3 is an example and is not limiting. For example, the computer 100-1 may have hardware not shown, or may lack some hardware such as the secondary storage device 302 and the IO interface 304.

The processor 300 executes programs stored in the main storage device 301. By executing processing in accordance with a program, the processor 300 operates as a functional unit (module) that implements a specific function. In the following description, when a functional unit is the subject of a sentence describing processing, this means that the processor 300 is executing the program that implements that module.

The main storage device 301 is a storage device such as DRAM (Dynamic Random Access Memory) and stores the programs executed by the processor 300 and the information those programs use. It also serves as a work area for the programs. The programs and information stored in the main storage devices 301 of the computers 100-1 and 100-2 are described later.

The programs and information may instead be stored in a device or apparatus other than the main storage device 301, such as the secondary storage device 302. In that case, the processor 300 reads the programs and information from that device or apparatus and loads them into the main storage device 301.

The secondary storage device 302 is a storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive) and stores data persistently.

The network interface 303 is an interface for communicating with external devices, such as a terminal 320, via a network 310.

The IO interface 304 is an interface for connecting to an input device 330, an output device 331, and the like. The input device 330 is, for example, a keyboard, a mouse, or a touch panel, and the output device 331 is, for example, a display or a printer.

Returning to FIG. 1, the programs and information stored in the main storage devices 301 of the computers 100-1 and 100-2 are described next.

The main storage device 301 of the computer 100-1 stores a program that implements a first learning unit 110, as well as probability model management information 130.

The probability model management information 130 is information for managing the probability model. It includes at least the parameters that define the probability model and may also include information on the model's structure. For example, if the probability model is a neural network, the probability model management information 130 may include information on its layer structure.

The first learning unit 110 executes learning processing (the first learning process) to generate the probability model. The first learning unit 110 includes a gripping probability calculation unit 111, a parameter update unit 112, and a simulation unit 113.

The gripping probability calculation unit 111 calculates the gripping success probability of an object at a gripping point, based on a pair of the gripping point and object data and on the probability model defined by the probability model management information 130. The parameter update unit 112 updates the parameters that define the probability model; that is, it updates the probability model.
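The text does not state the loss used by the parameter update unit 112. Assuming a single logistic output trained with binary cross-entropy against the simulated comparison probability (the probability the simulation unit computes in step S102), one update step might look like this. The function name and the learning rate default are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def update_parameters(weights: Sequence[float], bias: float,
                      x: Sequence[float], comparison_prob: float,
                      lr: float = 0.1) -> Tuple[List[float], float]:
    """One SGD step for a logistic probability model (assumed setup).

    x: model input (object features followed by gripping point coordinates).
    comparison_prob: gripping success probability obtained by simulation,
        used as a soft target for binary cross-entropy (assumed loss).
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = 1.0 / (1.0 + math.exp(-z))   # predicted gripping success probability
    grad_z = p - comparison_prob      # d(BCE)/dz for a sigmoid output
    new_weights = [w - lr * grad_z * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * grad_z
    return new_weights, new_bias
```

With zero-initialized parameters, input `[1.0, 2.0]`, a comparison probability of 1.0, and `lr=0.5`, the step moves the weights to `[0.25, 0.5]` and the bias to `0.25`, raising the predicted probability toward the simulated one.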

The simulation unit 113 generates learning data 150 and probability data 160 by running simulations using the learning object data 170. The simulation may be, for example, a physics simulation. The learning data 150 includes the learning object data 170 and gripping points (coordinates). The probability data 160 includes gripping points and gripping success probabilities.

The main storage device 301 of the computer 100-2 stores a program that implements a second learning unit 120, as well as object data generation model management information 140.

The object data generation model management information 140 is information for managing the object data generation model. It includes at least the parameters that define the object data generation model and may also include information on the model's structure. For example, if the object data generation model is a neural network, the object data generation model management information 140 may include information on its layer structure.

The second learning unit 120 executes learning processing (the second learning process) to generate the object data generation model. The second learning unit 120 includes an object data generation unit 121, a parameter update unit 122, and a gripping point data generation unit 123.

The object data generation unit 121 generates learning object data 170 based on the reference object data 101 and the object data generation model defined by the object data generation model management information 140. The object data generation unit 121 may generate the learning object data 170 through internal iterative processing that also takes data it has generated itself as input. The parameter update unit 122 updates the parameters that define the object data generation model; that is, it updates the object data generation model.

The gripping point data generation unit 123 determines gripping points of the object corresponding to the learning object data 170 and generates evaluation input data 180 containing the learning object data 170 and those gripping points.
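A minimal sketch of how the evaluation input data 180 might be assembled, assuming candidate gripping points are drawn uniformly, without replacement, from the mesh vertices of the generated object. The sampling rule, the `seed` parameter, and the names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def make_evaluation_inputs(object_data: object, vertices: Sequence[Vertex],
                           k: int, seed: int = 0) -> List[Tuple[object, Vertex]]:
    """Pair one generated object with k candidate gripping points.

    Assumption: candidates are k distinct mesh vertices sampled uniformly.
    """
    if not 1 <= k <= len(vertices):
        raise ValueError("k must be between 1 and the number of vertices")
    rng = random.Random(seed)               # seeded for reproducibility
    points = rng.sample(list(vertices), k)  # distinct vertices, no replacement
    return [(object_data, p) for p in points]
```

Each returned pair is one (object data, gripping point) input for the probability model; the second learning process would evaluate all of them and take the maximum probability.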

The functional units of the computers 100-1 and 100-2 may be reorganized: several functional units may be combined into one, or one functional unit may be split into several units by function. Each functional unit may also be implemented using a virtual machine running on a single computer.

Some functional units of the first learning unit 110 may be included in the second learning unit 120, and some functional units of the second learning unit 120 may be included in the first learning unit 110. For example, the second learning unit 120 may include the simulation unit 113.

In FIG. 1, solid lines indicate data input and output in the first learning process, and dotted lines indicate data input and output in the second learning process. In this specification, executing the first learning process and the second learning process cyclically is referred to as adversarial learning.
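The cyclic execution that this specification calls adversarial learning can be sketched as a loop that alternates the two processes until each reaches its own update limit. This is a hypothetical driver: `first_step` and `second_step` stand in for the first and second learning processes, and the strict one-for-one alternation is an assumption.

```python
from typing import Callable, Tuple


def adversarial_learning(first_step: Callable[[], None],
                         second_step: Callable[[], None],
                         max_first: int,
                         max_second: int) -> Tuple[int, int]:
    """Alternate the first and second learning processes.

    Each process stops once it reaches its own update limit; the loop
    ends when both have. Returns how many times each process ran.
    """
    n_first = n_second = 0
    while n_first < max_first or n_second < max_second:
        if n_first < max_first:
            first_step()   # first learning process: update the probability model
            n_first += 1
        if n_second < max_second:
            second_step()  # second learning process: update the object data generation model
            n_second += 1
    return n_first, n_second
```

For example, with limits of 3 and 1 the call order is first, second, first, first.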

FIG. 4 is a diagram illustrating an example of a setting screen 400 presented by the computer system of Example 1.

The setting screen 400 is used to configure the settings required for the learning processes executed by the first learning unit 110 and the second learning unit 120, and is displayed on the terminal 320 or the output device 331. The setting screen 400 may be presented by either the first learning unit 110 or the second learning unit 120, or by a functional unit other than these two.

The setting screen 400 includes a reference object data field 401, a probability model field 402, an object data generation model field 403, a learning count field 404 (for the probability model), a learning count field 405 (for the object data generation model), a target probability field 406, a setting button 410, and a start button 411.

The reference object data field 401 specifies the reference object data 101 to be input to the second learning unit 120. The probability model field 402 specifies the probability model management information 130 in which the probability model is stored. The object data generation model field 403 specifies the object data generation model management information 140 in which the object data generation model is stored.

For example, a file path is entered in each of the reference object data field 401, the probability model field 402, and the object data generation model field 403.

The learning count field 404 sets the upper limit on the number of updates of the probability model (the upper limit on the number of learning iterations). The learning count field 405 sets the upper limit on the number of updates of the object data generation model (likewise an upper limit on learning iterations). The target probability field 406 sets the target probability.
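The values collected by the setting screen could be held in a small validated record such as the one below. The class name, field names, and example file names are hypothetical, and requiring the target probability to lie strictly between 0 and 1 is an assumption rather than a constraint stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningSettings:
    """Hypothetical container for the values entered on setting screen 400."""
    reference_object_data_path: str     # field 401
    probability_model_path: str         # field 402
    generation_model_path: str          # field 403
    max_probability_model_updates: int  # field 404
    max_generation_model_updates: int   # field 405
    target_probability: float           # field 406

    def __post_init__(self) -> None:
        if self.max_probability_model_updates < 1 or self.max_generation_model_updates < 1:
            raise ValueError("update limits must be positive")
        # Assumption: a target of exactly 0 or 1 would make the criterion degenerate.
        if not 0.0 < self.target_probability < 1.0:
            raise ValueError("target probability must lie strictly between 0 and 1")
```

For example, `LearningSettings("ref.obj", "prob_model.bin", "gen_model.bin", 100, 100, 0.5)` is accepted, while a target probability of 1.5 raises `ValueError`.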

The setting button 410 applies the values entered in each field to the computers 100-1 and 100-2. When the user operates the setting button 410, a setting instruction containing the values entered in each field is sent to the computer system.

At this time, the first learning unit 110 registers an initial probability model in the probability model management information 130 and stores the values set in the learning count field 404 and the target probability field 406 in the work area. The second learning unit 120 registers an initial object data generation model in the object data generation model management information 140 and stores the reference object data 101 and the value set in the target probability field 406 in the work area.

The start button 411 is used to input a learning start instruction to the computer system. The learning start instruction is sent to either the computer 100-1 or the computer 100-2.

FIG. 5 is a flowchart illustrating an example of processing executed by the first learning unit 110 of Example 1.

When the first learning unit 110 receives a learning start instruction, it starts the processing described below. The learning start instruction is sent from the input device 330, the terminal 320, or the second learning unit 120; the input device 330 and the terminal 320 send it via the setting screen 400.

First, the first learning unit 110 acquires learning object data 170 (step S101).

Specifically, the first learning unit 110 sends an instruction to generate learning object data 170 to the second learning unit 120. The object data generation unit 121 of the second learning unit 120 generates the learning object data 170 based on the reference object data 101 and the object data generation model management information 140, and sends it to the first learning unit 110.

Next, the first learning unit 110 uses the acquired learning object data 170 to generate learning data 150 and probability data 160 (step S102). Specifically, the following processing is executed.

The simulation unit 113 places an object corresponding to the learning object data 170 in a virtual space, with its position, orientation, and so on set arbitrarily. The simulation unit 113 determines a gripping point on the object and simulates whether the object can be gripped using that gripping point as the reference. The simulation unit 113 may run the simulation multiple times for the same gripping point.

シミュレーション部１１３は、学習用物体データ１７０及び把持点を含む学習データ１５０を生成し、把持確率算出部１１１に学習データ１５０を送信する。 The simulation unit 113 generates learning data 150 including learning object data 170 and gripping points, and transmits the learning data 150 to the gripping probability calculation unit 111 .

シミュレーション部１１３は、シミュレーションの結果に基づいて、把持点毎の把持成功確率を算出する。以下の説明では、シミュレーション部１１３によって算出される把持成功確率を比較用確率とも記載する。 The simulation unit 113 calculates the gripping success probability for each gripping point based on the simulation result. In the following description, the gripping success probability calculated by the simulation unit 113 is also referred to as a comparative probability.

シミュレーション部１１３は、把持点及び比較用確率を含む確率データ１６０を生成し、パラメタ更新部１１２に確率データ１６０を送信する。以上がステップＳ１０２の処理の説明である。 The simulation unit 113 generates probability data 160 including grip points and comparison probabilities, and transmits the probability data 160 to the parameter update unit 112 . The above is the description of the processing in step S102.
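The sketch below shows one way to derive the comparison probability from repeated simulations. It assumes the comparison probability is the fraction of successful simulated grasps at each gripping point. `simulate_grasp` is a hypothetical stand-in for the simulation unit 113: it draws success from a per-point probability stored on the object, where a real system would call a physics simulator.

```python
import random


def simulate_grasp(obj: dict, grip_point: tuple, rng: random.Random) -> bool:
    """Hypothetical stand-in for one grasp simulation in the virtual space.

    Success is drawn from a per-point probability stored on the object; a real
    implementation would run a physics-based grasp simulation instead.
    """
    return rng.random() < obj["true_success"][grip_point]


def make_learning_and_probability_data(obj: dict, grip_points: list, trials: int, seed: int = 0):
    """Return (learning_data, probability_data) in the spirit of step S102.

    learning_data:    list of (object id, gripping point) pairs
    probability_data: list of (gripping point, comparison probability) pairs,
                      where the comparison probability is the success ratio
                      over `trials` simulations of the same gripping point.
    """
    rng = random.Random(seed)
    learning_data = [(obj["id"], gp) for gp in grip_points]
    probability_data = []
    for gp in grip_points:
        successes = sum(simulate_grasp(obj, gp, rng) for _ in range(trials))
        probability_data.append((gp, successes / trials))
    return learning_data, probability_data


obj = {"id": "cup-001", "true_success": {(0, 0): 1.0, (1, 1): 0.0, (2, 2): 0.5}}
learning, probs = make_learning_and_probability_data(obj, [(0, 0), (1, 1), (2, 2)], trials=200)
print(learning)
print(probs)
```

Running many trials per gripping point reflects the note that the same gripping point may be simulated multiple times. More trials give a less noisy comparison probability.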

Next, the first learning unit 110 executes the first learning process using the learning data 150 and the probability data 160 (step S103). Details of the first learning process are described later with reference to FIG. 6.

Next, the first learning unit 110 determines whether to end the adversarial learning (step S104).

For example, the first learning unit 110 ends the adversarial learning when the number of times the first learning process has been executed exceeds a threshold.

If the adversarial learning is to continue, the first learning unit 110 transmits a learning start instruction to the second learning unit 120 (step S105) and then ends the processing. If the adversarial learning is to end, the first learning unit 110 ends the processing.
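The hand-off between the two units in steps S104 and S105, and the mirror steps S304 and S305 of the second learning unit, can be pictured as a simple alternating loop. This is an illustrative sketch only. The callables stand in for each unit's learning process, and the per-unit execution-count threshold is the example termination rule given in the text.

```python
from typing import Callable


def run_adversarial_learning(first_learning: Callable[[], None],
                             second_learning: Callable[[], None],
                             threshold: int) -> list:
    """Alternate the two learning units (steps S101-S105 and S301-S305).

    Each unit counts how often it has run its own learning process and ends the
    adversarial learning once that count exceeds `threshold` (steps S104/S304);
    otherwise it hands control to the other unit with a learning start instruction.
    """
    counts = {"first": 0, "second": 0}
    steps = {"first": first_learning, "second": second_learning}
    order = []
    turn = "first"
    while True:
        steps[turn]()
        counts[turn] += 1
        order.append(turn)
        if counts[turn] > threshold:
            break  # adversarial learning ends
        turn = "second" if turn == "first" else "first"  # send learning start instruction
    return order


print(run_adversarial_learning(lambda: None, lambda: None, threshold=2))
```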

Alternatively, the simulation unit 113 may generate sets of object data, gripping points, and simulation results as the learning data 150. It may then output them to both the gripping probability calculation unit 111 and the parameter updating unit 112.

FIG. 6 is a flowchart illustrating an example of the first learning process executed by the first learning unit 110 of the first embodiment.

The gripping probability calculation unit 111 inputs the learning data 150 into the probability model to calculate the gripping success probability of the object at each gripping point (step S201). It transmits the gripping success probability of each gripping point to the parameter updating unit 112.

Next, the parameter updating unit 112 updates the probability model based on the gripping success probability and the comparison probability of each gripping point (step S202).

For example, the parameter updating unit 112 updates the parameters that define the probability model (probability model management information 130) so as to reduce the difference between the gripping success probability and the comparison probability at each gripping point.

Next, the parameter updating unit 112 determines whether to end the learning of the probability model (step S203).

For example, the parameter updating unit 112 ends the learning of the probability model when the number of updates of the probability model exceeds the value in the number-of-learnings column 404. Alternatively, the parameter updating unit 112 inputs test input data to the gripping probability calculation unit 111. It ends the learning of the probability model when the maximum gripping success probability output by the gripping probability calculation unit 111 exceeds the target probability.

If learning of the probability model is to continue, the first learning unit 110 returns to step S201 and repeats the same processing. If learning is to end, the first learning unit 110 ends the first learning process.
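The sketch below illustrates steps S201 to S203. It uses a one-layer logistic model as a stand-in for the probability model; the model type, the learning rate, and the feature vectors that describe a gripping point are all assumptions. The loop reduces the mean squared difference between predicted and comparison probabilities by gradient descent. It stops on either of the two example conditions: the update count exceeds the configured number of learnings, or the maximum predicted probability on the test inputs exceeds the target probability.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: list, b: float, x: list) -> float:
    """Gripping success probability for a (hypothetical) feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def first_learning_process(samples, test_inputs, max_updates, target, lr=0.5):
    """samples: list of (feature_vector, comparison_probability) pairs."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    updates = 0
    while True:
        # S201 + S202: gradient of the mean squared difference (p - q)^2
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, q in samples:
            p = predict(w, b, x)
            g = 2.0 * (p - q) * p * (1.0 - p) / len(samples)
            grad_w = [gw + g * xi for gw, xi in zip(grad_w, x)]
            grad_b += g
        w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b
        updates += 1
        # S203: stop on the update budget or when the test maximum beats the target
        if updates > max_updates:
            break
        if max(predict(w, b, x) for x in test_inputs) > target:
            break
    return w, b, updates


samples = [([1.0], 0.9), ([-1.0], 0.1)]
w, b, n = first_learning_process(samples, test_inputs=[[1.0]], max_updates=1000, target=0.8)
print(n, round(predict(w, b, [1.0]), 3))
```

If the target probability cannot be reached, for example 0.95 when the comparison probabilities top out at 0.9, the loop runs until the update budget is exhausted.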

FIG. 7 is a flowchart illustrating an example of the processing executed by the second learning unit 120 of the first embodiment.

When the second learning unit 120 receives a learning start instruction, it starts the processing described below. The learning start instruction is transmitted from the input device 330, the terminal 320, or the first learning unit 110.

First, the second learning unit 120 generates evaluation input data 180 (step S301). Specifically, the following processing is executed.

The object data generation unit 121 generates a polygon mesh of an object of arbitrary shape based on the reference object data 101 and the object data generation model. It renders the polygon mesh to generate image data. The object data generation unit 121 then generates learning object data 170 that includes the image data and transmits it to the gripping point data generation unit 123.

As the rendering method, for example, the method described in Non-Patent Document 1 is used. The present invention is not limited to any particular rendering method.

The gripping point data generation unit 123 determines a plurality of gripping points on the object corresponding to the learning object data 170. It then generates evaluation input data 180 that includes the learning object data 170 and the gripping points. One piece of evaluation input data 180 is generated for each pair of learning object data 170 and a single gripping point. This concludes the description of step S301.
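One way to picture step S301 is sketched below. Gripping point candidates are taken from the object's pixels in the rendered image, and one evaluation input record is produced per pair of learning object data and a single gripping point. The binary mask, the stride-based subsampling, and the `EvaluationInput` record are assumptions made for illustration. The text does not specify how gripping points are chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationInput:
    """Hypothetical record for one piece of evaluation input data 180."""
    object_id: str
    grip_point: tuple  # (row, column) in the rendered image


def candidate_grip_points(mask: list, stride: int = 1) -> list:
    """Pixels on the object (mask value 1), subsampled every `stride` pixels."""
    return [(r, c)
            for r in range(0, len(mask), stride)
            for c in range(0, len(mask[0]), stride)
            if mask[r][c]]


def make_evaluation_inputs(object_id: str, mask: list, stride: int = 1) -> list:
    # One record per (learning object data, single gripping point) pair.
    return [EvaluationInput(object_id, gp) for gp in candidate_grip_points(mask, stride)]


mask = [[0, 1, 1],
        [1, 1, 0],
        [0, 0, 0]]
for record in make_evaluation_inputs("mesh-042", mask):
    print(record)
```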

Next, the second learning unit 120 transmits the evaluation input data 180 to the first learning unit 110 and acquires reference probability data 190 (step S302).

When the gripping probability calculation unit 111 of the first learning unit 110 receives the evaluation input data 180, it calculates the gripping success probability of the object at each gripping point. It then transmits reference probability data 190, which includes the calculated gripping success probabilities, to the second learning unit 120. One piece of reference probability data 190 is generated for each piece of evaluation input data 180.

Next, the second learning unit 120 executes the second learning process (step S303). Details of the second learning process are described later with reference to FIG. 8.

Next, the second learning unit 120 determines whether to end the adversarial learning (step S304).

For example, the second learning unit 120 ends the adversarial learning when the number of times the second learning process has been executed exceeds a threshold.

If the adversarial learning is to end, the second learning unit 120 ends the processing.

If the adversarial learning is to continue, the second learning unit 120 transmits a learning start instruction to the first learning unit 110 (step S305) and then ends the processing.

FIG. 8 is a flowchart illustrating an example of the second learning process executed by the second learning unit 120 of the first embodiment.

The parameter updating unit 122 updates the object data generation model based on the gripping success probability of each gripping point and the target probability (step S401).

Specifically, the parameter updating unit 122 updates the parameters that define the object data generation model (object data generation model management information 140) so as to reduce the difference between the maximum gripping success probability and the target probability. For example, the parameters may be updated randomly, or the user may update them manually. These update methods are only examples, and the invention is not limited to them. Updating the object data generation model in this way makes it possible to efficiently generate object data for which the maximum gripping success probability output by the probability model is close to the target probability.

Next, the parameter updating unit 122 determines whether to end the learning of the object data generation model (step S402).

For example, the parameter updating unit 122 ends the learning of the object data generation model when the number of updates of the object data generation model exceeds the value in the number-of-learnings column 405. Alternatively, the parameter updating unit 122 inputs test input data to the gripping probability calculation unit 111. It ends the learning when the maximum gripping success probability output by the gripping probability calculation unit 111 exceeds the target probability.

If learning of the object data generation model is to continue, the second learning unit 120 returns to step S401 and repeats the same processing. If learning is to end, the second learning unit 120 ends the second learning process.
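The text allows the parameters of the object data generation model to be updated randomly. The sketch below shows one such scheme: a simple random search that keeps a perturbed parameter vector only when it brings the maximum gripping success probability closer to the target probability. `max_prob_of` is a hypothetical callable. It generates objects from the parameters and returns the maximum probability reported by the probability model. The Gaussian step size is also an assumption.

```python
import math
import random


def second_learning_process(theta, max_prob_of, target, max_updates, step=0.1, seed=0):
    """Random-search update of object data generation model parameters (S401-S402).

    theta:       list of generator parameters (hypothetical representation)
    max_prob_of: callable theta -> maximum gripping success probability
    """
    rng = random.Random(seed)
    gap = abs(max_prob_of(theta) - target)
    updates = 0
    while True:
        # S401: propose a random perturbation; keep it only if the gap shrinks
        candidate = [t + rng.gauss(0.0, step) for t in theta]
        candidate_gap = abs(max_prob_of(candidate) - target)
        if candidate_gap < gap:
            theta, gap = candidate, candidate_gap
        updates += 1
        # S402: stop on the update budget or when the maximum exceeds the target
        if updates > max_updates or max_prob_of(theta) > target:
            break
    return theta, gap, updates


def toy(th):
    """Toy stand-in for 'generate objects, then take the model's maximum probability'."""
    return 1.0 / (1.0 + math.exp(-th[0]))


theta, gap, n = second_learning_process([-3.0], toy, target=0.5, max_updates=500)
print(theta, round(gap, 3), n)
```

Because a candidate is accepted only when it lowers the gap, the gap never increases from one update to the next. This matches the stated goal of reducing the difference between the maximum probability and the target probability.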

Next, an example of controlling the gripping device 200 based on the probability model is described. FIG. 9 is a flowchart illustrating an example of control of the gripping device 200 according to the first embodiment.

The control device 201 acquires image data of the space in which the object 210 is located (the work space) (step S501).

Next, the control device 201 uses the image data and the probability model to calculate the distribution of gripping success probabilities in the work space (step S502).

Specifically, the control device 201 inputs the image data and coordinates into the probability model and calculates the gripping success probability at each coordinate. The control device 201 may also apply known image processing, for example to extract the objects contained in the image data.

Next, the control device 201 determines a gripping point on the object 210 based on the distribution of gripping success probabilities in the work space. It then generates a trajectory plan for the determined gripping point (step S503).

Next, the control device 201 controls the arm 202 according to the trajectory plan (step S504). The gripping device 200 thereby grips the object 210.
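Steps S502 to S504 can be sketched as follows. The sketch assumes the probability model has already been evaluated on a grid of work-space coordinates. The grid layout, the choice of the highest-probability cell as the gripping point, and the straight-line trajectory are simplifying assumptions; a real controller would use a proper motion planner.

```python
def choose_grip_point(prob_map):
    """Return ((row, col), probability) of the most promising coordinate (S503)."""
    best_p, best_rc = max(
        (p, (r, c)) for r, row in enumerate(prob_map) for c, p in enumerate(row)
    )
    return best_rc, best_p


def straight_line_trajectory(start, goal, steps):
    """Hypothetical trajectory plan: evenly spaced waypoints from start to goal."""
    return [tuple(s + (g - s) * k / steps for s, g in zip(start, goal))
            for k in range(steps + 1)]


prob_map = [[0.10, 0.72, 0.40],
            [0.35, 0.20, 0.05]]
grip, p = choose_grip_point(prob_map)
print(grip, p)  # (0, 1) 0.72
print(straight_line_trajectory((0.0, 0.0), grip, steps=2))
```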

As described above, the first embodiment uses object data generated from the object data generation model in the first learning process. This makes it possible to generate a probability model for gripping objects of various shapes. In addition, the object data generation model is updated based on the difference between the maximum gripping success probability and the target probability. This makes it possible to efficiently generate object data for which the maximum gripping success probability is close to the target probability, and thus to learn the probability model more efficiently.

In the second embodiment, part of the processing executed by the second learning unit 120 differs. The second embodiment is described below, focusing on the differences from the first embodiment.

In the second embodiment, the configuration of the computer system, the hardware configuration of the computer 100, and the software configuration of the computer 100 are the same as in the first embodiment.

However, the second learning unit 120 of the second embodiment manages a progress flag. The progress flag is set to "0", "1", or "2", and its initial value is "0". "1" indicates that learning of the probability model has stagnated. "2" indicates that learning of the probability model is progressing. "0" indicates neither state.

In the second embodiment, the second learning process is adjusted according to how far learning of the probability model has progressed. This enables efficient learning of the probability model.

The processing executed by the first learning unit 110 in the second embodiment is the same as in the first embodiment, whereas part of the processing executed by the second learning unit 120 differs. FIG. 10 is a flowchart illustrating an example of the processing executed by the second learning unit 120 of the second embodiment.

Steps S301 and S302 are the same as in the first embodiment. After step S302, the second learning unit 120 determines whether learning of the probability model has stagnated (step S351). Possible determination methods include the following.

(Determination method 1) The second learning unit 120 calculates the average of the gripping success probabilities of the gripping points and checks whether the average is smaller than a threshold. If it is, the second learning unit 120 determines that learning of the probability model has stagnated. A statistic other than the average may also be used.

(Determination method 2) The second learning unit 120 keeps a history of the maximum gripping success probability and calculates a rate of change from the previous maximum and the current maximum. It then checks two conditions: the difference between the maximum gripping success probability and the target probability is smaller than a first threshold, and the rate of change is smaller than a second threshold. If both conditions are met, the second learning unit 120 determines that learning of the probability model has stagnated.
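Both determination methods reduce to small predicates. The sketch below is illustrative: the thresholds are parameters, and the statistic in method 1 defaults to the mean. In method 2 the rate of change is defined as the relative change between the last two maxima. That definition is an assumption, since the text only says the rate is computed from the previous and current maxima.

```python
from statistics import mean


def stagnant_by_statistic(probabilities, threshold, statistic=mean):
    """Determination method 1: a statistic (mean by default) below a threshold."""
    return statistic(probabilities) < threshold


def stagnant_by_history(max_history, target, first_threshold, second_threshold):
    """Determination method 2 using the last two maxima.

    The rate of change is taken as |cur - prev| / prev (an assumption).
    """
    if len(max_history) < 2 or max_history[-2] == 0:
        return False
    prev, cur = max_history[-2], max_history[-1]
    rate = abs(cur - prev) / prev
    return abs(target - cur) < first_threshold and rate < second_threshold


print(stagnant_by_statistic([0.1, 0.2, 0.3], threshold=0.25))
print(stagnant_by_history([0.80, 0.805], target=0.82, first_threshold=0.05, second_threshold=0.01))
```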

If learning of the probability model has stagnated, the second learning unit 120 sets the progress flag to "1" (step S352) and executes the second learning process (step S303).

If learning of the probability model has not stagnated, the second learning unit 120 executes the second learning process (step S303).

If step S304 determines that the adversarial learning is not to end, the second learning unit 120 determines whether learning of the probability model is progressing (step S353).

For example, the second learning unit 120 determines that learning of the probability model is progressing when the maximum gripping success probability has exceeded the target probability more times than a threshold.

If learning of the probability model is not progressing, the second learning unit 120 transmits a learning start instruction to the first learning unit 110 (step S305) and then ends the processing.

If learning of the probability model is progressing, the second learning unit 120 sets the progress flag to "2" (step S354). It then transmits a learning start instruction to the first learning unit 110 (step S305) and ends the processing.
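The sketch below traces the flag handling in steps S351 to S354. It is a simplified model. It assumes the stagnation decision has already been made, for example with determination method 1 or 2. It also assumes the second learning process is passed in as a callable that returns the flag it leaves behind. The function and parameter names are hypothetical.

```python
NEUTRAL, STAGNANT, PROGRESSING = 0, 1, 2


def second_unit_round(flag, stagnant, second_learning, end_adversarial,
                      max_history, target, count_threshold):
    """One pass of FIG. 10 after S301/S302; returns the new progress flag.

    second_learning: callable(flag) -> flag; in the second embodiment it ends by
                     resetting the flag to 0 (S458).
    """
    if stagnant:                         # S351 -> S352
        flag = STAGNANT
    flag = second_learning(flag)         # S303
    if end_adversarial:                  # S304: stop without checking progress
        return flag
    exceed_count = sum(m > target for m in max_history)
    if exceed_count > count_threshold:   # S353 -> S354
        flag = PROGRESSING
    return flag                          # S305 then sends the learning start instruction
```

Setting flag "2" after the second learning process means the wider target range takes effect in the next round's second learning process.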

The second learning process also differs in the second embodiment. FIG. 11 is a flowchart illustrating an example of the second learning process executed by the second learning unit 120 of the second embodiment.

The second learning unit 120 determines whether the progress flag is "0" (step S451).

If the progress flag is "0", the second learning unit 120 proceeds to step S453.

If the progress flag is not "0", the second learning unit 120 updates the target range (step S452).

Specifically, the second learning unit 120 changes the variables α and β. These variables specify the range of similarity between the reference object and an object generated from the updated object data generation model. The second learning unit 120 holds initial values of α and β, which may be set via the setting screen 400.

The second learning unit 120 updates α and β based on the difference between the maximum gripping success probability and the target probability, as follows.

When the progress flag is "1", the second learning unit 120 uses the difference to set α and β to values larger than their previous values. The object data generation model is then updated to generate object data for objects that deviate less from the reference object than objects generated by the model before the update.

When the progress flag is "2", the second learning unit 120 uses the difference to set α and β to values smaller than their previous values. The object data generation model is then updated to generate object data for objects that deviate more from the reference object than objects generated by the model before the update.

This processing propagates the difference between the maximum gripping success probability and the target probability into the update of the object data generation model. This concludes the description of step S452.
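A minimal sketch of the target-range update in step S452 follows. The text states only that α and β move up (flag "1") or down (flag "2") based on the difference between the maximum gripping success probability and the target probability. The proportional rule `gain * gap` and the clamping of similarities to [0, 1] are assumptions.

```python
def _clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    return min(max(value, low), high)


def update_target_range(alpha, beta, flag, max_prob, target, gain=0.5):
    """Step S452: shift the similarity range [alpha, beta] by an amount tied to
    the gap between the maximum gripping success probability and the target.
    """
    delta = gain * abs(max_prob - target)
    if flag == 1:    # stagnating: demand higher similarity (simpler objects)
        alpha, beta = alpha + delta, beta + delta
    elif flag == 2:  # progressing: allow lower similarity (more varied objects)
        alpha, beta = alpha - delta, beta - delta
    return _clamp(alpha), _clamp(beta)


print(update_target_range(0.4, 0.6, flag=1, max_prob=0.5, target=0.9))
print(update_target_range(0.4, 0.6, flag=2, max_prob=0.5, target=0.9))
```

Tying the shift to the size of the gap means that a model far from the target probability moves the range more, while a model close to it barely changes the range.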

In step S453, the second learning unit 120 sets the target range.

Next, the parameter updating unit 122 updates the object data generation model (step S454). The update method is the same as in the first embodiment.

Next, the object data generation unit 121 generates learning object data 170 based on the updated object data generation model (step S455).

Next, the parameter updating unit 122 calculates the similarity between the reference object and the object corresponding to the learning object data 170 (step S456).

Next, the parameter updating unit 122 determines whether to end the learning of the object data generation model (step S457). Specifically, the following processing is executed.

The parameter updating unit 122 checks whether the similarity is within the target range, that is, at least α and at most β. If the similarity is within the target range, the parameter updating unit 122 ends the learning of the object data generation model.

If the similarity is outside the target range, the parameter updating unit 122 checks whether the number of updates of the object data generation model exceeds the value in the number-of-learnings column 405.

If the number of updates exceeds that value, the parameter updating unit 122 ends the learning of the object data generation model. Otherwise, it continues the learning. This concludes the description of step S457.

If learning of the object data generation model is to continue, the second learning unit 120 returns to step S454 and repeats the same processing.

If learning of the object data generation model is to end, the second learning unit 120 sets the progress flag to "0" (step S458) and then ends the second learning process.
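Steps S454 to S458 form a loop: generate, measure the similarity, and test it against [α, β]. The sketch below uses the Jaccard index of voxel sets as the similarity measure. That measure is an assumption, because the text does not define how similarity is computed. `propose_object` is a hypothetical callable that performs the model update (S454) and the object generation (S455) and returns the new object's voxels.

```python
def jaccard(a: set, b: set) -> float:
    """Similarity between two voxelized objects (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fit_generator_to_range(propose_object, reference, alpha, beta, max_updates):
    """Steps S454-S458: regenerate until the similarity to the reference object
    lies in [alpha, beta] or the update budget is exceeded."""
    updates = 0
    while True:
        obj = propose_object()                # S454 + S455
        similarity = jaccard(reference, obj)  # S456
        updates += 1
        if alpha <= similarity <= beta:       # S457: within the target range
            break
        if updates > max_updates:             # S457: update budget exceeded
            break
    progress_flag = 0                         # S458
    return similarity, updates, progress_flag


reference = set(range(10))
proposals = iter([set(range(20)), set(range(8))])  # similarities 0.5, then 0.8
print(fit_generator_to_range(lambda: next(proposals), reference, 0.7, 0.9, max_updates=5))
```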

Decreasing α or β updates the object data generation model so that it generates object data for objects that deviate more from the reference object. Increasing α or β updates the model so that it generates object data for objects that deviate less from the reference object.

Adjusting β corresponds to adjusting the diversity of the generated objects. Adjusting α corresponds to suppressing the generation of objects that deviate very strongly from the reference object. Adjusting the width of the target range corresponds to adjusting how much the object data generation model changes.

When object data for objects with complicated shapes are input as learning data, learning of the probability model may stagnate. Therefore, when learning of the probability model has stagnated, the computer system of the second embodiment updates the object data generation model so that it generates objects with simpler shapes. That is, the model is updated so that the shape deviation from the reference object is smaller than for objects generated by the model before the update. Inputting object data for simply shaped objects prevents learning of the probability model from stagnating.

Conversely, when learning of the probability model is progressing, the second learning unit 120 updates the object data generation model so that it generates objects with more complicated shapes. That is, the model is updated so that the shape deviation from the reference object is larger than for objects generated by the model before the update. This makes it possible to efficiently generate a probability model for gripping objects with more complicated shapes.

The second embodiment thus prevents learning of the probability model from stagnating and achieves efficient learning.

In the third embodiment, when stagnation in the learning of the probability model is detected, the first learning unit 110 relearns the probability model in response to an instruction from the second learning unit 120. The third embodiment is described below, focusing on the differences from the first embodiment.

FIG. 12 is a diagram illustrating a configuration example of the computer system according to the third embodiment.

The configuration of the computer system of the third embodiment is the same as that of the first embodiment. The hardware configuration of the computer 100 is the same as in the first embodiment. The software configuration of the computer 100-2 of the third embodiment is also the same as in the first embodiment.

In the third embodiment, history management information 1200 is stored in the main storage device 301 of the computer 100-1. The rest of the configuration is the same as in the first embodiment.

The history management information 1200 manages the history of the probability model management information 130. It associates each version of the probability model management information 130 with its update date and time.

In the third embodiment, part of the processing executed by the first learning unit 110 and the second learning unit 120 differs.

FIG. 13 is a flowchart illustrating an example of the processing executed by the first learning unit 110 of the third embodiment.

The first learning unit 110 determines whether the received instruction is a relearning start instruction (step S151).

If the received instruction is a learning start instruction, the first learning unit 110 proceeds to step S101.

If the received instruction is a relearning start instruction, the parameter updating unit 112 of the first learning unit 110 updates the probability model based on the history stored in the history management information 1200 (step S152). The first learning unit 110 then proceeds to step S101.

For example, the parameter updating unit 112 reads the probability model history entry with the most recent update date and time and updates the probability model accordingly. This update method is only an example, and the invention is not limited to it. For instance, the probability model may be updated based on the history entry from a certain period earlier.
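Below is a minimal sketch of the history management information 1200 and the rollback in step S152. It assumes the probability model is represented by a dictionary of parameters; the class and method names are hypothetical. `latest` corresponds to restoring the most recent entry, and `as_of` corresponds to restoring an entry from a certain period earlier.

```python
import copy
from datetime import datetime


class ModelHistory:
    """Hypothetical store mirroring history management information 1200:
    each snapshot of the probability model parameters is kept with its update time."""

    def __init__(self):
        self._entries = []  # list of (updated_at, params)

    def record(self, params: dict, updated_at: datetime) -> None:
        self._entries.append((updated_at, copy.deepcopy(params)))

    def latest(self) -> dict:
        """Snapshot with the most recent update date and time."""
        return copy.deepcopy(max(self._entries, key=lambda e: e[0])[1])

    def as_of(self, cutoff: datetime):
        """Most recent snapshot at or before `cutoff`, or None if there is none."""
        older = [e for e in self._entries if e[0] <= cutoff]
        return copy.deepcopy(max(older, key=lambda e: e[0])[1]) if older else None


history = ModelHistory()
history.record({"w": [0.1, 0.2]}, datetime(2019, 4, 1, 9, 0))
history.record({"w": [0.3, 0.1]}, datetime(2019, 4, 1, 10, 0))
print(history.latest())                            # {'w': [0.3, 0.1]}
print(history.as_of(datetime(2019, 4, 1, 9, 30)))  # {'w': [0.1, 0.2]}
```

Deep copies are used so that continued training on a restored model cannot silently modify the stored snapshots.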

The processing from step S101 to step S105 in Example 3 is the same as in Example 1.

FIG. 14 is a flowchart illustrating an example of the processing executed by the second learning unit 120 in Example 3.

In Example 3, the processing of steps S301 to S302 and of steps S303 to S305 is the same as in Example 1, and the processing of step S351 is the same as in Example 2.

If it is determined in step S351 that learning of the probability model has not stagnated, the second learning unit 120 proceeds to step S303.

If it is determined in step S351 that learning of the probability model has stagnated, the second learning unit 120 outputs a re-learning start instruction to the first learning unit 110 (step S361) and then ends the processing.
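The branch after step S351 can be sketched as follows. The stagnation test is a hypothetical choice. It treats learning as stagnant when the maximum gripping success probability has improved by less than a small amount over a fixed window of iterations. The window size, the threshold, and the function names are assumptions, not values from the patent.

```python
from typing import Callable, List


def is_learning_stagnant(max_probs: List[float], window: int = 5,
                         min_improvement: float = 1e-3) -> bool:
    """Hypothetical stagnation test: the maximum gripping success probability
    improved by less than min_improvement over the last `window` iterations."""
    if len(max_probs) <= window:
        return False  # not enough history to judge yet
    return max_probs[-1] - max_probs[-1 - window] < min_improvement


def branch_after_s351(max_probs: List[float],
                      send_relearning_start: Callable[[], None]) -> str:
    """Sketch of the S351 branch in FIG. 14; returns the next step."""
    if is_learning_stagnant(max_probs):
        # S361: instruct the first learning unit 110 to start re-learning.
        send_relearning_start()
        return "end"
    return "S303"
```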

According to Example 3, when learning of the probability model stagnates, the first learning unit 110 restores a past version of the probability model and then executes the learning process. This makes it possible to break out of the learning stagnation.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the above embodiments are described in detail to explain the present invention clearly. The invention is not necessarily limited to configurations that include all of the described components. Part of the configuration of each embodiment may also be supplemented with another configuration, removed, or replaced by another configuration.

Each of the configurations, functions, processing units, processing means, and the like described above may be implemented partly or entirely in hardware, for example by designing them as integrated circuits. The present invention can also be implemented by software program code that implements the functions of the embodiments. In this case, a storage medium on which the program code is recorded is provided to a computer, and a processor of the computer reads the program code stored in the storage medium. The program code read from the storage medium then itself implements the functions of the embodiments described above, and both the program code and the storage medium storing it constitute the present invention. Examples of storage media for supplying such program code include flexible disks, CD-ROMs, DVD-ROMs, hard disks, SSDs (solid state drives), optical disks, magneto-optical disks, CD-Rs, magnetic tapes, nonvolatile memory cards, and ROMs.

The program code that implements the functions described in these embodiments can be written in a wide range of programming or scripting languages, such as assembler, C/C++, Perl, Shell, PHP, Python, and Java (registered trademark).

Furthermore, the software program code that implements the functions of the embodiments may be distributed over a network and stored in a storage means such as a hard disk or memory of a computer, or on a storage medium such as a CD-RW or CD-R. A processor of the computer may then read and execute the program code stored in that storage means or storage medium.

In the embodiments described above, only the control lines and information lines considered necessary for the explanation are shown; not all control lines and information lines of an actual product are necessarily shown. All components may be interconnected.

100 Computer
101 Reference object data
110 First learning unit
111 Gripping probability calculation unit
112, 122 Parameter updating unit
113 Simulation unit
120 Second learning unit
121 Object data generation unit
123 Gripping point data generation unit
130 Probability model management information
140 Object data generation model management information
150 Learning data
160 Probability data
170 Learning object data
180 Evaluation input data
190 Reference probability data
200 Gripping device
201 Control device
202 Arm
210 Object
220 Gripping point
300 Processor
301 Main storage device
302 Secondary storage device
303 Network interface
304 IO interface
310 Network
320 Terminal
330 Input device
331 Output device
400 Setting screen
1200 History management information

Claims

A computer system that learns a control algorithm for a gripping device to grip an object, wherein
the computer system comprises at least one computer having an arithmetic device and a storage device connected to the arithmetic device,
the at least one computer executes:
a first learning process for generating a first model that outputs a gripping success probability of the object, using learning data that includes object data containing information about the shape of the object, a gripping point serving as a reference for the position at which the object is gripped, and information indicating whether gripping of the object succeeded; and
a second learning process for generating a second model that outputs the object data of an object obtained by deforming the shape of a reference object, and
in the second learning process, the at least one computer:
generates, based on the second model, the object data of a first object obtained by deforming the shape of the reference object;
determines gripping points of the first object;
inputs, into the first model, a plurality of pairs each consisting of the object data of the first object and a gripping point of the first object; and
updates the second model based on the difference between the maximum value of the gripping success probabilities of the first object output from the first model and a target probability.
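The second learning process of this claim can be illustrated with a toy loop. Everything below is an assumption for illustration only:

- Shapes are three-number vectors.
- The "first model" is a fixed analytic function rather than a trained network.
- The "second model" is reduced to a single deformation scale.
- The update rule nudges that scale by the gap between the maximum success probability and the target probability.

The claim does not specify the model types or the update rule. All function names here are hypothetical.

```python
import math
import random
from typing import List, Tuple

# Hypothetical reference object: three size parameters of a box-like part.
REFERENCE_SHAPE = [1.0, 1.0, 1.0]


def first_model(shape: List[float], grip_point: float) -> float:
    """Toy stand-in for the trained first model: the success probability falls
    as the shape departs from the reference and as the grip point moves
    away from the center (grip_point = 0)."""
    deviation = math.dist(shape, REFERENCE_SHAPE)
    return math.exp(-deviation) * math.exp(-abs(grip_point))


def generate_first_object(scale: float, rng: random.Random) -> List[float]:
    # Toy second model: perturb the reference shape with noise of magnitude `scale`.
    return [r + scale * rng.uniform(-1.0, 1.0) for r in REFERENCE_SHAPE]


def candidate_grip_points(n: int = 5) -> List[float]:
    # Evenly spaced grip points on [-1, 1] (hypothetical parameterization).
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def second_learning_step(scale: float, target: float, rng: random.Random,
                         lr: float = 0.5) -> Tuple[float, float]:
    shape = generate_first_object(scale, rng)              # generate object data
    pairs = [(shape, g) for g in candidate_grip_points()]  # determine grip points
    max_prob = max(first_model(s, g) for s, g in pairs)    # query the first model
    gap = max_prob - target
    # Hypothetical update: if the object is too easy (gap > 0), deform more;
    # if it is too hard (gap < 0), deform less. The scale never goes negative.
    return max(0.0, scale + lr * gap), max_prob
```

Calling `second_learning_step` repeatedly, starting from `scale = 0.0`, drives the generator toward deformations whose best grip point scores near the target probability. That is the role the claim gives to the difference term.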

The computer system according to claim 1, wherein
in the second learning process, the at least one computer:
updates the second model so that object data is generated for an object whose amount of shape change from the reference object is smaller than that of the first object when learning of the first model has stagnated; and
updates the second model so that object data is generated for an object whose amount of shape change from the reference object is larger than that of the first object when learning of the first model is progressing.
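Claim 2 describes a curriculum-style adjustment: back off to easier (less deformed) objects when learning stalls, and move on to harder ones when it progresses. A minimal sketch follows. Two parts are hypothetical and are not specified in the claim:

- Progress is measured as the change in the maximum success probability over a short window.
- The target amount of shape change is scaled by fixed shrink and grow factors.

```python
from typing import Sequence


def next_shape_change(current_change: float, recent_max_probs: Sequence[float],
                      shrink: float = 0.5, grow: float = 1.2,
                      min_improvement: float = 1e-3) -> float:
    """Return the shape-change amount the updated second model should aim for.

    recent_max_probs: maximum gripping success probabilities over recent
    iterations, oldest first. All thresholds and factors are hypothetical.
    """
    improvement = recent_max_probs[-1] - recent_max_probs[0]
    if improvement < min_improvement:
        # Learning of the first model has stagnated: generate objects that
        # deviate less from the reference object than the first object did.
        return current_change * shrink
    # Learning is progressing: generate objects that deviate more.
    return current_change * grow
```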

The computer system according to claim 2, wherein
in the second learning process, the at least one computer:
sets a target range for the similarity between the shape of an object corresponding to the object data generated based on the second model and the shape of the reference object; and
updates the second model so that the difference between the maximum gripping success probability of the first object and the target probability decreases, and so that the similarity between the shape of the reference object and the shape of a second object, corresponding to the object data generated based on the updated second model, falls within the target range.
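Claim 3 constrains the update so that generated shapes stay within a target similarity range of the reference object. The sketch below treats the update as a selection among candidate second-model outputs. This is a search-based stand-in for whatever training procedure an implementation would actually use. The similarity measure exp(-Euclidean distance) and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Candidate = Tuple[List[float], float]  # (generated shape, max success probability)


def similarity(shape: Sequence[float], reference: Sequence[float]) -> float:
    # Hypothetical similarity in (0, 1]: 1.0 means the shapes are identical.
    return math.exp(-math.dist(shape, reference))


def choose_update(candidates: List[Candidate], reference: Sequence[float],
                  target_prob: float,
                  sim_range: Tuple[float, float]) -> Optional[Candidate]:
    """Among candidates whose similarity to the reference lies in sim_range,
    return the one whose max success probability is closest to target_prob,
    or None if no candidate satisfies the similarity constraint."""
    lo, hi = sim_range
    feasible = [c for c in candidates if lo <= similarity(c[0], reference) <= hi]
    if not feasible:
        return None
    return min(feasible, key=lambda c: abs(c[1] - target_prob))
```

The lower bound of the range keeps generated objects recognisably related to the reference object. The upper bound keeps them from collapsing back onto it.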

The computer system according to claim 1, wherein
the computer system holds history management information for managing the update history of the first model, and
in the second learning process, the at least one computer:
calculates, when the maximum gripping success probability of the first object output from the first model is less than or equal to the target probability, either a statistic of the gripping success probabilities of the first object or the amount of change in the maximum gripping success probability of the first object as an evaluation value;
determines, based on the evaluation value, whether learning of the first model has stagnated; and
updates the first model using the history management information when it is determined that learning of the first model has stagnated.
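The two evaluation values named in claim 4 and the rollback they trigger can be sketched as follows. The claim does not fix the comparison rule, so treating "evaluation value below a threshold" as stagnation is an assumption. A mean-based rule would in practice need its own threshold. The history format and the function names are hypothetical.

```python
from statistics import mean
from typing import Any, Dict, List, Sequence, Tuple


def evaluation_value(prob_history: Sequence[Sequence[float]], mode: str) -> float:
    """prob_history: per-iteration lists of gripping success probabilities
    of the first object (one value per grip point), oldest first."""
    if mode == "mean":
        # A statistic of the latest success probabilities.
        return mean(prob_history[-1])
    if mode == "max_change":
        # Change in the maximum success probability since the previous iteration.
        return max(prob_history[-1]) - max(prob_history[-2])
    raise ValueError(f"unknown mode: {mode}")


def check_and_rollback(
    model_params: Dict[str, Any],
    history: List[Tuple[Any, Dict[str, Any]]],
    prob_history: Sequence[Sequence[float]],
    target: float,
    mode: str = "max_change",
    threshold: float = 1e-3,
) -> Tuple[Dict[str, Any], bool]:
    """Return (parameters to use, whether a rollback happened)."""
    if max(prob_history[-1]) > target:
        return model_params, False  # target exceeded: no stagnation check
    # Hypothetical rule: learning is stagnant if the evaluation value is below threshold.
    if evaluation_value(prob_history, mode) < threshold:
        # Restore the first model from the most recent history entry.
        return dict(history[-1][1]), True
    return model_params, False
```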

A machine learning method executed by a computer system that learns a control algorithm for a gripping device to grip an object, wherein
the computer system comprises at least one computer having an arithmetic device and a storage device connected to the arithmetic device, and
the machine learning method comprises:
a step in which the at least one computer executes a first learning process for generating a first model that outputs a gripping success probability of the object, using learning data that includes object data containing information about the shape of the object, a gripping point serving as a reference for the position at which the object is gripped, and information indicating whether gripping of the object succeeded; and
a step of executing a second learning process for generating a second model that outputs, based on the object data of a reference object, the object data of an object obtained by deforming the shape of the reference object,
wherein the second learning process includes:
a first step in which the at least one computer generates, based on the second model, the object data of a first object obtained by deforming the shape of the reference object;
a second step in which the at least one computer determines gripping points of the first object;
a third step in which the at least one computer inputs, into the first model, a plurality of pairs each consisting of the object data of the first object and a gripping point of the first object; and
a fourth step of updating the second model based on the difference between the maximum value of the gripping success probabilities of the first object output from the first model and a target probability.

The machine learning method according to claim 5, wherein the fourth step includes:
a step in which, when learning of the first model has stagnated, the at least one computer updates the second model so that object data is generated for an object whose amount of shape change from the reference object is smaller than that of the first object; and
a step in which, when learning of the first model is progressing, the at least one computer updates the second model so that object data is generated for an object whose amount of shape change from the reference object is larger than that of the first object.

The machine learning method according to claim 6, wherein the fourth step includes:
a step in which the at least one computer sets a target range for the similarity between the shape of an object corresponding to the object data generated based on the second model and the shape of the reference object; and
a step in which the at least one computer updates the second model so that the difference between the maximum gripping success probability of the first object and the target probability decreases, and so that the similarity between the shape of the reference object and the shape of a second object, corresponding to the object data generated based on the updated second model, falls within the target range.

The machine learning method according to claim 5, wherein
the computer system holds history management information for managing the update history of the first model, and
the machine learning method comprises:
a step in which the at least one computer calculates, as an evaluation value, either a statistic of the gripping success probabilities of the first object or the amount of change in the maximum gripping success probability of the first object;
a step in which the at least one computer determines, based on the evaluation value, whether learning of the first model has stagnated; and
a step in which, when it is determined that learning of the first model has stagnated, the at least one computer updates the first model using the history management information.