JP7217727B2

JP7217727B2 - Controller, gripping system, method and program

Info

Publication number: JP7217727B2
Application number: JP2020124576A
Authority: JP
Inventors: モハッメドサヒリ; 学嗣浅谷
Original assignee: Exa Wizards Inc
Current assignee: Exa Wizards Inc
Priority date: 2020-07-21
Filing date: 2020-07-21
Publication date: 2023-02-03
Anticipated expiration: 2040-07-21
Also published as: WO2022018936A1; JP2022021147A

Description

The present invention relates to a technique for determining the portion of an object that a gripping device is to grip.

Techniques for determining the portion of an object that a gripping device is to grip are known. For example, the technique described in Patent Literature 1 detects the contour line of an object from a captured image of the object, and determines the gripping position for the gripping device based on an offset line obtained by offsetting the contour line outward by a predetermined amount.

Japanese Patent Application Laid-Open No. 2020-82217 (published on June 4, 2020)

The technique described in Patent Literature 1 leaves room for improvement in determining more accurately the gripping position at which the gripping device grips the object.

An object of one aspect of the present invention is to realize a technique for accurately determining the gripping position at which a gripping device grips an object.

In order to solve the above problem, a control device according to one aspect of the present invention includes: an acquisition unit that acquires an image including an object as a subject; an estimation unit that estimates a plurality of gripping candidate positions of the object using an inference model that takes the image as input; and a determination unit that determines, with reference to the plurality of gripping candidate positions, a gripping position at which a gripping device is to grip the object.

The control device according to one aspect of the present invention may be realized by a computer. In that case, a program that realizes the control device on the computer by causing the computer to operate as each unit (software element) of the control device, and a computer-readable recording medium on which the program is recorded, also fall within the scope of the present invention.

According to one aspect of the present invention, it is possible to realize a technique for accurately determining the gripping position at which a gripping device grips an object.

FIG. 1 is a block diagram showing an outline of a gripping system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the functional configuration of each device constituting the gripping system according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the flow of processing executed by the gripping system according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a specific example of processing for selecting a gripping target.
FIG. 5 is a diagram illustrating a specific example of an image input to the inference model.
FIG. 6 is a diagram schematically showing a configuration example of the inference model.
FIG. 7 is a schematic diagram explaining a bounding box.
FIG. 8 is a flowchart showing the flow of processing in which the gripping system according to an embodiment of the present invention generates the inference model.
FIG. 9 is a schematic diagram showing an example of an image used as teacher data.
FIG. 10 is a flowchart showing the detailed flow of processing for determining a gripping position.
FIG. 11 is a flowchart showing a modification of the processing for determining a gripping position.
FIG. 12 is a schematic diagram explaining the relationship between a bounding box and an object region.
FIG. 13 is a flowchart showing another modification of the processing for determining a gripping position.
FIG. 14 is a diagram illustrating a specific example of processing for correcting gripping candidate positions.
FIG. 15 is a block diagram illustrating the physical configuration of each device constituting the gripping system according to an embodiment of the present invention.

[Embodiment]
A gripping system 1 according to an embodiment of the present invention will be described below.

<Overview of gripping system 1>
FIG. 1 is a block diagram showing an outline of the gripping system 1 according to an embodiment of the present invention. As shown in FIG. 1, the gripping system 1 includes a control device 10, a learning device 20, a robot arm 30, and an imaging device 40. The gripping system 1 is a system that controls the robot arm 30 so as to grip an object (obj1 or obj2) placed on a mounting table w.

The control device 10 is communicably connected to each of the learning device 20, the robot arm 30, and the imaging device 40. The robot arm 30 is an example of the gripping device in the present invention. For example, the control device 10 and each of these devices are connected via a network. In this case, the network is a wired LAN (Local Area Network), a wireless LAN, the Internet, a public line network, a mobile data communication network, or a combination of these.

In the example shown in FIG. 1, the control device 10 and the learning device 20 are physically separate devices communicably connected via a network, but this does not limit the present embodiment. For example, the control device 10 and the learning device 20 may be formed integrally as one physical computer. Likewise, in the example shown in FIG. 1, the control device 10 and the robot arm 30 are physically separate devices communicably connected via a network, but this does not limit the present embodiment. For example, the control device 10 may be built into the robot arm 30. Further, in the example shown in FIG. 1, the robot arm 30 and the imaging device 40 are separate devices that are physically coupled to each other, but this does not limit the present embodiment. For example, the imaging device 40 may be built into the robot arm 30.

In the example shown in FIG. 1, the gripping system 1 includes one robot arm 30 and one imaging device 40, but this does not limit the present embodiment. The gripping system 1 may include a plurality of robot arms 30 and a plurality of imaging devices 40. Also, in the example shown in FIG. 1, one imaging device 40 is provided for one robot arm 30, but this does not limit the present embodiment. In the gripping system 1, a plurality of imaging devices 40 may be provided for one robot arm 30, or one imaging device 40 may be provided for a plurality of robot arms 30.

In the gripping system 1, the control device 10 acquires an image including an object as a subject, and estimates a plurality of gripping candidate positions of the object using an inference model that takes the acquired image as input. The control device 10 then refers to the estimated gripping candidate positions to determine the gripping position at which the robot arm 30 is to grip the object.

(Image including an object as a subject)
An image including an object as a subject is an image generated by imaging the object. In this embodiment, the image includes at least one object as a subject. For example, an image in which the imaging device 40 has captured the object obj1, an image in which it has captured the object obj2, and an image in which it has captured both objects obj1 and obj2 are each an example of an image including an object as a subject.

(Gripping candidate position and gripping position)
A gripping candidate position is a candidate for the gripping position at which the robot arm 30 is to grip an object. The gripping position is the position, in real space, of the portion of a real-space object that the robot arm 30 is to grip. In this embodiment, the gripping position and the gripping candidate positions are each specified by a region on the image. In other words, the region on the image indicates, on the image, the real-space position of the portion to be gripped. Details of the region on the image will be described later.

<Effect of gripping system 1>
According to the gripping system 1, the gripping position is determined with reference to a plurality of gripping candidate positions estimated using the inference model, so the gripping position at which the gripping device grips the object can be determined accurately.

<Functional configuration of gripping system 1>
Next, the functional configuration of each device constituting the gripping system 1 will be described. FIG. 2 is a block diagram showing the functional configuration of each device.

(Functional configuration of control device 10)
As shown in FIG. 2, the control device 10 includes a control unit 11 and a storage unit 12. The control unit 11 includes an acquisition unit 111, an estimation unit 112, and a determination unit 113.

The acquisition unit 111 acquires an image including an object as a subject.

The estimation unit 112 estimates a plurality of gripping candidate positions of the object using an inference model 221. In this embodiment, the inference model 221 is generated in advance by the learning device 20 and stored in the learning device 20. The estimation unit 112 transmits an image to the learning device 20 and receives from the learning device 20 the information that the inference model 221 outputs when given that image as input. Details of the inference model 221 will be described later.

The determination unit 113 refers to the plurality of gripping candidate positions to determine the gripping position at which the robot arm 30 is to grip the object.

The storage unit 12 stores various data referred to by the control unit 11.

(Functional configuration of learning device 20)
As shown in FIG. 2, the learning device 20 includes a control unit 21 and a storage unit 22. The control unit 21 includes a learning unit 211.

The learning unit 211 generates, by machine learning, the inference model 221, which takes an image including an object as a subject as input and is used to estimate a plurality of gripping candidate positions of the object. Details of the inference model 221 will be described later. Upon receiving an image from the control device 10, the learning unit 211 inputs the image to the inference model 221 and transmits the information output from the inference model 221 to the control device 10.

The storage unit 22 stores various data referred to by the control unit 21. The storage unit 22 also stores the inference model 221 generated by the learning unit 211.

(Configuration of robot arm 30)
The robot arm 30 performs a gripping operation for gripping an object under the control of the control device 10. Specifically, as shown in FIGS. 1 and 2, the robot arm 30 is an articulated robot having a plurality of rotation axes, and includes a pedestal portion 31, a base portion 32, an arm portion 33, and a hand portion 34.

The pedestal portion 31 is installed on the installation surface of the robot arm 30. The installation surface is, for example, a floor, but is not limited to this. The pedestal portion 31 may be movable over the installation surface under the control of the control device 10. For example, the pedestal portion 31 may have wheels in contact with the installation surface.

The base portion 32 is connected to the pedestal portion 31 so as to be able to turn.

The arm portion 33 includes a plurality of arms. The proximal end of each arm is connected to the base portion 32 or to the distal end of another arm so as to be rotatable about a predetermined axis. The hand portion 34 is connected to the distal end of the arm portion 33 so as to be rotatable about a predetermined axis. The imaging device 40 is coupled near the distal end of the arm portion 33.

The hand portion 34 includes a pair of finger portions 34a and 34b. Under the control of the control unit 11, the hand portion 34 performs an opening operation that moves the finger portions 34a and 34b apart and a closing operation that brings them together. The gripping operation of the robot arm 30 described above is realized by opening and closing the hand portion 34.

Under the control of the control unit 11, the robot arm 30 moves the hand portion 34 to a desired position by performing some or all of the following: moving the pedestal portion 31, turning the base portion 32, and rotating each arm.

(Configuration of imaging device 40)
Under the control of the control device 10, the imaging device 40 generates an image capturing some or all of the objects obj1 and obj2 placed on the mounting table w. For example, the imaging direction and angle of view of the imaging device 40 are changed under the control of the control device 10 so that the imaging range covers the top of the mounting table w.

<Processing of gripping system 1>
The flow of processing executed by the gripping system 1 configured as described above will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the flow of processing executed by the gripping system 1.

(Step S101)
In step S101, the acquisition unit 111 of the control device 10 acquires an image including an object as a subject. For example, the acquisition unit 111 acquires, from the imaging device 40, an image capturing the top of the mounting table w. The image G101 shown in FIG. 4 is an example of the image acquired in this step. The image G101 includes, as subjects, the objects obj1 and obj2 placed on the mounting table w.

(Step S102)
In step S102, the control unit 11 detects one or more objects from the image acquired by the acquisition unit 111. A known method can be applied to detect the objects included in the image as subjects. The image G102 shown in FIG. 4 schematically shows the objects detected by the control unit 11. In this example, the control unit 11 has detected, in the image G102, a region R1 containing the object obj1 and a region R2 containing the object obj2.

(Step S103)
In step S103, the control unit 11 selects the object to be gripped from among the one or more objects detected in the image. The condition for selecting the object to be gripped is determined in advance. For example, the control unit 11 may select, as the gripping target, the object that satisfies a condition on the area it occupies in the image (for example, the largest). Alternatively, the control unit 11 may select, as the gripping target, the object that satisfies a condition on its position in the image (for example, closest to the center, or closest to the lower right). The image G103 shown in FIG. 4 schematically shows the object selected as the gripping target. In this example, the condition applied is that the position in the image is closest to the lower right. Of the regions R1 and R2, the region R1 is closest to the lower right of the image, so the control unit 11 selects the object obj1 contained in the region R1 as the gripping target. The conditions for selecting the object to be gripped are not limited to these.
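The selection conditions named in step S103 can be sketched as follows. This is only an illustration under assumptions that are not in the source: detected regions are taken to be axis-aligned boxes in pixel coordinates with the origin at the top-left of the image, and the names `Region`, `select_largest`, and `select_nearest` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    """Hypothetical axis-aligned detection region in pixel coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def select_largest(regions: List[Region]) -> Region:
    """Area condition: pick the region occupying the largest image area."""
    return max(regions, key=lambda r: r.area)


def select_nearest(regions: List[Region], target: Tuple[float, float]) -> Region:
    """Position condition: pick the region whose center is nearest to target."""
    return min(regions, key=lambda r: math.dist(r.center, target))


# Illustrative values only (assumed 640x480 image; y grows downward,
# so the lower-right corner is (width, height)).
image_width, image_height = 640, 480
regions = [
    Region("obj1", 400, 300, 560, 440),
    Region("obj2", 80, 60, 200, 160),
]
lower_right_pick = select_nearest(regions, (image_width, image_height))
center_pick = select_nearest(regions, (image_width / 2, image_height / 2))
largest_pick = select_largest(regions)
```

With these made-up regions, the lower-right condition picks the region labeled obj1, which mirrors the example of the image G103.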

(Step S104)
In step S104, the control unit 11 generates a path to a grip start position for the object selected as the gripping target. Specifically, the control unit 11 determines the grip start position according to the position of the object in real space. The grip start position is the position of the robot arm 30 in real space at the start of the gripping operation. The control unit 11 also generates a path from the current position of the robot arm 30 in real space to the grip start position. A known technique can be applied to generate the path to the grip start position.

(Step S105)
In step S105, the control unit 11 controls the robot arm 30 so that it moves along the determined path.

(Step S106)
In step S106, the acquisition unit 111 acquires, from the imaging device 40, the image to be input to the inference model 221. The image input to the inference model 221 is an image including the object selected as the gripping target. Specifically, the acquisition unit 111 controls one or both of the imaging direction and the angle of view of the imaging device 40 so that the imaging range includes the selected object. For example, the imaging direction and angle of view are preferably controlled so that the imaging range includes the selected object and excludes the unselected objects. However, this does not limit the present embodiment. For example, when the selected object and an unselected object are close to each other, the imaging range may include the unselected object together with the selected object. The control unit 11 then controls the imaging device 40 to image the imaging range, and the acquisition unit 111 acquires the resulting image from the imaging device 40.

FIG. 5 is a diagram showing an example of the image acquired in this step for input to the inference model 221. The image G104 shown in FIG. 5 includes the object obj1 selected in step S103 as a subject and does not include the unselected object obj2 as a subject.

(Step S107)
In step S107, the estimation unit 112 uses the inference model 221 to estimate a plurality of gripping candidate positions of the object included in the image as a subject. Specifically, the estimation unit 112 transmits the image acquired in step S106 to the learning device 20. The learning device 20 inputs the received image to the inference model 221 and transmits the information output from the inference model 221 to the control device 10. The control device 10 takes the plurality of gripping candidate positions indicated by the received information as the estimated gripping candidate positions.

(Inference model 221)
The inference model 221 will now be described in detail. The inference model 221 is a trained model generated by machine learning so that, given an image including an object as a subject as input, it outputs information indicating at least each of a plurality of gripping candidate positions on the object. The information output from the inference model 221 includes information indicating the regions on the image that specify the respective gripping candidate positions.

In this embodiment, the inference model 221 is a CNN (Convolutional Neural Network). FIG. 6 is a diagram schematically showing a configuration example of the inference model 221.

As shown in FIG. 6, the inference model 221 includes an input layer L0, convolutional layers L1 to L5, and fully connected layers L6 to L8. The fully connected layer L8 is the output layer and includes three sublayers L8-1 to L8-3.

An image G including the object obj1 as a subject is input to the input layer L0. The object obj1 included in the image G as a subject is the object selected as the gripping target. The image G does not include the object obj2, which was not selected as the gripping target.

As shown in FIG. 6, the output information g1 output from the sublayer L8-1 includes information indicating a gripping candidate position CP1 and information indicating a gripping success probability p1. The output information g2 output from the sublayer L8-2 includes information indicating a gripping candidate position CP2 and information indicating a gripping success probability p2. The output information g3 output from the sublayer L8-3 includes information indicating a gripping candidate position CP3 and information indicating a gripping success probability p3. The gripping candidate positions CP1 to CP3 indicate mutually different positions. Where there is no particular need to distinguish them, they are referred to simply as the output information g, the gripping candidate position CP, and the gripping success probability p. The number of sublayers L8-1 to L8-3 in the output layer L8 corresponds to the number of gripping candidate positions CP estimated using the inference model 221. In the example shown in FIG. 6 this number is three, but this does not limit the present embodiment. The number of sublayers, that is, the number of estimated gripping candidate positions CP, may be two, or may be four or more.

(Gripping candidate position, bounding box)
The gripping candidate position CP is specified by a region on the image G. In this embodiment, the region specifying the gripping candidate position CP is rectangular. This rectangular region is hereinafter also referred to as a bounding box.

Here, the output information g output from the inference model 221 is expressed by the following equation (1).

g = {x, y, θ, h, w, p} ... (1)

Of the six parameters in equation (1), the five parameters x, y, θ, h, and w represent a bounding box. The remaining parameter p indicates the gripping success probability at the gripping candidate position CP indicated by that bounding box. The gripping success probability p is the probability that the object obj1 is gripped successfully when the robot arm 30 is made to perform the gripping operation at that gripping candidate position CP.
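As a rough illustration of equation (1), each output g can be held in a small record with six fields. The flat list layout used below (six consecutive values per sublayer) and the names `GraspCandidate` and `parse_candidates` are assumptions made for this sketch; the source does not specify how the output is serialized.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class GraspCandidate:
    """One output g = {x, y, theta, h, w, p} from equation (1)."""
    x: float      # bounding-box center, x coordinate on the image
    y: float      # bounding-box center, y coordinate on the image
    theta: float  # inclination of the long side relative to the x-axis
    h: float      # length of the short side
    w: float      # length of the long side
    p: float      # gripping success probability


def parse_candidates(flat: Sequence[float]) -> List[GraspCandidate]:
    """Split a flat sequence into candidates, six values per candidate.

    The flat layout is a hypothetical convention for this sketch.
    """
    if len(flat) % 6 != 0:
        raise ValueError("expected a multiple of six values")
    return [GraspCandidate(*flat[i:i + 6]) for i in range(0, len(flat), 6)]


# Illustrative values only: three candidates, as in the FIG. 6 example.
flat_output = [
    120.0, 80.0, 0.10, 20.0, 60.0, 0.9,
    130.0, 95.0, 0.50, 18.0, 55.0, 0.6,
    100.0, 70.0, 1.20, 22.0, 70.0, 0.3,
]
candidates = parse_candidates(flat_output)
```

Keeping the bounding-box parameters and p together in one record makes it straightforward for later steps to refer to several candidates at once.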

FIG. 7 is a schematic diagram explaining a bounding box. The bounding box BB shown in FIG. 7 is specified by the coordinates (x, y) of its center C, an inclination θ, the length h of its short sides, and the length w of its long sides. Here, the inclination θ is the inclination of the long sides with respect to the x-axis. However, the inclination θ may instead be expressed with reference to another axis defined on the image G.

The two short sides of the bounding box BB indicate the positions at which the finger portions 34a and 34b are placed before the gripping operation starts. Specifically, of the two short sides, the side b1 indicates the range in which the finger portion 34a is placed, and the side b2 indicates the range in which the finger portion 34b is placed.

The length w of the long sides of the bounding box BB represents the distance between the finger portions 34a and 34b when they are placed at the positions described above. In other words, the longer the long sides of the bounding box BB, the wider the hand portion 34 must be opened before the gripping operation starts.
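The geometry of the bounding box BB can be sketched as follows. This is a minimal illustration that assumes θ is given in radians and measured from the image x-axis; which returned segment corresponds to the side b1 and which to b2 is an arbitrary choice made here, and `finger_segments` and `midpoint` are hypothetical helper names.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def finger_segments(x: float, y: float, theta: float,
                    h: float, w: float) -> Tuple[Segment, Segment]:
    """Return the two short sides of the box as line segments.

    The long side runs along u = (cos theta, sin theta); the short sides
    run along the perpendicular v and sit at distance w/2 from the center.
    """
    ux, uy = math.cos(theta), math.sin(theta)
    vx, vy = -uy, ux

    def short_side(sign: float) -> Segment:
        # Midpoint of this short side, offset from the center along u.
        mx = x + sign * (w / 2) * ux
        my = y + sign * (w / 2) * uy
        return ((mx - (h / 2) * vx, my - (h / 2) * vy),
                (mx + (h / 2) * vx, my + (h / 2) * vy))

    return short_side(-1.0), short_side(+1.0)


def midpoint(segment: Segment) -> Point:
    (x0, y0), (x1, y1) = segment
    return ((x0 + x1) / 2, (y0 + y1) / 2)


# Illustrative values only: an axis-aligned box centered at (100, 50).
b1, b2 = finger_segments(100.0, 50.0, 0.0, 20.0, 60.0)
opening = math.dist(midpoint(b1), midpoint(b2))
```

The distance between the midpoints of the two short sides equals w, which is the finger separation described above, and each short side has length h.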

(Step S108)
In step S108, the determination unit 113 refers to the plurality of gripping candidate positions CP to determine the gripping position at which the robot arm 30 is to grip the object. The determined gripping position is represented by a bounding box BB as described above. Details of this step will be described later.

(Step S109)
In step S109, the control unit 11 controls the robot arm 30 so that it grips the object at the determined gripping position. Specifically, the control unit 11 places the hand portion 34 at the determined gripping position and causes the robot arm 30 to perform the gripping operation.

For example, suppose that the bounding box BB shown in FIG. 7 represents the determined gripping position. In this case, the control unit 11 calculates the placement positions in real space corresponding to the two short sides of the bounding box BB. The control unit 11 also calculates the distance in real space corresponding to the length w of the long sides of the bounding box BB. Next, the control unit 11 performs control so that the finger portions 34a and 34b are opened by the calculated distance and placed at the calculated placement positions. The control unit 11 then controls the robot arm 30 to perform the gripping operation. Specifically, the control unit 11 causes the robot arm 30 to grip the object by controlling the hand portion 34 so as to close the finger portions 34a and 34b.
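The source does not state how image quantities are converted to real space. One common possibility, shown here purely as an assumption, is a calibrated pinhole camera viewing the gripping plane at a known depth along its optical axis. The function names and all numeric values below are hypothetical.

```python
from typing import Tuple


def pixel_length_to_metres(length_px: float, depth_m: float,
                           focal_px: float) -> float:
    """Pinhole model: a length seen at depth Z scales by Z / f."""
    return length_px * depth_m / focal_px


def back_project(u: float, v: float, depth_m: float,
                 fx: float, fy: float, cx: float, cy: float
                 ) -> Tuple[float, float, float]:
    """Map pixel (u, v) at depth Z to camera-frame coordinates (X, Y, Z).

    fx, fy are focal lengths in pixels and (cx, cy) is the principal
    point; all are assumed known from camera calibration.
    """
    x_m = (u - cx) * depth_m / fx
    y_m = (v - cy) * depth_m / fy
    return (x_m, y_m, depth_m)


# Illustrative values only: long side w = 120 px seen at 0.5 m depth.
finger_opening_m = pixel_length_to_metres(120.0, 0.5, 600.0)
finger_point = back_project(380.0, 240.0, 0.5, 600.0, 600.0, 320.0, 240.0)
```

The resulting coordinates are in the camera frame; moving the hand portion 34 would additionally require the transform from the camera to the robot arm 30, which is outside this sketch.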

＜推測モデル２２１の生成処理＞
次に、ステップＳ１０７で用いる推測モデル２２１を生成する生成処理について説明する。図８は、推測モデル２２１を生成する処理の詳細な流れを示すフローチャートである。 <Generation processing of inference model 221>
Next, the process of generating the inference model 221 used in step S107 will be described. FIG. 8 is a flowchart showing a detailed flow of the process of generating the inference model 221.

（ステップＳ２０１）
ステップＳ２０１において、学習装置２０の学習部２１１は、教師データとして用いる１または複数の画像を取得する。各画像は、物体を被写体として含む。また、各画像には、複数の把持候補位置ＣＰおよびその把持成功確率ｐをそれぞれ示す情報が関連付けられている。 (Step S201)
In step S201, the learning unit 211 of the learning device 20 acquires one or more images to be used as teacher data. Each image contains an object as a subject. In addition, each image is associated with information indicating a plurality of candidate gripping positions CP and their gripping success probabilities p.

また、学習部２１１は、取得した各画像に事前処理を施してから、教師データとして用いる。また、学習部２１１は、取得した各画像にデータオーギュメンテーション処理を施すことにより、教師データとして用いる画像の数を増加させる。 Also, the learning unit 211 performs preprocessing on each acquired image before using it as teacher data. In addition, the learning unit 211 increases the number of images used as teacher data by performing data augmentation processing on each acquired image.

（事前処理）
例えば、取得された各画像がＲＧＢ形式であるとする。この場合、学習部２１１は、（ｉ）各画像に対して、グレースケール形式に変換する事前処理を行ってもよい。また、学習部２１１は、（ｉｉ）各画像に対して、エッジを検出する事前処理を行ってもよい。また、学習部２１１は、各画像に対して、（ｉ）、（ｉｉ）を組み合わせた事前処理を行ってもよい。なお、学習部２１１は、事前処理を行うことなく、（ｉｉｉ）元のＲＧＢ形式の各画像を教師データとして用いてもよい。 (Preprocessing)
For example, assume that each acquired image is in RGB format. In this case, the learning unit 211 may (i) perform preprocessing that converts each image into a grayscale format. The learning unit 211 may also (ii) perform preprocessing that detects edges in each image. Further, the learning unit 211 may perform preprocessing that combines (i) and (ii) on each image. Note that the learning unit 211 may instead (iii) use each image in the original RGB format as teacher data without performing preprocessing.

一例として、学習部２１１は、ＲＧＢ形式からＧＧＧ形式に変換した各画像を教師データとして用いてもよい。ここで、ＧＧＧ形式とは、３つのチャンネル（Ｇ、Ｇ、およびＧ）の各々にグレースケール画像を格納した形式である。各チャンネルが表すグレースケール画像は、同一のＲＧＢ形式の画像から生成された、互いに異なるグレースケール画像である。例えば、あるチャンネルが示すグレースケール画像は、他のチャンネルが示すグレースケール画像の明度を変更したものであってもよい。また、各チャンネルが示すグレースケール画像は、元のＲＧＢ形式の画像に対して、互いに異なるグレースケール変換処理を施すことにより生成されたものであってもよい。 As an example, the learning unit 211 may use each image converted from RGB format to GGG format as teacher data. Here, the GGG format is a format in which a grayscale image is stored in each of three channels (G, G, and G). The grayscale images represented by the channels are different grayscale images generated from the same RGB format image. For example, a grayscale image indicated by one channel may be obtained by changing the brightness of a grayscale image indicated by another channel. Also, the grayscale image indicated by each channel may be generated by performing different grayscale conversion processes on the original RGB format image.

また、他の例として、学習部２１１は、ＲＧＢ形式からＣＣＧ形式に変換した各画像を教師データとして用いてもよい。ここで、ＣＣＧ形式とは、３つのチャンネルのうち２つのチャンネルの各々にエッジ画像を格納し、他の１つのチャンネル（Ｇ）にグレースケール画像を格納した形式である。例えば、２つのチャンネル（Ｃ、およびＣ）の各々が示すエッジ画像は、元のＲＧＢ形式の画像に対して、互いに異なるエッジ検出処理を施すことにより生成されたものであってもよい。 As another example, the learning unit 211 may use each image converted from RGB format to CCG format as teacher data. Here, the CCG format is a format in which an edge image is stored in each of two channels out of three channels, and a grayscale image is stored in the other one channel (G). For example, the edge images indicated by each of the two channels (C and C) may be generated by performing different edge detection processes on the original RGB format image.
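The GGG and CCG formats described above can be illustrated with a small pure-Python sketch. The specific choices here are assumptions made for illustration: the three grayscale variants (channel mean, a weighted sum with the ITU-R BT.601 luma weights, and a darkened copy) and the two edge detectors (horizontal and vertical forward differences) are stand-ins, since the publication does not name particular conversion or edge detection methods. All function names are hypothetical.

```python
def to_gray(rgb, weights):
    """Weighted grayscale conversion of an image given as rows of (r, g, b)."""
    wr, wg, wb = weights
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in rgb]


def edges(gray, dx, dy):
    """Absolute forward difference along (dx, dy); uncovered border pixels stay 0."""
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height - dy):
        for x in range(width - dx):
            out[y][x] = abs(gray[y + dy][x + dx] - gray[y][x])
    return out


def stack(c0, c1, c2):
    """Combine three single-channel images into one three-channel image."""
    return [[(a, b, c) for a, b, c in zip(r0, r1, r2)]
            for r0, r1, r2 in zip(c0, c1, c2)]


def to_ggg(rgb):
    """Three mutually different grayscale images, one per channel."""
    mean = to_gray(rgb, (1 / 3, 1 / 3, 1 / 3))
    luma = to_gray(rgb, (0.299, 0.587, 0.114))
    dark = [[0.5 * v for v in row] for row in luma]  # brightness-changed copy
    return stack(mean, luma, dark)


def to_ccg(rgb):
    """Two different edge images plus one grayscale image."""
    gray = to_gray(rgb, (0.299, 0.587, 0.114))
    return stack(edges(gray, 1, 0), edges(gray, 0, 1), gray)
```

Both functions keep the three-channel layout of the original RGB image, so a model that expects three input channels could consume either format without structural changes.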

（データオーギュメンテーション）
図９は、教師データとして用いられる画像の一例を示す模式図である。図９において、画像Ｇ２～Ｇ８は、画像Ｇ１に対してデータオーギュメンテーション処理を施して生成した画像である。 (data augmentation)
FIG. 9 is a schematic diagram showing an example of an image used as teacher data. In FIG. 9, images G2 to G8 are images generated by performing data augmentation processing on image G1.

ここで、画像Ｇ１は、学習部２１１が取得した画像（例えば、ＲＧＢ形式）、または、上述した事前処理を施した画像（例えば、ＧＧＧ形式、またはＣＣＧ形式）である。画像Ｇ１は、物体ｏｂｊ２を被写体として含む。また、画像Ｇ１には、５つの把持候補位置ＣＰを示すバウンディングボックスＢＢ１１～ＢＢ１５が関連付けられている。また、図示はしていないが、各バウンディングボックスＢＢ１１～ＢＢ１５には、それぞれ、把持成功確率ｐが関連付けられている。 Here, the image G1 is an image acquired by the learning unit 211 (for example, in RGB format) or an image that has undergone the preprocessing described above (for example, in GGG format or CCG format). The image G1 includes an object obj2 as a subject. The image G1 is also associated with bounding boxes BB11 to BB15 indicating five candidate gripping positions CP. Although not shown, each of the bounding boxes BB11 to BB15 is associated with a gripping success probability p.

具体的には、画像Ｇ２は、画像Ｇ１を水平反転させることにより生成された画像である。画像Ｇ２に対して関連付けられるバウンディングボックスＢＢ２１～ＢＢ２５は、画像Ｇ１上に示されたバウンディングボックスＢＢ１１～ＢＢ１５を同様に水平反転させることにより生成される。 Specifically, the image G2 is an image generated by horizontally reversing the image G1. Bounding boxes BB21-BB25 associated with image G2 are generated by similarly horizontally reversing bounding boxes BB11-BB15 shown on image G1.

また、画像Ｇ３は、画像Ｇ１を垂直反転させることにより生成された画像である。画像Ｇ３に対して関連付けられるバウンディングボックスＢＢ３１～ＢＢ３５は、画像Ｇ１上に示されたバウンディングボックスＢＢ１１～ＢＢ１５を同様に垂直反転させることにより生成される。 An image G3 is an image generated by vertically inverting the image G1. The bounding boxes BB31-BB35 associated with image G3 are generated by similarly vertically flipping the bounding boxes BB11-BB15 shown on image G1.

また、画像Ｇ４は、画像Ｇ１を回転させることにより生成された画像である。画像Ｇ４に対して関連付けられるバウンディングボックスＢＢ４１～ＢＢ４５は、画像Ｇ１上に示されたバウンディングボックスＢＢ１１～ＢＢ１５を同様に回転させることにより生成される。 An image G4 is an image generated by rotating the image G1. The bounding boxes BB41-BB45 associated with image G4 are generated by similarly rotating the bounding boxes BB11-BB15 shown on image G1.

また、画像Ｇ５は、画像Ｇ１を移動させることにより生成された画像である。画像Ｇ５に対して関連付けられるバウンディングボックスＢＢ５１～ＢＢ５５は、画像Ｇ１上に示されたバウンディングボックスＢＢ１１～ＢＢ１５を同様に移動させることにより生成される。 An image G5 is an image generated by moving the image G1. Bounding boxes BB51-BB55 associated with image G5 are generated by similarly moving bounding boxes BB11-BB15 shown on image G1.

また、画像Ｇ６は、画像Ｇ１を拡大することにより生成された画像である。画像Ｇ６に対して関連付けられるバウンディングボックスＢＢ６１～ＢＢ６５は、画像Ｇ１上に示されたバウンディングボックスＢＢ１１～ＢＢ１５を同様に拡大することにより生成される。 An image G6 is an image generated by enlarging the image G1. The bounding boxes BB61-BB65 associated with image G6 are generated by similarly enlarging the bounding boxes BB11-BB15 shown on image G1.

また、画像Ｇ７は、画像Ｇ１を縮小することにより生成された画像である。画像Ｇ７に対して関連付けられるバウンディングボックスＢＢ７１～ＢＢ７５は、画像Ｇ１上に示されたバウンディングボックスＢＢ１１～ＢＢ１５を同様に縮小することにより生成される。 An image G7 is an image generated by reducing the image G1. The bounding boxes BB71-BB75 associated with image G7 are generated by similarly reducing the bounding boxes BB11-BB15 shown on image G1.

また、画像Ｇ８は、画像Ｇ１から切り出すことにより生成された画像である。画像Ｇ８に対して関連付けられるバウンディングボックスＢＢ８１～ＢＢ８５は、画像Ｇ１上に示されたバウンディングボックスＢＢ１１～ＢＢ１５から同様に切り出すことにより生成される。 An image G8 is an image generated by cutting out from the image G1. Bounding boxes BB81 to BB85 associated with image G8 are generated by similarly cutting out bounding boxes BB11 to BB15 shown on image G1.

学習部２１１は、このように、取得した画像Ｇ１に対して事前処理およびデータオーギュメンテーション処理を施した画像Ｇ１～Ｇ８を、教師データとして用いる。 The learning unit 211 uses the images G1 to G8 obtained by subjecting the acquired image G1 to preprocessing and data augmentation processing in this way as teacher data.
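When an image is augmented, its bounding boxes have to be transformed in the same way. A minimal sketch of four of these transformations follows, assuming each box is a tuple (x, y, w, h, theta) in continuous image coordinates where the image spans 0 to its width and height, and theta is the orientation of the long side. The tuple layout and the function names are assumptions for illustration. The gripping success probability p attached to a box is not affected by these transformations.

```python
def flip_horizontal(box, image_width):
    """Mirror a box about the vertical centre line of the image."""
    x, y, w, h, theta = box
    # A mirrored line orientation pi - theta is the same line as -theta.
    return (image_width - x, y, w, h, -theta)


def flip_vertical(box, image_height):
    """Mirror a box about the horizontal centre line of the image."""
    x, y, w, h, theta = box
    return (x, image_height - y, w, h, -theta)


def translate(box, dx, dy):
    """Move a box by the same offset as the image content."""
    x, y, w, h, theta = box
    return (x + dx, y + dy, w, h, theta)


def scale(box, factor, cx, cy):
    """Enlarge (factor > 1) or reduce (factor < 1) a box about the point (cx, cy)."""
    x, y, w, h, theta = box
    return (cx + factor * (x - cx), cy + factor * (y - cy),
            factor * w, factor * h, theta)
```

Rotation and cropping follow the same idea: whatever geometric map is applied to the pixels is also applied to the box centre, side lengths and orientation.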

（ステップＳ２０２）
図８のステップＳ２０２において、学習部２１１は、各画像について、関連付けられた複数の把持候補位置ＣＰのうち所定数を正解として選択する。所定数は、推測モデル２２１から出力する把持候補位置ＣＰの個数であり、ここでは、３である。また、所定数の把持候補位置ＣＰを選択する手法は、ここでは、ランダムであるとするが、その他の手法により所定数の把持候補位置ＣＰを選択してもよい。図９の例では、学習部２１１は、各画像Ｇｉ（ｉ＝１、２、・・・、８）について、バウンディングボックスＢＢｉ１～ＢＢｉ５のうちランダムに３つを正解として選択する。 (Step S202)
In step S202 of FIG. 8, the learning unit 211 selects, for each image, a predetermined number of the candidate gripping positions CP associated with that image as correct answers. The predetermined number is the number of candidate gripping positions CP that the inference model 221 outputs, and is 3 here. The predetermined number of candidate gripping positions CP is selected at random here, but another selection method may be used. In the example of FIG. 9, the learning unit 211 randomly selects three of the bounding boxes BBi1 to BBi5 as correct answers for each image Gi (i=1, 2, ..., 8).

（ステップＳ２０３）
ステップＳ２０３において、学習部２１１は、ステップＳ２０２で選択した複数の把持候補位置ＣＰを正解として、推測モデル２２１を学習させる。具体的には、学習部２１１は、画像Ｇ１～Ｇ８をそれぞれ入力として、正解として選択した３つのバウンディングボックスＢＢおよびその把持成功確率ｐを出力するよう、推測モデル２２１を学習させる。 (Step S203)
In step S203, the learning unit 211 trains the inference model 221 using the plurality of candidate gripping positions CP selected in step S202 as correct answers. Specifically, the learning unit 211 trains the inference model 221 so that, given each of the images G1 to G8 as input, it outputs the three bounding boxes BB selected as correct answers together with their gripping success probabilities p.

（ステップＳ２０４）
ステップＳ２０４において、学習部２１１は、学習を終了するか否かを判断する。ステップＳ２０４でＮｏと判断された場合、学習部２１１は、ステップＳ２０２からの処理を繰り返す。例えば、ステップＳ２０４では、繰り返し回数が閾値を超えたか否かに基づいて、学習を終了するか否かを判断してもよい。また、ステップＳ２０４では、入力装置を介して入力されるユーザの指示に基づいて、学習を終了するか否かを判断してもよい。 (Step S204)
In step S204, the learning unit 211 determines whether or not to end learning. If it is determined No in step S204, the learning unit 211 repeats the process from step S202. For example, in step S204, it may be determined whether or not to end learning based on whether or not the number of repetitions exceeds a threshold. Further, in step S204, it may be determined whether or not to end the learning based on the user's instruction input via the input device.

ここで、繰り返し処理においてステップＳ２０２でランダムに選択される所定数の把持候補位置ＣＰは、前回のステップＳ２０２で選択された所定数の把持候補位置ＣＰとは異なる可能性が高い。したがって、学習装置２０は、同一の画像について正解となる把持候補位置ＣＰの組み合わせを変えながら学習を繰り返すことができ、推測モデル２２１の推測精度を向上させることができる。 Here, the predetermined number of candidate gripping positions CP randomly selected in step S202 during the repeated process is highly likely to differ from the set selected in the previous execution of step S202. Therefore, the learning device 20 can repeat learning while changing the combination of candidate gripping positions CP treated as correct answers for the same image, which can improve the inference accuracy of the inference model 221.
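The loop over steps S202 to S204 can be sketched as follows. This is an outline under assumptions rather than the actual training code: `update_model` is a hypothetical callback standing in for one training update of the inference model, the labels of an image are assumed to be a list of (bounding box, p) pairs, and the stopping rule shown is the iteration-count threshold mentioned for step S204.

```python
import random


def select_targets(labels, k=3, rng=random):
    """S202: pick k of an image's labelled (box, p) pairs as correct answers."""
    return rng.sample(labels, k)


def train(dataset, update_model, max_iterations, k=3, seed=0):
    """Repeat S202 and S203 until the iteration count reaches the threshold (S204).

    dataset: iterable of (image, labels) pairs.
    update_model: hypothetical callback performing one training update.
    """
    rng = random.Random(seed)
    for _ in range(max_iterations):
        for image, labels in dataset:
            targets = select_targets(labels, k, rng)  # new random subset each round
            update_model(image, targets)
```

Because the subset is redrawn on every pass, the same image is presented with different combinations of correct answers over the course of training.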

＜把持位置の決定処理＞
次に、ステップＳ１０８における把持位置の決定処理の詳細について説明する。図１０は、把持位置の決定処理の詳細な流れを示すフローチャートである。 <Processing for Determining Gripping Position>
Next, the details of the process of determining the gripping position in step S108 will be described. FIG. 10 is a flowchart showing a detailed flow of gripping position determination processing.

（ステップＳ３０１）
ステップＳ３０１において、推測部１１２は、複数の把持候補位置ＣＰの各々に関する把持成功確率ｐを取得する。具体的には、推測部１１２は、ステップＳ１０７において推測モデル２２１から出力された出力情報ｇを参照し、当該出力情報ｇに含まれる把持成功確率ｐを取得すればよい。 (Step S301)
In step S301, the estimation unit 112 acquires a gripping success probability p for each of the gripping candidate positions CP. Specifically, the estimation unit 112 may refer to the output information g output from the estimation model 221 in step S107 and acquire the gripping success probability p included in the output information g.

（ステップＳ３０２）
ステップＳ３０２において、決定部１１３は、把持成功確率ｐを参照して把持位置を決定する。例えば、決定部１１３は、把持成功確率ｐが最大の把持候補位置ＣＰを、把持位置として決定する。 (Step S302)
In step S302, the determining unit 113 refers to the gripping success probability p to determine the gripping position. For example, the determination unit 113 determines the gripping candidate position CP with the highest gripping success probability p as the gripping position.

以上のように、本実施形態に係る把持システム１は、推測モデル２２１を用いて複数の把持候補位置ＣＰおよび各位置の把持成功確率ｐを推測し、そのうち、把持成功確率ｐが最大の把持候補位置ＣＰを把持位置として決定する。これにより、把持システム１は、物体においてロボットアーム３０に把持させる把持位置として、把持に成功する可能性がより高い把持位置を決定することができる。 As described above, the gripping system 1 according to the present embodiment uses the inference model 221 to infer a plurality of candidate gripping positions CP and the gripping success probability p of each position, and determines the candidate gripping position CP with the highest gripping success probability p as the gripping position. As a result, the gripping system 1 can determine, as the position at which the robot arm 30 grips the object, a gripping position that is more likely to result in a successful grip.
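The selection in step S302 amounts to taking the candidate with the largest success probability. A minimal sketch, assuming the model output has already been unpacked into (bounding box, p) pairs (the pair layout and the function name are assumptions):

```python
def choose_by_probability(candidates):
    """Return the (bounding_box, p) pair with the largest success probability p.

    If several candidates share the largest p, the first one is returned.
    """
    return max(candidates, key=lambda candidate: candidate[1])
```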

〔変形例２〕
上述した実施形態に係る把持システム１は、決定部１１３による把持位置の決定処理を、以下の通り変形することが可能である。 [Modification 2]
In the gripping system 1 according to the above-described embodiment, the gripping position determination processing by the determination unit 113 can be modified as follows.

決定部１１３は、複数の把持候補位置ＣＰの各々について、当該把持候補位置ＣＰを特定する画像上の領域（バウンディングボックスＢＢ）と、画像上で物体を示す物体領域との関係性に応じた評価値を算出する。また、決定部１１３は、算出した評価値を参照して把持位置を決定する。ここで、バウンディングボックスＢＢと物体領域との関係性とは、（ｉ）バウンディングボックスＢＢの面積と、当該バウンディングボックスＢＢにおいて物体領域が占める部分領域の面積との関係、（ｉｉ）バウンディングボックスＢＢが沿う第１方向と、上述した部分領域が沿う第２方向との関係、および（ｉｉｉ）バウンディングボックスＢＢの中心と、上述した部分領域の中心との関係、のうち一部または全部を含む。これらの関係性の詳細については後述する。 For each of the plurality of candidate gripping positions CP, the determination unit 113 calculates an evaluation value according to the relationship between the area on the image that specifies the candidate gripping position CP (bounding box BB) and the object area that indicates the object on the image. The determination unit 113 then determines the gripping position with reference to the calculated evaluation values. Here, the relationship between the bounding box BB and the object area includes some or all of the following: (i) the relationship between the area of the bounding box BB and the area of the partial area that the object area occupies within the bounding box BB, (ii) the relationship between the first direction along which the bounding box BB extends and the second direction along which the partial area extends, and (iii) the relationship between the center of the bounding box BB and the center of the partial area. Details of these relationships will be described later.

本変形例に係る把持システム１では、図２のステップＳ１０８における把持位置の決定処理が、以下のように変形される。図１１は、本変形例における把持位置の決定処理の詳細な流れを示すフローチャートである。制御装置１０の決定部１１３は、ステップＳ１０７で得られた複数の把持候補位置ＣＰの各々について、ステップＳ４０１～Ｓ４０５の処理を実行する。 In the gripping system 1 according to this modified example, the gripping position determination process in step S108 of FIG. 2 is modified as follows. FIG. 11 is a flowchart showing the detailed flow of the gripping position determination process in this modified example. The determination unit 113 of the control device 10 executes the processes of steps S401 to S405 for each of the plurality of candidate gripping positions CP obtained in step S107.

（ステップＳ４０１）
ステップＳ４０１において、決定部１１３は、当該把持候補位置ＣＰを特定するバウンディングボックスＢＢと上述した部分領域との関係を表す情報を求める。具体的には、決定部１１３は、当該関係を表す情報として、面積比α２を算出する。 (Step S401)
In step S401, the determination unit 113 obtains information representing the relationship between the bounding box BB that specifies the gripping candidate position CP and the above partial area. Specifically, the determination unit 113 calculates the area ratio α2 as information representing the relationship.

図１２は、バウンディングボックスＢＢと物体領域ＡＡとの関係性を説明するための模式図である。図１２において、面積比α２は、バウンディングボックスＢＢの面積に対する部分領域Ａの面積の割合である。部分領域Ａは、物体領域ＡＡ（太線で囲まれた領域）のうち、バウンディングボックスＢＢに含まれる部分である。決定部１１３は、バウンディングボックスＢＢにおける部分領域Ａを検出し、面積比α２を算出する。面積比α２は、次式（２）によって算出される。 FIG. 12 is a schematic diagram for explaining the relationship between the bounding box BB and the object area AA. In FIG. 12, the area ratio α2 is the ratio of the area of the partial region A to the area of the bounding box BB. A partial area A is a portion of the object area AA (the area surrounded by a thick line) that is included in the bounding box BB. Determination unit 113 detects partial region A in bounding box BB and calculates area ratio α2. The area ratio α2 is calculated by the following equation (2).

α２＝[部分領域Ａの面積]／［ｗ＊ｈ］・・・（２）
式（２）において、「／」は除算を表し、「＊」は乗算を表す。ｗ、ｈは、バウンディングボックスＢＢの長辺および短辺の長さである。式（２）により算出される面積比α２がとりうる範囲は、０以上１以下である。 α2=[area of partial region A]/[w*h] (2)
In equation (2), "/" represents division and "*" represents multiplication. w and h are the lengths of the long and short sides of the bounding box BB. The possible range of the area ratio α2 calculated by Equation (2) is 0 or more and 1 or less.

ここで、上述した面積比α２は、当該バウンディングボックスＢＢが特定する把持候補位置ＣＰを評価する指標となる。具体的には、面積比α２は、把持動作速度に影響を与える。 Here, the area ratio α2 described above serves as an index for evaluating the gripping candidate position CP specified by the bounding box BB. Specifically, the area ratio α2 affects the gripping motion speed.

例えば、面積比α２が小さいほど、把持動作速度が遅くなると考えられる。ここで、把持動作速度とは、指部３４ａ、３４ｂが閉じる動作を開始してから物体表面に接触するまでの時間の長さである。面積比α２が小さいほど、バウンディングボックスＢＢの少なくとも一方の短辺から部分領域Ａの境界線までの距離が長くなり、指部３４ａ、３４ｂの少なくとも一方が物体表面に接触するまでの時間が長くなる。 For example, the smaller the area ratio α2, the slower the gripping motion is considered to be. Here, the gripping motion speed refers to the length of time from when the finger portions 34a and 34b start closing until they contact the object surface. The smaller the area ratio α2, the longer the distance from at least one short side of the bounding box BB to the boundary of the partial area A, and the longer it takes for at least one of the finger portions 34a and 34b to contact the object surface.

したがって、面積比α２が大きいほど、把持動作速度が向上するため、把持候補位置ＣＰの評価が高くなる。 Therefore, the larger the area ratio α2, the higher the gripping motion speed, and the higher the evaluation of the gripping candidate position CP.
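The area ratio α2 can be approximated on a pixel grid as sketched here. The assumptions are: the object area AA is available as a binary mask (rows of 0/1 values), the box is a tuple (x, y, w, h, theta) in pixel coordinates, and areas are estimated by counting pixel centres. The denominator is the number of pixel centres inside the box, which approximates w*h in equation (2) and keeps the result between 0 and 1. The function name and data layout are hypothetical.

```python
import math


def area_ratio(mask, box):
    """Approximate alpha2 = area(partial region A) / area(bounding box BB)."""
    x, y, w, h, theta = box
    c, s = math.cos(theta), math.sin(theta)
    inside_box = 0
    inside_object = 0
    for row, line in enumerate(mask):
        for col, is_object in enumerate(line):
            dx, dy = col + 0.5 - x, row + 0.5 - y
            u = dx * c + dy * s    # offset along the long side
            v = -dx * s + dy * c   # offset along the short side
            if abs(u) <= w / 2 and abs(v) <= h / 2:
                inside_box += 1
                if is_object:
                    inside_object += 1
    # A box that covers no pixel centre is treated as containing no object.
    return inside_object / inside_box if inside_box else 0.0
```

Scanning the whole mask keeps the sketch short. A practical version would restrict the loop to the axis-aligned extent of the box.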

（ステップＳ４０２）
ステップＳ４０２において、決定部１１３は、当該把持候補位置ＣＰを特定するバウンディングボックスＢＢについて、バウンディングボックスＢＢが沿う第１方向と部分領域Ａが沿う第２方向との関係を表す情報を求める。具体的には、決定部１１３は、当該関係を表す情報として、把持角度α３を算出する。図１２に示す例では、把持角度α３は、第１方向ｄ１と第２方向ｄ２とがなす角度である。把持角度α３は、次式（３）により算出される。 (Step S402)
In step S402, the determination unit 113 obtains, for the bounding box BB that specifies the candidate gripping position CP, information representing the relationship between the first direction along which the bounding box BB extends and the second direction along which the partial area A extends. Specifically, the determination unit 113 calculates the gripping angle α3 as information representing this relationship. In the example shown in FIG. 12, the gripping angle α3 is the angle formed by the first direction d1 and the second direction d2. The gripping angle α3 is calculated by the following equation (3).

α３＝ａｒｃｃｏｓ［（ｄ１・ｄ２）／（｜ｄ１｜＊｜ｄ２｜）］・・・（３）
ここで、「・」は内積を表す。また、「｜ｄ１｜」は、第１方向ｄ１（ベクトルｄ１）の大きさを表し、「｜ｄ２｜」は、第２方向ｄ２（ベクトルｄ２）の大きさを表す。 α3=arccos[(d1·d2)/(|d1|*|d2|)] (3)
Here, "·" represents an inner product. "|d1|" represents the magnitude of the first direction d1 (vector d1), and "|d2|" represents the magnitude of the second direction d2 (vector d2).

把持角度α３を算出するため、決定部１１３は、第１方向ｄ１として、バウンディングボックスＢＢの長辺が沿う方向を検出する。第１方向ｄ１は、ロボットアーム３０が把持動作を行う方向（ここでは、ハンド部３４の指部３４ａ、３４ｂの開閉方向）に相当する。また、決定部１１３は、第２方向ｄ２として、部分領域Ａが沿う方向を検出する。第２方向ｄ２は、把持候補位置ＣＰにおける物体の軸方向に相当する。第２方向ｄ２を検出する手法としては、画像を用いて物体の軸方向を検出する公知の技術を採用可能である。なお、第１方向ｄ１および第２方向ｄ２は、上述したα３が０以上π／２以下となるように検出されるものとする。 In order to calculate the gripping angle α3, the determination unit 113 detects the direction along the long side of the bounding box BB as the first direction d1. The first direction d1 corresponds to the direction in which the robot arm 30 performs a gripping operation (here, the opening and closing direction of the finger portions 34a and 34b of the hand portion 34). Further, the determination unit 113 detects the direction along which the partial area A extends as the second direction d2. The second direction d2 corresponds to the axial direction of the object at the gripping candidate position CP. As a method of detecting the second direction d2, a known technique of detecting the axial direction of an object using an image can be adopted. It should be noted that the first direction d1 and the second direction d2 are detected so that the aforementioned α3 is 0 or more and π/2 or less.

ここで、上述した把持角度α３は、当該バウンディングボックスＢＢが特定する把持候補位置ＣＰを評価する指標となる。例えば、当該把持角度α３がπ／２に近いほど、ハンド部３４の開閉方向と物体ｏｂｊの軸方向とが直交に近くなり、把持が容易になると考えられる。また、把持角度α３が０に近いほど、ハンド部３４の開閉方向と物体ｏｂｊの軸方向とが並行に近くなり、把持が難しくなると考えられる。したがって、把持角度α３が大きいほど、把持候補位置ＣＰの評価が高くなる。 Here, the gripping angle α3 described above serves as an index for evaluating the candidate gripping position CP specified by the bounding box BB. For example, the closer the gripping angle α3 is to π/2, the closer the opening/closing direction of the hand unit 34 is to being orthogonal to the axial direction of the object obj, which is considered to make gripping easier. Conversely, the closer the gripping angle α3 is to 0, the closer the opening/closing direction of the hand unit 34 is to being parallel to the axial direction of the object obj, which is considered to make gripping more difficult. Therefore, the larger the gripping angle α3, the higher the evaluation of the candidate gripping position CP.
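The gripping angle between the two directions can be computed from the inner product, as in the following sketch. One assumption is made explicit: the absolute value of the inner product is used so that the result always lies between 0 and π/2, which treats d1 and d2 as undirected lines. The text instead states that d1 and d2 are detected so that α3 falls in that range. The function name is hypothetical.

```python
import math


def gripping_angle(d1, d2):
    """Angle in [0, pi/2] between two 2D directions given as (dx, dy) vectors."""
    norm = math.hypot(*d1) * math.hypot(*d2)
    if norm == 0.0:
        raise ValueError("direction vectors must be non-zero")
    dot = d1[0] * d2[0] + d1[1] * d2[1]
    # abs() folds opposite-pointing vectors onto the same line orientation;
    # min() guards against rounding slightly above 1.
    return math.acos(min(1.0, abs(dot) / norm))
```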

（ステップＳ４０３）
ステップＳ４０３において、決定部１１３は、当該把持候補位置ＣＰを特定するバウンディングボックスＢＢについて、当該バウンディングボックスＢＢの中心と部分領域Ａの中心との関係を表す情報を求める。具体的には、決定部１１３は、これらの中心間の関係を表す情報として、中心距離α４を求める。図１２に示す例では、中心距離α４は、バウンディングボックスＢＢの中心Ｃ１と部分領域Ａの中心Ｃ２との距離である。中心距離α４は、次式（４）によって算出される。 (Step S403)
In step S403, the determination unit 113 obtains, for the bounding box BB that specifies the candidate gripping position CP, information representing the relationship between the center of the bounding box BB and the center of the partial area A. Specifically, the determination unit 113 obtains the center distance α4 as information representing the relationship between these centers. In the example shown in FIG. 12, the center distance α4 is the distance between the center C1 of the bounding box BB and the center C2 of the partial area A. The center distance α4 is calculated by the following equation (4).

α４＝√［（ｘ１－ｘ２）＾２＋（ｙ１－ｙ２）＾２］・・・（４）
ここで、図１２に示すように、ｘ１，ｙ１は中心Ｃ１の座標であり、ｘ２，ｙ２は中心Ｃ２の座標である。式（４）によって算出される中心距離α４がとりうる範囲は、０以上である。 α4=√[(x1-x2)^2+(y1-y2)^2] (4)
Here, as shown in FIG. 12, x1, y1 are the coordinates of the center C1, and x2, y2 are the coordinates of the center C2. The possible range of the center distance α4 calculated by equation (4) is 0 or more.

決定部１１３は、中心Ｃ１の座標（ｘ１，ｙ１）として、当該バウンディングボックスＢＢを示す６つのパラメータに含まれるパラメータｘ，ｙの値を取得する。また、決定部１１３は、中心Ｃ２の座標（ｘ２，ｙ２）として、物体ｏｂｊの重心に相当する画像Ｇ上の座標を検出する。重心に相当する座標を検出する手法としては、画像を用いて物体の重心を検出する公知の技術を採用可能である。 The determination unit 113 acquires the values of the parameters x and y included in the six parameters indicating the bounding box BB as the coordinates (x1, y1) of the center C1. The determining unit 113 also detects the coordinates on the image G corresponding to the center of gravity of the object obj as the coordinates (x2, y2) of the center C2. As a method of detecting the coordinates corresponding to the center of gravity, a known technique of detecting the center of gravity of an object using an image can be adopted.

ここで、中心距離α４は、当該バウンディングボックスＢＢが特定する把持候補位置ＣＰを評価する指標となる。例えば、中心距離α４が０に近いほど、指部３４ａ、３４ｂを開いた状態のハンド部３４の中心付近に物体が存在する可能性が高い。このため、把持が容易になると考えられる。したがって、中心距離α４が小さいほど、把持候補位置ＣＰの評価が高くなる。 Here, the center distance α4 is an index for evaluating the gripping candidate position CP specified by the bounding box BB. For example, the closer the center distance α4 is to 0, the more likely there is an object near the center of the hand 34 with the fingers 34a and 34b opened. For this reason, it is considered that gripping becomes easier. Therefore, the smaller the center distance α4, the higher the evaluation of the candidate gripping position CP.
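The center distance is the Euclidean distance between the two centers. In the sketch below, the center C2 is taken as the centroid of the pixels of a binary object mask. This is one simple stand-in for the "known technique" of detecting a center of gravity from an image and is an assumption of this example, as are the function names.

```python
import math


def mask_centroid(mask):
    """Centroid (x, y) of the non-zero pixels of a binary mask, using pixel centres."""
    sum_x = sum_y = count = 0
    for row, line in enumerate(mask):
        for col, value in enumerate(line):
            if value:
                sum_x += col + 0.5
                sum_y += row + 0.5
                count += 1
    if count == 0:
        raise ValueError("mask contains no object pixels")
    return sum_x / count, sum_y / count


def centre_distance(c1, c2):
    """alpha4: distance between box centre C1 and region centre C2."""
    return math.hypot(c1[0] - c2[0], c1[1] - c2[1])
```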

（ステップＳ４０４）
ステップＳ４０４において、決定部１１３は、当該把持候補位置ＣＰを特定するバウンディングボックスＢＢについて把持成功確率α１を取得する。決定部１１３は、ステップＳ１０７において推測モデル２２１から出力される出力情報ｇを参照して、把持成功確率ｐを取得すればよい。 (Step S404)
In step S404, the determination unit 113 acquires the gripping success probability α1 for the bounding box BB that specifies the candidate gripping position CP. Specifically, the determination unit 113 refers to the output information g output from the inference model 221 in step S107 and uses the gripping success probability p included in it as α1.

（ステップＳ４０５）
ステップＳ４０５において、決定部１１３は、当該把持候補位置ＣＰを評価する評価値αを算出する。評価値αは、次式（５）によって算出される。 (Step S405)
In step S405, the determining unit 113 calculates an evaluation value α for evaluating the gripping candidate position CP. The evaluation value α is calculated by the following equation (5).

α＝ｗ１＊α１＋ｗ２＊α２＋ｗ３＊α３＋ｗ４＊α４・・・（５）
ここで、ｗ１、ｗ２、ｗ３、ｗ４は、重み付け係数である。また、「＊」は乗算を表す。ここでは、本実施形態では、評価値は大きいほど評価が高い、すなわち、大きいほど把持が容易であるとする。この場合、把持成功確率α１は大きいほど評価が高いため、係数ｗ１は正である。また、面積比α２は大きいほど評価が高いため、係数ｗ２は正である。また、把持角度α３は大きい（π／２に近い）ほど評価が高いため、係数ｗ３は正である。また、中心距離α４は小さい（０に近い）ほど評価が高いため、係数ｗ４は負である。なお、評価値を算出する計算式は、上述した式（５）に限定されない。 α=w1*α1+w2*α2+w3*α3+w4*α4 (5)
Here, w1, w2, w3, and w4 are weighting factors. Also, "*" represents multiplication. Here, in this embodiment, it is assumed that the larger the evaluation value, the higher the evaluation, that is, the larger the evaluation value, the easier the grip. In this case, the larger the gripping success probability α1 is, the higher the evaluation is, so the coefficient w1 is positive. Also, since the larger the area ratio α2, the higher the evaluation, the coefficient w2 is positive. Also, the larger the gripping angle α3 (closer to π/2), the higher the evaluation, so the coefficient w3 is positive. Also, the smaller the center distance α4 (closer to 0), the higher the evaluation, so the coefficient w4 is negative. Note that the calculation formula for calculating the evaluation value is not limited to the formula (5) described above.

各把持候補位置ＣＰについてステップＳ４０１～Ｓ４０５の処理が完了すると、次のステップＳ４０６の処理が実行される。 When the processing of steps S401 to S405 is completed for each gripping candidate position CP, the processing of the next step S406 is executed.

（ステップＳ４０６）
ステップＳ４０６において、決定部１１３は、評価値αを参照して把持位置を決定する。例えば、決定部１１３は、評価値αが最大の把持候補位置ＣＰを、把持位置として決定する。 (Step S406)
In step S406, the determination unit 113 refers to the evaluation value α to determine the gripping position. For example, the determination unit 113 determines the gripping candidate position CP with the largest evaluation value α as the gripping position.

以上のように、本変形例に係る把持システム１は、推測モデル２２１を用いて複数の把持候補位置ＣＰを推測してそれぞれの評価値を算出し、そのうち、評価値が最大の把持候補位置ＣＰを把持位置として決定する。これにより、把持システム１は、物体においてロボットアーム３０に把持させる把持位置として、評価がより高い把持位置を決定することができる。 As described above, the gripping system 1 according to this modification uses the inference model 221 to infer a plurality of candidate gripping positions CP, calculates an evaluation value for each of them, and determines the candidate gripping position CP with the largest evaluation value as the gripping position. As a result, the gripping system 1 can determine a more highly evaluated gripping position as the position at which the robot arm 30 grips the object.
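Equation (5) and the selection in step S406 can be sketched as a weighted sum followed by a maximum. The default weights below are placeholders chosen only to respect the signs stated in the text (w1, w2, w3 positive and w4 negative); the publication gives no numerical values. The candidate layout (box, α1, α2, α3, α4) and the function names are also assumptions.

```python
def evaluation_value(a1, a2, a3, a4, weights=(1.0, 1.0, 1.0, -0.01)):
    """alpha = w1*a1 + w2*a2 + w3*a3 + w4*a4, with placeholder weights."""
    w1, w2, w3, w4 = weights
    return w1 * a1 + w2 * a2 + w3 * a3 + w4 * a4


def choose_by_evaluation(candidates, weights=(1.0, 1.0, 1.0, -0.01)):
    """Return the (box, a1, a2, a3, a4) tuple with the largest evaluation value."""
    return max(
        candidates,
        key=lambda c: evaluation_value(c[1], c[2], c[3], c[4], weights),
    )
```

Since α3 is an angle in radians and α4 a distance in pixels, the weights also have to absorb the difference in scale between the four terms, which is one reason the magnitude chosen for w4 here is small.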

〔変形例３〕
上述した実施形態に係る把持システム１は、決定部１１３による把持位置の決定処理を、以下の通り変形することが可能である。 [Modification 3]
In the gripping system 1 according to the above-described embodiment, the gripping position determination processing by the determination unit 113 can be modified as follows.

決定部１１３は、複数の把持候補位置ＣＰの各々を、当該把持候補位置ＣＰを特定する画像上の領域（バウンディングボックスＢＢ）と、画像上で物体を示す物体領域ＡＡとの関係性に応じて修正する。また、決定部１１３は、修正後の複数の把持候補位置ＣＰを参照して、把持位置を決定する。ここで、バウンディングボックスＢＢと物体領域ＡＡとの関係性については、変形例２において図１２を参照して説明した通りである。 The determination unit 113 corrects each of the plurality of candidate gripping positions CP according to the relationship between the area on the image that specifies the candidate gripping position CP (bounding box BB) and the object area AA that indicates the object on the image. The determination unit 113 then determines the gripping position with reference to the plurality of corrected candidate gripping positions CP. Here, the relationship between the bounding box BB and the object area AA is as described in Modification 2 with reference to FIG. 12.

本変形例に係る把持システム１では、図２のステップＳ１０８における把持位置の決定処理が、以下のように変形される。 In the gripping system 1 according to this modified example, the gripping position determination process in step S108 of FIG. 2 is modified as follows.

図１３は、本変形例における把持位置の決定処理の詳細な流れを示すフローチャートである。また、図１４は、当該決定処理の各ステップにおける処理の具体例を説明する図である。なお、以下の説明では、「把持候補位置ＣＰを修正する」ことを、単に「バウンディングボックスＢＢを修正する」、とも記載する。 FIG. 13 is a flow chart showing a detailed flow of gripping position determination processing in this modified example. Also, FIG. 14 is a diagram illustrating a specific example of processing in each step of the determination processing. In the following description, "correcting the candidate grip position CP" is also simply referred to as "correcting the bounding box BB."

（ステップＳ５０１）
図１３に示すステップＳ５０１において、制御装置１０の決定部１１３は、ステップＳ１０７で得られた複数の把持候補位置ＣＰのうち、面積比α２が所定範囲外の把持候補位置ＣＰを除外する。これにより、決定部１１３は、推測モデル２２１から得られた複数の把持候補位置ＣＰのうちノイズの可能性が高いものを除去する。 (Step S501)
In step S501 shown in FIG. 13, the determination unit 113 of the control device 10 excludes, from the plurality of candidate gripping positions CP obtained in step S107, those whose area ratio α2 is outside a predetermined range. In this way, the determination unit 113 removes the candidate gripping positions CP obtained from the inference model 221 that are likely to be noise.

ここで、面積比α２は、把持候補位置ＣＰが、推測モデル２２１が出力したノイズであるか否かを判断する指標ともなる。例えば、面積比α２が１に近い場合、指部３４ａ、３４ｂの配置位置が物体の表面に近いため、ハンド部３４と物体とが干渉する可能性があり、物体を把持できない可能性が高い。また、例えば、面積比α２が０に近い場合、ハンド部３４により把持される部分が小さすぎるため、物体を把持できない可能性が高い。換言すると、面積比α２が１または０に近い把持候補位置ＣＰは、推測モデル２２１から出力されたノイズであるとみなすことができる。 Here, the area ratio α2 also serves as an index for determining whether or not the candidate grip position CP is noise output by the estimation model 221 . For example, when the area ratio α2 is close to 1, the finger portions 34a and 34b are positioned close to the surface of the object, so there is a high possibility that the hand portion 34 and the object will interfere with each other, making it impossible to grip the object. Further, for example, when the area ratio α2 is close to 0, there is a high possibility that the object cannot be gripped because the portion gripped by the hand portion 34 is too small. In other words, the gripping candidate positions CP with the area ratio α2 close to 1 or 0 can be regarded as noise output from the estimation model 221 .

具体的には、決定部１１３は、各把持候補位置ＣＰを特定するバウンディングボックスＢＢについて、変形例２のステップＳ４０１と同様に動作して面積比α２を算出する。また、決定部１１３は、面積比α２が次式（６）を満たさないバウンディングボックスＢＢを除外する。 Specifically, the determination unit 113 operates in the same manner as in step S401 of Modification 2 to calculate the area ratio α2 for the bounding box BB specifying each gripping candidate position CP. Further, the determining unit 113 excludes bounding boxes BB whose area ratio α2 does not satisfy the following expression (6).

ｒ１≦α２≦ｒ２（ただし、０＜ｒ１＜ｒ２＜１）・・・（６）
式（６）において、ｒ１およびｒ２は、所定範囲の下限および上限を示す定数である。下限ｒ１の一例として、０．２が考えられる。また、上限ｒ２の一例として、０．８が考えられる。ただし、ｒ１およびｒ２は、上述した値に限られない。 r1≤α2≤r2 (where 0<r1<r2<1) (6)
In formula (6), r1 and r2 are constants indicating the lower and upper limits of the predetermined range. An example of the lower limit r1 is 0.2. Moreover, 0.8 can be considered as an example of the upper limit r2. However, r1 and r2 are not limited to the values described above.

当該ステップの処理の一例を、図１４を参照して説明する。図１４に示す画像Ｇ１１には、ステップＳ１０７において推測された複数の把持候補位置ＣＰを特定するバウンディングボックスＢＢ１～ＢＢ８が図示されている。なお、図１４に示す例では、推測モデル２２１は、８つの把持候補位置ＣＰを示す情報を出力するよう学習されている。 An example of the processing of this step will be described with reference to FIG. An image G11 shown in FIG. 14 shows bounding boxes BB1 to BB8 that specify a plurality of candidate gripping positions CP estimated in step S107. In the example shown in FIG. 14, the inference model 221 is trained to output information indicating eight candidate gripping positions CP.

ここで、バウンディングボックスＢＢ１には、物体ｏｂｊ１を示す物体領域ＡＡが含まれていないため、面積比α２として０が算出される。また、バウンディングボックスＢＢ２は、物体ｏｂｊ１を示す物体領域ＡＡに包含されているため、面積比α２として１が算出される。したがって、当該ステップにおいて、決定部１１３は、バウンディングボックスＢＢ１およびＢＢ２を除外する。図１４に示す画像Ｇ１２は、ステップＳ５０１で除外されずに残った６つのバウンディングボックスＢＢ３～ＢＢ８を示している。 Here, since the bounding box BB1 does not include the object area AA indicating the object obj1, 0 is calculated as the area ratio α2. Also, since the bounding box BB2 is included in the object area AA indicating the object obj1, 1 is calculated as the area ratio α2. Therefore, in this step, the determining unit 113 excludes the bounding boxes BB1 and BB2. An image G12 shown in FIG. 14 shows the six bounding boxes BB3 to BB8 that have not been excluded in step S501.

前述したように、面積比α２が１または０に近い場合、そのような把持候補位置ＣＰは、推測モデル２２１から出力されたノイズの可能性がある。したがって、当該ステップの処理により、ノイズの可能性が高い把持候補位置ＣＰが除外される。 As described above, when the area ratio α2 is close to 1 or 0, such candidate gripping positions CP may be noise output from the estimation model 221 . Therefore, by the processing of this step, the gripping candidate positions CP that are highly likely to be noise are excluded.
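The exclusion in step S501 is a range filter on the area ratio. A minimal sketch, assuming each candidate has already been paired with its area ratio α2 and using the example bounds 0.2 and 0.8 from the text as defaults (the function name and pair layout are hypothetical):

```python
def drop_noise(candidates, r1=0.2, r2=0.8):
    """Keep only (box, alpha2) pairs that satisfy r1 <= alpha2 <= r2."""
    if not 0.0 < r1 < r2 < 1.0:
        raise ValueError("bounds must satisfy 0 < r1 < r2 < 1")
    return [c for c in candidates if r1 <= c[1] <= r2]
```

With these bounds, a box that contains no part of the object (α2 = 0) and a box lying entirely inside the object (α2 = 1) are both removed, as for BB1 and BB2 in the FIG. 14 example.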

（ステップＳ５０２）
図１３に示すステップＳ５０２において、決定部１１３は、ステップＳ５０１で残った１または複数の把持候補位置ＣＰのうち、所定値以上となるよう把持候補位置ＣＰの把持角度α３を修正する。具体的には、決定部１１３は、各把持候補位置ＣＰを特定するバウンディングボックスＢＢ３～ＢＢ８について、変形例２のステップＳ４０２と同様に動作して把持角度α３を算出する。また、決定部１１３は、把持角度α３が次式（７）を満たさないバウンディングボックスＢＢについて、その第１方向ｄ１を修正することにより把持角度α３を修正する。 (Step S502)
In step S502 shown in FIG. 13, the determining unit 113 corrects the gripping angle α3 of the one or more gripping candidate positions CP remaining in step S501 so that the gripping angle α3 is greater than or equal to a predetermined value. Specifically, the determining unit 113 operates in the same manner as in step S402 of the second modification to calculate the gripping angle α3 for the bounding boxes BB3 to BB8 that specify each gripping candidate position CP. Further, the determining unit 113 corrects the gripping angle α3 by correcting the first direction d1 for the bounding box BB whose gripping angle α3 does not satisfy the following expression (7).

α３≧θ１（ただし、０＜θ１＜π／２）・・・（７） α3 ≥ θ1 (where 0 < θ1 < π/2) ... (7)
θ１は、把持角度α３を修正するか否かを判定するための閾値であり、例えば、π／４である。ただし、θ１の値は、これに限られない。 θ1 is a threshold for determining whether or not to correct the gripping angle α3, and is π/4, for example. However, the value of θ1 is not limited to this.

当該ステップの処理の一例を、図１４を参照して説明する。図１４に示す画像Ｇ１２において、バウンディングボックスＢＢ７は、把持角度α３がθ１（π／４）より小さいとする。そこで、決定部１１３は、バウンディングボックスＢＢ７が沿う第１方向ｄ１を、把持角度α３がθ１以上となるよう修正する。図１４に示す画像Ｇ１３は、当該ステップの処理後のバウンディングボックスＢＢ３～ＢＢ８を示している。画像Ｇ１３では、バウンディングボックスＢＢ７の把持角度α３が、θ１（π／４）以上であるπ／２に修正されている。 An example of the processing of this step will be described with reference to FIG. 14. In the image G12 shown in FIG. 14, assume that the bounding box BB7 has a gripping angle α3 smaller than θ1 (π/4). The determining unit 113 therefore corrects the first direction d1 along which the bounding box BB7 extends so that the gripping angle α3 becomes greater than or equal to θ1. The image G13 shown in FIG. 14 shows the bounding boxes BB3 to BB8 after the processing of this step. In the image G13, the gripping angle α3 of the bounding box BB7 has been corrected to π/2, which is greater than or equal to θ1 (π/4).

前述したように、把持角度α３がπ／２に近いほど把持が容易になり、０に近いほど把持が難しくなると考えられる。したがって、当該ステップの処理により、把持角度α３が適切でない把持候補位置ＣＰについて、当該把持角度α３が改善される。 As described above, it is considered that the closer the gripping angle α3 is to π/2, the easier it is to grip, and the closer it is to 0, the harder it is to grip. Therefore, the processing of this step improves the gripping angle α3 for the gripping candidate position CP for which the gripping angle α3 is not appropriate.
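A minimal sketch of the angle correction in step S502 is given below. It assumes that the gripping angle α3 is the acute angle between the first direction d1 of a bounding box and the second direction d2 of the object portion inside it, both expressed in radians as undirected line orientations. That definition, the function names, and the choice of π/2 as the corrected value (taken from the BB7 example) are assumptions made for illustration.

```python
import math


def grip_angle(d1: float, d2: float) -> float:
    """Acute angle in [0, pi/2] between two undirected line orientations (radians)."""
    diff = abs(d1 - d2) % math.pi
    return min(diff, math.pi - diff)


def correct_direction(d1: float, d2: float,
                      theta1: float = math.pi / 4,
                      target: float = math.pi / 2) -> float:
    """Return d1 unchanged if the gripping angle is at least theta1.

    Otherwise return a corrected first direction whose gripping angle
    relative to d2 equals `target` (assumed here to be pi/2).
    """
    if grip_angle(d1, d2) >= theta1:
        return d1
    return (d2 + target) % math.pi
```

For instance, a box tilted only 0.1 rad away from the object direction falls below θ1 = π/4 and is rotated to be perpendicular to it, while a box at 1.0 rad is left as it is.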

（ステップＳ５０３） (Step S503)
図１３に示すステップＳ５０３において、決定部１１３は、ステップＳ５０２の処理後の各把持候補位置ＣＰについて、所定値以上となるよう面積比α２を修正する。具体的には、決定部１１３は、各把持候補位置ＣＰを特定するバウンディングボックスＢＢ３～ＢＢ８について、変形例２のステップＳ４０１と同様に動作して面積比α２を算出する。また、決定部１１３は、面積比α２が次式（８）を満たさないバウンディングボックスＢＢを縮小することにより、面積比α２を修正する。 In step S503 shown in FIG. 13, the determining unit 113 corrects the area ratio α2 of each gripping candidate position CP processed in step S502 so that the area ratio becomes equal to or greater than a predetermined value. Specifically, the determining unit 113 operates in the same manner as in step S401 of the second modification to calculate the area ratio α2 for the bounding boxes BB3 to BB8 that specify each gripping candidate position CP. The determining unit 113 then corrects the area ratio α2 by shrinking any bounding box BB whose area ratio α2 does not satisfy the following expression (8).

α２≧ｒ３（ただし、ｒ１＜ｒ３＜ｒ２）・・・（８） α2 ≥ r3 (where r1 < r3 < r2) ... (8)
ｒ３は、面積比α２を修正するか否かを判定するための閾値であり、例えば、０．５である。ただし、ｒ３の値は、これに限られない。 r3 is a threshold for determining whether or not to correct the area ratio α2, and is 0.5, for example. However, the value of r3 is not limited to this.

当該ステップの処理の一例を、図１４を参照して説明する。図１４に示す画像Ｇ１３において、６つのバウンディングボックスＢＢ３～ＢＢ８の各面積比α２は、全て所定値ｒ３未満であるとする。この場合、決定部１１３は、バウンディングボックスＢＢ３～ＢＢ８をそれぞれ縮小して、面積比α２がｒ３以上となるようにする。ここでは、決定部１１３は、バウンディングボックスＢＢ３～ＢＢ８各々について、短辺および長辺を縮小している。なお、決定部１１３は、短辺および長辺を縮小する際に、アスペクト比を維持してもよいし、維持しなくてもよい。また、決定部１１３は、バウンディングボックスＢＢ３～ＢＢ８各々または何れかについて、短辺および長辺の一方を縮小し、他方を拡縮しなくてもよい。また、バウンディングボックスＢＢ３～ＢＢ８の各々を縮小する基準点は、中心Ｃ１であってもよいし、それ以外の点であってもよい。図１４に示す画像Ｇ１４は、当該ステップの処理後のバウンディングボックスＢＢ３～ＢＢ８を示している。画像Ｇ１４では、バウンディングボックスＢＢ３～ＢＢ８の長辺および短辺の長さが、画像Ｇ１２における長さより縮小され、それぞれ面積比α２が所定値ｒ３以上となっている。 An example of the processing of this step will be described with reference to FIG. 14. Assume that, in the image G13 shown in FIG. 14, the area ratios α2 of the six bounding boxes BB3 to BB8 are all less than the predetermined value r3. In this case, the determining unit 113 shrinks each of the bounding boxes BB3 to BB8 so that its area ratio α2 becomes equal to or greater than r3. Here, the determining unit 113 shrinks both the short sides and the long sides of each of the bounding boxes BB3 to BB8. When shrinking the short and long sides, the determining unit 113 may or may not maintain the aspect ratio. For each or any of the bounding boxes BB3 to BB8, the determining unit 113 may also shrink only one of the short sides and the long sides and leave the other unscaled. The reference point about which each of the bounding boxes BB3 to BB8 is shrunk may be the center C1 or any other point. The image G14 shown in FIG. 14 shows the bounding boxes BB3 to BB8 after the processing of this step. In the image G14, the long and short sides of the bounding boxes BB3 to BB8 are shorter than in the image G12, and each area ratio α2 is equal to or greater than the predetermined value r3.

前述したように、面積比α２が小さいと、把持動作速度が遅くなる。したがって、当該ステップの処理により、修正された把持候補位置ＣＰにおいて、把持動作速度が改善される。 As described above, when the area ratio α2 is small, the gripping speed becomes slow. Therefore, the gripping motion speed is improved at the modified gripping candidate position CP by the processing of this step.
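The shrinking in step S503 might look like the following sketch. It is a simplification under stated assumptions: boxes are axis-aligned rectangles on a binary object mask, shrinking is done about the box center one pixel per side at a time without preserving the aspect ratio, and the names `shrink_to_ratio` and `area_ratio` and the minimum side length of 2 pixels are hypothetical.

```python
from typing import List, Tuple

# Assumption: a box is an axis-aligned rectangle (x0, y0, x1, y1), half-open.
Box = Tuple[int, int, int, int]


def area_ratio(mask: List[List[int]], box: Box) -> float:
    """Fraction of the box covered by object pixels (mask value 1)."""
    x0, y0, x1, y1 = box
    box_area = (x1 - x0) * (y1 - y0)
    if box_area <= 0:
        return 0.0
    covered = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return covered / box_area


def shrink_to_ratio(mask: List[List[int]], box: Box, r3: float = 0.5) -> Box:
    """Shrink the box about its center until its area ratio is at least r3.

    Each side stops shrinking once it reaches 2 pixels (hypothetical floor),
    so the loop always terminates even if r3 cannot be reached.
    """
    x0, y0, x1, y1 = box
    while area_ratio(mask, (x0, y0, x1, y1)) < r3:
        if x1 - x0 <= 2 and y1 - y0 <= 2:
            break  # cannot shrink further
        if x1 - x0 > 2:
            x0, x1 = x0 + 1, x1 - 1
        if y1 - y0 > 2:
            y0, y1 = y0 + 1, y1 - 1
    return (x0, y0, x1, y1)
```

Shrinking about a different reference point, or shrinking only one pair of sides, would be equally consistent with the description above; the center-based variant is chosen here only because it is the shortest to write.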

（ステップＳ５０４） (Step S504)
ステップＳ５０４において、決定部１１３は、ステップＳ５０３の処理後の各把持候補位置ＣＰについて、把持角度α３を修正する。具体的には、決定部１１３は、バウンディングボックスＢＢ３～ＢＢ８がそれぞれ沿う第１方向ｄ１を修正し、把持角度α３を全て最適値（例えば、π／２）にする。これにより、全ての把持候補位置ＣＰについて、把持角度α３が最適化される。 In step S504, the determining unit 113 corrects the gripping angle α3 for each gripping candidate position CP after the processing in step S503. Specifically, the determining unit 113 corrects the first direction d1 along which the bounding boxes BB3 to BB8 are aligned, and sets all the gripping angles α3 to an optimum value (e.g., π/2). Thereby, the gripping angle α3 is optimized for all the gripping candidate positions CP.

なお、決定部１１３は、ステップＳ５０２の処理を省略してもよい。この場合、決定部１１３は、ステップＳ５０４の処理を、ステップＳ５０３の処理の前に実行してもよい。 Note that the determination unit 113 may omit the process of step S502. In this case, the determining unit 113 may execute the process of step S504 before the process of step S503.

（ステップＳ５０５） (Step S505)
ステップＳ５０５において、決定部１１３は、修正後の複数の把持候補位置ＣＰのうち、それぞれの把持成功確率ｐを参照して何れかを把持位置として決定する。なお、決定部１１３は、ステップＳ１０７において推測モデル２２１から出力される出力情報ｇを参照して、把持成功確率ｐを取得すればよい。具体的には、決定部１１３は、ステップＳ５０１～Ｓ５０４の処理が施されたバウンディングボックスＢＢ３～ＢＢ８のうち、把持成功確率ｐが最大のものを選択する。 In step S505, the determining unit 113 refers to the gripping success probability p of each of the corrected gripping candidate positions CP and determines one of them as the gripping position. The determining unit 113 may obtain the gripping success probability p by referring to the output information g output from the inference model 221 in step S107. Specifically, from among the bounding boxes BB3 to BB8 that have been processed in steps S501 to S504, the determining unit 113 selects the one with the highest gripping success probability p.
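The selection in step S505 amounts to an arg-max over the corrected candidates. The sketch below assumes each candidate carries the gripping success probability p that was read from the model output; the `Candidate` class and `select_grip` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str   # label of the bounding box, e.g. "BB5" (illustrative)
    p: float    # gripping success probability taken from the model output


def select_grip(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the candidate with the highest success probability, or None if empty."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.p)
```

Returning `None` for an empty list is a design choice of this sketch; the description above does not say what happens when every candidate was excluded in step S501.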

以上のように、本変形例に係る把持システム１は、推測モデル２２１を用いて複数の把持候補位置ＣＰを推測してそれぞれを修正し、修正した複数の把持候補位置ＣＰのうち把持成功確率ｐが最大の把持候補位置ＣＰを把持位置として決定する。これにより、把持システム１は、物体においてロボットアーム３０に把持させる把持位置として、把持成功確率ｐがより高く、かつ、より容易に把持可能な把持位置を決定することができる。 As described above, the gripping system 1 according to the present modification estimates a plurality of gripping candidate positions CP using the inference model 221, corrects each of them, and determines, as the gripping position, the corrected gripping candidate position CP with the highest gripping success probability p. As a result, the gripping system 1 can determine, as the position at which the robot arm 30 grips the object, a gripping position that has a higher gripping success probability p and can be gripped more easily.

〔その他の変形例〕 [Other Modifications]
なお、上述した実施形態および各変形例において、ステップＳ１０８における把持位置の決定処理では、把持成功確率ｐまたは評価値αが最大の把持候補位置ＣＰを、把持位置として決定する例について説明した。ただし、決定部１１３は、把持成功確率ｐまたは評価値αが必ずしも最大の把持候補位置ＣＰを把持位置として決定しなくてもよい。例えば、決定部１１３は、把持成功確率ｐまたは評価値αが閾値以上の把持候補位置ＣＰのうち何れかを把持位置として選択してもよい。 In the embodiment and modifications described above, an example has been described in which the gripping position determination processing in step S108 determines, as the gripping position, the gripping candidate position CP with the maximum gripping success probability p or evaluation value α. However, the determining unit 113 does not necessarily have to choose the gripping candidate position CP with the maximum gripping success probability p or evaluation value α. For example, the determining unit 113 may select, as the gripping position, any one of the gripping candidate positions CP whose gripping success probability p or evaluation value α is equal to or greater than a threshold.
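The threshold-based alternative could be sketched as below. The text only says that any one of the candidates at or above the threshold may be chosen, so the random pick used here is an assumption, as are the function name and the representation of candidates as (name, score) pairs, where the score stands for either p or α.

```python
import random
from typing import Optional, Sequence, Tuple


def select_above_threshold(scored: Sequence[Tuple[str, float]],
                           threshold: float,
                           rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick any one candidate whose score (p or alpha) is at least the threshold.

    Returns None when no candidate qualifies. A random pick is used only as
    one possible policy; taking the first eligible candidate would also fit.
    """
    eligible = [name for name, score in scored if score >= threshold]
    if not eligible:
        return None
    return (rng or random).choice(eligible)
```

Passing a seeded `random.Random` instance makes the choice reproducible, which may be convenient when comparing gripping runs.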

また、上述した実施形態および各変形例において、推測モデル２２１が、ＣＮＮによって生成された学習済みモデルである例について説明した。ただし、推測モデル２２１は、その他の深層学習アルゴリズムによって生成されてもよい。例えば、推測モデル２２１を生成するアルゴリズムは、例えば、ＲＮＮ：Recurrent Neural Network、ＧＡＮ：Generative Adversarial Network等であってもよいが、これらに限られない。 In the embodiment and modifications described above, an example has been described in which the inference model 221 is a trained model generated by a CNN. However, the inference model 221 may be generated by another deep learning algorithm. For example, the algorithm that generates the inference model 221 may be an RNN (Recurrent Neural Network), a GAN (Generative Adversarial Network), or the like, but is not limited to these.

〔把持システム１の物理的構成〕 [Physical Configuration of Gripping System 1]
図１５は、把持システム１を構成する各装置の物理的構成を例示したブロック図である。 FIG. 15 is a block diagram illustrating the physical configuration of each device that constitutes the gripping system 1.

（制御装置１０の物理的構成） (Physical Configuration of Control Device 10)
制御装置１０は、図１５に示すように、バス１１０と、プロセッサ１０１と、主メモリ１０２と、補助メモリ１０３と、通信インタフェース１０４と、入出力インタフェース１０５とを備えたコンピュータによって構成可能である。プロセッサ１０１、主メモリ１０２、補助メモリ１０３、通信インタフェース１０４、および入出力インタフェース１０５は、バス１１０を介して互いに接続されている。入出力インタフェース１０５には、入力装置１０６および出力装置１０７が接続されている。 As shown in FIG. 15, the control device 10 can be configured by a computer having a bus 110, a processor 101, a main memory 102, an auxiliary memory 103, a communication interface 104, and an input/output interface 105. The processor 101, the main memory 102, the auxiliary memory 103, the communication interface 104, and the input/output interface 105 are connected to one another via the bus 110. An input device 106 and an output device 107 are connected to the input/output interface 105.

プロセッサ１０１としては、例えば、ＣＰＵ（Central Processing Unit）、マイクロプロセッサ、デジタルシグナルプロセッサ、マイクロコントローラ、またはこれらの組み合わせ等が用いられる。 As the processor 101, for example, a CPU (Central Processing Unit), a microprocessor, a digital signal processor, a microcontroller, or a combination thereof is used.

主メモリ１０２としては、例えば、半導体ＲＡＭ（random access memory）等が用いられる。 As the main memory 102, for example, a semiconductor RAM (random access memory) or the like is used.

補助メモリ１０３としては、例えば、フラッシュメモリ、ＨＤＤ（Hard Disk Drive）、ＳＳＤ（Solid State Drive）、またはこれらの組み合わせ等が用いられる。補助メモリ１０３には、上述した制御装置１０の動作をプロセッサ１０１に実行させるためのプログラムが格納されている。プロセッサ１０１は、補助メモリ１０３に格納されたプログラムを主メモリ１０２上に展開し、展開したプログラムに含まれる各命令を実行する。 As the auxiliary memory 103, for example, a flash memory, a HDD (Hard Disk Drive), an SSD (Solid State Drive), or a combination thereof is used. The auxiliary memory 103 stores a program for causing the processor 101 to execute the operations of the control device 10 described above. The processor 101 expands the program stored in the auxiliary memory 103 onto the main memory 102 and executes each instruction included in the expanded program.

通信インタフェース１０４は、ネットワークに接続するインタフェースである。通信インタフェース１０４は、当該ネットワークを介して学習装置２０、ロボットアーム３０、および撮像装置４０のそれぞれとの通信を行う。 A communication interface 104 is an interface for connecting to a network. The communication interface 104 communicates with each of the learning device 20, the robot arm 30, and the imaging device 40 via the network.

入出力インタフェース１０５としては、例えば、ＵＳＢ（Universal Serial Bus）インタフェース、赤外線やBluetooth（登録商標）等の近距離通信インタフェース、またはこれらの組み合わせが用いられる。 As the input/output interface 105, for example, a USB (Universal Serial Bus) interface, a short-range communication interface such as infrared rays or Bluetooth (registered trademark), or a combination thereof is used.

入力装置１０６としては、例えば、キーボード、マウス、タッチパッド、マイク、又はこれらの組み合わせ等が用いられる。出力装置１０７としては、例えば、ディスプレイ、プリンタ、スピーカ、又はこれらの組み合わせが用いられる。 As the input device 106, for example, a keyboard, mouse, touch pad, microphone, or a combination thereof is used. A display, a printer, a speaker, or a combination thereof is used as the output device 107, for example.

この例で、プロセッサ１０１および通信インタフェース１０４は、制御部１１を実現するハードウェア要素の一例である。また、主メモリ１０２および補助メモリ１０３は、記憶部１２を実現するハードウェア要素の一例である。 In this example, the processor 101 and the communication interface 104 are examples of hardware elements that implement the control unit 11. The main memory 102 and the auxiliary memory 103 are examples of hardware elements that implement the storage unit 12.

（学習装置２０の物理的構成） (Physical Configuration of Learning Device 20)
学習装置２０は、図１５に示すように、バス２１０と、プロセッサ２０１と、主メモリ２０２と、補助メモリ２０３と、通信インタフェース２０４とを備えたコンピュータによって構成可能である。プロセッサ２０１、主メモリ２０２、補助メモリ２０３、および通信インタフェース２０４は、バス２１０を介して互いに接続されている。 As shown in FIG. 15, the learning device 20 can be configured by a computer having a bus 210, a processor 201, a main memory 202, an auxiliary memory 203, and a communication interface 204. The processor 201, the main memory 202, the auxiliary memory 203, and the communication interface 204 are connected to one another via the bus 210.

プロセッサ２０１としては、例えば、ＣＰＵ（Central Processing Unit）、マイクロプロセッサ、デジタルシグナルプロセッサ、マイクロコントローラ、ＧＰＵ（Graphics Processing Unit）またはこれらの組み合わせ等が用いられる。 As the processor 201, for example, a CPU (Central Processing Unit), a microprocessor, a digital signal processor, a microcontroller, a GPU (Graphics Processing Unit), or a combination thereof is used.

主メモリ２０２としては、例えば、半導体ＲＡＭ（random access memory）等が用いられる。 As the main memory 202, for example, a semiconductor RAM (random access memory) or the like is used.

補助メモリ２０３としては、例えば、フラッシュメモリ、ＨＤＤ（Hard Disk Drive）、ＳＳＤ（Solid State Drive）、またはこれらの組み合わせ等が用いられる。補助メモリ２０３には、上述した学習装置２０の動作をプロセッサ２０１に実行させるためのプログラムが格納されている。プロセッサ２０１は、補助メモリ２０３に格納されたプログラムを主メモリ２０２上に展開し、展開したプログラムに含まれる各命令を実行する。 As the auxiliary memory 203, for example, a flash memory, an HDD (Hard Disk Drive), an SSD (Solid State Drive), or a combination of these is used. The auxiliary memory 203 stores a program for causing the processor 201 to execute the operations of the learning device 20 described above. The processor 201 expands the program stored in the auxiliary memory 203 onto the main memory 202 and executes each instruction included in the expanded program.

通信インタフェース２０４は、ネットワークに接続するインタフェースである。通信インタフェース２０４は、当該ネットワークを介して制御装置１０との通信を行う。 A communication interface 204 is an interface for connecting to a network. The communication interface 204 communicates with the control device 10 via the network.

この例で、プロセッサ２０１および通信インタフェース２０４は、制御部２１を実現するハードウェア要素の一例である。また、主メモリ２０２および補助メモリ２０３は、記憶部２２を実現するハードウェア要素の一例である。 In this example, the processor 201 and the communication interface 204 are examples of hardware elements that implement the control unit 21. The main memory 202 and the auxiliary memory 203 are examples of hardware elements that implement the storage unit 22.

本発明は上述した各実施形態に限定されるものではなく、請求項に示した範囲で種々の変更が可能であり、異なる実施形態にそれぞれ開示された技術的手段を適宜組み合わせて得られる実施形態についても本発明の技術的範囲に含まれる。 The present invention is not limited to the embodiments described above and can be modified in various ways within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

〔まとめ〕 [Summary]
本発明の一態様に係る制御装置は、物体を被写体として含む画像を取得する取得部と、前記画像を入力とする推測モデルを用いて、前記物体の複数の把持候補位置を推測する推測部と、前記複数の把持候補位置を参照して、把持装置に前記物体を把持させる把持位置を決定する決定部と、を備えている。 A control device according to an aspect of the present invention includes an acquisition unit that acquires an image including an object as a subject, an estimating unit that estimates a plurality of gripping candidate positions of the object using an inference model that receives the image as input, and a determining unit that refers to the plurality of gripping candidate positions and determines a gripping position at which a gripping device is to grip the object.

上記構成により、推測モデルを用いて推測した複数の把持候補位置を参照して把持位置を決定するので、把持位置を精度よく決定することができる。 With the above configuration, the gripping position is determined by referring to a plurality of gripping candidate positions estimated using the inference model, so the gripping position can be determined with high accuracy.

上述した一態様に係る制御装置において、前記推測モデルから出力される情報は、前記複数の把持候補位置の各々を特定する前記画像上の領域を示す情報を含み、前記決定部は、前記複数の把持候補位置の各々について、前記画像上の領域と、前記画像上で前記物体を示す物体領域との関係性に応じた評価値を算出し、算出した評価値を参照して前記把持位置を決定する、ことが好ましい。 In the control device according to the aspect described above, it is preferable that the information output from the inference model includes information indicating an area on the image that specifies each of the plurality of gripping candidate positions, and that the determining unit calculates, for each of the plurality of gripping candidate positions, an evaluation value corresponding to the relationship between the area on the image and an object area representing the object on the image, and determines the gripping position by referring to the calculated evaluation values.

上記構成により、各把持候補位置と物体との関係性が反映された評価値を参照するので、把持位置をより精度よく決定することができる。 With the above configuration, the evaluation value reflecting the relationship between each candidate gripping position and the object is referred to, so the gripping position can be determined with higher accuracy.

上述した一態様に係る制御装置において、前記推測モデルから出力される情報は、前記複数の把持候補位置の各々を特定する前記画像上の領域を示す情報を含み、前記決定部は、前記複数の把持候補位置の各々を、前記画像上の領域と、前記画像上で前記物体を示す物体領域との関係性に応じて修正し、修正後の前記複数の把持候補位置を参照して、前記把持位置を決定する、ことが好ましい。 In the control device according to the aspect described above, it is preferable that the information output from the inference model includes information indicating an area on the image that specifies each of the plurality of gripping candidate positions, and that the determining unit corrects each of the plurality of gripping candidate positions according to the relationship between the area on the image and an object area representing the object on the image, and determines the gripping position by referring to the corrected gripping candidate positions.

上記構成により、物体との関係性に応じて修正した各把持候補位置を参照するので、把持位置をより精度よく決定することができる。 With the above configuration, since each gripping candidate position corrected according to the relationship with the object is referred to, the gripping position can be determined with higher accuracy.

上述した一態様に係る制御装置において、前記関係性は、前記画像上の領域の面積と当該領域において前記物体領域が占める部分領域の面積との関係、前記画像上の領域が沿う第１方向と前記部分領域が沿う第２方向との関係、および前記画像上の領域の中心と前記部分領域の中心との関係、のうち一部または全部を含む、ことが好ましい。 In the control device according to the aspect described above, it is preferable that the relationship includes some or all of the following: the relationship between the area of the area on the image and the area of the partial area that the object area occupies within that area; the relationship between a first direction along which the area on the image extends and a second direction along which the partial area extends; and the relationship between the center of the area on the image and the center of the partial area.

上記構成において、画像上の領域と部分領域との関係が適切でない場合、当該把持候補位置は、把持が難しい把持位置を示している可能性がある。また、第１方向および第２方向の関係が適切でない場合、当該把持候補位置は、把持が難しい把持角度を示している可能性がある。また、画像上の領域の中心と物体領域の中心との関係が適切でない場合、当該把持候補位置は、把持が難しい把持位置を示している可能性がある。したがって、上記構成により、より確実に把持を維持できる把持位置を決定することが可能となる。 In the above configuration, if the relationship between the area on the image and the partial area is not appropriate, the candidate gripping position may indicate a gripping position that is difficult to grip. Also, if the relationship between the first direction and the second direction is not appropriate, the gripping candidate position may indicate a gripping angle that is difficult to grip. Also, if the relationship between the center of the area on the image and the center of the object area is not appropriate, the gripping candidate position may indicate a gripping position that is difficult to grip. Therefore, with the above configuration, it is possible to determine the grip position at which the grip can be maintained more reliably.

上述した一態様に係る制御装置において、前記推測モデルから出力される情報は、前記複数の把持候補位置の各々に関する把持成功確率を含み、前記決定部は、前記把持成功確率を参照して前記把持位置を決定する、ことが好ましい。 In the control device according to the aspect described above, it is preferable that the information output from the inference model includes a gripping success probability for each of the plurality of gripping candidate positions, and that the determining unit determines the gripping position by referring to the gripping success probability.

上記構成により、把持成功確率を参照するので、把持位置をより精度よく決定することができる。 With the above configuration, the gripping success probability is referred to, so the gripping position can be determined with higher accuracy.

本発明の一態様に係る把持システムは、上述した制御装置と、前記画像を撮像する撮像装置と、前記把持装置と、を備え、制御装置は、前記決定部が決定した把持位置において前記物体を把持するよう前記把持装置を制御する制御部をさらに備える。 A gripping system according to an aspect of the present invention includes the control device described above, an imaging device that captures the image, and the gripping device. The control device further includes a control unit that controls the gripping device to grip the object at the gripping position determined by the determining unit.

上記構成により、把持装置に物体をより確実に把持させることができる。 With the above configuration, the gripping device can grip the object more reliably.

上述した本発明の一態様に係る把持システムは、前記推測モデルを、機械学習により生成する学習装置、をさらに備えている。 The gripping system according to one aspect of the present invention described above further includes a learning device that generates the inference model by machine learning.

上記構成により、把持位置をより精度よく決定するための推測モデルを生成することができる。 With the above configuration, it is possible to generate an inference model for more accurately determining the gripping position.

本発明の一態様に係る方法は、１または複数のコンピュータが実行する方法であって、物体を被写体として含む画像を取得するステップと、前記画像を入力とする推測モデルを用いて、前記物体の複数の把持候補位置を推測するステップと、前記複数の把持候補位置を参照して、把持装置に前記物体を把持させる把持位置を決定するステップと、を含む。 A method according to an aspect of the present invention is a method executed by one or more computers, and includes the steps of: acquiring an image including an object as a subject; estimating a plurality of gripping candidate positions of the object using an inference model that receives the image as input; and determining, by referring to the plurality of gripping candidate positions, a gripping position at which a gripping device is to grip the object.

上記構成により、上述した制御装置と同様の効果を奏する。 With the above configuration, the same effects as those of the control device described above can be obtained.

本発明の一態様に係るプログラムは、上述した制御装置として１または複数のコンピュータを機能させるためのプログラムであって、上記各部として１または複数のコンピュータを機能させる。 A program according to an aspect of the present invention is a program for causing one or more computers to function as the control device described above, and causes one or more computers to function as the above units.

１把持システム 1 Gripping system
１０制御装置 10 Control device
２０学習装置 20 Learning device
１１、２１制御部 11, 21 Control unit
１２、２２記憶部 12, 22 Storage unit
１１１取得部 111 Acquisition unit
１１２推測部 112 Estimating unit
１１３決定部 113 Determining unit
２１１学習部 211 Learning unit
２２１推測モデル 221 Inference model
３０ロボットアーム 30 Robot arm
４０撮像装置 40 Imaging device
１０１、２０１プロセッサ 101, 201 Processor
１０２、２０２主メモリ 102, 202 Main memory
１０３、２０３補助メモリ 103, 203 Auxiliary memory
１０４、２０４通信インタフェース 104, 204 Communication interface
１０５入出力インタフェース 105 Input/output interface
１０６入力装置 106 Input device
１０７出力装置 107 Output device
１１０、２１０バス 110, 210 Bus

Claims

物体を被写体として含む画像を取得する取得部と、
前記画像を入力とする推測モデルを用いて、前記物体の複数の把持候補位置を推測する推測部と、
前記複数の把持候補位置を参照して、把持装置に前記物体を把持させる把持位置を決定する決定部と
を備え、
前記推測モデルから出力される情報は、前記複数の把持候補位置の各々を特定する前記画像上の領域を示す情報を含み、
前記決定部は、前記複数の把持候補位置の各々について、前記画像上の領域と、前記画像上で前記物体を示す物体領域との関係性に応じた評価値を算出し、算出した評価値を参照して前記把持位置を決定する、制御装置。 A control device comprising:
an acquisition unit that acquires an image including an object as a subject;
an estimating unit that estimates a plurality of gripping candidate positions of the object using an inference model that receives the image as input; and
a determining unit that refers to the plurality of gripping candidate positions and determines a gripping position at which a gripping device is to grip the object,
wherein the information output from the inference model includes information indicating an area on the image that specifies each of the plurality of gripping candidate positions, and
the determining unit calculates, for each of the plurality of gripping candidate positions, an evaluation value corresponding to a relationship between the area on the image and an object area representing the object on the image, and determines the gripping position by referring to the calculated evaluation values.

物体を被写体として含む画像を取得する取得部と、
前記画像を入力とする推測モデルを用いて、前記物体の複数の把持候補位置を推測する推測部と、
前記複数の把持候補位置を参照して、把持装置に前記物体を把持させる把持位置を決定する決定部と、
を備え、
前記推測モデルから出力される情報は、前記複数の把持候補位置の各々を特定する前記画像上の領域を示す情報を含み、
前記決定部は、前記複数の把持候補位置の各々を、前記画像上の領域と、前記画像上で前記物体を示す物体領域との関係性に応じて修正し、修正後の前記複数の把持候補位置を参照して、前記把持位置を決定する、制御装置。 A control device comprising:
an acquisition unit that acquires an image including an object as a subject;
an estimating unit that estimates a plurality of gripping candidate positions of the object using an inference model that receives the image as input; and
a determining unit that refers to the plurality of gripping candidate positions and determines a gripping position at which a gripping device is to grip the object,
wherein the information output from the inference model includes information indicating an area on the image that specifies each of the plurality of gripping candidate positions, and
the determining unit corrects each of the plurality of gripping candidate positions according to a relationship between the area on the image and an object area representing the object on the image, and determines the gripping position by referring to the corrected gripping candidate positions.

前記関係性は、前記画像上の領域の面積と当該領域において前記物体領域が占める部分領域の面積との関係、前記画像上の領域が沿う第１方向と前記部分領域が沿う第２方向との関係、および前記画像上の領域の中心と前記部分領域の中心との関係、のうち一部または全部を含む、
請求項１または２に記載の制御装置。 The control device according to claim 1 or 2,
wherein the relationship includes some or all of: a relationship between the area of the area on the image and the area of a partial area that the object area occupies within that area; a relationship between a first direction along which the area on the image extends and a second direction along which the partial area extends; and a relationship between the center of the area on the image and the center of the partial area.

前記推測モデルから出力される情報は、前記複数の把持候補位置の各々に関する把持成功確率を含み、
前記決定部は、前記把持成功確率を参照して前記把持位置を決定する、
請求項１から３の何れか１項に記載の制御装置。 The control device according to any one of claims 1 to 3,
wherein the information output from the inference model includes a gripping success probability for each of the plurality of gripping candidate positions, and
the determining unit determines the gripping position by referring to the gripping success probability.

請求項１から４の何れか１項に記載の制御装置と、
前記画像を撮像する撮像装置と、
前記把持装置と、を備えた把持システムであって、
前記制御装置は、前記決定部が決定した把持位置において前記物体を把持するよう前記把持装置を制御する制御部をさらに備える、把持システム。 A gripping system comprising:
the control device according to any one of claims 1 to 4;
an imaging device that captures the image; and
the gripping device,
wherein the control device further includes a control unit that controls the gripping device to grip the object at the gripping position determined by the determining unit.

前記推測モデルを、機械学習により生成する学習装置、をさらに備えている、請求項５に記載の把持システム。 The gripping system according to claim 5, further comprising a learning device that generates the inference model by machine learning.

１または複数のコンピュータが実行する方法であって、
物体を被写体として含む画像を取得するステップと、
前記画像を入力とする推測モデルを用いて、前記物体の複数の把持候補位置を推測するステップと、
前記複数の把持候補位置を参照して、把持装置に前記物体を把持させる把持位置を決定するステップと、を含み、
前記推測モデルから出力される情報は、前記複数の把持候補位置の各々を特定する前記画像上の領域を示す情報を含み、
前記決定するステップは、前記複数の把持候補位置の各々について、前記画像上の領域と、前記画像上で前記物体を示す物体領域との関係性に応じた評価値を算出し、算出した評価値を参照して前記把持位置を決定する、方法。 A method executed by one or more computers, comprising the steps of:
acquiring an image including an object as a subject;
estimating a plurality of gripping candidate positions of the object using an inference model that receives the image as input; and
determining, by referring to the plurality of gripping candidate positions, a gripping position at which a gripping device is to grip the object,
wherein the information output from the inference model includes information indicating an area on the image that specifies each of the plurality of gripping candidate positions, and
the determining step calculates, for each of the plurality of gripping candidate positions, an evaluation value corresponding to a relationship between the area on the image and an object area representing the object on the image, and determines the gripping position by referring to the calculated evaluation values.

１または複数のコンピュータが実行する方法であって、
物体を被写体として含む画像を取得するステップと、
前記画像を入力とする推測モデルを用いて、前記物体の複数の把持候補位置を推測するステップと、
前記複数の把持候補位置を参照して、把持装置に前記物体を把持させる把持位置を決定するステップと、を含み、
前記推測モデルから出力される情報は、前記複数の把持候補位置の各々を特定する前記画像上の領域を示す情報を含み、
前記決定するステップは、前記複数の把持候補位置の各々を、前記画像上の領域と、前記画像上で前記物体を示す物体領域との関係性に応じて修正し、修正後の前記複数の把持候補位置を参照して、前記把持位置を決定する、方法。 A method executed by one or more computers, comprising the steps of:
acquiring an image including an object as a subject;
estimating a plurality of gripping candidate positions of the object using an inference model that receives the image as input; and
determining, by referring to the plurality of gripping candidate positions, a gripping position at which a gripping device is to grip the object,
wherein the information output from the inference model includes information indicating an area on the image that specifies each of the plurality of gripping candidate positions, and
the determining step corrects each of the plurality of gripping candidate positions according to a relationship between the area on the image and an object area representing the object on the image, and determines the gripping position by referring to the corrected gripping candidate positions.

請求項１から４の何れか１項に記載の制御装置として１または複数のコンピュータを機能させるためのプログラムであって、上記各部として１または複数のコンピュータを機能させるためのプログラム。 A program for causing one or more computers to function as the control device according to any one of claims 1 to 4, the program causing the one or more computers to function as each of the above units.