JP7341652B2 - Information processing device, information processing method, program, and system

Info

Publication number: JP7341652B2
Application number: JP2018207192A
Authority: JP
Inventors: 誠冨岡; 雅博鈴木; 俊広小林; 昭宏片山; 真和藤木; 一彦小林; 大輔小竹; 圭祐立野; 修一三瓶; 智行上野; 知弥子中島; 聡美永島
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-01-12
Filing date: 2018-11-02
Publication date: 2023-09-11
Anticipated expiration: 2038-11-02
Also published as: JP2019125345A; WO2019138834A1

Description

The present invention relates to a technique for controlling the movement of a moving body.

Moving bodies such as automated guided vehicles (AGVs) and autonomous mobile robots (AMRs) are known. When such moving bodies travel in an environment such as a factory or a distribution warehouse, tape has been affixed to the floor and the moving body has traveled while detecting the tape with an on-board sensor, as in Patent Document 1, so that its movement can be controlled stably.

Patent Document 1: Japanese Patent Laid-Open No. 2010-33434 (JP2010-33434)

However, with the technique of Patent Document 1, the tape has to be reapplied every time the layout of objects in the environment is changed and the travel routes change, which is laborious. There is a need to reduce this labor while still having the moving body travel stably.

The present invention has been made in view of the above problem, and an object thereof is to provide an information processing device that stably performs movement control of a moving body. A further object is to provide a corresponding method and program.

The information processing device according to the present invention has the following configuration.

An input means for accepting input of depth information acquired by an imaging means that is mounted on a moving body and in which each light receiving section on the image sensor is constituted by two or more light receiving elements;
a holding means for holding map information;
an acquisition means for acquiring the position and orientation of the imaging means based on the map information and on the depth information corrected based on the reliability of the depth information acquired by the imaging means; and
a control means for obtaining a control value for controlling movement of the moving body based on the position and orientation acquired by the acquisition means.

According to the present invention, movement control of a moving body can be performed stably.

FIG. 1 is a diagram illustrating the system configuration in Embodiment 1.
FIG. 2 is a diagram illustrating the functional configuration in Embodiment 1.
FIG. 3 is a diagram illustrating the image sensor D150 included in the imaging unit 110.
FIG. 4 is a diagram showing examples of the images D154a to D154d captured by the imaging unit 110.
FIG. 5 is a flowchart showing the flow of processing of the device of Embodiment 1.
FIG. 6 is a diagram showing the hardware configuration of the device of Embodiment 1.
FIG. 7 is a flowchart showing the procedure for correcting visual information using motion stereo in Embodiment 2.
FIG. 8 is a diagram illustrating the functional configuration in Embodiment 3.
FIG. 9 is a diagram illustrating the functional configuration in Embodiment 4.
FIG. 10 is a flowchart showing the procedure for correcting visual information using the measurement results of a three-dimensional measuring device in Embodiment 4.
FIG. 11 is a flowchart showing the processing procedure for object detection and position and orientation calculation in Embodiment 5.
FIG. 12 is a flowchart showing the processing procedure for semantic region segmentation of visual information in Embodiment 6.
FIG. 13 is a diagram showing an example of a GUI that presents display information.
FIG. 14 is a diagram illustrating the functional configuration in Embodiment 8.
FIG. 15 is a flowchart showing the flow of processing of the device of Embodiment 8.
FIG. 16 is a diagram showing an example of a GUI that presents display information.

Embodiments will be described below with reference to the drawings. The configurations shown in the following embodiments are merely examples, and the present invention is not limited to the illustrated configurations.

[Embodiment 1]
This embodiment describes movement control of a moving body referred to as an automated guided vehicle (AGV), an autonomous mobile robot (AMR), or the like. An AGV is used as the example of a moving body below, but the moving body may instead be an AMR.

FIG. 1 shows the system configuration in this embodiment. The information processing system 1 in this embodiment comprises a plurality of moving bodies 12 (12-1, 12-2, ...), a process management system 14, and a moving body management system 13. The information processing system 1 is, for example, a logistics system or a production system.

The plurality of moving bodies 12 (12-1, 12-2, ...) are automated guided vehicles (AGVs) that transport objects according to the process schedule determined by the process management system. A plurality of moving bodies move (travel) within the environment.

The process management system 14 manages the processes executed by the information processing system. It is, for example, a manufacturing execution system (MES) that manages processes in a factory or distribution warehouse. It communicates with the moving body management system 13.

The moving body management system 13 is a system that manages the moving bodies. It communicates with the process management system 14. It also communicates with the moving bodies (for example, over Wi-Fi), exchanging operation information in both directions.

FIG. 2 is a diagram showing an example of the hardware configuration of the moving body 12 that includes the information processing device 10 in this embodiment. The information processing device 10 comprises an input unit 1110, a calculation unit 1120, a holding unit 1130, and a control unit 1140. The input unit 1110 is connected to the imaging unit 110 mounted on the moving body 12. The control unit 1140 is connected to the actuator 120. In addition, a communication device (not shown) exchanges information bidirectionally with the moving body management system 13 and passes it to and from the various units of the information processing device 10. Note that FIG. 2 shows only one example of the device configuration.

FIG. 3 is a diagram for explaining the image sensor D150 included in the imaging unit 110. In this embodiment, the imaging unit 110 contains the image sensor D150. As shown in FIG. 3(a), a large number of light receiving sections D151 are arranged in a grid inside the image sensor D150. FIG. 3(a) shows four light receiving sections. Each light receiving section D151 has a microlens D153 on its upper surface so that light is collected efficiently. A conventional image sensor has one light receiving element per light receiving section D151, whereas in the image sensor D150 of the imaging unit 110 in this embodiment, each light receiving section D151 contains a plurality of light receiving elements D152.

FIG. 3(b) shows a single light receiving section D151 viewed from the side. As shown in FIG. 3(b), two light receiving elements D152a and D152b are provided inside one light receiving section D151. The light receiving elements D152a and D152b are independent of each other: charge accumulated in the light receiving element D152a does not move to the light receiving element D152b, and conversely charge accumulated in D152b does not move to D152a. Consequently, in FIG. 3(b), the light receiving element D152a receives the light flux entering from the right side of the microlens D153, and the light receiving element D152b receives the light flux entering from the left side of the microlens D153.

The imaging unit 110 can generate the image D154a by selecting only the charge accumulated in the light receiving elements D152a. At the same time, it can generate the image D154b by selecting only the charge accumulated in the light receiving elements D152b. Because the image D154a is formed only from the light flux from the right side of the microlens D153 and the image D154b only from the light flux from the left side, the images D154a and D154b are, as shown in FIG. 4, images captured from mutually different viewpoints.

The imaging unit 110 can also form an image from each light receiving section D151 using the charge accumulated in both light receiving elements D152a and D152b. This yields the image D154e (not shown), an image captured from a single viewpoint just as with a conventional image sensor. By the principle described above, the imaging unit 110 can simultaneously capture the images D154a and D154b, which have different viewpoints, and the conventional image D154e.
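The relationship between the three images can be illustrated with a small sketch. The function name `form_images` and the array names `charge_a` and `charge_b` are hypothetical, and summing the two charges is an assumption: the text states only that the charge of both light receiving elements is used to form the image D154e.

```python
import numpy as np

def form_images(charge_a, charge_b):
    """Return (image_a, image_b, image_e) from the per-section charges.

    charge_a / charge_b: 2-D arrays holding, for every light receiving
    section, the charge of element D152a / D152b (hypothetical inputs).
    """
    image_a = np.asarray(charge_a, dtype=np.float64)  # right-side light flux -> D154a
    image_b = np.asarray(charge_b, dtype=np.float64)  # left-side light flux  -> D154b
    # Assumption: the conventional image is the sum of both charges.
    image_e = image_a + image_b
    return image_a, image_b, image_e
```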

Each light receiving section D151 may contain more light receiving elements D152; any number of light receiving elements D152 may be provided. For example, FIG. 4(c) shows an example in which four light receiving elements D152a to D152d are provided inside the light receiving section D151.

The imaging unit 110 can compute a parallax image D154f (not shown) by performing a corresponding-point search on the pair of images D154a and D154b, and can further compute the three-dimensional shape of the target from that parallax image by the stereo method. Corresponding-point search and the stereo method are known techniques, and various methods are applicable. The corresponding-point search uses, for example, template matching, in which the several pixels surrounding each pixel of an image serve as a template and a similar template is searched for, or a method that extracts edge and corner feature points from the gradient of the image luminance and searches for points with similar features. In the stereo method, the relationship between the coordinate systems of the two images is derived, a projective transformation matrix is derived, and the three-dimensional shape is computed. In addition to the image D154e, the imaging unit 110 has a function of outputting the image D154a, the image D154b, the parallax image D154f, and the depth map D154d and three-dimensional point group D154c obtained by the stereo method.
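As a rough illustration of template matching followed by depth recovery, the sketch below searches along the same image row using a sum of absolute differences (SAD) and then converts disparity to depth with the usual rectified-stereo relation Z = f * B / d. The function names, the SAD cost, the row-wise search, and the assumption that the two images are rectified are all illustrative choices; the patent does not specify which matching method the imaging unit uses.

```python
import numpy as np

def block_match_disparity(left, right, half_window=2, max_disparity=16):
    """Per-pixel disparity by SAD template matching along the same row."""
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    height, width = left.shape
    w = half_window
    disparity = np.zeros((height, width), dtype=np.int64)
    for y in range(w, height - w):
        for x in range(w, width - w):
            template = left[y - w:y + w + 1, x - w:x + w + 1]
            best_cost, best_d = np.inf, 0
            for d in range(max_disparity + 1):
                if x - d - w < 0:  # candidate window would leave the image
                    break
                candidate = right[y - w:y + w + 1, x - d - w:x - d + w + 1]
                cost = np.abs(template - candidate).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity

def disparity_to_depth(disparity, focal_length_px, baseline):
    """Depth Z = f * B / d; pixels with zero disparity are set to infinity."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline / disparity[valid]
    return depth
```

The nested loops are written for readability rather than speed; a practical implementation would vectorize the search or use an optimized library routine.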

The depth map referred to here is an image that holds, for each pixel constituting the image 154c, a value correlated with the distance (depth) to the measurement target. This value is normally an integer that can be stored as an ordinary image, and multiplying it by a predetermined coefficient determined from the focal length converts it into the physical distance to the target (for example, in millimeters). The focal length is included in the unique information of the imaging unit 110, as described above.

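The three-dimensional point group D154c is described next. It is the set of coordinates obtained by expressing the physical distances to the measurement target, converted from the depth map D154d as described above, as values along each axis (X, Y, Z) measured from the origin (the optical center of the imaging unit) of a separately defined orthogonal coordinate system in three-dimensional space.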

Because the imaging unit 110 can acquire the pair of images D154a and D154b with different viewpoints using a single image sensor D150, it can achieve three-dimensional measurement with a more compact configuration than the conventional stereo method, which requires two or more imaging units.

The imaging unit 110 further includes an autofocus mechanism that controls the focal length of the optical system and a zoom mechanism that controls the angle of view. The autofocus mechanism can be enabled or disabled, and a set focal length can be held fixed. The imaging unit 110 reads a control value defined by a drive amount, such as the rotation angle or travel of the optical-system control motors provided to control focus and angle of view, and can compute and output the focal length by referring to a lookup table (not shown). The imaging unit 110 can also read lens-specific information, such as the focal length range, aperture, distortion coefficients, and optical center, from the attached lens. The information read is used to correct lens distortion in the parallax image D154f and the depth map D154d, described later, and to compute the three-dimensional point group D154c.

The imaging unit 110 has a function of correcting lens distortion in the images D154a and D154b, the parallax image D154f, and the depth map D154d, and a function of outputting the image coordinates of the principal point (hereinafter, the image center) and the baseline length between the images D154a and D154b. It also has a function of outputting the generated images 154a to 154c; optical-system data such as the focal length and image center; and three-dimensional measurement data such as the parallax image D154f, the baseline length, the depth map D154d, and the three-dimensional point group D154c. In this embodiment, these data are collectively called image information (hereinafter also "visual information"). The imaging unit 110 selectively outputs all or part of the image information in accordance with parameters set in a storage area (not shown) inside the imaging unit 110 or with commands given from outside the imaging unit 110.

Movement control in this embodiment means controlling the motor, which is an actuator of the moving body, and the steering that changes the direction of the wheels. Controlling these moves the moving body to a predetermined destination. A control value is a command value for controlling the moving body.

The position and orientation of the imaging unit in this embodiment are six parameters: three parameters representing the position of the imaging unit 110 in an arbitrary world coordinate system defined in real space, and three parameters representing its orientation. The mounting position of the imaging device relative to the center of gravity of the moving body is measured at the design stage of the moving body, such as an AGV, and a matrix representing this mounting position and orientation is stored in the external memory H14. The position of the center of gravity of the AGV can be computed by multiplying the position and orientation of the imaging unit by this mounting matrix. For this reason, the position and orientation of the imaging unit are treated in this embodiment as synonymous with the position and orientation of the AGV. The three-dimensional coordinate system defined on the imaging unit, with the optical axis of the imaging unit 110 as the Z axis, the horizontal direction of the image as the X axis, and the vertical direction as the Y axis, is called the imaging unit coordinate system.
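The conversion from the pose of the imaging unit to the pose of the AGV can be sketched with 4x4 homogeneous matrices. The representation, the multiplication order, and the names `pose_matrix` and `agv_pose_from_camera_pose` are assumptions made for illustration; the text states only that the two are multiplied together.

```python
import numpy as np

def pose_matrix(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(rotation, dtype=np.float64)
    T[:3, 3] = np.asarray(translation, dtype=np.float64)
    return T

def agv_pose_from_camera_pose(T_world_camera, T_camera_agv):
    """Pose of the AGV center of gravity in the world frame.

    T_world_camera: pose of the imaging unit in the world coordinate system.
    T_camera_agv:   mounting matrix, i.e. the AGV center of gravity expressed
                    in the imaging unit coordinate system (assumed convention).
    """
    return T_world_camera @ T_camera_agv
```

For example, with the camera at (1, 2, 0) in the world frame, no rotation, and the center of gravity 0.5 behind the camera along its Z axis, the AGV position comes out as (1, 2, -0.5).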

The input unit 1110 receives, as the image information (visual information) acquired by the imaging unit 110, depth maps that store a depth value for each pixel of the scene image, in time series (for example, 60 frames per second), and outputs them to the calculation unit 1120. A depth value is the distance between the imaging unit 110 and an object in the scene.

The calculation unit 1120 calculates and acquires the position and orientation of the imaging unit using the depth map received by the input unit 1110 and the map information held by the holding unit 1130, which serves as the reference for position and orientation calculation. The map information is described later. The calculation unit 1120 further outputs the calculated position and orientation to the control unit 1140. The calculation unit may also simply obtain from the input unit the information needed to output the position and orientation and compare it with the map information held by the holding unit 1130.

The holding unit 1130 holds a point cloud as map information. A point cloud is three-dimensional point group data of a scene. In this embodiment, the holding unit 1130 holds the point cloud as a data list storing the three values of the three-dimensional coordinates (X, Y, Z) in an arbitrary world coordinate system. The three-dimensional point group data represents three-dimensional position information. In addition, the holding unit holds a target position and orientation, which represents the three-dimensional coordinates and orientation of the AGV's destination. There may be one or more target positions and orientations; for simplicity, an example with a single target is described here. The holding unit 1130 outputs the map information to the calculation unit 1120 as needed, and outputs the target position and orientation to the control unit 1140.

The control unit 1140 calculates a control value for controlling the AGV based on the position and orientation of the imaging unit 110 calculated by the calculation unit 1120, the map information held by the holding unit 1130, and the operation information received via the communication device (not shown). It outputs the calculated control value to the actuator 120.

FIG. 6 is a diagram showing the hardware configuration of the information processing device 10. H11 is a CPU, which controls the various devices connected to the system bus H21. H12 is a ROM, which stores the BIOS program and the boot program. H13 is a RAM, used as the main memory of the CPU H11. H14 is an external memory, which stores the programs executed by the information processing device 10. The input unit H15 is a keyboard, mouse, or robot controller and performs processing related to the input of information. The display unit H16 outputs the computation results of the information processing device 10 to a display device in accordance with instructions from H11. The display device may be of any type, such as a liquid crystal display, a projector, or an LED indicator. The display unit H16 of the information processing device may itself serve as the display device. H17 is a communication interface for exchanging information over a network; it may be Ethernet (registered trademark), USB, serial communication, wireless communication, or any other type. Information is exchanged with the moving body management system 13 described above via the communication interface H17. H18 is an I/O, which receives image information (visual information) from the imaging device H19. The imaging device H19 is the imaging unit 110 described above. H20 is the actuator 120 described above.

Next, the processing procedure in this embodiment is described. FIG. 5 is a flowchart showing the processing procedure of the information processing device 10 in this embodiment. In the following, the flowcharts are assumed to be implemented by the CPU executing a control program. The processing consists of initialization S110, visual information acquisition S120, visual information input S130, position and orientation calculation S140, control value calculation S150, AGV control S160, and system termination determination S170.

In step S110, the system is initialized. That is, the program is read from the external memory H14 and the information processing device 10 is made operable. The parameters of each device connected to the information processing device 10 (the intrinsic parameters and focal length of the imaging unit 110) are read into the RAM H13, as is the initial position and orientation of the imaging unit 110, which is stored as the previous-time position and orientation. Each device of the AGV is also started and made ready for operation and control. In addition, operation information is received from the moving body management system through the communication I/F (H17), including the three-dimensional coordinates of the destination to which the AGV should travel, and these are held in the holding unit 1130.

In step S120, the imaging unit 110 acquires visual information and inputs it to the input unit 1110. In this embodiment, the visual information is a depth map, which the imaging unit 110 is assumed to have acquired by the method described above. That is, the depth map is D154d in FIG. 4.

In step S130, the input unit 1110 obtains the depth map acquired by the imaging unit 110. In this embodiment, the depth map is a two-dimensional array that stores the depth value of each pixel.

In step S140, the calculation unit 1120 calculates the position and orientation of the imaging unit 110 using the depth map received by the input unit 1110 and the map information held by the holding unit 1130. Specifically, a three-dimensional point group defined in the imaging coordinate system is first calculated from the depth map. Using the image coordinates (u_t, v_t), the intrinsic parameters (f_x, f_y, c_x, c_y) of the imaging unit 110, and the depth value D of each depth-map pixel, the three-dimensional point group (X_t, Y_t, Z_t) is calculated by Equation 1.
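Equation 1 itself is not reproduced in this text. A minimal sketch, assuming it is the standard pinhole back-projection X_t = (u_t - c_x) D / f_x, Y_t = (v_t - c_y) D / f_y, Z_t = D, could look as follows. The function name `depth_map_to_points` is hypothetical, and discarding pixels with non-positive or non-finite depth is an added assumption.

```python
import numpy as np

def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into an (N, 3) point group in the camera frame."""
    depth = np.asarray(depth, dtype=np.float64)
    v, u = np.indices(depth.shape)        # v: row index, u: column index
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Assumption: pixels without a valid measurement carry depth <= 0 or NaN/inf.
    keep = np.isfinite(points).all(axis=1) & (points[:, 2] > 0)
    return points[keep]
```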

Next, the three-dimensional point group is transformed into the coordinate system of the previous-time position and orientation using the previous-time position and orientation of the imaging unit 110; that is, the three-dimensional point group is multiplied by the matrix of the previous-time position and orientation. The position and orientation are then calculated so that the sum of the distances between each point of the calculated three-dimensional point group and its nearest neighbor in the point cloud of the map information held by the holding unit 1130 becomes small. Specifically, the position and orientation of the imaging unit 110 relative to the previous-time position and orientation are calculated using the ICP (Iterative Closest Point) algorithm. Finally, the result is converted to the world coordinate system, and the position and orientation in the world coordinate system are output to the control unit 1140. The calculated position and orientation overwrite the previous-time position and orientation held in the RAM H13.
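The alignment step can be sketched as a basic point-to-point ICP. This is a simplified illustration, not the patented implementation: the brute-force nearest-neighbor search, the SVD-based (Kabsch) rigid fit, the fixed iteration count, and the names `best_rigid_transform` and `icp` are all assumptions. The previous-time pose is passed in as `initial_pose`.

```python
import numpy as np

def best_rigid_transform(source, target):
    """Least-squares R, t such that R @ source_i + t approximates target_i."""
    src_mean, tgt_mean = source.mean(axis=0), target.mean(axis=0)
    H = (source - src_mean).T @ (target - tgt_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = tgt_mean - R @ src_mean
    return R, t

def icp(points, map_points, initial_pose=None, iterations=20):
    """Return a 4x4 pose that maps `points` onto `map_points`."""
    points = np.asarray(points, dtype=np.float64)
    map_points = np.asarray(map_points, dtype=np.float64)
    pose = np.eye(4) if initial_pose is None else np.array(initial_pose, dtype=np.float64)
    # Start from the previous-time pose, as in the description above.
    current = points @ pose[:3, :3].T + pose[:3, 3]
    for _ in range(iterations):
        # Brute-force nearest neighbor in the map for every measured point.
        sq_dist = ((current[:, None, :] - map_points[None, :, :]) ** 2).sum(axis=2)
        nearest = map_points[sq_dist.argmin(axis=1)]
        R, t = best_rigid_transform(current, nearest)
        current = current @ R.T + t
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        pose = step @ pose
    return pose
```

The brute-force search costs O(N * M) per iteration; for map-sized point clouds a k-d tree or similar spatial index would normally replace it.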

In step S150, the control unit 1140 calculates a control value for controlling the AGV. Specifically, the control value is calculated so that the Euclidean distance between the destination coordinates held by the holding unit 1130 and the position and orientation of the imaging unit 110 calculated by the calculation unit 1120 decreases. The control unit 1140 outputs the calculated control value to the actuator 120.
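The text does not specify the control law. One simple possibility, shown here purely as an assumption, is a proportional controller for a vehicle moving in a plane: the speed command grows with the remaining distance up to a limit, and the steering command is the heading error toward the destination. The function name, the parameters `max_speed` and `gain`, and the two-element control value are all hypothetical.

```python
import math

def compute_control_value(position_xy, heading, destination_xy,
                          max_speed=1.0, gain=1.0):
    """Return (speed, steering) that reduces the distance to the destination.

    position_xy, destination_xy: (x, y) in the world frame.
    heading: current yaw angle of the vehicle in radians.
    """
    dx = destination_xy[0] - position_xy[0]
    dy = destination_xy[1] - position_xy[1]
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    # Wrap the heading error into [-pi, pi).
    steering = (bearing - heading + math.pi) % (2.0 * math.pi) - math.pi
    speed = min(max_speed, gain * distance)
    return speed, steering
```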

In step S160, the actuator 120 controls the AGV using the control value calculated by the control unit 1140.

In step S170, it is determined whether to terminate the system. Specifically, if the Euclidean distance between the destination coordinates held by the holding unit 1130 and the position and orientation of the imaging unit 110 calculated by the calculation unit 1120 is at or below a predetermined threshold, the AGV is regarded as having arrived at the destination and processing ends. Otherwise, processing returns to step S120 and continues.
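The loop formed by steps S120 to S170 can be sketched as follows. The callables `localize` and `drive` are hypothetical placeholders for steps S120 to S140 and S150 to S160 respectively, and the `max_cycles` safety limit is an added assumption that does not appear in the flowchart.

```python
import math

def has_arrived(position, destination, threshold):
    """S170: true when the Euclidean distance is at or below the threshold."""
    return math.dist(position, destination) <= threshold

def run_control_loop(localize, drive, destination, threshold, max_cycles=1000):
    """Repeat S120-S170 until arrival; return the number of cycles executed.

    localize(): returns the current position (stands in for S120-S140).
    drive(position, destination): computes and applies the control value
        (stands in for S150-S160).
    Returns None if the destination is not reached within max_cycles.
    """
    for cycle in range(1, max_cycles + 1):
        position = localize()
        drive(position, destination)
        if has_arrived(position, destination, threshold):
            return cycle
    return None
```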

Embodiment 1 uses three-dimensional points obtained from the depth map acquired by an imaging unit in which each light receiving section on the image sensor is constituted by two or more light receiving elements, together with the three-dimensional points of the point cloud that serves as map information. The position and orientation of the imaging unit are calculated so that the distance between these three-dimensional points is minimized. By automatically controlling the AGV so that the distance between the calculated position and orientation of the imaging unit and the AGV's destination is minimized, the AGV can be operated stably and with less labor.

<Modification>
In Embodiment 1, the imaging unit 110 calculates the depth map D154d, and the input unit 1110 of the information processing device inputs that depth map. As a modification, the input of the input unit 1110 is not limited to the depth map calculated by the imaging unit 110, as long as the position and orientation of the imaging unit 110 can be calculated. Specifically, if the imaging unit 110 internally calculates a point cloud in the coordinate system of the imaging unit 110, the input unit 1110 can input that point cloud, and the calculation unit 1120 can then calculate the position and orientation using it. Note that the point cloud calculated by the imaging unit 110 is the three-dimensional point group D154 in FIG. 4. Alternatively, the input unit 1110 may input the pair of images D154a and D154b acquired by the imaging unit 110 and the focal length held by the imaging unit 110, and the calculation unit 1120 may obtain the depth map by corresponding-point search and the stereo method. In addition, the input unit 1110 may also input, as visual information, an RGB image or a grayscale image acquired by the imaging unit 110. In other words, the depth map calculation performed by the imaging unit 110 can instead be performed by the calculation unit 1120.

The imaging unit 110 may further include a focus control mechanism that controls the focal length of the optical system, and the information processing device may control this mechanism. For example, the control unit 1140 of the information processing device may calculate a control value (focus value) for adjusting the focus. For example, when the moving body moves and the appearance of the captured image changes, the control unit 1140 calculates a control value that adjusts the focus of the imaging unit 110 to the depth given by the mean or median of the depth map. Alternatively, instead of the information processing device, an autofocus mechanism built into the imaging unit 110 may adjust the focus. Adjusting the focus yields better-focused visual information, so the position and orientation can be calculated with high accuracy. Note that the imaging unit 110 may have no focus control function (fixed focus). In that case, the imaging unit 110 does not need a focus control mechanism and can therefore be made smaller.
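As a rough illustration of choosing the focus target from the depth map, the sketch below returns the median or mean of the valid depth values. The convention that zero or negative values mean "no measurement", and the function name, are assumptions; how the resulting distance maps to an actual focus control value depends on the lens and is not shown.

```python
import statistics


def focus_distance_from_depth_map(depth, use_median=True):
    """Return the depth to focus on, or None if the map has no valid values.

    depth is a list of rows; values <= 0 are treated as missing (assumed).
    """
    valid = [z for row in depth for z in row if z > 0]
    if not valid:
        return None
    return statistics.median(valid) if use_median else statistics.mean(valid)
```

The median is less sensitive than the mean to a few very distant pixels, which may matter when part of the view is open space.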

The imaging unit 110 may further include a zoom control mechanism that controls the zoom of the optical system, and the information processing device may perform this zoom control. Specifically, when the moving body moves at high speed, the control unit 1140 calculates a control value (adjustment value) that sets the zoom to a wide angle so that visual information with a wide field of view is acquired. When the moving body is to be controlled with high accuracy and the position and orientation of the imaging unit 110 are to be calculated with high accuracy, the control unit 1140 calculates a control value (adjustment value) that sets the zoom to a narrow angle so that visual information with a narrow field of view is acquired at high resolution. Changing the zoom value as needed in this way allows the position and orientation of the imaging unit 110 to be calculated stably and with high accuracy, and therefore the moving body can be controlled stably and with high accuracy.

In this embodiment, the imaging unit 110 has been described assuming an optical system that fits the pinhole camera model. However, any optical device (lens) may be used as long as the optical system can acquire the visual information needed to calculate the position and orientation of the imaging unit 110 and to control the moving body. Specifically, an omnidirectional (all-sky) lens, a fisheye lens, or a hyperboloidal mirror may be used. A macro lens may also be used. For example, an omnidirectional or fisheye lens provides depth values over a very wide field of view, which improves the robustness of position and orientation estimation. A macro lens allows the position and orientation to be calculated in fine detail. In this way, the user can freely change (for example, exchange) the lens to suit the scene in which it is used, so that the position and orientation of the imaging unit 110 can be calculated stably and with high accuracy, and the moving body can be controlled stably and with high accuracy.

When the zoom value or focal length is variable as described above, the imaging unit 110 reads a control value defined by the rotation angle or travel of the optical-system control motor provided to control the focus and angle of view, and calculates the focal length by referring to a lookup table (not shown). When the lens is exchanged, the imaging unit 110 reads the focal length value recorded in the lens through electronic contacts provided on the lens. A person can also enter the focal length into the imaging unit 110 using a UI (not shown). The imaging unit 110 calculates the depth map using the focal length value obtained in this way. The input unit 1110 of the information processing device then inputs the focal length value together with the visual information from the imaging unit 110. The calculation unit 1120 calculates the position and orientation using the depth map and the focal length value input by the input unit 1110. The imaging unit 110 can also use the calculated focal length to calculate a point cloud in the coordinate system of the imaging unit 110. In that case, the input unit 1110 of the information processing device inputs the point cloud calculated by the imaging unit 110, and the calculation unit 1120 calculates the position and orientation using that point cloud.
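The role of the focal length in turning a depth map into a point cloud in the imaging unit coordinate system can be illustrated with the standard pinhole back-projection. The sketch assumes a focal length expressed in pixels, a principal point (cx, cy), and zero or negative depth meaning "no measurement"; these parameters and the function name are illustrative and do not come from the embodiment.

```python
def depth_map_to_point_cloud(depth, focal_length_px, cx, cy):
    """Back-project a depth map into 3D points in the camera coordinate system.

    depth is a list of rows of depth values along the optical axis.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels without a valid depth (assumed convention)
            x = (u - cx) * z / focal_length_px
            y = (v - cy) * z / focal_length_px
            points.append((x, y, z))
    return points
```

Because x and y scale with 1 / focal_length_px, a stale focal length after a zoom change or lens exchange would distort the point cloud, which is why the focal length value is passed along with the visual information.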

The map information in this embodiment is a point cloud. However, any information may be used as long as it serves as a reference for calculating the position and orientation of the imaging unit 110. Specifically, it may be a colored point cloud in which three values representing color information are additionally assigned to each point. Alternatively, a depth map and a position and orientation may be associated with each other to form a keyframe, and a plurality of keyframes may be held. In this case, the position and orientation are calculated so as to minimize the distance between the depth map of a keyframe and the depth map acquired by the imaging unit 110. Furthermore, if the input unit 1110 is configured to input an image, the calculation unit 1120 may hold the input image in association with the keyframe. A 2D map that associates areas where the AGV can travel with areas it cannot, such as walls, may also be held. How the 2D map is used is described later.

In this embodiment, an example using the ICP algorithm to calculate the position and orientation has been described, but any method may be used as long as the position and orientation can be calculated. For example, instead of using the point clouds described in this embodiment directly, the calculation unit 1120 may calculate mesh models from them and calculate the position and orientation so as to minimize the distances between the surfaces. Alternatively, three-dimensional edges, which are points of discontinuity, may be calculated from the depth map and the point cloud, and the position and orientation may be calculated so as to minimize the distances between these three-dimensional edges. If the input unit 1110 is configured to input an image, the calculation unit 1120 can additionally use the input image to calculate the position and orientation.

If the AGV is equipped with sensors such as an inertial sensor (a gyroscope or an IMU) or an encoder that measures wheel rotation, the input unit 1110 may input the values of those sensors, and the calculation unit 1120 may additionally use the sensor values to calculate the position and orientation of the imaging unit 110. Specifically, related techniques such as the Kalman filter and visual-inertial SLAM are known and can be applied. Using the visual information of the imaging unit 110 together with the sensor information in this way allows the position and orientation to be calculated robustly and with high accuracy. An inertial sensor such as a gyroscope or an IMU can also be used to reduce shake in the visual information captured by the imaging unit 110. Specifically, when vertical movement or rotation is detected, it is regarded as vibration of the AGV, and the visual information is warped so as to cancel it. In this way, the position and orientation can be calculated with high accuracy without being affected by shaking while the AGV is traveling.
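The text names the Kalman filter only as a known related technique. As a minimal sketch of the idea, the one-dimensional filter below predicts the position from an encoder displacement and then corrects it with a position obtained from visual information. The scalar state, the variance parameters, and the function names are assumptions made for illustration; a real system would estimate a full position and orientation.

```python
def kalman_predict(x, p, odometry_delta, process_var):
    """Predict: advance the position by the encoder displacement.

    x is the position estimate and p its variance; uncertainty grows by process_var.
    """
    return x + odometry_delta, p + process_var


def kalman_update(x, p, measured_x, measurement_var):
    """Update: blend in a position measured from visual information."""
    gain = p / (p + measurement_var)  # closer to 1 when the prediction is uncertain
    return x + gain * (measured_x - x), (1.0 - gain) * p
```

With equal prediction and measurement variances the gain is 0.5, so the fused estimate lies midway between the odometry prediction and the visual measurement.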

In Embodiment 1, the control unit 1140 simply calculates the control value so as to reduce the distance between the target position and orientation and the position and orientation calculated by the calculation unit 1120. However, the control unit 1140 may calculate and use any control value as long as it serves to reach the destination. Specifically, when a depth value of the depth map, which is the input geometric information, falls below a predetermined distance, the control unit 1140 calculates a control value that, for example, turns the AGV to the right. Alternatively, in the map information held by the holding unit 1130, regions where the point cloud exists may be treated as impassable and empty space as passable; the calculation unit 1120 then generates a route by dynamic programming, and the control unit 1140 calculates control values that follow this route. This makes it possible to travel along walls and reach the destination while avoiding collisions with them. The calculation unit 1120 may also create a 2D map in advance by projecting the point cloud that is the map information onto the ground plane. Locations onto which points are projected are impassable because of walls or obstacles, and locations with no projected points are empty space and passable. Based on this information, a route to the destination can be generated by dynamic programming. Alternatively, the calculation unit 1120 may calculate a cost map storing values that decrease with proximity to the destination, and the control unit 1140 may calculate the control value using a deep reinforcement learner, that is, a neural network trained to take the cost map as input and output a control value. Calculating control values that move the AGV while avoiding obstacles such as walls in this way allows the AGV to be operated stably and safely.
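The projection of the point cloud onto the ground plane and the subsequent route generation can be sketched as follows. The sketch uses breadth-first propagation of a cost-to-go value, which is a simple form of dynamic programming on a grid with unit step cost; the embodiment does not specify its dynamic programming formulation. The cell size, the height band used to discard floor and ceiling points, the 4-connected grid, and all names are assumptions.

```python
from collections import deque


def project_to_grid(points, cell_size, width, height, min_z=0.05, max_z=2.0):
    """Project 3D map points (x, y, z with z up) onto the ground plane.

    Cells that receive a point are impassable (True). min_z and max_z are
    assumed bounds that drop floor and ceiling points.
    """
    grid = [[False] * width for _ in range(height)]
    for x, y, z in points:
        col, row = int(x // cell_size), int(y // cell_size)
        if min_z <= z <= max_z and 0 <= row < height and 0 <= col < width:
            grid[row][col] = True
    return grid


def cost_to_go(grid, goal):
    """Steps from every reachable free cell to the goal (None if unreachable)."""
    height, width = len(grid), len(grid[0])
    cost = [[None] * width for _ in range(height)]
    cost[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < height and 0 <= nc < width
            if inside and not grid[nr][nc] and cost[nr][nc] is None:
                cost[nr][nc] = cost[r][c] + 1
                queue.append((nr, nc))
    return cost


def plan_route(grid, start, goal):
    """Follow decreasing cost-to-go from start to goal; None if no route exists."""
    cost = cost_to_go(grid, goal)
    if cost[start[0]][start[1]] is None:
        return None
    route = [start]
    while route[-1] != goal:
        r, c = route[-1]
        neighbours = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
        route.append(min(
            (n for n in neighbours
             if 0 <= n[0] < len(grid) and 0 <= n[1] < len(grid[0])
             and cost[n[0]][n[1]] is not None),
            key=lambda n: cost[n[0]][n[1]],
        ))
    return route
```

The table returned by `cost_to_go` also has the property described for the cost map: its values decrease as a cell gets closer to the destination.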

The holding unit 1130 may also be configured not to hold map information. Specifically, based on the visual information acquired by the imaging unit 110 at time t and at the immediately preceding time t'', the calculation unit 1120 calculates the position and orientation at time t relative to time t''. By multiplying together the matrices of position and orientation change that the calculation unit 1120 calculates at each time step, the position and orientation of the imaging unit 110 can be calculated even without map information. With this configuration, even a computer with limited computational resources can calculate the position and orientation and control the moving body.
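Multiplying the per-step change matrices can be illustrated with homogeneous transforms. For brevity the sketch assumes planar motion and 3x3 matrices, whereas the embodiment deals with the position and orientation of the imaging unit in three dimensions; the function names are hypothetical.

```python
import math


def relative_pose(dx, dy, dyaw):
    """Homogeneous 3x3 matrix for the motion between two consecutive times."""
    c, s = math.cos(dyaw), math.sin(dyaw)
    return [[c, -s, dx], [s, c, dy], [0.0, 0.0, 1.0]]


def compose(a, b):
    """Matrix product a @ b for 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def accumulate(relative_poses):
    """Chain per-step changes into a pose relative to the starting pose."""
    pose = relative_pose(0.0, 0.0, 0.0)  # identity
    for delta in relative_poses:
        pose = compose(pose, delta)
    return pose
```

Because each step is composed with the previous result, errors in the individual changes accumulate over time, which is the usual trade-off of operating without map information.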

In Embodiment 1, the holding unit 1130 holds map information created in advance. However, a SLAM (Simultaneous Localization and Mapping) configuration may be used, in which the position and orientation are estimated while map information is created from the visual information acquired by the imaging unit 110 and the position and orientation calculated by the calculation unit 1120. Many SLAM methods have been proposed and can be applied. For example, the Point-Based Fusion algorithm, which integrates point clouds acquired by the imaging unit 110 at multiple times over time, can be used. The KinectFusion algorithm, which integrates the boundaries between measured objects and free space over time as voxel data, can also be used. In addition, RGB-D SLAM algorithms, which generate a map while tracking feature points detected in images using depth-sensor values as their depths, are known and can be applied. Furthermore, in this embodiment, the map is not limited to one generated in a single time period. For example, a plurality of maps may be generated in different time periods and then combined.

In this embodiment, the map information is not limited to information generated from data acquired by the imaging unit 110 mounted on the moving body 11. For example, the holding unit 1130 may hold a CAD drawing or map image of the environment, either as is or after data format conversion. The holding unit 1130 may also hold a map based on a CAD drawing or map image as an initial map and update it using the SLAM techniques described above. The update time of the map may be held, and the control unit 1140 may calculate control values that steer the AGV so as to update the map at locations where a predetermined time has elapsed since the last update. The map may be updated by overwriting, or the initial map may be retained and the differences stored as update information. In that case, the map can be managed in layers, checked on the display unit H16, or reverted to the initial map. Performing these operations while viewing the display screen improves convenience.

In Embodiment 1, the moving body operates based on the destination coordinates set by the moving body management system 13. Conversely, the position and orientation and the control values calculated by the information processing device can be transmitted to the moving body management system through the communication I/F (H17). When the moving body management system 13 and the process management system 14 refer to the position and orientation and the control values calculated from the visual information acquired by the imaging unit 110, processes and moving bodies can be managed more efficiently. If the destination coordinates are always obtained online from the moving body management system, the holding unit 1130 need not hold them and may instead receive them as needed via the communication I/F.

In Embodiment 1, the information processing system 1 is configured so that the process management system 14 manages the overall factory process, the moving body management system 13 manages the operation information of the moving bodies according to the management status, and the moving body 12 moves according to the operation information. However, any configuration in which the moving body moves based on the visual information acquired by the imaging unit 110 may be used. For example, if two predetermined points are held in the holding unit 1130 in advance and the moving body travels back and forth between them, the process management system and the moving body management system are unnecessary.

In this embodiment, the moving body 12 is not limited to a guided vehicle (AGV). For example, the moving body 12 may be a self-driving car or an autonomous mobile robot, and the movement control described in this embodiment may be applied to them.

In particular, if the information processing device described above is installed in an automobile, the automobile can be used as a self-driving vehicle. The automobile is moved using the control values calculated by the control unit 1140. In this case, the destination coordinates and map information can be acquired from a car navigation system installed in the automobile through the I/O (H18).

The device may also be configured not to control a moving body but to calculate the position and orientation based on the visual information acquired by the imaging unit 110. Specifically, the method of this embodiment can be applied to the alignment of real space and virtual objects in a mixed reality system, that is, to measuring the position and orientation of the imaging unit 110 in real space for use in rendering virtual objects. As an example, the following describes a case in which a 3DCG model is aligned with and composited onto the image D154a captured by the imaging unit 110 and presented on the display of a mobile terminal such as a smartphone or tablet. To realize this use, the input unit 1110 inputs the image D154a in addition to the depth map D152c acquired by the imaging unit 110. The holding unit 1130 additionally holds the 3DCG model of the virtual object and the three-dimensional position at which the 3DCG model is placed in the map coordinate system. The calculation unit 1120 composites the 3DCG model onto the image D154a using the position and orientation of the imaging unit 110 calculated as described in Embodiment 1. In this way, a user experiencing mixed reality can hold the mobile terminal and, through its display, stably observe the real space with virtual objects superimposed based on the position and orientation calculated by the information processing device.

[Embodiment 2]
In Embodiment 1, the position and orientation of the imaging unit are calculated using the depth map acquired by the imaging unit. An imaging unit based on DAF (Dual Pixel Auto Focus) can measure a specific distance range from the imaging unit with particularly high accuracy. In Embodiment 2, therefore, depth values are also calculated by motion stereo for distances outside that specific range, which further improves the accuracy of the depth map acquired by the imaging unit and allows the position and orientation to be calculated stably and with high accuracy.

The configuration of the device in Embodiment 2 is the same as that in FIG. 2, which shows the configuration of the information processing device 10 described in Embodiment 1, so its description is omitted. Embodiment 2 differs from Embodiment 1 in that the input unit 1110 inputs visual information to the holding unit 1130 and the holding unit 1130 holds it. It also differs in that the calculation unit 1120 additionally uses the visual information held by the holding unit 1130 to correct the depth map before calculating the position and orientation. In addition, the holding unit 1130 holds in advance, as characteristic information of the imaging unit 110, a list that associates depth values of the depth map acquired by the imaging unit 110 with their reliabilities. The reliability of a depth value is the reciprocal of the error between the actual distance and the measured distance, clipped to the range 0 to 1, obtained by imaging in advance a flat panel placed at a predetermined distance from the imaging unit 110. Reliabilities are assumed to have been calculated in advance for various distances. Points that could not be measured are given a reliability of 0. In this embodiment, the visual information acquired by the imaging unit 110 and input by the input unit 1110 consists of an image and a depth map.
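The reliability definition above can be written down directly. In the sketch below, the handling of a zero error (reliability 1), the use of `None` for a failed measurement, the dictionary keyed by actual distance, and the names are assumptions; the text does not state the unit in which the error is expressed, and the result depends on it.

```python
def reliability(true_distance, measured_distance):
    """Reciprocal of the measurement error, clipped to the range 0 to 1.

    measured_distance is None when the panel could not be measured.
    """
    if measured_distance is None:
        return 0.0
    error = abs(true_distance - measured_distance)
    if error == 0.0:
        return 1.0  # assumed: a perfect measurement gets the maximum reliability
    return max(0.0, min(1.0, 1.0 / error))


def build_reliability_table(calibration):
    """Map each calibrated distance to its reliability.

    calibration is an iterable of (true_distance, measured_distance) pairs.
    """
    return {true_d: reliability(true_d, measured) for true_d, measured in calibration}
```

A table built this way is what the later correction step would look up for each depth value.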

The overall processing procedure in Embodiment 2 is the same as that in FIG. 4, which shows the processing procedure of the information processing device 10 described in Embodiment 1, so its description is omitted. It differs from Embodiment 1 in that a depth map correction step is added before the position and orientation calculation step S140. FIG. 7 is a flowchart showing the details of the processing procedure in the depth map correction step.

In step S2110, the calculation unit 1120 reads the characteristic information of the imaging unit 110 from the holding unit 1130.

In step S2120, the calculation unit 1120 calculates depth values by motion stereo using the input image and the image and depth map, held by the holding unit 1130, that were acquired at an arbitrary time t' before the time t at which the imaging unit 110 acquired the current visual information. Hereinafter, the image acquired at the time t' is also referred to as the past image, and the depth map acquired at the time t' as the past depth map. Motion stereo is a known technique, and various methods can be applied. Note that motion stereo from two images leaves the scale of the depth values ambiguous, but the scale can be determined from the ratio between the past depth map and the depth values calculated by motion stereo.
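One way to resolve the scale ambiguity from the ratio mentioned above is sketched here. The text only says that the ratio is used; taking the median of per-pixel ratios, the flat lists of corresponding pixels, and the treatment of non-positive values as missing are assumptions.

```python
import statistics


def estimate_scale(past_depth, stereo_depth):
    """Scale factor that brings motion-stereo depths to metric scale.

    Both arguments are flat lists of depths for corresponding pixels;
    values <= 0 are treated as missing and skipped.
    """
    ratios = [d / m for d, m in zip(past_depth, stereo_depth) if d > 0 and m > 0]
    return statistics.median(ratios) if ratios else None


def apply_scale(stereo_depth, scale):
    """Rescale every motion-stereo depth value by the estimated factor."""
    return [m * scale for m in stereo_depth]
```

The median keeps a few mismatched pixels from skewing the estimate; a mean or a least-squares fit would be alternatives.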

In step S2130, the calculation unit 1120 updates the depth map by a weighted sum, using the reliability associated with the depth value (the characteristic information read in step S2110) and the depth value calculated by motion stereo in step S2120. Specifically, taking the reliability value in the vicinity of each depth value d of the depth map as the weight α, the depth value d is combined with the depth value m calculated by motion stereo using the weighted sum of Formula 2.

d_new = αd + (1 − α)m ... (Formula 2)
The depth map is updated using the calculated d_new. When all pixels of the depth map have been updated, the depth map correction step ends, and the processing from step S150 onward described in Embodiment 1 continues.
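Formula 2 applied to every pixel can be sketched as below. Looking up the reliability of the calibrated distance nearest to d is one reading of "the reliability value in the vicinity of each depth value d"; that lookup, the dictionary form of the reliability list, and the function names are assumptions.

```python
def nearest_reliability(table, d):
    """Reliability of the calibrated distance closest to depth value d."""
    nearest = min(table, key=lambda distance: abs(distance - d))
    return table[nearest]


def correct_depth_map(depth, stereo_depth, table):
    """Apply d_new = alpha * d + (1 - alpha) * m to every pixel.

    depth holds the values d from the imaging unit, stereo_depth the values m
    from motion stereo, and table maps distance to reliability.
    """
    corrected = []
    for row_d, row_m in zip(depth, stereo_depth):
        new_row = []
        for d, m in zip(row_d, row_m):
            alpha = nearest_reliability(table, d)
            new_row.append(alpha * d + (1.0 - alpha) * m)
        corrected.append(new_row)
    return corrected
```

Where the reliability is 1 the measured depth is kept unchanged, and where it is 0 the motion-stereo depth replaces it entirely.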

As described above, in Embodiment 2, when the imaging unit 110 can acquire depth values with high accuracy, the depth values acquired by the imaging unit 110 are weighted heavily; otherwise, the depth values calculated by motion stereo are weighted heavily. As a result, even where the measurement accuracy of the imaging unit 110 drops, motion stereo compensates for it, and the depth map can be calculated with high accuracy.

<Modification>
In this embodiment, the reliability used in correcting the depth map is calculated from the measurement error of the depth values of the depth map calculated by the imaging unit 110 and is used as the weight α. However, any method may be used as long as it integrates the depth map acquired by the imaging unit 110 with the depth values calculated by motion stereo using weights that improve the accuracy of the depth map. For example, the weight may be the reciprocal of the depth in the depth map multiplied by a predetermined coefficient β. Alternatively, the gradient of the input image may be calculated, and the inner product of the gradient direction and the arrangement direction of the elements in the imaging unit 110 may be used as the weight. As another option, the input unit 1110 may additionally input from the imaging unit 110 the baseline length of the two images D154a and D154b or the baseline length of the parallax image D154f, and the ratio of this baseline length to the baseline length in motion stereo may be used as the weight. Instead of calculating a weight for each pixel as described in this embodiment, only specific pixels may be integrated, or the same weight may be applied to some or all pixels when calculating the weighted sum. Motion stereo may also be performed using images and depth maps from multiple past times rather than a single past time.
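Two of the alternative weights above can be sketched as follows. Clipping the weights to the range 0 to 1 (so that they remain usable as α in the weighted sum), taking the absolute value of the inner product, normalizing the gradient to unit length, and the default element direction are all assumptions added for illustration.

```python
import math


def inverse_depth_weight(d, beta):
    """Weight given by beta times the reciprocal of the depth, clipped to [0, 1]."""
    if d <= 0:
        return 0.0  # assumed: no valid depth means no trust in the measurement
    return max(0.0, min(1.0, beta / d))


def gradient_alignment_weight(gx, gy, element_direction=(1.0, 0.0)):
    """Inner product of the unit image gradient and the element arrangement direction.

    element_direction is assumed to be a unit vector; the default is horizontal.
    """
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        return 0.0  # flat image region: the gradient direction is undefined
    return abs((gx * element_direction[0] + gy * element_direction[1]) / norm)
```

Under these assumptions, nearby depths and image gradients aligned with the element arrangement direction give a larger weight to the depth measured by the imaging unit.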

Furthermore, the AGV can be controlled so as to obtain visual information from which the position and orientation can be calculated more accurately and robustly. For example, the control unit 1140 calculates a control value that moves the AGV so that the baseline length of motion stereo increases. One example is a control value that makes the AGV travel in a meandering path while the imaging unit 110 keeps a predetermined distant point in view. This lengthens the baseline during motion stereo, so depth values at greater distances can be calculated accurately. The control unit 1140 can also calculate a control value so that the imaging unit 110 obtains visual information with a wider field of view. Specifically, it calculates a control value that performs a look-around motion centered on the optical center of the imaging unit 110. Because visual information with a wider field of view is obtained, the position and orientation can be calculated with less divergence and error in the optimization.
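The benefit of a longer baseline can be illustrated with the standard rectified-stereo triangulation model. The symbols below (focal length f in pixels, baseline b, disparity d, disparity error δd) are assumptions introduced for this illustration and do not appear in the original text; motion stereo only approximately follows this model when the AGV moves sideways relative to the viewing direction.

```latex
Z = \frac{f\,b}{d}
\quad\Longrightarrow\quad
\left|\frac{\partial Z}{\partial d}\right| = \frac{f\,b}{d^{2}} = \frac{Z^{2}}{f\,b}
\quad\Longrightarrow\quad
\delta Z \approx \frac{Z^{2}}{f\,b}\,\delta d
```

Under this model the depth error grows with the square of the distance Z and shrinks in proportion to the baseline b, which is why doubling the baseline roughly halves the depth error at a given distance.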

The input unit 1110 can also receive an image and a position and orientation from another AGV through the communication I/F, and depth values can be calculated by motion stereo using the received image and position and orientation together with the image acquired by the imaging unit 110. What is received may be any visual information, such as a depth map, a parallax image, or a three-dimensional point cloud acquired by the imaging unit of another AGV.

[Embodiment 3]
In Embodiments 1 and 2, the position and orientation and the control values were calculated based on visual information of the scene captured by the imaging unit 110. However, depth accuracy may decrease for textureless walls and columns. Embodiment 3 therefore improves depth accuracy by projecting a predetermined pattern light onto the scene and having the imaging unit 110 capture that pattern light.

FIG. 8 shows the configuration of the information processing device 30 in this embodiment. It differs in that the control unit 1140 of the information processing device 10 described in Embodiment 1 additionally calculates and outputs a control value for the projection device 310. The projection device in this embodiment is a projector, mounted so that its optical axis coincides with the optical axis of the imaging unit 110. The pattern projected by the projection device 310 is a random pattern generated so that projected and non-projected regions occur at random. In this embodiment, the visual information is the image D154e and the depth map D154d acquired by the imaging unit 110, which the input unit 1110 receives from the imaging unit 110.

The processing procedure in this embodiment is the same as FIG. 5, which illustrates the processing procedure of the information processing device 10 described in Embodiment 1, so its description is omitted. It differs from Embodiment 1 in that, in step S150, the calculation unit 1120 calculates a texture degree value indicating whether the input visual information is poor in texture, and the control unit 1140 calculates an ON/OFF control value for pattern projection based on the texture degree value.

The procedure by which the control unit 1140 calculates the pattern projection control value in step S150 is described in detail below. First, the calculation unit 1120 convolves the input image with a Sobel filter and takes the absolute values of the result to obtain a gradient image. The Sobel filter is a type of filter for computing the first derivative of an image and is well known from the literature. The proportion of pixels in the gradient image whose value is at or above a predetermined gradient threshold is taken as the texture degree. Next, the control unit 1140 calculates a control value that turns the projection device ON if the texture degree value is at or above a predetermined threshold and OFF if it is below the threshold.
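The texture degree computation described above can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the image is assumed to be a list of rows of grayscale values, the horizontal and vertical Sobel responses are combined by summing their absolute values, and border pixels are given a gradient of zero, all of which are assumptions. The ON/OFF decision is left to the caller, which compares the returned value with the predetermined threshold.

```python
from typing import List

Image = List[List[float]]

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradient_image(img: Image) -> Image:
    """Sum of the absolute horizontal and vertical Sobel responses per pixel.

    The kernels are applied without flipping; because absolute values are
    taken, this gives the same result as a true convolution.
    Border pixels are left at 0 (assumption).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            out[y][x] = abs(gx) + abs(gy)
    return out


def texture_degree(img: Image, gradient_threshold: float) -> float:
    """Fraction of all pixels whose gradient value is at or above the threshold."""
    grad = gradient_image(img)
    total = len(grad) * len(grad[0])
    strong = sum(1 for row in grad for g in row if g >= gradient_threshold)
    return strong / total
```

A uniformly colored wall yields a texture degree of 0, while an image containing edges yields a larger value.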

As described above, in Embodiment 3, random pattern light is projected when the scene is poor in texture. This adds a random pattern to the scene, so the imaging unit can acquire a more accurate depth map even when the scene is poor in texture. As a result, the position and orientation can be calculated accurately.

<Modified example>
In this embodiment, the pattern light is a random pattern. However, any pattern may be used as long as it adds texture to texture-poor regions. For example, a random dot pattern or a stripe pattern (such as a sinusoidal or lattice pattern) may be projected. A stripe pattern is ambiguous in that distances within one modulation wavelength cannot be distinguished from those beyond it, but this ambiguity can be eliminated by the Gray code method, which obtains depth values from input images acquired at multiple times while changing the frequency.
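How a Gray code sequence removes the stripe ambiguity can be sketched as follows. The sketch assumes the common binary-reflected Gray code, in which each projector column is assigned a code word and one bit of that word is shown per projected pattern, coarsest stripes first. The function names and this particular encoding are assumptions; the text only names the Gray code method.

```python
from typing import List


def to_gray(index: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return index ^ (index >> 1)


def from_gray(code: int) -> int:
    """Inverse of to_gray."""
    index = 0
    while code:
        index ^= code
        code >>= 1
    return index


def stripe_bits(column: int, num_patterns: int) -> List[int]:
    """Bit (1 = lit, 0 = dark) shown at a projector column in each pattern, coarsest first."""
    code = to_gray(column)
    return [(code >> k) & 1 for k in range(num_patterns - 1, -1, -1)]


def decode_column(bits: List[int]) -> int:
    """Recover the projector column from the bits observed at one camera pixel."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return from_gray(code)
```

Because every column receives a unique bit sequence across the patterns, a camera pixel can be matched to a single projector column, and adjacent columns differ in only one bit, which limits the effect of a misread bit at a stripe boundary.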

In this embodiment, the control unit 1140 outputs an ON/OFF control value for projection, and the projection device 310 switches projection on or off accordingly. However, the configuration is not limited to this as long as the projection device 310 can project pattern light. For example, the projection device 310 may be configured to start projecting when power is turned on in the initialization step S110. The projection device 310 may also be configured to project onto an arbitrary part of the scene. Specifically, the control unit 1140 can switch the projection pattern so that the projection device 310 projects only onto regions where the gradient value of the gradient image is below a predetermined threshold. It is also possible to detect human eyes with the object detection described in Embodiment 5 and calculate a control value that projects the pattern while avoiding them. Furthermore, the brightness of the pattern may be varied in addition to switching it ON and OFF. That is, the control unit 1140 can calculate a control value so that the projection device 310 projects more brightly onto regions with large depth values in the depth map, or so that it projects more brightly onto dark parts of the input image. The pattern may also be changed when the residual error in the iterative calculation performed by the calculation unit 1120 to obtain the position and orientation is at or above a predetermined threshold.

In this embodiment, the texture degree value is obtained from a gradient image computed with a Sobel filter. Alternatively, the texture degree value can be calculated from a gradient image or edge image computed with another filter, such as a Prewitt filter, a Scharr filter, or a Canny filter for edge detection. The high-frequency components obtained by applying a DFT (discrete Fourier transform) to the image can also be used as the texture degree value. Feature points such as corners in the image may also be detected, and the number of feature points used as the texture degree.

[Embodiment 4]
In Embodiments 1 and 2, the position and orientation and the control values were calculated based on visual information of the scene captured by the imaging unit. Embodiment 3 described improving accuracy for texture-poor scenes by projecting pattern light. Embodiment 4 describes a method that additionally uses three-dimensional information representing positions in the scene measured by a separate three-dimensional sensor.

FIG. 9 shows the configuration of the information processing device 40 in this embodiment. It differs from Embodiment 1 in that the input unit 1110 of the information processing device 10 described in Embodiment 1 additionally receives three-dimensional information from the three-dimensional measurement device 410. The three-dimensional measurement device 410 in this embodiment is a 3D LiDAR (light detection and ranging), a device that measures distance from the round-trip time of a laser pulse. The input unit 1110 receives the measurement values acquired by the three-dimensional measurement device as a point cloud. The holding unit is assumed to hold in advance, as characteristic information, a list associating reliability with the depth values of the depth map acquired by the imaging unit 110 and a list associating reliability with the depth values of the three-dimensional measurement device 410. These reliabilities are assumed to have been calculated in advance for both the imaging unit 110 and the three-dimensional measurement device 410 by the method described in Embodiment 2.

The overall processing procedure in Embodiment 4 is the same as FIG. 4, which shows the processing procedure of the information processing device 10 described in Embodiment 1, so its description is omitted. It differs from Embodiment 1 in that a depth map correction step is added before the position and orientation calculation step S140. FIG. 10 is a flowchart showing the details of the depth map correction step.

In step S4110, the calculation unit 1120 reads the characteristic information of the imaging unit 110 and the three-dimensional measurement device 410 from the holding unit 1130.

In step S4120, the calculation unit 1120 integrates the depth map calculated by the imaging unit 110 with the point cloud measured by the three-dimensional measurement device 410, using the reliabilities associated with depth values that were read as characteristic information in step S4110. Specifically, the depth map can be updated by replacing the value m in Equation 2 with the depth value measured by the three-dimensional measurement device 410. The weight α is calculated by Equation 3, where γ_D is the reliability of the depth map and γ_L is the reliability of the point cloud at the same point.

The depth map is updated by Equation 2 using the calculated weight. When all pixels of the depth map have been updated, the depth map correction step ends and processing continues from step S150 described in Embodiment 1.
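The per-pixel integration can be sketched as follows. Equations 2 and 3 are not reproduced in this section, so both forms used here are assumptions made for illustration only: the weight is taken as the depth map's share of the total reliability, and the update is taken as a weighted average of the two depth values. The function names are hypothetical.

```python
from typing import Optional


def fusion_weight(gamma_d: float, gamma_l: float) -> float:
    """ASSUMED form of the weight: the depth map's share of the total reliability."""
    total = gamma_d + gamma_l
    if total <= 0.0:
        raise ValueError("at least one reliability must be positive")
    return gamma_d / total


def fuse_depth(d_camera: float, d_lidar: Optional[float],
               gamma_d: float, gamma_l: float) -> float:
    """ASSUMED update: weighted average of the camera depth and the LiDAR depth.

    If no LiDAR point falls on this pixel (d_lidar is None), the camera
    depth is kept unchanged.
    """
    if d_lidar is None:
        return d_camera
    alpha = fusion_weight(gamma_d, gamma_l)
    return alpha * d_camera + (1.0 - alpha) * d_lidar
```

With equal reliabilities the fused depth is the midpoint of the two measurements, and as one reliability dominates the result approaches that sensor's value, which matches the behavior summarized for this embodiment.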

As described above, in Embodiment 4, the depth values acquired by the imaging unit are weighted more heavily where the imaging unit can acquire depth with high accuracy, and the depth values acquired by the three-dimensional measurement device are weighted more heavily where that device can acquire depth with high accuracy. The depth map can thus be calculated from whichever depth values the imaging unit and the three-dimensional measurement device measure more accurately, and the position and orientation can be calculated with high accuracy.

<Modified example>
This embodiment described a method using a 3D LiDAR as the three-dimensional measurement device 410. The three-dimensional measurement device 410 is not limited to this and may be any device that can measure three-dimensional information capable of improving the accuracy of the visual information acquired by the imaging unit 110. For example, it may be a TOF (Time Of Flight) distance measurement camera or a stereo camera with two cameras. A stereo configuration may also be used in which a monocular camera separate from the DAF-based imaging unit 110 is arranged with its optical axis aligned with that of the imaging unit 110. A further imaging unit 110 with different reliability characteristics may also be mounted and treated as the three-dimensional measurement device 410, with the depth map updated in the same way.

[Embodiment 5]
In Embodiments 1 and 2, the position and orientation and the control values were calculated based on visual information of the scene captured by the imaging unit 110. In Embodiment 3, a predetermined pattern light was projected onto the scene. In Embodiment 4, the three-dimensional shape measured by a three-dimensional measurement device was additionally used. In Embodiment 5, an object is detected from the visual information and used to control the moving body. In particular, this embodiment describes a case in which the AGV carries a load and, on arriving at its destination, must stop precisely at a predetermined position relative to a shelf or belt conveyor. This embodiment describes a method of controlling the AGV by calculating the position and orientation of an object such as a shelf or belt conveyor captured by the imaging unit 110, thereby obtaining a precise position and orientation. In this embodiment, unless otherwise noted, the feature information of an object refers to the position and orientation of the object.

The configuration of the device in Embodiment 5 is the same as FIG. 2, which shows the configuration of the information processing device 10 described in Embodiment 1, so its description is omitted. It differs from Embodiment 1 in the following points. The calculation unit 1120 additionally performs object detection on the visual information, and the control unit 1140 controls the moving body so that the detected object appears at a predetermined position in the visual information. In addition, the holding unit 1130 holds an object model for object detection, together with a target position and orientation relative to the object, which specifies the position and orientation the AGV should take relative to the object on arriving at its destination.

The object model is a list that stores a CAD model representing the shape of the object together with PPF (Point Pair Feature) feature information, whose feature quantity is the relative position of pairs of three-dimensional points with normals, taken as three-dimensional feature points of the object.

The overall processing procedure in Embodiment 5 is the same as FIG. 5, which shows the processing procedure of the information processing device 10 described in Embodiment 1, so its description is omitted. However, it differs from Embodiment 1 in that an object detection step is added after the position and orientation calculation step S140. FIG. 11 is a flowchart showing the details of the object detection step.

In step S5110, the calculation unit 1120 reads the object model held by the holding unit 1130.

In step S5120, the calculation unit 1120 detects, from the depth map, where in the visual information an object matching the object model appears. Specifically, PPF features are first calculated from the depth map. The PPF features detected from the depth map are then matched against the PPF features of the object model to calculate an initial value of the position and orientation of the object relative to the imaging unit 110.
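A point pair feature for two oriented points can be sketched as follows. The sketch uses one common formulation, a four-component vector made of the distance between the points and three angles; the text does not spell out the exact components, so this choice and all names are assumptions.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _angle(a: Vec3, b: Vec3) -> float:
    """Angle between two non-zero vectors, in radians."""
    cos = _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def point_pair_feature(p1: Vec3, n1: Vec3, p2: Vec3, n2: Vec3
                       ) -> Tuple[float, float, float, float]:
    """(distance, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1."""
    d = _sub(p2, p1)
    dist = math.sqrt(_dot(d, d))
    if dist == 0.0:
        raise ValueError("the two points must be distinct")
    return (dist, _angle(n1, d), _angle(n2, d), _angle(n1, n2))
```

Because the feature depends only on relative geometry, it is unchanged by rigid motion of the object, which is what allows features from the depth map to be matched against features precomputed from the object model.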

In step S5130, using the position and orientation of the object relative to the imaging unit 110 calculated by the calculation unit 1120 as the initial value, the position and orientation of the object relative to the imaging unit 110 is calculated more precisely with the ICP algorithm. The residual with respect to the target position and orientation relative to the object, held by the holding unit 1130, is also calculated. The calculation unit 1120 inputs the calculated residual to the control unit 1140, and the object detection step ends.

In step S150 of FIG. 5, the control unit 1140 calculates a control value for the actuator 120 so that the AGV moves in the direction that reduces the residual of the object position and orientation calculated by the calculation unit 1120.
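Moving in the direction that reduces the residual can be sketched with a simple proportional rule. This is a strongly simplified illustration: the pose is reduced to a planar (x, y, heading) triple expressed relative to the object, the AGV is treated as able to move in any direction, and the gain and step limit are hypothetical parameters. None of these choices come from the patent.

```python
import math
from typing import Tuple

Pose2D = Tuple[float, float, float]  # x [m], y [m], heading [rad]


def wrap_angle(a: float) -> float:
    """Wrap an angle to the range (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def pose_residual(current: Pose2D, target: Pose2D) -> Pose2D:
    """Difference between the target pose and the current pose, per component."""
    return (target[0] - current[0],
            target[1] - current[1],
            wrap_angle(target[2] - current[2]))


def control_value(residual: Pose2D, gain: float = 0.5, max_step: float = 0.1) -> Pose2D:
    """Proportional step toward a smaller residual, clamped to +/- max_step per component."""
    def clamp(v: float) -> float:
        return max(-max_step, min(max_step, v))
    return (clamp(gain * residual[0]),
            clamp(gain * residual[1]),
            clamp(gain * residual[2]))
```

Applying the returned step repeatedly shrinks each residual component toward zero; a real AGV would additionally have to respect its drive kinematics.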

In Embodiment 5, an object appearing in the depth map acquired by an imaging unit in which each light-receiving part on the image sensor consists of two or more light-receiving elements is detected, and the position and orientation of the object is calculated by model fitting. The AGV is then controlled so that the difference between the position and orientation relative to the target object given in advance and the position and orientation of the detected object becomes small. That is, the AGV is controlled so as to align precisely with the target object. By calculating the position and orientation relative to an object whose shape is known in advance, the position and orientation can be calculated with high accuracy and the AGV can be controlled with high accuracy.

<Modified example>
In this embodiment, PPF features were used for object detection. However, any method capable of detecting the object may be used. For example, SHOT features may be used, whose feature quantity is a histogram of the inner products between the normal of a three-dimensional point and the normals of the surrounding three-dimensional points. Features based on a Spin Image, in which surrounding three-dimensional points are projected onto a cylindrical surface whose axis is the normal vector of a given three-dimensional point, may also be used. A learning model obtained by machine learning can also be used to detect objects without feature quantities. Specifically, a neural network trained to return 1 for object regions and 0 for non-object regions when given a depth map can be used as the learning model. With a learning model trained to output the six degrees of freedom of the object from a depth map, steps S5110 to S5130 may be combined to calculate the position and orientation of the object in a single pass.

In this embodiment, shelves and belt conveyors were given as examples of the object. However, any object may be used as long as the imaging unit 110 can observe it when the AGV is stopped and the relative position and orientation is uniquely determined. For example, it may be a three-dimensional marker attached to the factory ceiling as a position and orientation index (specifically, an object of arbitrary shape with arbitrary surface relief printed by a 3D printer). If the AGV is rechargeable and stops at a charging station, a 3D CAD model of the charging station's shape may be used. Instead of a CAD model, a depth map captured in advance while the AGV is stopped at the target position and orientation may be used as the object model. In that case, during operation the AGV is controlled so that the position and orientation error between the held depth map and the depth map received by the input unit 1110 becomes small. This allows an object model to be generated without the effort of creating a CAD model.

In this embodiment, a method of detecting an object and performing model fitting was illustrated for calculating the position and orientation needed to position the AGV precisely. However, the method may be used not only for precise position and orientation calculation but also for collision avoidance and for detecting the position and orientation of other AGVs. Specifically, suppose that a CAD model of the AGV's shape is held as an object model and the calculation unit 1120 finds another AGV by object detection. The control unit 1140 can then calculate a control value that avoids the coordinates of the other AGV, thereby avoiding a collision with it. When another AGV is detected, an alert may be presented and a command issued to the other AGV to clear the AGV's own route. If another AGV is stationary, it may be regarded as having stopped because its battery is exhausted, and the control unit 1140 may calculate control values to approach it, couple with it, and move to the charging station. When cables are laid across a passage in the factory, the calculation unit 1120 may detect the cables and the control unit 1140 may calculate a control value that detours around them so as not to run over them. If the floor is uneven, the control value may be calculated so as to avoid the uneven areas. Furthermore, if labels such as no entry or recommended route are associated with each object model, whether the AGV may pass can be set easily by placing the corresponding object in the scene.

In this embodiment, the object model was a CAD model. However, any model may be used as long as the position and orientation of the object can be calculated. For example, it may be a mesh model generated by three-dimensional reconstruction with a Structure From Motion algorithm from stereo images of the target object captured from multiple viewpoints. It may also be a polygon model created by integrating depth maps captured from multiple viewpoints with an RGB-D sensor. A CNN (Convolutional Neural Network), for example, may also be used as a neural network model trained to detect the object as described above.

The imaging unit 110 can also capture the object carried by the AGV, the calculation unit 1120 can recognize it, and the control unit 1140 can calculate a control value according to the type of the loaded object. Specifically, if the loaded object is fragile, the control value is calculated so that the AGV moves at low speed. If a list associating a target position and orientation with each object is held in the holding unit 1130 in advance, a control value may be calculated that moves the AGV to the target position associated with the loaded object.
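The cargo-dependent control described above amounts to a lookup keyed by the recognized object type. The following sketch illustrates this; the object types, speeds, destinations, and all names are hypothetical values chosen for the example.

```python
from typing import Dict, NamedTuple, Tuple

Pose2D = Tuple[float, float, float]  # x [m], y [m], heading [rad]


class CargoRule(NamedTuple):
    fragile: bool
    destination: Pose2D


# Hypothetical list associating each object type with its handling rule.
CARGO_RULES: Dict[str, CargoRule] = {
    "glassware": CargoRule(fragile=True, destination=(5.0, 2.0, 0.0)),
    "cardboard_box": CargoRule(fragile=False, destination=(1.0, 8.0, 1.57)),
}


def plan_for_cargo(cargo_type: str,
                   rules: Dict[str, CargoRule],
                   normal_speed: float = 1.0,
                   slow_speed: float = 0.3) -> Tuple[float, Pose2D]:
    """Return (speed limit, destination) for the recognized cargo type."""
    rule = rules[cargo_type]
    speed = slow_speed if rule.fragile else normal_speed
    return speed, rule.destination
```

For example, `plan_for_cargo("glassware", CARGO_RULES)` selects the low speed and the destination registered for that object type.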

Further, if the imaging unit 110 detects an object within a predetermined distance range from the moving body, it can be determined that an object to be transported has fallen, and an alert can be displayed. If the AGV is equipped with a robot arm (not shown), the control unit 1140 may calculate a control value for the robot arm so that the arm picks up the object.

[Embodiment 6]
Embodiments 1 to 4 described methods of calculating the position and orientation stably and with high accuracy based on the visual information acquired by the imaging unit 110, and of calculating the control value of the moving body. Embodiment 5 described a method of detecting an object from the visual information and using it to control the moving body. Embodiment 6 describes, as a function added to Embodiments 1 to 5, a method of controlling the AGV and generating map information stably and with high accuracy using the result of segmenting the input visual information into regions. In particular, this embodiment illustrates a method applied when generating map information. Robustness against changes in the scene improves when stationary objects, whose position and orientation do not change over time, are registered in the map information and used to calculate the position and orientation. The visual information is therefore semantically segmented to determine which object type each pixel belongs to. The embodiment then describes a method of generating hierarchical map information using stationary-object likelihood information calculated in advance for each object type, and a position and orientation estimation method that uses this map information. In this embodiment, unless otherwise noted, the feature information of an object refers to the type of the object.

The device configuration in Embodiment 6 is the same as that in FIG. 2, which shows the configuration of the information processing device 10 described in Embodiment 1, so its description is omitted. In this embodiment, the calculation unit 1120 additionally segments the visual information semantically and uses the result to generate map information hierarchically. The hierarchical map information in this embodiment is a point cloud composed of four layers: (1) the factory layout CAD model, (2) a stationary object map, (3) a fixtures map, and (4) a moving object map. The holding unit 1130 holds (1) the factory layout CAD model in the external memory H14. The position and orientation are calculated using the hierarchically created map information; the calculation method is described later. In this embodiment, the visual information acquired by the imaging unit 110 and input by the input unit 1110 consists of an image and a depth map. The holding unit 1130 also holds a CNN, a learning model trained so that, given an input image, it outputs for each object type a mask image indicating whether each pixel belongs to that object type. Together with the learning model, it holds a lookup table indicating which of layers (2) to (4) each object type belongs to, so that specifying an object type identifies its layer.

The overall processing procedure in Embodiment 6 is the same as that in FIG. 4, which shows the processing procedure of the information processing device 10 described in Embodiment 1, so its description is omitted. Embodiment 6 differs from Embodiment 1 in that, when the calculation unit 1120 calculates the position and orientation, it takes into account the layers of the map information held by the holding unit 1130. It also differs in that a region segmentation and map generation step is added after the position and orientation calculation step S140. These processes are described in detail below.

In step S140, the calculation unit 1120 calculates the position and orientation after assigning to the point cloud, for each layer of the map information held by the holding unit 1130, a weight that sets that layer's contribution to the position and orientation calculation. Specifically, when layers (1) to (4) are held as in this embodiment, the weights are made successively smaller from layer (1), the map information least likely to move, to layer (4).
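
The layer weighting in step S140 can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the embodiment: the numeric weights, the squared-residual cost, and the names `LAYER_WEIGHTS` and `weighted_cost` are introduced here for illustration. The text only states that the weights decrease from layer (1) to layer (4).

```python
# Hypothetical weights: layer 1 (factory CAD model) is least likely to move,
# layer 4 (moving objects) is most likely to move. The values are assumptions.
LAYER_WEIGHTS = {1: 1.0, 2: 0.8, 3: 0.5, 4: 0.1}


def weighted_cost(residuals_by_layer):
    """Sum of weighted squared alignment residuals over all map layers.

    residuals_by_layer maps a layer number to a list of point-to-map
    residuals (for example, distances in meters) for the current pose guess.
    """
    total = 0.0
    for layer, residuals in residuals_by_layer.items():
        weight = LAYER_WEIGHTS[layer]
        total += weight * sum(r * r for r in residuals)
    return total
```

A pose estimator that minimizes such a cost is influenced less by a misaligned point on a moving object (layer 4) than by the same misalignment on the building structure (layer 1).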

FIG. 12 is a flowchart illustrating the details of the region segmentation and map generation step. This step is added and executed immediately after the position and orientation calculation step S140 in FIG. 5.

In step S6110, the calculation unit 1120 semantically segments the input image. Many semantic segmentation methods have been proposed, and any of them can be used; the method is not limited to a particular one as long as it semantically segments the image. Using such a method, a mask image is obtained for each object type, in which each pixel is marked as belonging or not belonging to that object type.

In step S6120, the depth map is segmented into regions. Specifically, a normal is first calculated for each pixel of the depth map, and normal edges are detected, that is, pixels whose normal has an inner product with the surrounding normals that is less than or equal to a predetermined value. The depth map is then segmented by assigning a different label to each region bounded by the normal edges, yielding a region-segmented image.
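
The normal-edge segmentation of step S6120 can be sketched in plain Python as below. This is only an illustrative sketch: the finite-difference normal estimate with unit pixel spacing, the 4-neighborhood, the flood fill, the default threshold of 0.9, and the function names are all assumptions made here, and edge pixels are simply left with label 0.

```python
import math

NEIGHBORS = ((0, 1), (1, 0), (0, -1), (-1, 0))


def unit_normals(depth):
    """Approximate a unit normal per pixel from depth differences.

    Assumes unit pixel spacing and a grid of at least 2x2 pixels.
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        y0 = y if y + 1 < h else y - 1      # backward difference on last row
        for x in range(w):
            x0 = x if x + 1 < w else x - 1  # backward difference on last column
            dzdx = depth[y][x0 + 1] - depth[y][x0]
            dzdy = depth[y0 + 1][x] - depth[y0][x]
            length = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            row.append((-dzdx / length, -dzdy / length, 1.0 / length))
        normals.append(row)
    return normals


def segment_depth(depth, dot_threshold=0.9):
    """Label regions separated by normal edges; edge pixels keep label 0."""
    h, w = len(depth), len(depth[0])
    normals = unit_normals(depth)

    def is_edge(y, x):
        # A pixel is an edge if its normal differs enough from any neighbor.
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                dot = sum(a * b for a, b in zip(normals[y][x], normals[ny][nx]))
                if dot <= dot_threshold:
                    return True
        return False

    edge = [[is_edge(y, x) for x in range(w)] for y in range(h)]
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if edge[y][x] or labels[y][x]:
                continue
            next_label += 1
            labels[y][x] = next_label
            stack = [(y, x)]
            while stack:  # flood fill over non-edge 4-neighbors
                cy, cx = stack.pop()
                for dy, dx in NEIGHBORS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not edge[ny][nx] and labels[ny][nx] == 0):
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
    return labels
```

For a depth map that is flat on the left and slopes steeply on the right, the flat part and the slope receive different labels, with a thin band of edge pixels (label 0) between them.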

In step S6130, the calculation unit 1120 semantically segments the point cloud based on the mask images obtained by semantically segmenting the input image and the region-segmented image obtained by segmenting the depth map. Specifically, the inclusion ratio N_i,j between each depth map region S_Dj and the object region S_Mi of each mask is calculated using Equation 4, where i is the object type and j is the region label of the depth map segmentation.

Next, object type i is assigned to each depth map region S_Dj for which N_i,j is greater than or equal to a predetermined threshold. Pixels to which no object type has been assigned receive a background label. In this way, an object type is assigned to every pixel of the depth map.
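
Equation 4 is not reproduced in this text. The sketch below assumes one plausible reading of the inclusion ratio, namely the fraction of depth region j covered by the mask of object type i (written S_Mi here): N_i,j = |S_Dj ∩ S_Mi| / |S_Dj|. That formula, the threshold of 0.5, the rule of choosing the object type with the highest ratio when several pass the threshold, and the function name are assumptions for illustration only.

```python
def assign_object_types(region_labels, masks, threshold=0.5):
    """Assign an object type to every pixel of a segmented depth map.

    region_labels: 2D list of depth-region labels j.
    masks: dict mapping object type i to a 2D list of 0/1 mask values.
    Returns a 2D list of object types, with "background" where no type fits.
    """
    h, w = len(region_labels), len(region_labels[0])
    area = {}     # j -> number of pixels in depth region j
    overlap = {}  # (i, j) -> number of pixels of region j inside mask i
    for y in range(h):
        for x in range(w):
            j = region_labels[y][x]
            area[j] = area.get(j, 0) + 1
            for i, mask in masks.items():
                if mask[y][x]:
                    overlap[(i, j)] = overlap.get((i, j), 0) + 1

    region_type = {}  # j -> assigned object type
    best_ratio = {}   # j -> ratio of the assigned type
    for (i, j), count in overlap.items():
        ratio = count / area[j]  # assumed form of N_i,j
        if ratio >= threshold and ratio > best_ratio.get(j, 0.0):
            best_ratio[j] = ratio
            region_type[j] = i

    return [[region_type.get(region_labels[y][x], "background")
             for x in range(w)] for y in range(h)]
```

Because the label is assigned per depth region, a pixel that the mask missed still receives the object type of its region, which is the point of combining the two segmentations.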

In step S6140, the calculation unit 1120 generates map information hierarchically based on the object type labels assigned to the depth map in step S6130. Specifically, the lookup table is consulted for each object type label of the depth map, and the three-dimensional point cloud obtained from the depth map is stored in the corresponding layer of the map information held by the holding unit 1130. When storage is complete, the region segmentation and map generation step ends.
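
The lookup-table step of S6140 can be sketched as follows. The table contents, the object type names, and the choice to skip points whose label is not in the table are assumptions made for this illustration; the text only states that the table maps each object type to one of layers (2) to (4).

```python
# Hypothetical lookup table: object type -> map layer (2, 3, or 4).
LAYER_OF_TYPE = {"wall": 2, "shelf": 3, "person": 4, "agv": 4}


def add_to_layers(layers, points, labels):
    """Append each 3D point to the map layer given by its object type label.

    layers: dict mapping layer number to a list of (x, y, z) points.
    points and labels are parallel sequences. Points whose label has no
    entry in the table (for example "background") are skipped here.
    """
    for point, label in zip(points, labels):
        layer = LAYER_OF_TYPE.get(label)
        if layer is not None:
            layers.setdefault(layer, []).append(point)
    return layers
```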

As described above, in Embodiment 6, semantically segmenting the depth map makes it possible to register stationary objects, which are suitable for position and orientation calculation, separately from moving objects, which are unsuitable for it. Using the map information divided in this way, weights are assigned so that objects that move more receive smaller weights, and the position and orientation are calculated according to those weights. This allows the position and orientation to be calculated more stably and robustly.

<Modified examples>
This embodiment uses layers (1) to (4). However, any configuration with multiple layers organized by how readily objects move will do, and the holding unit 1130 may hold only some of layers (1) to (4). In addition, the configuration may hold layers for specific objects (an AGV layer, a person layer), a pillar layer, or a landmark layer (3D markers, charging stations).

In this embodiment, the semantically segmented depth map is used to generate map information and to calculate the position and orientation. The control unit 1140 may also use the semantically segmented depth map to calculate control values. Specifically, if people or other AGVs are detected by the semantic segmentation, the control unit 1140 can calculate control values that avoid them, which allows the AGV to be operated safely. Alternatively, the control unit 1140 may calculate control values that make the AGV follow behind a person or another AGV, which allows the AGV to operate even without map information. Furthermore, the calculation unit 1120 may recognize human gestures from the semantic segmentation result, and the control unit 1140 may calculate control values accordingly. Specifically, image regions are labeled by body part, such as arms, fingers, head, torso, and legs, and gestures are recognized from the relative positions of those parts. When a beckoning gesture is recognized, control values are calculated to move the AGV toward the person; when a pointing gesture is recognized, control values are calculated to move the AGV in the direction pointed. Recognizing gestures in this way lets the user move the AGV without operating it directly through a controller or similar device, so the AGV can be operated with little effort.

The control unit 1140 may calculate control values according to the object type detected by the method of this embodiment. Specifically, the AGV is controlled so that it stops if the object type is a person and steers around the object if it is another AGV. This makes it possible to operate the AGV safely, so that it does not hit people, with whom collisions must be avoided, and efficiently, by steering around objects that are not people.
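
The rule described above can be sketched as a small decision function. The type names, the returned action strings, and the priority of "stop" over "avoid" when both are present are assumptions for illustration.

```python
def action_for(detected_types):
    """Choose a high-level action from the set of detected object types.

    A person takes priority: the AGV stops. Another AGV is steered around.
    Otherwise the AGV continues along its route.
    """
    if "person" in detected_types:
        return "stop"
    if "agv" in detected_types:
        return "avoid"
    return "continue"
```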

In this embodiment, the AGV segments objects passively. However, the AGV may instead prompt people so that moving objects are removed from the scene. Specifically, when the calculation unit 1120 detects a person while map information is being generated, the control unit 1140 calculates a control value that outputs, through a speaker (not shown), a voice message asking the person to move. This makes it possible to generate map information that excludes moving objects.

In this embodiment, semantic segmentation is performed to identify object types. However, the calculation unit 1120 may calculate map information and the position and orientation, and the control unit 1140 may calculate control values, without identifying object types. That is, S6110 and S6120 in FIG. 12 can be omitted. Specifically, for example, the depth map is segmented by height above the ground. Pixels at or above the height of the AGV are then ignored when calculating control values; in other words, points at heights where the AGV cannot collide with them are not used for route generation. This reduces the number of points to be processed, so control values can be calculated faster. Alternatively, regions may be segmented based on planarity. This allows three-dimensional edges, which contribute strongly to position and orientation calculation, to be used preferentially (planes, which leave ambiguity in the position and orientation, are excluded from processing), which shortens calculation time and improves robustness.
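
The height-based filtering can be sketched as follows, assuming the points are already expressed in a frame whose z axis is the height above the floor. The function name and the coordinate convention are assumptions.

```python
def points_for_routing(points, agv_height):
    """Keep only points the AGV could collide with.

    points: iterable of (x, y, z) with z the height above the floor.
    Points at or above agv_height are ignored for route generation.
    """
    return [p for p in points if p[2] < agv_height]
```

For an AGV 0.5 m tall, a ceiling lamp at 2.5 m is dropped while a pallet at 0.2 m is kept.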

In this embodiment, objects in the map information that move more are given smaller weights to reduce their contribution to position and orientation calculation. Even when the map information has no layer structure, however, a weight can be calculated for each pixel of the depth map from its semantic segmentation result and used to calculate the position and orientation. After the input unit 1110 inputs the depth map, the calculation unit 1120 semantically segments it by following the procedure of S6110 to S6130. A weight is then determined for each pixel by consulting a lookup table with that pixel's object type label. After that, the position and orientation are calculated in step S140 with the weights taken into account. In this way, the influence of moving objects on position and orientation calculation can be reduced without giving the map a layer structure, which also reduces the size of the map.
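
A per-pixel weight image of this kind can be sketched as below. The table values, the object type names, and the default weight used for labels missing from the table (such as a background label) are assumptions for illustration.

```python
# Hypothetical lookup table: object type -> pose-estimation weight.
WEIGHT_OF_TYPE = {"wall": 1.0, "shelf": 0.5, "person": 0.1, "agv": 0.1}


def pixel_weights(type_image, default_weight=0.5):
    """Turn a 2D image of object type labels into a 2D image of weights."""
    return [[WEIGHT_OF_TYPE.get(label, default_weight) for label in row]
            for row in type_image]
```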

The imaging unit 110 in this embodiment is not limited to an imaging unit in which each light receiving unit on the image sensor is composed of two or more light receiving elements. Anything that can acquire three-dimensional depth information, such as a TOF camera or a 3D LiDAR, may be used.

In this embodiment, the holding unit 1130 holds the map information layer by layer. The layers can be checked on the display unit H16 and can also be reset to the initial map. If checking the layers on the display screen shows that a moving object has been registered in the map, the user can instruct the AGV to generate the map again, so the AGV can be operated easily and stably.

This embodiment assumes that a single AGV creates the map information, but multiple AGVs can also cooperate to generate it. Specifically, the map information created by each AGV is aligned with the ICP algorithm so that point clouds representing the same location end up at the same position. When individually created map information is merged, the map creation times may be consulted so that the newer map information is kept. The control unit 1140 may also move an idle AGV so that it generates a map of an area whose map information has not been updated for some time. Generating the map cooperatively with multiple AGVs in this way shortens the time needed to generate map information and makes the AGVs easier to operate.
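
The time-based merge can be sketched as follows. The sketch assumes the two maps have already been aligned (the ICP step is not shown) and that each map is stored as a dictionary from a grid cell to a pair of creation time and points; this cell-based storage and the function name are assumptions, not something the text specifies.

```python
def merge_maps(map_a, map_b):
    """Merge two aligned maps, keeping the newer data for each cell.

    Each map is a dict: cell id -> (creation_time, list_of_points).
    When both maps contain a cell, the entry with the later time wins;
    on equal times the entry from map_a is kept.
    """
    merged = dict(map_a)
    for cell, (time, points) in map_b.items():
        if cell not in merged or time > merged[cell][0]:
            merged[cell] = (time, points)
    return merged
```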

The method by which the control unit 1140 calculates control values is not limited to the one described in this embodiment, as long as it uses map information and calculates values that bring the moving body closer to the target position and orientation. Specifically, control values can be determined using a learning model for route generation. For example, DQN (Deep Q-Network), a reinforcement learning model, can be used. This can be realized by training the reinforcement learning model in advance so that the reward is high when the moving body approaches the target position and orientation, low when it moves away from them, and low when it approaches an obstacle.
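
A reward of the kind described above can be sketched as follows for a 2D position. The specific form (progress toward the goal minus a penalty inside a safety distance), the parameter `safe_dist`, and the function names are assumptions for illustration; orientation is ignored in this sketch, and no DQN training code is shown.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def reward(prev_pos, pos, goal, obstacles, safe_dist=0.5):
    """Reward for one step of the moving body.

    Positive when the step moves closer to the goal, negative when it moves
    away, and reduced when the new position is within safe_dist of the
    nearest obstacle.
    """
    r = distance(prev_pos, goal) - distance(pos, goal)
    if obstacles:
        nearest = min(distance(pos, o) for o in obstacles)
        if nearest < safe_dist:
            r -= safe_dist - nearest
    return r
```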

Embodiments 1 to 6 described methods that use map information to calculate the position and orientation and to calculate control values. However, map information can be used for other purposes as well. Specifically, the created map information may be used to run an AGV transport simulation, and a process management system may generate processes that allow efficient transport. Similarly, the mobile object management system may use the map information to generate AGV operation timings and routes that avoid congestion.

The learning model described above may also be trained on the created map together with the delivery simulation. By reproducing situations such as newly placed obstacles and collisions with people or other AGVs in the simulation and learning from them, the control unit 1140 can use the learning model to calculate control values stably even when a similar situation occurs in reality. In addition, by learning in parallel with a method such as A3C (Asynchronous Advantage Actor-Critic), the learning model can be configured to learn the control method efficiently in a short time.

[Embodiment 7]
This embodiment describes a UI that can be applied in common to Embodiments 1 to 6. It describes how the user checks the visual information acquired by the imaging unit, the position and orientation calculated by the calculation unit, object detection results, map information, and so on. Because the AGV otherwise moves under automatic control, it also describes how the AGV can be controlled through user input. So that the user can both check the status of the AGV and control it, a GUI is shown on a display device such as a display, and user operations are entered through an input device such as a mouse or a touch panel. In this embodiment the display is assumed to be mounted on the AGV, but the configuration is not limited to this. That is, via the communication I/F (H17), the display of a mobile terminal carried by the user, or a liquid crystal display connected to the mobile object management system, can also be used as the display device. Whether the display device is mounted on the AGV or not, the display information can be generated by the information processing device. When a display device not mounted on the AGV is used, a computer attached to the display device may instead obtain the information needed to generate the display information from the information processing device and generate the display information itself.

The device configuration in Embodiment 7 is the same as that in FIG. 2, which shows the configuration of the information processing device 10 described in Embodiment 1, so its description is omitted. Embodiment 7 differs from Embodiment 1 in that the calculation unit 1120 generates display information based on the visual information acquired by the imaging unit 110, the position and orientation calculated and the objects detected by the calculation unit 1120, and the control values calculated by the control unit 1140, and presents it on a touch panel display or similar device mounted on the AGV. The display information is described in detail later. In this embodiment, the holding unit 1130 is assumed to hold 2D map information and 3D map information.

FIG. 13 shows GUI 100, an example of the display information presented by the display device in this embodiment.

G110 is a window for presenting 2D map information. G120 is a window for presenting 3D map information. G130 is a window for presenting the image D154e acquired by the imaging unit 110. G140 is a window for presenting the depth map D154d acquired by the imaging unit 110. G150 is a window for presenting display information based on the position and orientation calculated by the calculation unit 1120 as described in Embodiment 1, the objects detected as described in Embodiments 5 and 6, and the control values calculated by the control unit 1140 as described in Embodiment 1.

G110 shows an example of presenting the 2D map held by the holding unit 1130. G111 is the AGV on which the imaging unit 110 is mounted. The calculation unit 1120 composites it onto the 2D map based on the position and orientation of the imaging unit (that is, of the AGV). G112 is an example in which an alert is presented as a speech bubble when a collision is possible, based on the position and orientation of an object detected by the calculation unit 1120 using the method of Embodiment 5 or 6. G113 is an example in which the planned route of the AGV is presented as an arrow, based on the control values calculated by the control unit 1140. In FIG. 13, the AGV is heading for the destination presented at G114. Presenting the 2D map together with the AGV position, object detection results, and route in this way lets the user easily grasp the operating status of the AGV. The color, line thickness, and shape of G111 to G114 may be varied so that the user can grasp the operating status more easily.

G120 shows an example of presenting the 3D map held by the holding unit 1130. G121 is an example that visualizes the result of updating the 3D map held by the holding unit 1130 using the result of the calculation unit 1120 semantically segmenting the depth map, as described in Embodiment 6. Specifically, stationary objects obtained from the factory CAD data are drawn darker, and objects that move more, such as other AGVs and people, are drawn lighter. The presentation is not limited to shading; a different color may be used for each layer. In addition, GUI122 presents the labels of the objects detected by the calculation unit 1120. Presenting the 3D map in this way lets the user grasp the operating status with the height direction taken into account, unlike with the 2D map. Moreover, any object type that the AGV found while traveling can be located without going to the site.

G130 shows an example of presenting the image acquired by the imaging unit 110. In G131 and G132, bounding boxes are superimposed as dotted lines around other AGVs and people, which are objects detected by the calculation unit 1120 as described in Embodiment 6. Solid or double lines may be used instead, and the boxes may be emphasized by presenting them in different colors. Superimposing the object detection results on the image acquired by the imaging unit 110 in this way lets the user check the objects detected by the calculation unit 1120 with little effort.

G140 shows an example of presenting the depth map acquired by the imaging unit 110. G141 is an example in which the CAD model of an object held by the holding unit 1130, described in Embodiment 5, is superimposed as a wireframe using the object position and orientation calculated by the calculation unit 1120. G142 is an example in which the CAD model of an AGV is superimposed as a wireframe. G143 is an example in which the CAD model of a three-dimensional marker is superimposed. Presenting on the depth map the objects that were detected and whose position and orientation were calculated lets the user easily grasp the detected objects. When the AGV is controlled using the position and orientation of a detected object, the error in the calculated position and orientation of the AGV can be seen from the misalignment between the depth map and the CG. The wireframes may additionally be superimposed on G130. The user then only has to compare the live image with the model, which makes it easier and more intuitive to check the accuracy of the AGV position and orientation calculation and of the object detection.

G150 shows an example of presenting a GUI for operating the AGV manually, the values calculated by the calculation unit 1120 and the control unit 1140, and AGV operation information. G151 is an emergency stop button; the user can stop the AGV by touching this button with a finger. G152 is a mouse cursor. The cursor can be moved with a mouse or controller (not shown) or according to the user's touch operation on the touch panel, and buttons and radio buttons in the GUI can be operated by pressing them. G153 is an example of presenting an AGV controller. By moving the circle inside the controller up, down, left, or right, the user can make the AGV move forward, backward, left, or right in response. G154 is an example of presenting the internal state of the AGV. The figure illustrates a state in which the AGV is traveling automatically at 0.5 m/s. Operation information is also presented, such as the time elapsed since the AGV started traveling, the remaining time to the destination, and the difference between the estimated and scheduled arrival times. G156 is a GUI for configuring the AGV operation and the display information. The user can choose, for example, whether to generate map information and whether to present detected objects. G157 is an example of presenting AGV operation information: the position and orientation calculated by the calculation unit 1120, the destination coordinates received from the mobile object management system 13, and the name of the article the AGV is carrying. Presenting operation information and a GUI for user input in this way makes it possible to operate the AGV more intuitively.

The processing procedure of the information processing device in Embodiment 7 differs from that of FIG. 5, which illustrates the processing procedure of the information processing device 10 described in Embodiment 1, in that a display information generation step (not shown), in which the calculation unit 1120 generates display information, is newly added after step S160. In the display information generation step, display information is rendered based on the visual information captured by the imaging unit 110, the position and orientation calculated and the objects detected by the calculation unit 1120, and the control values calculated by the control unit 1140, and is output to the display device.

In Embodiment 7, the calculation unit generates display information based on the visual information acquired by the imaging unit, the position and orientation calculated by the calculation unit, the detected objects, and the control values calculated by the control unit, and presents it on the display. This lets the user easily check the state of the information processing device. The user also inputs AGV control values, various parameters, the display mode, and so on, which makes it easy to change the AGV settings or move the AGV. Presenting a GUI in this way makes the AGV easy to operate.

表示装置は、ディスプレイに限らない。ＡＧＶにプロジェクタを搭載すれば、プロジェクタを用いて表示情報を提示することもできる。また、移動体管理システム１３に表示装置を接続してあれば、通信Ｉ／Ｆ（Ｈ１７）経由で表示情報を移動体管理システム１３に送信し提示してもよい。また、表示情報の生成に必要な情報のみ送信して、移動体管理システム１３内部の計算機で表示情報を生成することもできる。このようにすることで、ユーザはＡＧＶに搭載した表示装置を確認せずとも、ＡＧＶの運行状況や操作を手軽に行うことができるようになる。 The display device is not limited to a display. If the AGV is equipped with a projector, display information can also be presented using the projector. Further, if a display device is connected to the mobile object management system 13, display information may be transmitted to the mobile object management system 13 via the communication I/F (H17) and presented there. Alternatively, only the information necessary for generating the display information may be transmitted, and the display information may be generated by a computer inside the mobile object management system 13. By doing so, the user can easily check the operating status of the AGV and operate it without looking at the display device mounted on the AGV.

本実施形態における表示情報は、本情報処理が扱う情報を提示するものであれば何でもよい。本実施形態で説明した表示情報の他にも、位置姿勢算出時の残差や物体検出時の認識尤度値を表示することもできる。さらには、位置姿勢算出に係った時間やフレームレート、ＡＧＶのバッテリーの残量情報なども表示してもよい。このように、本情報処理装置が扱う情報を提示することで、ユーザが本情報処理装置の内部状態を確認することができるようになる。 The display information in this embodiment may be any information as long as it presents information handled by this information processing. In addition to the display information described in this embodiment, it is also possible to display the residual error when calculating the position and orientation and the recognition likelihood value when detecting an object. Furthermore, the time and frame rate involved in calculating the position and orientation, information on the remaining battery power of the AGV, etc. may also be displayed. In this way, by presenting the information handled by the information processing device, the user can confirm the internal state of the information processing device.

本実施形態で説明したＧＵＩは一例であって、ＡＧＶの運行状況を把握する、ＡＧＶに対して操作（入力）を行うことができるようなＧＵＩであればどんなＧＵＩを用いてもよい。例えば色を変える、線の太さや実線・破線・二重線を切り替える、拡大縮小する、必要のない情報を隠す、というように表示情報を変更することもできる。また、物体モデルはワイヤーフレームではなく輪郭を表示してもよいし、透過したポリゴンモデルを重畳してもよい。このように表示情報の可視化方法を変えることで、ユーザがより直感的に表示情報を理解することができるようになる。 The GUI described in this embodiment is an example, and any GUI may be used as long as it is a GUI that can grasp the operation status of the AGV and perform operations (inputs) on the AGV. For example, you can change the displayed information by changing the color, switching line thickness, solid, broken, or double lines, scaling, or hiding unnecessary information. Furthermore, the object model may be displayed as an outline instead of a wire frame, or a transparent polygon model may be superimposed. By changing the method of visualizing the displayed information in this way, the user can more intuitively understand the displayed information.

本実施形態で説明したＧＵＩをインターネット経由で不図示のサーバに接続することもできる。このような構成とすると、例えばＡＧＶに不具合が生じた場合に、ＡＧＶメーカの担当者が、現場に行かずともサーバを経由して表示情報を取得して、ＡＧＶの状態を確認することができる。 The GUI described in this embodiment can also be connected to a server (not shown) via the Internet. With such a configuration, if a problem occurs with the AGV, for example, a person in charge at the AGV manufacturer can obtain the display information via the server and check the status of the AGV without going to the site.

入力装置はタッチパネルを例で挙げたが、ユーザの入力を受け付ける入力装置であればなんでもよい。キーボードでもよいし、マウスでもよく、ジェスチャー（例えば撮像部１１０が取得する視覚情報から認識する）でもよい。さらにはネットワーク経由で移動体管理システムが入力装置となってもよい。また、スマートフォンやタブレット端末を通信Ｉ／Ｆ（Ｈ１７）を経由して接続すれば、それらを表示装置／入力装置として用いることもできる。 Although a touch panel is used as an example of the input device, any input device that accepts user input may be used. It may be a keyboard, a mouse, or a gesture (recognized from visual information acquired by the imaging unit 110, for example). Furthermore, a mobile object management system may serve as an input device via a network. Furthermore, if a smartphone or tablet terminal is connected via the communication I/F (H17), they can also be used as a display device/input device.

本実施形態において入力装置が入力するのは、本実施形態で説明したものに限らず、本情報処理装置のパラメータを変えるものであれば何でもよい。例えば、移動体の制御値の上限（速度の上限）を変えるようなユーザの入力を受け付けてもよいし、Ｇ１１０上でユーザがクリックした目的地点を入力してもよい。さらには、物体検出で用いるモデルと用いないモデルのユーザの選択を入力してもよい。検出できなかった物体をＧ１３０上でユーザが囲うように入力し、学習モデルの不図示の学習装置が撮像部１１０の視覚情報に合わせて当該物体を検出できるように学習するように構成してもよい。 In this embodiment, what the input device inputs is not limited to what has been described above, and may be anything that changes the parameters of the information processing device. For example, a user input that changes the upper limit of the control value of the moving body (the upper limit of speed) may be accepted, or a destination point clicked by the user on G110 may be input. Furthermore, the user's selection of which models are to be used and not used in object detection may be input. Alternatively, the user may draw a box on G130 around an object that could not be detected, and a learning device (not shown) for the learning model may be configured to learn to detect that object from the visual information of the imaging unit 110.

［実施形態８］ [Embodiment 8]
実施形態６では、撮像部１１０が取得した視覚情報を意味的領域分割し、各画素が何の物体種であるかを判別しマップを生成する方法、およびそれらマップや判別した物体種に応じてＡＧＶを制御する方法について述べた。実施形態８ではさらに、同一物体種でも状況によって異なる意味情報を認識し、それら認識結果を基にＡＧＶを制御する方法について述べる。 The sixth embodiment described a method of semantically segmenting the visual information acquired by the imaging unit 110, determining the object type of each pixel, and generating a map, and a method of controlling the AGV according to those maps and the determined object types. The eighth embodiment further describes a method of recognizing semantic information that differs depending on the situation even for the same object type, and of controlling the AGV based on the recognition results.

本実施形態８では、撮像部１１０が取得した視覚情報から、工場における積み重ねておかれた製品などの、物体の積み重なり度合を意味情報として認識する。つまり、撮像部１１０の視界に入る物体の意味情報を認識する。そして、物体の積み重なり度合に応じてＡＧＶを制御する方法について述べる。つまり、積み重なっているような物体をより安全に避けるようにＡＧＶを制御する。なお、本実施形態における物体の積み重なり度合とは、物体の積み重ね個数、または高さのことである。 In the eighth embodiment, the degree of stacking of objects, such as products stacked in a factory, is recognized as semantic information from the visual information acquired by the imaging unit 110. In other words, the semantic information of an object that enters the field of view of the imaging unit 110 is recognized. Then, a method for controlling the AGV according to the degree of stacking of objects will be described. In other words, the AGV is controlled to more safely avoid objects that are piled up. Note that the degree of stacking of objects in this embodiment refers to the number or height of stacked objects.

ＡＧＶの制御値算出には空間が物体によって占有されているかどうかを表す占有マップを用いるものとする。なお、本実施形態においては占有マップとして、シーンを格子状に区切り、各格子に障害物が存在する確率を保持した二次元の占有格子マップを用いる。本実施形態においては、占有マップは、ＡＧＶの接近拒絶度を表す値（０に近い程通行が許容され、１に近い程通行が拒絶される０から１の連続変数）を保持することとした。ＡＧＶは占有マップの接近拒絶値が所定の値以上の領域（本実施形態においては格子）を通らないように、目的地まで制御する。目的地とは、工程管理システム１２から取得した運行情報に含まれる、ＡＧＶの目的地である二次元座標のことである。 It is assumed that an occupancy map indicating whether a space is occupied by an object is used to calculate the control value of the AGV. Note that in this embodiment, a two-dimensional occupancy grid map is used as the occupancy map, which divides the scene into grids and holds the probability that an obstacle exists in each grid. In this embodiment, the occupancy map holds a value representing the degree of approach rejection of the AGV (a continuous variable from 0 to 1, in which passage is allowed as it is closer to 0, and passage is refused as it is closer to 1). . The AGV is controlled to the destination so that it does not pass through an area (in this embodiment, a grid) where the access rejection value of the occupancy map exceeds a predetermined value. The destination refers to the two-dimensional coordinates of the destination of the AGV included in the operation information acquired from the process management system 12.
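
The occupancy grid described above can be illustrated with a minimal sketch. The class name `RejectionGrid`, the cell size, and the threshold value are assumptions introduced only for illustration and do not appear in the embodiment; the sketch merely shows cells holding approach rejection values between 0 and 1 and a check that excludes cells at or above a predetermined value.

```python
from dataclasses import dataclass, field


@dataclass
class RejectionGrid:
    """2D occupancy grid; each cell stores an approach rejection value in [0, 1]."""
    width: int                  # number of cells along X
    height: int                 # number of cells along Z
    cell_size: float = 0.5      # metres per cell (assumed value)
    threshold: float = 0.5      # cells at or above this are not traversed (assumed value)
    values: list = field(default_factory=list)

    def __post_init__(self):
        if not self.values:
            self.values = [[0.0] * self.width for _ in range(self.height)]

    def cell_of(self, x, z):
        """Convert floor coordinates (x, z) into (column, row) cell indices."""
        return int(x // self.cell_size), int(z // self.cell_size)

    def set_value(self, col, row, value):
        """Store a rejection value, clamped to the range [0, 1]."""
        self.values[row][col] = min(1.0, max(0.0, value))

    def is_traversable(self, col, row):
        """A cell may be entered only if it lies inside the grid and is below the threshold."""
        inside = 0 <= col < self.width and 0 <= row < self.height
        return inside and self.values[row][col] < self.threshold
```

In such a sketch the destination received as two-dimensional coordinates would first be converted with `cell_of` before any route search over the grid.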

本実施形態における情報処理システムは、実施形態１の図１で説明したシステム構成と同一であるため説明を省略する。 The information processing system in this embodiment is the same as the system configuration described in FIG. 1 of Embodiment 1, so a description thereof will be omitted.

図１４は、本実施形態８における情報処理装置８０を備える移動体１２のモジュール構成を示す図である。情報処理装置８０は、入力部１１１０、位置姿勢算出部８１１０、意味情報認識部８１２０、制御部８１３０から構成されている。入力部１１１０は、移動体１２に搭載された撮像部１１０と接続されている。制御部８１３０は、アクチュエータ１２０と接続されている。また、これらに加え、不図示の通信装置が移動体管理システム１３と情報を双方向に通信を行っており、情報処理装置８０の各種手段に入出力している。 FIG. 14 is a diagram showing the module configuration of a mobile body 12 including an information processing device 80 in the eighth embodiment. The information processing device 80 includes an input unit 1110, a position and orientation calculation unit 8110, a semantic information recognition unit 8120, and a control unit 8130. The input unit 1110 is connected to the imaging unit 110 mounted on the mobile body 12. The control unit 8130 is connected to the actuator 120. In addition, a communication device (not shown) communicates bidirectionally with the mobile object management system 13 and exchanges information with the various units of the information processing device 80.

本実施形態における撮像部１１０、アクチュエータ１２０、入力部１１１０は実施形態１と同様であるため詳細の説明は省略する。 The imaging unit 110, actuator 120, and input unit 1110 in this embodiment are the same as those in Embodiment 1, so detailed explanations will be omitted.

以下に、位置姿勢算出部８１１０、意味情報認識部８１２０、制御部８１３０を順に説明する。 Below, the position/orientation calculation section 8110, the semantic information recognition section 8120, and the control section 8130 will be explained in order.

位置姿勢算出部８１１０は、入力部１１１０が入力したデプスマップを基に撮像部１１０の位置姿勢を算出する。また、算出した位置姿勢を基にシーンの三次元マップを作成する。算出した位置姿勢、および三次元マップを意味情報認識部８１２０、および制御部８１３０に入力する。 The position and orientation calculation unit 8110 calculates the position and orientation of the imaging unit 110 based on the depth map input by the input unit 1110. Furthermore, a three-dimensional map of the scene is created based on the calculated position and orientation. The calculated position and orientation and three-dimensional map are input to the semantic information recognition section 8120 and the control section 8130.

意味情報認識部８１２０は、入力部１１１０が入力したデプスマップ、および位置姿勢算出部８１１０が算出した位置姿勢および三次元マップを基に、意味情報として積み重なっている物体の個数および高さの値を推定する。推定した個数および高さの値を制御部８１３０に入力する。 The semantic information recognition unit 8120 calculates the values of the number and height of stacked objects as semantic information based on the depth map input by the input unit 1110 and the position and orientation and three-dimensional map calculated by the position and orientation calculation unit 8110. presume. The estimated number and height values are input to the control unit 8130.

制御部８１３０は、位置姿勢算出部８１１０が算出した位置姿勢、および三次元マップを入力する。また、意味情報認識部８１２０が推定した意味情報としての積み重なっている物体の個数および高さの値を入力する。制御部８１３０は、入力された値を基に、シーン中の物体への接近拒絶値を算出し、物体から所定の接近拒絶値以上の占有格子のセルを通らないようにＡＧＶを制御する制御値を算出する。制御部８１３０は、算出した制御値をアクチュエータ１２０に出力する。 The control unit 8130 receives the position and orientation calculated by the position and orientation calculation unit 8110 and the three-dimensional map. It also receives the number and height of stacked objects estimated as semantic information by the semantic information recognition unit 8120. Based on the input values, the control unit 8130 calculates approach rejection values for the objects in the scene, and calculates a control value that steers the AGV so that it does not pass through cells of the occupancy grid whose approach rejection value is at or above a predetermined value. The control unit 8130 outputs the calculated control value to the actuator 120.

次に、本実施形態における処理手順について説明する。図１５は、本実施形態における情報処理装置８０の処理手順を示すフローチャートである。処理ステップは、初期化Ｓ１１０、視覚情報取得Ｓ１２０、視覚情報入力Ｓ１３０、位置姿勢算出Ｓ８１０、意味情報推定Ｓ８２０、制御値算出Ｓ８３０、制御Ｓ１６０、システム終了判定Ｓ１７０からなる。なお、初期化Ｓ１１０、視覚情報取得Ｓ１２０、視覚情報入力Ｓ１３０、制御Ｓ１６０、システム終了判定Ｓ１７０は実施形態１の図５と同一であるため説明を省略する。以下に、位置姿勢算出Ｓ８１０、意味情報推定Ｓ８２０、制御値算出Ｓ８３０のステップを順に説明する。 Next, the processing procedure in this embodiment will be explained. FIG. 15 is a flowchart showing the processing procedure of the information processing device 80 in this embodiment. The processing steps include initialization S110, visual information acquisition S120, visual information input S130, position and orientation calculation S810, semantic information estimation S820, control value calculation S830, control S160, and system termination determination S170. Note that initialization S110, visual information acquisition S120, visual information input S130, control S160, and system termination determination S170 are the same as those in FIG. 5 of the first embodiment, so their explanations will be omitted. Below, the steps of position and orientation calculation S810, semantic information estimation S820, and control value calculation S830 will be explained in order.

ステップＳ８１０では、位置姿勢算出部８１１０が、撮像装置１１０の位置姿勢を算出するとともに、三次元のマップを作成する。これは、位置姿勢をもとにマップを作成しつつ位置姿勢推定を行うＳＬＡＭ（ＳｉｍｕｌｔａｎｅｏｕｓＬｏｃａｌｉｚａｔｉｏｎａｎｄＭａｐｐｉｎｇ）アルゴリズムにより実現する。具体的には、複数時刻に撮像部１１０が取得したデプスマップの奥行きの差が最小となるようにＩＣＰアルゴリズムで位置姿勢を算出する。また、算出した位置姿勢を基にデプスマップを時系列的に統合するＰｏｉｎｔ－ＢａｓｅｄＦｕｓｉｏｎアルゴリズムを用いて三次元のマップを作成する。 In step S810, the position and orientation calculation unit 8110 calculates the position and orientation of the imaging device 110 and creates a three-dimensional map. This is realized by a SLAM (Simultaneous Localization and Mapping) algorithm that estimates the position and orientation while creating a map based on the position and orientation. Specifically, the position and orientation are calculated using the ICP algorithm so that the difference in depth between depth maps acquired by the imaging unit 110 at multiple times is minimized. Furthermore, a three-dimensional map is created using a Point-Based Fusion algorithm that integrates depth maps in time series based on the calculated position and orientation.

ステップＳ８２０では、意味情報認識部８１２０が、デプスマップおよび三次元マップを領域分割し、領域ごとに物体の重なり数（ｎ）と高さ（ｈ）を算出する。具体的な処理手順を以下に順に説明する。 In step S820, the semantic information recognition unit 8120 divides the depth map and the three-dimensional map into regions, and calculates the number of overlapping objects (n) and height (h) for each region. The specific processing procedure will be explained in order below.

まず、デプスマップの各ピクセルとその周囲のピクセルの奥行き値に基づいて法線方向を算出する。次に、周囲の画素の法線方向との内積値が所定の値より大きければ同じ物体領域として一意の領域識別ラベルを割り当てる。このようにして、デプスマップを領域分割する。そして、領域分割したデプスマップの各画素が指す三次元マップの各ポイントクラウドにも領域識別ラベルを伝搬させることで三次元マップの領域分割を行う。 First, the normal direction is calculated based on the depth values of each pixel in the depth map and its surrounding pixels. Next, if the inner product value with the normal direction of surrounding pixels is larger than a predetermined value, a unique region identification label is assigned as the same object region. In this way, the depth map is divided into regions. Then, the three-dimensional map is divided into regions by propagating the region identification label to each point cloud of the three-dimensional map pointed to by each pixel of the divided depth map.
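
The region division just described can be sketched as follows. This is a simplified illustration under stated assumptions: normals are estimated from finite differences of the depth values with unit pixel spacing (camera intrinsics are ignored), neighbours are 4-connected, and the inner product threshold `min_dot` is an arbitrary value. The function names are hypothetical.

```python
import math
from collections import deque


def estimate_normals(depth):
    """Per-pixel unit normal from finite differences of neighbouring depth values."""
    rows, cols = len(depth), len(depth[0])
    normals = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Forward difference, falling back to a backward difference at the border.
            c2 = c + 1 if c + 1 < cols else c - 1
            r2 = r + 1 if r + 1 < rows else r - 1
            dzdx = (depth[r][c2] - depth[r][c]) / (c2 - c)
            dzdy = (depth[r2][c] - depth[r][c]) / (r2 - r)
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            normals[r][c] = (-dzdx / norm, -dzdy / norm, 1.0 / norm)
    return normals


def segment_by_normals(depth, min_dot=0.95):
    """Give neighbouring pixels the same label when their normals are nearly parallel."""
    normals = estimate_normals(depth)
    rows, cols = len(depth), len(depth[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue
            # Start a new region and grow it by breadth-first search.
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] == -1:
                        dot = sum(a * b for a, b in zip(normals[r][c], normals[rr][cc]))
                        if dot > min_dot:
                            labels[rr][cc] = next_label
                            queue.append((rr, cc))
            next_label += 1
    return labels
```

Propagating the labels to the three-dimensional map then amounts to copying `labels[r][c]` to the map point that pixel `(r, c)` refers to.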

次に、三次元マップをＸ－Ｚ方向（ＡＧＶの移動平面）に等間隔に分割したバウンディングボックスを作成する。分割した各バウンディングボックスを地面から順に鉛直方向（Ｙ軸方向）に走査し、バウンディングボックスに含まれる各ポイントクラウドのラベル数を数える。また、ポイントクラウドの地面（Ｘ－Ｚ平面）からの高さの最大値を算出する。算出した領域数ｎ、最大の高さｈをポイントクラウドごとに三次元マップに保持しておく。 Next, bounding boxes are created by dividing the three-dimensional map at equal intervals in the XZ direction (the movement plane of the AGV). Each divided bounding box is sequentially scanned from the ground in the vertical direction (Y-axis direction), and the number of labels of each point cloud included in the bounding box is counted. Also, calculate the maximum height of the point cloud from the ground (XZ plane). The calculated number of areas n and maximum height h are stored in a three-dimensional map for each point cloud.
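
As a minimal sketch of this counting step, assuming each map point is given as a tuple `(x, y, z, label)` with `y` measured upward from the floor, the number of regions n and the maximum height h per X-Z cell could be gathered as follows. The function name and the cell size are hypothetical.

```python
import math
from collections import defaultdict


def column_statistics(points, cell_size=0.5):
    """Return {(ix, iz): (n, h)} for each X-Z cell that contains points.

    n is the number of distinct region labels found in the vertical column,
    h is the maximum height of any point in that column above the floor.
    """
    labels = defaultdict(set)
    max_height = defaultdict(float)
    for x, y, z, label in points:
        key = (math.floor(x / cell_size), math.floor(z / cell_size))
        labels[key].add(label)
        max_height[key] = max(max_height[key], y)
    return {key: (len(labels[key]), max_height[key]) for key in labels}
```

The resulting pair would then be stored back on every point that falls into the corresponding column.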

ステップＳ８３０では、制御部８１３０が、三次元マップを基に占有マップを作成する。また、物体の重なり数（ｎ）と高さ（ｈ）から占有マップの接近拒絶度の値を更新する。そして、更新した占有マップを基にＡＧＶを制御する。 In step S830, the control unit 8130 creates an occupancy map based on the three-dimensional map. It also updates the approach rejection values of the occupancy map from the number of overlapping objects (n) and the height (h). The AGV is then controlled based on the updated occupancy map.

具体的には、まず、ステップＳ８１０で作成した三次元マップをＡＧＶの移動平面にあたる床面であるＸ－Ｚ平面に射影し、２Ｄの占有マップを得る。次に、三次元マップの各ポイントクラウドをＸ－Ｚ平面に射影した点と各占有マップとの距離、ポイントクラウドの物体の重なり数（ｎ）と高さ（ｈ）の値を用いて、占有マップの各格子の値である接近拒絶値を更新する。ｉ番目のポイントクラウドＰ_ｉをＸ－Ｚ平面に射影した座標Ｘ－Ｚをｐ_ｉとする。また、占有のｊ番目のセルＱｊの座標をｑ_ｊとする。占有の値はｈ，ｎが大きい程大きく、距離が離れるほど小さくなる関数として、例えば次のように求める。ただし、ｄ_ｉｊはｐ_ｉとｑ_ｊのユークリッド距離である。 Specifically, first, the three-dimensional map created in step S810 is projected onto the X-Z plane, which is the floor surface on which the AGV moves, to obtain a 2D occupancy map. Next, the approach rejection value held in each cell of the occupancy map is updated using the distance between each point of the three-dimensional map projected onto the X-Z plane and each cell of the occupancy map, together with the number of overlapping objects (n) and the height (h) stored for that point. Let p_i be the X-Z coordinates of the i-th point P_i projected onto the X-Z plane, and let q_j be the coordinates of the j-th cell Q_j of the occupancy map. The value of a cell is obtained from a function that increases as h and n increase and decreases as the distance increases, for example, as follows. Here, d_ij is the Euclidean distance between p_i and q_j.
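
As one hypothetical function with the properties stated above (larger for larger h and n, smaller for larger d_ij, and bounded between 0 and 1), the sketch below combines a weighted sum of h and n with an exponential decay over distance. The functional form and the parameters `alpha`, `beta`, and `sigma` are assumptions made for illustration only and are not taken from the embodiment.

```python
import math


def rejection_from_point(h, n, d, alpha=0.5, beta=0.3, sigma=1.0):
    """Approach rejection contributed by one projected point at distance d.

    Increases with height h and overlap count n, decreases with distance d,
    and always lies in [0, 1). All weights are assumed values.
    """
    strength = (alpha * h + beta * n) * math.exp(-d / sigma)
    return strength / (1.0 + strength)


def update_cell(current, cell_xz, points):
    """Update one cell value q_j from projected points given as (x, z, h, n).

    The cell keeps the largest rejection value contributed by any point.
    """
    value = current
    for x, z, h, n in points:
        d = math.hypot(x - cell_xz[0], z - cell_xz[1])
        value = max(value, rejection_from_point(h, n, d))
    return value
```

Taking the maximum over all points is also a design choice of this sketch; a sum or average with clamping would satisfy the same monotonic behaviour.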

以上のように定めた、占有マップと目標位置姿勢・現在位置姿勢の情報から、ＡＧＶと目標地点の位置姿勢を最小にしつつも、占有マップの接近拒絶度の値が高い格子を避けるようにＡＧＶの進行ルート（移動経路）を決め、制御値を算出する。制御部８１３０が算出した制御値をアクチュエータ１２０に出力する。 From the occupancy map defined as above and the target and current position and orientation, the travel route of the AGV is determined so as to minimize the difference between the position and orientation of the AGV and those of the target point while avoiding cells with a high approach rejection value in the occupancy map, and the control value is calculated. The control unit 8130 outputs the calculated control value to the actuator 120.
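
The embodiment does not name a particular route search method. As one possible sketch, a Dijkstra search over the grid can exclude cells at or above the threshold and add a penalty proportional to the rejection value of every cell it enters. The function name `plan_route`, the 4-connected moves, and the `weight` parameter are assumptions for illustration.

```python
import heapq


def plan_route(grid, start, goal, threshold=0.5, weight=5.0):
    """Return a list of (row, col) cells from start to goal, or None if unreachable.

    grid[row][col] holds an approach rejection value in [0, 1]. Cells at or
    above the threshold are never entered; other cells cost 1 + weight * value.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cell == goal:
            # Walk the parent links back to the start cell.
            route = [cell]
            while cell in parent:
                cell = parent[cell]
                route.append(cell)
            return route[::-1]
        if cost > best.get(cell, float("inf")):
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] >= threshold:
                continue
            new_cost = cost + 1.0 + weight * grid[nr][nc]
            if new_cost < best.get((nr, nc), float("inf")):
                best[(nr, nc)] = new_cost
                parent[(nr, nc)] = cell
                heapq.heappush(heap, (new_cost, (nr, nc)))
    return None
```

A control value such as a velocity command would then be derived from the first few cells of the returned route; that conversion depends on the vehicle and is outside this sketch.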

実施形態８では、意味情報としてＡＧＶの周囲の物体の積み重ね数および高さを推定し、それらの値が大きくなる程、ＡＧＶがそれら物体から距離をおいて走行するように制御する。これにより、例えばＡＧＶが物流倉庫のようにたくさんの物品が積まれた棚やパレットがある場合に、ＡＧＶがそれら物体からさらに距離をおいて走行することができるようになり、より安全にＡＧＶを運行できるようになる。 In the eighth embodiment, the number and height of stacked objects around the AGV are estimated as semantic information, and the larger these values are, the further the AGV is controlled to travel from the objects. As a result, for example, when an AGV is in a distribution warehouse where there are shelves or pallets loaded with many items, the AGV can move further away from those objects, making the AGV safer. It will be possible to operate.

＜変形例８－１＞ <Modification 8-1>
本実施形態における撮像部１１０は、ＴＯＦカメラやステレオカメラなど、画像とデプスマップを取得できるものであれば何でもよい。さらには、画像のみを取得するＲＧＢカメラや、モノクロカメラのような単眼カメラを用いてもよい。単眼カメラを用いる場合には位置及び姿勢の算出、占有マップの生成処理に奥行きが必要となるが、カメラの動きから奥行き値を算出することで本実施形態を実現する。なお、以降の実施形態において説明する撮像部１１０についても本実施形態と同様に構成する。 The imaging unit 110 in this embodiment may be anything that can acquire images and depth maps, such as a TOF camera or a stereo camera. Furthermore, an RGB camera that captures only images or a monocular camera such as a monochrome camera may be used. When a monocular camera is used, depth is still required for the position and orientation calculation and the occupancy map generation; in that case, this embodiment is realized by calculating depth values from the movement of the camera. Note that the imaging unit 110 described in subsequent embodiments is configured in the same manner as in this embodiment.

＜変形例８－２＞ <Modification 8-2>
占有マップの接近拒絶度の値は物体の高さが高く、積み重なり数が大きい程大きく、距離が離れるほど小さくなる関数であれば本実施形態において説明した方法に限らない。例えば、物体の高さや積み重なり数に比例する関数としてもよいし、距離に反比例する関数としてもよい。物体の高さ、または積み重なり数のどちらか一方のみ考慮する関数でもよい。距離、物体の高さや積み重なり数に応じた占有の値を格納したリストを参照して定めてもよい。なお、このリストはあらかじめ外部メモリ（Ｈ１４）に記憶しておいてもよいし、移動体管理システム１３が保持しており、必要に応じて情報処理装置に通信Ｉ／Ｆ（Ｈ１７）を介して情報処理装置８０にダウンロードしてもよい。 The approach rejection value of the occupancy map is not limited to the method described in this embodiment, as long as it is given by a function that increases as the height of the object and the number of stacked objects increase and decreases as the distance increases. For example, it may be a function proportional to the height of the objects or the number of stacked objects, or a function inversely proportional to the distance. It may be a function that takes into account only one of the height of the object and the number of stacked objects. It may also be determined by referring to a list that stores occupancy values according to the distance, the height of the objects, and the number of stacked objects. Note that this list may be stored in advance in the external memory (H14), or it may be held by the mobile object management system 13 and downloaded to the information processing device 80 via the communication I/F (H17) as necessary.

占有マップは本実施形態で説明したような占有マップでなくとも、空間中の物体の有無を判別できるような構成であれば何でもよい。例えば所定の半径のポイントクラウドとして表してもよいし、何らかの関数で近似してもよい。二次元の占有マップに限らず三次元の占有マップを用いてもよく、例えば３Ｄのボクセル空間（Ｘ，Ｙ，Ｚ）で保持しても、ＴＳＤＦ（ＴｒｕｎｃａｔｅｄＳｉｇｎｅｄＤｉｓｔａｎｃｅＦｕｎｃｔｉｏｎ）である符号付距離場として保持してもよい。 The occupancy map does not have to be the occupancy map described in this embodiment, and may have any configuration as long as the presence or absence of an object in the space can be determined. For example, it may be expressed as a point cloud in which each point has a predetermined radius, or it may be approximated by some function. Not only a two-dimensional occupancy map but also a three-dimensional occupancy map may be used; for example, it may be held in a 3D voxel space (X, Y, Z), or as a signed distance field such as a TSDF (Truncated Signed Distance Function).

本実施形態においては、物体の高さや積み重なり数に応じて接近拒絶度を変えた占有マップを用いて制御値を算出していたが、対象の意味情報を基に制御値を変化させるものであればこれに限らない。例えば、物体の高さや積み重なり数に応じた制御方法を記したリストを参照して制御値を決めてもよい。制御方法を記したリストとは、具体的には物体数と積み重なり数が所定の値であり条件を満たせば左に旋回する、減速するなどといった動作を規定したリストのことである。他にも所定の高さや積み重なり数の物体が見つかった時、それらが視野に写らないように回転するような制御値を算出するなど、事前に決めたルールベースでＡＧＶを制御してもよい。また、高さや積み重なり数が大きくなる程速度を低下させるような制御値を算出するなど、計測値を変数とした関数に当てはめてＡＧＶを制御してもよい。 In this embodiment, the control value is calculated using an occupancy map in which the degree of approach rejection is changed according to the height of the objects and the number of stacked objects, but the method is not limited to this as long as the control value is changed based on the semantic information of the target. For example, the control value may be determined by referring to a list that describes control methods according to the height of the objects and the number of stacked objects. Specifically, such a list specifies actions such as turning left or decelerating when the number of objects and the number of stacked objects reach predetermined values and the conditions are met. The AGV may also be controlled by other predetermined rules, such as calculating a control value that rotates the AGV so that objects of a predetermined height or number of stacks, once found, no longer appear in the field of view. Furthermore, the AGV may be controlled by applying the measured values to a function, such as calculating a control value that reduces the speed as the height or the number of stacked objects increases.
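
The two alternatives above, a rule list and a function of the measured values, can be sketched as follows. The thresholds, the action names, and the coefficients are hypothetical values chosen only to make the example concrete.

```python
# Each rule is (minimum height in metres, minimum stack count, action).
# Rules are checked from the most restrictive to the least restrictive.
# All values and action names are hypothetical.
RULES = [
    (2.0, 4, "stop"),
    (1.5, 3, "turn_left"),
    (1.0, 2, "decelerate"),
]


def select_action(height, stack_count, rules=RULES, default="continue"):
    """Return the action of the first rule whose conditions are both met."""
    for min_height, min_count, action in rules:
        if height >= min_height and stack_count >= min_count:
            return action
    return default


def speed_limit(height, stack_count, v_max=1.0, k_h=0.2, k_n=0.1, v_min=0.1):
    """Speed limit that falls as height and stack count grow, never below v_min."""
    return max(v_min, v_max / (1.0 + k_h * height + k_n * stack_count))
```

A table of this kind is easy to store in external memory or to download from a management system, which is one reason a rule list may be preferred over a hard-coded function.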

＜変形例８－３＞ <Modification 8-3>
本実施形態においては、撮像部１１０はＡＧＶに搭載されていた例を示したが、ＡＧＶの進行方向を撮影できればＡＧＶに搭載されている必要は無い。具体的には、天井に取り付けられた監視カメラを撮像装置１１０として用いてもよい。このときには、撮像装置１１０がＡＧＶを撮影し、例えばＩＣＰアルゴリズムにより撮像装置１１０に対する位置姿勢を求めることができる。また、ＡＧＶ上部にマーカを貼っておき、撮像装置１１０がマーカを検出することで位置姿勢を求めることもできる。また、撮像装置１１０がＡＧＶの進行ルート上の物体を検出してもよい。撮像装置１１０は１台であっても複数台であってもよい。 In this embodiment, an example was shown in which the imaging unit 110 is mounted on the AGV, but it does not need to be mounted on the AGV as long as it can capture the direction in which the AGV travels. Specifically, a surveillance camera attached to the ceiling may be used as the imaging device 110. In this case, the imaging device 110 captures the AGV, and the position and orientation of the AGV relative to the imaging device 110 can be determined using, for example, the ICP algorithm. Alternatively, a marker may be attached to the top of the AGV, and the position and orientation may be determined by having the imaging device 110 detect the marker. The imaging device 110 may also detect objects on the travel route of the AGV. There may be one imaging device 110 or a plurality of them.

位置姿勢算出部８１１０や意味情報認識部８１２０、制御部８１３０もＡＧＶに搭載されている必要は無い。例えば、制御部８１３０が移動体管理システム１３に搭載されている構成がある。この場合、通信Ｉ／Ｆ（Ｈ１７）を介し制御部８１３０が必要な情報を送受信するようにすることで実現できる。このようにすることで移動体であるＡＧＶ上に大きな計算機を乗せる必要が無くなり、ＡＧＶの重量が軽くて済むため、効率良くＡＧＶを運用することができる。 It is not necessary that the position/orientation calculation unit 8110, the semantic information recognition unit 8120, and the control unit 8130 are also installed in the AGV. For example, there is a configuration in which the control unit 8130 is installed in the mobile object management system 13. In this case, this can be realized by having the control unit 8130 transmit and receive necessary information via the communication I/F (H17). By doing this, there is no need to mount a large computer on the moving AGV, and the weight of the AGV can be reduced, so that the AGV can be operated efficiently.

＜変形例８－４＞ <Modification 8-4>
本実施形態においては、意味情報とは物体の積み重なり度合のことであった。しかしながら、ＡＧＶが安全に、効率よく運行するための制御値を算出することのできる意味情報であれば、意味情報認識部８１２０はどのような意味情報を認識してもよい。また、その意味情報を用いて、制御部８１３０が制御値を算出してもよい。 In this embodiment, the semantic information refers to the degree of stacking of objects. However, the semantic information recognition unit 8120 may recognize any semantic information as long as it allows control values for the safe and efficient operation of the AGV to be calculated. The control unit 8130 may then calculate the control value using that semantic information.

例えば、構造物の位置を意味情報として認識してもよい。具体的には、工場にある構造物である「ドア」の開き度合を意味情報として用いることもできる。ドアが閉まっている時と比べ、ドアが開いているまたは開きかけている場合にはＡＧＶを低速に走行させる。また、物がクレーンでつるされていることを認識し、物の下部に潜り込まないように制御する。このようにすることで、より安全にＡＧＶを運用する。 For example, the position of a structure may be recognized as semantic information. Specifically, the degree of opening of a "door," which is a structure in a factory, can be used as semantic information. The AGV is made to run at a lower speed when the door is open or about to open compared to when the door is closed. It also recognizes when an object is being suspended by a crane and controls the device to prevent it from getting under the object. By doing so, the AGV can be operated more safely.

本実施形態では積み重ね度合を認識したが、積み重ねに限らず近接して並んでいることを認識してもよい。例えば、複数の台車を認識し、それらの距離が所定の値より小さければ所定の距離以上離れて運行するような制御値を算出する。 In this embodiment, the degree of stacking is recognized, but the fact that objects are lined up close to one another may be recognized instead. For example, a plurality of trolleys are recognized, and if the distance between them is smaller than a predetermined value, a control value is calculated so that the AGV travels at least a predetermined distance away from them.

また、他のＡＧＶとその上方に位置する荷物を認識してＡＧＶの上に荷物が乗っていることを認識してもよい。他のＡＧＶが荷物を搭載していれば自分（ＡＧＶ）が避け、そうでなければ自分（ＡＧＶ）は直進し、他のＡＧＶに自分（ＡＧＶ）を避ける制御値を移動体管理システム１３経由で送信してもよい。他のＡＧＶに荷物が搭載されている場合にはさらに、荷物の大きさを認識し、大きさに応じて制御方法を決めてもよい。このように荷物を搭載してかどうかや荷物の大きさを判別することで、荷物を搭載していないＡＧＶやより小さな荷物を搭載しているＡＧＶが回避動作をすることで移動にかかるエネルギーや時間を小さくし効率的にＡＧＶを運行できる。さらには、他のＡＧＶに搭載された物体の種別を認識し、種別に応じた物体の価値や壊れやすさを認識することで、価値が高いものや壊れやすいものを搭載していれば自分（ＡＧＶ）が避けるような制御値を算出してもよい。荷物の種別を認識することで、荷物への損害を小さく、安全にＡＧＶを運用することができる。 Alternatively, another AGV and the cargo located above it may be recognized in order to recognize that the other AGV is carrying cargo. If the other AGV is carrying cargo, the own AGV gives way; otherwise, the own AGV goes straight, and a control value that makes the other AGV avoid the own AGV may be sent to it via the mobile object management system 13. If the other AGV is carrying cargo, the size of the cargo may additionally be recognized and the control method may be determined according to the size. By determining whether cargo is loaded and how large it is in this way, the AGV carrying no cargo or smaller cargo performs the avoidance maneuver, which reduces the energy and time required for movement and allows the AGVs to be operated efficiently. Furthermore, by recognizing the type of object carried by another AGV and the value or fragility of the object according to its type, a control value may be calculated so that the own AGV gives way if the other AGV is carrying something valuable or fragile. By recognizing the type of cargo, damage to the cargo can be reduced and the AGV can be operated safely.
入力画像に写る物体の外形を意味情報として用いることもできる。具体的には、検出した物体がとがっている場合にはそのような物体から距離をおいて走行するようにすることでＡＧＶに傷を負わすことなく安全に運用できる。また、壁のように平らな物体であれば、一定距離を運行するようにすることで、ＡＧＶのふらつきを抑え、安定して効率よく運行できる。 The external shape of an object in the input image can also be used as semantic information. Specifically, if the detected object is sharp, the AGV travels at a distance from it, so that the AGV can be operated safely without being damaged. If the object is flat like a wall, the AGV travels at a constant distance from it, which suppresses wobbling of the AGV and allows stable and efficient operation.

意味情報として、物体自体の危険度や壊れやすさを認識してもよい。例えば、段ボールに印字された「危険」という文字や髑髏マークを認識したら、そのような段ボールからは所定の距離以上離れて移動するようＡＧＶを制御する。このようにすることで、物体の危険性や壊れやすさをもとにより安全にＡＧＶを運用することができる。また、工場の自動機の稼働状況を示す積層灯の点灯状況を認識し、自動機が稼働中であれば所定の距離以上近づかないような制御値を算出する。このようにすると、自動機の安全センサにＡＧＶが検出されてしまい自動機を止めるようなことがなくなり、効率的にＡＧＶを運用することができる。 As semantic information, the degree of danger or fragility of the object itself may be recognized. For example, if the word "danger" or a skull mark printed on a cardboard is recognized, the AGV is controlled to move away from such cardboard by a predetermined distance or more. By doing so, the AGV can be operated more safely based on the danger and fragility of the object. It also recognizes the lighting status of stacked lights that indicate the operating status of automatic machines in the factory, and calculates control values that prevent the robot from getting closer than a predetermined distance if the automatic machines are in operation. In this way, the automatic machine will not be stopped due to the AGV being detected by the safety sensor of the automatic machine, and the AGV can be operated efficiently.

＜変形例８－５＞ <Modification 8-5>
本実施形態においては、意味情報を基にＡＧＶを減速する方法を述べた。しかしながら、制御方法は上記の方法に限らず、ＡＧＶを効率的に、安全に運用できる方法であればよい。例えば、加減速のパラメータを変えるようにしてもよい。つまり意味情報に応じた減速においても緩やかに減速をするのか、急に減速するのかといった緻密な制御ができるようになる。回避のパラメータを変更してもよく、物体の近くを回避するのか、大きく回避するのか、ルートを変更して回避するのか、止まるのかといった制御を切り替えるように構成してもよい。また、ＡＧＶの制御値算出の頻度を増減する。頻度を増加させることでより緻密な制御ができるようになり、逆に低下させることにより緩徐に制御ができるようになる。このように意味情報を基に制御方法を変更することでより効率よく、安全にＡＧＶを運用する。 In this embodiment, a method for decelerating the AGV based on semantic information has been described. However, the control method is not limited to the above, and any method that allows the AGV to be operated efficiently and safely may be used. For example, acceleration and deceleration parameters may be changed. In other words, even when decelerating according to semantic information, it becomes possible to control precisely whether to decelerate gradually or suddenly. The avoidance parameters may also be changed, and the control may be switched between passing close to the object, giving it a wide berth, changing the route to avoid it, or stopping. The frequency at which the AGV control value is calculated may also be increased or decreased. Increasing the frequency allows more precise control; conversely, decreasing it allows more gradual control. By changing the control method based on semantic information in this way, the AGV can be operated more efficiently and safely.

［実施形態９］ [Embodiment 9]
実施形態８では、ＡＧＶの周りに存在する物体の積み重ね度合や形状、構造物の状態といった、ある一時刻の周囲の静的な意味情報を基にＡＧＶを制御していた。実施形態９では、それらの時間的変化を踏まえてＡＧＶを制御する。本実施形態における意味情報とは、画像に写る物体の移動量のことを指す。なお、本実施形態においては、画像に写る物体の移動量に加え、物体の種別も合わせて認識し、その結果をもとにＡＧＶの制御値を算出する方法について述べる。具体的には、周囲の物体の種別として他の移動体であるＡＧＶとそれらに積まれた荷物、および他のＡＧＶの移動量を認識し、それら認識結果に基づいて自身（ＡＧＶ）または他のＡＧＶの制御値を算出する。 In the eighth embodiment, the AGV is controlled based on static semantic information about its surroundings at a single point in time, such as the degree of stacking and the shape of objects around the AGV and the state of structures. In the ninth embodiment, the AGV is controlled taking their changes over time into account. The semantic information in this embodiment refers to the amount of movement of an object in the image. This embodiment describes a method in which the type of the object is recognized in addition to its amount of movement, and the AGV control value is calculated based on the result. Specifically, other AGVs (which are other moving bodies) and the cargo loaded on them are recognized as the types of surrounding objects, together with the amount of movement of the other AGVs, and the control value of the own AGV or of the other AGVs is calculated based on these recognition results.

本実施形態における情報処理装置の構成は、実施形態８で説明した情報処理装置８０の図１４と同一であるので説明を省略する。実施形態８と異なるのは、意味情報認識部８１２０が推定し制御部８１３０に入力する意味情報が、検出した物体種としてＡＧＶとそれに積まれた荷物、そして他のＡＧＶの移動量である点である。 The configuration of the information processing apparatus in this embodiment is the same as that of the information processing apparatus 80 described in Embodiment 8 and shown in FIG. 14, so the description thereof will be omitted. The difference from the eighth embodiment is that the semantic information estimated by the semantic information recognition unit 8120 and input to the control unit 8130 consists of the detected object types, namely AGVs and the cargo loaded on them, and the amount of movement of the other AGVs.

本実施形態における処理手順の図は、実施形態８で説明した情報処理装置８０の処理手順を説明する図１５と同一であるため説明を省略する。実施形態８と異なるのは、意味情報推定ステップＳ８２０、および制御値算出Ｓ８３０の処理内容である。 The diagram of the processing procedure in this embodiment is the same as FIG. 15 illustrating the processing procedure of the information processing apparatus 80 described in Embodiment 8, so the explanation will be omitted. What differs from the eighth embodiment is the processing content of the semantic information estimation step S820 and the control value calculation S830.

意味情報推定ステップＳ８２０では、意味情報認識部８１２０が、デプスマップを領域分割し、さらに領域毎に物体の種別を推定する。このとき、合わせて推定した物体の位置およびサイズを推定する。次に、検出した物体のうち（他の）ＡＧＶの位置とその（他の）ＡＧＶの過去の位置とを比較し、（他の）ＡＧＶの移動量を算出する。本実施形態においては、他のＡＧＶの移動量とは、自分（ＡＧＶ）に対する相対位置姿勢の変化量のことである。 In the semantic information estimation step S820, the semantic information recognition unit 8120 divides the depth map into regions and further estimates the type of object for each region. At this time, the position and size of the estimated object are also estimated. Next, the position of the (other) AGV among the detected objects is compared with the past position of the (other) AGV, and the amount of movement of the (other) AGV is calculated. In this embodiment, the amount of movement of another AGV is the amount of change in the relative position and orientation with respect to itself (the AGV).

まず、実施形態６で説明したように、画像、およびデプスマップを基に、デプスマップを領域分割し、領域毎の物体種を特定する。 First, as described in the sixth embodiment, the depth map is divided into regions based on the image and the depth map, and the object type for each region is identified.

次にＡＧＶと認識された領域を抽出し、他の領域との相対位置関係を算出する。このとき、ＡＧＶとの距離が所定の閾値より小さく、かつＡＧＶと認識された領域より鉛直（Ｙ軸）方向にある領域をＡＧＶに搭載された荷物領域であると判定する。さらに、ＡＧＶのサイズ、および搭載された荷物領域の大きさを取得する。なお、サイズとは荷物領域を囲むバウンディングボックスの長辺の長さとする。 Next, the area recognized as an AGV is extracted, and the relative positional relationship with other areas is calculated. At this time, an area where the distance to the AGV is smaller than a predetermined threshold and is located in the vertical (Y-axis) direction from the area recognized as the AGV is determined to be the cargo area loaded on the AGV. Furthermore, the size of the AGV and the size of the loaded cargo area are obtained. Note that the size is defined as the length of the long side of the bounding box surrounding the baggage area.
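The cargo-region rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Region` type, the axis-aligned 3D box representation, the use of center-to-center distance, and the convention that Y increases upward are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed representation: axis-aligned box (min_x, min_y, min_z, max_x, max_y, max_z) in metres.
Box = Tuple[float, float, float, float, float, float]


@dataclass
class Region:
    label: str  # object type assigned by the segmentation step
    box: Box


def center(box: Box) -> Tuple[float, float, float]:
    """Center point of the box."""
    return tuple((box[i] + box[i + 3]) / 2.0 for i in range(3))


def long_side(box: Box) -> float:
    """Size as defined in the text: length of the longest side of the bounding box."""
    return max(box[i + 3] - box[i] for i in range(3))


def cargo_regions(agv: Region, regions: List[Region], max_dist: float) -> List[Region]:
    """Regions closer to the AGV than max_dist and located above it (assumed +Y is up)."""
    ax, ay, az = center(agv.box)
    cargo = []
    for r in regions:
        if r is agv or r.label == "agv":
            continue
        cx, cy, cz = center(r.box)
        dist = ((cx - ax) ** 2 + (cy - ay) ** 2 + (cz - az) ** 2) ** 0.5
        if dist < max_dist and cy > ay:
            cargo.append(r)
    return cargo
```

The threshold `max_dist` corresponds to the predetermined threshold in the text; its value would have to be tuned for the actual AGV and sensor.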

そして、時刻ｔ－１と時刻ｔにＡＧＶと認識された領域をそれぞれ抽出し、それらの相対位置関係を、ＩＣＰアルゴリズムを用いて算出する。なお、算出した相対位置関係とは、自分（ＡＧＶ）に対する他のＡＧＶの相対位置姿勢の変化量のことである。これを以降で他のＡＧＶの移動量と呼ぶ。 Then, regions recognized as AGVs at time t-1 and time t are respectively extracted, and their relative positional relationships are calculated using the ICP algorithm. Note that the calculated relative positional relationship is the amount of change in the relative position and orientation of another AGV with respect to the AGV itself (AGV). This will hereinafter be referred to as the amount of movement of other AGVs.
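As a rough illustration of this step, the sketch below runs a point-to-point ICP between the points of the other AGV observed at time t-1 and at time t. It is simplified under stated assumptions: the points are assumed to have been projected onto the floor plane (2D), correspondences are found by brute-force nearest neighbor, and the function names are hypothetical. The source only states that the ICP algorithm is used, not which variant.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired points src onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def transform(theta, tx, ty, pts):
    """Apply a 2D rotation by theta followed by a translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]


def icp_2d(prev_pts, curr_pts, iterations=20):
    """Estimate the motion (theta, tx, ty) that carries prev_pts onto curr_pts."""
    theta_total, tx_total, ty_total = 0.0, 0.0, 0.0
    moved = list(prev_pts)
    for _ in range(iterations):
        # Pair each moved point with its nearest neighbor in the current observation.
        nearest = [min(curr_pts, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in moved]
        theta, tx, ty = fit_rigid_2d(moved, nearest)
        moved = transform(theta, tx, ty, moved)
        # Compose the incremental transform with the accumulated one.
        c, s = math.cos(theta), math.sin(theta)
        tx_total, ty_total = c * tx_total - s * ty_total + tx, s * tx_total + c * ty_total + ty
        theta_total += theta
    return theta_total, tx_total, ty_total
```

Because both point sets are expressed in the observing AGV's own frame, the returned transform is the change in relative position and orientation that the text calls the amount of movement of the other AGV.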

制御値算出Ｓ８３０は、ステップＳ８２０において意味情報認識部８１２０が算出した他のＡＧＶの移動量、および他のＡＧＶとそれに搭載された荷物のサイズを基に、制御部８１３０が自分（ＡＧＶ）の行動を決定する。 In control value calculation S830, the control unit 8130 determines the behavior of the AGV itself based on the amount of movement of the other AGV calculated by the semantic information recognition unit 8120 in step S820 and on the sizes of the other AGV and the cargo loaded on it.

まず、他のＡＧＶの移動量より、他のＡＧＶが自分（ＡＧＶ）に近づいているか遠ざかっているのか判定する。他のＡＧＶが遠ざかっている場合には制御値は変更しない。一方近づいている場合には、さらに荷物のサイズを基づいて新たな制御値を算出する。具体的には、あらかじめ不図示の入力手段によりＲＡＭ（Ｈ１３）に格納しておいた自分（ＡＧＶ）のサイズと、他のＡＧＶとその荷物のサイズとを比較し、自分（ＡＧＶ）の方が小さければ自分（ＡＧＶ）が他のＡＧＶを回避するルート計画を行う。一方、自分（ＡＧＶ）の方が大きければ、自分（ＡＧＶ）の速度を減速しつつ通信インターフェイスＨ１７を介し、移動体管理システム１３に検出した他のＡＧＶに回避動作を行わせる信号を送る。 First, from the amount of movement of the other AGV, it is determined whether the other AGV is approaching or moving away from the AGV itself. If the other AGV is moving away, the control value is not changed. If it is approaching, a new control value is calculated based on the size of the cargo. Specifically, the size of the AGV itself, stored in advance in the RAM (H13) via an input means (not shown), is compared with the size of the other AGV and its cargo. If the AGV itself is smaller, it plans a route that avoids the other AGV. If the AGV itself is larger, it reduces its speed and sends a signal via the communication interface H17 to the mobile object management system 13, requesting that the detected other AGV perform an avoidance operation.
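The decision rule in step S830 can be summarized in a short sketch. The names `Action` and `decide` are hypothetical, "approaching" is interpreted here as a shrinking distance between two consecutive relative positions, and the case of equal sizes (which the text does not address) is treated as "own AGV is larger" purely as an assumption.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep current control value"
    REPLAN_AVOID = "plan a route that avoids the other AGV"
    SLOW_AND_REQUEST_AVOID = "slow down and ask the management system to make the other AGV avoid"


def distance(rel_pos):
    """Euclidean norm of a relative position given as a tuple of coordinates."""
    return sum(c * c for c in rel_pos) ** 0.5


def decide(prev_rel_pos, curr_rel_pos, own_size, other_size):
    """Choose the behavior from the other AGV's relative motion and the two sizes.

    other_size is assumed to already combine the other AGV and its cargo.
    """
    approaching = distance(curr_rel_pos) < distance(prev_rel_pos)
    if not approaching:
        return Action.KEEP
    if own_size < other_size:
        return Action.REPLAN_AVOID
    # Assumption: equal sizes fall through to this branch.
    return Action.SLOW_AND_REQUEST_AVOID
```

For example, `decide((2.0, 0.0), (1.5, 0.0), 0.8, 1.2)` returns `Action.REPLAN_AVOID`, because the other AGV is getting closer and is larger than the AGV itself.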

以上のように、実施形態９においては意味情報としてＡＧＶの周囲の物体の種別を判定し、かつ他のＡＧＶであればさらに移動量と搭載した荷物の大きさを推定した結果をもとに制御値を算出する。このとき、他のＡＧＶやその荷物が自分（ＡＧＶ）より大きければ自分が回避し、逆に小さければ相手を回避させるような制御を行う。このようにすることで、サイズが小さく、搭載している荷物が小さいＡＧＶがルートを譲るような制御ができるようになり、時間およびエネルギー効率よくＡＧＶを運用することができる。 As described above, in Embodiment 9 the types of objects around the AGV are determined as semantic information, and when an object is another AGV, its amount of movement and the size of its cargo are also estimated; the control value is calculated from these results. If the other AGV or its cargo is larger than the AGV itself, the AGV itself performs the avoidance; conversely, if it is smaller, the other AGV is made to avoid. In this way, control can be performed such that the AGV that is smaller and carries smaller cargo yields the route, and the AGVs can be operated efficiently in terms of time and energy.

＜変形例９－１＞
本実施形態においては、他の移動体としてＡＧＶを検出していた。しかしながら、ＡＧＶに限らず、少なくとも位置や姿勢が変化し、それに応じてＡＧＶの制御を変えることができればどのような物体を検出してもよい。具体的には、移動体としてフォークリフトや移動ロボットを検出してもよい。また、装置の一部の位置や姿勢が変化量を意味情報として認識し、それに応じてＡＧＶの制御を変えてもよい。例えば、自動機やアームロボットのアーム、ベルトコンベアといった機器の可動部の移動量が例えば所定の動作速度より大きければ、よりＡＧＶがそれらから所定の距離を離れて制御するようにしてもよい。 <Modification 9-1>
In this embodiment, an AGV was detected as the other moving body. However, detection is not limited to AGVs: any object may be detected as long as at least its position or orientation changes and the control of the AGV can be changed accordingly. Specifically, a forklift or a mobile robot may be detected as the moving body. Alternatively, the amount of change in the position or orientation of a part of a device may be recognized as semantic information, and the control of the AGV may be changed accordingly. For example, if the movable part of equipment such as an automatic machine, the arm of an arm robot, or a belt conveyor is moving faster than a predetermined operating speed, the AGV may be controlled so as to keep a predetermined distance away from it.

＜変形例９－２＞
本実施形態においては、自分と他のＡＧＶとどちらかが回避するような制御値を算出していたが、対象物の動きに応じて制御値を変えるような制御方法であればよい。具体的には、移動量の大小に応じて実施形態８で説明した占有マップの接近拒絶度の値を動的に更新し、これを用いてＡＧＶの制御値を算出してもよい。 <Modification 9-2>
In this embodiment, the control value was calculated so that either the AGV itself or the other AGV performs the avoidance, but any control method that changes the control value according to the movement of the target object may be used. Specifically, the approach rejection value of the occupancy map described in the eighth embodiment may be updated dynamically according to the magnitude of the movement amount, and the AGV control value may be calculated using the updated map.

また、他のＡＧＶの移動量が自分（ＡＧＶ）と同じ方向であれば他のＡＧＶに追従して移動するような制御値を算出してもよい。十字路に差し掛かった時に先に横方向から来たＡＧＶがあれば、通過し終えるまで待つという制御値を算出してもよし、自分が先行して十字路を進行する場合には移動体管理システム１３を通じて他のＡＧＶを待機させるような制御値を算出してもよい。さらに、他のＡＧＶが例えば進行方向に対し左右に振動していることを観測した場合や、他のＡＧＶに搭載された荷物が他のＡＧＶに対し振動する動きが観測された場合には、一定距離以上近づかないようなルートを通るような制御値を算出してもよい。 Furthermore, if the other AGV is moving in the same direction as the AGV itself, a control value that makes the AGV follow the other AGV may be calculated. When the AGV reaches a crossroads and another AGV has already arrived from the side, a control value that makes the AGV wait until the other AGV has passed may be calculated; if the AGV itself is to proceed through the crossroads first, a control value that makes the other AGV wait, via the mobile object management system 13, may be calculated. Furthermore, if another AGV is observed swaying from side to side relative to its direction of travel, or if the cargo loaded on another AGV is observed swaying relative to that AGV, a control value may be calculated for a route that does not come closer than a certain distance to it.
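One possible way to detect the side-to-side swaying mentioned above is to count direction reversals in the lateral offset of the other AGV over consecutive frames. The text does not specify a detection method, so the sketch below is only an assumed heuristic; the function names and the thresholds `min_step` and `min_reversals` are hypothetical.

```python
def count_direction_reversals(lateral_offsets, min_step=0.01):
    """Count how often the lateral motion changes sign, ignoring steps below min_step."""
    reversals = 0
    last_sign = 0
    for a, b in zip(lateral_offsets, lateral_offsets[1:]):
        step = b - a
        if abs(step) < min_step:
            continue  # treat very small steps as measurement noise
        sign = 1 if step > 0 else -1
        if last_sign != 0 and sign != last_sign:
            reversals += 1
        last_sign = sign
    return reversals


def is_swaying(lateral_offsets, min_reversals=3):
    """Heuristic: enough left/right reversals within the window counts as swaying."""
    return count_direction_reversals(lateral_offsets) >= min_reversals
```

When `is_swaying` returns true, the planner would choose a route that keeps at least the chosen minimum distance from that AGV.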

物体の動きからさらに作業工程を意味情報として認識してもよい。例えば、ロボットが他のＡＧＶに荷物を積む動作をしていることを認識してもよい。このとき、自分（ＡＧＶ）は別のルートを探索するような制御値を算出してもよい。また、例えば物流倉庫において出荷用パレットに荷物が載せられる動作を認識し、移動体（フォークリフト）が当該パレットに接近するように制御してもよい。このように、対象の動きを意味情報として認識し、それらに合わせて移動体であるＡＧＶやフォークリフトを制御することでより効率よく運用する。 Work processes may further be recognized as semantic information from the movement of objects. For example, it may be recognized that a robot is loading cargo onto another AGV, in which case the AGV itself may calculate a control value that searches for another route. As another example, in a distribution warehouse, the action of cargo being placed on a shipping pallet may be recognized, and a moving body (a forklift) may be controlled to approach that pallet. By recognizing the movement of a target as semantic information in this way and controlling moving bodies such as AGVs and forklifts accordingly, they can be operated more efficiently.

［実施形態１０］
実施形態１０ではさらに、人の作業や役割を認識した結果を基により安全にＡＧＶを運行する方法について説明する。本実施形態においては、意味情報として人と人が保持する物体種をもとに人の作業種別を推定し、作業種別に応じてＡＧＶの制御を行う。具体例を挙げると、人と人が押すハンドリフトを検出し作業種別として運搬作業を認識してＡＧＶを回避させる制御や、人と人が持つ溶接機を検出し作業種別として溶接作業を認識しＡＧＶのルートを変更するといった制御を実現する。なお、本実施形態においては、人と人が持つ物体に対応してＡＧＶの制御を決める接近拒絶度のパラメータが、事前に人手によって与えられているものとする。パラメータとは具体的には、例えば、人が大きな荷物を持っている場合に０．４、人が台車を押していれば０．６、人が溶接機を持っていれば０．９といった値のことである。本実施形態においては、これらパラメータが保持されたパラメータリストを移動体管理システム１３が保持している。必要に応じて情報処理装置に通信Ｉ／Ｆ（Ｈ１７）を介して移動体管理システム１３から情報処理装置８０にダウンロードし、外部メモリ（Ｈ１４）に保持して参照できるものとする。 [Embodiment 10]
Embodiment 10 further describes a method for operating an AGV more safely based on the result of recognizing people's work and roles. In this embodiment, a person's work type is estimated from the person and the type of object the person is holding, which serve as the semantic information, and the AGV is controlled according to the work type. As specific examples, a person and a hand lift being pushed by the person are detected, transport work is recognized as the work type, and the AGV is made to avoid them; or a person and a welding machine held by the person are detected, welding work is recognized as the work type, and the AGV's route is changed. In this embodiment, the approach rejection parameters that determine the AGV control for each combination of a person and a held object are assumed to be given manually in advance. Specifically, the parameters are values such as 0.4 when a person is carrying large baggage, 0.6 when a person is pushing a cart, and 0.9 when a person is holding a welding machine. In this embodiment, the mobile object management system 13 holds a parameter list containing these parameters. As needed, the list can be downloaded from the mobile object management system 13 to the information processing device 80 via the communication I/F (H17) and stored in the external memory (H14) for reference.

本実施形態における情報処理装置の構成は、実施形態８で説明した情報処理装置８０の図１４と同一であるので説明を省略する。実施形態８と異なるのは、意味情報認識部８１２０が推定し制御部８１３０に入力する意味情報が異なる。 The configuration of the information processing apparatus in this embodiment is the same as that of the information processing apparatus 80 described in Embodiment 8 shown in FIG. 14, so the description thereof will be omitted. The difference from the eighth embodiment is the semantic information estimated by the semantic information recognition unit 8120 and input to the control unit 8130.

意味情報推定ステップＳ８２０では、意味情報認識部８１２０が、入力画像から人、および人が保持する物体種を認識する。そして、あらかじめ外部メモリＨ１４に保持しておいた人と人が保持する物体に応じたＡＧＶの制御ルールを記録したパラメータリストに基づいてＡＧＶを制御する。 In the semantic information estimation step S820, the semantic information recognition unit 8120 recognizes the person and the object type held by the person from the input image. Then, the AGV is controlled based on a parameter list that is stored in advance in the external memory H14 and records control rules for the AGV according to the person and the object held by the person.

まず、視覚情報から人の手の部位を検出する。人の手の部位の検出には、人の各部位とそれらの接続関係を認識し、人の骨格を推定する方法を援用する。そして、人の手の位置にあたる画像座標を取得する。 First, the part of a person's hand is detected from visual information. To detect parts of a person's hand, a method that recognizes each part of the person and their connection relationships and estimates the human skeleton is used. Then, the image coordinates corresponding to the position of the person's hand are obtained.

次に、人が保持する物体種を検出する。物体の検出には実施形態６で述べた、画像を物体種ごとに領域分割するよう学習したニューラルネットワークを用いる。分割した領域のうち、人の手の位置の画像座標と所定の距離以内にある領域を人の保持する物体領域として認識し、当該領域に割り当てられた物体種を取得する。なお、ここでいう物体種とは、前述のリストが保持する物体ＩＤと一意に対応付けられるものである。 Next, the type of object held by the person is detected. Object detection uses a neural network that has learned to divide an image into regions for each object type, as described in the sixth embodiment. Among the divided regions, a region within a predetermined distance from the image coordinates of the position of the person's hand is recognized as an object region held by the person, and the object type assigned to the region is acquired. Note that the object type referred to here is one that is uniquely associated with the object ID held in the above-mentioned list.

最後に、得られた物体ＩＤと前述の制御パラメータリストを参照し、接近拒絶度のパラメータを取得する。取得したパラメータは意味情報認識部８１２０が制御部８１３０に入力する。 Finally, the obtained object ID and the above-mentioned control parameter list are referred to to obtain the parameter of the degree of approach rejection. The acquired parameters are input by the semantic information recognition unit 8120 to the control unit 8130.
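The two steps above, picking the object region near the detected hand and looking up its approach rejection parameter, might look like the following sketch. The three parameter values come from the text; the object ID strings, the region representation (a list of pixel coordinates per object ID), the choice of the nearest qualifying region, and the default value for unknown objects are assumptions for illustration.

```python
import math

# Values quoted in the text; the string keys are hypothetical object IDs.
APPROACH_REJECTION = {
    "large_baggage": 0.4,
    "cart": 0.6,
    "welding_machine": 0.9,
}


def held_object(hand_xy, regions, max_pixel_dist=30.0):
    """Return the object ID of the nearest region within max_pixel_dist of the hand, else None.

    regions: list of (object_id, [(x, y), ...]) pairs in image coordinates.
    """
    best_id, best_dist = None, max_pixel_dist
    for object_id, pixels in regions:
        d = min(math.hypot(x - hand_xy[0], y - hand_xy[1]) for x, y in pixels)
        if d <= best_dist:
            best_id, best_dist = object_id, d
    return best_id


def approach_rejection(object_id, default=0.0):
    """Look up the parameter for an object ID; unknown IDs get an assumed default."""
    return APPROACH_REJECTION.get(object_id, default)
```

The value returned by `approach_rejection` plays the role of the parameter that the semantic information recognition unit 8120 passes to the control unit 8130.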

制御値算出ステップＳ８３０では、ステップＳ８２０において意味情報認識部８１２０が算出した物体の接近拒絶度のパラメータを基に、制御部８１３０が自分（ＡＧＶ）の行動を決定する。なお、実施形態８で説明した占有マップの接近拒絶度の値を以下のように更新することで、制御値を算出する。なお、これは接近拒絶度の値が大きい程大きく、距離が離れるほど小さくなるような関数である。 In the control value calculation step S830, the control unit 8130 determines its own (AGV) behavior based on the parameter of the degree of approach rejection of the object calculated by the semantic information recognition unit 8120 in step S820. Note that the control value is calculated by updating the approach rejection degree value of the occupancy map described in the eighth embodiment as follows. Note that this is a function that increases as the value of the degree of approach rejection increases, and decreases as the distance increases.

なお、Ｓｃｏｒｅ_ｊがｊ番目の格子の値である。ｓ_ｉが、ステップＳ８２０において検出したｉ番目の物体の接近拒絶度を表すパラメータである。以上のように定めた占有マップを用い、実施形態８で説明したようにＡＧＶの進行ルートを決める。 Here, Score_j is the value of the j-th grid cell, and s_i is the parameter representing the degree of approach rejection of the i-th object detected in step S820. Using the occupancy map determined in this way, the travel route of the AGV is determined as described in the eighth embodiment.

さらにＡＧＶの速度の最大値ｖ_ｍａｘを進行中の占有マップの接近拒絶度の値を基に次ように制限するように制御値を算出する。 Further, a control value is calculated so as to limit the maximum AGV speed v_max as follows, based on the approach rejection value of the occupancy map at the location the AGV is currently traversing.

αは占有マップの接近拒絶度の値と速度との調整パラメータであり、βは現在ＡＧＶが通行中の占有マップの接近拒絶度の値である。ｖ_ｍａｘは、占有マップの接近拒絶度の値が大きくなる（１に近づく）程０に近づくような値として算出される。このようにして制御部８１３０が算出した制御値をアクチュエータ１３０に出力する。 α is a parameter that adjusts the relationship between the approach rejection value of the occupancy map and the speed, and β is the approach rejection value of the occupancy map at the location the AGV is currently passing through. v_max is calculated so that it approaches 0 as the approach rejection value of the occupancy map increases (approaches 1). The control unit 8130 outputs the control value calculated in this way to the actuator 130.
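The formulas for Score_j and v_max are not reproduced in this text, so the sketch below does not claim to be the patent's equations. It only shows one hypothetical pair of functions with the properties the description states: the cell score grows with s_i and shrinks with distance, and the speed limit approaches 0 as β approaches 1. The exponential falloff, the `falloff` length, the use of the maximum over objects, and the nominal speed `v_nominal` are all assumptions.

```python
import math


def cell_score(cell_xy, objects, falloff=1.0):
    """Hypothetical Score_j: max over objects of s_i * exp(-d_ij / falloff).

    objects: list of ((x, y), s_i) pairs; d_ij is the cell-to-object distance.
    """
    score = 0.0
    for (ox, oy), s in objects:
        d = math.hypot(cell_xy[0] - ox, cell_xy[1] - oy)
        score = max(score, s * math.exp(-d / falloff))
    return score


def speed_limit(v_nominal, beta, alpha=1.0):
    """Hypothetical v_max: v_nominal * (1 - beta) ** alpha, with beta clamped to [0, 1].

    alpha > 0 is assumed; larger alpha slows the AGV more strongly for the same beta.
    """
    beta = min(max(beta, 0.0), 1.0)
    return v_nominal * (1.0 - beta) ** alpha
```

With these assumed forms, a cell next to a person holding a welding machine (s_i = 0.9) receives a high score, and an AGV traversing that cell is limited to a small fraction of its nominal speed.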

実施形態１０では、人と人が保持する物体の組み合わせから人の作業の種別を求め、接近拒絶度を表すパラメータを決める。そして接近拒絶度が大きい程、人から遠ざかるように低速に動作するような制御値を算出する。これにより、人の作業に応じて適切な距離をおいてＡＧＶを制御する。このようにして、ＡＧＶをより安全に制御することができる。 In the tenth embodiment, the type of work of the person is determined from the combination of the person and the object held by the person, and a parameter representing the degree of approach rejection is determined. Then, as the degree of approach rejection increases, a control value is calculated such that the robot operates at a slower speed so as to move further away from the person. This allows the AGV to be controlled at an appropriate distance depending on the person's work. In this way, the AGV can be controlled more safely.

＜変形例＞
実施形態１０では、人と人が保持する物体の組み合わせを意味情報として認識していたが、人に付随する状態を認識してより安全にＡＧＶを制御する方法であれば上記方法に限らない。 <Modified example>
In Embodiment 10, the combination of a person and an object held by the person is recognized as semantic information, but the method is not limited to the above method as long as it is a method that recognizes the state associated with the person and more safely controls the AGV.

意味情報として、人の服装を認識してもよい。例えば、工場において、作業着を着ているのが作業者、スーツを着ているのが見学者であることを認識するとする。この認識結果を用いてＡＧＶの動きに慣れている作業者と比較し、特にＡＧＶの動きに慣れていないような見学者の付近を通るときにはよりゆっくりと進行するようにしてより安全にＡＧＶを制御する。 A person's clothing may be recognized as semantic information. For example, in a factory, a person wearing work clothes is recognized as a worker and a person wearing a suit as a visitor. Using this recognition result, the AGV is controlled more safely by moving more slowly when passing near visitors, who, unlike workers, are not accustomed to the AGV's movements.

意味情報として、人の年齢を認識してもよい。例えば、病院において院内配送を行うＡＧＶにおいて、子供や年配の人を認識した時には、ゆっくりと所定の距離をおいて通過することでより、より安全にＡＧＶを運用することができるようになる。 A person's age may be recognized as semantic information. For example, when an AGV that performs intra-hospital delivery in a hospital recognizes a child or elderly person, the AGV can be operated more safely by passing the person slowly at a predetermined distance.

意味情報として、人の動きを認識してもよい。例えば、ホテルでの荷物運びをするＡＧＶにおいて、千鳥足で歩くように、人が前後左右に繰り返し移動していることを認識した場合には所定の距離をおいて通過する制御値を算出し、より安全にＡＧＶを運用することができる。 The movement of a person may be recognized as semantic information. For example, when an AGV that carries luggage in a hotel recognizes that a person is repeatedly moving back and forth and from side to side, as if staggering, it calculates a control value for passing at a predetermined distance, so that the AGV can be operated more safely.

人の動きから作業を認識してもよい。具体的には、工場において作業者がＡＧＶに荷物を積み込もうとする動作を検出したら、ゆっくりと作業者に近づき荷物の積み込みが終わるまで停止するような制御値を算出してもよい。こうすることで作業者がＡＧＶの停止位置まで移動してから荷物を積み込む必要が無く、効率良く作業を行うことができる。 Work may also be recognized from people's movements. Specifically, if a worker attempts to load cargo onto an AGV in a factory, a control value may be calculated that causes the AGV to slowly approach the worker and stop until the worker finishes loading the cargo. This eliminates the need for the worker to move to the AGV's stopping position and then load the cargo, allowing the worker to work efficiently.

人の人数を意味情報として認識してもよい。具体的には、ＡＧＶの進行ルート上に所定の数より多数の人を認識した場合には、ルートを変更する。このようにすることで、たとえ人の間を縫って進行した場合に万が一人と接触することを避け、より安全にＡＧＶを運用することができる。 The number of people may be recognized as semantic information. Specifically, if more than a predetermined number of people are recognized on the AGV's travel route, the route is changed. This avoids the risk of contact with a person that would arise if the AGV wove its way between people, so the AGV can be operated more safely.

［実施形態１１］
実施形態８から１０に共通して適用できるＵＩについて説明する。撮像部１１０が取得した視覚情報や、位置姿勢算出部８１１０が算出した位置姿勢、マップ情報、制御値を提示するＵＩに加え、さらに実施形態８から１０で説明した意味情報や占有マップに割り当てた値といった情報表示する。 [Embodiment 11]
A UI that can be applied commonly to Embodiments 8 to 10 will be described. In addition to presenting the visual information acquired by the imaging unit 110, the position and orientation calculated by the position and orientation calculation unit 8110, the map information, and the control values, the UI also displays information such as the semantic information and the values assigned to the occupancy map described in Embodiments 8 to 10.

実施形態１１における装置の構成は、実施形態８で説明した情報処理装置８０の構成を示す図２と同一であるため省略する。なお、表示のための機器の構成に関しては実施形態７で説明した構成と同一であるため省略する。 The configuration of the device in Embodiment 11 is the same as that shown in FIG. 2 showing the configuration of information processing device 80 described in Embodiment 8, so a description thereof will be omitted. Note that the configuration of the display device is the same as the configuration described in the seventh embodiment, so a description thereof will be omitted.

図１３に、本実施形態における表示装置が提示する表示情報の一例であるＧＵＩ２００を示す。Ｇ２１０は撮像部１１０が取得した視覚情報および意味情報認識部８１２０が認識した意味情報を提示するためのウィンドウである。Ｇ２２０は実施形態８で述べたＡＧＶのナビゲーションのための接近拒絶度を提示するためのウィンドウである。またＧ２３０は２Ｄの占有マップを提示するためのウィンドウである。また、Ｇ２４０は、ＡＧＶを人手で操作するためのＧＵＩや、位置姿勢算出部８１１０や意味情報認識部８１２０、制御部８１３０が算出した値、ＡＧＶの運行情報を提示するためのウィンドウである。 FIG. 13 shows a GUI 200 that is an example of display information presented by the display device in this embodiment. G210 is a window for presenting the visual information acquired by the imaging unit 110 and the semantic information recognized by the semantic information recognition unit 8120. G220 is a window for presenting the degree of approach rejection used for AGV navigation, as described in the eighth embodiment. G230 is a window for presenting a 2D occupancy map. G240 is a window for presenting a GUI for manually operating the AGV, the values calculated by the position and orientation calculation unit 8110, the semantic information recognition unit 8120, and the control unit 8130, and the operation information of the AGV.

Ｇ２１０は、意味情報認識部８１２０が検出した意味情報として、複数の物体とそれらの相対距離、および接近拒絶度の値の提示例を示している。Ｇ２１１は、検出した物体のバウンディングボックスである。本実施形態においては、他のＡＧＶとその荷物を検出しそれらを囲むバウンディングボックスを点線で表示している。なお、複数の物体を統合してバウンディングボックスを提示しているが、検出した物体それぞれにバウンディングボックスを描いてもよい。また、バウンディングボックスは検出した物体の位置がわかれば何でもよく点線で描いても実線で描いてもよし、半透明のマスクを重畳して提示してもよい。Ｇ２１２は検出した意味情報を提示するポップアップである。検出した複数の物体種とそれらの距離、および接近拒絶度の値を提示している。このように、認識した意味情報を視覚情報に重畳して提示することで、ユーザが直感的に視覚情報と意味情報を関連付けて把握することができる。 G210 shows an example of presentation of a plurality of objects, their relative distances, and approach rejection values as semantic information detected by the semantic information recognition unit 8120. G211 is the bounding box of the detected object. In this embodiment, other AGVs and their luggage are detected and bounding boxes surrounding them are displayed with dotted lines. Note that although a bounding box is presented by integrating a plurality of objects, a bounding box may be drawn for each detected object. Further, the bounding box may be of any type as long as the position of the detected object is known, and may be drawn as a dotted line or a solid line, or may be presented by superimposing a semi-transparent mask. G212 is a pop-up that presents detected semantic information. It presents multiple detected object types, their distances, and approach rejection values. In this way, by presenting the recognized semantic information in a superimposed manner on the visual information, the user can intuitively associate and understand the visual information and the semantic information.

Ｇ２２０は、撮像部１１０が取得した視覚情報に、制御部８１３０が算出したＡＧＶの接近拒絶度を重畳した例である。Ｇ２２１は、接近拒絶度が高い程濃い色を重畳している。このように接近拒絶度を視覚情報に重畳して提示することで、ユーザは直感的に視覚情報と接近拒絶度の値を関連付けて把握することができる。なお、Ｇ２２１は色や濃度、形状を変えることでより容易にユーザが接近拒絶度を把握できるようにしてもよい。 G220 is an example in which the degree of approach rejection of the AGV calculated by the control unit 8130 is superimposed on the visual information acquired by the imaging unit 110. In G221, the higher the degree of approach rejection, the darker the superimposed color. By presenting the degree of approach rejection superimposed on the visual information in this way, the user can intuitively associate the visual information with the approach rejection values. Note that the color, density, or shape of G221 may be changed so that the user can grasp the degree of approach rejection more easily.

Ｇ２３０は、制御部８１３０が算出した占有マップと意味情報認識部８１２０が認識した意味情報を提示例である。Ｇ２３１は占有マップの接近拒絶度の値が大きい程濃く、小さい程薄くなるように占有マップの接近拒絶度の値を可視化している。Ｇ２３２はさらに、意味情報認識部８１２０が認識した意味情報として、構造物の位置を提示している。本実施形態では、工場の扉が開いていることを認識した結果を提示している例を示した。Ｇ２３３はさらに、意味情報認識部８１３０が認識した意味情報として、周囲の物体の移動量を提示している。本実施形態では物体の移動方向と速度を提示した。このようにして、占有マップとその値、意味情報の認識結果を提示することで、ユーザは容易にそれらを関連付けてＡＧＶの内部状態を把握することができる。また、このように占有マップを提示することで、ユーザが制御部８１３０のＡＧＶのルート生成過程を容易に把握することができるようになる。 G230 is an example of presenting the occupancy map calculated by the control unit 8130 together with the semantic information recognized by the semantic information recognition unit 8120. G231 visualizes the approach rejection values of the occupancy map so that larger values appear darker and smaller values appear lighter. G232 further presents the position of a structure as semantic information recognized by the semantic information recognition unit 8120. This embodiment shows an example that presents the result of recognizing that a factory door is open. G233 further presents the amount of movement of surrounding objects as semantic information recognized by the semantic information recognition unit 8120. In this embodiment, the moving direction and speed of the object are presented. By presenting the occupancy map, its values, and the recognition results of the semantic information in this way, the user can easily associate them with one another and grasp the internal state of the AGV. Presenting the occupancy map in this way also lets the user easily follow the AGV route generation process of the control unit 8130.

Ｇ２４０は、ＡＧＶを人手で操作するためのＧＵＩや、位置姿勢算出部８１１０や意味情報認識部８１２０、制御部８１３０が算出した値、ＡＧＶの運行情報の提示例を示している。Ｇ２４１は意味情報認識部８１２０が認識する意味情報や、認識結果を表示するか否かといった設定をするためのＧＵＩであり、例えば項目のオンオフを切り替えるラジオボタンである。Ｇ２４２は、制御部８１３０が算出する接近拒絶距離や制御値を算出するパラメータを調整するためのＧＵＩであり、例えばスライドバーや数字入力フォームがこれにあたる。 G240 shows a GUI for manually operating the AGV, values calculated by the position/orientation calculation unit 8110, the semantic information recognition unit 8120, and the control unit 8130, and a presentation example of the AGV operation information. G241 is a GUI for making settings such as the semantic information recognized by the semantic information recognition unit 8120 and whether or not to display the recognition results, and is, for example, a radio button for switching items on and off. G242 is a GUI for adjusting parameters for calculating the approach rejection distance and control value calculated by the control unit 8130, and is, for example, a slide bar or a number input form.

＜変形例＞
本実施形態で説明したＧＵＩは一例であって、意味情報認識部８１２０が算出した意味情報、制御部８１３０が算出した占有マップの接近拒絶度の値などを提示し、ＡＧＶの内部状態を把握するようにするＧＵＩであればどのような可視化方法を用いてもよい。例えば色を変える、線の太さや実線・破線・二重線を切り替える、拡大縮小する、必要のない情報を隠す、というように表示情報を変更することもできる。このように表示情報の可視化方法を変えることで、ユーザがより直感的に表示情報を理解することができるようにする。 <Modified example>
The GUI described in this embodiment is only an example. Any visualization method may be used as long as the GUI presents information such as the semantic information calculated by the semantic information recognition unit 8120 and the approach rejection values of the occupancy map calculated by the control unit 8130, so that the internal state of the AGV can be grasped. For example, the displayed information can be changed by changing colors, switching line thickness or between solid, broken, and double lines, scaling, or hiding unnecessary information. Changing the visualization method in this way allows the user to understand the displayed information more intuitively.

本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention provides a system or device with a program that implements one or more of the functions of the embodiments described above via a network or a storage medium, and one or more processors in the computer of the system or device reads and executes the program. This can also be achieved by processing. It can also be realized by a circuit (for example, ASIC) that realizes one or more functions.

Claims

移動体に搭載された撮像手段であって、撮像素子上の各々の受光部が２以上の受光素子によって構成される前記撮像手段によって推定された視差画像に基づいて、第１の方法によって取得された第１の奥行値の入力を受け付ける第１の入力手段と、
前記第１の方法とは異なる第２の方法によって取得された第２の奥行値との入力を受け付ける第２の入力手段と、
マップ情報を保持する保持手段と、
前記第１の奥行値の大きさに応じて決定された信頼度および前記第２の奥行値に基づいて補正された奥行情報と、前記マップ情報と、に基づいて前記撮像手段の位置姿勢を取得する取得手段と、
前記取得手段が取得した位置姿勢に基づいて前記移動体の移動を制御する制御値を得る制御手段と、
を備えることを特徴とする情報処理装置。 An information processing device comprising:
first input means for receiving input of a first depth value acquired by a first method based on a parallax image estimated by imaging means mounted on a moving body, each light receiving section on the image sensor of the imaging means being composed of two or more light receiving elements;
second input means for receiving input of a second depth value acquired by a second method different from the first method;
holding means for holding map information;
acquisition means for acquiring the position and orientation of the imaging means based on the map information and on depth information corrected based on the second depth value and a reliability determined according to the magnitude of the first depth value; and
control means for obtaining a control value for controlling movement of the moving body based on the position and orientation acquired by the acquisition means.

前記奥行情報は、前記撮像手段が撮像した情報であり、前記撮像手段が前記受光素子を選択的に構成して生成したデプスマップであること、
を特徴とする請求項１記載の情報処理装置。 The depth information is information captured by the imaging means , and is a depth map generated by the imaging means by selectively configuring the light receiving element;
The information processing device according to claim 1, characterized in that:

前記奥行情報は、前記撮像手段が撮像した情報であり、前記撮像手段が前記受光素子を選択的に構成して生成した空間中の三次元の位置情報を保持する三次元点群であること、を特徴とする請求項１記載の情報処理装置。 The depth information is information captured by the imaging means, and is a three-dimensional point group that holds three-dimensional positional information in a space generated by the imaging means by selectively configuring the light receiving element; The information processing device according to claim 1, characterized in that:

前記信頼度は、奥行値ごとの前記第１の奥行値の計測精度であり、
前記取得手段は、前記撮像手段によって取得された前記奥行値ごとの前記第１の奥行値の計測精度に基づいて補正された前記奥行情報と、前記マップ情報とに基づいて前記撮像手段の位置姿勢を取得すること、
を特徴とする請求項１に記載の情報処理装置。 The reliability is the measurement accuracy of the first depth value for each depth value ,
The acquisition means determines the position and orientation of the imaging means based on the depth information corrected based on the measurement accuracy of the first depth value for each depth value acquired by the imaging means and the map information. to obtain,
The information processing device according to claim 1, characterized in that:

前記取得手段は、前記奥行値ごとの前記第１の奥行値の計測精度と、前記奥行情報と、前記撮像手段が前記第１の奥行値を取得した第一の時刻より前の第二の時刻に取得した奥行値とに基づいて前記奥行情報を補正し、補正された前記奥行情報と前記マップ情報とに基づいて前記撮像手段の位置姿勢を取得すること、
を特徴とする請求項４に記載の情報処理装置。 The acquisition means measures the measurement accuracy of the first depth value for each depth value , the depth information, and a second time before the first time at which the imaging means acquires the first depth value . correcting the depth information based on the depth value obtained in , and obtaining the position and orientation of the imaging means based on the corrected depth information and the map information;
The information processing device according to claim 4, characterized by:

前記制御手段がさらに、パターン光を投影するための投影装置を制御する制御値を算出する、
ことを特徴とする請求項１乃至５何れか１項に記載の情報処理装置。 The control means further calculates a control value for controlling a projection device for projecting the patterned light.
The information processing device according to any one of claims 1 to 5.

The information processing device according to any one of claims 1 to 6, wherein the input means further receives input of three-dimensional information measured by a three-dimensional measuring device that acquires three-dimensional information representing three-dimensional positions in space, and the acquisition means further corrects the depth information based on the depth information and the three-dimensional information, and acquires the position and orientation of the imaging means based on the corrected depth information and the map information.

The information processing device according to any one of claims 1 to 7, wherein the acquisition means further acquires position and orientation information of an object from one or both of the depth information and the map information, and the control means calculates a control value for controlling the moving body based on the position and orientation information of the object.

The information processing device according to claim 8, wherein the control means further calculates, based on the position and orientation information of the object, a control value for controlling the moving body so that a predetermined object is located at a predetermined point in the depth information.

The information processing device according to claim 8, wherein the control means further calculates, based on the position and orientation information of the object, a control value for controlling the moving body so as to avoid a collision with a predetermined object.
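
One simple way to picture a collision-avoiding control value is a speed command that ramps down as the moving body approaches the object. The following is a minimal sketch under stated assumptions, not the control law of the invention: the function `speed_command`, the two-dimensional positions, and the thresholds `stop_dist` and `slow_dist` are all hypothetical.

```python
import math

def speed_command(body_xy, object_xy, cruise_speed=1.0,
                  stop_dist=0.5, slow_dist=2.0):
    """Return a forward speed that avoids running into the object.

    body_xy, object_xy: (x, y) positions in metres in a common frame.
    At or inside stop_dist the body stops; at or beyond slow_dist it
    cruises; in between the speed is interpolated linearly.
    """
    d = math.dist(body_xy, object_xy)
    if d <= stop_dist:
        return 0.0
    if d >= slow_dist:
        return cruise_speed
    return cruise_speed * (d - stop_dist) / (slow_dist - stop_dist)
```

A practical controller would also steer around the object. This sketch only shows how an object position obtained from the depth information or the map information can feed directly into a control value.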

The information processing device according to any one of claims 1 to 10, wherein the acquisition means further generates and corrects the map information based on the depth information, and the holding means holds the corrected map information.

The information processing device according to any one of claims 1 to 11, wherein the acquisition means further divides the depth information into regions.

The information processing device according to claim 12, wherein the acquisition means further divides the depth information into regions by semantic segmentation.

The information processing device according to claim 12 or 13, wherein the acquisition means further generates and corrects the map information based on the result of the region division.

The information processing device according to any one of claims 12 to 14, wherein the control means further calculates a control value for controlling the moving body based on the result of the region division.

The information processing device according to any one of claims 1 to 15, wherein the control means further calculates an adjustment value for adjusting a parameter of the imaging means based on at least one of the depth information, the map information, the acquired position and orientation, and the control value.

The information processing device according to claim 16, wherein the adjustment value is a focus value of the imaging means.

The information processing device according to claim 16, wherein the adjustment value is a zoom value of the imaging means.

The information processing device according to any one of claims 1 to 18, wherein the optical device of the imaging means is exchangeable, and the input means further acquires parameters of the exchanged optical device.

The information processing device according to any one of claims 1 to 19, further comprising display information generation means for generating display information based on at least one of the depth information, the map information, the position and orientation, and the control value.

An information processing method comprising:
a first input step of receiving input of a first depth value acquired by a first method based on a parallax image estimated by an imaging means mounted on a moving body, each light-receiving section on the image sensor of the imaging means being composed of two or more light-receiving elements;
a second input step of receiving input of a second depth value acquired by a second method different from the first method;
a step of holding map information in a holding means;
an acquisition step of acquiring the position and orientation of the imaging means based on the map information and on depth information corrected based on the second depth value and a reliability determined according to the magnitude of the first depth value; and
a control step of calculating a control value for controlling movement of the moving body based on the position and orientation acquired in the acquisition step.
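
The depth correction in the acquisition step can be illustrated with a short sketch. The claim states only that the reliability is determined according to the magnitude of the first depth value. The linear fall-off with distance, the `max_range` parameter, and the function names below are assumptions made for illustration.

```python
import numpy as np

def reliability_from_depth(first_depth, max_range=5.0):
    """Hypothetical reliability: 1 at zero depth, falling linearly to 0
    at max_range (metres) and staying 0 beyond it."""
    first_depth = np.asarray(first_depth, dtype=float)
    return np.clip(1.0 - first_depth / max_range, 0.0, 1.0)

def correct_depth(first_depth, second_depth, max_range=5.0):
    """Blend the two depth values per element, trusting the first
    (parallax-based) value where its reliability is high and the
    second value elsewhere."""
    first_depth = np.asarray(first_depth, dtype=float)
    second_depth = np.asarray(second_depth, dtype=float)
    w = reliability_from_depth(first_depth, max_range)
    return w * first_depth + (1.0 - w) * second_depth
```

In this sketch, a first depth value of 2.5 m with `max_range=5.0` has reliability 0.5, so the corrected value lies midway between the two inputs, and beyond 5 m the second depth value is used unchanged. The corrected depth would then be matched against the map information to obtain the position and orientation.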

An information processing system comprising:
an imaging means mounted on a moving body, each light-receiving section on the image sensor of the imaging means being composed of two or more light-receiving elements;
a first input means that receives input of a first depth value acquired by a first method based on a parallax image estimated by the imaging means;
a second input means that receives input of a second depth value acquired by a second method different from the first method;
a holding means that holds map information;
an acquisition means that acquires the position and orientation of the imaging means based on the map information and on depth information corrected based on the second depth value and a reliability determined according to the magnitude of the first depth value; and
a control means that obtains a control value for controlling movement of the moving body based on the position and orientation acquired by the acquisition means.

A moving body comprising:
an imaging means mounted on the moving body, each light-receiving section on the image sensor of the imaging means being composed of two or more light-receiving elements;
a first input means that receives input of a first depth value acquired by a first method based on a parallax image estimated by the imaging means;
a second input means that receives input of a second depth value acquired by a second method different from the first method;
a holding means that holds map information;
an acquisition means that acquires the position and orientation of the imaging means based on the map information and on depth information corrected based on the second depth value and a reliability determined according to the magnitude of the first depth value;
a control means that obtains a control value for controlling movement of the moving body based on the position and orientation acquired by the acquisition means; and
an actuator that controls movement of the moving body using the control value.