JP6955081B2

JP6955081B2 - Electronic devices, systems and methods for determining object orientation

Info

Publication number: JP6955081B2
Application number: JP2020502372A
Authority: JP
Inventors: マイアースベン; 訓成小堀; ケールバディム
Original assignee: トヨタモーターヨーロッパ
Priority date: 2017-07-20
Filing date: 2017-07-20
Publication date: 2021-10-27
Anticipated expiration: 2037-07-20
Also published as: WO2019015761A1; JP2020527270A

Description

The present disclosure relates to an electronic device, system and method for determining the pose of an object, in particular for recognizing the pose of a non-stationary object in a scene, where the object pose is not predefined.

Automation is becoming increasingly important in many fields, which also implies a growing need for robotics. While robotic systems have become common in industrial settings, their use in domestic environments, for example to serve individual users in their daily lives, is still rather uncommon. There is nevertheless a high demand for robotic systems in this field as well. For example, a robotic system could help an elderly person find and retrieve a specific object, such as a pencil.

One problem with using robotic systems in the home is that, in contrast to industrial applications, many tasks cannot be standardized, i.e. predefined and strictly controlled. A robotic system therefore has to be able to perform individually varying tasks. Furthermore, the operating conditions in a home, e.g. lighting and the arrangement of objects, are more challenging. In other areas where robotic systems are used, too, the pose of an object of interest may be unknown, i.e. not necessarily predefined.

An important aspect of a robotic system is therefore its ability to find and recognize a specific object, which may be placed at any location and in any orientation. For this purpose the robotic system may be equipped with an optical sensor and may be movable, e.g. by having drivable wheels.

A further challenge is to determine the pose of an object, e.g. of an object recognized in the sensed scene. Determining the exact pose (in particular the 6D pose) can be advantageous when the object is to be picked up or otherwise manipulated by the robotic system.

The iterative closest point (ICP) algorithm is one example of an algorithm for matching two shapes, such as point clouds or normals, i.e. 3D RGB data. This type of data is commonly used in robotics applications, and the algorithm is a key component used for pose refinement during object recognition.

In known object recognition, the ICP algorithm is used to match a point cloud coming from the robot's 3D (RGB-D) sensor with the point cloud of a known model of the target object. The output of the algorithm is the transformation of the model that best fits the observed data. This is called model-based ICP. In such an object detection scenario, one point cloud therefore comes from the model hypothesis and has to be aligned to the point cloud data from the sensor, referred to below as the "scene".

However, such conventional approaches are computationally expensive when finding correspondences between the points of the source point cloud and the destination point cloud, because the distances between all pairs of points have to be computed to find the closest match. Use in real-time scenarios in particular can be difficult, because performing rendering procedures is costly and can cause undesirable delays.
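The cost described above can be illustrated with a minimal sketch of exhaustive correspondence search. The function name and the toy data are hypothetical and only serve to show why the work grows with the product of the two point cloud sizes.

```python
import math

def brute_force_correspondences(source, destination):
    """For each source point, return the index of the closest destination point.

    Every source point is compared against every destination point, so the
    number of distance evaluations is len(source) * len(destination).
    """
    matches = []
    for s in source:
        best_index, best_dist = -1, math.inf
        for j, d in enumerate(destination):
            dist = math.dist(s, d)  # Euclidean distance between two 3D points
            if dist < best_dist:
                best_index, best_dist = j, dist
        matches.append(best_index)
    return matches

# Toy data (hypothetical): two model points and three scene points.
model_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
scene_points = [(0.9, 0.1, 0.0), (0.1, -0.1, 0.0), (5.0, 5.0, 5.0)]
matches = brute_force_correspondences(model_points, scene_points)
```

With point clouds of several thousand points each, this double loop is repeated in every ICP iteration, which is what motivates acceleration structures or precomputed data.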

An ICP technique for object recognition using a point-to-plane approach is known, see for example Yang Chen and Gerard Medioni, "Object Modeling by Registration of Multiple Range Images", International Journal of Image and Vision Computing, 10(3), pp. 145-155, 1992.

It is known to use a k-d tree (k-dimensional tree) algorithm to accelerate this comparison. However, using a k-d tree requires building the tree in the first place, which is itself computationally costly in terms of memory and time. An example is K. Tateno, D. Kotake and S. Uchiyama, "Model-based 3D Object Tracking with Online Texture Update", MVA, 2009.

US2012114251 (A1) discloses a system for object recognition of a 3D object of a certain object class, which uses a statistical shape model to recover a 3D shape from a 2D representation of the 3D object and compares the recovered 3D shape with known 3D-to-2D representations of at least one object of the object class.

It currently remains desirable to provide an electronic device, system and method for determining the pose of an object that require reduced computational effort, in particular during object pose recognition, e.g. in real-time scenarios.

Therefore, according to embodiments of the present disclosure, an electronic device for determining the pose of an object is provided. The electronic device is configured to:
- receive 3D image data of an optical sensor,
- estimate the object pose with respect to the optical sensor position based on the 3D image data,
- identify the closest one from a set of predetermined view positions of a predetermined 3D object model based on the estimated object pose, and
- determine the pose of the object in the scene based on the identified closest view position.

Providing such an electronic device makes available a faster implementation of an algorithm for matching two shapes, such as point clouds or normals, i.e. 3D RGB data. This type of data is commonly used in robotics applications, and the algorithm is a key component used for pose refinement during object recognition.

In other words, according to the disclosure the model is desirably rendered in advance in order to obtain predetermined (i.e. determined in advance) view positions of the model. During pose recognition, this data can then desirably be used for pose refinement without any further time-consuming rendering. This makes the method particularly interesting for real-time scenarios, since costly rendering procedures can be reduced or even avoided during object pose recognition.

Accordingly, the concept and algorithm proposed by the disclosure can be used in ICP in place of a conventional k-d tree technique. In particular, they can be used to obtain the correspondences between the point cloud of the sensed object and the object model.

The optical sensor desirably senses the scene in which the object is located.

By estimating the object pose with respect to the optical sensor position, the corresponding position of the optical sensor with respect to the object pose (i.e. in object space) is desirably estimated at the same time.

The identification of the closest one from the set of predetermined view positions of the predetermined 3D object model is based on the estimated object pose. It may further be based on the 3D image data representing the object (O) and/or on pre-rendered image data of the object model as seen from the different view positions. In particular, the 3D image data representing the object (O) may be compared with the image data of the object model (one data set per view, with a comparison against each data set). The estimated object pose can be used in the identification process as a guide and/or as a starting point for finding the pose to be determined. A predetermined view position is desirably a predetermined point of view from which the object is seen. The predetermined view positions may be distributed equidistantly from one another on one or several orbits around the object, or on a sphere around the object. There may be, for example, several hundred (e.g. 300 or more) view positions.
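As an illustration of distributing view positions on a sphere around the object, the following sketch generates roughly evenly spaced points on a unit sphere. The Fibonacci (golden-angle) lattice used here is an assumption chosen for brevity; the text does not state which sampling scheme is used, and the function name is hypothetical.

```python
import math

def sample_view_positions(count):
    """Return `count` roughly evenly spaced points on the unit sphere.

    Uses a Fibonacci (golden-angle) lattice, which is only approximately
    equidistant; this is an illustrative choice, not the disclosed scheme.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    views = []
    for i in range(count):
        z = 1.0 - 2.0 * (i + 0.5) / count   # height, strictly inside (-1, 1)
        radius = math.sqrt(1.0 - z * z)     # radius of the circle at height z
        angle = golden_angle * i
        views.append((radius * math.cos(angle), radius * math.sin(angle), z))
    return views

# The text mentions several hundred view positions, e.g. 300 or more.
view_positions = sample_view_positions(300)
```

Each returned tuple can be read as a camera position on the unit sphere looking at the object centre.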

The 3D image data of the optical sensor (3) may comprise a point cloud, and/or the 3D object model may comprise a point cloud.

Accordingly, in order to identify the closest view and/or determine the object pose, these point clouds may be compared with each other, or a data set of a rendered point cloud of the 3D image data may be compared with data sets of pre-rendered views of the model point cloud.

Estimating the object pose may comprise determining an object pose hypothesis by estimating the pose in the scene, and estimating the object pose with respect to the optical sensor position based on the pose hypothesis.

Estimating the object pose, or receiving the 3D image data, may comprise recognizing the object in the scene based on the received 3D image data.

Accordingly, object recognition may be performed at the beginning of the process, e.g. when the 3D image is received, or during (or before) the identification of the closest view. Alternatively, the device may receive 3D image data that already include information about the recognized object.

Identifying the closest one from the set of predetermined view positions may be based on determining the position of the sensor in object space (i.e. its pose or 6D pose) and finding the best-matching view from the set of predetermined view positions.
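A minimal sketch of this step, under stated assumptions: the estimated object pose is given as a rotation matrix R and translation t that map object coordinates to sensor coordinates (p_sensor = R p_object + t), the view positions are unit vectors, and "closest" is measured by the angle between directions. All names are hypothetical.

```python
import math

def camera_position_in_object_space(rotation, translation):
    """Return -R^T t, the sensor position expressed in the object frame."""
    return tuple(
        -sum(rotation[row][col] * translation[row] for row in range(3))
        for col in range(3)
    )

def closest_view_index(view_positions, camera_position):
    """Index of the unit-length view position best aligned with the camera direction."""
    norm = math.sqrt(sum(c * c for c in camera_position))
    direction = [c / norm for c in camera_position]  # assumes the camera is not at the origin

    def cosine(view):
        return sum(v * d for v, d in zip(view, direction))

    return max(range(len(view_positions)), key=lambda i: cosine(view_positions[i]))

# Toy example (hypothetical): six axis-aligned views, object rotated 90 degrees about z.
axis_views = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
              (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
translation = (0.0, 1.0, 0.0)
camera = camera_position_in_object_space(rotation, translation)
best = closest_view_index(axis_views, camera)
```

Because the view positions are fixed in advance, this search touches only a few hundred candidates and needs no rendering at run time.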

Accordingly, a simple comparison between the (desirably rendered) data of the sensor view and the (pre-)rendered data of the model for the various predetermined views can be carried out in order to identify the closest of the predetermined view positions.

Each of the set of predetermined view positions may be linked to a recoded data set, said recoded data set representing rendered image data of the object model as seen from that view position.

For example, the set of predetermined view positions may be linked to the recoded data sets in one or several look-up tables (e.g. one per view position). The electronic device may comprise a data storage providing said look-up table(s) and/or providing the set of predetermined view positions linked to the recoded data sets. The look-up table may contain the recoded data sets.

The rendered image data of a recoded data set may comprise a subsampled point cloud of the object model, a subsampled contour of the model, and/or a subsampled surface model of the model.

Identifying the closest one from the set of predetermined view positions may comprise: for each of the predetermined view positions, projecting the rendered image data of the linked recoded data set into the scene; comparing the rendered image data with the 3D image data representing the object in the scene; and determining for which of the predetermined view positions the deviation between the rendered image data and the 3D image data representing the object in the scene reaches a minimum.
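The selection loop described above can be sketched as follows. The deviation measure (mean distance to the closest scene point), the dictionary used as a look-up table, and the `project` callable standing in for the pose hypothesis are all assumptions made for illustration; the text does not fix any of them.

```python
import math

def deviation(projected_points, scene_points):
    """Mean distance from each projected model point to its closest scene point."""
    total = 0.0
    for p in projected_points:
        total += min(math.dist(p, q) for q in scene_points)
    return total / len(projected_points)

def identify_closest_view(lookup_table, scene_points, project):
    """Return (view_id, deviation) for the view whose recoded data fit the scene best."""
    best_view, best_error = None, math.inf
    for view_id, recoded_points in lookup_table.items():
        error = deviation([project(p) for p in recoded_points], scene_points)
        if error < best_error:
            best_view, best_error = view_id, error
    return best_view, best_error

def shift_to_scene(point):
    """Hypothetical stand-in for projecting model data into the scene."""
    return (point[0], point[1], point[2] + 2.0)

# Toy look-up table (hypothetical): two views with two subsampled points each.
lookup_table = {
    "view_0": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "view_1": [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
}
scene_points = [(0.0, 0.0, 2.0), (0.0, 1.0, 2.0)]
best_view, best_error = identify_closest_view(lookup_table, scene_points, shift_to_scene)
```

Since the recoded data sets are subsampled, each deviation evaluation involves only a small number of points per view.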

Said deviation may also be referred to as the error to be minimized.

The image data may comprise a pair of a visible light image and a depth image. These data may be the input data to the device.

The visible light image may comprise the visible part of the electromagnetic spectrum, in particular the three bands (RGB: red, green, blue) processed by the human visual system.

The object pose may consist of x, y, z location information and θ, φ, Ψ rotation information.

More generally, the object pose may be a mathematical description of the location and orientation of the object in a coordinate system.
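One common mathematical description of such a pose is a 4x4 homogeneous transform. The sketch below builds one from x, y, z and θ, φ, Ψ. The rotation order R = Rz(Ψ) Ry(φ) Rx(θ) with angles in radians is an assumption, since the text does not specify an Euler-angle convention, and the function names are hypothetical.

```python
import math

def pose_to_matrix(x, y, z, theta, phi, psi):
    """Build a 4x4 homogeneous transform from a 6D pose.

    Assumed convention: R = Rz(psi) * Ry(phi) * Rx(theta), angles in radians.
    """
    cx, sx = math.cos(theta), math.sin(theta)
    cy, sy = math.cos(phi), math.sin(phi)
    cz, sz = math.cos(psi), math.sin(psi)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx, x],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx, y],
        [-sy, cy * sx, cy * cx, z],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(matrix, point):
    """Apply the rotation and translation of a 4x4 transform to a 3D point."""
    px, py, pz = point
    return tuple(row[0] * px + row[1] * py + row[2] * pz + row[3] for row in matrix[:3])

# Example: rotate 90 degrees about z, then translate by (1, 2, 3).
pose = pose_to_matrix(1.0, 2.0, 3.0, 0.0, 0.0, math.pi / 2)
moved = transform_point(pose, (1.0, 0.0, 0.0))
```

In this example the point (1, 0, 0) is rotated onto the y axis and then shifted, so `moved` is approximately (1, 3, 3).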

Determining the pose of the object in the scene may comprise determining the θ, φ, Ψ rotation information based on the identified closest view from the set of predetermined view positions of the predetermined 3D object model, and/or determining the x, y, z location information based on projecting the model in its closest view into the scene and comparing the projected model with the 3D image data representing the object in the scene.

The present disclosure further relates to a system for determining the pose of an object, the system comprising:
● an electronic device, in particular as described above, and
● an optical sensor configured to sense the object, the sensor being in particular a 3D camera or a stereo camera.

Accordingly, the system may be configured to autonomously recognize and locate an object and, in particular, to determine the pose of said object. It may be realized, for example, as a movable robotic system with means for retrieving the object.

The present disclosure further relates to a method of determining the pose of an object. The method comprises the steps of:
● receiving 3D image data of an optical sensor (3), the 3D image data representing an object (O) in a scene,
● estimating the object pose with respect to the optical sensor position based on the 3D image data,
● identifying the closest one from a set of predetermined view positions of a predetermined 3D object model based on the estimated object pose, and
● determining the pose of the object in the scene based on the identified closest view position.

The method may further comprise the steps of: determining a plurality of view positions of the object model, which form the set of predetermined view positions; and, for each view position of the set, determining a recoded data set representing rendered image data of the object model as seen from that view position, and linking the view position to the recoded data set.

Accordingly, the set of predetermined view positions and/or the associated recoded data sets can be determined in advance. These data can then be used in the method of determining the object pose, i.e. while the method is in use for object pose recognition.

The method may comprise further method steps corresponding to the functions of the electronic device described above. Further desirable method steps are described in the following.

The 3D image data of the optical sensor (3) may comprise a point cloud, and/or the 3D object model may comprise a point cloud.

The step of estimating the object pose may comprise the steps of determining an object pose hypothesis by estimating the pose in the scene, and estimating the object pose with respect to the optical sensor position based on the pose hypothesis.

The step of estimating the object pose, or the step of receiving the 3D image data, may comprise a step of recognizing the object in the scene based on the received 3D image data.

The step of identifying the closest one from the set of predetermined view positions may be based on determining the position of the sensor in object space (i.e. its pose or 6D pose) and finding the best-matching view from the set of predetermined view positions.

The rendered image data of a recoded data set may comprise a subsampled point cloud of the object model, a subsampled contour of the model, and/or a subsampled surface model of the model.

The step of identifying the closest one from the set of predetermined view positions may comprise: for each of the predetermined view positions, projecting the rendered image data of the linked recoded data set into the scene; comparing the rendered image data with the 3D image data representing the object in the scene; and determining for which of the predetermined view positions the deviation between the rendered image data and the 3D image data representing the object in the scene reaches a minimum.

The step of determining the pose of the object in the scene may comprise determining the θ, φ, Ψ rotation information based on the identified closest view from the set of predetermined view positions of the predetermined 3D object model, and/or determining the x, y, z location information based on projecting the model in its closest view into the scene and comparing the projected model with the 3D image data representing the object in the scene.

It is intended that combinations of the above-described elements and those within the specification may be made, except where otherwise contradictory.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure, as claimed.

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and, together with the description, serve to explain the principles thereof.

FIG. 1 shows a block diagram of a system with an electronic device according to embodiments of the present disclosure.

FIGS. 2a and 2b show an exemplary scene (FIG. 2a) in which the pose of an object is determined using a pre-rendered object model (FIG. 2b), according to embodiments of the present disclosure.

FIG. 3 shows a schematic flow chart illustrating an exemplary method of (preparatory / pre-rendering) offline processing of an object model.

FIG. 4 shows a schematic flow chart illustrating an exemplary method of online processing of image data (in use), in which the object pose is determined.

FIG. 5 shows a schematic flow chart illustrating an exemplary method of updating the pose hypothesis used in the method of FIG. 4.

Reference will now be made in detail to exemplary embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used throughout the drawings to refer to the same or like parts.

FIG. 1 shows a block diagram of a system 30 with an electronic device 1 according to embodiments of the present disclosure. The system may comprise a robotic system 10, which may have various functions. For example, it may be movable, e.g. by having drivable wheels, and it may have means for retrieving an object, e.g. at least one gripper.

The electronic device 1 runs a computer vision algorithm that detects the presence and location (in particular the pose) of objects in a scene. A robotic system needs this information in order to find, locate and manipulate objects. The input to the electronic device 1 is a pair of a visible light (RGB) image and a depth (D) image. The output of the electronic device 1 is the 6D pose of the target object (x, y, z location and rotation around x, y, z).

The electronic device 1 is connected to, or comprises, a data storage 2. Said data storage may be used to store the target object in the form of a 3D model file providing shape (3D) and appearance (color) information of the object.

The electronic device 1 may additionally carry out further functions in the system 30. For example, the electronic device may also act as a general-purpose ECU (electronic control unit) of the system. The electronic device 1 may comprise an electronic circuit, a processor (shared, dedicated, or group), a combinational logic circuit, a memory that executes one or more software programs, and/or other suitable components that provide the described functionality. In other words, the device 1 may be a computer device.

The device 1 may be external to the (movable) robotic system 10 that is configured to find and retrieve an object. In other words, the on-board computational resources of the robotic system 10 may be limited; for example, it may only transmit the 3D data to the external (and e.g. stationary) electronic device 1, e.g. via Wi-Fi. The result determined by the device 1 may be sent back to the robot.

The electronic device 1 is further connected to an optical sensor 3, in particular a 3D digital camera 3, e.g. a stereo camera or a Microsoft Kinect camera. The electronic device 1 and the digital camera may be comprised in the robotic system 10. The digital camera 3 is configured such that it can record a three-dimensional scene and, in particular, output digital data providing shape (3D) and appearance (color) information of the scene.

The output of the digital camera 3 is transmitted to the electronic device 1. Desirably, the output is transmitted instantaneously, i.e. in real time or quasi real time. A searched object can thus also be recognized and located (i.e. its pose determined) by the electronic device in real time or quasi real time.

The system 30 may additionally comprise a server 20. The server 20 may be used to carry out the (preparatory / pre-rendering) offline processing of the object model, as shown for example in FIG. 3. The pre-rendered (i.e. recoded) data may then be stored on the server or provided to the electronic device. For this purpose the electronic device 1 may be connectable to the server, e.g. via a wireless connection. Alternatively or additionally, the electronic device 1 may be connectable to the server 20 via a fixed connection, e.g. a cable. Data transfer between the electronic device 1 and the server 20 may also be achieved using a portable data storage, e.g. a USB stick. Alternatively, the processing of the server may be carried out by the electronic device 1 itself.

In the following, the basic concept and algorithm of the present disclosure are described with reference to FIGS. 2 to 5.

The present disclosure proposes a desirably improved (i.e. accelerated) implementation of the iterative closest point (ICP) algorithm, e.g. using look-up tables. ICP is a commonly used algorithm for aligning two point clouds. In the object detection scenario, one point cloud comes from the model hypothesis and has to be aligned to the point cloud data from the sensor, referred to below as the "scene".

FIGS. 2a and 2b show an exemplary scene (FIG. 2a) in which the pose of an object is determined using a pre-rendered object model (FIG. 2b), according to embodiments of the present disclosure.

The image data representing the sensed object O in the scene S of FIG. 2a and the object model M of FIG. 2b each comprise or consist of a point cloud (as shown schematically in FIGS. 2a and 2b). The data from the point clouds (e.g. contours and surface normals) are subsampled to reduce the computational cost. The subsampled points are shown as black dots in FIGS. 2a and 2b.

In detail, FIG. 2a shows the current object hypothesis and FIG. 2b the closest pre-rendered viewpoint (view position), each with contour and interior sampling points. The black dots in the interior surface area are used to establish correspondences: the black dots in the left view (FIG. 2a) are the corresponding destination points d_i from the scene, and the black dots in the right view (FIG. 2b) are the corresponding source points s_i of the model.
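Once correspondences between source points s_i and destination points d_i are available, an alignment error can be evaluated over them. The sketch below computes a mean squared point-to-plane residual, using the surface normal at each destination point. Point-to-plane ICP is mentioned in the background of this document, but whether this exact error is used at this step is an assumption, and the function name is hypothetical.

```python
def point_to_plane_error(source_points, destination_points, destination_normals):
    """Mean squared distance of each source point s_i to the tangent plane at d_i.

    The residual for one pair is (s_i - d_i) . n_i, with n_i the unit normal at d_i.
    """
    total = 0.0
    for s, d, n in zip(source_points, destination_points, destination_normals):
        residual = sum((si - di) * ni for si, di, ni in zip(s, d, n))
        total += residual * residual
    return total / len(source_points)

# Toy correspondences (hypothetical): scene points on the plane z = 0.
source = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.5)]
destination = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
error = point_to_plane_error(source, destination, normals)
```

A point-to-plane residual ignores displacement within the tangent plane, so a source point that slides along the surface contributes no error.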

As shown, the scene of FIG. 2a as sensed by the image sensor is upside down and is therefore not in the same orientation as the object model of FIG. 2b. Thus, as in this example, the closest pre-rendered viewpoint can be determined, but the pre-rendered model view linked to that viewpoint may be rotated in the image plane, i.e., upside down with respect to the detected object.

To address this, an inexpensive yet highly effective approximation of the model rendering space can be used to avoid both online rendering and contour extraction. In the offline stage, equidistant viewpoints v_i can be sampled on a unit sphere around the object model, and from each rendering the 3D contour points are extracted in order to store a view-dependent sparse 3D sampling set in local object space. Since these points are preferably used in 3D space, there is no need to sample over scale or over different in-plane rotations. Finally, as shown in FIGS. 2a and 2b, the 2D gradient direction of each contour point is preferably also stored, and a set of interior surface points with their normals can be stored.
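The offline viewpoint sampling can be sketched as follows. The disclosure does not state how the equidistant viewpoints are generated; the Fibonacci (golden-angle) lattice used here is an assumption chosen for illustration, and the function name `sample_viewpoints` is hypothetical.

```python
import math

def sample_viewpoints(n):
    """Return n approximately evenly spaced unit vectors v_i on the unit sphere.

    Assumption: a Fibonacci lattice stands in for the "equidistant viewpoints"
    of the text; any other near-uniform sphere sampling would serve as well.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    views = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n   # height, strictly inside (-1, 1)
        r = math.sqrt(1.0 - z * z)      # radius of the circle at height z
        phi = i * golden_angle          # azimuth advances by the golden angle
        views.append((r * math.cos(phi), r * math.sin(phi), z))
    return views
```

Because every returned vector has unit length, the later nearest-view search can compare viewpoints by dot product alone.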

In short, the contour points can be used to provide a pose determination process that is independent of the in-plane rotation of the detected object relative to the object model, as shown in the example of FIGS. 2a and 2b.

FIG. 3 is a schematic flowchart of an exemplary method for the offline (preparation / pre-rendering) processing of an object model.

The object model can be pre-rendered from a dense set of viewpoints, and all required information, such as contours, point clouds, surface normal data, or a combination thereof, can be extracted. This data is stored together with the 3D unit vertex position v_i (i.e., the view position). Each v_i holds a reference to its own data in its own local reference frame. This data is made available to the online processing of the image data (in use), as shown in FIG. 4.
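One possible in-memory layout for the per-view data is sketched below. The class name `ViewData`, its field names, and `build_view_table` are hypothetical; the text only requires that each v_i reference its own contour, point cloud, and normal data.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]

@dataclass
class ViewData:
    """Pre-rendered data linked to one view position v_i (hypothetical layout)."""
    view_position: Vec3                                           # unit vector v_i
    contour_points: List[Vec3] = field(default_factory=list)      # 3D contour samples
    contour_gradients: List[Vec2] = field(default_factory=list)   # 2D gradient per contour point
    interior_points: List[Vec3] = field(default_factory=list)     # source points s_i
    interior_normals: List[Vec3] = field(default_factory=list)    # normal per interior point

def build_view_table(view_positions):
    """Create one empty record per view position; the offline renderer fills them in."""
    return [ViewData(view_position=v) for v in view_positions]
```

Storing the records in a list indexed like the viewpoints means that the index returned by the nearest-view search addresses the linked data directly.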

FIG. 4 is a schematic flowchart of an exemplary method for the online processing of image data (in use), in which the object pose is determined.

In FIG. 4, the object is first recognized. Note that in this context FIG. 4 shows an example of a (complete) process of object recognition and localization (i.e., pose determination). However, the recognition step need not be part of the present disclosure and may merely be a preceding process.

The pose hypothesis is then determined. If pose refinement is required, the current pose hypothesis can be provided as a rotation matrix R and a translation vector t in camera space (see "pose hypothesis" in FIG. 5). The camera position in object space can be obtained by (Equation 1).

Further, based on the re-encoded data (of FIG. 3) and the estimated camera position (or the corresponding estimated object position relative to the camera), the closest view (a 3D unit vector) is determined (Equation 2).

By finding the closest view, the 3D rotation information (θ, φ, Ψ) of the object pose can be determined.

To express this in symbols, the model pose [R; t] is provided during tracking, and rendering can be avoided by computing the camera position O in object space (Equation 1). In Equation 2, normalization to unit length is preferable. The closest viewpoint V* can then be found quickly via dot products.
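The text refers to Equation 1 and Equation 2 without reproducing them. A minimal sketch, assuming the usual convention that [R; t] maps object coordinates to camera coordinates (x_cam = R x_obj + t): the camera centre in object space is then O = -R^T t, and the closest view is the stored v_i with the largest dot product against O / ||O||. The function names are hypothetical.

```python
import math

def camera_position_in_object_space(R, t):
    """Assumed form of Equation 1: O = -R^T t, with R as 3 rows of 3 values."""
    return tuple(-sum(R[k][j] * t[k] for k in range(3)) for j in range(3))

def closest_view(views, R, t):
    """Assumed form of Equation 2: index of the unit vector v_i maximising v_i . O/||O||.

    Requires t != 0, otherwise the camera sits at the object origin and the
    viewing direction is undefined.
    """
    O = camera_position_in_object_space(R, t)
    norm = math.sqrt(sum(c * c for c in O))
    o_hat = tuple(c / norm for c in O)  # normalise to unit length
    dots = [sum(v[j] * o_hat[j] for j in range(3)) for v in views]
    return max(range(len(views)), key=dots.__getitem__)
```

The search is a linear scan over the stored viewpoints with one dot product each, so no rendering is needed at run time.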

All data of V* are forward-projected into the scene so that each point s of the (pre-rendered) source view is paired with a corresponding destination point d. The established closed-form solution can then be solved by singular value decomposition (SVD).

Since each s_i (source point of the model) is stored with V*, each s_i can be transformed into the scene with the current hypothesis to obtain p, which is a 3D vertex vector. See (Equation 3).

This only brings the points into the scene; it does not yet determine which scene point d_i is. For this purpose, the transformed point p is projected back onto the image plane, and the scene point cloud is looked up at that image position. Thus d_i can be determined by a lookup in the scene point cloud, a single operation that takes constant time. This process is computationally cheaper than a k-d tree, which is conventionally used in ICP to obtain the correspondences between s_i and d_i.

More specifically, as shown in FIG. 4, the 3D point p is projected onto the image plane (assuming the camera image is 2D plus a depth component).

Thus, the 2D position of the point at (x, y) can be determined as x_m = f (X / Z) and y_m = f (Y / Z), where (X, Y, Z) are the coordinates of p in camera space and f is the focal length of the camera.

In a subsequent step, the depth information z can be determined at (x, y).
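The correspondence lookup described above can be sketched as follows, assuming Equation 3 has the form p = R s_i + t, a pinhole camera, and a depth image stored as a list of rows. The principal point (cx, cy), the rounding to the nearest pixel, and all function names are assumptions for illustration.

```python
def transform(R, t, s):
    """Assumed form of Equation 3: p = R s + t."""
    return tuple(sum(R[i][j] * s[j] for j in range(3)) + t[i] for i in range(3))

def project(p, f, cx=0.0, cy=0.0):
    """Pinhole projection x = f X / Z, y = f Y / Z, shifted by the principal point."""
    X, Y, Z = p
    return f * X / Z + cx, f * Y / Z + cy

def lookup_destination(depth, p, f, cx, cy):
    """Return the scene point d at the pixel p projects to, or None.

    One array access replaces the nearest-neighbour search of a k-d tree.
    """
    if p[2] <= 0:                       # behind the camera
        return None
    x, y = project(p, f, cx, cy)
    col, row = int(round(x)), int(round(y))
    if not (0 <= row < len(depth) and 0 <= col < len(depth[0])):
        return None                     # projects outside the image
    z = depth[row][col]
    if z <= 0:                          # no valid depth measurement
        return None
    # Back-project the pixel with its measured depth to a 3D scene point.
    return ((col - cx) * z / f, (row - cy) * z / f, z)
```

The returned point is the measured scene point at that pixel, which need not be the nearest neighbour of p in 3D; the distance threshold applied afterwards filters out poor pairings.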

As a result, the complete 6D pose, comprising x, y, z position information and θ, φ, Ψ rotation information, can be determined.

As an additional step, each correspondence is considered for the energy function to be minimized (Equation 4) only if the distance between the projected source cloud point and the destination cloud point, ||(R * s + t) − d||, is smaller than a threshold tau. (Equation 4)

All of the above steps can be performed iteratively, with tau decayed at each iteration, which makes the algorithm robust against occlusions and outliers.
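The gating and decay can be sketched as follows. Equation 4 is not reproduced in the text; the sum of squared point-to-point distances used here is the standard ICP energy and is an assumption, as are the geometric decay schedule and the function names.

```python
import math

def gate(pairs, tau):
    """Keep only pairs (p, d) with ||p - d|| < tau, where p = R s + t."""
    return [(p, d) for p, d in pairs if math.dist(p, d) < tau]

def energy(pairs):
    """Assumed form of Equation 4: sum over pairs of ||(R s + t) - d||^2."""
    return sum(math.dist(p, d) ** 2 for p, d in pairs)

def tau_schedule(tau0, decay, iterations):
    """Threshold per iteration, shrinking geometrically (assumed schedule)."""
    return [tau0 * decay ** k for k in range(iterations)]
```

A generous initial tau lets a rough hypothesis find enough correspondences, while the shrinking threshold progressively rejects pairings caused by occlusion or clutter.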

Note that DATA_A can preferably be any type of data set. In FIG. 3, a subsampled point cloud, contours, and surface normals are shown. However, the SVD in the online processing may use only the "point cloud" information, in which case the "contour" and "surface normal" information is not used. Which kind of data set is desirably stored depends on the ICP optimization algorithm and on the present patent application. The items of DATA_A are not specified further.

Any type of object recognition algorithm may be used in the online processing of the present disclosure.

The 3D model can be stored in any kind of file, for example. It preferably consists of a point cloud, or usually of surface or color information, or any combination thereof. The format type is any of DATA_A.

FIG. 5 is a schematic flowchart of an exemplary method for updating the pose hypothesis used in the method of FIG. 4. This process can additionally be applied if pose refinement is required. In this process, SVD can be applied based on the current pose hypothesis, the source points s, and the destination points d in order to update the hypothesis. A possible SVD cost function is Equation 4.
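The SVD-based update can be sketched with the well-known Kabsch closed-form solution, which minimizes a sum of squared point-to-point distances. The text does not name a specific SVD formulation, so this choice is an assumption; the sketch uses NumPy, and `best_rigid_transform` is a hypothetical name.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t such that dst_i ~ R @ src_i + t (Kabsch solution via SVD)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src.mean(axis=0)                  # centroid of the source points
    dst_c = dst.mean(axis=0)                  # centroid of the destination points
    H = (src - src_c).T @ (dst - dst_c)       # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Fed with the model points s_i and their matched d_i, the function returns the updated pose directly; fed with the already transformed points p_i, it returns an incremental correction to compose with the current hypothesis.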

Since any type of ICP optimization works in the present disclosure, various techniques for updating the pose hypothesis in the online processing are possible.

Throughout the description, including the claims, the term "comprising a" should be understood as being synonymous with "comprising at least one" unless otherwise stated. In addition, any range set forth in the description, including the claims, should be understood as including its end value(s) unless otherwise stated. Specific values for described elements should be understood to be within accepted manufacturing or industry tolerances known to one of skill in the art, and any use of the terms "substantially" and/or "approximately" and/or "generally" should be understood to mean falling within such accepted tolerances.

Although the present disclosure has been described herein with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present disclosure.

It is intended that the specification and examples be considered as exemplary only, with the true scope of the disclosure being indicated by the following claims.

Claims

An electronic device (1) for determining an object pose, the electronic device being configured to:
receive 3D image data of an optical sensor (3), the 3D image data representing an object (O) in a scene (S),
estimate the object pose (x, y, z, θ, φ, Ψ) with respect to the position of the optical sensor based on the 3D image data,
identify, based on the estimated object pose, the closest one from a set of predetermined view positions of a predetermined 3D object model, and
determine the pose of the object in the scene based on the identified closest view position,
wherein the object pose is a 6D pose comprising x, y, z position information and θ, φ, Ψ rotation information, and
wherein estimating the object pose includes recognizing the object in the scene based on the received 3D image data.

The electronic device (1) according to claim 1, wherein the 3D image data of the optical sensor (3) comprises a point cloud, and/or the 3D object model comprises a point cloud.

The electronic device (1) according to claim 1 or 2, wherein estimating the object pose includes determining an object pose hypothesis by estimating the pose of the object in the scene, and estimating the object pose with respect to the position of the optical sensor based on the object pose hypothesis.

The electronic device (1) according to any one of claims 1 to 3, wherein identifying the closest one from the set of predetermined view positions is based on determining the position of the optical sensor in object space and finding the best matching view from the set of predetermined view positions.

The electronic device (1) according to any one of claims 1 to 4, wherein each view position of the set of predetermined view positions is linked to a re-encoded data set, the re-encoded data set representing rendered image data of the 3D object model when seen from that view position.

The electronic device (1) according to claim 5, wherein the rendered image data of the re-encoded data set comprises a subsampled point cloud of the 3D object model, a subsampled contour of the 3D object model, and/or a subsampled surface model of the 3D object model.

The electronic device (1) according to any one of claims 1 to 6, wherein identifying the closest one from the set of predetermined view positions includes:
for each of the predetermined view positions, projecting the rendered image data of the linked re-encoded data set into the scene,
comparing the rendered image data with the 3D image data representing the object in the scene, and
determining for which of the predetermined view positions the deviation between the rendered image data and the 3D image data representing the object in the scene reaches a minimum.

The electronic device (1) according to any one of claims 1 to 7, wherein the 3D image data comprises a pair of a visible light image and a depth image.

The electronic device (1) according to claim 8, wherein the visible light image comprises the visible part of the electromagnetic spectrum, in particular decomposed into the three bands (RGB) processed by the human visual system.

The electronic device (1) according to any one of claims 1 to 9, wherein determining the pose of the object in the scene includes:
determining the θ, φ, Ψ rotation information based on the identified closest view from the set of predetermined view positions of the predetermined 3D object model, and/or
determining the x, y, z position information based on projecting the model in its closest view into the scene and comparing the projected model with the 3D image data representing the object in the scene.

A system (30) for determining an object pose, comprising: the electronic device (1) according to any one of claims 1 to 10, and an optical sensor configured to sense the object, the optical sensor being in particular a 3D camera or a stereo camera.

A method of determining an object pose, comprising the steps of:
receiving 3D image data of an optical sensor (3), the 3D image data representing an object (O) in a scene (S),
estimating the object pose (x, y, z, θ, φ, Ψ) with respect to the position of the optical sensor based on the 3D image data,
identifying, based on the estimated object pose, the closest one from a set of predetermined view positions of a predetermined 3D object model, and
determining the pose of the object in the scene based on the identified closest view position,
wherein the object pose is a 6D pose comprising x, y, z position information and θ, φ, Ψ rotation information, and
wherein the step of estimating the object pose includes a step of recognizing the object in the scene based on the received 3D image data.

The method according to claim 12, further comprising the steps of:
determining a plurality of view positions of the 3D object model,
forming the set of predetermined view positions,
determining, for each view position of the set of predetermined view positions, a re-encoded data set, the re-encoded data set representing rendered image data of the 3D object model when seen from that view position, and
linking the view position to the re-encoded data set.