JP7034746B2

JP7034746B2 - Feature expression device, recognition system including it, and feature expression program

Info

Publication number: JP7034746B2
Application number: JP2018016980A
Authority: JP
Inventors: 研人藤原; 育郎佐藤; 満安倍; 悠一吉田; 義明坂倉
Original assignee: Denso IT Laboratory Inc
Current assignee: Denso IT Laboratory Inc
Priority date: 2018-02-02
Filing date: 2018-02-02
Publication date: 2022-03-14
Anticipated expiration: 2038-02-02
Also published as: JP2019133545A

Description

The present invention relates to a feature expression device that expresses a set of multidimensional data as features, a recognition system including the feature expression device, and a feature expression program.

Conventionally, there is a known technique in which a camera installed in a vehicle captures images and objects in those images are recognized using a convolutional neural network (CNN). In such object recognition based on vehicle-mounted camera images, recognition accuracy drops when the camera's visibility is reduced by bad weather, nighttime conditions, and the like. In view of this, object recognition techniques have been developed that use geometric information consisting of a three-dimensional point cloud obtained from a range sensor or similar device.

However, if a 3D point cloud is fed directly into a convolutional neural network as input data, two problems arise: the order of the points is undefined, and the region over which to convolve is undefined. There are three main conventional approaches to these problems.

The first method converts the point cloud into volume data composed of blocks indicating whether or not points are present, and convolves neighboring blocks (see, for example, Non-Patent Document 1). The second method converts the point cloud into images taken from arbitrary viewpoints, or cuts the object open and converts it into an image of its development view, and then applies a conventional image learning method (see, for example, Non-Patent Document 2). The third method handles the point cloud as it is, learns a permutation-invariant function, and aggregates the point cloud into a single feature vector (see, for example, Non-Patent Document 3).

Non-Patent Document 1: Z. Wu et al., 3D ShapeNets: A Deep Representation for Volumetric Shape Modeling, IEEE CVPR 2015
Non-Patent Document 2: H. Su et al., Multi-view Convolutional Neural Networks for 3D Shape Recognition, ICCV 2015
Non-Patent Document 3: C. Qi et al., PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, CVPR 2017
Non-Patent Document 4: V. Nair and G. Hinton, Rectified linear units improve restricted Boltzmann machines, ICML 2010

However, with each of the conventional methods above, one piece of data can represent only one pose of one object. To represent the shape of an object from various viewpoints, the object must be moved into various poses, expanding one object into multiple pieces of data. Consequently, the conventional methods require an enormous amount of training data to achieve accurate object recognition, and the classifier becomes complex, so that training time also becomes enormous.

An object of the present invention is to express multidimensional point cloud data as features in a form that imposes a small recognition and learning load.

A feature expression device according to one aspect of the present invention expresses, as features, multidimensional point cloud data consisting of a set of multidimensional points. The device includes: a distance field conversion unit that converts the set of points into a distance field indicating the coordinates of the nearest neighbor point closest to each sample point set around the set of points, and the nearest neighbor distance from the sample point to that nearest neighbor point; a canonical projection unit that performs singular value decomposition of a matrix consisting of the coordinates of the nearest neighbor points and the nearest neighbor distances to obtain a transformation into a standard coordinate system; and a parameterization unit that trains an extreme learning machine taking the nearest neighbor points as input and the distances as output, and outputs its weights as a feature vector of the set of points.

With this configuration, the multidimensional point cloud data is converted into a fixed-length feature vector, so it can be expressed as features in a form with a small recognition and learning load. Here, "multidimensional" means three or more dimensions.

In the above feature expression device, the extreme learning machine may use ReLU as its activation function. With this configuration, scale invariance can be achieved when the multidimensional point cloud data is converted into a feature vector.

In the above feature expression device, the set of multidimensional points may be a three-dimensional point cloud acquired as a set of points on the surface of an object. This allows information on the three-dimensional shape of the object to be converted into a feature vector.

A recognition system according to one aspect of the present invention includes the above feature expression device and a recognition device that performs recognition using the feature vector. With this configuration, the multidimensional point cloud data is converted into a feature vector before being used for recognition, so the recognition load in the recognition device can be reduced.

The above recognition system may further include a point cloud data acquisition device that acquires three-dimensional point cloud data of an object by capturing the object, and the feature expression device may perform feature expression using the three-dimensional point cloud data acquired by the point cloud data acquisition device as the multidimensional point cloud data. With this configuration, the object recognition load in the recognition device can be reduced.

A feature expression program according to one aspect of the present invention, when executed by an information processing device, causes the information processing device to function as a feature expression device that expresses, as features, multidimensional point cloud data consisting of a set of multidimensional points. The feature expression device includes: a distance field conversion unit that converts the set of points into a distance field indicating the coordinates of the nearest neighbor point closest to each sample point set around the set of points, and the nearest neighbor distance from the sample point to that nearest neighbor point; a canonical projection unit that performs singular value decomposition of a matrix consisting of the coordinates of the nearest neighbor points and the nearest neighbor distances to obtain a transformation into a standard coordinate system; and a parameterization unit that trains an extreme learning machine taking the nearest neighbor points as input and the distances as output, and outputs its weights as a feature vector of the set of points.

This configuration likewise converts the multidimensional point cloud data into a fixed-length feature vector, so it can be expressed as features in a form with a small recognition and learning load.

According to the present invention, multidimensional point cloud data is converted into a fixed-length feature vector, so it can be expressed as features in a form with a small recognition and learning load.

FIG. 1 is a block diagram showing the configuration of an object recognition system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a distance field according to the embodiment of the present invention.
FIG. 3 is a diagram showing the conversion to a standard coordinate system by the canonical projection unit according to the embodiment of the present invention.
FIG. 4 is a diagram showing the neural network used in the parameterization unit according to the embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. The embodiments described below are examples of carrying out the present invention and do not limit the present invention to the specific configurations described. In carrying out the present invention, specific configurations suited to each embodiment may be adopted as appropriate.

FIG. 1 is a block diagram showing the configuration of a recognition system according to an embodiment of the present invention. The recognition system 100 includes a point cloud data acquisition device 10, a feature expression device 20, and an identification device 30.

In the present embodiment, the point cloud data acquisition device 10 is a range sensor that generates a range image by measuring distances based on the time of flight of laser light, thereby acquiring three-dimensional point cloud data of an object. Hereinafter, each point for which three-dimensional point cloud data has been acquired is called an "object point", and a set of object points is simply called a "point cloud". The point cloud data acquisition device 10 is not limited to a range sensor; for example, it may acquire the three-dimensional point cloud data by stereo ranging.
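As background, the range in a pulsed time-of-flight measurement is half the round-trip path of the light. A minimal sketch, assuming a simple pulsed measurement; real sensors add calibration, and turning a range image into 3-D points also needs the sensor's beam directions, which the text does not specify (`tof_range` is a hypothetical name):

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, exact by the SI definition of the metre


def tof_range(round_trip_time_s: float) -> float:
    """Distance to the reflecting surface from a round-trip time of flight.

    The pulse travels to the surface and back, so the one-way distance is
    half of (speed of light x elapsed time).
    """
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0


print(tof_range(2e-7))  # a 200 ns round trip corresponds to roughly 30 m
```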

The feature expression device 20 converts the three-dimensional point cloud data obtained by the point cloud data acquisition device 10 into a feature vector. The feature expression device 20 includes a distance field conversion unit 21, a canonical projection unit 22, and a parameterization unit 23.

To convert object points placed in an arbitrary coordinate system into a representation in a standard coordinate system, the distance field conversion unit 21 first converts the point cloud into a distance field, which is an implicit representation. FIG. 2 is a diagram showing an example of a distance field according to the embodiment of the present invention. In FIG. 2, dark regions are close to the object and light regions are far from it.

The distance field conversion unit 21 converts a point cloud in an arbitrary coordinate system into a distance field by measuring the nearest neighbor distance between the point cloud and spatial sample points set randomly around the object points. Here, spatial sample points s are set inside a unit sphere around the object, and the nearest neighbor distance of a sample point s is its distance to the object point p closest to it. It is computed as a function φ(s) of the spatial sample point by the following equation (1).

Here, P is the set of object points. That is, to construct the distance field Φ_P, spatial sample points s are set randomly around the set P of object points.
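Equation (1) is not reproduced in this text; from the definition above it can be read as φ(s) = min over p in P of ‖s − p‖. A brute-force sketch, assuming Euclidean distance, uniform sampling inside the unit ball, and a point cloud already centered and scaled to lie inside that ball (`sample_unit_ball` and `distance_field` are hypothetical names):

```python
import numpy as np


def sample_unit_ball(n, dim=3, seed=None):
    """Draw n points uniformly inside the unit ball of the given dimension."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(n, dim))
    d /= np.linalg.norm(d, axis=1, keepdims=True)   # uniform directions
    r = rng.random(n) ** (1.0 / dim)                # radii giving uniform volume density
    return d * r[:, None]


def distance_field(points, samples):
    """phi(s) = min over p in P of ||s - p||, for each sample point s.

    Brute force, O(n_samples * n_points); a k-d tree would scale better.
    """
    diff = samples[:, None, :] - points[None, :, :]  # (n_s, n_p, dim)
    return np.linalg.norm(diff, axis=2).min(axis=1)  # (n_s,)


# Hypothetical object: 400 points on a sphere of radius 0.5.
rng = np.random.default_rng(0)
P = rng.normal(size=(400, 3))
P = 0.5 * P / np.linalg.norm(P, axis=1, keepdims=True)
S = sample_unit_ball(1000, seed=1)
phi = distance_field(P, S)
```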

Converting a set of object points in an arbitrary coordinate system into a distance field has the following two advantages. The first advantage is that the distance field is invariant to reordering of the object points. Because the distance field consists of the nearest neighbor distances between the object points p and the spatial sample points s set around them, the same point cloud yields the same distance field regardless of its posture.

The second advantage is that the distance field is scale-covariant: when the coordinate system is scaled, the distances φ(s) scale accordingly. Therefore, in the present embodiment, scale invariance is achieved by embedding the distance field representation in a neural network that commutes with scaling (described later).
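Both advantages can be checked numerically. This self-contained sketch (hypothetical random data, brute-force nearest neighbor search) shuffles the object points and, separately, scales the point cloud and the sample points by the same factor a:

```python
import numpy as np


def distance_field(points, samples):
    """phi(s) = min over p of ||s - p|| for each sample s (brute force)."""
    diff = samples[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=2).min(axis=1)


rng = np.random.default_rng(0)
P = rng.uniform(-0.5, 0.5, size=(300, 3))   # hypothetical point cloud
S = rng.uniform(-1.0, 1.0, size=(800, 3))   # hypothetical sample points

phi = distance_field(P, S)

# 1) Reordering the object points leaves the distance field unchanged.
phi_perm = distance_field(P[rng.permutation(len(P))], S)

# 2) Scaling the coordinate system by a scales every distance by a.
a = 3.7
phi_scaled = distance_field(a * P, a * S)
```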

The canonical projection unit 22 achieves rotation invariance by projecting the distance field onto a four-dimensional standard coordinate system. To this end, the canonical projection unit 22 first generates the matrix M of the following equation (2), which concatenates the coordinates of the spatial sample points S with their distances to the nearest object points (the points of the point cloud at the nearest neighbor distance).

The canonical projection unit 22 performs singular value decomposition of the matrix M as shown in the following equation (3) to obtain the transformation to the standard coordinate system.

That is, the transformation to the standard coordinate system is V^*, obtained by decomposing the matrix M, which consists of the coordinates s_x, s_y, s_z of the spatial sample points and the corresponding distances φ(s).

To fix the signs of V^*, the canonical projection unit 22 obtains signs by multiplying the basis U by the vector φ(s), the distance part of the matrix M, as in the following equation (4).

The canonical projection unit 22 then applies the obtained signs to V^* by the following equation (5).

Here, C is a diagonal matrix whose diagonal entries are the signs c. With this projection, any object is aligned to a single pose in the standard coordinate system, regardless of the coordinate system in which it is represented.
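The equation images for (2) to (5) are not reproduced in this text. One reading consistent with the description, assuming one row of M per spatial sample point, a thin singular value decomposition (U has four columns), and V^* holding the right singular vectors as its columns, is:

```latex
M = \begin{bmatrix}
  s_{x,1} & s_{y,1} & s_{z,1} & \phi(s_1) \\
  \vdots  & \vdots  & \vdots  & \vdots    \\
  s_{x,n} & s_{y,n} & s_{z,n} & \phi(s_n)
\end{bmatrix} \in \mathbb{R}^{n \times 4} \qquad (2)

M = U \Sigma V^{*\top} \qquad (3)

c = \operatorname{sign}\!\left( U^{\top} \phi(s) \right) \in \{-1, +1\}^{4} \qquad (4)

V = V^{*} C, \qquad C = \operatorname{diag}(c) \qquad (5)
```

Under this reading U^⊤φ(s) = ΣV^{*⊤}e_4, so c_k is the sign of the φ-component of the k-th right singular vector. Flipping the sign of a singular pair (u_k, v_k), which the decomposition leaves undetermined, also flips c_k and therefore leaves the k-th column of V unchanged.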

FIG. 3 is a diagram showing the conversion to the standard coordinate system by the canonical projection unit 22 according to the embodiment of the present invention. In FIG. 3, for ease of viewing, the original point cloud itself is transformed according to the V obtained by the singular value decomposition. The upper row of FIG. 3 shows the surface point cloud of the same object (a rabbit) rotated by arbitrary angles, and the lower row shows the transformed point cloud corresponding to each object in the upper row.

Even if the object is rotated by an arbitrary angle as shown in the upper row of FIG. 3, the projection by the canonical projection unit 22 aligns all of the distance fields to a single unique pose, as shown in the lower row. That is, the canonical projection unit 22 places the various poses in the standard coordinate system, and the canonical representation is rotation-invariant.
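The rotation invariance can be illustrated with a short sketch. It assumes the sign rule c = sign(U^T φ) for equation (4), and takes the canonical coordinates of the sample points as L = S V_S, with V_S the three rows of V that act on x, y, z. Rotating the object and the sample points together should leave L unchanged (function names and data are hypothetical):

```python
import numpy as np


def distance_field(points, samples):
    diff = samples[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=2).min(axis=1)


def canonical_projection(samples, phi):
    """Sign-fixed V (4 x 4) from the SVD of M = [samples | phi]."""
    M = np.column_stack([samples, phi])                    # (n, 4), one row per sample
    U, _, Vt = np.linalg.svd(M, full_matrices=False)       # M = U diag(sigma) Vt
    V_star = Vt.T                                          # right singular vectors as columns
    c = np.sign(U.T @ phi)                                 # assumed sign rule
    c[c == 0] = 1.0                                        # guard against an exact zero
    return V_star * c                                      # V* C: column k times c_k


def canonical_coords(samples, V):
    """L = S V_S with V_S = first three rows of V (the x, y, z part)."""
    return samples @ V[:3, :]                              # (n, 4)


rng = np.random.default_rng(0)
P = rng.uniform(-0.5, 0.5, size=(300, 3)) * [1.0, 0.6, 0.3]   # hypothetical elongated object
S = rng.uniform(-1.0, 1.0, size=(800, 3))                     # hypothetical sample points
phi = distance_field(P, S)
V = canonical_projection(S, phi)
L = canonical_coords(S, V)

# Random proper rotation Q (det = +1), applied to the object and the samples.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1
S_rot = S @ Q.T
phi_rot = distance_field(P @ Q.T, S_rot)
L_rot = canonical_coords(S_rot, canonical_projection(S_rot, phi_rot))
```

The rotation changes M only by an orthogonal transform of its first three columns, so the singular vectors rotate with it and the product S V_S is unaffected.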

The parameterization unit 23 embeds the distance field projected onto the standard coordinate system into a feature vector. FIG. 4 is a diagram showing the neural network used by the parameterization unit 23 according to the embodiment of the present invention. Using the neural network shown in FIG. 4, the parameterization unit 23 embeds the distance field projected onto the standard coordinate system into a fixed-length feature vector. This neural network takes the coordinates of the spatial sample points in the standard coordinate system as input and outputs the corresponding nearest neighbor distances, thus acting as the distance function of the object.

In an ordinary neural network, many different weight patterns could represent one object. The parameterization unit 23 of the present embodiment instead adopts an extreme learning machine (ELM) with a predetermined random basis W, so that exactly one weight pattern is generated for each object. The parameterization unit 23 outputs the weights β_1 to β_K of this ELM as the feature vector of the point cloud.

An ELM is a feedforward neural network whose weights W are set randomly. Let X be the input and t the target; the input X is mapped to a K-dimensional feature space, and the output H is obtained by the following equation (6).

Here, the function f is a nonlinear activation function, w_i is the weight corresponding to the i-th dimension, and b is an arbitrary bias.

To obtain the parameters of this network, the weights β satisfying t = Hβ must be found, so that the output of the network equals the target t. β can be obtained simply by computing the pseudo-inverse of H, but it can be obtained more robustly by solving the following equation (7).

Here, c is a constraint (regularization term) applied to the diagonal components of H. Because the ELM basis is fixed to arbitrary values, the corresponding weights are uniquely determined. By exploiting this property of the ELM, a unique set of weights β_1 to β_K can be obtained for a given point cloud.
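A minimal ELM sketch of equations (6) and (7). Equation (7) is not reproduced here, so the code assumes the regularized least-squares form β = (H^T H + I/c)^{-1} H^T t that is common in the ELM literature; tanh stands in for the unspecified nonlinear activation f, and all names and data are hypothetical:

```python
import numpy as np


def elm_hidden(X, W, b, f=np.tanh):
    """Generic form of equation (6): row j of H is f(x_j W + b)."""
    return f(X @ W + b)                        # (n, K)


def elm_output_weights(H, t, c=1e3):
    """Assumed form of equation (7): beta = (H^T H + I / c)^{-1} H^T t.

    A larger c means a weaker penalty; as c grows, beta approaches the
    pseudo-inverse solution pinv(H) @ t mentioned in the text.
    """
    K = H.shape[1]
    return np.linalg.solve(H.T @ H + np.eye(K) / c, H.T @ t)


# Hypothetical data: 500 three-dimensional inputs and scalar targets.
rng = np.random.default_rng(42)
K = 64
W = rng.normal(size=(3, K))                    # random basis, drawn once and then fixed
b = rng.normal(size=K)
X = rng.uniform(-1.0, 1.0, size=(500, 3))
t = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

H = elm_hidden(X, W, b)
beta = elm_output_weights(H, t)
# With W and b fixed, the same data always yields the same weights.
beta_again = elm_output_weights(elm_hidden(X, W, b), t)
```

Only β is solved for; W and b never change, which is why β can serve as a deterministic descriptor of the data.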

To embed the information of the distance field in the standard coordinate system, the parameterization unit 23 uses the coordinates of the spatial sample points S projected onto the standard coordinate system as input, and maps them back to the original outputs of the distance function Φ(S). Let V_S denote the matrix V obtained by transposing the first to third columns of V^*; V_S transforms the coordinates of the spatial sample points S into the standard coordinate system.

The parameterization unit 23 therefore obtains, by the following equation (8), the coordinate values L of the spatial sample points S projected onto the four-dimensional standard coordinate system.

That is, the parameterization unit 23 applies the coordinates of the spatial sample points S to V, obtained by transposing the first to third columns of V^*, to obtain the coordinate values L in the four-dimensional standard coordinate system.
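Equation (8) is likewise not reproduced. Assuming S is stored with one sample point per row, and V_S consists of the first three rows of the sign-fixed 4 × 4 matrix V whose columns are right singular vectors of M (the rows that multiply s_x, s_y, s_z), one consistent reading is:

```latex
L = S \, V_S, \qquad
S \in \mathbb{R}^{n \times 3}, \quad
V_S = V_{[1:3,\,:]} \in \mathbb{R}^{3 \times 4}, \quad
L \in \mathbb{R}^{n \times 4} \qquad (8)
```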

The parameterization unit 23 trains the ELM with the standard-coordinate values L of the spatial sample points S as input and the corresponding distances φ(S) as output, thereby embedding the information of the distance field in the standard coordinate system.

In the present embodiment, scale invariance of the object is also achieved by using ReLU (Rectified Linear Unit; see Non-Patent Document 4) as the activation function and removing the bias b, as shown in the following equation (9).

Here, w_i is a random basis, K is the number of bases, and f is the activation function (ReLU). L_in is the input L with an additional last row, which is a bias scaled by the standard deviation of all values in the input L.
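A sketch of equation (9) under stated assumptions: L is stored as a 4 × n array (one column per sample point) so that the extra bias is the last row of L_in, that row equals the standard deviation of all entries of L, and W collects the fixed random bases w_1 to w_K as columns. Because the whole input, bias row included, scales with L, and ReLU(a·x) = a·ReLU(x) for a > 0, the hidden layer output scales linearly (names and data are hypothetical):

```python
import numpy as np


def augment_input(L):
    """Assumed L_in: L (4 x n) with one extra last row equal to std(L)."""
    n = L.shape[1]
    return np.vstack([L, np.full((1, n), L.std())])    # (5, n)


def relu_hidden(L, W):
    """Sketch of equation (9): H = ReLU(W^T L_in), with no additive bias b.

    W holds the fixed random basis w_1..w_K as columns (5 x K); H is n x K.
    """
    return np.maximum(W.T @ augment_input(L), 0.0).T


rng = np.random.default_rng(0)
K = 32
W = rng.normal(size=(5, K))          # hypothetical random basis
L = rng.normal(size=(4, 200))        # hypothetical canonical coordinates of 200 samples
H = relu_hidden(L, W)

# Scaling the input by a > 0 scales every hidden activation by a.
a = 2.5
H_scaled = relu_hidden(a * L, W)
```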

That is, ReLU returns non-negative inputs unchanged (and zero for negative inputs), so ReLU(a·x) = a·ReLU(x) for any a > 0. Once the bias b is removed, any scaling of the input in equation (9) is therefore carried directly through to the output. Since the internal weights do not change under such scaling, equation (9) achieves scale invariance.
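The argument can be checked numerically: if the canonical coordinates L and the target distances φ are both scaled by a > 0, H scales by a and the least-squares weights β stay the same. This sketch uses the unregularized least-squares solution; with a ridge term (H^T H + I/c), scaling by a is equivalent to replacing c with a²c, so invariance is then only approximate. The bias-row form of L_in and all data are assumptions:

```python
import numpy as np


def relu_features(L, W):
    """Bias-free ReLU features with a std-scaled bias row (assumed form of eq. (9))."""
    L_in = np.vstack([L, np.full((1, L.shape[1]), L.std())])
    return np.maximum(W.T @ L_in, 0.0).T


def fit_beta(L, phi, W):
    """Unregularized least squares for phi = H beta."""
    beta, *_ = np.linalg.lstsq(relu_features(L, W), phi, rcond=None)
    return beta


rng = np.random.default_rng(1)
K = 32
W = rng.normal(size=(5, K))            # fixed random basis
L = rng.normal(size=(4, 300))          # hypothetical canonical coordinates
phi = np.abs(rng.normal(size=300))     # hypothetical nearest neighbor distances

a = 3.0                                # e.g. the same object measured at 3x scale
beta = fit_beta(L, phi, W)
beta_scaled = fit_beta(a * L, a * phi, W)
```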

The parameterization unit 23 obtains the weights β satisfying φ(s) = Hβ by the following equation (10).

The parameterization unit 23 outputs the weights β obtained in this way as the feature vector of the original point cloud.

As is clear from the above, the feature expression device 20 trains an ELM using information obtained by converting the three-dimensional point cloud data into a distance field, and outputs the ELM weights β obtained by that training as the feature vector of the three-dimensional point cloud data acquired by the point cloud data acquisition device 10. The feature expression device 20 therefore trains the ELM each time the point cloud data acquisition device 10 obtains three-dimensional point cloud data to be converted into a feature vector.
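Putting the steps together, a hypothetical end-to-end sketch of the feature expression device 20 follows. It assumes φ(s) = min over p of ‖s − p‖ for equation (1), the sign rule c = sign(U^T φ) for equation (4), L = S V_S with V_S the first three rows of V, a std-scaled bias row for L_in, and the ridge form β = (H^T H + I/c)^{-1} H^T φ for equation (10). Sample points are drawn from a cube rather than a unit sphere purely for simplicity:

```python
import numpy as np


def distance_field(P, S):
    """Equation (1): distance from each sample point to its nearest object point."""
    return np.linalg.norm(S[:, None, :] - P[None, :, :], axis=2).min(axis=1)


def canonical_V(S, phi):
    """Equations (2)-(5), assuming the sign rule c = sign(U^T phi)."""
    M = np.column_stack([S, phi])                         # n x 4
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    c = np.sign(U.T @ phi)
    c[c == 0] = 1.0
    return Vt.T * c                                       # V* C


def relu_features(L, W):
    """Assumed equation (9): bias-free ReLU with a std-scaled bias row."""
    L_in = np.vstack([L, np.full((1, L.shape[1]), L.std())])
    return np.maximum(W.T @ L_in, 0.0).T                  # n x K


def point_cloud_feature(P, S, W, c_reg=1e2):
    """Hypothetical end-to-end sketch: returns the ELM weights beta (length K)."""
    phi = distance_field(P, S)
    V = canonical_V(S, phi)
    L = (S @ V[:3, :]).T                                  # 4 x n canonical coordinates
    H = relu_features(L, W)
    K = H.shape[1]
    # Assumed regularized solution of phi = H beta (equation (10)).
    return np.linalg.solve(H.T @ H + np.eye(K) / c_reg, H.T @ phi)


rng = np.random.default_rng(0)
K = 64
W = rng.normal(size=(5, K))                   # one random basis shared by every cloud
S = rng.uniform(-1.0, 1.0, size=(600, 3))     # hypothetical sample points
box = np.array([1.0, 0.6, 0.3])               # hypothetical elongated object
P_small = rng.uniform(-0.5, 0.5, size=(200, 3)) * box
P_large = rng.uniform(-0.5, 0.5, size=(900, 3)) * box

f_small = point_cloud_feature(P_small, S, W)
f_large = point_cloud_feature(P_large, S, W)

# Rotate the cloud and the sample points together by a random orthogonal matrix.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
f_rot = point_cloud_feature(P_small @ Q.T, S @ Q.T, W)
```

Because one ELM is fitted per point cloud against a fixed basis W, clouds with 200 and 900 points both yield K weights. The rotation check reuses the same (rotated) sample points; with freshly drawn random samples, φ, M, and H all change, so the two features would no longer match exactly.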

The identification device 30 performs learning or identification using the feature vectors output from the feature expression device 20. In the recognition system 100 of the present embodiment, unlike conventional methods, the feature expression device 20 produces a compact feature vector with a single representation per object. The identification device 30 therefore does not need complex identification processing such as a deep neural network; a small conventional neural network is sufficient to identify three-dimensional point cloud data. Moreover, because the feature expression device 20 uses the ELM weights as the feature vector, it obtains a fixed-length feature vector regardless of the number of points in the point cloud data.

The feature expression device 20 can be implemented by an information processing device, and its components, namely the distance field conversion unit 21, the canonical projection unit 22, and the parameterization unit 23, may be realized by software cooperating with hardware or by hardware circuits. The identification device 30 may likewise be realized by software cooperating with hardware or by hardware circuits, and may be integrated with the information processing device constituting the feature expression device 20. The software may be provided as an information processing program executed by the information processing device.

The above embodiment describes a recognition system in which the identification device 30 recognizes an object represented by three-dimensional point cloud data, using the feature vector of that data obtained by the feature expression device 20. However, the feature expression device 20 can also be applied outside recognition systems; that is, the feature vectors it produces are not limited to recognition processing.

The approach is not limited to three-dimensional point cloud data: for a set of four-dimensional or higher-dimensional data as well, it is effective to convert the set into a distance field in the same manner as in the above embodiment, train an ELM, and use the resulting weights as a feature vector. In this sense, a "point" in the present embodiment includes not only points expressed in two or three dimensions but also information expressed in four or more dimensions.

INDUSTRIAL APPLICABILITY: The present invention can express multidimensional point cloud data as features in a form with a small recognition and learning load, and is useful as a feature expression device or the like that expresses a set of multidimensional data as features.

10 Point cloud data acquisition device
20 Feature expression device
21 Distance field conversion unit
22 Canonical projection unit
23 Parameterization unit
30 Identification device
100 Recognition system

Claims

1. A feature expression device that expresses, as features, multidimensional point cloud data consisting of a set of multidimensional points, comprising:
a distance field conversion unit that converts the set of points into a distance field indicating the coordinates of sample points set around the set of points and a nearest neighbor distance, which is the distance from each sample point to the point in the set of points closest to that sample point;
a canonical projection unit that performs singular value decomposition of a matrix consisting of the coordinates of the sample points and the nearest neighbor distances to obtain a transformation of the matrix into a standard coordinate system; and
a parameterization unit that trains an extreme learning machine taking as input the coordinates of the sample points transformed into the standard coordinate system and outputting the nearest neighbor distances, and outputs its weights as a feature vector of the set of points.

2. The feature expression device according to claim 1, wherein the extreme learning machine uses ReLU as an activation function.

3. The feature expression device according to claim 1 or 2, wherein the set of multidimensional points is a three-dimensional point cloud acquired as a set of points on the surface of an object.

4. A recognition system comprising:
the feature expression device according to claim 1; and
a recognition device that performs recognition using the feature vector.

5. The recognition system according to claim 4, further comprising a point cloud data acquisition device that acquires three-dimensional point cloud data of an object by capturing the object,
wherein the feature expression device performs feature expression using the three-dimensional point cloud data acquired by the point cloud data acquisition device as the multidimensional point cloud data.

6. A feature expression program that, when executed by an information processing device, causes the information processing device to function as a feature expression device that expresses, as features, multidimensional point cloud data consisting of a set of multidimensional points, the feature expression device comprising:
a distance field conversion unit that converts the set of points into a distance field indicating the coordinates of sample points set around the set of points and a nearest neighbor distance, which is the distance from each sample point to the point in the set of points closest to that sample point;
a canonical projection unit that performs singular value decomposition of a matrix consisting of the coordinates of the sample points and the nearest neighbor distances to obtain a transformation of the matrix into a standard coordinate system; and
a parameterization unit that trains an extreme learning machine taking as input the coordinates of the sample points transformed into the standard coordinate system and outputting the nearest neighbor distances, and outputs its weights as a feature vector of the set of points.