JP2019139651A

JP2019139651A - Program, device and method for classifying unknown multi-dimensional vector data groups into classes

Info

Publication number: JP2019139651A
Application number: JP2018024404A
Authority: JP
Inventors: 修平山口; Shuhei Yamaguchi; 直紀関口; Naoki Sekiguchi; 栄二宇都宮; Eiji Utsunomiya
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-02-14
Filing date: 2018-02-14
Publication date: 2019-08-22
Anticipated expiration: 2038-02-14
Also published as: JP6846369B2

Abstract

To provide a program, device and method capable of classifying unknown multi-dimensional vector data groups into classes without prior learning.
SOLUTION: The present invention causes a computer to function as two means. The cluster number estimation means estimates the number of clusters k by clustering a multi-dimensional vector data group. The class classification means inputs multi-dimensional vector data into a neural network whose output layer has k units and outputs a classification probability for each of the k classes. In particular, the cluster number estimation means is based on DBSCAN. The class classification means uses a softmax function as the activation function of the output layer of the neural network, so the per-class classification probabilities sum to 1. The present invention may further include dimension compression means that reduces the dimensionality of the multi-dimensional vector data before it reaches the cluster number estimation means.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a technique for classifying a group of multi-dimensional vector data.

In recent years, sensors for the IoT (Internet of Things) have become smaller and more accurate, so servers can now collect large volumes of sensor data over networks. With the spread of neural networks, such big data can also be analyzed easily. In addition, many monitoring tasks that used to rely on human labor are being automated and centrally managed by systems. For example, some places are hard for people to reach, such as mountainous areas, infrastructure (bridges, steel towers, fences, etc.) and wildlife monitoring sites. Even there, deploying environmental sensors is making remote monitoring possible.

From the standpoint of big data analysis, however, sensor data are not organized in a uniform format. As a result, even when a neural network is used, it is difficult to ensure that analysis results are reliable, and data sharing is slow to advance. In particular, the more varied and voluminous the sensor data become, the harder it is to classify high-dimensional vector data into normal and abnormal states.

A conventional technique acquires sensor data from sensors installed in mechanical equipment and diagnoses whether there is a sign of abnormality (see, for example, Patent Document 1). In this technique, a normal model is learned from time-series waveforms of sensor data collected during a normal period. Sensor data collected during operation are then compared with the normal model to diagnose whether the equipment shows a sign of abnormality. The normal model consists of clusters generated from the normal-period sensor data by clustering, a statistical classification method. A cluster is a region specified by a center and a radius in a multidimensional vector space.
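A minimal sketch of the cluster-based normal model follows. It assumes that a vector counts as a sign of abnormality when it lies outside every normal cluster. This membership rule and the names `in_cluster` and `has_anomaly_sign` are illustrative assumptions; Patent Document 1 may use a different decision rule.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a cluster is (center, radius) in vector space.
Cluster = Tuple[Sequence[float], float]


def in_cluster(x: Sequence[float], cluster: Cluster) -> bool:
    """Return True if x lies within the cluster's radius of its center."""
    center, radius = cluster
    return math.dist(x, center) <= radius


def has_anomaly_sign(x: Sequence[float], normal_model: List[Cluster]) -> bool:
    """Assumed rule: x is suspicious if it falls outside every normal cluster."""
    return not any(in_cluster(x, c) for c in normal_model)


normal_model = [((0.0, 0.0), 1.0), ((5.0, 5.0), 0.5)]
print(has_anomaly_sign((0.3, 0.4), normal_model))  # → False (inside the first cluster)
print(has_anomaly_sign((3.0, 3.0), normal_model))  # → True (outside both clusters)
```

The model must be learned from normal-period data before anything can be checked. That prior-learning requirement is the limitation the present invention targets.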

Patent Document 1: JP 2017-033470 A

Non-Patent Document 1: "Introduction of dimensional compression method using t-SNE" (in Japanese), [online], [retrieved December 9, 2017], Internet <URL: https://blog.albert2005.co.jp/2015/12/02/tsne/>

FIG. 1 is an explanatory diagram showing a problem in the analysis of unknown big data.

As FIG. 1 shows, the technique of Patent Document 1 also requires prior learning from sensor data collected during a normal period. It has therefore been difficult to classify diverse, voluminous and unknown high-dimensional vector data without prior learning.
Furthermore, although that technique learns by grouping data into clusters, the number of clusters must be specified in advance. The number of clusters also affects the accuracy of the learned model, so determining it in advance for unknown data is extremely difficult.

An object of the present invention is therefore to provide a program, an apparatus and a method capable of classifying an unknown group of multi-dimensional vector data without prior learning.

According to the present invention, there is provided a program that causes a computer mounted in an apparatus for classifying a group of multi-dimensional vector data to function as:
cluster number estimation means for estimating the number of clusters k by clustering the group of multi-dimensional vector data; and
class classification means for inputting multi-dimensional vector data into a neural network whose output layer has k units, k being the number of clusters, and outputting a classification probability for each of the k classes.

According to another embodiment of the program of the present invention,
it is also preferable that the cluster number estimation means be based on DBSCAN (Density-Based Spatial Clustering of Applications with Noise).

According to another embodiment of the program of the present invention,
it is also preferable that the class classification means use a softmax function as the activation function of the output layer of the neural network and output a classification probability for each class, the probabilities summing to 1.

According to another embodiment of the program of the present invention,
it is also preferable that the program further include dimension compression means, placed before the cluster number estimation means, for reducing the dimensionality of the group of multi-dimensional vector data,
and that the group of low-dimensional vector data output from the dimension compression means be input to the cluster number estimation means.

According to another embodiment of the program of the present invention,
it is also preferable that the dimension compression means be based on t-SNE (t-Distributed Stochastic Neighbor Embedding),
and that the group of low-dimensional vector data be a group of two- or three-dimensional vector data.

According to another embodiment of the program of the present invention,
it is also preferable that the group of multi-dimensional vector data be a mixture of vector data output from a plurality of sensors.

According to another embodiment of the program of the present invention,
it is also preferable that the computer further function as teacher data generation means for generating teacher data by assigning, to the multi-dimensional vector data, the label of the class that the class classification means gives the highest classification probability.

According to the present invention, there is provided an apparatus for classifying a group of multi-dimensional vector data, comprising:
cluster number estimation means for estimating the number of clusters k by clustering the group of multi-dimensional vector data; and
class classification means for inputting multi-dimensional vector data into a neural network whose output layer has k units and outputting a classification probability for each of the k classes.

According to the present invention, there is provided a class classification method for an apparatus that receives a group of multi-dimensional vector data,
in which the apparatus executes:
a first step of estimating the number of clusters k by clustering the group of multi-dimensional vector data; and
a second step of inputting multi-dimensional vector data into a neural network whose output layer has k units and outputting a classification probability for each of the k classes.

The program, apparatus and method of the present invention can classify unknown groups of multi-dimensional vector data without prior learning.

FIG. 1 is an explanatory diagram showing a problem in the analysis of unknown big data.
FIG. 2 is a functional configuration diagram of the analysis apparatus according to the present invention.
FIG. 3 is an explanatory diagram of the dimension compression unit.
FIG. 4 is an explanatory diagram of the cluster number estimation unit and the class classification unit.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 2 is a functional configuration diagram of the analysis apparatus according to the present invention.

The analysis apparatus 1 classifies an unknown group of multi-dimensional vector data. As shown in FIG. 2, it has five functional units:
- a vector data group storage unit 100;
- a dimension compression unit 101;
- a cluster number estimation unit 11;
- a class classification unit 12;
- a teacher data generation unit 13.

These functional units are realized by executing a program that causes a computer mounted in the apparatus to function. The processing flow through these units can also be understood as a method of classifying vector data groups.

[Vector data group storage unit 100]
The vector data group storage unit 100 stores an unknown group of multi-dimensional vector data. Here, "multi-dimensional" means that vector data output from a plurality of sensors are mixed together.
The group can mix many kinds of vector data, for example from:
- motion sensors (such as acceleration sensors and geomagnetic sensors);
- biometric sensors (such as heart rate sensors);
- environmental sensors (such as temperature/humidity sensors, barometric pressure sensors and pressure sensors);
- optical sensors (such as ultrasonic sensors and infrared sensors);
- acoustic sensors.

According to the present invention, as long as these vector data groups can be acquired as time series, normal and abnormal periods can be roughly distinguished.

[Dimension compression unit 101]
The dimension compression unit 101 is optional. It sits before the cluster number estimation unit 11 and reduces the dimensionality of the group of multi-dimensional vector data. It then outputs the resulting group of low-dimensional vector data to the cluster number estimation unit 11. Here, low-dimensional means roughly two or three dimensions.

FIG. 3 is an explanatory diagram of the dimension compression unit.

The dimension compression unit 101 is based on t-SNE (t-Distributed Stochastic Neighbor Embedding) (see, for example, Non-Patent Document 1). t-SNE expresses the "closeness" between two points as a probability distribution, using the following quantities:

$p_{ij}$: joint distribution representing the closeness of $x_j$ to $x_i$
$p_{j|i}$: probability density with which $x_j$ is picked under a Gaussian distribution with mean $x_i$
$\sigma_i^2$: variance of the Gaussian distribution with mean $x_i$
$p_{i|i} = 0$ (only similarities between two distinct points are represented)

If the original data structure were reproduced perfectly after dimension reduction, then $p_{ij} = q_{ij}$. Here $q_{ij}$ is the corresponding joint distribution between the low-dimensional points $y_i$ and $y_j$. t-SNE therefore seeks an embedding that makes the discrepancy between $p_{ij}$ and $q_{ij}$ as small as possible.
t-SNE measures the distance between the two distributions with the Kullback-Leibler divergence.

t-SNE minimizes an objective function $C$ defined from $p_{ij}$ and $q_{ij}$. Because the minimum cannot be found analytically, a gradient method is used.

After convergence, $Y = \{y_1, y_2, \ldots, y_n\}$ is output. Dimension compression lets a person grasp high-dimensional data visually.
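The formulas referred to above did not survive extraction. For reference, the standard t-SNE formulation (van der Maaten and Hinton, 2008) is reproduced below. Whether the patent uses exactly these expressions is an assumption, in particular the symmetrized $p_{ij}$ and the Student-t kernel for $q_{ij}$. Here $n$ is the number of data points and $\eta$ is a learning rate.

```latex
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad p_{i|i} = 0

p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i} \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}}

\frac{\partial C}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}

y_i \leftarrow y_i - \eta \, \frac{\partial C}{\partial y_i}
```

In the standard method, each $\sigma_i$ is chosen so that $p_{j|i}$ reaches a user-specified perplexity. The original paper also adds a momentum term to the gradient update.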

FIG. 4 is an explanatory diagram of the cluster number estimation unit and the class classification unit.

[Cluster number estimation unit 11]
The cluster number estimation unit 11 estimates the number of clusters k by clustering the group of multi-dimensional vector data.
The unit is based on DBSCAN (Density-Based Spatial Clustering of Applications with Noise). DBSCAN does not require the number of clusters to be set in advance, so the vector data group can be classified while the number of clusters is still unknown.

DBSCAN is a density-based clustering method that uses the following two parameters:
Distance threshold: ε (Eps-neighborhood of a point)
Point-count threshold: minPts (a minimum number of points)
Data points are classified into the following three types:
Core point: a point with at least minPts neighboring points within radius ε
Reachable point: a point with fewer than minPts neighbors within radius ε, but with a core point within radius ε
Outlier: a point that is neither a core point nor reachable (for example, a point with no neighboring points within radius ε)
Clusters are formed from connected collections of core points, and each reachable point is assigned to a cluster.
In other words, given a set of points in some space, DBSCAN groups together points in high-density regions and marks points in low-density regions as outliers.
The cluster number estimation unit 11 then uses DBSCAN to estimate the optimal number of clusters k and outputs it to the class classification unit 12.
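The DBSCAN procedure above can be sketched in plain Python, with k taken as the number of distinct non-outlier labels. This is a minimal, unoptimized O(n²) illustration, not the patent's implementation, and the names `dbscan` and `estimate_k` are hypothetical. It also assumes that a point counts as its own neighbor in the minPts check, which is the convention scikit-learn's DBSCAN uses.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for outliers


def dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    Assumed convention: the point itself is counted among its neighbors.
    """
    n = len(points)
    # Neighbors within radius eps (brute force, O(n^2) distance checks).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels: List[Optional[int]] = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Grow a new cluster outward from an unvisited core point.
        labels[i] = cluster_id
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = cluster_id  # core or reachable point joins
                    if is_core[q]:
                        frontier.append(q)  # only core points keep expanding
        cluster_id += 1
    # Points never reached from any core point are outliers.
    return [NOISE if lab is None else lab for lab in labels]


def estimate_k(labels: List[int]) -> int:
    """Number of clusters found, ignoring outliers."""
    return len({lab for lab in labels if lab != NOISE})


points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
          (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (10.0, 0.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)              # → [0, 0, 0, 1, 1, 1, -1]
print(estimate_k(labels))  # → 2
```

k is not supplied anywhere; it falls out of ε and minPts. Those two parameters still have to be chosen, which shifts the tuning burden away from guessing the cluster count directly.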

[Class classification unit 12]
The class classification unit 12 inputs multi-dimensional vector data into a neural network whose output layer has k units and outputs a classification probability for each of the k classes. The number of clusters k is the value estimated by the cluster number estimation unit 11.

The class classification unit 12 uses a softmax function as the activation function of the output layer of a fully connected neural network and outputs a classification probability for each class, the probabilities summing to 1.

A neural network is a mathematical model that aims to reproduce characteristics of a biological brain through computer simulation. The term covers models in which artificial neurons (units), networked through synaptic connections, acquire problem-solving ability by adjusting the strength of those connections through learning.

As shown in FIG. 4, the network is a feedforward convolutional neural network (CNN). It consists of three layers, an input layer, an intermediate (hidden) layer and an output layer, and signals propagate in one direction from the input layer to the output layer. The intermediate layer may itself consist of several layers arranged as a graph. Each layer has multiple units (neurons). The parameters of the function that connects units in one layer to units in the next are called "weights." Learning means computing appropriate weights as the parameters of this function.
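The layer structure above (input layer, hidden layer, output layer, with weights between consecutive layers) can be sketched as a forward pass through a plain fully connected network. This sketch omits convolution. The ReLU hidden activation, the layer sizes and the random Gaussian initialization are assumptions not taken from the patent, and the weights are untrained because the patent does not say how they are learned. The output layer has k units, with k coming from the cluster number estimation unit 11.

```python
import math
import random
from typing import List, Tuple

# A layer is (weights W[n_in][n_out], biases b[n_out]).
Layer = Tuple[List[List[float]], List[float]]


def dense(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """One fully connected layer: out[j] = sum_i x[i] * W[i][j] + b[j]."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random (untrained) weights scaled by 1/sqrt(n_in), zero biases."""
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_out)]
         for _ in range(n_in)]
    return W, [0.0] * n_out


def forward(x: List[float], layers: List[Layer]) -> List[float]:
    """Propagate x in one direction, from the input layer to the output layer."""
    h = x
    for idx, (W, b) in enumerate(layers):
        h = dense(h, W, b)
        if idx < len(layers) - 1:
            h = relu(h)  # hidden layers; the output layer stays linear here
    return h  # k raw scores (logits), one per class


rng = random.Random(0)
k = 3  # e.g. the cluster count estimated by DBSCAN
layers = [init_layer(6, 8, rng), init_layer(8, k, rng)]
logits = forward([0.2, -0.1, 0.5, 0.0, 0.3, -0.4], layers)
print(len(logits))  # → 3
```

The output layer's raw scores are what the softmax function, described next, turns into per-class probabilities.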

The neural network of the present invention applies the softmax function to a classification problem, that is, determining which class a data item belongs to. With k units in the output layer, the m-th output $y_m$ is expressed as follows:
$$ y_m = \frac{\exp(x_m)}{\sum_{i=1}^{k} \exp(x_i)} $$
$\exp(x)$: the exponential function $e^x$ ($e = 2.7182\ldots$ is Napier's constant)
$x_m$: the m-th input signal
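The softmax formula above translates directly into code. The sketch below adds one common refinement that is not in the patent text: subtracting max(x) from every input before exponentiating. This shift cancels in the ratio, so the result is unchanged, but it prevents overflow for large inputs.

```python
import math
from typing import List, Sequence


def softmax(x: Sequence[float]) -> List[float]:
    """y_m = exp(x_m) / sum_i exp(x_i), computed in a numerically stable way."""
    shift = max(x)  # cancels in numerator and denominator
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


y = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in y])  # → [0.659, 0.242, 0.099]
print(softmax([1000.0, 1000.0]))  # → [0.5, 0.5] (no overflow)
```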

Each output of the softmax function is connected by arrows to all input signals, so every output neuron $y_m$ is influenced by all of the inputs $x_1, \ldots, x_k$.
In addition, the outputs of the softmax function sum to 1, which is why they can be interpreted as "probabilities." From these probabilities, the class to which the data belong can be determined.
The class determined by the softmax function becomes the class of the unknown multi-sensor data.

However, the present invention does not use a supervised pattern recognition model, such as a support vector machine (SVM), as its classification method. This is because the present invention performs classification as unsupervised learning.

[Teacher data generation unit 13]
The teacher data generation unit 13 generates teacher data by assigning, to each multi-dimensional vector, the label of the class that the class classification unit 12 gives the highest classification probability. The teacher data can be stored in a teacher data storage unit and then used to train a supervised learning model.
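The labeling step above amounts to an argmax over each probability vector. The sketch below assumes that labels are integer class indices 0 to k-1 and that "storing" the teacher data means returning a list of (vector, label) pairs. The patent does not specify either detail, and the function name `generate_teacher_data` is hypothetical.

```python
from typing import List, Sequence, Tuple


def generate_teacher_data(
    vectors: List[Sequence[float]],
    probabilities: List[Sequence[float]],
) -> List[Tuple[Sequence[float], int]]:
    """Pair each vector with the index of its highest-probability class."""
    if len(vectors) != len(probabilities):
        raise ValueError("each vector needs exactly one probability row")
    teacher_data = []
    for x, probs in zip(vectors, probabilities):
        # argmax; on ties, max() keeps the first (lowest) class index
        label = max(range(len(probs)), key=probs.__getitem__)
        teacher_data.append((x, label))
    return teacher_data


vectors = [[0.1, 0.2], [0.9, 0.8]]
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # softmax outputs, k = 3
print(generate_teacher_data(vectors, probs))
```

The resulting pairs have the shape that an ordinary supervised classifier expects as training input.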

As described in detail above, the program, apparatus and method of the present invention can classify unknown groups of multi-dimensional vector data without prior learning.

The present invention can classify diverse, voluminous and unknown high-dimensional vector data in a general-purpose manner without prior learning. Clustering also does not require the number of clusters k to be specified in advance, even though k affects the accuracy of the learning model. This, too, makes the invention suitable for classifying unknown high-dimensional vector data.

Furthermore, because the present invention can classify unknown groups of multi-dimensional vector data, a wide variety of sensor data can be mixed together.
For example, motion sensors can be installed on infrastructure (such as bridges, steel towers and fences), and the resulting sensor data can be roughly classified into, for example, normal and abnormal states.
Likewise, motion sensors or biometric sensors can be attached to people or animals, and the resulting sensor data can be classified by behavior (for example running, walking or staying still).
In addition, environmental, optical or acoustic sensors can be attached to existing machinery, and the resulting sensor data can be roughly classified into, for example, normal and abnormal states.

Those skilled in the art can easily make various changes, modifications and omissions to the embodiments described above within the scope of the technical idea and viewpoint of the present invention. The foregoing description is merely an example and is not intended to be limiting. The present invention is limited only by the claims and their equivalents.

DESCRIPTION OF SYMBOLS
1 Analysis apparatus
100 Vector data group storage unit
101 Dimension compression unit
11 Cluster number estimation unit
12 Class classification unit
13 Teacher data generation unit

Claims

A program for causing a computer mounted in an apparatus that classifies a group of multi-dimensional vector data to function as:
cluster number estimation means for estimating the number of clusters k by clustering the group of multi-dimensional vector data; and
class classification means for inputting multi-dimensional vector data into a neural network whose output layer has k units, k being the number of clusters, and outputting a classification probability for each of the k classes.

The program according to claim 1, wherein the computer is caused to function such that the cluster number estimation means is based on DBSCAN (Density-Based Spatial Clustering of Applications with Noise).

The program according to claim 1 or 2, wherein the computer is caused to function such that the class classification means uses a softmax function as the activation function of the output layer of the neural network and outputs a classification probability for each class, the probabilities summing to 1.

The program according to any one of claims 1 to 3, further comprising dimension compression means, placed before the cluster number estimation means, for reducing the dimensionality of the group of multi-dimensional vector data,
wherein the computer is caused to function such that the group of low-dimensional vector data output from the dimension compression means is input to the cluster number estimation means.

The program according to claim 4, wherein the computer is caused to function such that
the dimension compression means is based on t-SNE (t-Distributed Stochastic Neighbor Embedding), and
the group of low-dimensional vector data is a group of two- or three-dimensional vector data.

The program according to any one of claims 1 to 5, wherein the computer is caused to function such that the group of multi-dimensional vector data is a mixture of vector data output from a plurality of sensors.

The program according to any one of claims 1 to 6, further causing the computer to function as teacher data generation means for generating teacher data by assigning, to the multi-dimensional vector data, the label of the class that the class classification means gives the highest classification probability.

An apparatus for classifying a group of multi-dimensional vector data, comprising:
cluster number estimation means for estimating the number of clusters k by clustering the group of multi-dimensional vector data; and
class classification means for inputting multi-dimensional vector data into a neural network whose output layer has k units, k being the number of clusters, and outputting a classification probability for each of the k classes.

A class classification method for an apparatus that receives a group of multi-dimensional vector data,
wherein the apparatus executes:
a first step of estimating the number of clusters k by clustering the group of multi-dimensional vector data; and
a second step of inputting multi-dimensional vector data into a neural network whose output layer has k units, k being the number of clusters, and outputting a classification probability for each of the k classes.