JP6893483B2

JP6893483B2 - Information estimation device and information estimation method

Info

Publication number: JP6893483B2
Application number: JP2018021943A
Authority: JP
Inventors: 仁吾安達
Original assignee: Denso IT Laboratory Inc
Current assignee: Denso IT Laboratory Inc
Priority date: 2018-02-09
Filing date: 2018-02-09
Publication date: 2021-06-23
Anticipated expiration: 2038-02-09
Also published as: JP2019139482A

Description

The present invention relates to an information estimation device and an information estimation method that perform estimation processing using a neural network. In particular, it relates to an information estimation device and an information estimation method that improve on the variational autoencoder, which is a kind of autoencoder.

Compared with other estimators, estimators using a neural network (NN) can take large amounts of information, such as images and sensor signal data, as input and perform estimation on it, and are therefore expected to find application in a variety of fields.

One type of neural network is the autoencoder. An autoencoder is an unsupervised learner built from a neural network. Typically, the input layer has many neurons (the number of neurons corresponds to the number of dimensions), the number of neurons decreases gradually in the subsequent layers, and the layer in the center, which represents the latent space, is the most compressed and has the fewest neurons. After that layer, the number of neurons increases again, until the final output layer has the same number of neurons as the input layer. That is, the input layer and the output layer have the same number of dimensions, and the layer representing the latent space has fewer dimensions than either. The first half, from the input layer to the layer representing the latent space, is called the encoder, and the second half, from the layer representing the latent space to the output layer, is called the decoder.

When unlabeled training data (an n_Xin-dimensional vector x) is input, the encoder first compresses it into data in a latent space with fewer dimensions (an n_z-dimensional vector z, also called a latent variable). In the latent space, the data gather into multiple clusters according to the similarity of the original data. The compressed data z then passes through the decoder, which reconstructs the input x. This is the classical autoencoder. For a fixed input x, the value output by the autoencoder is likewise uniquely determined as a fixed value; that is, the autoencoder is deterministic.
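The encoder and decoder structure described above can be sketched as follows. This is only an illustration: the layer sizes (784, 256, 64, 2), the tanh activation, and the random weights are assumptions and do not come from the patent. The sketch shows that the latent vector has n_z dimensions, that the output has the same number of dimensions as the input, and that the same input always yields the same latent vector.

```python
import numpy as np

def make_layers(sizes, rng):
    """Build random (W, b) pairs for consecutive layer sizes (illustrative weights)."""
    return [(rng.standard_normal((n_out, n_in)) * 0.1, np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(layers, v):
    """Apply each fully connected layer followed by a tanh activation."""
    for W, b in layers:
        v = np.tanh(W @ v + b)
    return v

rng = np.random.default_rng(0)
n_xin, n_z = 784, 2                                   # assumed sizes
encoder = make_layers([n_xin, 256, 64, n_z], rng)     # neuron count shrinks
decoder = make_layers([n_z, 64, 256, n_xin], rng)     # neuron count grows back

x = rng.random(n_xin)          # one fixed input vector
z1 = forward(encoder, x)       # latent vector
z2 = forward(encoder, x)       # second pass over the same input
x_rec = forward(decoder, z1)   # reconstruction, same size as x
```

Because no random element is involved in the forward pass, `z1` and `z2` are identical, which is the deterministic behavior the text attributes to the classical autoencoder.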

In contrast, Non-Patent Document 1 proposes the variational autoencoder (hereinafter abbreviated as VAE), a stochastic autoencoder that includes a probabilistic element, that is, one whose output value changes on every computation even for a fixed input x.

In the classical autoencoder described above, the vector z in the compressed n_z-dimensional latent space is uniquely determined for the input vector x. In a VAE, by contrast, the vector z in the compressed n_z-dimensional latent space is not uniquely determined for the input vector x but is obtained as a vector of random variables following a posterior probability distribution p(z|x). This posterior p(z|x) is represented by, for example, an n_z-dimensional multivariate Gaussian distribution. The theory proposed in Non-Patent Document 1 is described below.

In a VAE, given data x is explained by integrating over all values of the latent factor z that gave rise to it. This is written mathematically as follows.
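The expression referred to here is presumably the marginal likelihood used in Non-Patent Document 1. The following is a reconstruction under that assumption; the factorization into p_θ(x|z) and p_θ(z) is taken from that paper, not from this text.

```latex
p_\theta(x) = \int p_\theta(x \mid z)\, p_\theta(z)\, dz
```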

Here, p_θ denotes a probability whose distribution shape is determined by a parameter θ. The larger the probability of the data x obtained by integrating over all z on the right-hand side, the better the data x is explained.

Given data x, we want the posterior probability distribution p(z|x), which describes how the latent random variable z that caused x is distributed. However, this posterior p(z|x) cannot be computed analytically, so a variational method, for example, is used. That is, assuming there is a proposal function q_φ (a probability distribution whose shape is determined by a parameter φ) that is close to p(z|x), the following relation holds; the proposal function q_φ can be obtained from this relation and used as an approximation of p(z|x).
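Equation (1) is presumably the decomposition of the log-likelihood given in Non-Patent Document 1. The following is a reconstruction under that assumption, chosen so that the left-hand side is the log-likelihood, the first term on the right is the KL divergence to the true posterior, and the second term is L(θ, φ; x), as the surrounding description states.

```latex
\log p_\theta(x) = D_{KL}\bigl(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\bigr) + \mathcal{L}(\theta, \phi; x) \tag{1}
```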

Here, the left-hand side of equation (1) above is the log-likelihood, which expresses how plausibly the given data x can be explained.

D_KL in the first term on the right-hand side of equation (1) denotes the KL divergence, a function that returns a value of zero or greater expressing, as a kind of distance, how close two functions are. To obtain the proposal function q_φ that approximates the posterior p(z|x), one decides what kind of function represents the distribution and then determines the parameters θ and φ of that function. If, for a large amount of data x, the above equation holds with the parameters θ and φ in a near-optimal state, the likelihood log p_θ(x) on the left-hand side should be high because the data is well explained, and the first term D_KL on the right-hand side can be regarded as approaching zero because the proposal function q_φ approaches the unknowable posterior p(z|x).

If the second term on the right-hand side is written as L(θ, φ; x), it can be expressed as the sum of two terms as follows.
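Equation (2) is presumably the lower bound from Non-Patent Document 1. The following is a reconstruction under that assumption, with the regularization term first and the reconstruction term second, matching the order in which the text discusses them.

```latex
\mathcal{L}(\theta, \phi; x) = -D_{KL}\bigl(q_\phi(z \mid x) \,\|\, p_\theta(z)\bigr) + \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr] \tag{2}
```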

The first term of equation (2) above is a regularization term, and the second term expresses whether the input data can be reconstructed at the output. To raise the likelihood log p_θ(x), L(θ, φ; x) must be maximized, which requires maximizing the first and second terms of equation (2). Optimization in training means finding the parameters θ and φ that maximize the objective function L(θ, φ; x) over a large amount of training data x. A neural network, with its capacity to process large amounts of data, is best suited to this and is used as the tool for the parameter optimization.

In the VAE proposed in Non-Patent Document 1, q_φ(z|x) is taken to be an n_z-dimensional multivariate Gaussian distribution, and the parameter φ that determines its shape is computed as two quantities: the mean μ_z of the Gaussian distribution and the variances diag(Σ_z) of the variance-covariance matrix Σ_z, where diag denotes the diagonal entries of a matrix. The remaining off-diagonal part offdiag(Σ_z) is set to zero in Non-Patent Document 1, so the covariance values offdiag(Σ_z) are neither computed nor specified in that VAE. That is, the VAE proposed in Non-Patent Document 1 imposes the following conditions.
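The conditions referred to can be written, as a reconstruction from the description above rather than a copy of the original expression, in the following form. The notation \mathcal{N}(z; \mu, \Sigma) for a Gaussian density is an assumption.

```latex
q_\phi(z \mid x) = \mathcal{N}\bigl(z;\ \mu_z,\ \Sigma_z\bigr), \qquad \mathrm{offdiag}(\Sigma_z) = 0
```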

The parameter φ is computed as the output of the encoder, and the number of neurons in the latent-space layer is n_z × 2. That is, the values of the following n_z × 2 parameters are output from the encoder in order.
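How a 2 × n_z encoder output might be interpreted can be sketched as below. The ordering (all means first, then all variances), the helper names, and the numeric values are assumptions for illustration; the list of parameters in the original is not available here. The sampling step draws z from a Gaussian with the given means and a diagonal covariance.

```python
import numpy as np

def split_encoder_output(phi, n_z):
    """Split 2 * n_z encoder outputs into means and variances (assumed ordering)."""
    phi = np.asarray(phi, dtype=float)
    if phi.shape != (2 * n_z,):
        raise ValueError("expected 2 * n_z encoder outputs")
    return phi[:n_z], phi[n_z:]

def sample_z(mu_z, var_z, rng):
    """Draw one z from N(mu_z, diag(var_z))."""
    return mu_z + np.sqrt(var_z) * rng.standard_normal(mu_z.shape)

rng = np.random.default_rng(0)
phi = np.array([0.5, -1.0, 0.04, 0.25])   # illustrative values for n_z = 2
mu_z, var_z = split_encoder_output(phi, n_z=2)
z = sample_z(mu_z, var_z, rng)
```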

As described above, the optimization must maximize the objective function L(θ, φ; x), which requires maximizing the first term of equation (2), the regularization term −D_KL(q_φ(z|x) || p_θ(z)). Maximizing this term means minimizing D_KL(q_φ(z|x) || p_θ(z)), so the distribution q_φ(z|x) being sought must be as close in shape as possible to the distribution p_θ(z). Here p_θ(z) is the prior distribution of z. According to Non-Patent Document 1, it is computed as a standard Gaussian distribution whose mean μ₀ is a vector of zeros and whose variance Σ₀ is a vector of ones, as in the following equation.
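The standard Gaussian prior described here can be written as follows. This is a reconstruction from the description; writing the unit variances as the identity matrix I is a notational assumption.

```latex
p_\theta(z) = \mathcal{N}\bigl(z;\ \mu_0,\ \Sigma_0\bigr), \qquad \mu_0 = 0, \quad \Sigma_0 = I
```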

From the above, the first term of equation (2), the regularization term, is expressed as follows.
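For a diagonal Gaussian q_φ(z|x) and a standard Gaussian prior, the regularization term has a well-known closed form, −D_KL = (1/2) Σ_j (1 + log σ_j² − μ_j² − σ_j²), which is presumably what the missing expression states. The sketch below evaluates that closed form and cross-checks it against the general KL divergence between two multivariate Gaussians; the function names and test values are illustrative assumptions.

```python
import numpy as np

def neg_kl_diag_gaussian(mu, var):
    """Return -D_KL( N(mu, diag(var)) || N(0, I) ) using the closed form."""
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    return 0.5 * np.sum(1.0 + np.log(var) - mu**2 - var)

def kl_gaussian_full(mu0, S0, mu1, S1):
    """General D_KL( N(mu0, S0) || N(mu1, S1) ), used only as a cross-check."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    d = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - k + logdet1 - logdet0)

mu_z = np.array([0.3, -1.2])    # illustrative encoder means
var_z = np.array([0.5, 2.0])    # illustrative encoder variances
reg = neg_kl_diag_gaussian(mu_z, var_z)
check = -kl_gaussian_full(mu_z, np.diag(var_z), np.zeros(2), np.eye(2))
```

The term is zero when q_φ(z|x) equals the prior and negative otherwise, so maximizing it pulls the encoder distribution toward the standard Gaussian.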

According to Non-Patent Document 1, the other parameter, θ, corresponds to the output of the decoder. The decoder takes a concrete value of z sampled from the probability distribution q_φ(z|x) obtained as described above, that is, the distribution brought as close as possible to the unknowable posterior p(z|x), and reconstructs x from it. The second term of equation (2), the reconstruction term, is the log-likelihood expressing whether the reconstructed x takes the same value as the input data x.

That is, as described above, the value output from the final layer of the decoder is not x itself but the parameter θ that determines the shape of the probability distribution p_θ(x|z) followed by x. If the data x is, say, a black-and-white image, this distribution is taken to be a Bernoulli distribution, the parameter θ that determines the Bernoulli distribution is used to compute the probability p_θ(x|z) of matching the input x, and taking its logarithm gives log[p_θ(x|z)]. The expectation in the second term of equation (2), E_{q_φ(z|x)}[log p_θ(x|z)], is regarded as being computed equivalently by processing the multiple samples of a batch.
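The Bernoulli reconstruction term can be sketched as below, assuming each pixel is binary and the decoder outputs one Bernoulli mean per pixel. The function name, the clipping constant `eps`, and the example values are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def bernoulli_log_likelihood(x, theta, eps=1e-7):
    """Return log p_theta(x|z) for binary pixels x and Bernoulli means theta."""
    x = np.asarray(x, dtype=float)
    # Clip to avoid log(0) when the decoder output saturates.
    theta = np.clip(np.asarray(theta, dtype=float), eps, 1.0 - eps)
    return float(np.sum(x * np.log(theta) + (1.0 - x) * np.log(1.0 - theta)))

x = np.array([1, 0, 1, 1])                                  # input pixels
good = bernoulli_log_likelihood(x, [0.9, 0.1, 0.8, 0.95])   # close reconstruction
bad = bernoulli_log_likelihood(x, [0.5, 0.5, 0.5, 0.5])     # uninformative output
```

A decoder output that matches the input gives a log-likelihood closer to zero, so `good` is larger than `bad`.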

FIG. 1 is a diagram schematically showing an example of a VAE in the prior art. As shown in FIG. 1, the input X (an n_Xin-dimensional vector) passes through an encoder composed of a neural network, and the encoder outputs the mean (n_z dimensions) and the variances (n_z dimensions) of a Gaussian distribution. A concrete value of z is then sampled based on the encoder output and input to a decoder composed of a neural network, which outputs an n_Xout-dimensional vector. The decoder output is optimized to match the input X, and the input and output have the same number of dimensions (n_Xin = n_Xout).

Patent Document 1: International Publication WO2014105866A1

Non-Patent Document 1: "Auto-Encoding Variational Bayes", Diederik P. Kingma, Max Welling: December 20, 2013 (available from https://arxiv.org/abs/1312.6114)

Non-Patent Document 2: "Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models", John R. Hershey and Peder A. Olsen: April 15-20, 2007 (available from http://ieeexplore.ieee.org/document/4218101/)

The VAE proposed in Non-Patent Document 1 has a stochastic element, but the output of the neural network in the latent space is not the value of z itself; it is the set of parameters that determine the shape of the probability distribution of the values z can take. As described above, that VAE treats q_φ(z|x) as an n_z-dimensional multivariate Gaussian distribution, the parameter φ in the latent-space layer consists of n_z means and n_z variances, and all covariance values are set to zero as a simplification.

However, when a designer wants the latent space to follow a more complex distribution, more parameters are needed to determine the shape of that distribution. For example, if the latent-space distribution is a 10-dimensional multivariate Gaussian distribution, the parameters that determine its shape are 10 means, 10 variances, and additionally (10 × 10 − 10) / 2 = 45 covariance values. If the latent-space distribution is a Gaussian mixture or the like, it becomes more complex still.
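The parameter count in this example generalizes to any latent dimensionality n_z. A small sketch, with a hypothetical helper name:

```python
def gaussian_parameter_counts(n_z):
    """Return (means, variances, covariances) needed for a full n_z-dim Gaussian."""
    means = n_z
    variances = n_z
    covariances = (n_z * n_z - n_z) // 2   # distinct off-diagonal entries
    return means, variances, covariances

counts = gaussian_parameter_counts(10)
total = sum(counts)
```

For n_z = 10 this gives 10 means, 10 variances, and 45 covariances, 65 values in total, and the covariance count grows quadratically with n_z.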

To solve the above problem, an object of the present invention is to provide an information estimation device and an information estimation method that realize a new autoencoder having a stochastic element.

To achieve the above object, the present invention provides an information estimation device and an information estimation method in which the output z of the encoder in the latent space is not a set of parameters determining the distribution of z, as in the prior-art VAE, but the value of z itself, as in the classical autoencoder described above, and in which the value of z is not a deterministic value, as in a classical autoencoder, but a random variable sampled from a probability distribution.

To achieve the above object, an information estimation device according to the present invention is, for example, an information estimation device that performs estimation processing using a neural network,
the device comprising an autoencoder calculation unit that includes an autoencoder composed of an encoder and a decoder and that is configured to perform calculation processing sequentially in the encoder and the decoder based on input data input to the autoencoder and to output output data from the autoencoder as a result of the estimation processing,
wherein at least one integrated layer, consisting of a combination of a dropout layer that drops out part of the data and a fully connected layer that performs the weight calculation on the data output from the dropout layer, is provided as the final layer of the encoder, so that the output value in the latent space, which is the output value from the encoder, is a multidimensional random variable vector.

Also to achieve the above object, an information estimation method according to the present invention is, for example, an information estimation method performed by an information estimation device that performs estimation processing using a neural network,
the method comprising an autoencoder calculation step of using an autoencoder composed of an encoder and a decoder to perform calculation processing sequentially in the encoder and the decoder based on input data input to the autoencoder and outputting output data from the autoencoder as a result of the estimation processing,
wherein at least one integrated layer, consisting of a combination of a dropout layer that drops out part of the data and a fully connected layer that performs the weight calculation on the data output from the dropout layer, is provided as the final layer of the encoder, so that the output value in the latent space, which is the output value from the encoder, is made a multidimensional random variable vector.

The present invention realizes a new autoencoder having a stochastic element and has the effect that the probability distribution in the latent space can take an arbitrary shape while the increase in the number of dimensions (the number of neurons) in the latent space is suppressed. In addition, because the shape of the probability distribution in the latent space can be estimated by analytical calculation, the present invention has the effect that the separation of input data in the latent space can be evaluated more accurately.

FIG. 1 is a diagram schematically showing an example of a VAE in the prior art.

FIG. 2 is a diagram schematically showing a first example of the autoencoder according to the first embodiment of the present invention.

FIG. 3 is a diagram showing details of the DF layer for the first example of the autoencoder according to the first embodiment of the present invention.

FIG. 4 is a diagram showing a second example of the autoencoder according to the first embodiment of the present invention.

FIG. 5 is a diagram showing details of the DF layers for the second example of the autoencoder according to the first embodiment of the present invention.

FIG. 6 is a block diagram showing an example of the configuration of an information estimation device that includes the calculation processing function of the autoencoder according to the first embodiment of the present invention.

FIG. 7 is a flowchart showing an example of the calculation processing in the first embodiment of the present invention.

FIG. 8(a) is a diagram for explaining a display method that shows the ellipse of the σ contour representing the width of a Gaussian distribution together with a scatter plot of Monte Carlo sampled points distributed according to that distribution, and FIG. 8(b) is a diagram for explaining a display method that shows the ellipse of the σ contour representing the width of a Gaussian distribution together with the center of that Gaussian ellipse, that is, the point of the mean value.

FIG. 9 is a diagram showing the distribution of the values of z in the latent space when the number of dimensions of the latent space is n_z = 2, obtained in an experiment using the information estimation device according to the first embodiment of the present invention and drawn with the display method of FIG. 8(a).

FIG. 10 is a diagram showing the distribution of the values of z in the latent space when the number of dimensions of the latent space is n_z = 2, obtained in an experiment using the information estimation device according to the first embodiment of the present invention and drawn with the display method of FIG. 8(b).

FIG. 11(a) is a diagram created to evaluate the results of an experiment using the information estimation device according to the first embodiment of the present invention, showing the input images as reconstructed by the autoencoder before training, and FIG. 11(b) is a diagram created for the same purpose, showing the input images as reconstructed by the autoencoder after training.

FIG. 12 is a diagram showing experimental results for the case where the input image is the image of the letter "H" at the upper right, with the posterior probability distribution (Gaussian distribution) of FIG. 9 extended to a Gaussian mixture distribution according to the second embodiment of the present invention; the analytically calculated Gaussian mixture distribution is shown as contour lines, overlaid with a Monte Carlo scatter plot of the distribution.

FIG. 13 is another diagram showing experimental results for the case where the input image is the image of the letter "H" at the upper right, with the posterior probability distribution (Gaussian distribution) of FIG. 9 extended to a Gaussian mixture distribution according to the second embodiment of the present invention; the analytically calculated Gaussian mixture distribution is shown as contour lines, overlaid with a Monte Carlo scatter plot of the distribution.

The first and second embodiments of the present invention are described below with reference to the drawings.

<First Embodiment>
In the first embodiment of the present invention, the output z of the autoencoder in the latent space is not a set of parameters determining the distribution of z but the value of z itself, as in the classical autoencoder described above, and the value of z is not a deterministic value, as in a classical autoencoder, but a random variable sampled from a probability distribution.

Specifically, in the first embodiment of the present invention, a dropout layer is added within the neural network that constitutes the encoder, so that the value output from the encoder for fixed input data becomes a random variable. Furthermore, by analytically calculating how the Bernoulli distribution introduced by dropout propagates through the neural network, the shape of the distribution of that random variable is calculated and used in the regularization calculation, as in the prior-art VAE.

The structure of the autoencoder according to the embodiment of the present invention is described below with reference to FIGS. 2 to 5. FIG. 2 is a diagram schematically showing a first example of the autoencoder according to the first embodiment of the present invention, and FIG. 3 is a diagram showing details of the DF layer for that first example. FIG. 4 is a diagram showing a second example of the autoencoder according to the first embodiment of the present invention, and FIG. 5 is a diagram showing details of the DF layers for that second example. In the example shown in FIGS. 2 and 3 the encoder has one dropout layer, and in the example shown in FIGS. 4 and 5 the encoder has two dropout layers.

In the autoencoder according to the first embodiment of the present invention, the encoder of a classical autoencoder is provided with a dropout layer, which drops part of its input data to produce randomness, and a fully connected (FC) layer, which performs the weight calculation. The distribution of the values output from the dropout layer and the FC layer is then calculated analytically and used as the regularization condition. For brevity, this specification calls the integrated layer combining a dropout layer and an FC layer a DF layer and describes the calculation in the dropout layer and the calculation in the FC layer as being performed together.
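A DF layer as described here can be sketched as one stochastic forward pass. This is a minimal illustration under stated assumptions: each input element is dropped independently, surviving elements are not rescaled, and the sizes, weights, and function name are hypothetical rather than taken from the patent.

```python
import numpy as np

def df_layer_sample(x_in, W, b, p_drop, rng):
    """One stochastic pass: drop inputs with probability p_drop, then apply FC."""
    keep = rng.random(x_in.shape) >= p_drop   # True where the element survives
    return W @ (x_in * keep) + b

rng = np.random.default_rng(0)
n_h, n_z = 8, 2                       # assumed layer sizes
W = rng.standard_normal((n_z, n_h))   # illustrative FC weights
b = rng.standard_normal(n_z)          # illustrative FC biases
h = rng.standard_normal(n_h)          # fixed input to the DF layer

# Repeated passes over the same fixed input give different outputs z.
samples = np.array([df_layer_sample(h, W, b, 0.5, rng) for _ in range(1000)])
```

With a dropout rate of zero the layer reduces to an ordinary FC layer and the output is again deterministic.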

First, the case where the encoder has one dropout layer is described; this case is illustrated in FIG. 2. In the conventional VAE shown in FIG. 1, the number of dimensions of the value in the latent space is the number of parameters of the probability distribution of z, whereas in the autoencoder shown in FIG. 2, according to the first embodiment of the present invention, it is the number of dimensions n_z of z itself.

FIG. 3 shows the DF1 layer of the encoder for the case where the encoder has one dropout layer; it is an extract of the dropout layer and the FC layer contained in the encoder of FIG. 2. The input value Xin^DF1 to the DF1 layer in FIG. 3 is a fixed value, and the output Xout^DF1 is a random variable as a result of the dropout layer. The probability distribution of the output Xout^DF1 can be calculated using, for example, the calculation method proposed in Patent Document 1. That calculation method is described below.

Let the input to the DF1 layer be Xin^DF1 and the output be Xout^DF1, and let p_Drop^DF1 be the dropout rate (the probability of randomly dropping data) preset in the dropout layer of the DF1 layer. Let W_i,j^DF1 be the weights and b_i^DF1 the biases preset in the FC layer of the DF1 layer, where the indices i and j are integers satisfying 1 ≤ i ≤ n_Xout^DF1 and 1 ≤ j ≤ n_Xin^DF1. In this specification, the notation n_Xin^DF1 means that the subscript of n is Xin^DF1, and the notation n_Xout^DF1 means that the subscript of n is Xout^DF1.

The input Xin^DF1 to the DF1 layer is a fixed value: an n_Xin^DF1-dimensional vector of constants, expressed as follows.

The output Xout^DF1 from the DF1 layer, on the other hand, is expressed as follows.

The output Xout^DF1 from the DF1 layer is an n_Xout^DF1-dimensional vector whose i-th element is as follows.

Here, because of the dropout in the dropout layer, each term W_i,j^DF1 Xin^DF1_j (1 ≦ j ≦ n_Xin^DF1) on the right side randomly vanishes (becomes zero) with probability p_Drop^DF1. The left side Xout^DF1_i, being the sum of these terms, can therefore be treated and calculated as a "sampling sum". Accordingly, the output Xout^DF1 is a random variable and is assumed, for example, to follow the n_Xout^DF1-dimensional multivariate Gaussian distribution below.

Here, μ_out^DF1 is an n_Xout^DF1-dimensional vector of mean values, and Σ_out^DF1 is an n_Xout^DF1 × n_Xout^DF1 variance-covariance matrix. The mean μ_out^DF1 and the variance-covariance matrix Σ_out^DF1 are obtained from the following equations.
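
As a minimal sketch of this kind of calculation, assume that the dropout layer zeroes each input element independently with probability p_Drop^DF1 and does not rescale the elements that are kept. This is an assumption of the sketch; the method of Patent Document 1 may differ in detail. Under that assumption the first two moments of the DF1 output have a closed form. The function name `dropout_fc_moments` and its arguments are hypothetical.

```python
import numpy as np

def dropout_fc_moments(x_in, W, b, p_drop):
    """Mean and covariance of W @ (mask * x_in) + b for a fixed input x_in.

    Assumes mask_j ~ Bernoulli(1 - p_drop) independently per input element,
    with no rescaling of the kept elements (an assumption of this sketch).
    """
    x_in = np.asarray(x_in, dtype=float)
    W = np.asarray(W, dtype=float)
    b = np.asarray(b, dtype=float)
    keep = 1.0 - p_drop
    # E[mask_j] = keep, so every term W[i, j] * x_in[j] is scaled by keep.
    mean = keep * (W @ x_in) + b
    # Var[mask_j] = keep * p_drop. The same mask is shared by all outputs i,
    # which is what produces non-zero off-diagonal covariance.
    scaled = W * x_in  # scaled[i, j] = W[i, j] * x_in[j]
    cov = keep * p_drop * (scaled @ scaled.T)
    return mean, cov
```

The off-diagonal entries of `cov` are generally non-zero, which corresponds to the covariance values that this embodiment obtains without adding output neurons.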

The output of the DF1 layer in FIG. 3 is the output of the encoder of the autoencoder in FIG. 2 and corresponds to the probability distribution q_φ(z|x) of the value z in the latent space output by the encoder. In the notation, Xout^DF1 can therefore be replaced with z, μ_out^DF1 with μ_z, Σ_out^DF1 with Σ_z, n_Xin^DF1 with n_h, and n_Xout^DF1 with n_z, and the value z in the latent space output by the encoder is expressed as the following multivariate Gaussian distribution.

Here, μ_z is an n_z-dimensional vector and Σ_z is an n_z × n_z variance-covariance matrix.

Next, the case where the encoder has two dropout layers is described. FIG. 4 shows this more complex case. FIG. 5 shows the DF1 layer, the ReLU (Rectified Linear Unit) layer, and the DF2 layer of the encoder in this case. FIG. 5 is an extract of the two dropout layers and FC layers included in the encoder of FIG. 4, together with the ReLU layer between them. The calculation method for the case of two DF layers is described below.

In FIG. 5, two DF layers, the DF1 layer and the DF2 layer, are provided on either side of the ReLU layer. The input and output of the first layer, DF1, are as described above. A nonlinear function such as the ReLU layer between the DF1 and DF2 layers can be calculated, for example, by the multivariate Gaussian approximation described in Patent Document 1, or by a simpler approximation that judges whether the Gaussian function lies in the negative region or the positive region (the calculation method described in the specification and drawings of Japanese Patent Application No. 2017-196740, which names the present inventor as inventor and was unpublished at the time of filing of the present application). The present invention is not limited to these calculation methods.

The input and output of the second layer, DF2, are described next. Let Xin^DF2 be the input to the DF2 layer, Xout^DF2 its output, and p_Drop^DF2 the dropout rate of the DF2 layer. Let W_i,j^DF2 be the weights and b_i^DF2 the biases of the FC layer of the DF2 layer, where the indices i and j are integers satisfying 1 ≦ i ≦ n_Xout^DF2 and 1 ≦ j ≦ n_Xin^DF2. In this specification, the notation n_Xin^DF2 means that the subscript of n is Xin^DF2, and the notation n_Xout^DF2 means that the subscript of n is Xout^DF2.

Both the input Xin^DF2 and the output Xout^DF2 of the DF2 layer are random variables that follow multivariate Gaussian distributions, expressed as follows.

Here, μ_in^DF2 is an n_Xin^DF2-dimensional vector, Σ_in^DF2 is an n_Xin^DF2 × n_Xin^DF2 variance-covariance matrix, μ_out^DF2 is an n_Xout^DF2-dimensional vector, and Σ_out^DF2 is an n_Xout^DF2 × n_Xout^DF2 variance-covariance matrix.

The mean can be calculated as follows.

The variance-covariance matrix can be calculated as follows.

The first term on the right side of the above equation can be calculated as follows.
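
To illustrate how moments can be propagated through a DF layer whose input is itself Gaussian, the sketch below uses a standard second-moment calculation. It is not necessarily the exact form used in this specification. It assumes that the dropout mask is independent of the input, that each element is dropped independently with probability p_Drop^DF2, and that kept elements are not rescaled. The function name and arguments are hypothetical.

```python
import numpy as np

def dropout_fc_moments_gaussian(mu_in, sigma_in, W, b, p_drop):
    """Mean and covariance of W @ (mask * x) + b for x ~ N(mu_in, sigma_in).

    Assumes mask_j ~ Bernoulli(1 - p_drop), independent of x and of each
    other, with no rescaling (assumptions of this sketch).
    """
    mu_in = np.asarray(mu_in, dtype=float)
    sigma_in = np.asarray(sigma_in, dtype=float)
    W = np.asarray(W, dtype=float)
    b = np.asarray(b, dtype=float)
    keep = 1.0 - p_drop
    # Moments of the masked input y = mask * x.
    mu_y = keep * mu_in
    # Off-diagonal: keep^2 * Sigma_jk. The diagonal gains extra variance
    # from the mask, proportional to E[x_j^2] = Sigma_jj + mu_j^2.
    second_moment_diag = np.diag(sigma_in) + mu_in ** 2
    sigma_y = keep ** 2 * sigma_in + np.diag(keep * p_drop * second_moment_diag)
    # The FC layer is linear, so it maps the moments exactly.
    mu_out = W @ mu_y + b
    sigma_out = W @ sigma_y @ W.T
    return mu_out, sigma_out
```

With `sigma_in` set to zero the result reduces to the fixed-input case of the DF1 layer, and with `p_drop = 0` it reduces to the ordinary linear transform of a Gaussian.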

The output of the DF2 layer in FIG. 5 is the output of the encoder of the autoencoder in FIG. 4 and corresponds to the probability distribution q_φ(z|x) of the value z in the latent space output by the encoder. As in the case of one dropout layer, Xout^DF2 can therefore be replaced with z, μ_out^DF2 with μ_z, Σ_out^DF2 with Σ_z, n_Xin^DF2 with n_h, and n_Xout^DF2 with n_z, and the value z in the latent space output by the encoder is expressed as the following multivariate Gaussian distribution.

Although the case of two dropout layers is described here, three or more dropout layers may be present. For example, the output of the DF2 layer may be input to a further (third) dropout layer, in which case the output of that layer can be obtained by the same calculation method as for the DF2 layer described above.

As described above, in the first embodiment of the present invention, fixed-value input data is converted into a random variable by dropout, which gives rise to a probability distribution, and that distribution is calculated analytically. As with the conventional VAE, the result of this calculation is used in the regularization condition. That is, a condition is imposed so that the probability distribution q_φ(z|x) given by the expression below keeps the same shape as, and does not differ too much from, the prior distribution p_θ(z) given by the expression below.

For example, to evaluate whether the probability distribution q_φ(z|x) and the prior distribution p_θ(z) keep the same shape, the KL divergence between multivariate Gaussian distributions is used as described earlier, and a cost function is set that minimizes the distance between the two multivariate Gaussian distributions. The expression is shown below.
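
The KL divergence between two multivariate Gaussian distributions has a well-known closed form, and the sketch below evaluates it for full covariance matrices. It takes the divergence in the direction KL(q || p), as is usual for a VAE; whether this specification uses exactly this direction and weighting is an assumption. The function name is hypothetical, and both covariance matrices are assumed to be non-singular.

```python
import numpy as np

def kl_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q) || N(mu_p, sigma_p) ) for full covariance matrices."""
    mu_q = np.asarray(mu_q, dtype=float)
    mu_p = np.asarray(mu_p, dtype=float)
    sigma_q = np.asarray(sigma_q, dtype=float)
    sigma_p = np.asarray(sigma_p, dtype=float)
    k = mu_q.shape[0]
    diff = mu_p - mu_q
    # Solve linear systems instead of forming the inverse of sigma_p explicitly.
    trace_term = np.trace(np.linalg.solve(sigma_p, sigma_q))
    maha_term = diff @ np.linalg.solve(sigma_p, diff)
    # slogdet is numerically more stable than log(det(...)).
    _, logdet_q = np.linalg.slogdet(sigma_q)
    _, logdet_p = np.linalg.slogdet(sigma_p)
    return 0.5 * (trace_term + maha_term - k + logdet_p - logdet_q)
```

Because the off-diagonal entries of `sigma_q` enter both the trace term and the log-determinant term, a full covariance matrix calculated for z contributes to the regularization cost.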

The calculation method of the first embodiment of the present invention differs greatly from the prior-art calculation method disclosed in Non-Patent Document 1 in that it calculates covariance values. In Non-Patent Document 1, covariance values are not obtained and are set to zero, or else obtaining them requires a further increase in the number of neurons. In the first embodiment of the present invention, by contrast, covariance values are also calculated by the analytical calculation described above, with fewer neurons in the encoder.

Further, with the calculation method of the first embodiment of the present invention, whether the output of the autoencoder reproduces the input data can be evaluated more simply than with the prior-art VAE. In the prior art, the encoder outputs the parameters of the probability distribution of z, so obtaining a value to input to the decoder requires constructing that distribution and then sampling a value of z from it. In the first embodiment of the present invention, the encoder output is itself the value of z, so it can be used directly as the decoder input. The processing in the decoder after the value of z is obtained is the same in the first embodiment as in the prior art.

In the first embodiment of the present invention, the dropout rate is used to express the probability distribution of z generated by the encoder. Therefore, when there is one dropout layer, for example, the dropout rate is preferably set to a relatively large value (for example, 0.7 or more).

Next, an information estimation device capable of executing the processing of the first embodiment of the present invention is described. FIG. 6 is a block diagram showing an example of the configuration of the information estimation device according to the first embodiment. The information estimation device 10 of FIG. 6 is an estimator that performs estimation processing using a neural network, and includes an autoencoder calculation unit 20, an encoder output distribution shape calculation unit 30, a cost function calculation unit 40, and a parameter optimization calculation unit 50.

The block diagram in FIG. 6 merely represents functions related to the present invention; an actual implementation may be realized in hardware, software, firmware, or any combination thereof. Functions implemented in software are stored on any computer-readable medium as one or more instructions or code, and these instructions or code can be executed by a hardware-based processing unit such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). The functions related to the present invention may also be realized by various devices, including an IC (Integrated Circuit) or an IC chipset.

The autoencoder calculation unit 20 has an autoencoder that includes an encoder and a decoder configured as neural networks, and has the function of processing input data X through the encoder and the decoder and outputting output data X. As described with reference to FIGS. 2 to 5, the autoencoder used by the autoencoder calculation unit 20 has one or more dropout layers in the encoder, and part of the data is randomly dropped in each dropout layer. This makes the value z output by the encoder of the autoencoder (the output in the latent space) a random variable.

The encoder output distribution shape calculation unit 30 has the function of analytically calculating the shape of the probability distribution that the input data x takes on as a result of dropout in the encoder. For example, it can calculate the distribution shape of the output z in the latent space from the input data x, the dropout rate of the dropout layer, and the parameters (for example, the weights and biases of the FC layer).

The cost function calculation unit 40 evaluates whether the regularization condition is satisfied, based on the distribution shape calculated by the encoder output distribution shape calculation unit 30 (the distribution shape of the output z in the latent space produced by dropout). It also calculates how similar the output x computed by the autoencoder calculation unit 20 is to the input x, and combines these two results to calculate the value of the overall cost function.

The parameter optimization calculation unit 50 has the function of calculating the values to which the weights and biases referenced by the autoencoder calculation unit 20 should be set so that the value of the cost function calculated by the cost function calculation unit 40 is optimized. It calculates the parameters (weights and biases) that minimize the value of the cost function, and the resulting parameters are supplied to the autoencoder calculation unit 20 to update the parameters of the autoencoder.

In the information estimation device 10 configured as described above, the optimization is repeated over a large amount of input data X so that the autoencoder yields an optimal solution.

Next, an example of the processing in the information estimation device 10 of FIG. 6 is described with reference to FIG. 7. FIG. 7 is a flowchart showing an example of the processing of the information estimation device according to the first embodiment of the present invention.

In the flowchart of FIG. 7, the autoencoder calculation unit 20 first initializes the parameters (weights and biases) of the autoencoder (step S101). When training data X is supplied as the input X of the autoencoder (step S102), the autoencoder calculation unit 20 calculates the value z in the latent space in the encoder of the autoencoder (step S103).

The encoder output distribution shape calculation unit 30 calculates the distribution shape of the value z in the latent space from the dropout rate, the input data X, and the parameters (weights and biases) (step S104). Information on the distribution shape of the value z in the latent space calculated by the encoder output distribution shape calculation unit 30 is supplied to the cost function calculation unit 40.

The autoencoder calculation unit 20 then uses the value z in the latent space to calculate the output X of the decoder of the autoencoder (step S105). The decoder output X calculated by the autoencoder calculation unit 20 is supplied to the cost function calculation unit 40.

The cost function calculation unit 40 evaluates whether the regularization condition is satisfied, based on the information on the distribution shape of the value z in the latent space, calculates how similar the output X is to the input X, and combines these two results to calculate the value of the overall cost function (step S106).

The parameter optimization calculation unit 50 calculates the parameters (weights and biases) that minimize the value of the cost function calculated by the cost function calculation unit 40, and the parameters of the autoencoder in the autoencoder calculation unit 20 are updated based on the result (step S107).

If new, unprocessed training data X exists ("Yes" in step S108), the processing returns to step S102 and the same processing (steps S103 to S107) is executed for the new training data X. In this way, steps S103 to S107 are executed repeatedly over a large amount of training data X. When all the training data X has been processed and no new, unprocessed training data X remains ("No" in step S108), the processing ends.

Next, experimental results obtained when training (optimization) was actually performed with the information estimation device of the first embodiment of the present invention are presented. The experiment described below uses the autoencoder shown in FIGS. 2 and 3, with one dropout layer in the encoder. The dimensionality n_z of the value z in the latent space is set to n_z = 2. The autoencoder is trained on MNIST data (an image set of handwritten digits 0 to 9), which is commonly used in the technical field of the present invention, so that it reconstructs the input MNIST data at its output.

The optimization algorithm is a root mean square (RMS) method, and the weights and biases of the autoencoder are calculated with a learning rate of 0.001. The prior distribution described above is calculated as follows.

Of course, the off-diagonal elements of the variance-covariance matrix, that is, the covariance values, can also be set to non-zero values so that the prior has a positive or negative correlation.

FIGS. 9 and 10 show the distribution of the values of z in the latent space, obtained in the experiment using the information estimation device of the first embodiment of the present invention with latent dimensionality n_z = 2. A two-dimensional Gaussian distribution can be visualized in two ways. The first, shown in FIG. 8(a), draws an ellipse along the σ contour representing the width of the Gaussian distribution, together with a scatter plot of points sampled from that distribution in Monte Carlo fashion (that is, by repeating trials many times). The second, shown in FIG. 8(b), draws the σ-contour ellipse together with the center of the ellipse, that is, the mean point. FIG. 9 presents the experimental results in the style of FIG. 8(a), and FIG. 10 presents them in the style of FIG. 8(b).
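
A σ-contour ellipse of this kind can be derived from a 2 × 2 variance-covariance matrix by eigendecomposition. The sketch below is one common way to do this and is not taken from the specification; the function name `sigma_ellipse` and the parameter `n_std` are hypothetical.

```python
import numpy as np

def sigma_ellipse(sigma, n_std=1.0):
    """Semi-axes and orientation of the n_std-sigma contour of a 2-D Gaussian."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(sigma, dtype=float))
    # eigh returns eigenvalues in ascending order, so index 1 is the major axis.
    minor, major = n_std * np.sqrt(eigvals)
    vx, vy = eigvecs[:, 1]
    # Orientation of the major axis in radians, folded into [0, pi).
    angle = np.arctan2(vy, vx) % np.pi
    return major, minor, angle
```

A non-zero covariance value shows up as a tilted ellipse, which is why the off-diagonal terms are needed to draw shapes like those in FIG. 8.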

The results in FIGS. 9 and 10 show the distribution of the values of z in the latent space after 5000 optimization (training) iterations on MNIST data, with 400 points sampled in Monte Carlo fashion. Each input image of a handwritten digit 0 to 9 from the MNIST data has one distribution of z values (one ellipse) in the latent space. In FIGS. 9 and 10, the z values in the latent space are shown in a different color for each handwritten digit (0 to 9) of the corresponding MNIST image.

In the technical field of the present invention, the values z in the latent space of, for example, a VAE are commonly displayed color-coded according to the handwritten digits 0 to 9 of the MNIST data. FIGS. 9 and 10 follow this convention so that those skilled in the art can readily understand them. They are originally color drawings, but colors are difficult to express in monochrome drawings. As a rough guide to the digits and their colors in FIGS. 9 and 10, the values z in the latent space are red for the digit 0, green for 1, blue for 2, yellow for 3, light blue for 4, purple for 5, orange for 6, pink for 7, gray for 8, and black for 9. Although the description is not exact, relative to the center of FIGS. 9 and 10 the points form clusters that spread in the following directions: red at 1 o'clock, green at 9 o'clock, blue at 12 o'clock, yellow at 5 o'clock, light blue at 5 o'clock, purple at 6 o'clock, orange at 5 o'clock, pink at 6 o'clock, gray at 11 o'clock, and black at 4 o'clock. Thus, in FIGS. 9 and 10, points of the same color, that is, the same handwritten digit, form clusters that spread within the two-dimensional latent space. This shows that the input MNIST data is automatically classified by handwritten digit (0 to 9) through unsupervised learning, without ground-truth labels.

In FIG. 9, for example, the posterior distribution q_φ(z|x) corresponding to each handwritten-digit input is drawn as an ellipse, based on the parameters (mean and variance-covariance values) of the Gaussian distribution of z in the latent space obtained by the analytical calculation of the first embodiment of the present invention. In addition, to verify visually that the analytically obtained posterior distributions (ellipses) are correct, 400 points generated stochastically by dropout in Monte Carlo fashion are plotted as a scatter plot for each ellipse. This was done to confirm that the analytically obtained ellipses do capture the probability distribution produced by dropout; in actual use, the sampling computation needed to draw these individual points is unnecessary.
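
A check of this kind can be sketched as follows: draw 400 dropout samples of a DF-layer output for one fixed input and compare their empirical mean and covariance with closed-form values. The layer sizes, the random weights, and the no-rescaling dropout model are all assumptions of this sketch, not values from the experiment.

```python
import numpy as np

def sample_dropout_fc(x_in, W, b, p_drop, n_samples, rng):
    """Draw outputs of W @ (mask * x_in) + b with a fresh dropout mask per sample."""
    mask = rng.random((n_samples, x_in.shape[0])) >= p_drop  # True = kept
    return (mask * x_in) @ W.T + b

rng = np.random.default_rng(0)
n_in, n_z, p_drop = 64, 2, 0.7           # hypothetical sizes and dropout rate
x_in = rng.random(n_in)                  # stand-in for a fixed hidden activation
W = rng.normal(size=(n_z, n_in))         # stand-in for trained weights
b = np.zeros(n_z)

# Monte Carlo estimate from 400 samples, as in the scatter plots.
samples = sample_dropout_fc(x_in, W, b, p_drop, 400, rng)
empirical_mean = samples.mean(axis=0)
empirical_cov = np.cov(samples, rowvar=False)

# Closed-form moments under the same assumptions, for comparison.
keep = 1.0 - p_drop
analytic_mean = keep * (W @ x_in) + b
analytic_cov = keep * p_drop * ((W * x_in) @ (W * x_in).T)
```

With only 400 samples the empirical values fluctuate around the analytic ones; the agreement tightens as the number of samples grows.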

In the prior-art VAE disclosed in Non-Patent Document 1, by contrast, what can be calculated for the latent space at the center of the autoencoder is, as described with reference to FIG. 1, not the value of z itself but the parameters of the distribution of z. The prior-art VAE therefore cannot directly draw a scatter plot of z values like those in FIGS. 9 and 10. Moreover, because the prior-art VAE does not calculate covariance values, it cannot draw the shape of the probability distribution, which is known only when the mean, the variance, and the covariance are all used, that is, the elliptical shapes shown in FIGS. 9 and 10. Consequently, with the prior-art VAE it is not possible to see whether the actual individual z values for different input handwritten-digit images overlap in the latent space or are cleanly separated.

If distributions like those in FIGS. 9 and 10 were to be displayed using the results of a conventional VAE, the network would need outputs not only for the mean μ_z and the variance diag(Σ_z) but also for the covariance values offdiag(Σ_z) in the latent space as parameters of the distribution of z. The weights would have to be trained with these outputs, and samples would then have to be drawn from the resulting distribution after training and displayed as a scatter plot. In other words, calculating covariance values with a conventional VAE requires more parameters to determine the distribution shape and hence a more complex structure.

FIGS. 11(a) and 11(b) were created to evaluate the experimental results obtained with the information estimation device of the first embodiment of the present invention. In FIGS. 11(a) and 11(b), the two-dimensional latent space is sampled on a 20 × 20 grid, the value at each grid point is decoded into a handwritten-digit image by the decoder, and the resulting images are plotted side by side in positions that reflect their grid positions. FIG. 11(a) shows the output obtained when the number of optimization (training) iterations of the autoencoder is zero (that is, before training), and FIG. 11(b) shows the output obtained at the 5000th iteration (that is, after training).
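
A grid plot of this kind can be assembled as in the sketch below. The `decode` argument stands for a trained decoder that maps a 2-D latent point to a flattened image; it is a hypothetical callable, and the latent range `z_range` and the 28 × 28 image size are assumptions, since the sampled range is not stated here.

```python
import numpy as np

def decode_latent_grid(decode, grid_size=20, z_range=(-2.0, 2.0), image_shape=(28, 28)):
    """Decode a grid_size x grid_size lattice of 2-D latent points into one tiled image."""
    h, w = image_shape
    axis = np.linspace(z_range[0], z_range[1], grid_size)
    canvas = np.zeros((grid_size * h, grid_size * w))
    for row, z2 in enumerate(axis[::-1]):      # top row holds the largest z2
        for col, z1 in enumerate(axis):
            image = np.asarray(decode(np.array([z1, z2]))).reshape(h, w)
            canvas[row * h:(row + 1) * h, col * w:(col + 1) * w] = image
    return canvas
```

The returned array can be shown as a single grayscale image, so that neighboring tiles correspond to neighboring points in the latent space.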

When the number of optimization iterations is zero, the output of the autoencoder does not reconstruct the input handwritten-digit images and is merely random noise, as shown in FIG. 11(a). At the 5000th iteration, by contrast, the output of the autoencoder reconstructs the input handwritten-digit images, as shown in FIG. 11(b). Digits with similar shapes also lie in similar places in the latent space, a result comparable to that of the prior-art VAE.

<Second embodiment>
Next, a second embodiment of the present invention will be described. In the first embodiment described above, the calculation assumes that the probability distribution q_φ(z|x) of the value of z in the latent space is a multivariate Gaussian distribution. However, if the xin^DF_j W_{i,j}^DF terms used to calculate the output Xout^DF of the DF layer include a term whose value is exceptionally large compared with the others, the approximation of the first embodiment, in which the output Xout^DF of the DF layer is treated as a multivariate Gaussian distribution, no longer holds. In that case, as described in Patent Document 1, each such outlying xin^DF_j W_{i,j}^DF term, called a peak term, is handled by considering separately the case in which it is dropped out and the case in which it is not. The peak term is then regarded not as a random variable but as a constant under each conditional probability, and within each case the output can be calculated as a multivariate Gaussian distribution as in the first embodiment. Because the result is a multivariate Gaussian distribution under the conditional probability of each of several cases, the output Xout^DF of the DF layer follows a multivariate Gaussian "mixture" distribution.

In the first embodiment described above, the term corresponding to the weight calculation for the output Xout^DF of the DF layer was written as W_{i,j}^DF Xin^DF_j, whereas in the second embodiment it is written as xin^DF_j W_{i,j}^DF. The notation differs, but both denote the same term.

For a DF layer consisting of a dropout layer and a fully connected layer, the i-th element Xout^DF_i of the output vector is the sum of the products of the weights W and the inputs Xin^DF, plus the bias term b_i^DF, and is expressed by the following equation.
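The equation itself is not reproduced in this text. As a hedged reading of the sentence above, the output element can be written as below. The dropout indicator δ_j is notation assumed here for illustration and does not appear in the source; it reflects the later statement that each xW term either vanishes or remains depending on dropout, with p_Drop^DF the dropout rate of the DF layer.

```latex
X^{out,DF}_i \;=\; \sum_{j=1}^{n^{DF}_{Xin}} \delta_j \, x^{in,DF}_j \, W^{DF}_{i,j} \;+\; b^{DF}_i,
\qquad \delta_j \in \{0, 1\}, \quad P(\delta_j = 0) = p^{DF}_{Drop}
```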

If one of these terms is a peak term (j = peak) whose absolute value is exceptionally large compared with the other terms, that is, if the following relation holds, the output follows a Gaussian mixture distribution in which two Gaussian distributions are mixed.

The inequality sign "≫" in the above relation means that the value on the left-hand side is exceptionally larger than the value on the right-hand side.

Below, as a more general case, it is explained how the probability distribution of the output vector Xout^DF from a DF layer (for example, the DF1 layer in FIG. 3) is calculated as a multivariate Gaussian mixture distribution.

Exactly as in the first embodiment, the n_Xout^DF-dimensional output vector Xout^DF is a random variable vector with n_Xout^DF elements, and its i-th element (1 ≤ i ≤ n_Xout^DF) is denoted Xout^DF_i. Each element Xout^DF_i is an expression containing n_Xin^DF xW terms indexed by j (1 ≤ j ≤ n_Xin^DF), as in the following equation.

Here, the peak term (j = peak) mentioned above is not an xW term that is exceptionally large within one particular i-th row (Xout^DF_i). Rather, it is the xW term that deviates most across all rows in the range 1 ≤ i ≤ n_Xout^DF for a common index j; that is, it refers to the j-th column. Consequently, the peak term cannot be determined from a single row specified by one index i. The column of the peak term (j = peak) must be found by examining the indices i of all rows, for example by the following procedure.

First, for all n_Xin^DF columns, a per-column accumulator PeakScore_j (1 ≤ j ≤ n_Xin^DF) representing the degree of deviation is prepared, and its initial value is set to zero as follows.

Next, the peak term in a given i-th row is sought. To do so, the mean xWMean_i of all xW_j terms (1 ≤ j ≤ n_Xin^DF) in the i-th row is calculated.

The right-hand side means that the mean of the xW_j terms over all indices j is calculated for a given i-th row. Then, for each xW_j term (1 ≤ j ≤ n_Xin^DF) in that i-th row, a value xWDeviation_{i,j} indicating how far the term deviates from the mean is calculated. This value is calculated, for example, as the absolute value of the difference from the mean, as in the following equation.

This yields a score (degree of deviation) indicating how far the j-th term xW_j in a given i-th row deviates from the mean. The above calculation is performed for every row (every index i), and the scores are accumulated for each index j. For example, the value of xWDeviation_{i,j} is added to the per-column accumulator PeakScore_j described above, as follows.

By repeating the above calculation for every row (every index i: 1 ≤ i ≤ n_Xout^DF) and updating PeakScore_j each time, the degree of deviation of each column (each index j) is finally obtained. The resulting values PeakScore_j (1 ≤ j ≤ n_Xin^DF) are then sorted in descending order, and a predetermined number (for example, K) of indices j are recorded, starting from the largest PeakScore_j. In this way, K indices j (j_{k=1}, j_{k=2}, ..., j_{k=K}) are identified as candidate columns for the peak terms xW_j.
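The column-scoring procedure above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `find_peak_columns`, the nested-list data layout, and the zero-based indices are assumptions made for the example, and the deviation measure is the absolute difference from the row mean that the text gives as one possibility.

```python
def find_peak_columns(x_in, weights, k):
    """Return the k column indices with the largest accumulated deviation.

    x_in    : list of n_in input values (xin_j)
    weights : n_out x n_in nested list, weights[i][j] = W_{i,j}
    k       : number of peak-term candidates to keep (K in the text)
    Indices are zero-based here, unlike the one-based indices in the text.
    """
    n_in = len(x_in)
    peak_score = [0.0] * n_in                      # PeakScore_j, initialised to zero
    for row in weights:                            # loop over every output index i
        xw = [x * w for x, w in zip(x_in, row)]    # the xW_j terms of row i
        xw_mean = sum(xw) / n_in                   # xWMean_i
        for j, term in enumerate(xw):
            peak_score[j] += abs(term - xw_mean)   # accumulate xWDeviation_{i,j}
    # Sort column indices by descending score and keep the first k.
    ranked = sorted(range(n_in), key=lambda j: peak_score[j], reverse=True)
    return ranked[:k], peak_score


# Hypothetical data: column 2 carries an outlying term in both rows.
x_in = [1.0, 1.0, 1.0, 1.0]
weights = [
    [0.1, 0.2, 5.0, 0.1],
    [0.2, 0.1, -4.0, 0.3],
]
peaks, scores = find_peak_columns(x_in, weights, k=1)
print(peaks)
```

Because the score is summed over all rows, a column is selected only when its terms stand out across the whole output vector, which matches the column-wise definition of the peak term given above.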

Next, for each peak term xW_j, the combinations of being dropped out and not being dropped out are considered, and a Gaussian mixture distribution is constructed. When K peak terms are considered, the number of mixture components is 2^K.

The larger the number K of recorded peak terms, the more accurately the true probability distribution can be calculated, but a larger K also increases the computational load. The value of K may therefore be specified in advance by the user, within the range that can be computed, as a trade-off against the computational load. The number of peak terms K can be any integer of 1 or more; when K is zero, the calculation is the same as in the first embodiment of the present invention described above.

Below, a specific example is used to explain the calculation method. Starting from the calculation of the first embodiment, the dropped-out and not-dropped-out cases are considered for all K peak terms xW_j (j = j_{k=1}, j_{k=2}, ..., j_{k=K}), and the probability distribution of the output Xout^DF, approximated as a Gaussian distribution under the conditional probability of each case, is calculated.

As a specific example, let the number of peak terms be two (K = 2), and suppose that the indices j (j = j_{k=1}, j_{k=2}) of the two peak terms xW_j obtained from PeakScore_j are j_{k=1} = 3 and j_{k=2} = 5. That is, the peak terms are xW_{j=3} and xW_{j=5}.

Each of the two peak terms xW_{j=3} and xW_{j=5} can be either dropped out or not dropped out, which gives 2^K = 2^2 = 4 combinations, listed below as cases (1) to (4).

(1) xW_{j=3} is dropped out and xW_{j=5} is dropped out
(2) xW_{j=3} is dropped out and xW_{j=5} is not dropped out
(3) xW_{j=3} is not dropped out and xW_{j=5} is dropped out
(4) xW_{j=3} is not dropped out and xW_{j=5} is not dropped out

Taking the four cases (1) to (4) into account, the probability distribution of the output Xout^DF is a multivariate Gaussian mixture distribution with four components. With p_Drop^DF denoting the dropout rate of the DF layer, the probability of each of cases (1) to (4) is as follows.
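The case probabilities can be illustrated with a short Python sketch. It assumes that each input is dropped independently with probability p_Drop^DF, which is the usual behaviour of a dropout layer but is stated here as an assumption because the original equation is not reproduced in this text. The function name `case_probabilities` and the value 0.2 are hypothetical.

```python
from itertools import product


def case_probabilities(peak_indices, p_drop):
    """Enumerate every dropped/kept pattern of the peak terms with its probability.

    Assumes each peak term is dropped independently with probability p_drop.
    Returns a list of (pattern, probability), where pattern maps a peak index
    to True when that peak term is dropped out.
    """
    cases = []
    for flags in product([True, False], repeat=len(peak_indices)):
        n_dropped = sum(flags)
        n_kept = len(flags) - n_dropped
        prob = (p_drop ** n_dropped) * ((1.0 - p_drop) ** n_kept)
        cases.append((dict(zip(peak_indices, flags)), prob))
    return cases


# Peak indices 3 and 5 as in the example; the order matches cases (1) to (4).
cases = case_probabilities([3, 5], p_drop=0.2)
for pattern, prob in cases:
    print(pattern, prob)
```

Under this assumption the four probabilities are p², p(1 − p), (1 − p)p, and (1 − p)², and they sum to one.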

The terms xin^DF_j W_{i,j}^DF for all indices j other than the peak indices j_{k=1} = 3 and j_{k=2} = 5 (1 ≤ j ≤ n_Xin^DF, j ≠ 3, j ≠ 5) are random variables that fluctuate, vanishing or remaining depending on dropout. The peak terms xin^DF_{j=3} W_{i,j=3}^DF and xin^DF_{j=5} W_{i,j=5}^DF, by contrast, are considered separately for the dropped-out and not-dropped-out cases, so they can be treated as fixed values under each condition. Accordingly, in the second embodiment, the peak terms xin^DF_{j=3} W_{i,j=3}^DF and xin^DF_{j=5} W_{i,j=5}^DF are removed from the group of xin^DF_j W_{i,j}^DF terms that the calculation of the first embodiment treats as random variables in the i-th row, and the calculation proceeds as follows.

Therefore, the mean in each of cases (1) to (4) is as follows.

The variance can be calculated with the same formula as in the first embodiment, as follows.

However, because the two peak terms are treated as constants rather than random variables, the peak terms xin^DF_{j=3} W_{i,j=3}^DF and xin^DF_{j=5} W_{i,j=5}^DF can be ignored in ListW^DF x^DF_i, in the same way as the bias term. Therefore, as in the following expression, the list ListW^DF x^DF_{j≠3,j≠5,i} of xin^DF_j W_{i,j}^DF terms excluding the peak indices j_{k=1} = 3 and j_{k=2} = 5 is used in the calculation.

Using ListW^DF x^DF_i with the peak terms excluded in this way, the variance Var(Xout^DF_i) is obtained from the formula given earlier. The variance Var(Xout^DF_i) takes the same value in all of cases (1) to (4).

The covariance is also obtained in the same way as in the first embodiment.

The covariance takes the same value in all of cases (1) to (4).

Consequently, the variance-covariance matrix is the same in all of cases (1) to (4).

In this way, the probability of each of the four cases (1) to (4) and the mean, variance, and covariance in each case can be calculated. By simply summing the case distributions with these probabilities as weights, the probability distribution of the output value can be calculated as a multivariate Gaussian mixture distribution in which four Gaussian distributions are mixed, as in the following equation.
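The construction of the mixture can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's formulas: the variance and covariance here are the exact second moments of independent Bernoulli dropout without rescaling (p(1 − p) times a sum of products of the non-peak terms), whereas the first embodiment's own variance formula is not reproduced in this text and may differ. The function name `dropout_mixture` and the data are hypothetical, and indices are zero-based.

```python
from itertools import product


def dropout_mixture(x_in, weights, bias, p_drop, peak_indices):
    """Return ([(probability, mean_vector), ...], shared_covariance_matrix).

    Non-peak xW terms are treated as random (kept with probability 1 - p_drop);
    peak terms are constants within each dropped/kept case.
    """
    n_out = len(weights)
    keep = 1.0 - p_drop
    peaks = set(peak_indices)
    rest = [j for j in range(len(x_in)) if j not in peaks]

    # terms[i][j] = xin_j * W_{i,j}
    terms = [[x * w for x, w in zip(x_in, row)] for row in weights]

    # Covariance uses only the non-peak terms, so it is identical in every case.
    # ASSUMPTION: exact Bernoulli moments, not the patent's own formula.
    cov = [[p_drop * keep * sum(terms[i][j] * terms[k][j] for j in rest)
            for k in range(n_out)] for i in range(n_out)]

    components = []
    for flags in product([True, False], repeat=len(peak_indices)):
        prob = 1.0
        for dropped in flags:
            prob *= p_drop if dropped else keep
        kept_peaks = [j for j, dropped in zip(peak_indices, flags) if not dropped]
        # Mean: expected non-peak sum + surviving peak constants + bias.
        mean = [keep * sum(terms[i][j] for j in rest)
                + sum(terms[i][j] for j in kept_peaks)
                + bias[i]
                for i in range(n_out)]
        components.append((prob, mean))
    return components, cov


# Hypothetical layer with one peak column (index 2).
components, cov = dropout_mixture(
    x_in=[1.0, 1.0, 1.0],
    weights=[[1.0, 2.0, 10.0], [3.0, 4.0, -10.0]],
    bias=[0.0, 1.0],
    p_drop=0.5,
    peak_indices=[2],
)
for prob, mean in components:
    print(prob, mean)
print(cov)
```

Only the means differ between components, by the surviving peak constants, while the covariance matrix is shared, which is consistent with the statement above that the variance-covariance matrix is the same in every case.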

In the first embodiment, the KL divergence between the probability distribution q_φ(z|x), which is a multivariate Gaussian distribution, and the prior distribution p_θ(z) is calculated in order to determine whether the probability distribution q_φ(z|x) of the output value satisfies the regularization condition. In the second embodiment, by contrast, the probability distribution q_φ(z|x) of the output value is a Gaussian mixture distribution. No analytical solution exists for the KL divergence of a Gaussian mixture, but it can be calculated by various approximation methods, such as the variational approximation method described in Non-Patent Document 2.
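As one example of such an approximation, the sketch below estimates the KL divergence by Monte Carlo sampling. This is a swapped-in technique chosen for brevity, not the variational approximation cited above. It is also simplified to a one-dimensional latent variable and a standard normal prior; the function names and the sample count are assumptions made for the example.

```python
import math
import random


def log_normal_pdf(z, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (z - mean) ** 2 / var)


def mixture_pdf(z, components):
    """Density of a mixture given as a list of (weight, mean, variance)."""
    return sum(w * math.exp(log_normal_pdf(z, m, v)) for w, m, v in components)


def kl_mixture_vs_standard_normal(components, n_samples=20000, seed=0):
    """Monte Carlo estimate of KL(q || p) with q the mixture and p = N(0, 1)."""
    rng = random.Random(seed)
    weights = [w for w, _, _ in components]
    total = 0.0
    for _ in range(n_samples):
        # Draw a component by weight, then a sample from that component.
        _, m, v = rng.choices(components, weights=weights)[0]
        z = rng.gauss(m, math.sqrt(v))
        total += math.log(mixture_pdf(z, components)) - log_normal_pdf(z, 0.0, 1.0)
    return total / n_samples


# Sanity checks against cases with a known closed form.
kl_same = kl_mixture_vs_standard_normal([(1.0, 0.0, 1.0)])    # true value 0
kl_shift = kl_mixture_vs_standard_normal([(1.0, 1.0, 1.0)])   # true value 0.5
kl_mix = kl_mixture_vs_standard_normal([(0.5, -2.0, 0.5), (0.5, 2.0, 0.5)])
print(kl_same, kl_shift, kl_mix)
```

A sampling estimate is noisy, so a deterministic approximation may be preferable when the value is used inside a cost function during training.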

With the calculation method of the second embodiment described above, the probability distribution q_φ(z|x) of the value of z in the latent space can be calculated as a multivariate Gaussian mixture distribution, as an extension of the first embodiment. As calculation results, FIGS. 12 and 13 each show a two-dimensional plot of the probability distribution q_φ(z|x) of the latent variable z in the latent space, calculated as a Gaussian mixture distribution consisting of 2^K = 2^4 = 16 Gaussian distributions with the number of peak terms set to four (K = 4). The input image in this case is the image of the letter "H" shown at small scale in the upper right of each figure. As with the Gaussian distribution shown in FIG. 9, the Monte Carlo distribution (scatter plot and one-dimensional histograms) agrees with the analytical distribution (two-dimensional contours and one-dimensional function curves), which shows that the distribution is calculated analytically as a Gaussian mixture.

Even when a plurality of dropout layers are provided as shown in FIG. 5, the probability distribution q_φ(z|x) of the output value can be calculated by applying the same calculation as in the first embodiment individually to each Gaussian distribution of the mixture under its conditional probability. However, each time the calculation is performed in a DF layer of the encoder, every Gaussian distribution splits further into a mixture of several Gaussian distributions, so the number of mixture components grows rapidly as the distribution propagates through the DF layers. The calculation may therefore be performed while reducing the number of mixture components, for example by using an existing technique to merge Gaussian components that are similar to one another.
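The text does not specify which merging technique is used. One common choice is moment matching, which replaces two weighted Gaussian components with a single Gaussian that preserves their combined weight, mean, and covariance. The sketch below illustrates that choice only; the function name `merge_components` and the data are hypothetical.

```python
def merge_components(w1, mean1, cov1, w2, mean2, cov2):
    """Merge two weighted Gaussian components by moment matching.

    Returns (weight, mean, cov) of a single Gaussian whose weight, mean, and
    covariance equal those of the two-component sub-mixture.
    """
    w = w1 + w2
    a, b = w1 / w, w2 / w
    mean = [a * m1 + b * m2 for m1, m2 in zip(mean1, mean2)]
    n = len(mean)
    cov = [[a * (cov1[i][k] + (mean1[i] - mean[i]) * (mean1[k] - mean[k]))
            + b * (cov2[i][k] + (mean2[i] - mean[i]) * (mean2[k] - mean[k]))
            for k in range(n)] for i in range(n)]
    return w, mean, cov


# Two equally weighted unit-covariance components separated along the first axis.
identity = [[1.0, 0.0], [0.0, 1.0]]
w, mean, cov = merge_components(0.5, [0.0, 0.0], identity, 0.5, [2.0, 0.0], identity)
print(w, mean, cov)
```

The merged covariance widens along the axis that separated the two means, so merging only components that lie close together keeps the loss of shape information small.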

The information estimation device according to the second embodiment of the present invention can be realized by extending the configuration of the information estimation device according to the first embodiment (the configuration shown in FIG. 6). For example, the autoencoder calculation unit 20 may be provided with a data analysis unit that determines the peak terms among the xW terms, each of which is the product of a weight W and an input Xin^DF appearing in the calculation of the output value Xout^DF_i of the DF layer. The autoencoder calculation unit 20 is then extended to perform the calculation described above for the K peak terms identified by the data analysis unit, which makes it possible to output, in the latent space, a value of z that follows a multivariate Gaussian mixture distribution. The calculation for the regularization condition may likewise be handled by extending the autoencoder calculation unit 20 to perform the calculation described above.

The present invention is applicable to estimation techniques that use neural networks, and makes it possible to realize a new autoencoder with a stochastic element.

10 Information estimation device
20 Autoencoder calculation unit
30 Encoder output distribution shape calculation unit
40 Cost function calculation unit
50 Parameter optimization calculation unit

Claims

An information estimation device that performs estimation processing using a neural network, the device comprising:
an autoencoder calculation unit that includes an autoencoder composed of an encoder and a decoder, and that is configured to perform calculation processing sequentially in the encoder and the decoder based on input data input to the autoencoder and to output output data from the autoencoder as a result of the estimation processing,
wherein at least one integrated layer, consisting of a combination of a dropout layer that drops out part of the data and a fully connected layer that performs a weight calculation on the data output from the dropout layer, is provided as the final layer of the encoder, so that the output value in the latent space, which is the output value from the encoder, is a multidimensional random variable vector.

The information estimation device according to claim 1, wherein the autoencoder calculation unit is configured to drop out, in the dropout layer, part of the data input to the integrated layer according to a predetermined dropout rate, and to calculate, in the fully connected layer, a value obtained by adding a bias to the sum of a list of terms obtained by multiplying the values of the vector of data output from the dropout layer by a matrix of weights, and
wherein some of the terms included in the list become zero according to the dropout rate.

The information estimation device according to claim 2, further comprising an encoder output distribution shape calculation unit that calculates, based on the data input to the integrated layer, the dropout rate, the weights, and the bias, the mean, variance, and covariance of the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space.

The information estimation device according to claim 3, wherein the encoder output distribution shape calculation unit is configured to:
calculate the mean of the distribution followed by the sum of the list by multiplying the sum of the terms included in the list by the ratio of terms that remain without being dropped out and then adding the bias;
calculate the variance of the distribution followed by the sum of the list by calculating the variance of the list and performing a variance calculation for the sample mean;
calculate, from the variance of the distribution followed by the sum of the list, a covariance indicating the correlation between two elements of the sum of the list; and
analytically calculate, from the mean, the variance, and the covariance, the shape of the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space.

The information estimation device according to any one of claims 1 to 4, further comprising:
a cost function calculation unit that calculates a cost function evaluating a regularization process, which regularizes the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space so that it keeps the same shape as a prior distribution, and a restoration process, in which the output data output from the autoencoder restores the input data input to the autoencoder; and
a parameter optimization calculation unit that calculates, based on the cost function, parameters that optimize the regularization process and the restoration process, and updates the parameters used in the calculation of the autoencoder with the optimized parameters.

The information estimation device according to claim 2, further comprising a data analysis unit that, in the list of terms obtained by multiplying the values of the vector of data output from the dropout layer by the matrix of weights, which is used when calculating each element of the multidimensional random variable vector data output from the integrated layer, refers to the terms specified by a common index across the elements of the multidimensional random variable vector, extracts a predetermined number of indices whose terms have larger values than the terms specified by the other indices, and identifies those terms as peak terms having larger values than the other terms,
wherein the autoencoder calculation unit is configured to calculate a multivariate Gaussian mixture distribution by distinguishing the case in which a peak term is dropped out in the dropout layer from the case in which it is not, calculating the mean, variance, and covariance of the Gaussian distribution in each case, and calculating the weighted mixture of the Gaussian distributions of the cases using the probability of each case.

An information estimation method performed by an information estimation device that performs estimation processing using a neural network, the method comprising:
an autoencoder calculation step of using an autoencoder composed of an encoder and a decoder to perform calculation processing sequentially in the encoder and the decoder based on input data input to the autoencoder, and outputting output data from the autoencoder as a result of the estimation processing,
wherein at least one integrated layer, consisting of a combination of a dropout layer that drops out part of the data and a fully connected layer that performs a weight calculation on the data output from the dropout layer, is provided as the final layer of the encoder, so that the output value in the latent space, which is the output value from the encoder, is a multidimensional random variable vector.

The information estimation method according to claim 7, wherein the autoencoder calculation step drops out, in the dropout layer, part of the data input to the integrated layer according to a predetermined dropout rate, and calculates, in the fully connected layer, a value obtained by adding a bias to the sum of a list of terms obtained by multiplying the values of the vector of data output from the dropout layer by a matrix of weights, and
wherein some of the terms included in the list become zero according to the dropout rate.

The information estimation method according to claim 8, further comprising an encoder output distribution shape calculation step of calculating, based on the data input to the integrated layer, the dropout rate, the weights, and the bias, the mean, variance, and covariance of the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space.

The information estimation method according to claim 9, wherein the encoder output distribution shape calculation step comprises:
a step of calculating the mean of the distribution followed by the sum of the list by multiplying the sum of the terms included in the list by the ratio of terms that remain without being dropped out and then adding the bias;
a step of calculating the variance of the distribution followed by the sum of the list by calculating the variance of the list and performing a variance calculation for the sample mean;
a step of calculating, from the variance of the distribution followed by the sum of the list, a covariance indicating the correlation between two elements of the sum of the list; and
a step of analytically calculating, from the mean, the variance, and the covariance, the shape of the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space.

The information estimation method according to any one of claims 7 to 10, further comprising:
a cost function calculation step of calculating a cost function evaluating a regularization process, which regularizes the probability distribution followed by the multidimensional random variable vector that is the output value in the latent space so that it keeps the same shape as a prior distribution, and a restoration process, in which the output data output from the autoencoder restores the input data input to the autoencoder; and
a parameter optimization calculation step of calculating, based on the cost function, parameters that optimize the regularization process and the restoration process, and updating the parameters used in the calculation of the autoencoder with the optimized parameters.

The information estimation method according to claim 8, further comprising a data analysis step of, in the list of terms obtained by multiplying the values of the vector of data output from the dropout layer by the matrix of weights, which is used when calculating each element of the multidimensional random variable vector data output from the integrated layer, referring to the terms specified by a common index across the elements of the multidimensional random variable vector, extracting a predetermined number of indices whose terms have larger values than the terms specified by the other indices, and identifying those terms as peak terms having larger values than the other terms,
wherein the autoencoder calculation step calculates a multivariate Gaussian mixture distribution by distinguishing the case in which a peak term is dropped out in the dropout layer from the case in which it is not, calculating the mean, variance, and covariance of the Gaussian distribution in each case, and calculating the weighted mixture of the Gaussian distributions of the cases using the probability of each case.