JP4993339B2

JP4993339B2 - Latent class extracting method and apparatus, latent class extracting program and recording medium therefor

Info

Publication number: JP4993339B2
Application number: JP2006093842A
Authority: JP
Inventors: 啓一郎帆足; 正柳原; 史昭菅谷
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2006-03-30
Filing date: 2006-03-30
Publication date: 2012-08-08
Anticipated expiration: 2026-03-30
Also published as: JP2007272291A

Description

The present invention relates to a latent class extraction method and apparatus, a latent class extraction program, and a recording medium therefor, and more particularly to a method, apparatus, program, and recording medium that dimensionally compress analysis data expressed as a multidimensional matrix into an approximate matrix and extract latent classes from that approximate matrix.

In marketing analysis and similar fields, latent class analysis is known as a technique for analyzing large volumes of data such as customer purchase histories. Latent class analysis is a statistical technique for extracting classes that exist latently in the data under analysis; it can be used, for example, to classify customers into a plurality of categories based on their purchase histories.

The EM algorithm is known as a representative method of latent class analysis. The EM algorithm estimates "latent variables", the hidden factors behind measurable "manifest variables", from those manifest variables, and is discussed in detail in Non-Patent Document 1.

As a technique for addressing the initial-value dependency known to be a problem of the EM algorithm, Non-Patent Document 2 proposes the "deterministic annealing EM algorithm".

The latent class extraction results obtained by the above algorithms can be applied to, for example, marketing analysis systems. For example, Patent Document 1, a patent application filed by the present applicant, proposes a system that supports marketing analysis based on factor data extracted from market data.

Non-Patent Document 1: A.P. Dempster, N.M. Laird, D.B. Rubin: Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, 39, pp. 1-38, 1976.
Non-Patent Document 2: Ueda, Nakano: Deterministic annealing EM algorithm, IEICE Transactions D-II, Vol. J80-D-II, No. 1, pp. 267-276, 1997.
Patent Document 1: JP 2001-167203 A

With the conventional latent class analysis methods described above, such as the EM algorithm, the time required for computation becomes enormous as the amount of data under analysis grows. The EM algorithm is also known to have the problem that, depending on the initial values, the latent class extraction result falls into a local solution; this tendency is particularly pronounced when the data under analysis is heavily biased, making it difficult to obtain meaningful analysis results.

The deterministic annealing EM algorithm, on the other hand, can mitigate the initial-value dependency of the EM algorithm to some extent, but its computation time is several times longer still than that of the EM algorithm, which makes practical use difficult.

An object of the present invention is to solve the above problems of the prior art and to provide a latent class extraction method and apparatus, a latent class extraction program, and a recording medium therefor that enable latent class analysis in a short time.

To achieve the above object, the latent class extraction apparatus of the present invention is characterized by having the following means.
(1) It includes dimension compression means for dimensionally compressing a data matrix to be analyzed that is expressed as a multidimensional matrix, discretization means for discretizing each component of the approximate matrix obtained by the dimension compression, and latent class extraction means for extracting latent classes from the discretized approximate matrix.
(2) It further includes singular value decomposition means for performing singular value decomposition on the data matrix to be analyzed, and the dimension compression means dimensionally compresses the data matrix based on the result of the singular value decomposition.
(3) The discretization means discretizes each component so that components with larger singular values have a higher existence probability.

According to the present invention, the following effects are achieved.
According to feature (1) above, the data matrix to be analyzed is first dimensionally compressed into an approximate matrix, and latent class extraction is performed after the amount of data has been reduced, so the processing time required for latent class extraction can be shortened.

According to feature (2) above, the data matrix to be analyzed is dimensionally compressed based on the result of singular value decomposition, so the compressed approximate matrix forms a concept space in which strongly related components are merged into a single dimension. Consequently, even when the data under analysis has local bias, the latent class extraction result can be prevented from falling into a local solution.

According to feature (3) above, each component of the approximate matrix obtained by dimension compression is discretized so that components with larger singular values have a higher existence probability, which makes possible a discretization that reflects the computed singular values.

The best mode for carrying out the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the main part of a latent class extraction apparatus according to the present invention.

The analysis target database (DB) 1 stores a multidimensional data matrix A that is the target of latent class extraction. For example, if the data matrix A is a purchase history of users, products are assigned to the rows and users to the columns. The singular value decomposition unit 2 reads the data matrix A from the analysis target DB 1 and performs singular value decomposition on it. As described in detail later, this decomposition captures the interrelationships among the row items of the data matrix A and merges strongly related row components into a single dimension.

Based on the result of the singular value decomposition, the dimension compression unit 3 dimensionally compresses the data matrix A using only the largest singular values and obtains an approximate matrix A'. The discretization unit 4 discretizes each component of A' to obtain a discretized approximate matrix A''. The latent class extraction unit 5 extracts latent classes by applying a known technique, such as the EM algorithm or the deterministic annealing EM algorithm, to A''. The extraction result may be presented to the user or output to a data analysis support system (not shown).

FIG. 2 is a flowchart showing the latent class extraction procedure in the latent class extraction apparatus described above.

In step S1, the data matrix A to be analyzed is read from the analysis target DB 1. In step S2, the singular value decomposition unit 2 performs singular value decomposition (SVD) on the read data matrix A. This decomposes the data matrix A into matrix components U, Σ, and V^T that satisfy equation (1).

A = UΣV^T (1)

FIG. 3 schematically illustrates the singular value decomposition and dimension compression of the data matrix A. If A has n rows and m columns (n × m, n > m), the matrix component U is an n-row, n-column orthogonal matrix whose columns are the left singular vectors. The matrix component V^T is an m-row, m-column orthogonal matrix whose rows are the right singular vectors. The matrix component Σ is a diagonal matrix that has the singular values σ (σ1, σ2, ..., σr, 0, ..., 0) as its diagonal components and "0" for all other components; the singular values are generally ordered so that they are larger toward the upper rows (σ1 ≥ σ2 ≥ ... ≥ σr > 0).
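The decomposition in equation (1) and the shapes described for FIG. 3 can be checked numerically. The following is a minimal sketch using NumPy; the toy matrix `A` (5 products by 4 users) is a made-up example and is not taken from the patent.

```python
import numpy as np

# Hypothetical toy purchase history: n = 5 products (rows), m = 4 users (columns).
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
], dtype=float)
n, m = A.shape

# full_matrices=True returns the square orthogonal factors U (n x n) and V^T (m x m).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# NumPy returns only the singular values; rebuild the n x m matrix Sigma from them.
Sigma = np.zeros((n, m))
Sigma[:len(s), :len(s)] = np.diag(s)

# Equation (1): A = U Sigma V^T, up to floating-point error.
A_rebuilt = U @ Sigma @ Vt
```

NumPy returns the singular values in `s` already sorted in descending order, which matches the ordering σ1 ≥ σ2 ≥ ... described above.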

In step S3, the dimension compression unit 3 dimensionally compresses the data matrix A based on the result of the singular value decomposition. Specifically, the dimension compression unit 3 generates an n-row, k-column projection matrix Uk consisting of the first k columns (left singular vectors) of the orthogonal matrix U. Similarly, from the orthogonal matrix V^T it generates an m-row, k-column matrix Vk consisting of the first k right singular vectors. It further generates a k-row, k-column diagonal matrix Σk consisting of the components in the first through k-th rows of the diagonal matrix Σ. The approximate matrix A', dimensionally compressed to k rows and m columns, is then obtained from equation (2), where Vk^T is the transpose of Vk.

A' = UkΣkVk^T (2)
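Step S3 can be sketched as a truncated SVD. Note that the product in equation (2), taken literally, has n rows and m columns, while the text describes a matrix compressed to k rows and m columns. The sketch below therefore returns both: the rank-k matrix of equation (2) and, as an assumed reading that is not confirmed by the source, the k-row, m-column matrix ΣkVk^T, which equals Uk^T A. The function name `compress` is hypothetical.

```python
import numpy as np

def compress(A, k):
    """Return (A_rank_k, A_reduced, s) built from the k largest singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]              # n x k: first k left singular vectors
    Sk = np.diag(s[:k])        # k x k: diagonal matrix of the k largest singular values
    VkT = Vt[:k, :]            # k x m: first k right singular vectors, as rows
    A_rank_k = Uk @ Sk @ VkT   # n x m: equation (2) as written
    A_reduced = Sk @ VkT       # k x m: assumed reading of the compressed matrix
    return A_rank_k, A_reduced, s
```

Row j of `A_reduced` is tied to the j-th singular value, which is the pairing a row-by-row discretization driven by σj would need.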

Through the processing so far, a data matrix A representing, for example, the purchase history of m users for n products is dimensionally compressed into a k-row, m-column approximate matrix A' in which strongly related products are grouped into k product groups.

In step S4, each component of the approximate matrix A' is discretized so that techniques such as the EM algorithm or the deterministic annealing EM algorithm can be applied. Even if every component of the data matrix A was discrete, the singular value decomposition and dimension compression described above turn the components into real numbers, and the EM algorithm or the like cannot be applied to them as they are. Step S4 therefore discretizes each component of the approximate matrix A'.

FIG. 4 is a flowchart showing the discretization procedure in this embodiment; discretization by binarization is described here as an example.

In step S41, the variable j, which designates the row to be discretized, is set to the initial value "1". In step S42, all components of the j-th row (aj1, aj2, ..., ajm) are extracted, and in step S43 they are sorted in descending order. In step S44, the rate αj (%) at which the discretization yields "1" for the j-th row (hereinafter simply the occurrence rate) is obtained from equation (3) as a function of the variable j, the singular values σ, and a reference occurrence rate αref.

αj = (σj / σ1) × αref (3)

In step S45, the components of the j-th row are discretized by setting only the top αj% of the components sorted in descending order to "1" and all others to "0". In step S46, the variable j is compared with the number of rows k; until discretization has been completed for all rows (j ≥ k), the variable j is updated and the process returns to step S42 to repeat the above steps.

In this embodiment, as shown in equation (3), the occurrence rate αj of each row is obtained as a function of the variable j and the singular values σ. If the reference occurrence rate αref is 80 (%), then for the first row αj = αref, so the top 80% of the components become "1" and the bottom 20% become "0". For the second row, αj = (σ2 / σ1) × αref; since in general σ2 < σ1, fewer than the top 80% of the components become "1" and more than the bottom 20% become "0". In the same way, the occurrence rate αj decreases gradually toward the lower rows so that components with larger singular values have a higher existence probability, which makes possible a discretization that reflects the magnitudes of the singular values.
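The row-wise binarization of steps S41 to S46 can be sketched as follows. The function name `binarize_rows`, the use of rounding down to turn αj% of m into a count, and the stable tie-breaking among equal values are all assumptions made for illustration; the source specifies only equation (3) and the "top αj%" rule.

```python
import numpy as np

def binarize_rows(A_reduced, s, alpha_ref=80.0):
    """Set the largest alpha_j percent of each row j to 1 and the rest to 0."""
    k, m = A_reduced.shape
    out = np.zeros((k, m), dtype=int)
    for j in range(k):
        # Equation (3): occurrence rate of row j, in percent.
        alpha_j = (s[j] / s[0]) * alpha_ref
        # Number of ones in row j (rounding down is an assumption).
        n_ones = int(np.floor(m * alpha_j / 100.0))
        # Indices of the row's components in descending order of value.
        order = np.argsort(-A_reduced[j], kind="stable")
        out[j, order[:n_ones]] = 1
    return out
```

For example, with m = 10 columns, αref = 80 and singular values σ1 = 4, σ2 = 2, equation (3) gives α1 = 80% and α2 = 40%, so the first row receives eight ones and the second row four.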

Returning to FIG. 2, in step S5 the latent class extraction unit 5 extracts latent classes from the approximate matrix A'', which has been dimensionally compressed and discretized as described above, by applying a known technique such as the EM algorithm or the deterministic annealing EM algorithm.
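The source does not specify the statistical model used in step S5. As one possible illustration, the sketch below runs plain EM on a mixture of independent Bernoulli variables, a common formulation of latent class analysis for binary data. It assumes that each sample is one row of `X` (for example, one user, that is, one column of A'' after transposition). The function name `latent_class_em`, the smoothing constants, and the random initialization are assumptions.

```python
import numpy as np

def latent_class_em(X, n_classes, n_iter=100, seed=0):
    """Fit a mixture of independent Bernoullis to the binary rows of X with plain EM."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)                  # class priors
    theta = rng.uniform(0.25, 0.75, (n_classes, n_features))  # P(x_d = 1 | class)
    for _ in range(n_iter):
        # E-step: log joint probability of each sample with each class, then normalize.
        log_p = (X @ np.log(theta).T
                 + (1 - X) @ np.log(1 - theta).T
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)   # stabilize the exponentials
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)     # responsibilities, rows sum to 1
        # M-step: re-estimate priors and Bernoulli parameters (light smoothing avoids log(0)).
        nk = resp.sum(axis=0)
        pi = (nk + 1e-3) / (n_samples + n_classes * 1e-3)
        theta = (resp.T @ X + 1e-3) / (nk[:, None] + 2e-3)
    return pi, theta, resp
```

Taking `resp.argmax(axis=1)` assigns each sample to its most probable latent class. As the background section notes, plain EM depends on its initial values, so different values of `seed` may lead to different local solutions.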

FIG. 5 is a functional block diagram showing the configuration of the main part of a computer 50 that can function as the latent class extraction apparatus according to the present invention. Its main components are a ROM 52 that stores basic programs, including an operating system (OS), and various basic data; a hard disk drive (HDD) 57 that stores various programs and data; a media drive device 56 that reads programs and data from a storage medium 61 such as a CD-ROM or DVD; a CPU 51 that executes programs; a RAM 53 that provides a work area for the CPU 51; a display 58, a keyboard 59, and a pointing device 60 such as a mouse, connected via an input/output interface (I/F) 55; and a parallel/serial I/F 54 that communicates with external devices.

In this embodiment, the data matrix A to be analyzed is input through the parallel/serial I/F 54 or read by the media drive device 56, and is stored in advance on the HDD 57. The latent class extraction program described with reference to FIG. 2 is stored on the storage medium 61, read by the media drive device 56, and installed on the HDD 57.

In this configuration, when the operator operates the keyboard 59 and the pointing device 60 to start the latent class extraction program and specifies the data matrix A to be analyzed, the CPU 51 executes the program, and the singular value decomposition, dimension compression, discretization, and latent class extraction described above are performed in sequence. The extracted latent classes are shown on the display 58.

FIG. 1 is a block diagram showing the configuration of the main part of the latent class extraction apparatus according to the present invention.
FIG. 2 is a flowchart showing the latent class extraction procedure.
FIG. 3 is a diagram schematically illustrating the singular value decomposition and dimension compression of the data to be analyzed.
FIG. 4 is a flowchart showing the procedure for discretizing each component of the approximate matrix.
FIG. 5 is a functional block diagram showing the configuration of the main part of a computer that can function as the latent class extraction apparatus according to the present invention.

Explanation of symbols

1: analysis target database, 2: singular value decomposition unit, 3: dimension compression unit, 4: discretization unit, 5: latent class extraction unit

Claims

1. A latent class extraction apparatus that, from a data matrix to be analyzed in which measurable manifest variables are expressed as a two-dimensional matrix, infers latent variables that are hidden factors of the manifest variables and, based on the latent variables, extracts a plurality of latent classes latently present in the data matrix, the apparatus comprising:
singular value decomposition means for performing singular value decomposition on the data matrix to be analyzed;
dimension compression means for obtaining an approximate matrix by dimensionally compressing the data matrix to be analyzed based on the result of the singular value decomposition;
discretization means for discretizing each component of the approximate matrix to "0" or "1"; and
latent class extraction means for extracting latent classes from the discretized approximate matrix,
wherein the discretization means discretizes each component of the approximate matrix to "1" with a higher existence probability for components whose singular value obtained by the singular value decomposition is larger.

2. The latent class extraction apparatus according to claim 1, wherein the latent class extraction means extracts latent classes from the discretized approximate matrix by an EM algorithm or a deterministic annealing EM algorithm.

3. The latent class extraction apparatus according to claim 1 or 2, wherein the discretization means obtains, from equation (3) and as a function of the j-th largest singular value σj and a predetermined reference occurrence rate αref, the existence probability αj for the components of the approximate matrix that correspond to the singular value σj obtained by the singular value decomposition, and discretizes to "1" the proportion of components equal to the existence probability αj, taken from the largest values downward, and to "0" all the others.
αj = (σj / σ1) × αref (3)

4. A latent class extraction method that, from a data matrix to be analyzed expressed as a two-dimensional matrix of measurable manifest variables, infers latent variables that are hidden factors of the manifest variables and, based on the latent variables, extracts a plurality of latent classes latently present in the data matrix, the method comprising:
a step in which a computer performs singular value decomposition on the data matrix to be analyzed;
a step in which the computer obtains an approximate matrix by dimensionally compressing the data matrix to be analyzed based on the result of the singular value decomposition;
a step in which the computer discretizes each component of the approximate matrix to "0" or "1"; and
a step in which the computer extracts latent classes from the discretized approximate matrix,
wherein, in the discretizing step, each component of the approximate matrix is discretized to "1" with a higher existence probability for components whose singular value obtained by the singular value decomposition is larger.

5. The latent class extraction method according to claim 4, wherein, in the step of extracting latent classes, latent classes are extracted from the discretized approximate matrix by an EM algorithm or a deterministic annealing EM algorithm.

6. The latent class extraction method according to claim 4 or 5, wherein, in the discretizing step, the existence probability αj for the components of the approximate matrix that correspond to the singular value σj obtained by the singular value decomposition is obtained from equation (3) as a function of the j-th largest singular value σj and a predetermined reference occurrence rate αref, and the proportion of components equal to the existence probability αj, taken from the largest values downward, is discretized to "1" and all the others to "0".
αj = (σj / σ1) × αref (3)

7. A latent class extraction program that, from a data matrix to be analyzed in which measurable manifest variables are expressed as a two-dimensional matrix, infers latent variables that are hidden factors of the manifest variables and, based on the latent variables, extracts a plurality of latent classes latently present in the data matrix, the program causing a computer to execute:
a step of performing singular value decomposition on the data matrix to be analyzed;
a step of obtaining an approximate matrix by dimensionally compressing the data matrix to be analyzed based on the result of the singular value decomposition;
a step of discretizing each component of the approximate matrix to "0" or "1"; and
a step of extracting latent classes from the discretized approximate matrix,
wherein, in the discretizing step, each component of the approximate matrix is discretized to "1" with a higher existence probability for components whose singular value obtained by the singular value decomposition is larger.

8. The latent class extraction program according to claim 7, wherein, in the step of extracting latent classes, latent classes are extracted from the discretized approximate matrix by an EM algorithm or a deterministic annealing EM algorithm.

9. The latent class extraction program according to claim 7 or 8, wherein, in the discretizing step, the existence probability αj for the components of the approximate matrix that correspond to the singular value σj obtained by the singular value decomposition is obtained from equation (3) as a function of the j-th largest singular value σj and a predetermined reference occurrence rate αref, and the proportion of components equal to the existence probability αj, taken from the largest values downward, is discretized to "1" and all the others to "0".
αj = (σj / σ1) × αref (3)

10. A computer-readable recording medium on which the latent class extraction program according to any one of claims 7 to 9 is recorded.