CN101853239A - Nonnegative matrix factorization-based dimensionality reducing method used for clustering - Google Patents

Nonnegative matrix factorization-based dimensionality reducing method used for clustering

Info

Publication number
CN101853239A
CN101853239A (application CN201010167504.4A)
Authority
CN
China
Prior art keywords
matrix
data
sigma
dimension
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201010167504.4A
Other languages
Chinese (zh)
Inventor
郭跃飞
朱真峰
薛向阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201010167504.4A priority Critical patent/CN101853239A/en
Publication of CN101853239A publication Critical patent/CN101853239A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of statistical pattern recognition and machine learning, and specifically discloses a nonnegative matrix factorization-based dimensionality reduction method for clustering. The method comprises the following steps: adopting the KL "distance"; adding a data normalization constraint; directly discovering the internal relations among data dimensions by minimizing a target error function between data compression and reconstruction; obtaining a mapping matrix; and projecting high-dimensional data into a low-dimensional subspace with the mapping matrix so as to perform effective data analysis such as clustering. The method yields iterative formulas simpler than those of the original factorization method, and normalization is maintained naturally in each iterative update. This normalization gives the final mapping matrix higher sparsity than the original factorization method. Clustering results in the obtained low-dimensional space show that the method yields more effective low-dimensional data features, and the algorithm is simple and effective.

Description

A nonnegative matrix factorization-based dimensionality reduction method for clustering
Technical field
The invention belongs to the technical field of statistical pattern recognition and machine learning, and specifically relates to a nonnegative matrix factorization-based dimensionality reduction method for clustering.
Background technology
Clustering is one of the most basic research tasks in machine learning. In practical applications, each dimension of the data represents a relevant feature. In general, it is difficult to judge in advance which features help clustering, so the common practice is to collect as many data features as possible and then cluster. As a result, data features are usually high-dimensional, and high-dimensional features typically bring two problems: 1) higher storage and computation costs, and 2) the curse of dimensionality. In practice, the curse of dimensionality is one of the main difficulties faced by many pattern recognition methods, such as gait recognition, image classification and text processing. It is especially pronounced for high-dimensional data with limited samples, where it directly degrades clustering performance. Dimensionality reduction studies how to compress and map high-dimensional data into a low-dimensional subspace, so that tasks such as clustering can be accomplished more effectively. The mapping can be linear or nonlinear. Because linear dimensionality reduction methods are simple and effective, they are widely used in every field of machine learning and pattern recognition.
Besides the linear/nonlinear distinction above, dimensionality reduction methods can be divided into several classes from other angles: for example, according to whether class label information is used, they can be divided into unsupervised, semi-supervised and supervised methods. What the present invention considers is the nonnegativity of data. Many methods produce factors with arbitrary signs, but methods such as nonnegative matrix factorization (NNMF) preserve the nonnegativity of the data, which reflects an essential characteristic of data such as text and images.
Principal component analysis (PCA) is the classical unsupervised linear dimensionality reduction method [1]. One of the basic problems in dimensionality reduction is how to select a suitable dimension r, and PCA easily obtains the value of r by analyzing the eigenvalues. Finally, by the least-squares-error principle, all original data are best represented in an r-dimensional linear subspace of the original feature space. Spectral analysis methods have a similar way of choosing the low dimension [2]; spectral methods have a solid theoretical basis, are widely applicable and are easy to implement.
Another popular subspace dimensionality reduction method is Fisher's linear discriminant analysis (LDA) [3]. This method preserves the discriminative structure of the category information in the low-dimensional projection space. When training samples are insufficient, semi-supervised methods can be adopted [4]. By contrast, although PCA and LDA are both linear dimensionality reduction methods, LDA explicitly models the differences between classes, whereas PCA does not consider category information. PCA is optimal for data reconstruction, but it is not suited to class separation and discrimination.
The above considers the linear case. When a data set cannot be effectively represented by its sample mean and covariance matrix, or when the data contain complex features, linear methods become inefficient. In such cases, the kernel trick can be used, as in kernel principal component analysis (kernel PCA) [5]. Other nonlinear techniques, such as locally linear embedding (LLE) [6] and artificial neural networks (ANN) [7], are also commonly used. LLE preserves the neighborhood relations before and after dimensionality reduction, while ANN methods simulate mechanisms of the nervous system such as learning. However, training an ANN model is generally time-consuming.
The present invention focuses on the nonnegative matrix factorization method [8]. NNMF decomposes a data matrix X_{n×m} into C_{n×r} × M_{r×m}; when n and m denote the dimension and the sample number respectively, C is a basis matrix and M is a coefficient matrix. NNMF is popular for its simplicity and practicality. To solve problems arising in applications, different constraints or parameters have been added to standard NNMF. For example, Li et al. proposed local NNMF (LNMF) by adding three new constraints on the basis and coefficient matrices, thereby obtaining useful local visual patterns [9]; Cichocki et al. investigated a method based on the α-"distance" (α-divergence), denoted αNMF [10]; Shahnaz et al. considered the sparsity of the coefficient matrix and proposed the GD-CLS algorithm [11]. The basis matrix can be regarded as a projection matrix, so NNMF can be used for dimensionality reduction [12, 13]. By contrast, apart from the NNMF family, the methods mentioned above all allow arbitrary signs. These NNMF methods based on the KL "distance" or the Euclidean distance, however, share some common problems. First, they complicate the original NNMF method to some extent, so the final update rules need more computation time than the original method. Second, the iterative updates make the matrices C and M functions of themselves, so C and M must both be initialized, and more elaborate initialization does not necessarily produce a more effective factorization. The present invention gives a normalized compression method based on nonnegative matrix factorization (normalized compression using NNMF, denoted NCMF); by normalizing the data dimensions, NCMF finally obtains a simple and effective iterative method.
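For reference, the standard KL-"distance" NNMF of Lee and Seung [8] can be sketched in a few lines of NumPy. This is a minimal illustration of the baseline that NCMF simplifies, not part of the invention; the function name, initialization and defaults are choices made for this sketch only.

```python
import numpy as np

def kl_nmf(X, r, n_iter=200, eps=1e-12, seed=0):
    # Multiplicative updates for KL-divergence NNMF (Lee & Seung [8]):
    # a nonnegative X (n x m) is approximated by C (n x r) @ M (r x m).
    n, m = X.shape
    rng = np.random.default_rng(seed)
    C = rng.random((n, r)) + eps
    M = rng.random((r, m)) + eps
    for _ in range(n_iter):
        # M <- M * (C^T (X / CM)) / (C^T 1)
        M *= (C.T @ (X / (C @ M + eps))) / (C.sum(axis=0)[:, None] + eps)
        # C <- C * ((X / CM) M^T) / (1 M^T)
        C *= ((X / (C @ M + eps)) @ M.T) / (M.sum(axis=1)[None, :] + eps)
    return C, M
```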
Summary of the invention
The object of the present invention is to provide a nonnegative matrix dimensionality reduction method that does not increase the complexity of the original NNMF, converges quickly, and saves computation time.
The dimensionality reduction method provided by the invention adopts the KL "distance" and minimizes the target error function between data compression and reconstruction, finally obtaining a mapping matrix; this mapping matrix is then used to project high-dimensional data into a low-dimensional subspace, where effective data analysis such as clustering can be carried out. In the process of minimizing the above error, a data normalization constraint is added on top of the nonnegativity requirement, namely that the L1 norm of every column of the two factor matrices is always 1. This normalization constraint does not complicate the original factorization problem; on the contrary, it yields results more concise than those of the original NNMF and related methods. The NCMF algorithm proposed by the invention maintains normalization naturally in every iterative update. This property makes the final mapping matrix sparser than those of the original NNMF and related methods. In the obtained low-dimensional space, clustering with the same k-means-like algorithm shows that NCMF obtains more effective low-dimensional data features. Compared with existing related algorithms, the great advantage of the invention is that it is simple and effective.
In the present invention, each row of the data matrix X represents a sample; that is, when X_{n×m} is decomposed into C_{n×r} × M_{r×m}, unlike the factorization described above, the data matrix X is effectively transposed, so that n and m denote the sample number and the dimension respectively. This treatment directly exploits the internal relations between different dimensions, thereby achieving effective dimensionality reduction. In many applications the sample column vectors of the data matrix are normalized; since the present invention transposes X, it normalizes all dimension vectors instead. This treatment gives the matrices C and M new meanings: C records the compression results of the high-dimensional data in X, and can be regarded as a condensed matrix, while M reflects the mapping relations between the high dimensions in X (its columns) and the low dimensions in C (its columns), so M is a mapping matrix. The present invention uses the KL "distance" as the distance measure between the matrix X and the product of the condensed matrix C and the mapping matrix M. The corresponding objective function is:
F(C, M) = \min_{C \ge 0,\, M \ge 0} \sum_{ij} \Big( X_{ij} \log \frac{X_{ij}}{(CM)_{ij}} - X_{ij} + (CM)_{ij} \Big) + \mu \Big( \sum_i C_{ik} - 1 \Big) + \nu \Big( \sum_k M_{kj} - 1 \Big)    (1)
where \sum_i X_{ij} = 1, the index k (1 ≤ k ≤ r) runs over the columns of matrix C and the rows of matrix M, and μ and ν are positive parameters. The update rules obtained are:
C_{ik} = \frac{(X M^T)_{ik}}{\sum_j M_{kj}}    (2)
M_{kj} = \sum_i \Big( \frac{X_{ij}}{(CM)_{ij}} C_{ik} \Big) M_{kj}    (3)
where the superscript T denotes matrix transposition, i.e., M^T is the transpose of the matrix M.
In decomposing X into C and M, NCMF continually reconstructs X through the product CM, as shown in Figure 1. The NCMF method has three properties: simplicity, normalization and sparsity.
1. Simplicity
Compared with the KL-"distance"-based factorization proposed by Lee in [8] (denoted KL_NMF), we simplify the update of the matrix C. This simplification comes from the following observation. From X = CM one derives XM^T = C(MM^T) and C = XM^T Z^{-1}, where Z = MM^T is a matrix of order r. If Z is the identity matrix, it can safely be dropped. Application example 1 below (see equation (8)) shows that Z is then a diagonally dominant matrix, hence approximately a diagonal matrix. When the high dimensions (columns of X) correspond to the low dimensions (columns of C) in a balanced way, Z becomes approximately a scalar matrix. At that point Z can likewise be dropped safely, and its inverse reduces to the simple division of rule (2) for updating C. This brings two advantages: it is simple, eliminating the inverse computation, and it avoids introducing negative values. In fact, even when the high dimensions (columns of X) correspond to the low dimensions (columns of C) in an unbalanced way, the above formula still applies; as Occam's razor puts it, simple and practical is best [14]. The update of C also has another interpretation: each column of C is updated to a weighted mean of the corresponding columns (dimensions) of X, and the update rule exactly embodies this characteristic. This simplicity accelerates the iterative process, giving NCMF higher operational efficiency than NNMF and related methods.
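This observation can be checked numerically. A small sketch, assuming NumPy and using the mapping matrix M of equation (8) from application example 1 below, shows that Z = MM^T is indeed diagonal when each dimension of X belongs to exactly one low dimension:

```python
import numpy as np

# M from application example 1 (equation (8)): every column of X is assigned
# to exactly one low-dimensional row, so Z = M M^T has no off-diagonal mass.
M = np.array([[1., 0., 0., 0., 1.],
              [0., 1., 1., 1., 0.]])
Z = M @ M.T
print(Z)                                    # [[2. 0.], [0. 3.]]
print(np.allclose(Z, np.diag(np.diag(Z))))  # True: Z is diagonal
```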
2. Normalization
From the above two update formulas the following conclusion can be derived: C and M maintain normalization in every iterative update. Applying the update formula of C together with the normalization of X gives,
\sum_i C_{ik} = \frac{\sum_i (X M^T)_{ik}}{\sum_j M_{kj}}
             = \frac{\sum_i \sum_{j'} X_{ij'} M_{kj'}}{\sum_j M_{kj}}
             = \frac{\sum_{j'} \big( \sum_i X_{ij'} \big) M_{kj'}}{\sum_j M_{kj}}
             = \frac{\sum_{j'} M_{kj'}}{\sum_j M_{kj}}
             = 1    (4)
Similarly,
\sum_k M_{kj} = \sum_k \sum_i \Big( \frac{X_{ij}}{(CM)_{ij}} C_{ik} \Big) M_{kj}
             = \sum_k \sum_i \frac{X_{ij} C_{ik} M_{kj}}{\sum_{k'} C_{ik'} M_{k'j}}
             = \sum_i X_{ij} \frac{\sum_k C_{ik} M_{kj}}{\sum_{k'} C_{ik'} M_{k'j}}
             = \sum_i X_{ij}
             = 1    (5)
The above derivation proves that in the decomposition process of NCMF, as long as the data matrix X is normalized at the start, the matrices C and M keep the normalization property. The same analysis shows that NNMF does not have this normalization property.
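The normalization property of equations (4) and (5) can also be verified empirically. The following sketch (assuming NumPy; the matrix sizes are arbitrary) applies update rules (2) and (3) once to a column-normalized X and checks that the columns of C and M sum to 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 8, 5, 2
X = rng.random((n, m))
X /= X.sum(axis=0)                 # normalization of X: sum_i X_ij = 1

M = rng.random((r, m))
M /= M.sum(axis=0)

C = (X @ M.T) / M.sum(axis=1)[None, :]   # rule (2)
M = (C.T @ (X / (C @ M))) * M            # rule (3)

print(np.allclose(C.sum(axis=0), 1.0))   # True, as derived in (4)
print(np.allclose(M.sum(axis=0), 1.0))   # True, as derived in (5)
```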
We can thus analyze the convergence of NCMF. Update rule (2) shows that C is a function of M, so the objective function (1) can be rewritten as a function of the single variable M. Therefore the convergence of M is the key concern, and [15] shows that M converges to a local stationary point. Hence the updates of M and C reduce the value of the objective function (1), and NCMF is convergent.
3. Sparsity
Formula (3) shows that the update of M does not need to compute similarities between dimensions, but only the correlation between each dimension of X and the columns of C. Throughout the decomposition, the element X_{ij} is reconstructed by (CM)_{ij}. Accordingly, ω_{ij} = X_{ij} / (CM)_{ij} describes the relation between the original value and its reconstruction. If X_{ij} > (CM)_{ij}, the reconstruction underestimates the original value, and ω_{ij} > 1 amplifies C_{ik}; conversely, X_{ij} < (CM)_{ij} indicates overfitting of the reconstruction, and ω_{ij} < 1 reduces C_{ik}. Let γ_{kj} = \sum_i ω_{ij} C_{ik}; then M_{kj} = γ_{kj} M_{kj}, i.e., M_{kj} is updated by the coefficient γ_{kj}. When the mapping matrix M mainly assigns the j-th dimension X_{vj} (v = [1, 2, ..., n]^T) to M_{kj}, we can show that γ_{kj} is never less than 1: at each iterative update step, formula (5) shows that \sum_k M_{kj} is always 1; when X_{vj} mainly belongs to M_{kj}, its memberships to M_{k'j} (1 ≤ k' ≤ r, k' ≠ k) have smaller degrees, and the overall update is convergent. Therefore the membership of each dimension of X can shift from column (dimension) k' of C to column k, and γ_{kj} is never less than 1. This means that the membership of each dimension becomes gradually more pronounced as M is updated, which exactly explains the sparsity characteristic of M.
The sparsity of the mapping matrix M can also be observed from experimental results. The present invention measures sparsity with information entropy [16]. Entropy is a measure of uncertainty: when a nonnegative matrix consists only of 0s and 1s, the entropy is minimal, namely 0; when the nonnegative entries are all equal, the entropy is maximal. Sparsity can therefore be measured by entropy: the smaller the entropy, the better the sparsity. For a marginal distribution q ∈ R^{1×m}, the entropy of q is defined as,
H(q) = -\sum_{i=1}^{m} q_i \log q_i    (6)
If the matrix M is regarded as a vector, the same treatment applies. From the viewpoint of fuzzy clustering, the entropy of M describes its ambiguity. M is first converted into a joint distribution M' with \sum_{k,j} M'_{kj} = 1. If M consists only of 0s and 1s, then M' has only m nonzero entries, each equal to 1/m, and the entropy at that point is \log m. A reasonable measure of the sparsity of M can then be defined as
g(M) = H(M') - \log(m)    (7)
When g(M) equals 0, it reflects the sparsest result, an ideal many-to-one mapping.
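A small sketch of this sparsity measure, assuming NumPy (the function name is ours), computes g(M) for the mapping matrix of equation (8), for which the measure attains its sparsest value 0:

```python
import numpy as np

def sparsity_g(M, eps=1e-12):
    # g(M) = H(M') - log(m), equations (6)-(7): M is rescaled to a joint
    # distribution M' (entries sum to 1) before taking the entropy.
    m = M.shape[1]
    Mp = M / M.sum()
    H = -np.sum(Mp * np.log(Mp + eps))
    return H - np.log(m)

M = np.array([[1., 0., 0., 0., 1.],    # mapping matrix of equation (8),
              [0., 1., 1., 1., 0.]])   # already column-normalized 0/1 entries
print(sparsity_g(M))                   # ~0: the sparsest possible mapping
```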
According to the above, the corresponding algorithm is summarized as follows:
1. Given: a data set X storing one sample per row, the low-dimensional subspace dimension r, and the iteration count l.
2. Compute the sample number n and dimension m of the data matrix X.
3. Normalize X so that \sum_i X_{ij} = 1.
4. Initialize the mapping matrix M ∈ R^{r×m}.
5. Iterate steps 6 and 7 l times.
6. Update C using C_{ik} = (X M^T)_{ik} / \sum_j M_{kj}.
7. Update M using M_{kj} = \sum_i \big( X_{ij} / (CM)_{ij} \cdot C_{ik} \big) M_{kj}.
8. Return the condensed matrix C and the mapping matrix M.
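The eight steps above translate directly into code; the following is a minimal NumPy sketch of the NCMF iteration (the function name, random initialization and the eps guard against division by zero are our choices for the sketch):

```python
import numpy as np

def ncmf(X, r, n_iter=100, eps=1e-12, seed=0):
    # NCMF: normalized compression using NNMF (steps 1-8 above).
    # X: (n, m) nonnegative matrix, one sample per row.
    # Returns the condensed matrix C (n, r) and mapping matrix M (r, m).
    X = np.asarray(X, dtype=float)
    n, m = X.shape                           # step 2
    X = X / (X.sum(axis=0) + eps)            # step 3: sum_i X_ij = 1
    rng = np.random.default_rng(seed)
    M = rng.random((r, m))                   # step 4
    M /= M.sum(axis=0)
    for _ in range(n_iter):                  # step 5
        C = (X @ M.T) / (M.sum(axis=1)[None, :] + eps)   # step 6, rule (2)
        M = (C.T @ (X / (C @ M + eps))) * M              # step 7, rule (3)
    return C, M                              # step 8
```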
Description of drawings
Fig. 1: the decomposition and reconstruction process of NCMF.
Fig. 2: background-removed frame images from the gait data set.
Embodiments
Embodiment 1 illustrates the characteristics of the matrices X, C and M more intuitively. The matrix
A = \begin{pmatrix} 2 & 0 & 0 & 0 & 1 \\ 0 & 3 & 2 & 1 & 0 \\ 0 & 2 & 3 & 2 & 0 \\ 2 & 0 & 0 & 0 & 2 \end{pmatrix}
is a simple data matrix. For text data, each column corresponds to a different word, and each row is the vector of occurrence counts of the words in a certain text. Our task is to gather similar texts together in the compressed low-dimensional space. To simplify the problem, the data in the matrix A are first binarized, that is,
A_{ij} = \begin{cases} 1 & \text{if } A_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}
On this basis, NCMF obtains the following decomposition:
\begin{pmatrix} 1/2 & 0 & 0 & 0 & 1/2 \\ 0 & 1/2 & 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 & 0 & 1/2 \end{pmatrix} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \\ 0 & 1/2 \\ 1/2 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 0 \end{pmatrix}    (8)
When the three matrices of the above decomposition are denoted X, C and M respectively, we obtain X = CM. The matrix X consists of 4 sample row vectors, each with 5 dimensions (columns); the condensed matrix C after decomposition is reduced from the 5 dimensions of X to 2. The mapping matrix M records the correlations between the different high and low dimensions: its row vectors show which of the more similar dimensions of the original high-dimensional space of X are compressed together in the current low-dimensional space, and its column vectors show the mapping relation between an original high dimension of X and the low dimensions of C. This is a special case; in practical applications M consists of entries approaching 0 and 1, and equation (5) shows that M keeps the L1 norm of every column equal to 1.
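Using the ncmf() sketch given after the algorithm steps above, application example 1 can be reproduced; up to a permutation of the two low-dimensional rows, the factors converge toward the matrices of equation (8):

```python
import numpy as np

A = np.array([[2, 0, 0, 0, 1],
              [0, 3, 2, 1, 0],
              [0, 2, 3, 2, 0],
              [2, 0, 0, 0, 2]], dtype=float)
X = (A > 0).astype(float)             # binarization step

C, M = ncmf(X, r=2, n_iter=500)       # ncmf() as sketched above
Xn = X / X.sum(axis=0)                # the normalized matrix of eq. (8)
print(np.round(C, 2))                 # columns ~ [1/2 0 0 1/2], [0 1/2 1/2 0]
print(np.round(M, 2))                 # rows ~ indicators of dims {1,5}, {2,3,4}
print(np.abs(C @ M - Xn).max())       # reconstruction error: should be small
```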
Embodiment 2 describes experimental results on the gait data set [17]. The raw data of this set are videos, from which all background-removed frame data, as shown in Figure 2, are extracted. To use these gait data, features are further extracted to form a data matrix in which each ID number (corresponding to a person) has a number of sample feature vectors. To illustrate the dimensionality reduction characteristics of the NCMF algorithm, the present invention selects from the galData data the samples of the 6 ID numbers with the most samples and performs cluster analysis on them. For the results we adopt three measures — running time, sparsity and clustering precision — measured in seconds, by information entropy, and by purity, respectively.
The data after dimensionality reduction are clustered with the same k-means-like algorithm sIB [18], and clustering precision is measured by purity. Let the clustering result be recorded in a matrix S, where S_{hk} denotes the proportion of class h grouped into cluster k (1 ≤ h, k ≤ r). Then the purity is defined as,
u(S_k) = \max_{h = 1, 2, \ldots, r} S_{hk}    (9)
Finally, we can use the average purity
u(S) = \frac{1}{r} \sum_{k=1}^{r} u(S_k)    (10)
to measure the clustering precision.
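A sketch of this purity measure, assuming NumPy and reading S_{hk} as the fraction of cluster k's members that come from class h (the helper name is ours):

```python
import numpy as np

def average_purity(classes, clusters, r):
    # Build S with S[h, k] = fraction of cluster k drawn from class h,
    # then average u(S_k) = max_h S_hk over the r clusters (eqs. (9)-(10)).
    S = np.zeros((r, r))
    for h, k in zip(classes, clusters):
        S[h, k] += 1
    sizes = S.sum(axis=0)
    sizes[sizes == 0] = 1                # guard for empty clusters
    S /= sizes
    return S.max(axis=0).mean()

# toy check: cluster 0 is pure, cluster 1 is 2/3 pure -> mean (1 + 2/3)/2
print(average_purity([0, 0, 1, 1], [0, 1, 1, 1], r=2))   # 0.8333...
```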
In the experiments every factorization algorithm represents data samples by rows. Table 1 below shows the concrete experimental results: NCMF has the shortest running time; the entropy of its mapping matrix M is the smallest, showing that M is the sparsest; and the reduced data obtained by NCMF achieve the highest precision.
These experimental results show that NCMF is more effective than the other methods both in running time and in the low-dimensional data features obtained. Although NCMF adds more constraints than NNMF, the result is a more concise update rule. On the one hand this simplicity makes NCMF more efficient than the original NNMF; at the same time it drives M toward a factorization expressed almost entirely by 0s and 1s. The sparsity and normalization analyses above explain why NCMF naturally obtains a sparse mapping matrix M. The update formulas (2) and (3) of NCMF show that the computation of the mapping matrix accounts for 2/3 of the total running time, indicating that NCMF emphasizes the update of the matrix M; once this matrix is obtained, the condensed matrix C can be computed quickly. In short, this simplicity, normalization and sparsity make the NCMF algorithm more effective.
Table 1 shows the experimental results of NCMF and the other related factorization algorithms, when reducing to a compressed 6-dimensional space, in three respects: execution time, sparsity of the mapping matrix, and clustering precision of the low-dimensional data.
Table 1
Algorithm              NCMF      KL_NMF    LNMF      αNMF      GD-CLS
Running time (s)       46.68     57.73     66.25     76.71     329.7
Sparsity (entropy)     0.6465    1.3876    1.5346    1.2147    1.2678
Precision (purity)     0.7443    0.7192    0.7147    0.7030    0.7239
List of references:
[1] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 2nd edition, 1991.
[2] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8) (2000) 888-905.
[3] R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics 7 (1936) 179-188.
[4] D. Cai, X. F. He, J. W. Han, Semi-supervised discriminant analysis, in: IEEE 11th International Conference on Computer Vision (ICCV), 2007, pp. 1-7.
[5] B. Schölkopf, A. Smola, K.-R. Müller, Kernel principal component analysis, 1999.
[6] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290(5500) (2000) 2323-2326.
[7] R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification, 2nd edition, Wiley, 2001.
[8] D. D. Lee, H. S. Seung, Learning the parts of objects by nonnegative matrix factorization, Nature 401(6755) (1999) 788-791.
[9] S. Z. Li, X. W. Hou, H. J. Zhang, Learning spatially localized, parts-based representation, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2001, pp. 207-212.
[10] A. Cichocki, H. Lee, Y. D. Kim, S. Choi, Non-negative matrix factorization with α-divergence, Pattern Recognition Letters 29(9) (2008) 1433-1440.
[11] F. Shahnaz, M. W. Berry, V. P. Pauca, R. J. Plemmons, Document clustering using nonnegative matrix factorization, Information Processing and Management 42 (2006) 373-386.
[12] W. X. Liu, K. H. Yuan, D. Ye, Reducing microarray data via nonnegative matrix factorization for visualization and clustering analysis, Journal of Biomedical Informatics 41(4) (2008) 602-606.
[13] S. Tsuge, M. Shishibori, S. Kuroiwa, K. Kita, Dimensionality reduction using non-negative matrix factorization for information retrieval, in: 2001 IEEE International Conference on Systems, Man, and Cybernetics, vol. 2, 2001, pp. 960-965.
[14] Occam's razor. http://en.wikipedia.org/wiki/Occam%27s_razor.
[15] D. D. Lee, H. S. Seung, Algorithms for non-negative matrix factorization, in: Proceedings of the 13th Annual Conference on Neural Information Processing Systems (NIPS), vol. 13, 2000, pp. 556-562.
[16] T. Cover, J. Thomas, Elements of Information Theory, John Wiley & Sons, New York, USA, 1991.
[17] Gait data set. http://marathon.csee.usf.edu/GaitBaseline/.
[18] N. Slonim, N. Friedman, N. Tishby, Unsupervised document classification using sequential information maximization, in: Proceedings of the 25th ACM International Conference on Research and Development in Information Retrieval (SIGIR), 2002, pp. 129-136.

Claims (3)

1. A nonnegative matrix factorization-based dimensionality reduction method for clustering, characterized in that it adopts the KL "distance", adds a data normalization constraint on top of the nonnegativity requirement, directly seeks the internal relations between data dimensions by minimizing the target error function between data compression and reconstruction, and finally obtains a mapping matrix; this mapping matrix is then used to project high-dimensional data into a low-dimensional subspace, where the corresponding data analysis is carried out.
2. The method according to claim 1, characterized in that each row of the data matrix X represents a sample, X_{n×m} is decomposed into C_{n×r} × M_{r×m}, and by normalizing the dimension vectors of the data matrix X the condensed matrix C and the mapping matrix M are finally obtained, the corresponding objective function being:
F(C, M) = \min_{C \ge 0,\, M \ge 0} \sum_{ij} \Big( X_{ij} \log \frac{X_{ij}}{(CM)_{ij}} - X_{ij} + (CM)_{ij} \Big) + \mu \Big( \sum_i C_{ik} - 1 \Big) + \nu \Big( \sum_k M_{kj} - 1 \Big)
where \sum_i X_{ij} = 1, n is the sample number, m is the dimension, r is the dimension of the low-dimensional space, the index k (1 ≤ k ≤ r) runs over the columns of matrix C and the rows of matrix M, and μ and ν are positive parameters; the update rules obtained are:
C_{ik} = \frac{(X M^T)_{ik}}{\sum_j M_{kj}}
M_{kj} = \sum_i \Big( \frac{X_{ij}}{(CM)_{ij}} C_{ik} \Big) M_{kj} ,
where the superscript T denotes the transpose of the matrix.
3. The method according to claim 1, characterized in that the concrete calculation steps are as follows:
1) given: a data set X storing one sample per row, the low-dimensional subspace dimension r, and the iteration count l;
2) compute the sample number n and dimension m of the data matrix X;
3) normalize X so that \sum_i X_{ij} = 1;
4) initialize the mapping matrix M ∈ R^{r×m};
5) iterate steps 6) and 7) l times;
6) update C using C_{ik} = (X M^T)_{ik} / \sum_j M_{kj};
7) update M using M_{kj} = \sum_i \big( X_{ij} / (CM)_{ij} \cdot C_{ik} \big) M_{kj};
8) return the condensed matrix C and the mapping matrix M.
CN201010167504.4A 2010-05-06 2010-05-06 Nonnegative matrix factorization-based dimensionality reducing method used for clustering Pending CN101853239A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010167504.4A CN101853239A (en) 2010-05-06 2010-05-06 Nonnegative matrix factorization-based dimensionality reducing method used for clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010167504.4A CN101853239A (en) 2010-05-06 2010-05-06 Nonnegative matrix factorization-based dimensionality reducing method used for clustering

Publications (1)

Publication Number Publication Date
CN101853239A true CN101853239A (en) 2010-10-06

Family

ID=42804737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010167504.4A Pending CN101853239A (en) 2010-05-06 2010-05-06 Nonnegative matrix factorization-based dimensionality reducing method used for clustering

Country Status (1)

Country Link
CN (1) CN101853239A (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254328A (en) * 2011-05-17 2011-11-23 西安电子科技大学 Video motion characteristic extracting method based on local sparse constraint non-negative matrix factorization
CN102779162A (en) * 2012-06-14 2012-11-14 浙江大学 Matrix concept decomposition method with local area limit
CN102929894A (en) * 2011-08-12 2013-02-13 中国人民解放军总参谋部第五十七研究所 Online clustering visualization method of text
CN103728135A (en) * 2013-12-16 2014-04-16 西安交通大学 Bearing fault feature extraction and diagnosis method of non-negative matrix factorization
CN103824062A (en) * 2014-03-06 2014-05-28 西安电子科技大学 Motion identification method for human body by parts based on non-negative matrix factorization
CN104834746A (en) * 2015-05-23 2015-08-12 华东交通大学 Heterogeneous feature time sequence data evolution and clustering method based on graphic processing unit
CN105335528A (en) * 2015-12-01 2016-02-17 中国计量学院 Customized product similarity judgment method based on product structure
CN105389291A (en) * 2015-12-21 2016-03-09 西安电子科技大学 Data processing method based on incremental locally convex non-negative matrix factorization
CN107423717A (en) * 2017-08-01 2017-12-01 山东大学 A kind of matching and recognition method of the manual drawing electrical symbol decomposed based on joint nonnegative matrix two and standard electrical symbol
CN107924484A (en) * 2015-07-23 2018-04-17 丰田汽车欧洲股份有限公司 For the system and computer based method in environmental background Imitating class people's controlling behavior
CN108491863A (en) * 2018-02-27 2018-09-04 南京信息工程大学 Color image processing method based on Non-negative Matrix Factorization and convolutional neural networks
CN108710502A (en) * 2018-04-08 2018-10-26 华中科技大学 The individual cultivation method and its system of digital control system
WO2019014906A1 (en) * 2017-07-20 2019-01-24 淮阴工学院 Generalized multivariate singular spectrum analysis method for mode reconstruction and prediction
CN109426861A (en) * 2017-08-16 2019-03-05 阿里巴巴集团控股有限公司 Data encryption, machine learning model training method, device and electronic equipment
CN109960730A (en) * 2019-04-19 2019-07-02 广东工业大学 A kind of short text classification method, device and equipment based on feature extension
CN110442911A (en) * 2019-07-03 2019-11-12 中国农业大学 A kind of higher-dimension complication system Uncertainty Analysis Method based on statistical machine learning
CN110674755A (en) * 2019-09-25 2020-01-10 中国计量大学 Gait recognition method based on optimum gait flow pattern space
CN111488900A (en) * 2019-01-29 2020-08-04 大连理工大学 Multi-view related feature learning model based on non-negative matrix factorization
CN114140635A (en) * 2021-08-10 2022-03-04 北京工业大学 Non-negative matrix decomposition method for self-expression learning supervision
CN117576493A (en) * 2024-01-16 2024-02-20 武汉明炀大数据科技有限公司 Cloud storage compression method and system for large sample data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030233369A1 (en) * 2002-06-17 2003-12-18 Fujitsu Limited Data classifying device, and active learning method used by data classifying device and active learning program of data classifying device
US20050027664A1 (en) * 2003-07-31 2005-02-03 Johnson David E. Interactive machine learning system for automated annotation of information in text
JP2008538041A (en) * 2005-04-14 2008-10-02 本田技研工業株式会社 Partially learned learning machine for data classification based on local neighborhood Laplace eigenmap

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030233369A1 (en) * 2002-06-17 2003-12-18 Fujitsu Limited Data classifying device, and active learning method used by data classifying device and active learning program of data classifying device
US20050027664A1 (en) * 2003-07-31 2005-02-03 Johnson David E. Interactive machine learning system for automated annotation of information in text
JP2008538041A (en) * 2005-04-14 2008-10-02 本田技研工業株式会社 Partially learned learning machine for data classification based on local neighborhood Laplace eigenmap

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhenfeng Zhu et al., "Normalized dimensionality reduction using nonnegative matrix factorization", Neurocomputing, vol. 73, no. 10-12, 12 March 2010, pp. 1784-1787, relevant to claims 1-3, category 2 *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254328A (en) * 2011-05-17 2011-11-23 西安电子科技大学 Video motion characteristic extracting method based on local sparse constraint non-negative matrix factorization
CN102929894A (en) * 2011-08-12 2013-02-13 中国人民解放军总参谋部第五十七研究所 Online clustering visualization method of text
CN102779162B (en) * 2012-06-14 2014-09-17 浙江大学 Matrix concept decomposition method with local area limit
CN102779162A (en) * 2012-06-14 2012-11-14 浙江大学 Matrix concept decomposition method with local area limit
CN103728135A (en) * 2013-12-16 2014-04-16 西安交通大学 Bearing fault feature extraction and diagnosis method of non-negative matrix factorization
CN103824062B (en) * 2014-03-06 2017-01-11 西安电子科技大学 Motion identification method for human body by parts based on non-negative matrix factorization
CN103824062A (en) * 2014-03-06 2014-05-28 西安电子科技大学 Motion identification method for human body by parts based on non-negative matrix factorization
CN104834746A (en) * 2015-05-23 2015-08-12 华东交通大学 Heterogeneous feature time sequence data evolution and clustering method based on graphic processing unit
CN104834746B (en) * 2015-05-23 2017-12-12 华东交通大学 Heterogeneous characteristic time series data evolution clustering method based on graphics processing unit
US11379705B2 (en) 2015-07-23 2022-07-05 Toyota Motor Europe System and computer-based method for simulating a human-like control behaviour in an environmental context
CN107924484A (en) * 2015-07-23 2018-04-17 丰田汽车欧洲股份有限公司 For the system and computer based method in environmental background Imitating class people's controlling behavior
CN105335528A (en) * 2015-12-01 2016-02-17 中国计量学院 Customized product similarity judgment method based on product structure
CN105335528B (en) * 2015-12-01 2019-03-19 中国计量学院 A kind of customed product similarity judgment method based on product structure
CN105389291A (en) * 2015-12-21 2016-03-09 西安电子科技大学 Data processing method based on incremental locally convex non-negative matrix factorization
CN105389291B (en) * 2015-12-21 2018-08-21 西安电子科技大学 Data processing method based on the convex local Non-negative Matrix Factorization of increment type
WO2019014906A1 (en) * 2017-07-20 2019-01-24 淮阴工学院 Generalized multivariate singular spectrum analysis method for mode reconstruction and prediction
CN107423717A (en) * 2017-08-01 2017-12-01 山东大学 A kind of matching and recognition method of the manual drawing electrical symbol decomposed based on joint nonnegative matrix two and standard electrical symbol
CN107423717B (en) * 2017-08-01 2020-01-14 山东大学 Matching identification method for manually drawn electrical symbols and standard electrical symbols based on joint non-negative matrix binary decomposition
CN109426861A (en) * 2017-08-16 2019-03-05 阿里巴巴集团控股有限公司 Data encryption, machine learning model training method, device and electronic equipment
CN108491863A (en) * 2018-02-27 2018-09-04 南京信息工程大学 Color image processing method based on Non-negative Matrix Factorization and convolutional neural networks
CN108710502B (en) * 2018-04-08 2020-09-29 华中科技大学 Personalized configuration method and system of numerical control system
CN108710502A (en) * 2018-04-08 2018-10-26 华中科技大学 The individual cultivation method and its system of digital control system
CN111488900B (en) * 2019-01-29 2023-08-04 大连理工大学 Multi-view related feature learning method based on nonnegative matrix factorization
CN111488900A (en) * 2019-01-29 2020-08-04 大连理工大学 Multi-view related feature learning model based on non-negative matrix factorization
CN109960730B (en) * 2019-04-19 2022-12-30 广东工业大学 Short text classification method, device and equipment based on feature expansion
CN109960730A (en) * 2019-04-19 2019-07-02 广东工业大学 A kind of short text classification method, device and equipment based on feature extension
CN110442911A (en) * 2019-07-03 2019-11-12 中国农业大学 A kind of higher-dimension complication system Uncertainty Analysis Method based on statistical machine learning
CN110442911B (en) * 2019-07-03 2023-11-14 中国农业大学 High-dimensional complex system uncertainty analysis method based on statistical machine learning
CN110674755B (en) * 2019-09-25 2022-02-11 中国计量大学 Gait recognition method based on optimum gait flow pattern space
CN110674755A (en) * 2019-09-25 2020-01-10 中国计量大学 Gait recognition method based on optimum gait flow pattern space
CN114140635A (en) * 2021-08-10 2022-03-04 北京工业大学 Non-negative matrix decomposition method for self-expression learning supervision
CN114140635B (en) * 2021-08-10 2024-05-28 北京工业大学 Non-negative matrix factorization method for self-expression learning supervision
CN117576493A (en) * 2024-01-16 2024-02-20 武汉明炀大数据科技有限公司 Cloud storage compression method and system for large sample data
CN117576493B (en) * 2024-01-16 2024-04-02 武汉明炀大数据科技有限公司 Cloud storage compression method and system for large sample data

Similar Documents

Publication Publication Date Title
CN101853239A (en) Nonnegative matrix factorization-based dimensionality reducing method used for clustering
Zhao et al. Incomplete multi-view clustering via deep semantic mapping
He et al. A variance minimization criterion to feature selection using laplacian regularization
He et al. Laplacian regularized Gaussian mixture model for data clustering
Nie et al. Trace ratio criterion for feature selection.
Zhang et al. Understanding bag-of-words model: a statistical framework
Cai et al. Graph regularized nonnegative matrix factorization for data representation
Shen et al. Non-negative matrix factorization clustering on multiple manifolds
Jin et al. Low-rank matrix factorization with multiple hypergraph regularizer
Huang et al. A kernel entropy manifold learning approach for financial data analysis
CN101615248B (en) Age estimation method, equipment and face recognition system
Zhu et al. Iterative Laplacian score for feature selection
Shao et al. Deep canonical correlation analysis with progressive and hypergraph learning for cross-modal retrieval
Huang et al. On nonlinear dimensionality reduction for face recognition
Du et al. Multiple graph unsupervised feature selection
Shu et al. Parameter-less auto-weighted multiple graph regularized nonnegative matrix factorization for data representation
CN102156878A (en) Sparse embedding with manifold information-based human face identification method
Xing et al. Discriminative semi-supervised non-negative matrix factorization for data clustering
CN109871880A (en) Feature extracting method based on low-rank sparse matrix decomposition, local geometry holding and classification information maximum statistical correlation
Ilea et al. Covariance matrices encoding based on the log-Euclidean and affine invariant Riemannian metrics
Chung et al. Learning data manifolds with a cutting plane method
Mandal et al. Unsupervised non-redundant feature selection: a graph-theoretic approach
Jin et al. Accelerating infinite ensemble of clustering by pivot features
Shu et al. Local and global regularized sparse coding for data representation
Bouguila et al. On discrete data clustering

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20101006