CN105469101A - Mixed two-dimensional probabilistic principal component analysis method - Google Patents

Mixed two-dimensional probabilistic principal component analysis method

Info

Publication number
CN105469101A
CN105469101A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511022718.1A
Other languages
Chinese (zh)
Inventor
孙艳丰
刘思萌
句福娇
胡永利
尹宝才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201511022718.1A priority Critical patent/CN105469101A/en
Publication of CN105469101A publication Critical patent/CN105469101A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques


Abstract

The invention discloses a mixed two-dimensional probabilistic principal component analysis method that can reduce the dimensionality of two-dimensional data in both the row and column directions, with better reconstruction than prior-art methods. The method comprises the steps of: (1) building a probabilistic second-order principal component analysis model (2DPCA) according to formula (1); (2) building a mixture of 2DPCA models according to formula (2); (3) estimating the parameters of formula (2) through the maximum-likelihood function of formula (4); and (4) solving formula (4) by optimizing the parameters with a variational EM (Expectation-Maximization) algorithm, in which the E step solves the posterior distribution of the latent variables and the M step updates the parameters of formula (4); the E step and M step are performed iteratively so that the likelihood value increases and tends to stability.

Description

Mixed two-dimensional probabilistic principal component analysis method
Technical field
The invention belongs to the technical field of feature extraction and data dimensionality reduction, and relates in particular to a mixed two-dimensional probabilistic principal component analysis method.
Background art
High-dimensional and multi-modal data are ubiquitous in modern computer vision research. High dimensionality not only increases the complexity of algorithms and the cost of storage, but also reduces the applicability of algorithms in practice. High-dimensional data, however, are often distributed on a low-dimensional subspace or manifold, so finding a mapping of high-dimensional observations into a low-dimensional space has become a challenging problem in machine learning research. Over the past few decades, algorithms for data dimensionality reduction have made remarkable progress.
Principal component analysis (PCA) is a dimensionality-reduction method widely used in pattern recognition and machine learning. PCA has many interpretations; one is to assume that the observed high-dimensional data are a linear mapping of data in a low-dimensional space. Data obtained in practice often have an internal dependency structure, for example images. The most direct way to apply PCA to two-dimensional data is to vectorize it, but vectorized data are generally high dimensional, which not only invites the curse of dimensionality but also ignores the spatial structure of the two-dimensional data, so PCA on vectorized data is not an optimal feature-extraction method. A PCA for two-dimensional data (2DPCA) was therefore proposed. Compared with traditional PCA, 2DPCA performs the dimensionality-reduction computation directly on the 2D data matrix, thereby preserving the structural relations within the data, and it achieves experimental results superior to PCA.
These non-probabilistic PCA methods rely only on the raw data, assume no parameters, and apply no prior knowledge of the observed data to the prediction. To overcome this shortcoming, Tipping and Bishop proposed a probabilistic PCA model (PPCA). This model likewise represents the data as one-dimensional vectors and assumes the noise obeys a Gaussian distribution with zero mean and covariance proportional to the identity matrix. Compared with traditional PCA, PPCA uses probability theory to achieve data dimensionality reduction, and the model parameters are obtained by maximum-likelihood estimation (MLE). Zhao et al. then proposed a probabilistic PCA based on two-dimensional data (2DPPCA), a breakthrough extending conventional PPCA to a 2D model. 2DPPCA defines only a global projection of the samples in image space; to better represent the local information of the samples, Wang et al. proposed a mixture of probabilistic PCA (MP2DPCA) for face data.
The probability distribution of a set of two-dimensional data is quite complicated; in most cases it cannot be represented by any single probability distribution, so a Gaussian mixture is needed as an approximation. According to Bishop, by using sufficiently many Gaussian components and adjusting their means, variances, and the coefficients of their linear combination, almost any continuous probability density can be approximated to arbitrary accuracy. A linear combination of multiple Gaussian distributions is called a mixture of Gaussians. Rasmussen proposed the Gaussian mixture model (GMM); Zivkovic applied the GMM to image background extraction; and Li et al. used mixed Gaussian regression analysis for subspace clustering. Modeling the probability distribution of second-order data with a mixture of several Gaussian distributions amounts to reducing second-order high-dimensional data onto multiple principal directions; since more parameters must be computed, a variational EM algorithm (Variational Expectation Maximization) can be used to solve such a model.
None of the above methods, however, can reduce the dimensionality of two-dimensional data in both the row and column directions.
Summary of the invention
The technical problem solved by the invention is: overcoming the deficiencies of the prior art by providing a mixed two-dimensional probabilistic principal component analysis method that can reduce the dimensionality of two-dimensional data in both the row and column directions, with better reconstruction quality.
The technical solution of the invention is this mixed two-dimensional probabilistic principal component analysis method, in which the samples obey a Gaussian mixture distribution of matrix variables, comprising the following steps:
(1) Build the probabilistic second-order principal component model (2DPCA) according to formula (1):

$$X_n = L B_n R^T + M + E_n \qquad (1)$$

where $L$ ($p \times r$) and $R$ ($q \times c$) are the dimensionality-reduction matrices in the row and column directions respectively; $B_n$ ($r \times c$) is the latent-variable core of sample $X_n$, called the coefficient matrix; $r \le p$ and $c \le q$ are the numbers of rows and columns after dimensionality reduction; $M$ ($p \times q$) is the mean matrix; and $E_n$ is noise obeying a matrix Gaussian distribution, each component of which obeys $\mathcal{N}(0, \sigma^2)$.

(2) Build the mixture of 2DPCA models according to formula (2):

$$p(X_n) = \sum_{k=1}^{K} \pi_k\, p\!\left(X_n \mid M_k, L_k, R_k, \sigma_k^2\right) \qquad (2)$$

where the separate mean term $M_k$ is the mean of the $k$-th class of samples, $L_k$ and $R_k$ are the dimensionality-reduction matrices obtained from the $k$-th class of samples, $\pi_k$ is the mixing proportion with $\pi_k > 0$, and $\sigma_k^2$ is the variance of the $k$-th Gaussian distribution.

(3) Estimate the parameters in formula (2) through the maximum-likelihood function of formula (4):

$$\mathcal{L} = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left[\ln \pi_k + \ln p(X_n \mid k)\right] \qquad (4)$$

where $z_{nk}$ takes the value 1 or 0 and indicates whether the $n$-th sample belongs to the $k$-th Gaussian distribution.

(4) When solving formula (4), optimize the parameters with the variational expectation-maximization (EM) algorithm. In the E step of the EM algorithm, solve the posterior distribution of the latent variable $B_n^{(k)}$, whose mean is $Q_n^{(k)}$ and whose covariance matrices in the row and column directions are $T_n^{(k)}$ and $S_n^{(k)}$ respectively, computed by formulas (9)-(11):

$$T_n^{(k)} = c\sigma_k^2 \left[\operatorname{tr}\!\left(R_k^T R_k S_n^{(k)}\right) L_k^T L_k + \sigma_k^2 \operatorname{tr}\!\left(S_n^{(k)}\right) I_r\right]^{-1} \qquad (9)$$

$$S_n^{(k)} = r\sigma_k^2 \left[\operatorname{tr}\!\left(L_k^T L_k T_n^{(k)}\right) R_k^T R_k + \sigma_k^2 \operatorname{tr}\!\left(T_n^{(k)}\right) I_c\right]^{-1} \qquad (10)$$

$$\operatorname{vec}\!\left(Q_n^{(k)}\right) = \left[R_k^T R_k \otimes L_k^T L_k + \sigma_k I \otimes \sigma_k I\right]^{-1} y \qquad (11)$$

where $y = \operatorname{vec}\!\left(L_k^T (X_n - M_k) R_k\right)$.

The M step updates the parameters in formula (4), giving formulas (12)-(14):

$$L_k = \left[\sum_{n=1}^{N} \gamma_{nk} \left(X_n - M_k\right) R_k \left(Q_n^{(k)}\right)^T\right] \left[\sum_{n=1}^{N} \gamma_{nk} \left(Q_n^{(k)} R_k^T R_k \left(Q_n^{(k)}\right)^T + \operatorname{tr}\!\left(R_k^T R_k S_n^{(k)}\right) T_n^{(k)}\right)\right]^{-1} \qquad (12)$$

$$R_k = \left[\sum_{n=1}^{N} \gamma_{nk} \left(X_n - M_k\right)^T L_k Q_n^{(k)}\right] \left[\sum_{n=1}^{N} \gamma_{nk} \left(\left(Q_n^{(k)}\right)^T L_k^T L_k Q_n^{(k)} + \operatorname{tr}\!\left(L_k^T L_k T_n^{(k)}\right) S_n^{(k)}\right)\right]^{-1} \qquad (13)$$

$$\sigma_k^2 = \frac{1}{pqN_k} \left\{\sum_{n=1}^{N} \gamma_{nk} \operatorname{tr}\!\left[\left(X_n - M_k\right)^T \left(X_n - M_k\right)\right] - 2\sum_{n=1}^{N} \gamma_{nk} \operatorname{tr}\!\left(R_k \left\langle B_n^{(k)}\right\rangle^T L_k^T \left(X_n - M_k\right)\right) + \sum_{n=1}^{N} \gamma_{nk} \operatorname{tr}\!\left(\left\langle B_n^{(k)T} L_k^T L_k B_n^{(k)}\right\rangle R_k^T R_k\right)\right\} \qquad (14)$$

where $\gamma_{nk}$ denotes the posterior probability that the $n$-th sample belongs to the $k$-th Gaussian distribution.

The E step and the M step are carried out iteratively so that the likelihood value increases and tends to stability.
The invention performs dimensionality reduction on two-dimensional data based on a Gaussian mixture model. By introducing latent variables, the variational EM algorithm is used to solve the model parameters together with the reduced coefficient matrices, achieving compression of the two-dimensional data. Reconstructing images from the dimensionality-reduction matrices and the coefficient matrices yields images that lose little compared with the originals, and the reduced coefficient matrices can be regarded as sample features and used to classify the samples. The method can therefore reduce the dimensionality of two-dimensional data in both the row and column directions, with better reconstruction quality.
Brief description of the drawings
Fig. 1 shows the reconstruction errors of different algorithms on the MNIST database, with K = 2 in Fig. 1a, K = 5 in Fig. 1b, and K = 10 in Fig. 1c.
Fig. 2a is the reconstruction-error curve on the Yale database and Fig. 2b the reconstruction-error curve on the AR database, with K = 5 in both figures.
Detailed description of the embodiments
This mixed two-dimensional probabilistic principal component analysis method assumes that the samples obey a Gaussian mixture distribution of matrix variables and comprises the following steps:
(1) Build the probabilistic second-order principal component model (2DPCA) according to formula (1):

$$X_n = L B_n R^T + M + E_n \qquad (1)$$

where $L$ ($p \times r$) and $R$ ($q \times c$) are the dimensionality-reduction matrices in the row and column directions respectively; $B_n$ ($r \times c$) is the latent-variable core of sample $X_n$, called the coefficient matrix; $r \le p$ and $c \le q$ are the numbers of rows and columns after dimensionality reduction; $M$ ($p \times q$) is the mean matrix; and $E_n$ is noise obeying a matrix Gaussian distribution, each component of which obeys $\mathcal{N}(0, \sigma^2)$.

(2) Build the mixture of 2DPCA models according to formula (2):

$$p(X_n) = \sum_{k=1}^{K} \pi_k\, p\!\left(X_n \mid M_k, L_k, R_k, \sigma_k^2\right) \qquad (2)$$

where the separate mean term $M_k$ is the mean of the $k$-th class of samples, $L_k$ and $R_k$ are the dimensionality-reduction matrices obtained from the $k$-th class of samples, $\pi_k$ is the mixing proportion with $\pi_k > 0$, and $\sigma_k^2$ is the variance of the $k$-th Gaussian distribution.

(3) Estimate the parameters in formula (2) through the maximum-likelihood function of formula (4):

$$\mathcal{L} = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left[\ln \pi_k + \ln p(X_n \mid k)\right] \qquad (4)$$

where $z_{nk}$ takes the value 1 or 0 and indicates whether the $n$-th sample belongs to the $k$-th Gaussian distribution.

(4) When solving formula (4), optimize the parameters with the variational expectation-maximization (EM) algorithm. In the E step of the EM algorithm, solve the posterior distribution of the latent variable $B_n^{(k)}$, whose mean is $Q_n^{(k)}$ and whose covariance matrices in the row and column directions are $T_n^{(k)}$ and $S_n^{(k)}$ respectively, computed by formulas (9)-(11):

$$T_n^{(k)} = c\sigma_k^2 \left[\operatorname{tr}\!\left(R_k^T R_k S_n^{(k)}\right) L_k^T L_k + \sigma_k^2 \operatorname{tr}\!\left(S_n^{(k)}\right) I_r\right]^{-1} \qquad (9)$$

$$S_n^{(k)} = r\sigma_k^2 \left[\operatorname{tr}\!\left(L_k^T L_k T_n^{(k)}\right) R_k^T R_k + \sigma_k^2 \operatorname{tr}\!\left(T_n^{(k)}\right) I_c\right]^{-1} \qquad (10)$$

$$\operatorname{vec}\!\left(Q_n^{(k)}\right) = \left[R_k^T R_k \otimes L_k^T L_k + \sigma_k I \otimes \sigma_k I\right]^{-1} y \qquad (11)$$

where $y = \operatorname{vec}\!\left(L_k^T (X_n - M_k) R_k\right)$.

The M step updates the parameters in formula (4), giving formulas (12)-(14):

$$L_k = \left[\sum_{n=1}^{N} \gamma_{nk} \left(X_n - M_k\right) R_k \left(Q_n^{(k)}\right)^T\right] \left[\sum_{n=1}^{N} \gamma_{nk} \left(Q_n^{(k)} R_k^T R_k \left(Q_n^{(k)}\right)^T + \operatorname{tr}\!\left(R_k^T R_k S_n^{(k)}\right) T_n^{(k)}\right)\right]^{-1} \qquad (12)$$

$$R_k = \left[\sum_{n=1}^{N} \gamma_{nk} \left(X_n - M_k\right)^T L_k Q_n^{(k)}\right] \left[\sum_{n=1}^{N} \gamma_{nk} \left(\left(Q_n^{(k)}\right)^T L_k^T L_k Q_n^{(k)} + \operatorname{tr}\!\left(L_k^T L_k T_n^{(k)}\right) S_n^{(k)}\right)\right]^{-1} \qquad (13)$$

$$\sigma_k^2 = \frac{1}{pqN_k} \left\{\sum_{n=1}^{N} \gamma_{nk} \operatorname{tr}\!\left[\left(X_n - M_k\right)^T \left(X_n - M_k\right)\right] - 2\sum_{n=1}^{N} \gamma_{nk} \operatorname{tr}\!\left(R_k \left\langle B_n^{(k)}\right\rangle^T L_k^T \left(X_n - M_k\right)\right) + \sum_{n=1}^{N} \gamma_{nk} \operatorname{tr}\!\left(\left\langle B_n^{(k)T} L_k^T L_k B_n^{(k)}\right\rangle R_k^T R_k\right)\right\} \qquad (14)$$

where $\gamma_{nk}$ denotes the posterior probability that the $n$-th sample belongs to the $k$-th Gaussian distribution.

The E step and the M step are carried out iteratively so that the likelihood value increases and tends to stability.
The invention performs dimensionality reduction on two-dimensional data based on a Gaussian mixture model. By introducing latent variables, the variational EM algorithm is used to solve the model parameters together with the reduced coefficient matrices, achieving compression of the two-dimensional data. Reconstructing images from the dimensionality-reduction matrices and the coefficient matrices yields images that lose little compared with the originals, and the reduced coefficient matrices can be regarded as sample features and used to classify the samples. The method can therefore reduce the dimensionality of two-dimensional data in both the row and column directions, with better reconstruction quality.
The invention is now described in greater detail.
To solve formula (2), a variational approximation algorithm is used to solve the density functions of the posterior distributions of all variables in the model.
1 The probabilistic second-order PCA (PSOPCA)
Let χ = {X_1, X_2, ..., X_N} be a set of N i.i.d. random samples, each of size $p \times q$. 2D principal component analysis can be expressed in the following form:
X_n = L B_n R^T + M + E_n;  (1)
where $L$ ($p \times r$) and $R$ ($q \times c$) are the dimensionality-reduction matrices in the row and column directions respectively, and $B_n$ ($r \times c$) is the latent-variable core of sample $X_n$, i.e. the coefficient matrix. $r \le p$ and $c \le q$ are the numbers of rows and columns after dimensionality reduction. $M$ ($p \times q$) is the mean matrix, and $E_n$ is noise obeying a matrix Gaussian distribution, i.e. each of its components obeys $\mathcal{N}(0, \sigma^2)$; model (1) is then exactly the standard probabilistic 2DPCA.
2 The MixB2DPPCA model
For more complicated data sets, it is difficult to fit the original sample set with a single principal component model, because such principal component analysis is a global dimensionality-reduction model: for samples with a complex data distribution, finding only one principal direction is clearly unreasonable. A local 2DPCA model is therefore proposed, which uses a mixture of several 2DPCA models to find a set of dimensionality-reduction directions and thereby better represent the principal components of the raw data.
This work considers a mixture of 2DPCA models, the object being to classify complex two-dimensional sample data and to solve the dimensionality-reduction matrices of each class. Suppose sample $X_n$ obeys a Gaussian mixture model (GMM) composed of K Gaussian distributions, namely:

$$p(X_n) = \sum_{k=1}^{K} \pi_k\, p\!\left(X_n \mid M_k, L_k, R_k, \sigma_k^2\right) \qquad (2)$$

Note that a separate mean term $M_k$ is associated with each of the K mixture components; it is in fact the mean of the $k$-th class of samples, and $L_k$ and $R_k$ are the dimensionality-reduction matrices obtained from the $k$-th class of samples. $\pi_k$ is the mixing proportion, with $\pi_k > 0$ and $\sum_{k=1}^{K} \pi_k = 1$.
Introduce a K-dimensional binary random variable $z$ in which exactly one element $z_k$ equals 1 and all other elements are 0, i.e. $z_k \in \{0, 1\}$ and $\sum_k z_k = 1$. Letting $p(z_k = 1) = \pi_k$, the distribution of $z$ is defined as:

$$p(z) = \prod_{k=1}^{K} \pi_k^{z_k}$$

The conditional probability that sample $X_n$ obeys the $k$-th Gaussian is then $p(X_n \mid k)$, and the probability of sample $X_n$ can be written as $p(X_n) = \sum_{k=1}^{K} \pi_k\, p(X_n \mid k)$.
The prior of the latent coefficient matrix $B_n$ is assumed to be a matrix Gaussian with zero mean and identity row and column covariance matrices, $B_n \sim \mathcal{N}(0, I_r, I_c)$.
3 Variational EM solution of MixB2DPPCA
The parameters in formula (2) are estimated by maximizing the likelihood function. Exploiting the idea of the Gaussian mixture model, formula (2) classifies the data in the sample set and finds the dimensionality-reduction matrices $L_k$ and $R_k$ ($k = 1, \ldots, K$) of the different classes. The likelihood function of the model is:

$$\mathcal{L} = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left[\ln \pi_k + \ln p(X_n \mid k)\right] \qquad (4)$$

To solve this model, the variational expectation-maximization (EM) algorithm can be used to optimize the model parameters $L_k, R_k, \pi_k, \sigma_k^2$ ($k = 1, \ldots, K$).
(1) Introduce the variable $\gamma_{nk}$, the posterior probability that the $n$-th sample belongs to the $k$-th Gaussian distribution:

$$\gamma_{nk} = \frac{\pi_k\, p(X_n \mid k)}{p(X_n)} = \frac{\pi_k\, p(\operatorname{vec}(X_n) \mid k)}{p(\operatorname{vec}(X_n))} \qquad (5)$$

where $p(X_n \mid k)$ is the marginal probability of sample $X_n$ under component $k$. Given the latent variable $B_n^{(k)}$, the conditional probability of $X_n$ is Gaussian, so the marginal distribution of $\operatorname{vec}(X_n)$ is $\mathcal{N}\!\left(\operatorname{vec}(M_k),\, C\right)$, where $C = \sigma^2 I + W_k W_k^T$ and $W_k = R_k \otimes L_k$.
The weight coefficient of each Gaussian distribution can thereby be obtained:

$$\pi_k = \frac{1}{N} \sum_{n=1}^{N} \gamma_{nk} \qquad (7)$$

as well as the mean:

$$M_k = \frac{\sum_{n=1}^{N} \gamma_{nk} X_n}{\sum_{n=1}^{N} \gamma_{nk}} \qquad (8)$$
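A minimal NumPy/SciPy sketch of formulas (5), (7) and (8) follows, assuming p·q is small enough for the pq × pq covariance C to be formed explicitly; the function names, array layouts, and log-domain stabilization are illustrative choices:

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, M, Lk, Rk, sigma2):
    """gamma[n, k] per formula (5): posterior probability that sample n
    belongs to component k.  Shapes: X (N, p, q); pi (K,); M (K, p, q);
    Lk (K, p, r); Rk (K, q, c); sigma2 (K,)."""
    N, p, q = X.shape
    K = len(pi)
    # column-stacking vec(.), so that vec(L B R^T) = (R kron L) vec(B)
    Xv = X.transpose(0, 2, 1).reshape(N, p * q)
    logw = np.empty((N, K))
    for k in range(K):
        W = np.kron(Rk[k], Lk[k])                # W_k = R_k (x) L_k
        C = sigma2[k] * np.eye(p * q) + W @ W.T  # C = sigma_k^2 I + W_k W_k^T
        logw[:, k] = np.log(pi[k]) + multivariate_normal.logpdf(
            Xv, M[k].ravel(order='F'), C)
    logw -= logw.max(axis=1, keepdims=True)      # stabilize before normalizing
    g = np.exp(logw)
    return g / g.sum(axis=1, keepdims=True)

def update_pi_M(gamma, X):
    """Mixing weights pi_k and means M_k per formulas (7) and (8)."""
    Nk = gamma.sum(axis=0)                                      # effective size of each component
    pi = Nk / len(X)                                            # formula (7)
    M = np.einsum('nk,npq->kpq', gamma, X) / Nk[:, None, None]  # formula (8)
    return pi, M
```

Forming the Kronecker product makes the density evaluation cost on the order of (pq)^2 per component, so for large images a low-rank or matrix-normal evaluation would be preferable in practice.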
(2) First initialize the parameters of mixture model (1). In the E step of the EM algorithm, solve the posterior distribution of the latent variable $B_n^{(k)}$, where $T_n^{(k)}$ and $S_n^{(k)}$ are the covariance matrices in the row and column directions respectively and $Q_n^{(k)}$ is the mean. Calculation gives:

$$T_n^{(k)} = c\sigma_k^2 \left[\operatorname{tr}\!\left(R_k^T R_k S_n^{(k)}\right) L_k^T L_k + \sigma_k^2 \operatorname{tr}\!\left(S_n^{(k)}\right) I_r\right]^{-1} \qquad (9)$$

$$S_n^{(k)} = r\sigma_k^2 \left[\operatorname{tr}\!\left(L_k^T L_k T_n^{(k)}\right) R_k^T R_k + \sigma_k^2 \operatorname{tr}\!\left(T_n^{(k)}\right) I_c\right]^{-1} \qquad (10)$$

$$\operatorname{vec}\!\left(Q_n^{(k)}\right) = \left[R_k^T R_k \otimes L_k^T L_k + \sigma_k I \otimes \sigma_k I\right]^{-1} y \qquad (11)$$

where $y = \operatorname{vec}\!\left(L_k^T (X_n - M_k) R_k\right)$.
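The E-step moments of formulas (9)-(11) can be sketched as follows. Since (9) and (10) depend on each other, a few alternating fixed-point passes are used here; that inner iteration is an implementation assumption, not something the text prescribes:

```python
import numpy as np

def e_step_component(Xn, Mk, Lk, Rk, sigma2, n_inner=5):
    """Posterior moments of the latent B_n for one sample under component k:
    row covariance T (formula (9)), column covariance S (formula (10)), and
    mean Q (formula (11)).  sigma2 is the component's noise variance."""
    r, c = Lk.shape[1], Rk.shape[1]
    LtL, RtR = Lk.T @ Lk, Rk.T @ Rk
    T, S = np.eye(r), np.eye(c)
    for _ in range(n_inner):  # alternate (9) and (10) to a fixed point
        T = c * sigma2 * np.linalg.inv(
            np.trace(RtR @ S) * LtL + sigma2 * np.trace(S) * np.eye(r))
        S = r * sigma2 * np.linalg.inv(
            np.trace(LtL @ T) * RtR + sigma2 * np.trace(T) * np.eye(c))
    # formula (11): vec(Q) = [R^T R kron L^T L + sigma^2 I]^{-1} vec(L^T (X - M) R)
    y = (Lk.T @ (Xn - Mk) @ Rk).ravel(order='F')
    A = np.kron(RtR, LtL) + sigma2 * np.eye(r * c)
    Q = np.linalg.solve(A, y).reshape((r, c), order='F')  # un-vec, column-major
    return T, S, Q
```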
(3) After the E step obtains the posterior distributions of the latent variables, the M step updates the model parameters, i.e. the dimensionality-reduction matrices. Maximizing the likelihood function gives:

$$L_k = \left[\sum_{n=1}^{N} \gamma_{nk} \left(X_n - M_k\right) R_k \left(Q_n^{(k)}\right)^T\right] \left[\sum_{n=1}^{N} \gamma_{nk} \left(Q_n^{(k)} R_k^T R_k \left(Q_n^{(k)}\right)^T + \operatorname{tr}\!\left(R_k^T R_k S_n^{(k)}\right) T_n^{(k)}\right)\right]^{-1} \qquad (12)$$

$$R_k = \left[\sum_{n=1}^{N} \gamma_{nk} \left(X_n - M_k\right)^T L_k Q_n^{(k)}\right] \left[\sum_{n=1}^{N} \gamma_{nk} \left(\left(Q_n^{(k)}\right)^T L_k^T L_k Q_n^{(k)} + \operatorname{tr}\!\left(L_k^T L_k T_n^{(k)}\right) S_n^{(k)}\right)\right]^{-1} \qquad (13)$$
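A sketch of the M-step updates of formulas (12)-(13); the lists T, S, Q are assumed to hold the per-sample posterior moments for component k from the E-step sketch above:

```python
import numpy as np

def m_step_loadings(X, gamma, M, Lk, Rk, T, S, Q, k):
    """Update L_k and R_k per formulas (12)-(13).  Formula (13) is applied
    with the L_k of the current iteration, as the formulas are written."""
    N = len(X)
    RtR = Rk.T @ Rk
    num_L = sum(gamma[n, k] * (X[n] - M[k]) @ Rk @ Q[n].T for n in range(N))
    den_L = sum(gamma[n, k] * (Q[n] @ RtR @ Q[n].T + np.trace(RtR @ S[n]) * T[n])
                for n in range(N))
    L_new = num_L @ np.linalg.inv(den_L)              # formula (12)
    LtL = Lk.T @ Lk
    num_R = sum(gamma[n, k] * (X[n] - M[k]).T @ Lk @ Q[n] for n in range(N))
    den_R = sum(gamma[n, k] * (Q[n].T @ LtL @ Q[n] + np.trace(LtL @ T[n]) * S[n])
                for n in range(N))
    R_new = num_R @ np.linalg.inv(den_R)              # formula (13)
    return L_new, R_new
```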
The E step and the M step are carried out iteratively so that the likelihood value increases and tends to stability.
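The pieces above can be wired into the overall EM iteration as in the following sketch. A fixed number of passes replaces the likelihood-based stopping rule for brevity, and the posterior moments used in the noise-variance update of formula (14) are assumed standard Gaussian identities:

```python
import numpy as np

def fit_mixb2dppca(X, K, r, c, n_iter=50):
    """Outer EM loop wiring together the helper sketches above (illustrative).
    The moments <B_n> = Q_n and <B_n^T L^T L B_n> = Q^T L^T L Q + tr(L^T L T) S
    used in the sigma_k^2 update (formula (14)) are assumed identities."""
    N, p, q = X.shape
    rng = np.random.default_rng(0)
    pi = np.full(K, 1.0 / K)                        # pi_k = 1/K, as in the experiments
    M = np.repeat(X.mean(axis=0)[None], K, axis=0)  # start every mean at the global mean
    Lk = rng.standard_normal((K, p, r))             # random initial L_k
    Rk = rng.standard_normal((K, q, c))             # random initial R_k
    sigma2 = np.ones(K)                             # sigma_k = 1 initially
    for _ in range(n_iter):
        gamma = responsibilities(X, pi, M, Lk, Rk, sigma2)  # E step, formula (5)
        pi, M = update_pi_M(gamma, X)                       # formulas (7)-(8)
        for k in range(K):
            mom = [e_step_component(X[n], M[k], Lk[k], Rk[k], sigma2[k]) for n in range(N)]
            T = [m[0] for m in mom]
            S = [m[1] for m in mom]
            Q = [m[2] for m in mom]
            Lk[k], Rk[k] = m_step_loadings(X, gamma, M, Lk[k], Rk[k], T, S, Q, k)
            LtL, RtR = Lk[k].T @ Lk[k], Rk[k].T @ Rk[k]
            Nk = gamma[:, k].sum()
            s = 0.0
            for n in range(N):                              # formula (14)
                D = X[n] - M[k]
                BB = Q[n].T @ LtL @ Q[n] + np.trace(LtL @ T[n]) * S[n]
                s += gamma[n, k] * (np.trace(D.T @ D)
                                    - 2.0 * np.trace(Rk[k] @ Q[n].T @ Lk[k].T @ D)
                                    + np.trace(BB @ RtR))
            sigma2[k] = s / (p * q * Nk)
    return pi, M, Lk, Rk, sigma2
```

In practice one would monitor the likelihood of formula (4) between passes and stop once its increase falls below a tolerance.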
Experiments were carried out on four databases: a handwritten-digit database (MNIST), the Yale database, the AR database, and the FERET database. These experiments demonstrate that the proposed method can reduce sample dimensionality with small loss and that the reduced coefficient matrices, taken as features of the original images, can classify images effectively. The algorithms involved are: GLRAM (Generalized Low Rank Approximations of Matrices), mixPPCA (Mixture Probabilistic PCA), and mixB2DPPCA (Mixture of Bilateral-Projection Two-dimensional Probabilistic PCA).
1 Data preparation and experiment parameter settings
The following four databases are used in the experiments:
◆ A subset of the MNIST database (http://yann.lecun.com/exdb/mnist)
◆ The Yale database (http://vision.ucsd.edu/content/yale-face-database)
◆ The AR database (http://rvl1.ecn.purdue.edu/~aleix/aleix_face_DB.html)
◆ The FERET database (http://www.itl.nist.gov/iad/humanid/feret/feret_master.html)
From the MNIST database, 1000 images are selected, i.e. 100 chosen at random for each digit. All images are grayscale with size 28 × 28.
The Yale database contains 15 subjects with 11 images each, captured under different illumination conditions and expressions. In the training stage, 6 images per subject (90 in total) are selected for training and the rest for testing; all images are 64 × 64.
The AR database contains 4000 images of 126 subjects, with 26 frontal face images per subject covering expression changes, illumination changes, and occlusion. Each subject's 26 images are divided into two sessions (captured two weeks apart) of 13 images each. In this experiment, images of 30 subjects (15 male, 15 female) are selected, keeping only the 14 unoccluded images per subject; the first 7 are used for training and the last 7 for testing, and every image is down-sampled to 64 × 64.
The FERET database contains 1400 images of 200 subjects, with 7 images per subject covering variations in pose, expression, and lighting; each image is 80 × 80. Images of 50 subjects are selected at random, with 5 images per subject randomly chosen for training and the remaining 2 for testing.
In this experiment, the initial value of π_k is 1/K, T_k and S_k are initialized to identity matrices, σ_k is initialized to 1, and L_k and R_k are initialized randomly.
2 Reconstruction error
This experiment mainly compares the reconstruction errors of different dimensionality-reduction methods on the MNIST and Yale databases. The methods involved are GLRAM, mixPPCA, and the proposed mixB2DPPCA. GLRAM and mixB2DPPCA are two-dimensional feature-extraction methods that reduce dimensionality in both the row and column directions; mixPPCA and mixB2DPPCA use Gaussian mixture models for dimensionality reduction.
Fig. 1 shows the reconstruction errors of the different algorithms on the MNIST database. As the figure shows, for the same K the proposed method achieves a lower reconstruction error than mixPPCA; as K changes, the reconstruction error of GLRAM stays constant, because GLRAM is a non-mixture model and is independent of K; and as K increases, the reconstruction errors of mixPPCA and mixB2DPPCA both decrease, with mixB2DPPCA consistently better than both mixPPCA and GLRAM.
Fig. 2 shows the reconstruction-error curves on the Yale and AR databases with K = 5. As the figure shows, the proposed algorithm obtains reconstructed images with smaller reconstruction error, i.e. minimal loss during dimensionality reduction.
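For reference, one way to compute such a reconstruction error is sketched below; the text does not state the exact metric behind the figures, so the mean Frobenius distance and the hard assignment of each sample to its most responsible component are assumed choices:

```python
import numpy as np

def reconstruction_error(X, gamma, M, Lk, Rk, Q):
    """Mean Frobenius-norm reconstruction error (assumed metric).
    Q[n][k] is the posterior mean of B_n under component k, e.g. collected
    from the E-step sketch; each sample is reconstructed via formula (1)
    without the noise term, using its most responsible component."""
    errs = []
    for n in range(len(X)):
        k = int(np.argmax(gamma[n]))              # hard-assign to the best component
        X_hat = Lk[k] @ Q[n][k] @ Rk[k].T + M[k]  # X_hat = L_k Q_n^(k) R_k^T + M_k
        errs.append(np.linalg.norm(X[n] - X_hat, 'fro'))
    return float(np.mean(errs))
```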
3 Recognition rates
The third experiment mainly verifies the robustness of the mixB2DPPCA algorithm in terms of recognition rate on the Yale, AR, and FERET databases, using the nearest-neighbor (1-NN) algorithm as the classifier. Table 1 lists the recognition rates of the three methods GLRAM, mixPPCA, and mixB2DPPCA on the Yale database; the reduced dimensions (r, c) are (2, 2), (4, 4), (6, 6), and (8, 8), and K of the Gaussian mixture model takes the values 4, 6, and 8.
Table 1
Table 2 lists the recognition rates on the AR database; the reduced dimensions (r, c) are (4, 4), (6, 6), and (8, 8), and K of the Gaussian mixture model takes the values 6, 8, and 10.
Table 2
Table 3 lists the recognition rates on the FERET database; the reduced dimensions (r, c) are (4, 4), (6, 6), (8, 8), and (10, 10), and K of the Gaussian mixture model takes the values 6, 8, and 10.
Table 3
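The nearest-neighbor classification used in these recognition experiments can be sketched as follows, with Euclidean distance on the flattened coefficient matrices as an assumed choice:

```python
import numpy as np

def nn_classify(train_feats, train_labels, test_feats):
    """1-NN classifier on the reduced coefficient matrices (the features
    extracted by the method).  Euclidean distance on the flattened matrices
    is an assumed choice, as the text does not specify the distance."""
    preds = []
    for f in test_feats:
        d = [np.linalg.norm((f - g).ravel()) for g in train_feats]
        preds.append(train_labels[int(np.argmin(d))])
    return preds
```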
As the tables above show, the features extracted by the proposed mixB2DPPCA have a clear advantage in recognition.
The above is only a preferred embodiment of the present invention and does not restrict the invention in any form; any simple modification, equivalent variation, or alteration made to the above embodiment according to the technical spirit of the invention still falls within the protection scope of the technical solution of the invention.

Claims (1)

1. A mixed two-dimensional probabilistic principal component analysis method, characterized in that the samples obey a Gaussian mixture distribution of matrix variables and that the method comprises the following steps:
(1) building the probabilistic second-order principal component model (2DPCA) according to formula (1):

$$X_n = L B_n R^T + M + E_n \qquad (1)$$

where $L$ ($p \times r$) and $R$ ($q \times c$) are the dimensionality-reduction matrices in the row and column directions respectively; $B_n$ ($r \times c$) is the latent-variable core of sample $X_n$, called the coefficient matrix; $r \le p$ and $c \le q$ are the numbers of rows and columns after dimensionality reduction; $M$ ($p \times q$) is the mean matrix; and $E_n$ is noise obeying a matrix Gaussian distribution, each component of which obeys $\mathcal{N}(0, \sigma^2)$;

(2) building the mixture of 2DPCA models according to formula (2):

$$p(X_n) = \sum_{k=1}^{K} \pi_k\, p\!\left(X_n \mid M_k, L_k, R_k, \sigma_k^2\right) \qquad (2)$$

where the separate mean term $M_k$ is the mean of the $k$-th class of samples, $L_k$ and $R_k$ are the dimensionality-reduction matrices obtained from the $k$-th class of samples, $\pi_k$ is the mixing proportion with $\pi_k > 0$, and $\sigma_k^2$ is the variance of the $k$-th Gaussian distribution;

(3) estimating the parameters in formula (2) through the maximum-likelihood function of formula (4):

$$\mathcal{L} = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left[\ln \pi_k + \ln p(X_n \mid k)\right] \qquad (4)$$

where $z_{nk}$ takes the value 1 or 0 and indicates whether the $n$-th sample belongs to the $k$-th Gaussian distribution;

(4) when solving formula (4), optimizing the parameters with the variational expectation-maximization (EM) algorithm, wherein in the E step of the EM algorithm the posterior distribution of the latent variable $B_n^{(k)}$ is solved, with mean $Q_n^{(k)}$ and covariance matrices $T_n^{(k)}$ and $S_n^{(k)}$ in the row and column directions respectively, computed by formulas (9)-(11):

$$T_n^{(k)} = c\sigma_k^2 \left[\operatorname{tr}\!\left(R_k^T R_k S_n^{(k)}\right) L_k^T L_k + \sigma_k^2 \operatorname{tr}\!\left(S_n^{(k)}\right) I_r\right]^{-1} \qquad (9)$$

$$S_n^{(k)} = r\sigma_k^2 \left[\operatorname{tr}\!\left(L_k^T L_k T_n^{(k)}\right) R_k^T R_k + \sigma_k^2 \operatorname{tr}\!\left(T_n^{(k)}\right) I_c\right]^{-1} \qquad (10)$$

$$\operatorname{vec}\!\left(Q_n^{(k)}\right) = \left[R_k^T R_k \otimes L_k^T L_k + \sigma_k I \otimes \sigma_k I\right]^{-1} y \qquad (11)$$

where $y = \operatorname{vec}\!\left(L_k^T (X_n - M_k) R_k\right)$;

the M step updates the parameters in formula (4), giving formulas (12)-(14):

$$L_k = \left[\sum_{n=1}^{N} \gamma_{nk} \left(X_n - M_k\right) R_k \left(Q_n^{(k)}\right)^T\right] \left[\sum_{n=1}^{N} \gamma_{nk} \left(Q_n^{(k)} R_k^T R_k \left(Q_n^{(k)}\right)^T + \operatorname{tr}\!\left(R_k^T R_k S_n^{(k)}\right) T_n^{(k)}\right)\right]^{-1} \qquad (12)$$

$$R_k = \left[\sum_{n=1}^{N} \gamma_{nk} \left(X_n - M_k\right)^T L_k Q_n^{(k)}\right] \left[\sum_{n=1}^{N} \gamma_{nk} \left(\left(Q_n^{(k)}\right)^T L_k^T L_k Q_n^{(k)} + \operatorname{tr}\!\left(L_k^T L_k T_n^{(k)}\right) S_n^{(k)}\right)\right]^{-1} \qquad (13)$$

$$\sigma_k^2 = \frac{1}{pqN_k} \left\{\sum_{n=1}^{N} \gamma_{nk} \operatorname{tr}\!\left[\left(X_n - M_k\right)^T \left(X_n - M_k\right)\right] - 2\sum_{n=1}^{N} \gamma_{nk} \operatorname{tr}\!\left(R_k \left\langle B_n^{(k)}\right\rangle^T L_k^T \left(X_n - M_k\right)\right) + \sum_{n=1}^{N} \gamma_{nk} \operatorname{tr}\!\left(\left\langle B_n^{(k)T} L_k^T L_k B_n^{(k)}\right\rangle R_k^T R_k\right)\right\} \qquad (14)$$

where $\gamma_{nk}$ denotes the posterior probability that the $n$-th sample belongs to the $k$-th Gaussian distribution;

the E step and the M step are carried out iteratively so that the likelihood value increases and tends to stability.
CN201511022718.1A 2015-12-31 2015-12-31 Mixed two-dimensional probabilistic principal component analysis method Pending CN105469101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511022718.1A CN105469101A (en) 2015-12-31 2015-12-31 Mixed two-dimensional probabilistic principal component analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511022718.1A CN105469101A (en) 2015-12-31 2015-12-31 Mixed two-dimensional probabilistic principal component analysis method

Publications (1)

Publication Number Publication Date
CN105469101A true CN105469101A (en) 2016-04-06

Family

ID=55606772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511022718.1A Pending CN105469101A (en) 2015-12-31 2015-12-31 Mixed two-dimensional probabilistic principal component analysis method

Country Status (1)

Country Link
CN (1) CN105469101A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038456A (en) * 2017-03-23 2017-08-11 北京工业大学 A kind of image classification method of the probability linear discriminant analysis based on L1 norms
CN108549789A (en) * 2018-04-19 2018-09-18 中南大学 A kind of integrated modelling approach of tertiary cathode material feed proportioning system
CN108549789B (en) * 2018-04-19 2021-08-03 中南大学 Integrated modeling method for ternary cathode material batching system
CN109917777A (en) * 2019-04-16 2019-06-21 浙江科技学院 Fault detection method based on mixing multi-sampling rate Probabilistic Principal Component Analysis model
CN109917777B (en) * 2019-04-16 2020-08-25 浙江科技学院 Fault detection method based on mixed multi-sampling rate probability principal component analysis model


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160406

RJ01 Rejection of invention patent application after publication