CN104050482B - A manifold learning generalization algorithm based on local linear regression - Google Patents

A manifold learning generalization algorithm based on local linear regression

Info

Publication number
CN104050482B
Authority
CN
China
Prior art keywords
dimension
manifold learning
new
matrix
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410288959.XA
Other languages
Chinese (zh)
Other versions
CN104050482A (en)
Inventor
张淼
刘攀
赖镇洲
沈毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201410288959.XA priority Critical patent/CN104050482B/en
Publication of CN104050482A publication Critical patent/CN104050482A/en
Application granted granted Critical
Publication of CN104050482B publication Critical patent/CN104050482B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Investigating Or Analysing Materials By Optical Means (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A manifold learning generalization algorithm based on local linear regression, belonging to the technical field of hyperspectral image data dimensionality reduction. The object of the invention is to propose a manifold learning generalization method, based on local linear regression, that is applicable to any manifold learning algorithm and preserves the original manifold learning dimensionality-reduction result. Its steps are: first, find the neighborhood; second, compute the projection matrix; third, solve for the linear regression coefficient matrix; fourth, compute the dimensionality-reduction result of the new sample. The invention realizes generalization to new samples on the basis of preserving the original manifold learning dimensionality-reduction result by constructing a linear mapping from the high-dimensional space to the low-dimensional space, so that any manifold learning algorithm without generalization capability, such as LE, LLE and LTSA, acquires that capability, making these time-consuming manifold learning algorithms applicable to the dimensionality-reduction processing of hyperspectral images.

Description

Manifold learning generalization algorithm based on local linear regression
Technical Field
The invention belongs to the technical field of hyperspectral image data dimension reduction, and particularly relates to a generalization algorithm for manifold learning.
Background
A hyperspectral image records rich spectral information about ground objects, which aids their accurate and fine-grained classification and identification. However, the growing number of spectral bands inevitably brings information redundancy and processing difficulty (the curse of dimensionality), which hinders hyperspectral data processing, so eliminating the redundancy of hyperspectral data is a problem that must be solved. The redundancy arises mainly from the correlation between bands. Dimensionality reduction is an important preprocessing method: by representing the characteristics of high-dimensional data with low-dimensional data, it effectively preserves image information while reducing redundancy. Common hyperspectral image dimensionality-reduction algorithms are divided into linear and nonlinear methods. Although linear algorithms such as PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) are simple to implement, hyperspectral images have nonlinear characteristics, so manifold learning algorithms can better mine the nonlinear structure of hyperspectral data and improve data analysis capability. Classical manifold learning algorithms include LE (Laplacian Eigenmaps), LLE (Locally Linear Embedding) and LTSA (Local Tangent Space Alignment), and they can serve as feature extraction algorithms for hyperspectral images. However, most classical nonlinear manifold learning algorithms have no generalization capability: they cannot directly process new hyperspectral data, and a dimensionality-reduction result for new data can only be obtained by re-learning together with all the old data.
The generalization problem of manifold learning can be described as follows: given a hyperspectral data set X = {x_1, x_2, ..., x_N} ∈ R^(D×N), where N is the number of samples, D is the sample dimension and x_j is the j-th hyperspectral data sample, and given that some manifold learning algorithm has reduced X to the data set Y = {y_1, y_2, ..., y_N} ∈ R^(d×N), where d is the reduced dimension and y_j is the dimensionality-reduction result of x_j, find, for a new hyperspectral sample x_new ∈ R^(D×1), the corresponding dimensionality-reduction result y_new ∈ R^(d×1).
Researchers have made many attempts to generalize manifold learning. The common approaches are to linearize the nonlinear learning algorithm or to extend it with kernels. However, linearization changes the dimensionality-reduction result of the original manifold learning algorithm, and kernel extension requires constructing a kernel function for each specific manifold learning algorithm.
Disclosure of Invention
The invention aims to provide a manifold learning generalization method based on local linear regression that is applicable to any manifold learning algorithm and preserves the original manifold learning dimensionality-reduction result, so that manifold learning algorithms without generalization capability, such as the classical LTSA, LLE and LE algorithms, acquire that capability and become applicable to the dimensionality-reduction processing of hyperspectral images.
The purpose of the invention is realized by the following technical scheme:
Data distributed on a high-dimensional manifold can be regarded, within a small local region, as lying approximately on a low-dimensional hyperplane, so within such a neighborhood a linear mapping between the high-dimensional data and the low-dimensional embedding can be assumed. Therefore, for a new sample point, its neighborhood is first found in the original data space, a linear mapping from high dimension to low dimension is then constructed within that neighborhood, and generalization of the new sample is finally realized through this linear mapping.
As shown in fig. 1, the manifold learning generalization algorithm based on local linear regression provided by the present invention specifically comprises the following steps:
Step one, finding the neighborhood:
For a new data sample x_new, find the k sample points nearest to x_new in the hyperspectral data set X; they form the neighborhood data set X̃ ∈ R^(D×k), where D is the sample set dimension. Obtain the corresponding reduced-dimension neighborhood data set Ỹ ∈ R^(d×k) from the reduced-dimension data set Y, where d is the reduced dimension; k ≥ d is required, and R denotes the real number field.
Calculate the distance B(j) between x_new and the j-th hyperspectral data sample x_j in X:
B(j) = ||x_j − x_new||_2,  j = 1, 2, ..., N.
Sort the elements of the distance vector B in ascending order, and take the data samples corresponding to the k smallest elements to form X̃; then find the corresponding Ỹ in Y.
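As a concrete illustration (not part of the patent text), step one can be sketched in Python with NumPy; the data set sizes and all variable names here are hypothetical:

```python
import numpy as np

# Hypothetical stand-ins for the patent's quantities: X holds N high-dimensional
# samples as columns (D x N), Y holds their dimensionality-reduction results
# (d x N), and x_new is a new sample to be generalized.
rng = np.random.default_rng(0)
D, d, N, k = 50, 3, 200, 10          # step two requires k >= d
X = rng.standard_normal((D, N))
Y = rng.standard_normal((d, N))
x_new = rng.standard_normal((D, 1))

# B(j) = ||x_j - x_new||_2 for j = 1, ..., N
B = np.linalg.norm(X - x_new, axis=0)

# Take the k samples with the smallest distances: they form the neighborhood
# set X_tilde, and the matching columns of Y form Y_tilde.
idx = np.argsort(B)[:k]
X_tilde = X[:, idx]                  # D x k
Y_tilde = Y[:, idx]                  # d x k
```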
Step two, calculating the projection matrix:
1) Construct the matrix C:
C = (X̃H_k)^T (X̃H_k) = H_k X̃^T X̃ H_k,
where H_k = I_k − (1/k) e_k e_k^T is the k-dimensional centering operator, e_k = [1, 1, ..., 1]^T ∈ R^(k×1) is the column vector of length k whose elements are all 1, and I_k is the k×k identity matrix.
2) Perform eigendecomposition of the matrix C:
Cv = λv,
where v is an eigenvector of the matrix C and λ is the eigenvalue corresponding to the eigenvector v.
3) Calculate the local projection matrix V:
V(i) = (1/√λ_i) X̃ H_k v_i,  i = 1, 2, ..., d,
where v_i is the eigenvector corresponding to the i-th largest eigenvalue λ_i of C, V(i) is the i-th column of the projection matrix V, and d is the reduced dimension.
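Continuing the illustration (again outside the patent text), step two can be sketched as follows; the neighborhood X_tilde is synthetic, and the eigendecomposition uses NumPy's routine for symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
D, d, k = 50, 3, 10
X_tilde = rng.standard_normal((D, k))    # hypothetical neighborhood from step one

# Centering operator H_k = I_k - (1/k) e_k e_k^T
e_k = np.ones((k, 1))
H_k = np.eye(k) - (e_k @ e_k.T) / k

# C = (X_tilde H_k)^T (X_tilde H_k): the k x k centered Gram matrix, so the
# eigenproblem stays small even when D is large.
C = H_k @ X_tilde.T @ X_tilde @ H_k

# Eigendecomposition C v = lambda v; eigh returns eigenvalues in ascending order.
lam, vecs = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]            # re-sort descending
lam, vecs = lam[order], vecs[:, order]

# V(i) = (1 / sqrt(lambda_i)) X_tilde H_k v_i for the d largest eigenvalues;
# the columns of V come out orthonormal.
V = (X_tilde @ H_k @ vecs[:, :d]) / np.sqrt(lam[:d])
```

Working with the k×k Gram matrix rather than a D×D covariance is the usual dual-PCA trick: the neighborhood size k is small and fixed, while D (the number of spectral bands) can be in the hundreds.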
Step three, solving for the linear regression coefficient matrix:
1) Compute the tangent-space coordinates Z of the neighborhood data set X̃:
Z = V^T X̃ H_k,
where V^T is the transpose of the local projection matrix V.
2) Calculate the linear regression coefficient matrix L:
L = Ỹ H_k (Z)^+,
where Z and Ỹ H_k satisfy the linear mapping relation Ỹ H_k = L Z + E, E is the linear regression error, and (·)^+ denotes the Moore-Penrose generalized inverse operator.
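Step three, sketched the same way (synthetic data, hypothetical names); the Moore-Penrose pseudoinverse is available as `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(2)
D, d, k = 50, 3, 10
X_tilde = rng.standard_normal((D, k))    # hypothetical neighborhood (step one)
Y_tilde = rng.standard_normal((d, k))    # its low-dimensional counterpart

e_k = np.ones((k, 1))
H_k = np.eye(k) - (e_k @ e_k.T) / k      # centering operator

# Local projection matrix V from step two
C = H_k @ X_tilde.T @ X_tilde @ H_k
lam, vecs = np.linalg.eigh(C)
order = np.argsort(lam)[::-1][:d]
V = (X_tilde @ H_k @ vecs[:, order]) / np.sqrt(lam[order])

# 1) Tangent-space coordinates Z = V^T X_tilde H_k  (d x k)
Z = V.T @ X_tilde @ H_k

# 2) Regression model Y_tilde H_k = L Z + E, solved in the least-squares sense
#    by the Moore-Penrose pseudoinverse: L = (Y_tilde H_k) Z^+
L = (Y_tilde @ H_k) @ np.linalg.pinv(Z)
```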
Step four, calculating the dimensionality-reduction result of the new sample:
y_new = L V^T (x_new − x̄) + ȳ,
where x̄ = X̃ M_k, ȳ = Ỹ M_k, M_k = (1/k) e_k is the k-dimensional averaging operator, and y_new is the dimensionality-reduction result of x_new.
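Putting the four steps together, here is a minimal end-to-end sketch (one reading of the patent's procedure, not the inventors' implementation; all names and parameters are hypothetical):

```python
import numpy as np

def generalize(X, Y, x_new, k=10, d=3):
    """Map a new high-dimensional sample to the low-dimensional space by
    local linear regression over its k-neighborhood (steps one to four)."""
    # Step 1: k nearest neighbours of x_new among the columns of X
    B = np.linalg.norm(X - x_new, axis=0)
    idx = np.argsort(B)[:k]
    Xt, Yt = X[:, idx], Y[:, idx]

    # Step 2: local projection matrix V
    e = np.ones((k, 1))
    H = np.eye(k) - (e @ e.T) / k                 # centering operator
    C = H @ Xt.T @ Xt @ H                         # centered Gram matrix
    lam, vecs = np.linalg.eigh(C)
    top = np.argsort(lam)[::-1][:d]               # d largest eigenvalues
    V = (Xt @ H @ vecs[:, top]) / np.sqrt(lam[top])

    # Step 3: tangent coordinates and regression coefficients
    Z = V.T @ Xt @ H
    L = (Yt @ H) @ np.linalg.pinv(Z)

    # Step 4: y_new = L V^T (x_new - x_bar) + y_bar
    x_bar = Xt.mean(axis=1, keepdims=True)
    y_bar = Yt.mean(axis=1, keepdims=True)
    return L @ V.T @ (x_new - x_bar) + y_bar

# Usage on synthetic data
rng = np.random.default_rng(3)
D, d, N = 50, 3, 200
X = rng.standard_normal((D, N))
Y = rng.standard_normal((d, N))
y_new = generalize(X, Y, rng.standard_normal((D, 1)), k=10, d=3)
```

When the data actually lie on a d-dimensional linear subspace and Y holds the true low-dimensional coordinates, this procedure recovers the coordinates of a new point exactly, which is consistent with the local-hyperplane assumption the method rests on.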
Compared with the prior art, the invention has the following advantages:
the method can realize generalization of a new sample on the basis of keeping the original manifold learning dimensionality reduction result, constructs a linear mapping from a high dimension to a low dimension, and can enable any manifold learning algorithm without generalization capability, such as LE, LLE, LTSA and the like, to have generalization capability, so that the time-consuming manifold learning algorithms are suitable for the dimensionality reduction processing process of the hyperspectral image.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 is a feature scatter plot of hyperspectral image data in the present invention;
FIG. 3 is a generalized error curve of hyperspectral data in the present invention.
Detailed Description
The technical solution of the invention is further described below with reference to the accompanying drawings, but the invention is not limited thereto; modifications or equivalent substitutions may be made to the technical solution without departing from its spirit and scope.
The hyperspectral image data were obtained by the Kennedy Space Center in the United States by imaging farmland in Indiana; the scene contains 16 different crops, the spatial resolution of the image is 20×20 m², each pixel has 224 bands, the covered spectral range is 0.2-2.4 μm, and the spectral resolution is 10 nm.
From all the IND PINE data, 1000 samples are randomly extracted as training samples and another 1000 as test samples; together they form the original data set X_all to be reduced, with training sample set X_train and test sample set X_test. In this embodiment the LLE algorithm is used to perform manifold learning on the training and test samples, giving the reduced-dimension data set Y_all of all samples; the dimensionality-reduction results corresponding to the training and test samples are Y_train and Y_test. The proposed algorithm takes Y_train and X_all as input, obtains the generalization result Y_new of the test samples, and then compares the error between Y_test and Y_new.
Manifold learning is performed on all samples with the LLE manifold learning algorithm to obtain the dimensionality-reduction result Y_all of all samples; the results for the training and test samples are Y_train and Y_test. For each data sample in the test sample set, the following steps are performed:
step one, finding a neighborhood:
x for the test sample settestOf one data sample xnewIn training sample set XtrainFind x innewThe 10 nearest sample points form a neighborhood data setAnd obtainCorresponding dimension reduction neighborhood data set in dimension reduction data set Y
Calculating xnewAnd XtrainX of the jth hyperspectral data samplejDistance b (j):
B(j)=||xj-xnew||2j=1,2,...,1000。
sequencing the elements in the distance vector B to obtainAnd in YtrainIn which a correspondence is found
Step two, calculating the projection matrix:
1) Construct the matrix C:
C = (X̃H_10)^T (X̃H_10) = H_10 X̃^T X̃ H_10,
where H_10 = I_10 − (1/10) e_10 e_10^T is the 10-dimensional centering operator, e_10 = [1, 1, ..., 1]^T ∈ R^(10×1) is the column vector of length 10 whose elements are all 1, and I_10 is the 10×10 identity matrix.
2) Perform eigendecomposition of the matrix C:
Cv = λv.
3) Calculate the local projection matrix V:
V(i) = (1/√λ_i) X̃ H_10 v_i,  i = 1, 2, ..., d,
where v_i is the eigenvector corresponding to the i-th largest eigenvalue λ_i of C and V(i) is the i-th column of the projection matrix V.
Step three, solving for the linear regression coefficient matrix:
1) Compute the tangent-space coordinates Z of the neighborhood data set X̃:
Z = V^T X̃ H_10.
2) Calculate the linear regression coefficient matrix L:
L = Ỹ H_10 (Z)^+.
Step four, calculating the dimensionality-reduction result of the new sample:
y_new = L V^T (x_new − x̄) + ȳ,
where x̄ = X̃ M_10, ȳ = Ỹ M_10, M_10 = (1/10) e_10 is the 10-dimensional averaging operator, and y_new is the dimensionality-reduction result of x_new.
The experimental results are shown in FIG. 2. The left column of FIG. 2 shows the dimensionality-reduction results of the hyperspectral image test data for the LE, LLE and LTSA algorithms, and the right column shows the generalization results of the LE-LLR, LLE-LLR and LTSA-LLR algorithms on the hyperspectral test samples. The generalization results of the proposed algorithm closely match the true values, so the method generalizes well on hyperspectral data. The generalization errors for the LE, LLE and LTSA algorithms in this experiment were 1.9%, 2.0% and 1.3% respectively.
According to the preceding steps, the generalization result Y_new of the test samples is obtained, the error between Y_test and Y_new is calculated, and the generalization error curve is obtained. Because the sampling density of the data also influences the final generalization error, the experiment studies this influence by varying the number of sampling points; the results are shown in FIG. 3. As the figure shows, the generalization accuracy of the proposed algorithm is related to the sampling density of the data: the higher the sampling density, the better the local linearity of the neighborhood and the smaller the generalization error.
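The exact error formula used in the experiment is not reproduced in this text, so as an illustrative assumption (not the patent's definition), one common choice is the relative Frobenius-norm error between the true dimensionality-reduction result and the generalized result:

```python
import numpy as np

def generalization_error(Y_test, Y_new):
    """Relative Frobenius-norm error between the manifold-learning result
    Y_test and the generalized result Y_new. This metric is an assumption
    for illustration; the formula in the original experiment is not shown."""
    return np.linalg.norm(Y_test - Y_new) / np.linalg.norm(Y_test)
```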
This embodiment shows that the manifold learning generalization method based on local linear regression improves existing manifold learning algorithms, effectively remedies their lack of generalization capability, and has strong engineering practicability.

Claims (2)

1. A manifold learning generalization method based on local linear regression, characterized in that the method comprises the following steps:
step one, finding the neighborhood:
for a new data sample x_new, find the k sample points nearest to x_new in the hyperspectral data set X; they form the neighborhood data set X̃ ∈ R^(D×k), where D is the sample set dimension; obtain the corresponding reduced-dimension neighborhood data set Ỹ ∈ R^(d×k) from the reduced-dimension data set Y, where d is the reduced dimension, k ≥ d is required, and R denotes the real number field;
calculate the distance B(j) between x_new and the j-th hyperspectral data sample x_j in X:
B(j) = ||x_j − x_new||_2,  j = 1, 2, ..., N;
sort the elements of the distance vector B to obtain X̃, and find the corresponding Ỹ in Y;
step two, calculating the projection matrix:
1) constructing the matrix C:
C = (X̃H_k)^T (X̃H_k) = H_k X̃^T X̃ H_k,
where H_k = I_k − (1/k) e_k e_k^T is the k-dimensional centering operator, e_k = [1, 1, ..., 1]^T ∈ R^(k×1) is the column vector of length k whose elements are all 1, and I_k is the k×k identity matrix;
2) performing eigendecomposition of the matrix C:
Cv = λv,
where v is an eigenvector of the matrix C and λ is the eigenvalue corresponding to the eigenvector v;
3) calculating the local projection matrix V:
V(i) = (1/√λ_i) X̃ H_k v_i,  i = 1, 2, ..., d,
where v_i is the eigenvector corresponding to the i-th largest eigenvalue λ_i of C, V(i) is the i-th column of the projection matrix V, and d is the reduced dimension;
step three, solving for the linear regression coefficient matrix:
1) computing the tangent-space coordinates Z of the neighborhood data set X̃:
Z = V^T X̃ H_k,
where V^T is the transpose of the local projection matrix V;
2) calculating the linear regression coefficient matrix L:
L = Ỹ H_k (Z)^+,
where Z and Ỹ H_k satisfy the linear mapping relation Ỹ H_k = L Z + E, E is the linear regression error, and (·)^+ denotes the Moore-Penrose generalized inverse operator;
step four, calculating the dimensionality-reduction result of the new sample:
y_new = L V^T (x_new − x̄) + ȳ,
where x̄ = X̃ M_k, ȳ = Ỹ M_k, M_k = (1/k) e_k is the k-dimensional averaging operator, and y_new is the dimensionality-reduction result of x_new.
2. The manifold learning generalization method based on local linear regression according to claim 1, characterized in that in step one, the elements of the distance vector B are sorted in ascending order, and the data samples corresponding to the k smallest elements are taken to form X̃; the corresponding Ỹ is then found in Y.
CN201410288959.XA 2014-06-24 2014-06-24 A manifold learning generalization algorithm based on local linear regression Active CN104050482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410288959.XA CN104050482B (en) 2014-06-24 2014-06-24 A manifold learning generalization algorithm based on local linear regression

Publications (2)

Publication Number Publication Date
CN104050482A CN104050482A (en) 2014-09-17
CN104050482B true CN104050482B (en) 2017-06-13

Family

ID=51503294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410288959.XA Active CN104050482B (en) 2014-06-24 2014-06-24 A manifold learning generalization algorithm based on local linear regression

Country Status (1)

Country Link
CN (1) CN104050482B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069482B (en) * 2015-08-21 2018-06-15 中国地质大学(武汉) Hyperspectral remote sensing image classification method based on spatially regularized manifold learning algorithm
CN105184281A (en) * 2015-10-12 2015-12-23 上海电机学院 Face feature library building method based on high-dimensional manifold learning
CN106199544B (en) * 2016-06-24 2018-07-17 电子科技大学 Differentiate the Recognition of Radar Target Using Range Profiles method of local tangent space alignment based on core
CN107480688B (en) * 2017-06-20 2020-06-19 广东工业大学 Fine-grained image identification method based on zero sample learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867191A (en) * 2012-09-04 2013-01-09 广东群兴玩具股份有限公司 Dimension reducing method based on manifold sub-space study
CN103294647A (en) * 2013-05-10 2013-09-11 上海大学 Head-related transfer function dimensionality reduction method based on orthogonal tensor neighbourhood preserving embedding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8538195B2 (en) * 2007-09-17 2013-09-17 Raytheon Company Hyperspectral image dimension reduction system and method

Also Published As

Publication number Publication date
CN104050482A (en) 2014-09-17


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant