CN116416441A - Hyperspectral image feature extraction method based on multi-level variational autoencoder

Hyperspectral image feature extraction method based on multi-level variational autoencoder

Info

Publication number
CN116416441A
Authority
CN
China
Prior art keywords
feature extraction
hyperspectral image
network
hyperspectral
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111627432.1A
Other languages
Chinese (zh)
Inventor
于文博 (Yu Wenbo)
黄鹤 (Huang He)
沈纲祥 (Shen Gangxiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202111627432.1A priority Critical patent/CN116416441A/en
Priority to PCT/CN2022/142106 priority patent/WO2023125456A1/en
Publication of CN116416441A publication Critical patent/CN116416441A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/58Extraction of image or video features relating to hyperspectral data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a hyperspectral image feature extraction method based on a multi-level variational autoencoder. The method uses a variational autoencoder as its basic framework and, after training, takes the finally corrected fusion feature as the output spatial-spectral joint feature. The method better extracts the important discriminative information in the data, improves the separability and classification accuracy of pixels, reduces misclassification in subsequent classification tasks, and improves the model's resistance to noise interference.

Description

Hyperspectral image feature extraction method based on multi-level variational autoencoder
Technical Field
The application relates to the technical field of spectral imaging, and in particular to a hyperspectral image feature extraction method based on a multi-level variational autoencoder.
Background
In the remote sensing field, hyperspectral imaging techniques are widely used in many studies. A hyperspectral image contains rich spatial and spectral features: the spatial features refer to the spatial position information of pixels at each wavelength, while the spectral features refer to the spectral curve formed by the spectral reflectances of a single pixel across all wavelengths. Feature extraction on a hyperspectral image yields low-dimensional embedded features that contain rich discriminative information, reduces redundant information in the image, and can improve recognition accuracy in subsequent classification research. Early hyperspectral image feature extraction methods mainly extracted the spectral features of pixels without considering the positional information among pixels, and therefore struggled to obtain good results. With the growth of computing power and the progress of deep learning research, methods that extract spatial-spectral features by training neural networks have been proposed in succession. These methods introduce the idea of multi-sensor data fusion: spatial features and spectral features are extracted independently and then fused, which avoids information loss and improves algorithm performance.
From the viewpoint of information sources, hyperspectral image feature extraction methods can be classified into methods based on spectral features and methods based on spatial-spectral features.
Feature extraction methods based on spectral features construct a feature extractor from the single spectral curve of each pixel in the hyperspectral image, neglecting the positional information of different pixels in the spatial dimension. Early, widely used methods include principal component analysis (PCA), minimum noise fraction (MNF), and linear discriminant analysis (LDA). These methods generally consider the internal discriminative information of hyperspectral pixels to ensure separability. With the continued progress of deep learning research, some deep network models have also been applied to hyperspectral image feature extraction, including the autoencoder (AE), the variational autoencoder (VAE), and the long short-term memory network (LSTM). However, these methods do not consider the positional relations among different pixels, so the information in the image is described only at the spectral level, and the "map-spectrum unification" advantage of hyperspectral images — the uniformity and synergy between spatial and spectral information in the image — is not fully exploited. Currently, the main approach in this research field is feature extraction based on spatial-spectral features.
The hyperspectral image is typical three-dimensional cube data: it combines the spatial information of the target ground objects with the spectral information at each wavelength, jointly reflected in the complete data, so the hyperspectral image has the property of map-spectrum unification, i.e., its spatial information and spectral information are consistent. Meanwhile, owing to the changeable shooting environments of hyperspectral images and external interference, the phenomena of "same object, different spectra" and "different objects, same spectrum" still exist in the image and interfere with the results of hyperspectral analysis research. The spatial information of a hyperspectral image can be understood as the local spatial neighborhood of a single pixel in the spatial dimension; this definition assumes that each pixel has a certain relation with the pixels in its spatial neighborhood, so that learning the distribution of the local spatial neighborhood captures the position of the pixel among the real ground objects. The current common approach is to extract the local information of pixels with a convolutional neural network, extract the spectral features of pixels with fully connected layers, and finally combine the spatial and spectral features with a splicing layer. When extracting spectral features, some studies also use a long short-term memory network to learn the continuous information within the spectral curves of pixels. However, this type of method has certain drawbacks: (1) these methods do not consider the continuous information in hyperspectral images from multiple aspects, but describe it only from the perspective of the spectral curve; (2) when the spectral and spatial features are extracted separately, the correlation and cooperation between the two mappings are poor, and feature fusion is realized only by a splicing layer at the last step, so the distributions of the two features cannot be fully matched; (3) these methods do not consider the homology between spatial and spectral features: although the two features describe the information of the hyperspectral image at different levels, they both describe the same hyperspectral pixel, so homology potentially exists between them.
In short, the existing hyperspectral image feature extraction methods have certain deficiencies:
(1) existing spatial-spectral feature extraction methods do not consider the continuous information of pixels at multiple levels; most describe continuous information only from the angle of the spectral curve, which weakens the diversity of the data;
(2) in existing spatial-spectral feature extraction methods, the cooperation between the spatial feature mapping and the spectral feature mapping is poor: the two mappings are essentially completely split, and feature fusion is carried out by a splicing layer only after feature extraction; since the data distributions of the spatial and spectral features differ greatly, simple splicing can hardly achieve the desired purpose;
(3) existing spatial-spectral feature extraction methods do not consider the homology between spatial and spectral features and ignore the fact that both features are extracted from the same hyperspectral pixel, which further increases the difference between the two feature distributions and is detrimental both to fused feature expression and to subsequent classification research.
Disclosure of Invention
To overcome the above drawbacks, the present application provides a hyperspectral image feature extraction method based on a multi-level variational autoencoder (multi-level VAE). The method uses a variational autoencoder as its basic framework and, after training, takes the finally corrected fusion feature as the output spatial-spectral joint feature.
In order to achieve the above purpose, the present application adopts the following technical solution:
the hyperspectral image feature extraction method based on the multi-level variation automatic encoder is characterized by comprising the following steps of:
s1, selecting a hyperspectral image, wherein the size of the hyperspectral image is X multiplied by Y multiplied by B, X and Y are the space sizes of the hyperspectral image under each wavelength, B is the number of the wavelengths of the hyperspectral image,
s2, configuring neighborhood information for each hyperspectral pixel in the hyperspectral image, namely selecting a neighborhood pixel with the surrounding size of s multiplied by s as neighborhood information of the pixel, wherein the neighborhood information refers to a square area taking the hyperspectral pixel as the center, the side length is s, the s is an odd number,
s3, transforming the neighborhood information based on a depth network model to obtain a size of 1×s 2 X B as Input to the spatial feature extraction module p
Taking a second sample with a size of 1 XB hyperspectral pixels as an Input of a spectral feature extraction module q The first samples and the second samples have the same number and one-to-one correspondence,
s4, training a deep neural network,
s5, feature stitching and mean value feature mu calculation, namely the first step in the spatial feature extraction module
Figure BDA0003440028390000041
Input of layer->
Figure BDA0003440028390000042
Is->
Figure BDA0003440028390000043
Layer output->
Figure BDA0003440028390000044
The spectral feature extraction module is->
Figure BDA0003440028390000045
Input of layer->
Figure BDA0003440028390000046
Is->
Figure BDA0003440028390000047
Layer(s)
Figure BDA0003440028390000048
Splicing the layer outputs according to the calculation formula: />
Figure BDA0003440028390000049
Wherein 1 < i < m, according to the calculation formula, < ->
Figure BDA00034400283900000410
The mean characteristic μ was obtained with dimensions bs×s 2 ×d,
S6, pooling operation: pooling the mean feature μ and the standard deviation feature δ with an average pooling layer to obtain the pooled mean feature μ̂ and the pooled standard deviation feature δ̂, both of size bs×d,
S7, obtaining a fused feature O based on a feature fusion module and inputting the fused feature O to a decoder module, where the decoder is used to reconstruct the data from the fused feature O,
S8, network optimization: constructing the loss function used in training the network model according to the formula Γ = Γ_R + Γ_KL + Γ_Homo, where, in the standard forms implied by the text,
Γ_R = Σ‖Input^q − outd_n‖²,
Γ_KL = −(1/2)·Σ(1 + log δ̂² − μ̂² − δ̂²),
Γ_Homo = Σ_i arccos(⟨Out_i^p, Out_i^q⟩ / (‖Out_i^p‖·‖Out_i^q‖)),
and Σ(·) sums the bracketed contents. Γ_R measures the similarity between the input of the spectral feature extraction module and the decoder output using the Euclidean distance; Γ_KL is the loss function of the variational autoencoder (VAE), using the KL divergence to measure the similarity between the Gaussian distribution and the embedded feature distribution; Γ_Homo measures the similarity between the outputs of all spatial feature extraction module layers and the outputs of the corresponding spectral feature extraction module layers using the spectral angle distance.
Preferably, in step S1, the method further includes:
carrying out normalization preprocessing on the hyperspectral image, and setting the neighborhood size s, the number m of network layers in the spatial feature extraction module and the spectral feature extraction module, the number n of network layers in the decoder module, and the embedded feature dimension d, where d is an even number greater than 0.
Preferably, step S4 includes: randomly selecting small batches of samples from the X×Y first samples of size 1×s²×B and the X×Y second samples of size 1×B, respectively, and inputting them into the deep neural network for training; the mini-batch pixel numbers of the first and second samples are both bs, and the activation functions in the network are Tanh activation functions.
Preferably, the hyperspectral image feature extraction method based on the multi-level variational autoencoder further comprises the following steps:
normalizing all X×Y hyperspectral pixels so that their values range between −1 and 1, using the normalization formula (the standard min-max form implied by the text):
x̂ = 2(x − x_min)/(x_max − x_min) − 1,
where x_min is the minimum value in the pixel data and x_max is the maximum value.
Preferably, the Tanh activation function is computed as:
Tanh(x) = (e^x − e^(−x))/(e^x + e^(−x)).
preferably, step S8 further includes: using a loss function, selecting a step size of 10 -3 Optimizing the depth network model, and after the model is stable, pooling the mean value characteristics
Figure BDA0003440028390000064
As an output.
Preferably, step S3 includes:
transforming the hyperspectral pixel x of size 1×B to obtain a second sample of size 1×s²×(B/s²), which is used as the input Input^q of the spectral feature extraction module;
if B is not an integer multiple of s², removing ε wavelengths so that B − ε is an integer multiple of s², where the superscript q refers to the relevant variables in the spectral feature extraction module.
Preferably, step S6 further includes:
obtaining the standard deviation feature δ by using a long short-term memory network layer L, whose input is the spliced feature obtained in step S5 and whose node number is d; δ has size bs×s²×d.
Preferably, in step S7, the decoder module includes:
n fully connected network layers {d_1, d_2, …, d_n}, with inputs {ind_1, ind_2, …, ind_n} and outputs {outd_1, outd_2, …, outd_n}, where the last layer has B nodes and the other network layers have d nodes.
Preferably, step S7 further includes:
obtaining the fused feature O according to the formula
O = μ̂ + γ ⊙ δ̂,
where γ is a randomly generated noise matrix that follows a Gaussian distribution and has size bs×d.
Advantageous effects
Compared with the prior art, the multi-continuity feature integration method for hyperspectral images extracts both the spatial and spectral features of the image, takes into account the two kinds of continuous information contained in the hyperspectral image, and achieves multi-angle information description through the design of the depth network model. In addition, through the multi-level spatial feature correction method, the spatial features at each stage are used to sequentially correct the spectral features at the corresponding stage, improving the cooperation between the two features during extraction. The spectral-angle-distance-based method for improving the homology of multi-level spatial-spectral features gradually increases the homology among the spatial-spectral features at each level, strengthens the association between spatial and spectral features in the feature mapping stage, and alleviates the difficulty of improving classification accuracy caused by the large difference between the two feature distributions.
Drawings
Fig. 1 is a flow chart of the feature extraction method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the overall deep network architecture according to an embodiment of the present application.
Detailed Description
The above-described aspects are further described below in conjunction with specific embodiments. It should be understood that these examples are only illustrative of the present application and do not limit its scope. The implementation conditions employed in the examples may be adjusted according to the conditions of a specific manufacturer; implementation conditions not specified are typically those of routine experiments.
The hyperspectral image feature extraction method based on the multi-level variational autoencoder utilizes long short-term memory network layers to extract continuous features of pixels at both the spatial and the spectral level, and uses a splicing layer to fuse the two kinds of continuous features, overcoming the problem that traditional feature extraction algorithms use only a single kind of continuous information. The spectral-angle-distance-based method for improving the homology of multi-level spatial-spectral features uses the spectral angle distance to compute and increase the homology among the spatial-spectral features at each stage, addresses the difficulty of improving subsequent classification accuracy caused by the large distribution difference between spatial and spectral features, and improves the association of the two features in the feature mapping stage.
The hyperspectral image feature extraction method based on the multi-level variational autoencoder provided by the application is described below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a feature extraction method, which includes:
S1, selecting/acquiring a hyperspectral image of size X×Y×B, where X and Y are the spatial sizes of the hyperspectral image at each wavelength and B is the number of wavelengths; carrying out normalization preprocessing on the image; and setting the neighborhood size s, the number m of network layers in the spatial feature extraction module and the spectral feature extraction module, the number n of network layers in the decoder module, and the embedded feature dimension d, where d must be an even number greater than 0.
S2, setting neighborhood information for each hyperspectral pixel (X×Y hyperspectral pixels in total) by selecting the surrounding s×s neighborhood pixels as the neighborhood information of that pixel, the neighborhood information having size s×s×B.
S3, constructing the depth network model and transforming the neighborhood information of each pixel to obtain a first sample of size 1×s²×B, which serves as the input Input^p of the spatial feature extraction module. The hyperspectral pixel x of size 1×B is transformed to obtain a second sample of size 1×s²×(B/s²), which serves as the input Input^q of the spectral feature extraction module; if B is not an integer multiple of s², ε wavelengths are removed so that B − ε is an integer multiple of s². The superscript p refers to the relevant variables in the spatial feature extraction module, and the superscript q to those in the spectral feature extraction module. The first samples and the second samples are equal in number and in one-to-one correspondence. The depth network model comprises the spatial feature extraction module, the spectral feature extraction module, the feature fusion module, and the decoder module.
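For illustration, the following is a minimal NumPy sketch of the sample construction in steps S2–S3. It is a sketch under stated assumptions: the patent does not specify how border pixels are handled, so reflection padding is assumed here, and the function name build_inputs is illustrative.

```python
import numpy as np

def build_inputs(image, s):
    """Build first samples Input^p of shape (X*Y, s^2, B) and second
    samples Input^q of shape (X*Y, s^2, B/s^2) from an X x Y x B cube."""
    X, Y, B = image.shape
    assert s % 2 == 1, "neighborhood side length s must be odd"
    eps = B % (s * s)              # remove eps wavelengths so B - eps is a multiple of s^2
    if eps:
        image = image[:, :, :B - eps]
        B -= eps
    r = s // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")  # assumed border handling
    inp_p = np.empty((X * Y, s * s, B), dtype=image.dtype)
    inp_q = np.empty((X * Y, s * s, B // (s * s)), dtype=image.dtype)
    for k in range(X * Y):
        i, j = divmod(k, Y)
        nb = padded[i:i + s, j:j + s, :]          # s x s x B spatial neighborhood
        inp_p[k] = nb.reshape(s * s, B)           # first sample, 1 x s^2 x B
        inp_q[k] = image[i, j].reshape(s * s, -1) # second sample, 1 x s^2 x (B/s^2)
    return inp_p, inp_q
```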
S4, training the deep neural network: small batches of samples are randomly selected from all X×Y hyperspectral pixels and input into the deep neural network; the mini-batch pixel number is bs, the activation function in the network is the Tanh activation function, and every network layer except the last layer of the decoder module is followed by a batch normalization layer (Batch Normalization Layer).
S5, performing the feature stitching operation:
Input^p is fed as In_1^p into layer H_1^p of the spatial feature extraction module, giving the output Out_1^p; Input^q is fed as In_1^q into layer H_1^q of the spectral feature extraction module, giving the output Out_1^q; both Out_1^p and Out_1^q have size bs×s²×d. The two are spliced according to
In_2^q = Concat(Out_1^p, Out_1^q),
where Concat(·) is a splicing operation along the third dimension, yielding an output of size bs×s²×2d that serves as the input In_2^q of layer H_2^q of the spectral feature extraction module. In general, the input In_i^p of layer H_i^p in the spatial feature extraction module is the output Out_{i-1}^p of layer H_{i-1}^p, and the input In_i^q of layer H_i^q in the spectral feature extraction module is the splice of the outputs of layers H_{i-1}^p and H_{i-1}^q, as shown below:
In_i^q = Concat(Out_{i-1}^p, Out_{i-1}^q), where 1 < i < m.
Finally, according to
μ = Concat(Out_m^p, Out_m^q),
the mean feature μ is obtained, with size bs×s²×d.
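The stitched two-branch encoder can be sketched in PyTorch as follows. This is a minimal sketch under the embodiment's layer sizes (m LSTM layers per branch, d hidden units except d/2 in the last layer); the class and argument names are illustrative, and the batch normalization layers mentioned in step S4 are omitted for brevity.

```python
import torch
import torch.nn as nn

class StitchedEncoder(nn.Module):
    """Spatial branch {H_i^p} and spectral branch {H_i^q} with level-wise stitching."""
    def __init__(self, bands, seg_len, m=3, d=40):
        super().__init__()
        sizes = [d] * (m - 1) + [d // 2]      # last layer of each branch has d/2 nodes
        self.Hp = nn.ModuleList()
        self.Hq = nn.ModuleList()
        in_p = bands
        for i, h in enumerate(sizes):
            self.Hp.append(nn.LSTM(in_p, h, batch_first=True))
            # H_1^q reads the raw spectral sample; later layers read Concat(Out^p, Out^q)
            self.Hq.append(nn.LSTM(seg_len if i == 0 else 2 * d, h, batch_first=True))
            in_p = h

    def forward(self, inp_p, inp_q):
        outs_p, outs_q = [], []
        x_p, x_q = inp_p, inp_q
        for lp, lq in zip(self.Hp, self.Hq):
            x_p, _ = lp(x_p)                   # Out_i^p
            x_q, _ = lq(x_q)                   # Out_i^q
            outs_p.append(x_p)
            outs_q.append(x_q)
            x_q = torch.cat([x_p, x_q], dim=2) # In_{i+1}^q = Concat(Out_i^p, Out_i^q)
        mu = torch.cat([outs_p[-1], outs_q[-1]], dim=2)  # mean feature mu, (bs, s^2, d)
        return mu, outs_p, outs_q
```

With the embodiment's settings (s = 5, B = 200, m = 3, d = 40), StitchedEncoder(bands=200, seg_len=8, m=3, d=40) maps inp_p of shape (bs, 25, 200) and inp_q of shape (bs, 25, 8) to a mean feature of shape (bs, 25, 40).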
S6, pooling operation: the mean feature μ and the standard deviation feature δ are pooled using an average pooling layer (Average Pooling Layer) to obtain the pooled mean feature μ̂ and the pooled standard deviation feature δ̂, both of size bs×d. The step further includes: obtaining the standard deviation feature δ by using a long short-term memory network layer L, whose input is the spliced feature obtained in step S5 and whose node number is d; δ has size bs×s²×d.
S7, obtaining the fused feature O with the feature fusion module and using it as the input of the decoder module. In this step, the fused feature O is obtained in the feature fusion module according to the formula
O = μ̂ + γ ⊙ δ̂,
where γ is a randomly generated noise matrix that follows a Gaussian distribution and has size bs×d.
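A minimal sketch of steps S6–S7, assuming δ has already been produced by the extra LSTM layer L; the form O = μ̂ + γ ⊙ δ̂ is the standard VAE reparameterization step implied by the text, with δ̂ playing the role of the standard deviation.

```python
import torch

def pool_and_fuse(mu, delta):
    """Average-pool mu and delta over the s^2 sequence dimension (S6), then
    sample the fused feature O = mu_hat + gamma * delta_hat (S7)."""
    mu_hat = mu.mean(dim=1)              # pooled mean feature, (bs, d)
    delta_hat = delta.mean(dim=1)        # pooled standard-deviation feature, (bs, d)
    gamma = torch.randn_like(delta_hat)  # Gaussian noise matrix gamma, (bs, d)
    return mu_hat + gamma * delta_hat, mu_hat, delta_hat
```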
S8, constructing the loss function used in training the network model according to the formula Γ = Γ_R + Γ_KL + Γ_Homo, where, in the standard forms implied by the text,
Γ_R = Σ‖Input^q − outd_n‖²,
Γ_KL = −(1/2)·Σ(1 + log δ̂² − μ̂² − δ̂²),
Γ_Homo = Σ_i arccos(⟨Out_i^p, Out_i^q⟩ / (‖Out_i^p‖·‖Out_i^q‖)),
and Σ(·) sums the bracketed contents. Γ_R measures the similarity between the input of the spectral feature extraction module and the decoder output using the Euclidean distance; Γ_KL is the loss function of the variational autoencoder (VAE), using the KL divergence to measure the similarity between the Gaussian distribution and the embedded feature distribution; Γ_Homo measures the similarity between the outputs of all spatial feature extraction module layers and the outputs of the corresponding spectral feature extraction module layers using the spectral angle distance.
In this embodiment, the spatial feature extraction module comprises m long short-term memory network layers (Long Short-Term Memory Layer) {H_1^p, H_2^p, …, H_m^p}, with inputs {In_1^p, …, In_m^p} and outputs {Out_1^p, …, Out_m^p}; the last layer has d/2 nodes and the other layers have d nodes. The spectral feature extraction module comprises m long short-term memory network layers {H_1^q, H_2^q, …, H_m^q}, with inputs {In_1^q, …, In_m^q} and outputs {Out_1^q, …, Out_m^q}; the last layer has d/2 nodes and the other layers have d nodes. The decoder module consists of n fully connected network layers (Fully Connected Layer) {d_1, d_2, …, d_n}, with inputs {ind_1, …, ind_n} and outputs {outd_1, …, outd_n}; the last layer has B nodes and the other network layers have d nodes. The decoder reconstructs the data from the fused feature O, forming an autoencoder structure and thereby guaranteeing the consistency of sample information.
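The three loss terms can be sketched as follows. This is a sketch under the standard forms assumed above; the patent's exact expressions are rendered as images in the original publication and are not reproduced here.

```python
import torch
import torch.nn.functional as F

def spectral_angle(a, b, eps=1e-7):
    """Spectral angle distance between matching feature vectors, summed."""
    cos = F.cosine_similarity(a, b, dim=-1).clamp(-1 + eps, 1 - eps)
    return torch.acos(cos).sum()

def total_loss(inp_q, recon, mu_hat, delta_hat, outs_p, outs_q):
    """Gamma = Gamma_R + Gamma_KL + Gamma_Homo (step S8)."""
    gamma_r = ((inp_q.reshape(recon.shape) - recon) ** 2).sum()       # Euclidean reconstruction
    gamma_kl = -0.5 * torch.sum(1 + torch.log(delta_hat ** 2 + 1e-8)  # standard VAE KL term
                                - mu_hat ** 2 - delta_hat ** 2)
    gamma_homo = sum(spectral_angle(p, q) for p, q in zip(outs_p, outs_q))
    return gamma_r + gamma_kl + gamma_homo
```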
In an embodiment, the method further includes: using the above loss function with a step size of 10⁻³ to optimize the depth network model; after the model stabilizes, the pooled mean feature μ̂ is taken as the output, and all first samples and second samples are used as test samples to obtain the desired embedded features.
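A compact sketch of this optimization loop follows; the Adam optimizer and epoch count are assumptions, since the text specifies only the 10⁻³ step size and one sample packet per iteration.

```python
import torch

def train(model, loss_fn, packets, epochs=100, lr=1e-3):
    """Optimize the depth network model with step size 1e-3 (per the text).
    `packets` is a sequence of (inp_p, inp_q) mini-batches of bs pixels each."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer choice is an assumption
    for _ in range(epochs):
        for inp_p, inp_q in packets:    # one randomly selected sample packet per step
            opt.zero_grad()
            loss = loss_fn(model, inp_p, inp_q)
            loss.backward()
            opt.step()
    return model
```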
Preferably, in step S4, all X×Y hyperspectral pixels are divided into a training set and a test set according to a certain proportion and normalized so that their values range between −1 and 1, using the normalization formula (the standard min-max form implied by the text):
x̂ = 2(x − x_min)/(x_max − x_min) − 1,
where x_min is the minimum value in the pixel data and x_max is the maximum value. The training-set pixels are then randomly ordered and packed, i.e., divided into several sample packets of bs pixels each; each iteration of optimization selects only one sample packet to input into the neural network, and a different packet is selected each time. The Tanh activation function is computed as:
Tanh(x) = (e^x − e^(−x))/(e^x + e^(−x)).
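For reference, a one-function transcription of the normalization formula above (the function name is illustrative):

```python
import numpy as np

def normalize_pixel(x):
    """Min-max normalize a pixel's spectrum to the range [-1, 1]."""
    return 2 * (x - x.min()) / (x.max() - x.min()) - 1
```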
the method described above is verified next in connection with the detailed description.
The hyperspectral image feature extraction method based on the multi-level variational autoencoder is used to extract spatial-spectral features of hyperspectral images for subsequent classification research. Taking the Indian Pines dataset as an example, the image size is 145×145×200: the image contains 21025 pixels in total, each pixel contains 200 spectral wavelengths, and the whole dataset contains 16 effective categories plus a background noise category. After the pixels belonging to the background noise category are removed, 10366 effective pixels remain. The deep network architecture is shown in Fig. 2:
input: the input hyperspectral image is an image of 145×145×200 in size.
Parameter setting: the neighborhood size is 5, the number of network layers in the spatial feature extraction module and the spectral feature extraction module is 3, the number of network layers in the decoder module is 3, and the embedded feature dimension is 40.
Neighborhood information selection: neighborhood information of size 5×5×200 is obtained for each pixel, and the pixel together with its neighborhood information is input into the depth network for training.
Training the deep network
40% of the 10366 samples are selected to train the deep network model; they are randomly ordered and packed, with a mini-batch pixel number of 512, and only one sample packet is used per training step. After training, all 10366 samples are input into the depth model for testing, yielding embedded features of size 10366×40, which are finally classified with an SVM classifier. 10% of the samples are randomly selected to train the SVM classifier and the remaining 90% are used for testing; the classification result is evaluated by the overall classification accuracy and the average classification accuracy. The overall classification accuracy is the number of correctly classified samples divided by the total number of samples. The average classification accuracy first computes, for each class, the ratio of correctly classified samples to the number of samples of that class, and then averages these per-class ratios.
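The evaluation protocol described above can be sketched with scikit-learn as follows; this is a sketch in which `embeddings` and `labels` stand for the 10366×40 embedded features and their class ids, and macro-averaged recall is used as the average per-class accuracy.

```python
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score

def evaluate(embeddings, labels, train_ratio=0.1, seed=0):
    """Train an SVM on 10% of the embedded features, test on the remaining 90%,
    and report overall accuracy (OA) and average per-class accuracy (AA)."""
    x_tr, x_te, y_tr, y_te = train_test_split(
        embeddings, labels, train_size=train_ratio,
        stratify=labels, random_state=seed)
    pred = SVC().fit(x_tr, y_tr).predict(x_te)
    oa = accuracy_score(y_te, pred)                 # correctly classified / total
    aa = recall_score(y_te, pred, average="macro")  # mean of per-class ratios
    return oa, aa
```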
The classification results obtained by the hyperspectral image feature extraction method based on the multi-level variational autoencoder and by a common variational autoencoder (comprising an encoder of 3 fully connected layers, a feature fusion module, and a decoder of 3 fully connected layers, with the same numbers of network-layer nodes and the same feature fusion module structure as the method implemented by the application) are shown in the following table.
Method | Overall classification accuracy | Average classification accuracy
The method implemented by the application | 85.3% | 79.1%
The method with added random Gaussian noise | 81.4% | 72.3%
Common variational autoencoder | 76.7% | 66.3%
As can be seen from the table, the method better improves the classification performance of the embedded features and produces fewer misclassified samples. In addition, when a certain amount of random Gaussian noise is added to the original hyperspectral image and the experiment is repeated, the overall classification accuracy is 81.4% (versus 85.3% without added noise), showing that the method has strong resistance to noise interference. The method therefore effectively improves the classifiability and classification accuracy of the embedded features and the noise robustness of the model.
The foregoing embodiments are provided to illustrate the technical concept and features of the present application, to enable those skilled in the art to understand and implement its contents, and are not intended to limit its scope. All equivalent changes and modifications within the spirit of the disclosure are intended to be covered.

Claims (10)

1. The hyperspectral image feature extraction method based on the multi-level variational autoencoder, characterized by comprising the following steps:
S1, selecting a hyperspectral image of size X×Y×B, where X and Y are the spatial sizes of the hyperspectral image at each wavelength and B is the number of wavelengths of the hyperspectral image,
S2, configuring neighborhood information for each hyperspectral pixel in the hyperspectral image, namely selecting the surrounding s×s neighborhood pixels as the neighborhood information of the pixel, where the neighborhood information refers to a square region of odd side length s centered on the hyperspectral pixel,
S3, constructing a depth network model and transforming the neighborhood information based on it to obtain a first sample of size 1×s²×B, used as the input Input^p of the spatial feature extraction module, and taking a second sample of 1×B hyperspectral pixels as the input Input^q of the spectral feature extraction module, the first samples and the second samples being equal in number and in one-to-one correspondence,
S4, training a deep neural network,
S5, feature stitching and mean feature μ calculation: the input In_i^p of the i-th layer H_i^p in the spatial feature extraction module is the output Out_{i-1}^p of layer H_{i-1}^p, and the input In_i^q of the i-th layer H_i^q in the spectral feature extraction module is the splice of the outputs of layers H_{i-1}^p and H_{i-1}^q, computed according to the formula
In_i^q = Concat(Out_{i-1}^p, Out_{i-1}^q), where 1 < i < m;
the mean feature μ is then obtained according to μ = Concat(Out_m^p, Out_m^q), with size bs×s²×d,
S6, pooling operation: pooling the mean feature μ and the standard deviation feature δ with an average pooling layer to obtain the pooled mean feature μ̂ and the pooled standard deviation feature δ̂, both of size bs×d,
S7, obtaining a fused feature O based on a feature fusion module and inputting the fused feature O to a decoder module, where the decoder is used to reconstruct the data from the fused feature O,
S8, network optimization: constructing the loss function used in training the network model according to the formula Γ = Γ_R + Γ_KL + Γ_Homo, where, in the standard forms implied by the text,
Γ_R = Σ‖Input^q − outd_n‖²,
Γ_KL = −(1/2)·Σ(1 + log δ̂² − μ̂² − δ̂²),
Γ_Homo = Σ_i arccos(⟨Out_i^p, Out_i^q⟩ / (‖Out_i^p‖·‖Out_i^q‖)),
and Σ(·) sums the bracketed contents.
2. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein step S1 further includes:
carrying out normalization preprocessing on the hyperspectral image, and setting the neighborhood size s, the number m of network layers in the spatial feature extraction module and the spectral feature extraction module, the number n of network layers in the decoder module, and the embedded feature dimension d, where d is an even number greater than 0.
3. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein step S4 comprises:
randomly selecting small batches of samples from the X×Y first samples of size 1×s²×B and the X×Y second samples of size 1×B, respectively, and inputting them into the deep neural network for training; the mini-batch pixel numbers of the first and second samples are both bs, and the activation functions in the network are Tanh activation functions.
4. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 3, further comprising:
normalizing all X×Y hyperspectral pixels so that their values range between −1 and 1, using the normalization formula (the standard min-max form implied by the text):
x̂ = 2(x − x_min)/(x_max − x_min) − 1,
where x_min is the minimum value in the pixel data and x_max is the maximum value.
5. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 3, wherein the Tanh activation function is computed as:
Tanh(x) = (e^x − e^(−x))/(e^x + e^(−x)).
6. the hyperspectral image feature extraction method based on the multi-level variational automatic encoder as claimed in claim 1, wherein,
the step S8 further includes: using a loss function, selecting a step size of 10 -3 Optimizing the constructed network model, and after the model is stable, pooling the mean value characteristics
Figure FDA0003440028380000034
And taking the first sample and the second sample as test samples to obtain expected embedded features.
7. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein step S3 includes:
transforming the hyperspectral pixel x of size 1×B to obtain a second sample of size 1×s²×(B/s²), which is used as the input Input^q of the spectral feature extraction module;
if B is not an integer multiple of s², removing ε wavelengths so that B − ε is an integer multiple of s², where the superscript q refers to the relevant variables in the spectral feature extraction module.
8. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein step S6 further includes:
obtaining the standard deviation feature δ by using a long short-term memory network layer L, whose input is the spliced feature obtained in step S5 and whose node number is d; δ has size bs×s²×d.
9. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein, in step S7, the decoder module includes:
n fully connected network layers {d_1, d_2, …, d_n}, with inputs {ind_1, ind_2, …, ind_n} and outputs {outd_1, outd_2, …, outd_n}, where the last layer has B nodes and the other network layers have d nodes.
10. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein step S7 further includes:
obtaining the fused feature O according to the formula
O = μ̂ + γ ⊙ δ̂,
where γ is a randomly generated noise matrix that follows a Gaussian distribution and has size bs×d.
CN202111627432.1A 2021-12-28 2021-12-28 Hyperspectral image feature extraction method based on multi-level variational autoencoder Pending CN116416441A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111627432.1A CN116416441A (en) 2021-12-28 2021-12-28 Hyperspectral image feature extraction method based on multi-level variational autoencoder
PCT/CN2022/142106 WO2023125456A1 (en) 2021-12-28 2022-12-26 Multi-level variational autoencoder-based hyperspectral image feature extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111627432.1A CN116416441A (en) 2021-12-28 2021-12-28 Hyperspectral image feature extraction method based on multi-level variational autoencoder

Publications (1)

Publication Number Publication Date
CN116416441A true CN116416441A (en) 2023-07-11

Family

ID=86997859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111627432.1A CN116416441A (en) 2021-12-28 2021-12-28 Hyperspectral image feature extraction method based on multi-level variational autoencoder

Country Status (2)

Country Link
CN (1) CN116416441A (en)
WO (1) WO2023125456A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115553B (en) * 2023-09-13 2024-01-30 南京审计大学 Hyperspectral remote sensing image classification method based on mask spectral space feature prediction
CN117455970B (en) * 2023-12-22 2024-05-10 山东科技大学 Airborne laser sounding and multispectral satellite image registration method based on feature fusion
CN117934975B (en) * 2024-03-21 2024-06-07 安徽大学 Full-variation regular guide graph convolution unsupervised hyperspectral image classification method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10664716B2 (en) * 2017-07-19 2020-05-26 Vispek Inc. Portable substance analysis based on computer vision, spectroscopy, and artificial intelligence
CN111160273B (en) * 2019-12-31 2023-05-09 北京云智空间科技有限公司 Hyperspectral image spatial spectrum joint classification method and device
CN111914907B (en) * 2020-07-13 2022-07-29 河海大学 Hyperspectral image classification method based on deep learning space-spectrum combined network
CN112101271A (en) * 2020-09-23 2020-12-18 台州学院 Hyperspectral remote sensing image classification method and device

Also Published As

Publication number Publication date
WO2023125456A1 (en) 2023-07-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination