CN116416441A - Hyperspectral image feature extraction method based on multi-level variational autoencoder

Hyperspectral image feature extraction method based on multi-level variational autoencoder

Info

Publication number
CN116416441A
Authority
CN
China
Prior art keywords
feature extraction
hyperspectral image
network
hyperspectral
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111627432.1A
Other languages
Chinese (zh)
Inventor
于文博 (Yu Wenbo)
黄鹤 (Huang He)
沈纲祥 (Shen Gangxiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202111627432.1A priority Critical patent/CN116416441A/en
Priority to PCT/CN2022/142106 priority patent/WO2023125456A1/en
Publication of CN116416441A publication Critical patent/CN116416441A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/58Extraction of image or video features relating to hyperspectral data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a hyperspectral image feature extraction method based on a multi-level variational autoencoder. The method uses a variational autoencoder as its basic framework and, after training, takes the finally corrected fusion feature as the output spatial-spectral joint feature. The method better extracts the important discriminative information in the data, improves the separability and classification accuracy of pixels, reduces misclassification in subsequent classification tasks, and improves the model's resistance to noise interference.

Description

Hyperspectral image feature extraction method based on multi-level variational autoencoder
Technical Field
The application relates to the technical field of spectral imaging, and in particular to a hyperspectral image feature extraction method based on a multi-level variational autoencoder.
Background
In the remote sensing field, hyperspectral imaging techniques are widely used in many studies. A hyperspectral image contains rich spatial and spectral features: the spatial features refer to the spatial position information of pixels at each wavelength, while the spectral features refer to the spectral curve formed by the spectral reflectances of a single pixel across all wavelengths. Feature extraction on a hyperspectral image yields low-dimensional embedded features that contain rich discriminative information, reduces redundant information in the image, and can improve recognition accuracy in subsequent classification research. Early hyperspectral image feature extraction methods mainly extracted the spectral features of pixels without considering the positional information among pixels, and therefore struggled to obtain good results. With the growth of computing power and the progress of deep learning research, methods that extract spatial-spectral features by training neural networks have been proposed in succession. These methods introduce the idea of multi-sensor data fusion: spatial features and spectral features are extracted independently and then fused, which avoids information loss and improves algorithm performance.
From the viewpoint of information sources, hyperspectral image feature extraction methods can be classified into methods based on spectral features and methods based on spatial-spectral features.
Feature extraction methods based on spectral features construct a feature extractor from the single spectral curve of each pixel in the hyperspectral image, neglecting the positional information of different pixels in the spatial dimension. Early, widely used methods include principal component analysis (PCA), minimum noise fraction (MNF), and linear discriminant analysis (LDA). These methods generally consider the internal discriminative information of hyperspectral pixels to ensure separability. With the continued progress of deep learning research, some deep network models have also been applied to hyperspectral image feature extraction, including the autoencoder (AE), the variational autoencoder (VAE), and the long short-term memory network (LSTM). However, these methods do not consider the positional relations among different pixels, so the information in the image is described only at the spectral level, and the "map-spectrum unification" advantage of hyperspectral images — the uniformity and synergy between spatial and spectral information in the image — is not fully exploited. Currently, the main approach in this research field is feature extraction based on spatial-spectral features.
The hyperspectral image is typical three-dimensional cube data: it combines the spatial information of the target ground objects with the spectral information at each wavelength, jointly reflected in the complete data, so the hyperspectral image has the property of map-spectrum unification, i.e., its spatial information and spectral information are consistent. Meanwhile, owing to the changeable shooting environments of hyperspectral images and external interference, the phenomena of "same object, different spectra" and "different objects, same spectrum" still exist in the image and interfere with the results of hyperspectral analysis research. The spatial information of a hyperspectral image can be understood as the local spatial neighborhood of a single pixel in the spatial dimension; this definition assumes that each pixel has a certain relation with the pixels in its spatial neighborhood, so that learning the distribution of the local spatial neighborhood captures the position of the pixel among the real ground objects. The current common approach is to extract the local information of pixels with a convolutional neural network, extract the spectral features of pixels with fully connected layers, and finally combine the spatial and spectral features with a splicing layer. When extracting spectral features, some studies also use a long short-term memory network to learn the continuous information within the spectral curves of pixels. However, this type of method has certain drawbacks: (1) these methods do not consider the continuous information in hyperspectral images from multiple aspects, but describe it only from the perspective of the spectral curve; (2) when the spectral and spatial features are extracted separately, the correlation and cooperation between the two mappings are poor, and feature fusion is realized only by a splicing layer at the last step, so the distributions of the two features cannot be fully matched; (3) these methods do not consider the homology between spatial and spectral features: although the two features describe the information of the hyperspectral image at different levels, they both describe the same hyperspectral pixel, so homology potentially exists between them.
In short, the existing hyperspectral image feature extraction methods have certain deficiencies:
(1) existing spatial-spectral feature extraction methods do not consider the continuous information of pixels at multiple levels; most describe continuous information only from the angle of the spectral curve, which weakens the diversity of the data;
(2) in existing spatial-spectral feature extraction methods, the cooperation between the spatial feature mapping and the spectral feature mapping is poor: the two mappings are essentially completely split, and feature fusion is carried out by a splicing layer only after feature extraction; since the data distributions of the spatial and spectral features differ greatly, simple splicing can hardly achieve the desired purpose;
(3) existing spatial-spectral feature extraction methods do not consider the homology between spatial and spectral features and ignore the fact that both features are extracted from the same hyperspectral pixel, which further increases the difference between the two feature distributions and is detrimental both to fused feature expression and to subsequent classification research.
Disclosure of Invention
To overcome the above drawbacks, the present application provides a hyperspectral image feature extraction method based on a multi-level variational autoencoder (multi-level VAE). The method uses a variational autoencoder as its basic framework and, after training, takes the finally corrected fusion feature as the output spatial-spectral joint feature.
In order to achieve the above purpose, the present application adopts the following technical solution:
the hyperspectral image feature extraction method based on the multi-level variation automatic encoder is characterized by comprising the following steps of:
s1, selecting a hyperspectral image, wherein the size of the hyperspectral image is X multiplied by Y multiplied by B, X and Y are the space sizes of the hyperspectral image under each wavelength, B is the number of the wavelengths of the hyperspectral image,
s2, configuring neighborhood information for each hyperspectral pixel in the hyperspectral image, namely selecting a neighborhood pixel with the surrounding size of s multiplied by s as neighborhood information of the pixel, wherein the neighborhood information refers to a square area taking the hyperspectral pixel as the center, the side length is s, the s is an odd number,
s3, transforming the neighborhood information based on a depth network model to obtain a size of 1×s 2 X B as Input to the spatial feature extraction module p
Taking a second sample with a size of 1 XB hyperspectral pixels as an Input of a spectral feature extraction module q The first samples and the second samples have the same number and one-to-one correspondence,
s4, training a deep neural network,
s5, feature stitching and mean value feature mu calculation, namely the first step in the spatial feature extraction module
Figure BDA0003440028390000041
Input of layer->
Figure BDA0003440028390000042
Is->
Figure BDA0003440028390000043
Layer output->
Figure BDA0003440028390000044
The spectral feature extraction module is->
Figure BDA0003440028390000045
Input of layer->
Figure BDA0003440028390000046
Is->
Figure BDA0003440028390000047
Layer(s)
Figure BDA0003440028390000048
Splicing the layer outputs according to the calculation formula: />
Figure BDA0003440028390000049
Wherein 1 < i < m, according to the calculation formula, < ->
Figure BDA00034400283900000410
The mean characteristic μ was obtained with dimensions bs×s 2 ×d,
S6, pooling operation: pooling the mean feature μ and the standard deviation feature δ with an average pooling layer to obtain the pooled mean feature μ̂ and the pooled standard deviation feature δ̂, both of size bs×d,
S7, obtaining a fused feature O based on a feature fusion module and inputting the fused feature O to a decoder module, where the decoder is used to reconstruct the data from the fused feature O,
S8, network optimization: constructing the loss function used in training the network model according to the formula Γ = Γ_R + Γ_KL + Γ_Homo, where, in the standard forms implied by the text,
Γ_R = Σ‖Input^q − outd_n‖²,
Γ_KL = −(1/2)·Σ(1 + log δ̂² − μ̂² − δ̂²),
Γ_Homo = Σ_i arccos(⟨Out_i^p, Out_i^q⟩ / (‖Out_i^p‖·‖Out_i^q‖)),
and Σ(·) sums the bracketed contents. Γ_R measures the similarity between the input of the spectral feature extraction module and the decoder output using the Euclidean distance; Γ_KL is the loss function of the variational autoencoder (VAE), using the KL divergence to measure the similarity between the Gaussian distribution and the embedded feature distribution; Γ_Homo measures the similarity between the outputs of all spatial feature extraction module layers and the outputs of the corresponding spectral feature extraction module layers using the spectral angle distance.
Preferably, in step S1, the method further includes:
carrying out normalization preprocessing on the hyperspectral image, and setting the neighborhood size s, the number m of network layers in the spatial feature extraction module and the spectral feature extraction module, the number n of network layers in the decoder module, and the embedded feature dimension d, where d is an even number greater than 0.
Preferably, step S4 includes: randomly selecting small batches of samples from the X×Y first samples of size 1×s²×B and the X×Y second samples of size 1×B, respectively, and inputting them into the deep neural network for training; the mini-batch pixel numbers of the first and second samples are both bs, and the activation functions in the network are Tanh activation functions.
Preferably, the hyperspectral image feature extraction method based on the multi-level variational autoencoder further comprises the following steps:
normalizing all X×Y hyperspectral pixels so that their values range between −1 and 1, using the normalization formula (the standard min-max form implied by the text):
x̂ = 2(x − x_min)/(x_max − x_min) − 1,
where x_min is the minimum value in the pixel data and x_max is the maximum value.
Preferably, the Tanh activation function is computed as:
Tanh(x) = (e^x − e^(−x))/(e^x + e^(−x)).
preferably, step S8 further includes: using a loss function, selecting a step size of 10 -3 Optimizing the depth network model, and after the model is stable, pooling the mean value characteristics
Figure BDA0003440028390000064
As an output.
Preferably, step S3 includes:
transforming the hyperspectral pixel x of size 1×B to obtain a second sample of size 1×s²×(B/s²), which is used as the input Input^q of the spectral feature extraction module;
if B is not an integer multiple of s², removing ε wavelengths so that B − ε is an integer multiple of s², where the superscript q refers to the relevant variables in the spectral feature extraction module.
Preferably, step S6 further includes:
obtaining the standard deviation feature δ by using a long short-term memory network layer L, whose input is the spliced feature obtained in step S5 and whose node number is d; δ has size bs×s²×d.
Preferably, in step S7, the decoder module includes:
n fully connected network layers {d_1, d_2, …, d_n}, with inputs {ind_1, ind_2, …, ind_n} and outputs {outd_1, outd_2, …, outd_n}, where the last layer has B nodes and the other network layers have d nodes.
Preferably, step S7 further includes:
obtaining the fused feature O according to the formula
O = μ̂ + γ ⊙ δ̂,
where γ is a randomly generated noise matrix that follows a Gaussian distribution and has size bs×d.
Advantageous effects
Compared with the prior art, the multi-continuity feature integration method for hyperspectral images extracts both the spatial and spectral features of the image, takes into account the two kinds of continuous information contained in the hyperspectral image, and achieves multi-angle information description through the design of the depth network model. In addition, through the multi-level spatial feature correction method, the spatial features at each stage are used to sequentially correct the spectral features at the corresponding stage, improving the cooperation between the two features during extraction. The spectral-angle-distance-based method for improving the homology of multi-level spatial-spectral features gradually increases the homology among the spatial-spectral features at each level, strengthens the association between spatial and spectral features in the feature mapping stage, and alleviates the difficulty of improving classification accuracy caused by the large difference between the two feature distributions.
Drawings
Fig. 1 is a flow chart of the feature extraction method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the overall deep network architecture according to an embodiment of the present application.
Detailed Description
The above-described aspects are further described below in conjunction with specific embodiments. It should be understood that these examples are only illustrative of the present application and do not limit its scope. The implementation conditions employed in the examples may be adjusted according to the conditions of a specific manufacturer; implementation conditions not specified are typically those of routine experiments.
The hyperspectral image feature extraction method based on the multi-level variational autoencoder utilizes long short-term memory network layers to extract continuous features of pixels at both the spatial and the spectral level, and uses a splicing layer to fuse the two kinds of continuous features, overcoming the problem that traditional feature extraction algorithms use only a single kind of continuous information. The spectral-angle-distance-based method for improving the homology of multi-level spatial-spectral features uses the spectral angle distance to compute and increase the homology among the spatial-spectral features at each stage, addresses the difficulty of improving subsequent classification accuracy caused by the large distribution difference between spatial and spectral features, and improves the association of the two features in the feature mapping stage.
The hyperspectral image feature extraction method based on the multi-level variational autoencoder provided by the application is described below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a feature extraction method, which includes:
S1, selecting/acquiring a hyperspectral image of size X×Y×B, where X and Y are the spatial sizes of the hyperspectral image at each wavelength and B is the number of wavelengths; carrying out normalization preprocessing on the image; and setting the neighborhood size s, the number m of network layers in the spatial feature extraction module and the spectral feature extraction module, the number n of network layers in the decoder module, and the embedded feature dimension d, where d must be an even number greater than 0.
S2, setting neighborhood information for each hyperspectral pixel (X×Y hyperspectral pixels in total) by selecting the surrounding s×s neighborhood pixels as the neighborhood information of that pixel, the neighborhood information having size s×s×B.
S3, constructing the depth network model and transforming the neighborhood information of each pixel to obtain a first sample of size 1×s²×B, which serves as the input Input^p of the spatial feature extraction module. The hyperspectral pixel x of size 1×B is transformed to obtain a second sample of size 1×s²×(B/s²), which serves as the input Input^q of the spectral feature extraction module; if B is not an integer multiple of s², ε wavelengths are removed so that B − ε is an integer multiple of s². The superscript p refers to the relevant variables in the spatial feature extraction module, and the superscript q to those in the spectral feature extraction module. The first samples and the second samples are equal in number and in one-to-one correspondence. The depth network model comprises the spatial feature extraction module, the spectral feature extraction module, the feature fusion module, and the decoder module.
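For illustration, the following is a minimal NumPy sketch of the sample construction in steps S2–S3. It is a sketch under stated assumptions: the patent does not specify how border pixels are handled, so reflection padding is assumed here, and the function name build_inputs is illustrative.

```python
import numpy as np

def build_inputs(image, s):
    """Build first samples Input^p of shape (X*Y, s^2, B) and second
    samples Input^q of shape (X*Y, s^2, B/s^2) from an X x Y x B cube."""
    X, Y, B = image.shape
    assert s % 2 == 1, "neighborhood side length s must be odd"
    eps = B % (s * s)              # remove eps wavelengths so B - eps is a multiple of s^2
    if eps:
        image = image[:, :, :B - eps]
        B -= eps
    r = s // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")  # assumed border handling
    inp_p = np.empty((X * Y, s * s, B), dtype=image.dtype)
    inp_q = np.empty((X * Y, s * s, B // (s * s)), dtype=image.dtype)
    for k in range(X * Y):
        i, j = divmod(k, Y)
        nb = padded[i:i + s, j:j + s, :]          # s x s x B spatial neighborhood
        inp_p[k] = nb.reshape(s * s, B)           # first sample, 1 x s^2 x B
        inp_q[k] = image[i, j].reshape(s * s, -1) # second sample, 1 x s^2 x (B/s^2)
    return inp_p, inp_q
```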
S4, training the deep neural network: small batches of samples are randomly selected from all X×Y hyperspectral pixels and input into the deep neural network; the mini-batch pixel number is bs, the activation function in the network is the Tanh activation function, and every network layer except the last layer of the decoder module is followed by a batch normalization layer (Batch Normalization Layer).
S5, performing the feature stitching operation:
Input^p is fed as In_1^p into layer H_1^p of the spatial feature extraction module, giving the output Out_1^p; Input^q is fed as In_1^q into layer H_1^q of the spectral feature extraction module, giving the output Out_1^q; both Out_1^p and Out_1^q have size bs×s²×d. The two are spliced according to
In_2^q = Concat(Out_1^p, Out_1^q),
where Concat(·) is a splicing operation along the third dimension, yielding an output of size bs×s²×2d that serves as the input In_2^q of layer H_2^q of the spectral feature extraction module. In general, the input In_i^p of layer H_i^p in the spatial feature extraction module is the output Out_{i-1}^p of layer H_{i-1}^p, and the input In_i^q of layer H_i^q in the spectral feature extraction module is the splice of the outputs of layers H_{i-1}^p and H_{i-1}^q, as shown below:
In_i^q = Concat(Out_{i-1}^p, Out_{i-1}^q), where 1 < i < m.
Finally, according to
μ = Concat(Out_m^p, Out_m^q),
the mean feature μ is obtained, with size bs×s²×d.
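The stitched two-branch encoder can be sketched in PyTorch as follows. This is a minimal sketch under the embodiment's layer sizes (m LSTM layers per branch, d hidden units except d/2 in the last layer); the class and argument names are illustrative, and the batch normalization layers mentioned in step S4 are omitted for brevity.

```python
import torch
import torch.nn as nn

class StitchedEncoder(nn.Module):
    """Spatial branch {H_i^p} and spectral branch {H_i^q} with level-wise stitching."""
    def __init__(self, bands, seg_len, m=3, d=40):
        super().__init__()
        sizes = [d] * (m - 1) + [d // 2]      # last layer of each branch has d/2 nodes
        self.Hp = nn.ModuleList()
        self.Hq = nn.ModuleList()
        in_p = bands
        for i, h in enumerate(sizes):
            self.Hp.append(nn.LSTM(in_p, h, batch_first=True))
            # H_1^q reads the raw spectral sample; later layers read Concat(Out^p, Out^q)
            self.Hq.append(nn.LSTM(seg_len if i == 0 else 2 * d, h, batch_first=True))
            in_p = h

    def forward(self, inp_p, inp_q):
        outs_p, outs_q = [], []
        x_p, x_q = inp_p, inp_q
        for lp, lq in zip(self.Hp, self.Hq):
            x_p, _ = lp(x_p)                   # Out_i^p
            x_q, _ = lq(x_q)                   # Out_i^q
            outs_p.append(x_p)
            outs_q.append(x_q)
            x_q = torch.cat([x_p, x_q], dim=2) # In_{i+1}^q = Concat(Out_i^p, Out_i^q)
        mu = torch.cat([outs_p[-1], outs_q[-1]], dim=2)  # mean feature mu, (bs, s^2, d)
        return mu, outs_p, outs_q
```

With the embodiment's settings (s = 5, B = 200, m = 3, d = 40), StitchedEncoder(bands=200, seg_len=8, m=3, d=40) maps inp_p of shape (bs, 25, 200) and inp_q of shape (bs, 25, 8) to a mean feature of shape (bs, 25, 40).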
S6, pooling operation: the mean feature μ and the standard deviation feature δ are pooled using an average pooling layer (Average Pooling Layer) to obtain the pooled mean feature μ̂ and the pooled standard deviation feature δ̂, both of size bs×d. The step further includes: obtaining the standard deviation feature δ by using a long short-term memory network layer L, whose input is the spliced feature obtained in step S5 and whose node number is d; δ has size bs×s²×d.
S7, obtaining the fused feature O with the feature fusion module and using it as the input of the decoder module. In this step, the fused feature O is obtained in the feature fusion module according to the formula
O = μ̂ + γ ⊙ δ̂,
where γ is a randomly generated noise matrix that follows a Gaussian distribution and has size bs×d.
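A minimal sketch of steps S6–S7, assuming δ has already been produced by the extra LSTM layer L; the form O = μ̂ + γ ⊙ δ̂ is the standard VAE reparameterization step implied by the text, with δ̂ playing the role of the standard deviation.

```python
import torch

def pool_and_fuse(mu, delta):
    """Average-pool mu and delta over the s^2 sequence dimension (S6), then
    sample the fused feature O = mu_hat + gamma * delta_hat (S7)."""
    mu_hat = mu.mean(dim=1)              # pooled mean feature, (bs, d)
    delta_hat = delta.mean(dim=1)        # pooled standard-deviation feature, (bs, d)
    gamma = torch.randn_like(delta_hat)  # Gaussian noise matrix gamma, (bs, d)
    return mu_hat + gamma * delta_hat, mu_hat, delta_hat
```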
S8, constructing the loss function used in training the network model according to the formula Γ = Γ_R + Γ_KL + Γ_Homo, where, in the standard forms implied by the text,
Γ_R = Σ‖Input^q − outd_n‖²,
Γ_KL = −(1/2)·Σ(1 + log δ̂² − μ̂² − δ̂²),
Γ_Homo = Σ_i arccos(⟨Out_i^p, Out_i^q⟩ / (‖Out_i^p‖·‖Out_i^q‖)),
and Σ(·) sums the bracketed contents. Γ_R measures the similarity between the input of the spectral feature extraction module and the decoder output using the Euclidean distance; Γ_KL is the loss function of the variational autoencoder (VAE), using the KL divergence to measure the similarity between the Gaussian distribution and the embedded feature distribution; Γ_Homo measures the similarity between the outputs of all spatial feature extraction module layers and the outputs of the corresponding spectral feature extraction module layers using the spectral angle distance.
In this embodiment, the spatial feature extraction module comprises m long short-term memory network layers (Long Short-Term Memory Layer) {H_1^p, H_2^p, …, H_m^p}, with inputs {In_1^p, …, In_m^p} and outputs {Out_1^p, …, Out_m^p}; the last layer has d/2 nodes and the other layers have d nodes. The spectral feature extraction module comprises m long short-term memory network layers {H_1^q, H_2^q, …, H_m^q}, with inputs {In_1^q, …, In_m^q} and outputs {Out_1^q, …, Out_m^q}; the last layer has d/2 nodes and the other layers have d nodes. The decoder module consists of n fully connected network layers (Fully Connected Layer) {d_1, d_2, …, d_n}, with inputs {ind_1, …, ind_n} and outputs {outd_1, …, outd_n}; the last layer has B nodes and the other network layers have d nodes. The decoder reconstructs the data from the fused feature O, forming an autoencoder structure and thereby guaranteeing the consistency of sample information.
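The three loss terms can be sketched as follows. This is a sketch under the standard forms assumed above; the patent's exact expressions are rendered as images in the original publication and are not reproduced here.

```python
import torch
import torch.nn.functional as F

def spectral_angle(a, b, eps=1e-7):
    """Spectral angle distance between matching feature vectors, summed."""
    cos = F.cosine_similarity(a, b, dim=-1).clamp(-1 + eps, 1 - eps)
    return torch.acos(cos).sum()

def total_loss(inp_q, recon, mu_hat, delta_hat, outs_p, outs_q):
    """Gamma = Gamma_R + Gamma_KL + Gamma_Homo (step S8)."""
    gamma_r = ((inp_q.reshape(recon.shape) - recon) ** 2).sum()       # Euclidean reconstruction
    gamma_kl = -0.5 * torch.sum(1 + torch.log(delta_hat ** 2 + 1e-8)  # standard VAE KL term
                                - mu_hat ** 2 - delta_hat ** 2)
    gamma_homo = sum(spectral_angle(p, q) for p, q in zip(outs_p, outs_q))
    return gamma_r + gamma_kl + gamma_homo
```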
In an embodiment, the method further includes: using the above loss function with a step size of 10⁻³ to optimize the depth network model; after the model stabilizes, the pooled mean feature μ̂ is taken as the output, and all first samples and second samples are used as test samples to obtain the desired embedded features.
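A compact sketch of this optimization loop follows; the Adam optimizer and epoch count are assumptions, since the text specifies only the 10⁻³ step size and one sample packet per iteration.

```python
import torch

def train(model, loss_fn, packets, epochs=100, lr=1e-3):
    """Optimize the depth network model with step size 1e-3 (per the text).
    `packets` is a sequence of (inp_p, inp_q) mini-batches of bs pixels each."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer choice is an assumption
    for _ in range(epochs):
        for inp_p, inp_q in packets:    # one randomly selected sample packet per step
            opt.zero_grad()
            loss = loss_fn(model, inp_p, inp_q)
            loss.backward()
            opt.step()
    return model
```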
Preferably, in step S4, all X×Y hyperspectral pixels are divided into a training set and a test set according to a certain proportion and normalized so that their values range between −1 and 1, using the normalization formula (the standard min-max form implied by the text):
x̂ = 2(x − x_min)/(x_max − x_min) − 1,
where x_min is the minimum value in the pixel data and x_max is the maximum value. The training-set pixels are then randomly ordered and packed, i.e., divided into several sample packets of bs pixels each; each iteration of optimization selects only one sample packet to input into the neural network, and a different packet is selected each time. The Tanh activation function is computed as:
Tanh(x) = (e^x − e^(−x))/(e^x + e^(−x)).
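For reference, a one-function transcription of the normalization formula above (the function name is illustrative):

```python
import numpy as np

def normalize_pixel(x):
    """Min-max normalize a pixel's spectrum to the range [-1, 1]."""
    return 2 * (x - x.min()) / (x.max() - x.min()) - 1
```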
the method described above is verified next in connection with the detailed description.
The hyperspectral image feature extraction method based on the multi-level variational autoencoder is used to extract spatial-spectral features of hyperspectral images for subsequent classification research. Taking the Indian Pines dataset as an example, the image size is 145×145×200: the image contains 21025 pixels in total, each pixel contains 200 spectral wavelengths, and the whole dataset contains 16 effective categories plus a background noise category. After the pixels belonging to the background noise category are removed, 10366 effective pixels remain. The deep network architecture is shown in Fig. 2:
input: the input hyperspectral image is an image of 145×145×200 in size.
Parameter setting: the neighborhood size is 5, the number of network layers in the spatial feature extraction module and the spectral feature extraction module is 3, the number of network layers in the decoder module is 3, and the embedded feature dimension is 40.
Neighborhood information selection: neighborhood information of size 5×5×200 is obtained for each pixel, and the pixel together with its neighborhood information is input into the depth network for training.
Training the deep network
40% of the 10366 samples are selected to train the deep network model; they are randomly ordered and packed, with a mini-batch pixel number of 512, and only one sample packet is used per training step. After training, all 10366 samples are input into the depth model for testing, yielding embedded features of size 10366×40, which are finally classified with an SVM classifier. 10% of the samples are randomly selected to train the SVM classifier and the remaining 90% are used for testing; the classification result is evaluated by the overall classification accuracy and the average classification accuracy. The overall classification accuracy is the number of correctly classified samples divided by the total number of samples. The average classification accuracy first computes, for each class, the ratio of correctly classified samples to the number of samples of that class, and then averages these per-class ratios.
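The evaluation protocol described above can be sketched with scikit-learn as follows; this is a sketch in which `embeddings` and `labels` stand for the 10366×40 embedded features and their class ids, and macro-averaged recall is used as the average per-class accuracy.

```python
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score

def evaluate(embeddings, labels, train_ratio=0.1, seed=0):
    """Train an SVM on 10% of the embedded features, test on the remaining 90%,
    and report overall accuracy (OA) and average per-class accuracy (AA)."""
    x_tr, x_te, y_tr, y_te = train_test_split(
        embeddings, labels, train_size=train_ratio,
        stratify=labels, random_state=seed)
    pred = SVC().fit(x_tr, y_tr).predict(x_te)
    oa = accuracy_score(y_te, pred)                 # correctly classified / total
    aa = recall_score(y_te, pred, average="macro")  # mean of per-class ratios
    return oa, aa
```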
The classification results obtained by the hyperspectral image feature extraction method based on the multi-level variational autoencoder and by a common variational autoencoder (comprising an encoder of 3 fully connected layers, a feature fusion module, and a decoder of 3 fully connected layers, with the same numbers of network-layer nodes and the same feature fusion module structure as the method implemented by the application) are shown in the following table.
Method | Overall classification accuracy | Average classification accuracy
The method implemented by the application | 85.3% | 79.1%
The method with added random Gaussian noise | 81.4% | 72.3%
Common variational autoencoder | 76.7% | 66.3%
As can be seen from the table, the method better improves the classification performance of the embedded features and produces fewer misclassified samples. In addition, when a certain amount of random Gaussian noise is added to the original hyperspectral image and the experiment is repeated, the overall classification accuracy is 81.4% (versus 85.3% without added noise), showing that the method has strong resistance to noise interference. The method therefore effectively improves the classifiability and classification accuracy of the embedded features and the noise robustness of the model.
The foregoing embodiments are provided to illustrate the technical concept and features of the present application, to enable those skilled in the art to understand and implement its contents, and are not intended to limit its scope. All equivalent changes and modifications within the spirit of the disclosure are intended to be covered.

Claims (10)

1. The hyperspectral image feature extraction method based on the multi-level variational autoencoder, characterized by comprising the following steps:
S1, selecting a hyperspectral image of size X×Y×B, where X and Y are the spatial sizes of the hyperspectral image at each wavelength and B is the number of wavelengths of the hyperspectral image,
S2, configuring neighborhood information for each hyperspectral pixel in the hyperspectral image, namely selecting the surrounding s×s neighborhood pixels as the neighborhood information of the pixel, where the neighborhood information refers to a square region of odd side length s centered on the hyperspectral pixel,
S3, constructing a depth network model and transforming the neighborhood information based on it to obtain a first sample of size 1×s²×B, used as the input Input^p of the spatial feature extraction module, and taking a second sample of 1×B hyperspectral pixels as the input Input^q of the spectral feature extraction module, the first samples and the second samples being equal in number and in one-to-one correspondence,
S4, training a deep neural network,
S5, feature stitching and mean feature μ calculation: the input In_i^p of the i-th layer H_i^p in the spatial feature extraction module is the output Out_{i-1}^p of layer H_{i-1}^p, and the input In_i^q of the i-th layer H_i^q in the spectral feature extraction module is the splice of the outputs of layers H_{i-1}^p and H_{i-1}^q, computed according to the formula
In_i^q = Concat(Out_{i-1}^p, Out_{i-1}^q), where 1 < i < m;
the mean feature μ is then obtained according to μ = Concat(Out_m^p, Out_m^q), with size bs×s²×d,
S6, pooling operation: pooling the mean feature μ and the standard deviation feature δ with an average pooling layer to obtain the pooled mean feature μ̂ and the pooled standard deviation feature δ̂, both of size bs×d,
S7, obtaining a fused feature O based on a feature fusion module and inputting the fused feature O to a decoder module, where the decoder is used to reconstruct the data from the fused feature O,
S8, network optimization: constructing the loss function used in training the network model according to the formula Γ = Γ_R + Γ_KL + Γ_Homo, where, in the standard forms implied by the text,
Γ_R = Σ‖Input^q − outd_n‖²,
Γ_KL = −(1/2)·Σ(1 + log δ̂² − μ̂² − δ̂²),
Γ_Homo = Σ_i arccos(⟨Out_i^p, Out_i^q⟩ / (‖Out_i^p‖·‖Out_i^q‖)),
and Σ(·) sums the bracketed contents.
2. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein step S1 further includes:
carrying out normalization preprocessing on the hyperspectral image, and setting the neighborhood size s, the number m of network layers in the spatial feature extraction module and the spectral feature extraction module, the number n of network layers in the decoder module, and the embedded feature dimension d, where d is an even number greater than 0.
3. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein step S4 comprises:
randomly selecting small batches of samples from the X×Y first samples of size 1×s²×B and the X×Y second samples of size 1×B, respectively, and inputting them into the deep neural network for training; the mini-batch pixel numbers of the first and second samples are both bs, and the activation functions in the network are Tanh activation functions.
4. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 3, further comprising:
normalizing all X×Y hyperspectral pixels so that their values range between −1 and 1, using the normalization formula (the standard min-max form implied by the text):
x̂ = 2(x − x_min)/(x_max − x_min) − 1,
where x_min is the minimum value in the pixel data and x_max is the maximum value.
5. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 3, wherein the Tanh activation function is computed as:
Tanh(x) = (e^x − e^(−x))/(e^x + e^(−x)).
6. the hyperspectral image feature extraction method based on the multi-level variational automatic encoder as claimed in claim 1, wherein,
the step S8 further includes: using a loss function, selecting a step size of 10 -3 Optimizing the constructed network model, and after the model is stable, pooling the mean value characteristics
Figure FDA0003440028380000034
And taking the first sample and the second sample as test samples to obtain expected embedded features.
7. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein step S3 includes:
transforming the hyperspectral pixel x of size 1×B to obtain a second sample of size 1×s²×(B/s²), which is used as the input Input^q of the spectral feature extraction module;
if B is not an integer multiple of s², removing ε wavelengths so that B − ε is an integer multiple of s², where the superscript q refers to the relevant variables in the spectral feature extraction module.
8. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein step S6 further includes:
obtaining the standard deviation feature δ by using a long short-term memory network layer L, whose input is the spliced feature obtained in step S5 and whose node number is d; δ has size bs×s²×d.
9. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein, in step S7, the decoder module includes:
n fully connected network layers {d_1, d_2, …, d_n}, with inputs {ind_1, ind_2, …, ind_n} and outputs {outd_1, outd_2, …, outd_n}, where the last layer has B nodes and the other network layers have d nodes.
10. The hyperspectral image feature extraction method based on the multi-level variational autoencoder according to claim 1, wherein step S7 further includes:
obtaining the fused feature O according to the formula
O = μ̂ + γ ⊙ δ̂,
where γ is a randomly generated noise matrix that follows a Gaussian distribution and has size bs×d.
CN202111627432.1A 2021-12-28 2021-12-28 Hyperspectral image feature extraction method based on multi-level variational autoencoder Pending CN116416441A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111627432.1A CN116416441A (en) 2021-12-28 2021-12-28 Hyperspectral image feature extraction method based on multi-level variational autoencoder
PCT/CN2022/142106 WO2023125456A1 (en) 2021-12-28 2022-12-26 Multi-level variational autoencoder-based hyperspectral image feature extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111627432.1A CN116416441A (en) 2021-12-28 2021-12-28 Hyperspectral image feature extraction method based on multi-level variational autoencoder

Publications (1)

Publication Number Publication Date
CN116416441A true CN116416441A (en) 2023-07-11

Family

ID=86997859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111627432.1A CN116416441A (en) 2021-12-28 2021-12-28 Hyperspectral image feature extraction method based on multi-level variational autoencoder

Country Status (2)

Country Link
CN (1) CN116416441A (en)
WO (1) WO2023125456A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115553B (en) * 2023-09-13 2024-01-30 南京审计大学 Hyperspectral remote sensing image classification method based on mask spectral space feature prediction
CN117455970B (en) * 2023-12-22 2024-05-10 山东科技大学 Airborne laser sounding and multispectral satellite image registration method based on feature fusion
CN117934975B (en) * 2024-03-21 2024-06-07 安徽大学 Full-variation regular guide graph convolution unsupervised hyperspectral image classification method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10664716B2 (en) * 2017-07-19 2020-05-26 Vispek Inc. Portable substance analysis based on computer vision, spectroscopy, and artificial intelligence
CN111160273B (en) * 2019-12-31 2023-05-09 北京云智空间科技有限公司 Hyperspectral image spatial spectrum joint classification method and device
CN111914907B (en) * 2020-07-13 2022-07-29 河海大学 Hyperspectral image classification method based on deep learning space-spectrum combined network
CN112101271A (en) * 2020-09-23 2020-12-18 台州学院 Hyperspectral remote sensing image classification method and device

Also Published As

Publication number Publication date
WO2023125456A1 (en) 2023-07-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination