CN116740457A - Hyperspectral image and laser radar image fusion classification method and system - Google Patents

Hyperspectral image and laser radar image fusion classification method and system

Info

Publication number
CN116740457A
CN116740457A (application CN202310765131.8A)
Authority
CN
China
Prior art keywords
image
laser radar
hyperspectral
hyperspectral image
cross
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310765131.8A
Other languages
Chinese (zh)
Inventor
于文博 (Yu Wenbo)
黄鹤 (Huang He)
沈纲祥 (Shen Gangxiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202310765131.8A priority Critical patent/CN116740457A/en
Publication of CN116740457A publication Critical patent/CN116740457A/en
Pending legal-status Critical Current

Classifications

    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N 3/0464: Computing arrangements based on biological models; neural networks; convolutional networks [CNN, ConvNet]
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/40: Arrangements for image or video recognition or understanding; extraction of image or video features
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • Y02A 40/10: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production, in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The application relates to a hyperspectral image and laser radar image fusion classification method and system, wherein the method comprises the following steps: acquiring a hyperspectral image and a laser radar image; carrying out logarithmic transformation on all values in the hyperspectral image and the laser radar image; constructing a neighborhood block for each pixel in the log-transformed hyperspectral image and laser radar image, wherein the neighborhood block is built from the s × s pixels around the pixel; constructing a depth network model comprising a decomposer and a fusion device, wherein the features of the neighborhood blocks in the hyperspectral image and the laser radar image are extracted by the decomposer, the feature of each pixel is represented by the feature of its corresponding neighborhood block, and the fusion of the hyperspectral image and the laser radar image is realized by the fusion device based on the extracted neighborhood-block features; and classifying the categories of the pixels after the hyperspectral image and the laser radar image are fused. The method can effectively fuse and classify the hyperspectral image and the laser radar image.

Description

Hyperspectral image and laser radar image fusion classification method and system
Technical Field
The application relates to the technical field of hyperspectral image and laser radar image fusion classification, in particular to a hyperspectral image and laser radar image fusion classification method and system.
Background
In the remote sensing field, hyperspectral images and lidar images are widely used in a variety of related studies. A hyperspectral image contains rich spatial and spectral information: the spatial information is the spatial position of each pixel at every wavelength, and the spectral information is the spectral curve formed by the spectral reflectance of a single pixel across all wavelengths. A lidar image records the elevation information of the target ground objects. Fully fusing a hyperspectral image with a lidar image achieves information complementarity, so that the complete information of the ground objects can be learned and modeled. At the same time, fusing and classifying the two kinds of remote sensing images allows the features embedded in the pixels to be fully mined and improves the recognition accuracy of subsequent classification research. Early fusion classification methods generally used two independent branches to extract features from the two images and fused the multi-source information by simple concatenation; however, such methods do not consider the correlation between the different branches and find it difficult to balance the multi-source information. With the growth of computing power and the development of deep learning research, methods that fully fuse hyperspectral and lidar images by training neural networks have been proposed one after another; these methods improve the information extraction processes of the different images and their correlation, and improve the performance of the algorithms.
Fusion classification methods for hyperspectral and lidar images in the remote sensing field can generally be divided into methods based on classical machine learning and methods based on deep learning. Methods based on classical machine learning rely mainly on classical machine learning theory and use the spatial and spectral information in the hyperspectral image and the elevation information in the lidar image to construct a feature extraction module and a fusion module, thereby achieving a joint representation of the different remote sensing images. Commonly used machine learning theories include Principal Component Analysis (PCA), Minimum Noise Fraction (MNF), Linear Discriminant Analysis (LDA), and the like. Other machine learning methods, such as manifold learning algorithms, structure sparsification algorithms and dictionary set decomposition algorithms, also play an important role. Such methods typically extract the discriminative information in the hyperspectral and lidar images and ensure the separability of the samples by fusing the different kinds of information. With the continued development of deep learning theory, some deep network models have also been applied to the fusion classification of hyperspectral and lidar images, such as the Auto-Encoder (AE), the Variational Auto-Encoder (VAE) and the Long Short-Term Memory (LSTM) network. Such methods use complex network structures to extract deep discriminative information and describe the discriminative features contained in a sample from multiple aspects, so more and more fusion classification methods based on deep learning have been proposed. For example, "Deep Encoder-Decoder Networks for Classification of Hyperspectral and LiDAR Data", published by Danfeng Hong et al. in IEEE Geoscience and Remote Sensing Letters in 2020, proposes a fully connected network based on an encoder-decoder structure in which the features of the hyperspectral and lidar images are extracted and fused separately in order to reconstruct the feature information and learn the desired deeper embedding space. In addition, "More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification", published by the same authors in IEEE Transactions on Geoscience and Remote Sensing in the same year, proposes a deep learning framework for multi-modal data that performs secondary learning of the complementary information between the multi-modal images through parameter cross-selection during network training. It can be seen that deep learning has been widely applied to the fusion classification of hyperspectral and lidar images in the remote sensing field and has achieved good results.
The prior art has the following defects:
the existing fusion classification methods for hyperspectral and laser radar images in the remote sensing field have certain shortcomings: (1) the existing methods do not consider the correlation between the illumination information of the hyperspectral image and the elevation information of the laser radar image, so deep fusion of the illumination information and the elevation information is difficult to achieve and the performance of the classification model is weakened; (2) the existing methods do not apply the illumination information of the hyperspectral image to the construction of the fusion classification model and do not consider decomposing the hyperspectral image into an intrinsic image and an illumination image so as to give full play to the advantages of both; some methods attempt to introduce intrinsic decomposition theory into the classification model but directly discard the illumination image obtained by the decomposition and use only the intrinsic image and the laser radar image for fusion classification, so the advantages of the multi-modal remote sensing images cannot be exploited; (3) the existing methods give little consideration to the joint and collaborative capability between the hyperspectral image and the laser radar image when extracting discriminative information from them and only use completely separated branches for information mining and feature extraction, which is not conducive to fully grasping the complete information of the pixels and makes it difficult to bring out the advantages of multi-modal remote sensing images in pixel classification and recognition.
Disclosure of Invention
Therefore, the technical problem to be solved by the application is to overcome the defects of the prior art that the features of the hyperspectral image and the laser radar image cannot be fully mined and that the joint and collaborative capability between the hyperspectral image and the laser radar image is rarely considered.
In order to solve the technical problems, the application provides a hyperspectral image and laser radar image fusion classification method, which comprises the following steps:
acquiring a hyperspectral image H and a laser radar image L;
carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the characteristics of neighborhood blocks in a hyperspectral image H and a laser radar image L are extracted through the decomposer, the characteristics of each pixel are represented by the characteristics of the corresponding neighborhood block, and fusion of the hyperspectral image H and the laser radar image L is realized through the fusion device based on the extracted characteristics of the neighborhood block;
and classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused.
In one embodiment of the present application, the method for extracting the features of the neighborhood blocks in the hyperspectral image H and the lidar image L by the decomposer includes:
randomly selecting two neighborhood blocks h1 and h2 of hyperspectral pixels from the hyperspectral image H, and selecting from the laser radar image L the neighborhood blocks l1 and l2 of the laser radar pixels located at the same positions as h1 and h2;
using two multi-head self-attention layers M1 and M2 to extract features h11 and h22 from the neighborhood blocks h1 and h2 of the hyperspectral pixels;
using two multi-head self-attention layers M3 and M4 to respectively extract the intrinsic reflectance features from the feature h11, and using two multi-head self-attention layers M5 and M6 to respectively extract the intrinsic shading features from the feature h22;
using two two-dimensional convolution layers C1 and C2, each with B convolution kernels, to raise the number of channels of the neighborhood blocks l1 and l2 of the laser radar pixels to B, obtaining two outputs;
activating the two outputs with a Tanh activation function and passing the two activated outputs through a batch normalization layer to obtain two initial laser radar features;
for each initial laser radar feature, performing feature extraction with two multi-head self-attention layers to finally obtain the laser radar features;
every multi-head self-attention layer is followed by a batch normalization layer, the number of heads in each multi-head self-attention layer must divide the channel number B of the hyperspectral image, and the number of output nodes is B.
In one embodiment of the present application, the method for extracting the features of the neighborhood blocks in the hyperspectral image H and the laser radar image L by the decomposer further includes:
performing exponential transformation on all values in the intrinsic reflectance features, the intrinsic shading features and the laser radar features obtained above to obtain r1, r2, s1, s2, ll1 and ll2, and processing r1, r2, s1, s2, ll1 and ll2 in turn with a Tanh activation function and a batch normalization layer;
performing exponential transformation on the neighborhood blocks l1 and l2 of the laser radar pixels and adding the transformed results to ll1 and ll2 respectively to obtain the laser radar features lc1 and lc2.
In one embodiment of the application, the fusion device comprises a cross-domain geographic information network stream and a reflection information stream, wherein the cross-domain geographic information network stream comprises a first cross-domain geographic information module and a second cross-domain geographic information module, and the reflection information stream comprises a third cross-domain geographic information module and a fourth cross-domain geographic information module.
In one embodiment of the present application, the method for constructing the cross-domain geographic information network stream includes:
according to s1 and lc1, constructing a first cross-domain geographic information module, wherein the first cross-domain geographic information module comprises three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch;
the input of the distribution conversion branch is s1 and lc1 and its output is slc1^out1; the distribution conversion branch comprises a distribution conversion layer, a Tanh activation function and a batch normalization layer, wherein the formula of the distribution conversion layer is:
slc1^out1 = (ex1 + ex2) / 2
wherein μ(·) and σ(·) denote the mean and the standard deviation respectively, and ⊙ denotes the Hadamard (element-wise) product;
using a concatenation layer to splice s1 and lc1 along the channel dimension to obtain slc1;
the input of the multi-modal grouped convolution branch is slc1 and its output is slc1^out2; the multi-modal grouped convolution branch comprises a multi-modal grouped convolution layer, a Tanh activation function and a batch normalization layer;
in the formula of the multi-modal grouped convolution layer, the network-layer trainable parameters include w, ⊗ denotes the Kronecker product, · denotes dot-product calculation, roundup(x) denotes rounding up, d is the number of convolution kernels in the multi-modal grouped convolution layer, Input is the input of the multi-modal grouped convolution layer, and Output is the output of the multi-modal grouped convolution layer;
the weight branch comprises a two-dimensional convolution layer and a Sigmoid activation function; the input of the weight branch is slc1 and its output is ξ, wherein the number of convolution kernels of the two-dimensional convolution layer is B;
the output of the first cross-domain geographic information module is denoted CMG1;
according to the features s2 and lc2, constructing a second cross-domain geographic information module with the same structure as the first cross-domain geographic information module, wherein the input of the second cross-domain geographic information module is s2 and lc2 and its output is CMG2.
In one embodiment of the present application, the method for constructing the reflection information stream includes:
constructing a reflection information stream, wherein the structures of the third cross-domain geographic information module and the fourth cross-domain geographic information module in the reflection information stream are the same as those of the first cross-domain geographic information module and the second cross-domain geographic information module in the cross-domain geographic information network stream, and the third and fourth cross-domain geographic information modules each comprise three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch;
wherein the input of the third cross-domain geographic information module is r1 and CMG1 and its output is GR1; the input of the fourth cross-domain geographic information module is r2 and CMG2 and its output is GR2.
In one embodiment of the present application, the method for classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused includes:
splicing CMG1 and GR1 along the channel dimension with a concatenation layer to obtain F1; splicing CMG2 and GR2 along the channel dimension with a concatenation layer to obtain F2; inputting F1 and F2 into two different fully connected layers to obtain the final outputs;
wherein the two final outputs are the categories, identified by the depth network model, of the neighborhood block h1 of the hyperspectral pixel together with the neighborhood block l1 of the laser radar pixel at the same position, and of the neighborhood block h2 of the hyperspectral pixel together with the neighborhood block l2 of the laser radar pixel at the same position; the number of nodes of the two different fully connected layers is c, and the activation function is the Softmax function.
In one embodiment of the present application, the loss function of the decomposer in the depth network model during training is:
Γ_D = Γ1 + Γ2 + Γ3 + Γ4
wherein GAP(·) denotes the global average pooling layer, Sigmoid(·) denotes the Sigmoid activation function, Euclidean(·) denotes the Euclidean distance, |·| denotes the absolute value, ‖·‖ denotes the norm, and Σ(·) denotes summing all the contents inside the parentheses.
In one embodiment of the present application, the loss function of the fusion device in the depth network model during training is:
Γ_F = Γ5 + Γ6
in order to solve the technical problems, the application provides a hyperspectral image and laser radar image fusion classification system, which comprises:
an acquisition module: used for acquiring a hyperspectral image H and a laser radar image L;
a transformation module: used for carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
a neighborhood block construction module: used for constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
a model construction module: used for constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the decomposer is used for extracting the features of the neighborhood blocks in the hyperspectral image H and the laser radar image L, the feature of each pixel is represented by the feature of its corresponding neighborhood block, and the fusion device is used for realizing the fusion of the hyperspectral image H and the laser radar image L based on the extracted neighborhood-block features;
a classification module: used for classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused.
Compared with the prior art, the technical scheme of the application has the following advantages:
according to the application, by establishing multiple cross-domain constraints, the depth network model effectively decomposes the hyperspectral image into an intrinsic reflectance image and an intrinsic shading image and mines the intrinsic correlation between the intrinsic shading image and the laser radar image, so that the hyperspectral image and the laser radar image are fully fused at the intrinsic level and high-precision classification of the ground-object pixels is achieved;
according to the application, the hyperspectral image and the laser radar image are used to constrain the intrinsic decomposition process, the constraints imposed by the specific characteristics of the two remote sensing modalities on the intrinsic reflectance image and the intrinsic shading image are mined, and cross-domain constraints are introduced on top of the standard decomposition constraints to improve the accuracy of the decomposition result;
in the cross-domain geographic information fusion module for the intrinsic shading image and the laser radar image, the application fully fuses the intrinsic shading image and the laser radar image in a three-dimensional space from the two perspectives of distribution and sample, thereby generating cross-domain geographic information features;
according to the application, the cross-domain geographic information features are used to assist the feature extraction of the intrinsic reflectance image, and the feature extraction process of the intrinsic reflectance image is guided from the two perspectives of distribution and sample, so that the feature extraction precision is improved;
according to the application, the decomposer and the fusion device are constrained by different loss functions and are optimized simultaneously with an alternating iterative optimization strategy, so that the decomposer can produce high-quality intrinsic reflectance and intrinsic shading images and the fusion device can fully fuse the hyperspectral image and the laser radar image, further improving the classification precision of the ground objects.
Drawings
In order that the application may be more readily understood, a more particular description of the application will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings.
Fig. 1 is a flow chart of the method of the present application.
Detailed Description
The present application will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the application and practice it.
Example 1
Referring to fig. 1, the application relates to a hyperspectral image and laser radar image fusion classification method, which comprises the following steps:
acquiring a hyperspectral image H and a laser radar image L;
carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the characteristics of neighborhood blocks in a hyperspectral image H and a laser radar image L are extracted through the decomposer, the characteristics of each pixel are represented by the characteristics of the corresponding neighborhood block, and fusion of the hyperspectral image H and the laser radar image L is realized through the fusion device based on the extracted characteristics of the neighborhood block;
the classification of the pixel after the fusion of the hyperspectral image H and the lidar image L is performed (for example, whether the pixel belongs to mountain or grassland, etc.).
The present embodiment is described in detail below:
s1, selecting a hyperspectral image H and a laser radar image L according to practical problems, wherein the hyperspectral image is X multiplied by Y multiplied by B, X and Y are the space sizes of hyperspectral images in each wave band, B is the number of channels of the hyperspectral images, the laser radar image is X multiplied by Y, the number of channels is 1, X and Y are the space sizes of the laser radar images, and the space sizes of the two images are the same. Carrying out normalization pretreatment on the hyperspectral image and the laser radar image, setting a neighborhood size s (s is an odd number larger than 0), wherein the convolution kernel size of each two-dimensional convolution layer is [3,3], the convolution kernel sliding step length is [1,1], the filling parameter (packing) of each two-dimensional convolution layer is 'Same (Same)', the category of each ground object in the two images is label, and the category number is c.
S2, carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L.
S3, for each log-transformed hyperspectral pixel and laser radar pixel (there are X × Y pixels in each of the two images), a neighborhood of size s × s around the pixel is selected as the neighborhood block of that pixel, wherein the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s × s × B, and the neighborhood block of each laser radar pixel in the laser radar image has size s × s.
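As a concrete illustration of steps S2 and S3, the sketch below shows one way to log-transform the images and cut out the s × s neighborhood blocks. It is only a minimal sketch: the small epsilon added before the logarithm and the reflective padding at the image borders are assumptions, since the patent does not state how zero values or border pixels are handled.

```python
import numpy as np

def log_transform(img, eps=1e-6):
    # Log-transform all values; eps guards against log(0) (an assumption, not stated in the patent).
    return np.log(img + eps)

def extract_neighborhood_blocks(img, s):
    """Return an (X*Y, s, s, C) array holding one s x s neighborhood block per pixel."""
    if img.ndim == 2:                       # laser radar image X x Y -> X x Y x 1
        img = img[..., None]
    r = s // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="reflect")
    X, Y, C = img.shape
    blocks = np.empty((X * Y, s, s, C), dtype=img.dtype)
    for i in range(X):
        for j in range(Y):
            blocks[i * Y + j] = padded[i:i + s, j:j + s, :]
    return blocks

# Usage with the Trento-sized images of the embodiment (s = 11):
# H_blocks = extract_neighborhood_blocks(log_transform(H_norm), 11)   # (99600, 11, 11, 63)
# L_blocks = extract_neighborhood_blocks(log_transform(L_norm), 11)   # (99600, 11, 11, 1)
```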
S4, constructing the depth network model; steps S4 to S6 form the decomposer. Two neighborhood blocks h1 and h2 of hyperspectral pixels are selected at random, and then the neighborhood blocks l1 and l2 of the laser radar pixels located at the same positions as the hyperspectral pixels are selected; the categories of the center pixels of the two neighborhood blocks are label1 and label2 respectively. First, two multi-head self-attention layers M1 and M2 with identical parameters are used to extract the features h11 and h22 from the neighborhood blocks h1 and h2 of the hyperspectral pixels. The number of heads in each multi-head self-attention layer must divide B (that is, B divided by the number of heads is an integer), the number of output nodes is B, and the parameters of all subsequent multi-head self-attention layers are the same as described above. Then, two multi-head self-attention layers M3 and M4 are used to respectively extract the intrinsic reflectance features from h11, and two multi-head self-attention layers M5 and M6 are used to respectively extract the intrinsic shading features from h22. The English names of these features are Intrinsic Reflectance Feature and Intrinsic Shading Feature respectively; the main purpose is to separate the material information and the illumination information of the objects in the image and to reduce the influence of illumination on classification, wherein the intrinsic reflectance feature represents the discriminative information provided by the material of the object itself under uniform illumination, and the intrinsic shading feature represents the illumination intensity information in the image under the specific illumination environment and is unrelated to the material of the object itself. Every multi-head self-attention layer mentioned above and below is followed by a batch normalization layer (Batch Normalization Layer).
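A minimal PyTorch sketch of the repeating unit used in step S4: one multi-head self-attention layer followed by a batch normalization layer. Flattening each s × s × B block into s·s tokens of dimension B, and the choice of 9 heads for B = 63, are assumptions; the patent only requires that the head count divide B and that every attention layer be followed by batch normalization.

```python
import torch
import torch.nn as nn

class AttnBlock(nn.Module):
    """One multi-head self-attention layer followed by a batch normalization layer,
    the building block reused for M1-M6 and for the laser radar branch."""
    def __init__(self, channels=63, heads=9):
        super().__init__()
        assert channels % heads == 0, "the head count must divide the channel number B"
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.bn = nn.BatchNorm1d(channels)

    def forward(self, x):                      # x: (batch, s*s, B)
        out, _ = self.attn(x, x, x)
        return self.bn(out.transpose(1, 2)).transpose(1, 2)

# Wiring following the step above (m1..m6 are six AttnBlock instances):
# h11, h22 = m1(h1_tokens), m2(h2_tokens)
# refl_a, refl_b = m3(h11), m4(h11)    # intrinsic reflectance features extracted from h11
# shad_a, shad_b = m5(h22), m6(h22)    # intrinsic shading features extracted from h22
```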
S5, two two-dimensional convolution layers C1 and C2 are used to raise the number of channels of the neighborhood blocks l1 and l2 of the laser radar pixels to B, obtaining two outputs; the number of convolution kernels of the two-dimensional convolution layers is B. The two outputs are then activated with a Tanh activation function and followed by a batch normalization layer, yielding two initial laser radar features. Then, for each initial laser radar feature, two multi-head self-attention layers are used for feature extraction, finally obtaining the laser radar features.
S6, exponential transformation is performed on all values in the features obtained above to obtain r1, r2, s1, s2, ll1 and ll2, and the exponentially transformed features are processed in turn with a Tanh activation function and a batch normalization layer. Exponential transformation is also performed on the neighborhood blocks l1 and l2 of the laser radar pixels, and the transformed results are added to ll1 and ll2 respectively to obtain the laser radar features lc1 and lc2.
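A hedged sketch of step S6, assuming the features live in the (batch, s·s, B) token form used in the previous sketch; sharing one batch normalization layer across features and broadcasting the single-channel laser radar block against the B-channel feature when forming lc1 and lc2 are assumptions, since the patent does not spell out the shapes at this point.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(63)                        # one shared batch normalization layer (assumption)

def exp_tanh_bn(feat):
    """Exponentiate a log-domain feature, then apply Tanh and batch normalization."""
    x = torch.tanh(torch.exp(feat))            # feat: (batch, s*s, B)
    return bn(x.transpose(1, 2)).transpose(1, 2)

# r1, r2, s1, s2, ll1, ll2 = (exp_tanh_bn(f) for f in decomposer_features)
# lc1 = torch.exp(l1_tokens) + ll1             # l1_tokens: (batch, s*s, 1), broadcast over channels
# lc2 = torch.exp(l2_tokens) + ll2
```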
S7, constructing the depth network model; steps S7 to S8 form the fusion device. The fusion device comprises a cross-domain geographic information network stream and a reflection information stream. The function of the cross-domain geographic information network stream is to fully fuse the intrinsic shading features with the laser radar features lc1 and lc2; the function of the reflection information stream is to fully fuse the intrinsic reflectance features with the outputs of the cross-domain geographic information network stream. The cross-domain geographic information network stream comprises a first cross-domain geographic information module and a second cross-domain geographic information module, and the reflection information stream comprises a third cross-domain geographic information module and a fourth cross-domain geographic information module.
For the cross-domain geographic information network stream: first, a first cross-domain geographic information module is constructed for the features s1 and lc1. The first cross-domain geographic information module consists of three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch. The input of the distribution conversion branch is s1 and lc1 and its output is slc1^out1; it comprises a distribution conversion layer, a Tanh activation function and a batch normalization layer, and the formula of the distribution conversion layer is:
slc1^out1 = (ex1 + ex2) / 2
wherein μ(·) and σ(·) denote the mean and the standard deviation respectively, and ⊙ denotes the Hadamard product (Hadamard Product), i.e. the element-wise product. A concatenation layer (Concatenation Layer) is used to splice s1 and lc1 along the channel dimension to obtain slc1. The input of the multi-modal grouped convolution branch is slc1 and its output is slc1^out2; it comprises a multi-modal grouped convolution layer, a Tanh activation function and a batch normalization layer. The weight branch consists of a two-dimensional convolution layer and a Sigmoid activation function; its input is slc1 and its output is ξ, where the number of convolution kernels of the two-dimensional convolution layer is B. The output of the first cross-domain geographic information module is CMG1.
For the features s2 and lc2, a second cross-domain geographic information module with the same structure is constructed; its input is s2 and lc2 and its final output is CMG2.
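The published text gives the distribution conversion output only as (ex1 + ex2)/2 and does not reproduce the expressions for ex1, ex2 or for the final combination into CMG1, so the sketch below is a placeholder reconstruction: it transfers channel statistics between the two inputs using μ(·) and σ(·) (an AdaIN-style guess), approximates the multi-modal grouped convolution with an ordinary grouped Conv2d, and mixes the two branch outputs with the weight ξ. Every formula in it should be read as an assumption about, not a statement of, the patented module.

```python
import torch
import torch.nn as nn

class CrossDomainGeoModule(nn.Module):
    """Placeholder sketch of one cross-domain geographic information module:
    a distribution conversion branch, a grouped-convolution branch and a weight branch."""
    def __init__(self, channels=63, groups=7):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1, groups=groups),
            nn.Tanh(), nn.BatchNorm2d(channels))
        self.weight_branch = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, s_feat, lc_feat):        # both: (batch, B, s, s) feature maps (assumption)
        mu_s, sig_s = s_feat.mean((2, 3), keepdim=True), s_feat.std((2, 3), keepdim=True)
        mu_l, sig_l = lc_feat.mean((2, 3), keepdim=True), lc_feat.std((2, 3), keepdim=True)
        ex1 = (s_feat - mu_s) / (sig_s + 1e-6) * sig_l + mu_l   # placeholder for ex1
        ex2 = (lc_feat - mu_l) / (sig_l + 1e-6) * sig_s + mu_s  # placeholder for ex2
        out1 = (ex1 + ex2) / 2                                   # distribution conversion output
        slc = torch.cat([s_feat, lc_feat], dim=1)                # splice along the channel dimension
        out2 = self.conv_branch(slc)                             # grouped-convolution branch output
        xi = self.weight_branch(slc)                             # weight branch output
        return xi * out1 + (1 - xi) * out2                       # placeholder combination into CMG

# cmg1 = CrossDomainGeoModule()(s1_map, lc1_map)   # s1_map, lc1_map: (batch, 63, 11, 11)
```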
S8, constructing the reflection information stream. The structure of the reflection information stream is identical to that of the cross-domain geographic information network stream: it also comprises a third cross-domain geographic information module and a fourth cross-domain geographic information module, and each of these modules likewise comprises three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch. The input of the third cross-domain geographic information module is r1 and CMG1 and its output is GR1; the input of the fourth cross-domain geographic information module is r2 and CMG2 and its output is GR2.
S9, a concatenation layer is used to splice CMG1 and GR1 along the channel dimension to obtain F1, and a concatenation layer is used to splice CMG2 and GR2 along the channel dimension to obtain F2. F1 and F2 are input into two different fully connected layers (Fully Connected Layer), whose number of nodes is c and whose activation function is the Softmax function, to obtain the final outputs, which are the categories, identified by the depth network model, of the neighborhood block h1 of the hyperspectral pixel together with the neighborhood block l1 of the laser radar pixel at the same position, and of the neighborhood block h2 of the hyperspectral pixel together with the neighborhood block l2 of the laser radar pixel at the same position.
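A minimal sketch of the classification head in step S9, assuming the fused features are 2D maps that are flattened before the fully connected layer; the patent only fixes the concatenation along the channel dimension, the node count c and the Softmax activation.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, in_features, num_classes):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)   # node count = c

    def forward(self, cmg, gr):                          # e.g. (batch, 63, 11, 11) each
        f = torch.cat([cmg, gr], dim=1)                  # splice along the channel dimension
        return torch.softmax(self.fc(f.flatten(1)), dim=1)

# Two different fully connected layers, one per sample of the pair:
# head1 = FusionClassifier(2 * 63 * 11 * 11, c)
# head2 = FusionClassifier(2 * 63 * 11 * 11, c)
# pred1, pred2 = head1(CMG1, GR1), head2(CMG2, GR2)
```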
S10, the loss function of the decomposer in the depth network model during training is constructed according to the following formula:
Γ_D = Γ1 + Γ2 + Γ3 + Γ4
wherein GAP(·) denotes the global average pooling layer (Global Average Pooling Layer), Sigmoid(·) denotes the Sigmoid activation function, Euclidean(·) denotes the Euclidean distance, |·| denotes the absolute value, ‖·‖ denotes the norm, and Σ(·) denotes summing all the contents inside the parentheses.
S11, the loss function of the fusion device in the depth network model during training is constructed according to the following formula:
Γ_F = Γ5 + Γ6
S12, with the above loss functions and a learning rate of 10^-3, an alternating iterative optimization strategy is adopted to optimize the network parameters in the decomposer and the fusion device: the decomposer is first updated with Γ_D as the loss function for 10 iterations, and the fusion device is then updated with Γ_F as the loss function for 10 iterations; after the model has stabilized, all pixels are input into the depth network model for testing to obtain the final classification result.
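A sketch of the alternating optimization in step S12. The decomposer and fuser modules, their loss functions Γ_D and Γ_F, the batch sampler and the number of outer rounds are placeholders; the patent fixes only the learning rate of 10^-3 and the 10-iteration alternation between the two losses, and the choice of Adam as the optimizer is an assumption.

```python
import torch

# decomposer, fuser: the two sub-networks of the depth model (placeholders)
# loss_D(batch), loss_F(batch): implementations of Γ_D = Γ1+Γ2+Γ3+Γ4 and Γ_F = Γ5+Γ6 (placeholders)
opt_D = torch.optim.Adam(decomposer.parameters(), lr=1e-3)
opt_F = torch.optim.Adam(fuser.parameters(), lr=1e-3)

for _ in range(num_rounds):                    # repeat the alternation until the model is stable
    for _ in range(10):                        # decomposer phase: 10 iterations with Γ_D
        loss = loss_D(next_batch())
        opt_D.zero_grad(); loss.backward(); opt_D.step()
    for _ in range(10):                        # fuser phase: 10 iterations with Γ_F
        loss = loss_F(next_batch())
        opt_F.zero_grad(); loss.backward(); opt_F.step()
```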
Some of the above steps are described in detail below:
In step S1, the hyperspectral image and the laser radar image must be normalized so that their values range from -1 to 1; the normalization formula is:
x' = 2(x - x_min) / (x_max - x_min) - 1
where x_min denotes the minimum value in the pixel data and x_max the maximum value. The Tanh activation function is computed as:
Tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
In step S7, in the formula of the multi-modal grouped convolution layer, the network-layer trainable parameters include w, ⊗ denotes the Kronecker product (Kronecker Product), · denotes the dot product (Dot Product), roundup(x) denotes rounding up, d is the number of convolution kernels in the multi-modal grouped convolution layer, Input is the input of the multi-modal grouped convolution layer, and Output is the output of the multi-modal grouped convolution layer.
The hyperspectral image and the lidar image used in this example were taken in Trento (Italy), where the hyperspectral image was 166×600×63 in size and the lidar image was 166×600 in size.
(I) Input of the present embodiment
The input hyperspectral image is an image of size 166×600×63, and the input lidar image is an image of size 166×600.
Parameter setting
A neighborhood size of 11 is selected, so that neighborhood blocks of size 11 × 11 × 63 and 11 × 11 are obtained for each pixel and input into the depth network model for training.
(II) Training the depth network model
1% of the sample neighborhood blocks are randomly selected from the total of 99600 groups of sample neighborhood blocks for training the depth network model; the sample neighborhood blocks are randomly shuffled and packed into mini-batches of 512 sample neighborhood blocks each, and only one mini-batch is used per training step. After training, all 99600 sample neighborhood blocks are input into the depth network model for testing, the classification results of all samples are obtained, and the classification results are evaluated with the overall classification accuracy and the average classification accuracy. The overall classification accuracy is the number of correctly classified samples divided by the total number of samples. The average classification accuracy first computes, for each class, the ratio of the number of correctly classified samples of that class to the number of samples of that class, and then takes the mean of these per-class ratios.
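The two evaluation metrics can be computed as below; this is a straightforward restatement of the definitions in the paragraph above, not code taken from the patent.

```python
import numpy as np

def overall_and_average_accuracy(y_true, y_pred):
    """Overall accuracy: correctly classified samples / all samples.
    Average accuracy: mean of the per-class accuracies."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    oa = float(np.mean(y_true == y_pred))
    per_class = [float(np.mean(y_pred[y_true == c] == c)) for c in np.unique(y_true)]
    return oa, float(np.mean(per_class))
```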
(III) Results of the present embodiment
The classification results obtained by the hyperspectral image and laser radar image fusion classification method of the application and by the currently commonly used ENDnet network model are shown in Table 1.

Table 1. Classification comparison results

  Method                          Overall classification accuracy    Average classification accuracy
  The method of the application   99.12%                             97.76%
  Commonly used ENDnet            89.23%                             86.63%
It is easy to see that the method of the application fuses and classifies the hyperspectral image and the laser radar image better and produces fewer misclassified samples. In addition, when one of the inputs of the cross-domain geographic information module is transposed in the spatial domain and the experiment is repeated, the overall classification accuracy is 95.24%, showing that the method has strong model robustness. In conclusion, the method can effectively improve the classifiability and the classification accuracy of multi-source remote sensing images.
Example two
The embodiment provides a hyperspectral image and laser radar image fusion classification system, which comprises:
an acquisition module: used for acquiring a hyperspectral image H and a laser radar image L;
a transformation module: used for carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
a neighborhood block construction module: used for constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
a model construction module: used for constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the decomposer is used for extracting the features of the neighborhood blocks in the hyperspectral image H and the laser radar image L, the feature of each pixel is represented by the feature of its corresponding neighborhood block, and the fusion device is used for realizing the fusion of the hyperspectral image H and the laser radar image L based on the extracted neighborhood-block features;
a classification module: used for classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present application will be apparent to those of ordinary skill in the art in light of the foregoing description. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present application.

Claims (10)

1. A hyperspectral image and laser radar image fusion classification method is characterized in that: comprising the following steps:
acquiring a hyperspectral image H and a laser radar image L;
carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the characteristics of neighborhood blocks in a hyperspectral image H and a laser radar image L are extracted through the decomposer, the characteristics of each pixel are represented by the characteristics of the corresponding neighborhood block, and fusion of the hyperspectral image H and the laser radar image L is realized through the fusion device based on the extracted characteristics of the neighborhood block;
and classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused.
2. The hyperspectral image and lidar image fusion classification method of claim 1, wherein the hyperspectral image and lidar image fusion classification method is characterized by: the method for extracting the characteristics of the neighborhood blocks in the hyperspectral image H and the laser radar image L through the decomposer comprises the following steps:
randomly selecting two neighborhood blocks h1 and h2 of hyperspectral pixels from the hyperspectral image H, and selecting from the laser radar image L the neighborhood blocks l1 and l2 of the laser radar pixels located at the same positions as h1 and h2;
using two multi-head self-attention layers M1 and M2 to extract features h11 and h22 from the neighborhood blocks h1 and h2 of the hyperspectral pixels;
using two multi-head self-attention layers M3 and M4 to respectively extract the intrinsic reflectance features from the feature h11, and using two multi-head self-attention layers M5 and M6 to respectively extract the intrinsic shading features from the feature h22;
using two two-dimensional convolution layers C1 and C2, each with B convolution kernels, to raise the number of channels of the neighborhood blocks l1 and l2 of the laser radar pixels to B, obtaining two outputs;
activating the two outputs with a Tanh activation function and passing the two activated outputs through a batch normalization layer to obtain two initial laser radar features;
for each initial laser radar feature, performing feature extraction with two multi-head self-attention layers to finally obtain the laser radar features;
every multi-head self-attention layer is followed by a batch normalization layer, the number of heads in each multi-head self-attention layer must divide the channel number B of the hyperspectral image, and the number of output nodes is B.
3. The hyperspectral image and lidar image fusion classification method of claim 2, wherein the hyperspectral image and lidar image fusion classification method is characterized by: the method for extracting the characteristics of the neighborhood blocks in the hyperspectral image H and the laser radar image L through the decomposer further comprises the following steps:
performing exponential transformation on all values in the intrinsic reflectance features, the intrinsic shading features and the laser radar features obtained above to obtain r1, r2, s1, s2, ll1 and ll2, and processing r1, r2, s1, s2, ll1 and ll2 in turn with a Tanh activation function and a batch normalization layer;
performing exponential transformation on the neighborhood blocks l1 and l2 of the laser radar pixels and adding the transformed results to ll1 and ll2 respectively to obtain the laser radar features lc1 and lc2.
4. A hyperspectral image and lidar image fusion classification method as claimed in claim 3 wherein: the fusion device comprises a cross-domain geographic information network stream and a reflection information stream, wherein the cross-domain geographic information network stream comprises a first cross-domain geographic information module and a second cross-domain geographic information module, and the reflection information stream comprises a third cross-domain geographic information module and a fourth cross-domain geographic information module.
5. The method for fusion classification of hyperspectral image and lidar image according to claim 4, wherein the method comprises the steps of: the construction method of the cross-domain geographic information network flow comprises the following steps:
according to s1 and lc1, constructing a first cross-domain geographic information module, wherein the first cross-domain geographic information module comprises three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch;
the input of the distribution conversion branch is s1 and lc1 and its output is slc1^out1; the distribution conversion branch comprises a distribution conversion layer, a Tanh activation function and a batch normalization layer, wherein the formula of the distribution conversion layer is:
slc1^out1 = (ex1 + ex2) / 2
wherein μ(·) and σ(·) denote the mean and the standard deviation respectively, and ⊙ denotes the Hadamard product;
using a concatenation layer to splice s1 and lc1 along the channel dimension to obtain slc1;
the input of the multi-modal grouped convolution branch is slc1 and its output is slc1^out2; the multi-modal grouped convolution branch comprises a multi-modal grouped convolution layer, a Tanh activation function and a batch normalization layer;
in the formula of the multi-modal grouped convolution layer, the network-layer trainable parameters include w, ⊗ denotes the Kronecker product, · denotes dot-product calculation, roundup(x) denotes rounding up, d is the number of convolution kernels in the multi-modal grouped convolution layer, Input is the input of the multi-modal grouped convolution layer, and Output is the output of the multi-modal grouped convolution layer;
the weight branch comprises a two-dimensional convolution layer and a Sigmoid activation function; the input of the weight branch is slc1 and its output is ξ, wherein the number of convolution kernels of the two-dimensional convolution layer is B;
the output of the first cross-domain geographic information module is denoted CMG1;
according to the features s2 and lc2, constructing a second cross-domain geographic information module with the same structure as the first cross-domain geographic information module, wherein the input of the second cross-domain geographic information module is s2 and lc2 and its output is CMG2.
6. The method for fusion classification of hyperspectral image and lidar image according to claim 5, wherein the method comprises the steps of: the construction method of the reflection information flow comprises the following steps:
constructing a reflection information stream, wherein the structures of the third cross-domain geographic information module and the fourth cross-domain geographic information module in the reflection information stream are the same as those of the first cross-domain geographic information module and the second cross-domain geographic information module in the cross-domain geographic information network stream, and the third and fourth cross-domain geographic information modules each comprise three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch;
wherein the input of the third cross-domain geographic information module is r1 and CMG1 and its output is GR1; the input of the fourth cross-domain geographic information module is r2 and CMG2 and its output is GR2.
7. The hyperspectral image and lidar image fusion classification method of claim 6, wherein the hyperspectral image and lidar image fusion classification method is characterized by: the method for classifying the pixels after the hyperspectral image H and the laser radar image L are fused comprises the following steps:
splicing CMG1 and GR1 along the channel dimension with a concatenation layer to obtain F1; splicing CMG2 and GR2 along the channel dimension with a concatenation layer to obtain F2; inputting F1 and F2 into two different fully connected layers to obtain the final outputs;
wherein the two final outputs are the categories, identified by the depth network model, of the neighborhood block h1 of the hyperspectral pixel together with the neighborhood block l1 of the laser radar pixel at the same position, and of the neighborhood block h2 of the hyperspectral pixel together with the neighborhood block l2 of the laser radar pixel at the same position; the number of nodes of the two different fully connected layers is c, and the activation function is the Softmax function.
8. The hyperspectral image and lidar image fusion classification method of claim 6, wherein the hyperspectral image and lidar image fusion classification method is characterized by: the loss function of the decomposer in the depth network model during training is:
Γ_D = Γ1 + Γ2 + Γ3 + Γ4
wherein GAP(·) denotes the global average pooling layer, Sigmoid(·) denotes the Sigmoid activation function, Euclidean(·) denotes the Euclidean distance, |·| denotes the absolute value, ‖·‖ denotes the norm, and Σ(·) denotes summing all the contents inside the parentheses.
9. The hyperspectral image and lidar image fusion classification method of claim 7, wherein the hyperspectral image and lidar image fusion classification method is characterized by: the loss function of the fusion device in the depth network model during training is:
Γ_F = Γ5 + Γ6
10. a hyperspectral image and laser radar image fusion classification system is characterized in that: comprising the following steps:
an acquisition module: used for acquiring a hyperspectral image H and a laser radar image L;
a transformation module: used for carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
a neighborhood block construction module: used for constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
a model construction module: used for constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the decomposer is used for extracting the features of the neighborhood blocks in the hyperspectral image H and the laser radar image L, the feature of each pixel is represented by the feature of its corresponding neighborhood block, and the fusion device is used for realizing the fusion of the hyperspectral image H and the laser radar image L based on the extracted neighborhood-block features;
a classification module: used for classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused.
CN202310765131.8A 2023-06-27 2023-06-27 Hyperspectral image and laser radar image fusion classification method and system Pending CN116740457A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310765131.8A CN116740457A (en) 2023-06-27 2023-06-27 Hyperspectral image and laser radar image fusion classification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310765131.8A CN116740457A (en) 2023-06-27 2023-06-27 Hyperspectral image and laser radar image fusion classification method and system

Publications (1)

Publication Number Publication Date
CN116740457A 2023-09-12

Family

ID=87918350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310765131.8A Pending CN116740457A (en) 2023-06-27 2023-06-27 Hyperspectral image and laser radar image fusion classification method and system

Country Status (1)

Country Link
CN (1) CN116740457A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015207235A (en) * 2014-04-23 2015-11-19 日本電気株式会社 Data fusion device, land coverage classification system, method and program
WO2022109945A1 (en) * 2020-11-26 2022-06-02 深圳大学 Hyperspectral and lidar joint classification method based on scale adaptive filtering
CN114694039A (en) * 2022-04-15 2022-07-01 湖南大学 Remote sensing hyperspectral and laser radar image fusion classification method and device
CN115331110A (en) * 2022-08-26 2022-11-11 苏州大学 Fusion classification method and device for remote sensing hyperspectral image and laser radar image
CN116167955A (en) * 2023-02-24 2023-05-26 苏州大学 Hyperspectral and laser radar image fusion method and system for remote sensing field

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Danfeng Hong et al.: "Deep Encoder-Decoder Networks for Classification of Hyperspectral and LiDAR Data", IEEE Geoscience and Remote Sensing Letters, 28 August 2020 (2020-08-28) *
Yang Sirui; Xue Zhaohui; Zhang Ling; Su Hongjun; Zhou Shaoguang: "Fusion of hyperspectral and LiDAR data: a case study of fine crop classification in the Zhangye oasis agricultural area of the middle Heihe River" (in Chinese), Remote Sensing for Land and Resources, no. 04, 8 December 2018 (2018-12-08) *

Similar Documents

Publication Publication Date Title
CN105320965B (en) Sky based on depth convolutional neural networks composes united hyperspectral image classification method
CN111368896B (en) Hyperspectral remote sensing image classification method based on dense residual three-dimensional convolutional neural network
CN109584337B (en) Image generation method for generating countermeasure network based on condition capsule
CN109993220B (en) Multi-source remote sensing image classification method based on double-path attention fusion neural network
CN110555458B (en) Multi-band image feature level fusion method for generating countermeasure network based on attention mechanism
CN112836773B (en) Hyperspectral image classification method based on global attention residual error network
CN108154194B (en) Method for extracting high-dimensional features by using tensor-based convolutional network
CN111274869B (en) Method for classifying hyperspectral images based on parallel attention mechanism residual error network
CN111126256B (en) Hyperspectral image classification method based on self-adaptive space-spectrum multi-scale network
WO2024040828A1 (en) Method and device for fusion and classification of remote sensing hyperspectral image and laser radar image
CN112446476A (en) Neural network model compression method, device, storage medium and chip
CN107145836B (en) Hyperspectral image classification method based on stacked boundary identification self-encoder
CN113486851B (en) Hyperspectral image classification method based on double-branch spectrum multi-scale attention network
CN114937151A (en) Lightweight target detection method based on multi-receptive-field and attention feature pyramid
CN104217214A (en) Configurable convolutional neural network based red green blue-distance (RGB-D) figure behavior identification method
CN109190511B (en) Hyperspectral classification method based on local and structural constraint low-rank representation
CN112836666A (en) Hyperspectral image classification and identification method
CN113902622B (en) Spectrum super-resolution method based on depth priori joint attention
CN110852369B (en) Hyperspectral image classification method combining 3D/2D convolutional network and adaptive spectrum unmixing
CN113420838B (en) SAR and optical image classification method based on multi-scale attention feature fusion
Fırat et al. Spatial-spectral classification of hyperspectral remote sensing images using 3D CNN based LeNet-5 architecture
CN104036242A (en) Object recognition method based on convolutional restricted Boltzmann machine combining Centering Trick
CN110414338B (en) Pedestrian re-identification method based on sparse attention network
CN115965864A (en) Lightweight attention mechanism network for crop disease identification
Kate et al. A 3 Tier CNN model with deep discriminative feature extraction for discovering malignant growth in multi-scale histopathology images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination