CN116740457A - Hyperspectral image and laser radar image fusion classification method and system - Google Patents

Hyperspectral image and laser radar image fusion classification method and system

Info

Publication number
CN116740457A
CN116740457A (application CN202310765131.8A)
Authority
CN
China
Prior art keywords
image
laser radar
hyperspectral
hyperspectral image
cross
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310765131.8A
Other languages
Chinese (zh)
Inventor
于文博 (Yu Wenbo)
黄鹤 (Huang He)
沈纲祥 (Shen Gangxiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202310765131.8A priority Critical patent/CN116740457A/en
Publication of CN116740457A publication Critical patent/CN116740457A/en
Pending legal-status Critical Current

Classifications

    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N 3/0464: Computing arrangements based on biological models; neural networks; convolutional networks [CNN, ConvNet]
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/40: Arrangements for image or video recognition or understanding; extraction of image or video features
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • Y02A 40/10: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production, in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The application relates to a hyperspectral image and laser radar image fusion classification method and system, wherein the method comprises the following steps: acquiring a hyperspectral image and a laser radar image; carrying out logarithmic transformation on all values in the hyperspectral image and the laser radar image; constructing a neighborhood block for each pixel in the log-transformed hyperspectral image and laser radar image, wherein the neighborhood block is built from the s × s pixels around the pixel; constructing a depth network model comprising a decomposer and a fusion device, wherein the features of the neighborhood blocks in the hyperspectral image and the laser radar image are extracted by the decomposer, the feature of each pixel is represented by the feature of its corresponding neighborhood block, and the fusion of the hyperspectral image and the laser radar image is realized by the fusion device based on the extracted neighborhood-block features; and classifying the categories of the pixels after the hyperspectral image and the laser radar image are fused. The method can effectively fuse and classify the hyperspectral image and the laser radar image.

Description

Hyperspectral image and laser radar image fusion classification method and system
Technical Field
The application relates to the technical field of hyperspectral image and laser radar image fusion classification, in particular to a hyperspectral image and laser radar image fusion classification method and system.
Background
In the remote sensing field, hyperspectral images and lidar images are widely used in a variety of related studies. A hyperspectral image contains rich spatial and spectral information: the spatial information is the spatial position of each pixel at every wavelength, and the spectral information is the spectral curve formed by the spectral reflectance of a single pixel across all wavelengths. A lidar image records the elevation information of the target ground objects. Fully fusing a hyperspectral image with a lidar image achieves information complementarity, so that the complete information of the ground objects can be learned and modeled. At the same time, fusing and classifying the two kinds of remote sensing images allows the features embedded in the pixels to be fully mined and improves the recognition accuracy of subsequent classification research. Early fusion classification methods generally used two independent branches to extract features from the two images and fused the multi-source information by simple concatenation; however, such methods do not consider the correlation between the different branches and find it difficult to balance the multi-source information. With the growth of computing power and the development of deep learning research, methods that fully fuse hyperspectral and lidar images by training neural networks have been proposed one after another; these methods improve the information extraction processes of the different images and their correlation, and improve the performance of the algorithms.
Fusion classification methods for hyperspectral and lidar images in the remote sensing field can generally be divided into methods based on classical machine learning and methods based on deep learning. Methods based on classical machine learning rely mainly on classical machine learning theory and use the spatial and spectral information in the hyperspectral image and the elevation information in the lidar image to construct a feature extraction module and a fusion module, thereby achieving a joint representation of the different remote sensing images. Commonly used machine learning theories include Principal Component Analysis (PCA), Minimum Noise Fraction (MNF), Linear Discriminant Analysis (LDA), and the like. Other machine learning methods, such as manifold learning algorithms, structure sparsification algorithms and dictionary set decomposition algorithms, also play an important role. Such methods typically extract the discriminative information in the hyperspectral and lidar images and ensure the separability of the samples by fusing the different kinds of information. With the continued development of deep learning theory, some deep network models have also been applied to the fusion classification of hyperspectral and lidar images, such as the Auto-Encoder (AE), the Variational Auto-Encoder (VAE) and the Long Short-Term Memory (LSTM) network. Such methods use complex network structures to extract deep discriminative information and describe the discriminative features contained in a sample from multiple aspects, so more and more fusion classification methods based on deep learning have been proposed. For example, "Deep Encoder-Decoder Networks for Classification of Hyperspectral and LiDAR Data", published by Danfeng Hong et al. in IEEE Geoscience and Remote Sensing Letters in 2020, proposes a fully connected network based on an encoder-decoder structure in which the features of the hyperspectral and lidar images are extracted and fused separately in order to reconstruct the feature information and learn the desired deeper embedding space. In addition, "More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification", published by the same authors in IEEE Transactions on Geoscience and Remote Sensing in the same year, proposes a deep learning framework for multi-modal data that performs secondary learning of the complementary information between the multi-modal images through parameter cross-selection during network training. It can be seen that deep learning has been widely applied to the fusion classification of hyperspectral and lidar images in the remote sensing field and has achieved good results.
The prior art has the following defects:
the existing fusion classification methods for hyperspectral and laser radar images in the remote sensing field have certain shortcomings: (1) the existing methods do not consider the correlation between the illumination information of the hyperspectral image and the elevation information of the laser radar image, so deep fusion of the illumination information and the elevation information is difficult to achieve and the performance of the classification model is weakened; (2) the existing methods do not apply the illumination information of the hyperspectral image to the construction of the fusion classification model and do not consider decomposing the hyperspectral image into an intrinsic image and an illumination image so as to give full play to the advantages of both; some methods attempt to introduce intrinsic decomposition theory into the classification model but directly discard the illumination image obtained by the decomposition and use only the intrinsic image and the laser radar image for fusion classification, so the advantages of the multi-modal remote sensing images cannot be exploited; (3) the existing methods give little consideration to the joint and collaborative capability between the hyperspectral image and the laser radar image when extracting discriminative information from them and only use completely separated branches for information mining and feature extraction, which is not conducive to fully grasping the complete information of the pixels and makes it difficult to bring out the advantages of multi-modal remote sensing images in pixel classification and recognition.
Disclosure of Invention
Therefore, the technical problem to be solved by the application is to overcome the defects of the prior art that the features of the hyperspectral image and the laser radar image cannot be fully mined and that the joint and collaborative capability between the hyperspectral image and the laser radar image is rarely considered.
In order to solve the technical problems, the application provides a hyperspectral image and laser radar image fusion classification method, which comprises the following steps:
acquiring a hyperspectral image H and a laser radar image L;
carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the characteristics of neighborhood blocks in a hyperspectral image H and a laser radar image L are extracted through the decomposer, the characteristics of each pixel are represented by the characteristics of the corresponding neighborhood block, and fusion of the hyperspectral image H and the laser radar image L is realized through the fusion device based on the extracted characteristics of the neighborhood block;
and classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused.
In one embodiment of the present application, the method for extracting the features of the neighborhood blocks in the hyperspectral image H and the lidar image L by the decomposer includes:
randomly selecting two neighborhood blocks h1 and h2 of hyperspectral pixels from the hyperspectral image H, and selecting from the laser radar image L the neighborhood blocks l1 and l2 of the laser radar pixels located at the same positions as h1 and h2;
using two multi-head self-attention layers M1 and M2 to extract features h11 and h22 from the neighborhood blocks h1 and h2 of the hyperspectral pixels;
using two multi-head self-attention layers M3 and M4 to respectively extract the intrinsic reflectance features from the feature h11, and using two multi-head self-attention layers M5 and M6 to respectively extract the intrinsic shading features from the feature h22;
using two two-dimensional convolution layers C1 and C2, each with B convolution kernels, to raise the number of channels of the neighborhood blocks l1 and l2 of the laser radar pixels to B, obtaining two outputs;
activating the two outputs with a Tanh activation function and passing the two activated outputs through a batch normalization layer to obtain two initial laser radar features;
for each initial laser radar feature, performing feature extraction with two multi-head self-attention layers to finally obtain the laser radar features;
every multi-head self-attention layer is followed by a batch normalization layer, the number of heads in each multi-head self-attention layer must divide the channel number B of the hyperspectral image, and the number of output nodes is B.
In one embodiment of the present application, the method for extracting the features of the neighborhood blocks in the hyperspectral image H and the laser radar image L by the decomposer further includes:
performing exponential transformation on all values in the intrinsic reflectance features, the intrinsic shading features and the laser radar features obtained above to obtain r1, r2, s1, s2, ll1 and ll2, and processing r1, r2, s1, s2, ll1 and ll2 in turn with a Tanh activation function and a batch normalization layer;
performing exponential transformation on the neighborhood blocks l1 and l2 of the laser radar pixels and adding the transformed results to ll1 and ll2 respectively to obtain the laser radar features lc1 and lc2.
In one embodiment of the application, the fusion device comprises a cross-domain geographic information network stream and a reflection information stream, wherein the cross-domain geographic information network stream comprises a first cross-domain geographic information module and a second cross-domain geographic information module, and the reflection information stream comprises a third cross-domain geographic information module and a fourth cross-domain geographic information module.
In one embodiment of the present application, the method for constructing the cross-domain geographic information network stream includes:
according to s1 and lc1, constructing a first cross-domain geographic information module, wherein the first cross-domain geographic information module comprises three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch;
the input of the distribution conversion branch is s1 and lc1 and its output is slc1^out1; the distribution conversion branch comprises a distribution conversion layer, a Tanh activation function and a batch normalization layer, wherein the formula of the distribution conversion layer is:
slc1^out1 = (ex1 + ex2) / 2
wherein μ(·) and σ(·) denote the mean and the standard deviation respectively, and ⊙ denotes the Hadamard (element-wise) product;
using a concatenation layer to splice s1 and lc1 along the channel dimension to obtain slc1;
the input of the multi-modal grouped convolution branch is slc1 and its output is slc1^out2; the multi-modal grouped convolution branch comprises a multi-modal grouped convolution layer, a Tanh activation function and a batch normalization layer;
in the formula of the multi-modal grouped convolution layer, the network-layer trainable parameters include w, ⊗ denotes the Kronecker product, · denotes dot-product calculation, roundup(x) denotes rounding up, d is the number of convolution kernels in the multi-modal grouped convolution layer, Input is the input of the multi-modal grouped convolution layer, and Output is the output of the multi-modal grouped convolution layer;
the weight branch comprises a two-dimensional convolution layer and a Sigmoid activation function; the input of the weight branch is slc1 and its output is ξ, wherein the number of convolution kernels of the two-dimensional convolution layer is B;
the output of the first cross-domain geographic information module is denoted CMG1;
according to the features s2 and lc2, constructing a second cross-domain geographic information module with the same structure as the first cross-domain geographic information module, wherein the input of the second cross-domain geographic information module is s2 and lc2 and its output is CMG2.
In one embodiment of the present application, the method for constructing the reflection information stream includes:
constructing a reflection information stream, wherein the structures of the third cross-domain geographic information module and the fourth cross-domain geographic information module in the reflection information stream are the same as those of the first cross-domain geographic information module and the second cross-domain geographic information module in the cross-domain geographic information network stream, and the third and fourth cross-domain geographic information modules each comprise three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch;
wherein the input of the third cross-domain geographic information module is r1 and CMG1 and its output is GR1; the input of the fourth cross-domain geographic information module is r2 and CMG2 and its output is GR2.
In one embodiment of the present application, the method for classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused includes:
splicing CMG1 and GR1 along the channel dimension with a concatenation layer to obtain F1; splicing CMG2 and GR2 along the channel dimension with a concatenation layer to obtain F2; inputting F1 and F2 into two different fully connected layers to obtain the final outputs;
wherein the two final outputs are the categories, identified by the depth network model, of the neighborhood block h1 of the hyperspectral pixel together with the neighborhood block l1 of the laser radar pixel at the same position, and of the neighborhood block h2 of the hyperspectral pixel together with the neighborhood block l2 of the laser radar pixel at the same position; the number of nodes of the two different fully connected layers is c, and the activation function is the Softmax function.
In one embodiment of the present application, the loss function of the decomposer in the depth network model during training is:
Γ_D = Γ1 + Γ2 + Γ3 + Γ4
wherein GAP(·) denotes the global average pooling layer, Sigmoid(·) denotes the Sigmoid activation function, Euclidean(·) denotes the Euclidean distance, |·| denotes the absolute value, ‖·‖ denotes the norm, and Σ(·) denotes summing all the contents inside the parentheses.
In one embodiment of the present application, the loss function of the fusion device in the depth network model during training is:
Γ_F = Γ5 + Γ6
in order to solve the technical problems, the application provides a hyperspectral image and laser radar image fusion classification system, which comprises:
an acquisition module: used for acquiring a hyperspectral image H and a laser radar image L;
a transformation module: used for carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
a neighborhood block construction module: used for constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
a model construction module: used for constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the decomposer is used for extracting the features of the neighborhood blocks in the hyperspectral image H and the laser radar image L, the feature of each pixel is represented by the feature of its corresponding neighborhood block, and the fusion device is used for realizing the fusion of the hyperspectral image H and the laser radar image L based on the extracted neighborhood-block features;
a classification module: used for classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused.
Compared with the prior art, the technical scheme of the application has the following advantages:
according to the application, by establishing multiple cross-domain constraints, the depth network model effectively decomposes the hyperspectral image into an intrinsic reflectance image and an intrinsic shading image and mines the intrinsic correlation between the intrinsic shading image and the laser radar image, so that the hyperspectral image and the laser radar image are fully fused at the intrinsic level and high-precision classification of the ground-object pixels is achieved;
according to the application, the hyperspectral image and the laser radar image are used to constrain the intrinsic decomposition process, the constraints imposed by the specific characteristics of the two remote sensing modalities on the intrinsic reflectance image and the intrinsic shading image are mined, and cross-domain constraints are introduced on top of the standard decomposition constraints to improve the accuracy of the decomposition result;
in the cross-domain geographic information fusion module for the intrinsic shading image and the laser radar image, the application fully fuses the intrinsic shading image and the laser radar image in a three-dimensional space from the two perspectives of distribution and sample, thereby generating cross-domain geographic information features;
according to the application, the cross-domain geographic information features are used to assist the feature extraction of the intrinsic reflectance image, and the feature extraction process of the intrinsic reflectance image is guided from the two perspectives of distribution and sample, so that the feature extraction precision is improved;
according to the application, the decomposer and the fusion device are constrained by different loss functions and are optimized simultaneously with an alternating iterative optimization strategy, so that the decomposer can produce high-quality intrinsic reflectance and intrinsic shading images and the fusion device can fully fuse the hyperspectral image and the laser radar image, further improving the classification precision of the ground objects.
Drawings
In order that the application may be more readily understood, a more particular description of the application will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings.
Fig. 1 is a flow chart of the method of the present application.
Detailed Description
The present application will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the application and practice it.
Example 1
Referring to fig. 1, the application relates to a hyperspectral image and laser radar image fusion classification method, which comprises the following steps:
acquiring a hyperspectral image H and a laser radar image L;
carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the characteristics of neighborhood blocks in a hyperspectral image H and a laser radar image L are extracted through the decomposer, the characteristics of each pixel are represented by the characteristics of the corresponding neighborhood block, and fusion of the hyperspectral image H and the laser radar image L is realized through the fusion device based on the extracted characteristics of the neighborhood block;
the classification of the pixel after the fusion of the hyperspectral image H and the lidar image L is performed (for example, whether the pixel belongs to mountain or grassland, etc.).
The present embodiment is described in detail below:
s1, selecting a hyperspectral image H and a laser radar image L according to practical problems, wherein the hyperspectral image is X multiplied by Y multiplied by B, X and Y are the space sizes of hyperspectral images in each wave band, B is the number of channels of the hyperspectral images, the laser radar image is X multiplied by Y, the number of channels is 1, X and Y are the space sizes of the laser radar images, and the space sizes of the two images are the same. Carrying out normalization pretreatment on the hyperspectral image and the laser radar image, setting a neighborhood size s (s is an odd number larger than 0), wherein the convolution kernel size of each two-dimensional convolution layer is [3,3], the convolution kernel sliding step length is [1,1], the filling parameter (packing) of each two-dimensional convolution layer is 'Same (Same)', the category of each ground object in the two images is label, and the category number is c.
S2, carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L.
S3, for each log-transformed hyperspectral pixel and laser radar pixel (there are X × Y pixels in each of the two images), a neighborhood of size s × s around the pixel is selected as the neighborhood block of that pixel, wherein the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s × s × B, and the neighborhood block of each laser radar pixel in the laser radar image has size s × s.
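As a concrete illustration of steps S2 and S3, the sketch below shows one way to log-transform the images and cut out the s × s neighborhood blocks. It is only a minimal sketch: the small epsilon added before the logarithm and the reflective padding at the image borders are assumptions, since the patent does not state how zero values or border pixels are handled.

```python
import numpy as np

def log_transform(img, eps=1e-6):
    # Log-transform all values; eps guards against log(0) (an assumption, not stated in the patent).
    return np.log(img + eps)

def extract_neighborhood_blocks(img, s):
    """Return an (X*Y, s, s, C) array holding one s x s neighborhood block per pixel."""
    if img.ndim == 2:                       # laser radar image X x Y -> X x Y x 1
        img = img[..., None]
    r = s // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="reflect")
    X, Y, C = img.shape
    blocks = np.empty((X * Y, s, s, C), dtype=img.dtype)
    for i in range(X):
        for j in range(Y):
            blocks[i * Y + j] = padded[i:i + s, j:j + s, :]
    return blocks

# Usage with the Trento-sized images of the embodiment (s = 11):
# H_blocks = extract_neighborhood_blocks(log_transform(H_norm), 11)   # (99600, 11, 11, 63)
# L_blocks = extract_neighborhood_blocks(log_transform(L_norm), 11)   # (99600, 11, 11, 1)
```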
S4, constructing the depth network model; steps S4 to S6 form the decomposer. Two neighborhood blocks h1 and h2 of hyperspectral pixels are selected at random, and then the neighborhood blocks l1 and l2 of the laser radar pixels located at the same positions as the hyperspectral pixels are selected; the categories of the center pixels of the two neighborhood blocks are label1 and label2 respectively. First, two multi-head self-attention layers M1 and M2 with identical parameters are used to extract the features h11 and h22 from the neighborhood blocks h1 and h2 of the hyperspectral pixels. The number of heads in each multi-head self-attention layer must divide B (that is, B divided by the number of heads is an integer), the number of output nodes is B, and the parameters of all subsequent multi-head self-attention layers are the same as described above. Then, two multi-head self-attention layers M3 and M4 are used to respectively extract the intrinsic reflectance features from h11, and two multi-head self-attention layers M5 and M6 are used to respectively extract the intrinsic shading features from h22. The English names of these features are Intrinsic Reflectance Feature and Intrinsic Shading Feature respectively; the main purpose is to separate the material information and the illumination information of the objects in the image and to reduce the influence of illumination on classification, wherein the intrinsic reflectance feature represents the discriminative information provided by the material of the object itself under uniform illumination, and the intrinsic shading feature represents the illumination intensity information in the image under the specific illumination environment and is unrelated to the material of the object itself. Every multi-head self-attention layer mentioned above and below is followed by a batch normalization layer (Batch Normalization Layer).
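A minimal PyTorch sketch of the repeating unit used in step S4: one multi-head self-attention layer followed by a batch normalization layer. Flattening each s × s × B block into s·s tokens of dimension B, and the choice of 9 heads for B = 63, are assumptions; the patent only requires that the head count divide B and that every attention layer be followed by batch normalization.

```python
import torch
import torch.nn as nn

class AttnBlock(nn.Module):
    """One multi-head self-attention layer followed by a batch normalization layer,
    the building block reused for M1-M6 and for the laser radar branch."""
    def __init__(self, channels=63, heads=9):
        super().__init__()
        assert channels % heads == 0, "the head count must divide the channel number B"
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.bn = nn.BatchNorm1d(channels)

    def forward(self, x):                      # x: (batch, s*s, B)
        out, _ = self.attn(x, x, x)
        return self.bn(out.transpose(1, 2)).transpose(1, 2)

# Wiring following the step above (m1..m6 are six AttnBlock instances):
# h11, h22 = m1(h1_tokens), m2(h2_tokens)
# refl_a, refl_b = m3(h11), m4(h11)    # intrinsic reflectance features extracted from h11
# shad_a, shad_b = m5(h22), m6(h22)    # intrinsic shading features extracted from h22
```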
S5, two two-dimensional convolution layers C1 and C2 are used to raise the number of channels of the neighborhood blocks l1 and l2 of the laser radar pixels to B, obtaining two outputs; the number of convolution kernels of the two-dimensional convolution layers is B. The two outputs are then activated with a Tanh activation function and followed by a batch normalization layer, yielding two initial laser radar features. Then, for each initial laser radar feature, two multi-head self-attention layers are used for feature extraction, finally obtaining the laser radar features.
S6, exponential transformation is performed on all values in the features obtained above to obtain r1, r2, s1, s2, ll1 and ll2, and the exponentially transformed features are processed in turn with a Tanh activation function and a batch normalization layer. Exponential transformation is also performed on the neighborhood blocks l1 and l2 of the laser radar pixels, and the transformed results are added to ll1 and ll2 respectively to obtain the laser radar features lc1 and lc2.
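A hedged sketch of step S6, assuming the features live in the (batch, s·s, B) token form used in the previous sketch; sharing one batch normalization layer across features and broadcasting the single-channel laser radar block against the B-channel feature when forming lc1 and lc2 are assumptions, since the patent does not spell out the shapes at this point.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(63)                        # one shared batch normalization layer (assumption)

def exp_tanh_bn(feat):
    """Exponentiate a log-domain feature, then apply Tanh and batch normalization."""
    x = torch.tanh(torch.exp(feat))            # feat: (batch, s*s, B)
    return bn(x.transpose(1, 2)).transpose(1, 2)

# r1, r2, s1, s2, ll1, ll2 = (exp_tanh_bn(f) for f in decomposer_features)
# lc1 = torch.exp(l1_tokens) + ll1             # l1_tokens: (batch, s*s, 1), broadcast over channels
# lc2 = torch.exp(l2_tokens) + ll2
```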
S7, constructing the depth network model; steps S7 to S8 form the fusion device. The fusion device comprises a cross-domain geographic information network stream and a reflection information stream. The function of the cross-domain geographic information network stream is to fully fuse the intrinsic shading features with the laser radar features lc1 and lc2; the function of the reflection information stream is to fully fuse the intrinsic reflectance features with the outputs of the cross-domain geographic information network stream. The cross-domain geographic information network stream comprises a first cross-domain geographic information module and a second cross-domain geographic information module, and the reflection information stream comprises a third cross-domain geographic information module and a fourth cross-domain geographic information module.
For the cross-domain geographic information network stream: first, a first cross-domain geographic information module is constructed for the features s1 and lc1. The first cross-domain geographic information module consists of three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch. The input of the distribution conversion branch is s1 and lc1 and its output is slc1^out1; it comprises a distribution conversion layer, a Tanh activation function and a batch normalization layer, and the formula of the distribution conversion layer is:
slc1^out1 = (ex1 + ex2) / 2
wherein μ(·) and σ(·) denote the mean and the standard deviation respectively, and ⊙ denotes the Hadamard product (Hadamard Product), i.e. the element-wise product. A concatenation layer (Concatenation Layer) is used to splice s1 and lc1 along the channel dimension to obtain slc1. The input of the multi-modal grouped convolution branch is slc1 and its output is slc1^out2; it comprises a multi-modal grouped convolution layer, a Tanh activation function and a batch normalization layer. The weight branch consists of a two-dimensional convolution layer and a Sigmoid activation function; its input is slc1 and its output is ξ, where the number of convolution kernels of the two-dimensional convolution layer is B. The output of the first cross-domain geographic information module is CMG1.
For the features s2 and lc2, a second cross-domain geographic information module with the same structure is constructed; its input is s2 and lc2 and its final output is CMG2.
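The published text gives the distribution conversion output only as (ex1 + ex2)/2 and does not reproduce the expressions for ex1, ex2 or for the final combination into CMG1, so the sketch below is a placeholder reconstruction: it transfers channel statistics between the two inputs using μ(·) and σ(·) (an AdaIN-style guess), approximates the multi-modal grouped convolution with an ordinary grouped Conv2d, and mixes the two branch outputs with the weight ξ. Every formula in it should be read as an assumption about, not a statement of, the patented module.

```python
import torch
import torch.nn as nn

class CrossDomainGeoModule(nn.Module):
    """Placeholder sketch of one cross-domain geographic information module:
    a distribution conversion branch, a grouped-convolution branch and a weight branch."""
    def __init__(self, channels=63, groups=7):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1, groups=groups),
            nn.Tanh(), nn.BatchNorm2d(channels))
        self.weight_branch = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, s_feat, lc_feat):        # both: (batch, B, s, s) feature maps (assumption)
        mu_s, sig_s = s_feat.mean((2, 3), keepdim=True), s_feat.std((2, 3), keepdim=True)
        mu_l, sig_l = lc_feat.mean((2, 3), keepdim=True), lc_feat.std((2, 3), keepdim=True)
        ex1 = (s_feat - mu_s) / (sig_s + 1e-6) * sig_l + mu_l   # placeholder for ex1
        ex2 = (lc_feat - mu_l) / (sig_l + 1e-6) * sig_s + mu_s  # placeholder for ex2
        out1 = (ex1 + ex2) / 2                                   # distribution conversion output
        slc = torch.cat([s_feat, lc_feat], dim=1)                # splice along the channel dimension
        out2 = self.conv_branch(slc)                             # grouped-convolution branch output
        xi = self.weight_branch(slc)                             # weight branch output
        return xi * out1 + (1 - xi) * out2                       # placeholder combination into CMG

# cmg1 = CrossDomainGeoModule()(s1_map, lc1_map)   # s1_map, lc1_map: (batch, 63, 11, 11)
```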
S8, constructing the reflection information stream. The structure of the reflection information stream is identical to that of the cross-domain geographic information network stream: it also comprises a third cross-domain geographic information module and a fourth cross-domain geographic information module, and each of these modules likewise comprises three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch. The input of the third cross-domain geographic information module is r1 and CMG1 and its output is GR1; the input of the fourth cross-domain geographic information module is r2 and CMG2 and its output is GR2.
S9, a concatenation layer is used to splice CMG1 and GR1 along the channel dimension to obtain F1, and a concatenation layer is used to splice CMG2 and GR2 along the channel dimension to obtain F2. F1 and F2 are input into two different fully connected layers (Fully Connected Layer), whose number of nodes is c and whose activation function is the Softmax function, to obtain the final outputs, which are the categories, identified by the depth network model, of the neighborhood block h1 of the hyperspectral pixel together with the neighborhood block l1 of the laser radar pixel at the same position, and of the neighborhood block h2 of the hyperspectral pixel together with the neighborhood block l2 of the laser radar pixel at the same position.
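A minimal sketch of the classification head in step S9, assuming the fused features are 2D maps that are flattened before the fully connected layer; the patent only fixes the concatenation along the channel dimension, the node count c and the Softmax activation.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, in_features, num_classes):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)   # node count = c

    def forward(self, cmg, gr):                          # e.g. (batch, 63, 11, 11) each
        f = torch.cat([cmg, gr], dim=1)                  # splice along the channel dimension
        return torch.softmax(self.fc(f.flatten(1)), dim=1)

# Two different fully connected layers, one per sample of the pair:
# head1 = FusionClassifier(2 * 63 * 11 * 11, c)
# head2 = FusionClassifier(2 * 63 * 11 * 11, c)
# pred1, pred2 = head1(CMG1, GR1), head2(CMG2, GR2)
```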
S10, the loss function of the decomposer in the depth network model during training is constructed according to the following formula:
Γ_D = Γ1 + Γ2 + Γ3 + Γ4
wherein GAP(·) denotes the global average pooling layer (Global Average Pooling Layer), Sigmoid(·) denotes the Sigmoid activation function, Euclidean(·) denotes the Euclidean distance, |·| denotes the absolute value, ‖·‖ denotes the norm, and Σ(·) denotes summing all the contents inside the parentheses.
S11, the loss function of the fusion device in the depth network model during training is constructed according to the following formula:
Γ_F = Γ5 + Γ6
S12, with the above loss functions and a learning rate of 10^-3, an alternating iterative optimization strategy is adopted to optimize the network parameters in the decomposer and the fusion device: the decomposer is first updated with Γ_D as the loss function for 10 iterations, and the fusion device is then updated with Γ_F as the loss function for 10 iterations; after the model has stabilized, all pixels are input into the depth network model for testing to obtain the final classification result.
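A sketch of the alternating optimization in step S12. The decomposer and fuser modules, their loss functions Γ_D and Γ_F, the batch sampler and the number of outer rounds are placeholders; the patent fixes only the learning rate of 10^-3 and the 10-iteration alternation between the two losses, and the choice of Adam as the optimizer is an assumption.

```python
import torch

# decomposer, fuser: the two sub-networks of the depth model (placeholders)
# loss_D(batch), loss_F(batch): implementations of Γ_D = Γ1+Γ2+Γ3+Γ4 and Γ_F = Γ5+Γ6 (placeholders)
opt_D = torch.optim.Adam(decomposer.parameters(), lr=1e-3)
opt_F = torch.optim.Adam(fuser.parameters(), lr=1e-3)

for _ in range(num_rounds):                    # repeat the alternation until the model is stable
    for _ in range(10):                        # decomposer phase: 10 iterations with Γ_D
        loss = loss_D(next_batch())
        opt_D.zero_grad(); loss.backward(); opt_D.step()
    for _ in range(10):                        # fuser phase: 10 iterations with Γ_F
        loss = loss_F(next_batch())
        opt_F.zero_grad(); loss.backward(); opt_F.step()
```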
Some of the above steps are described in detail below:
In step S1, the hyperspectral image and the laser radar image must be normalized so that their values range from -1 to 1; the normalization formula is:
x' = 2(x - x_min) / (x_max - x_min) - 1
where x_min denotes the minimum value in the pixel data and x_max the maximum value. The Tanh activation function is computed as:
Tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
In step S7, in the formula of the multi-modal grouped convolution layer, the network-layer trainable parameters include w, ⊗ denotes the Kronecker product (Kronecker Product), · denotes the dot product (Dot Product), roundup(x) denotes rounding up, d is the number of convolution kernels in the multi-modal grouped convolution layer, Input is the input of the multi-modal grouped convolution layer, and Output is the output of the multi-modal grouped convolution layer.
The hyperspectral image and the lidar image used in this example were taken in Trento (Italy), where the hyperspectral image was 166×600×63 in size and the lidar image was 166×600 in size.
(I) Input of the present embodiment
The input hyperspectral image is an image of size 166×600×63, and the input lidar image is an image of size 166×600.
Parameter setting
A neighborhood size of 11 is selected, so that neighborhood blocks of size 11 × 11 × 63 and 11 × 11 are obtained for each pixel and input into the depth network model for training.
(II) Training the depth network model
1% of the sample neighborhood blocks are randomly selected from the total of 99600 groups of sample neighborhood blocks for training the depth network model; the sample neighborhood blocks are randomly shuffled and packed into mini-batches of 512 sample neighborhood blocks each, and only one mini-batch is used per training step. After training, all 99600 sample neighborhood blocks are input into the depth network model for testing, the classification results of all samples are obtained, and the classification results are evaluated with the overall classification accuracy and the average classification accuracy. The overall classification accuracy is the number of correctly classified samples divided by the total number of samples. The average classification accuracy first computes, for each class, the ratio of the number of correctly classified samples of that class to the number of samples of that class, and then takes the mean of these per-class ratios.
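The two evaluation metrics can be computed as below; this is a straightforward restatement of the definitions in the paragraph above, not code taken from the patent.

```python
import numpy as np

def overall_and_average_accuracy(y_true, y_pred):
    """Overall accuracy: correctly classified samples / all samples.
    Average accuracy: mean of the per-class accuracies."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    oa = float(np.mean(y_true == y_pred))
    per_class = [float(np.mean(y_pred[y_true == c] == c)) for c in np.unique(y_true)]
    return oa, float(np.mean(per_class))
```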
(III) Results of the present embodiment
The classification results obtained by the hyperspectral image and laser radar image fusion classification method of the application and by the currently commonly used ENDnet network model are shown in Table 1.

Table 1. Classification comparison results

  Method                          Overall classification accuracy    Average classification accuracy
  The method of the application   99.12%                             97.76%
  Commonly used ENDnet            89.23%                             86.63%
It is easy to see that the method of the application fuses and classifies the hyperspectral image and the laser radar image better and produces fewer misclassified samples. In addition, when one of the inputs of the cross-domain geographic information module is transposed in the spatial domain and the experiment is repeated, the overall classification accuracy is 95.24%, showing that the method has strong model robustness. In conclusion, the method can effectively improve the classifiability and the classification accuracy of multi-source remote sensing images.
Example two
The embodiment provides a hyperspectral image and laser radar image fusion classification system, which comprises:
an acquisition module: used for acquiring a hyperspectral image H and a laser radar image L;
a transformation module: used for carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
a neighborhood block construction module: used for constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
a model construction module: used for constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the decomposer is used for extracting the features of the neighborhood blocks in the hyperspectral image H and the laser radar image L, the feature of each pixel is represented by the feature of its corresponding neighborhood block, and the fusion device is used for realizing the fusion of the hyperspectral image H and the laser radar image L based on the extracted neighborhood-block features;
a classification module: used for classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present application will be apparent to those of ordinary skill in the art in light of the foregoing description. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present application.

Claims (10)

1. A hyperspectral image and laser radar image fusion classification method is characterized in that: comprising the following steps:
acquiring a hyperspectral image H and a laser radar image L;
carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the characteristics of neighborhood blocks in a hyperspectral image H and a laser radar image L are extracted through the decomposer, the characteristics of each pixel are represented by the characteristics of the corresponding neighborhood block, and fusion of the hyperspectral image H and the laser radar image L is realized through the fusion device based on the extracted characteristics of the neighborhood block;
and classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused.
2. The hyperspectral image and lidar image fusion classification method of claim 1, wherein the hyperspectral image and lidar image fusion classification method is characterized by: the method for extracting the characteristics of the neighborhood blocks in the hyperspectral image H and the laser radar image L through the decomposer comprises the following steps:
randomly selecting two neighborhood blocks h1 and h2 of hyperspectral pixels from the hyperspectral image H, and selecting from the laser radar image L the neighborhood blocks l1 and l2 of the laser radar pixels located at the same positions as h1 and h2;
using two multi-head self-attention layers M1 and M2 to extract features h11 and h22 from the neighborhood blocks h1 and h2 of the hyperspectral pixels;
using two multi-head self-attention layers M3 and M4 to respectively extract the intrinsic reflectance features from the feature h11, and using two multi-head self-attention layers M5 and M6 to respectively extract the intrinsic shading features from the feature h22;
using two two-dimensional convolution layers C1 and C2, each with B convolution kernels, to raise the number of channels of the neighborhood blocks l1 and l2 of the laser radar pixels to B, obtaining two outputs;
activating the two outputs with a Tanh activation function and passing the two activated outputs through a batch normalization layer to obtain two initial laser radar features;
for each initial laser radar feature, performing feature extraction with two multi-head self-attention layers to finally obtain the laser radar features;
every multi-head self-attention layer is followed by a batch normalization layer, the number of heads in each multi-head self-attention layer must divide the channel number B of the hyperspectral image, and the number of output nodes is B.
3. The hyperspectral image and lidar image fusion classification method of claim 2, wherein the hyperspectral image and lidar image fusion classification method is characterized by: the method for extracting the characteristics of the neighborhood blocks in the hyperspectral image H and the laser radar image L through the decomposer further comprises the following steps:
performing exponential transformation on all values in the intrinsic reflectance features, the intrinsic shading features and the laser radar features obtained above to obtain r1, r2, s1, s2, ll1 and ll2, and processing r1, r2, s1, s2, ll1 and ll2 in turn with a Tanh activation function and a batch normalization layer;
performing exponential transformation on the neighborhood blocks l1 and l2 of the laser radar pixels and adding the transformed results to ll1 and ll2 respectively to obtain the laser radar features lc1 and lc2.
4. A hyperspectral image and lidar image fusion classification method as claimed in claim 3 wherein: the fusion device comprises a cross-domain geographic information network stream and a reflection information stream, wherein the cross-domain geographic information network stream comprises a first cross-domain geographic information module and a second cross-domain geographic information module, and the reflection information stream comprises a third cross-domain geographic information module and a fourth cross-domain geographic information module.
5. The method for fusion classification of hyperspectral image and lidar image according to claim 4, wherein the method comprises the steps of: the construction method of the cross-domain geographic information network flow comprises the following steps:
according to s1 and lc1, constructing a first cross-domain geographic information module, wherein the first cross-domain geographic information module comprises three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch;
the input of the distribution conversion branch is s1 and lc1 and its output is slc1^out1; the distribution conversion branch comprises a distribution conversion layer, a Tanh activation function and a batch normalization layer, wherein the formula of the distribution conversion layer is:
slc1^out1 = (ex1 + ex2) / 2
wherein μ(·) and σ(·) denote the mean and the standard deviation respectively, and ⊙ denotes the Hadamard product;
using a concatenation layer to splice s1 and lc1 along the channel dimension to obtain slc1;
the input of the multi-modal grouped convolution branch is slc1 and its output is slc1^out2; the multi-modal grouped convolution branch comprises a multi-modal grouped convolution layer, a Tanh activation function and a batch normalization layer;
in the formula of the multi-modal grouped convolution layer, the network-layer trainable parameters include w, ⊗ denotes the Kronecker product, · denotes dot-product calculation, roundup(x) denotes rounding up, d is the number of convolution kernels in the multi-modal grouped convolution layer, Input is the input of the multi-modal grouped convolution layer, and Output is the output of the multi-modal grouped convolution layer;
the weight branch comprises a two-dimensional convolution layer and a Sigmoid activation function; the input of the weight branch is slc1 and its output is ξ, wherein the number of convolution kernels of the two-dimensional convolution layer is B;
the output of the first cross-domain geographic information module is denoted CMG1;
according to the features s2 and lc2, constructing a second cross-domain geographic information module with the same structure as the first cross-domain geographic information module, wherein the input of the second cross-domain geographic information module is s2 and lc2 and its output is CMG2.
6. The method for fusion classification of hyperspectral image and lidar image according to claim 5, wherein the method comprises the steps of: the construction method of the reflection information flow comprises the following steps:
constructing a reflection information stream, wherein the structures of the third cross-domain geographic information module and the fourth cross-domain geographic information module in the reflection information stream are the same as those of the first cross-domain geographic information module and the second cross-domain geographic information module in the cross-domain geographic information network stream, and the third and fourth cross-domain geographic information modules each comprise three branches, namely a distribution conversion branch, a multi-modal grouped convolution branch and a weight branch;
wherein the input of the third cross-domain geographic information module is r1 and CMG1 and its output is GR1; the input of the fourth cross-domain geographic information module is r2 and CMG2 and its output is GR2.
7. The hyperspectral image and lidar image fusion classification method of claim 6, wherein the hyperspectral image and lidar image fusion classification method is characterized by: the method for classifying the pixels after the hyperspectral image H and the laser radar image L are fused comprises the following steps:
splicing CMG1 and GR1 along the channel dimension with a concatenation layer to obtain F1; splicing CMG2 and GR2 along the channel dimension with a concatenation layer to obtain F2; inputting F1 and F2 into two different fully connected layers to obtain the final outputs;
wherein the two final outputs are the categories, identified by the depth network model, of the neighborhood block h1 of the hyperspectral pixel together with the neighborhood block l1 of the laser radar pixel at the same position, and of the neighborhood block h2 of the hyperspectral pixel together with the neighborhood block l2 of the laser radar pixel at the same position; the number of nodes of the two different fully connected layers is c, and the activation function is the Softmax function.
8. The hyperspectral image and lidar image fusion classification method of claim 6, wherein the hyperspectral image and lidar image fusion classification method is characterized by: the loss function of the decomposer in the depth network model during training is:
Γ_D = Γ1 + Γ2 + Γ3 + Γ4
wherein GAP(·) denotes the global average pooling layer, Sigmoid(·) denotes the Sigmoid activation function, Euclidean(·) denotes the Euclidean distance, |·| denotes the absolute value, ‖·‖ denotes the norm, and Σ(·) denotes summing all the contents inside the parentheses.
9. The hyperspectral image and lidar image fusion classification method of claim 7, wherein the hyperspectral image and lidar image fusion classification method is characterized by: the loss function of the fusion device in the depth network model during training is:
Γ_F = Γ5 + Γ6
10. a hyperspectral image and laser radar image fusion classification system is characterized in that: comprising the following steps:
an acquisition module: used for acquiring a hyperspectral image H and a laser radar image L;
a transformation module: used for carrying out logarithmic transformation on all values in the hyperspectral image H and the laser radar image L;
a neighborhood block construction module: used for constructing a neighborhood block of each pixel in the hyperspectral image H and the laser radar image L after logarithmic transformation, wherein the neighborhood block is constructed based on the s × s pixels around the pixel;
a model construction module: used for constructing a depth network model, wherein the depth network model comprises a decomposer and a fusion device, the decomposer is used for extracting the features of the neighborhood blocks in the hyperspectral image H and the laser radar image L, the feature of each pixel is represented by the feature of its corresponding neighborhood block, and the fusion device is used for realizing the fusion of the hyperspectral image H and the laser radar image L based on the extracted neighborhood-block features;
a classification module: used for classifying the categories of the pixels after the hyperspectral image H and the laser radar image L are fused.
CN202310765131.8A 2023-06-27 2023-06-27 Hyperspectral image and laser radar image fusion classification method and system Pending CN116740457A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310765131.8A CN116740457A (en) 2023-06-27 2023-06-27 Hyperspectral image and laser radar image fusion classification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310765131.8A CN116740457A (en) 2023-06-27 2023-06-27 Hyperspectral image and laser radar image fusion classification method and system

Publications (1)

Publication Number Publication Date
CN116740457A 2023-09-12

Family

ID=87918350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310765131.8A Pending CN116740457A (en) 2023-06-27 2023-06-27 Hyperspectral image and laser radar image fusion classification method and system

Country Status (1)

Country Link
CN (1) CN116740457A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015207235A (en) * 2014-04-23 2015-11-19 日本電気株式会社 Data fusion device, land coverage classification system, method and program
WO2022109945A1 (en) * 2020-11-26 2022-06-02 深圳大学 Hyperspectral and lidar joint classification method based on scale adaptive filtering
CN114694039A (en) * 2022-04-15 2022-07-01 湖南大学 Remote sensing hyperspectral and laser radar image fusion classification method and device
CN115331110A (en) * 2022-08-26 2022-11-11 苏州大学 Fusion classification method and device for remote sensing hyperspectral image and laser radar image
CN116167955A (en) * 2023-02-24 2023-05-26 苏州大学 Hyperspectral and laser radar image fusion method and system for remote sensing field

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Danfeng Hong et al.: "Deep Encoder-Decoder Networks for Classification of Hyperspectral and LiDAR Data", IEEE Geoscience and Remote Sensing Letters, 28 August 2020 (2020-08-28) *
Yang Sirui; Xue Zhaohui; Zhang Ling; Su Hongjun; Zhou Shaoguang: "Fusion of hyperspectral and LiDAR data: a case study of fine crop classification in the Zhangye oasis agricultural area of the middle Heihe River" (in Chinese), Remote Sensing for Land and Resources, no. 04, 8 December 2018 (2018-12-08) *

Similar Documents

Publication Publication Date Title
CN105320965B (en) Sky based on depth convolutional neural networks composes united hyperspectral image classification method
CN111368896B (en) Hyperspectral remote sensing image classification method based on dense residual three-dimensional convolutional neural network
CN109584337B (en) Image generation method for generating countermeasure network based on condition capsule
CN109993220B (en) Multi-source remote sensing image classification method based on double-path attention fusion neural network
CN110555458B (en) Multi-band image feature level fusion method for generating countermeasure network based on attention mechanism
CN112836773B (en) Hyperspectral image classification method based on global attention residual error network
CN108154194B (en) Method for extracting high-dimensional features by using tensor-based convolutional network
CN111274869B (en) Method for classifying hyperspectral images based on parallel attention mechanism residual error network
CN111126256B (en) Hyperspectral image classification method based on self-adaptive space-spectrum multi-scale network
WO2024040828A1 (en) Method and device for fusion and classification of remote sensing hyperspectral image and laser radar image
CN112446476A (en) Neural network model compression method, device, storage medium and chip
CN107145836B (en) Hyperspectral image classification method based on stacked boundary identification self-encoder
CN113486851B (en) Hyperspectral image classification method based on double-branch spectrum multi-scale attention network
CN114937151A (en) Lightweight target detection method based on multi-receptive-field and attention feature pyramid
CN104217214A (en) Configurable convolutional neural network based red green blue-distance (RGB-D) figure behavior identification method
CN109190511B (en) Hyperspectral classification method based on local and structural constraint low-rank representation
CN112836666A (en) Hyperspectral image classification and identification method
CN113902622B (en) Spectrum super-resolution method based on depth priori joint attention
CN110852369B (en) Hyperspectral image classification method combining 3D/2D convolutional network and adaptive spectrum unmixing
CN113420838B (en) SAR and optical image classification method based on multi-scale attention feature fusion
Fırat et al. Spatial-spectral classification of hyperspectral remote sensing images using 3D CNN based LeNet-5 architecture
CN104036242A (en) Object recognition method based on convolutional restricted Boltzmann machine combining Centering Trick
CN110414338B (en) Pedestrian re-identification method based on sparse attention network
CN115965864A (en) Lightweight attention mechanism network for crop disease identification
Kate et al. A 3 Tier CNN model with deep discriminative feature extraction for discovering malignant growth in multi-scale histopathology images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination