CN115331110A - Fusion classification method and device for remote sensing hyperspectral image and laser radar image - Google Patents

Fusion classification method and device for remote sensing hyperspectral image and laser radar image

Info

Publication number
CN115331110A
Authority
CN
China
Prior art keywords
image
hyperspectral
size
matrix
introducing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211037953.6A
Other languages
Chinese (zh)
Inventor
于文博
黄鹤
沈纲祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN202211037953.6A
Publication of CN115331110A
Priority to PCT/CN2022/142160 (WO2024040828A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/40 Extraction of image or video features
    • G06V 10/58 Extraction of image or video features relating to hyperspectral data
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention relates to a fusion classification method for a remote sensing hyperspectral image and a lidar image, which comprises: acquiring a hyperspectral image and a lidar image; performing intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image; selecting a neighborhood block for each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel; training a plurality of deep network branches with the neighborhood blocks; concatenating the outputs of the deep network branches pairwise with a concatenation layer; and performing multi-modal fusion on the concatenated outputs to obtain the final output category. The fusion classification method for remote sensing hyperspectral images and lidar images can fully fuse the important discrimination information in multi-source remote sensing images, achieve high-precision classification of target pixels, largely avoid the omission and loss of important information during fusion, and reduce problems such as low classification accuracy caused by information loss.

Description

Fusion classification method and device for remote sensing hyperspectral image and laser radar image
Technical Field
The invention relates to the technical field of remote sensing image processing, in particular to a method and a device for fusion classification of remote sensing hyperspectral images and laser radar images.
Background
In the field of remote sensing, hyperspectral images and lidar images are widely used in many related studies. A hyperspectral image contains abundant spatial and spectral information: the spatial information is the spatial position information of the pixels at each wavelength, and the spectral information is the spectral curve formed by the spectral reflectances of a single pixel across the wavelengths. A lidar image records the elevation information of the target ground objects. Fully fusing the hyperspectral image and the lidar image achieves information complementarity, so that the complete information of the ground objects can be learned and modeled. At the same time, fusing and classifying the two remote sensing images allows the embedded features of the pixels to be fully mined, which improves the recognition accuracy of subsequent classification research. Early fusion classification methods usually adopted two independent branches to extract features from the two images and fused the multi-source information by simple concatenation or similar means; such methods do not consider the correlation between the different branches and find it difficult to balance the multi-source information. With the growth of computing power and the deepening of deep learning research, methods that achieve full fusion of hyperspectral and lidar images by training neural networks have been proposed successively; these methods improve the information extraction process for the different images, strengthen the correlation between the images, and improve the performance of the algorithms.
At present, hyperspectral and lidar image fusion classification methods in the remote sensing field can generally be divided into fusion classification methods based on classical machine learning and fusion classification methods based on deep learning. The former are mainly based on classical machine learning theory and use the spatial and spectral information of the hyperspectral image together with the elevation information of the lidar image to construct feature extraction and fusion modules, thereby realizing a joint representation of the different remote sensing images. Commonly used machine learning techniques include Principal Component Analysis (PCA), Minimum Noise Fraction (MNF), Linear Discriminant Analysis (LDA), and the like. Other machine learning methods, such as manifold learning, structured sparsity and dictionary decomposition algorithms, also play an important role. These methods generally extract discrimination information from the hyperspectral and lidar images and ensure the classifiability of the samples by fusing the different kinds of information. With the continued development of deep learning theory, deep network models have also been applied to hyperspectral and lidar image fusion classification, such as the Auto-Encoder (AE), the Variational Auto-Encoder (VAE) and the Long Short-Term Memory network (LSTM). Such methods extract deep discrimination information through complex network structures and describe the discriminative features contained in the samples from multiple aspects, so more and more fusion classification methods based on deep learning have been proposed. For example, "Deep Encoder-Decoder Networks for Classification of Hyperspectral and LiDAR Data", published by Danfeng Hong et al. in IEEE Geoscience and Remote Sensing Letters in 2020, proposes a fully connected network based on an encoder-decoder structure that extracts and fuses features from the hyperspectral and lidar images separately, realizing the reconstruction of feature information and its mapping into a deeper embedding space. In addition, "More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification", published in the same period in IEEE Transactions on Geoscience and Remote Sensing, proposes a deep learning framework for multimodal data that performs secondary learning of the complementary information between multimodal images through parameter cross-selection during network training. Deep learning has therefore been widely applied to hyperspectral and lidar image fusion classification research in the remote sensing field and has achieved good results.
However, the existing hyperspectral and lidar image fusion classification methods in the remote sensing field still have certain shortcomings: (1) the existing methods do not consider the correlation between the illumination information of the hyperspectral image and the elevation information of the lidar image, so it is difficult to fuse the two deeply, which weakens the performance of the classification model; (2) the existing methods do not apply the illumination information of the hyperspectral image to the construction of the fusion classification model, and do not consider decomposing the hyperspectral image into an intrinsic image and an illumination image and exploiting the advantages of both; some methods attempt to introduce intrinsic decomposition theory into the classification model but directly discard the decomposed illumination image and use only the intrinsic image and the lidar image for fusion classification, so the advantages of the multi-modal remote sensing images cannot be fully exploited; (3) when extracting discrimination information from the hyperspectral and lidar images, the existing methods use completely separated branches for information mining and feature extraction, which is not conducive to grasping the complete information of the pixels and makes it difficult to exploit the advantages of multi-modal remote sensing images for pixel classification and recognition; (4) the existing methods usually adopt convolutional neural networks to extract image spatial information, but conventional convolutional neural networks do not consider the limitations of multi-modal learning and have no dedicated structural design for fusing information between images of different modalities, which is not conducive to improving the fusion classification accuracy of hyperspectral and lidar images.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the problems in the prior art and to provide a method and a device for fusion classification of remote sensing hyperspectral images and lidar images that can fully fuse the important discrimination information in multi-source remote sensing images, achieve high-precision classification of target pixels, largely avoid the omission and loss of important information during fusion, and reduce problems such as decreased classification accuracy caused by information loss.
In order to solve the technical problem, the invention provides a fusion classification method of a remote sensing hyperspectral image and a laser radar image, which comprises the following steps:
S1: acquiring a hyperspectral image and a lidar image, wherein the category of each ground object in the two images is denoted label;
S2: performing intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, and selecting the surrounding neighborhood of size s × s as the neighborhood block of each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, wherein the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s × s × B and the neighborhood block of each lidar pixel in the lidar image L has size s × s;
S3: training the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 using the neighborhood blocks, wherein the inputs of L_1 and L_2 are hyperspectral intrinsic pixel blocks of size s × s × B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixel blocks of size s × s × B from the lidar image L, and the inputs of L_5 and L_6 are hyperspectral illumination pixel blocks of size s × s × B from the hyperspectral illumination image; the outputs are O_1, O_2, O_3, O_4, O_5 and O_6, each of size s × s × d;
S4: concatenating the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise using a concatenation layer to obtain O_12, O_34 and O_56;
S5: inputting O_34 and O_56 into the 1st multi-modal grouped convolution layer to obtain the output O_3456^1; inputting O_34 into the 2nd multi-modal grouped convolution layer to obtain the output O_34^1; inputting O_56 into the 3rd multi-modal grouped convolution layer to obtain the output O_56^1; inputting O_34^1, O_3456^1 and O_56^1 into the 4th multi-modal grouped convolution layer to obtain the output O_3456^2; inputting O_34^1 into the 5th multi-modal grouped convolution layer to obtain the output O_34^2; inputting O_56^1 into the 6th multi-modal grouped convolution layer to obtain the output O_56^2; inputting O_34^2, O_3456^2 and O_56^2 into the 7th multi-modal grouped convolution layer to obtain the output O_3456^3; inputting O_34^2 into the 8th multi-modal grouped convolution layer to obtain the output O_34^3; inputting O_56^2 into the 9th multi-modal grouped convolution layer to obtain the output O_56^3; inputting O_34^3, O_3456^3 and O_56^3 into the 10th multi-modal grouped convolution layer to obtain the output O_3456^4; inputting O_12 and O_3456^1 into the 11th multi-modal grouped convolution layer to obtain the output O_12^1; inputting O_12^1 and O_3456^2 into the 12th multi-modal grouped convolution layer to obtain the output O_12^2; inputting O_12^2 and O_3456^3 into the 13th multi-modal grouped convolution layer to obtain the output O_12^3; and inputting O_3456^4 and O_12^3, each of size s × s × d, into a two-dimensional average pooling layer of size s × s to obtain O_3456^5 and O_12^4, each of size 1 × d;
S6: inputting O_12^4 and O_3456^5 into a concatenation layer to obtain the output O_123456 of size 1 × 2d, and inputting O_123456 into a fully connected layer to obtain the final output category.
In an embodiment of the present invention, in step S1, after selecting the hyperspectral image and the lidar image, the hyperspectral image and the lidar image are subjected to normalization preprocessing.
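The patent does not spell out the normalization; the following minimal Python sketch assumes per-band min-max scaling to [0, 1] (the scaling choice and the function name are assumptions, not taken from the patent):

```python
import numpy as np

def normalize_hsi_lidar(H, L):
    """Min-max normalize a hyperspectral cube H of shape (X, Y, B) per band
    and a lidar image L of shape (X, Y); scaling to [0, 1] is an assumption."""
    H = H.astype(np.float64)
    L = L.astype(np.float64)
    H_min = H.min(axis=(0, 1), keepdims=True)
    H_max = H.max(axis=(0, 1), keepdims=True)
    H_norm = (H - H_min) / (H_max - H_min + 1e-12)
    L_norm = (L - L.min()) / (L.max() - L.min() + 1e-12)
    return H_norm, L_norm
```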
In an embodiment of the present invention, the method for performing intrinsic image decomposition on the hyperspectral image in step S2 to obtain an intrinsic image and an illumination image includes:
S2.1: calculating, for each hyperspectral pixel H_i, where 1 ≤ i ≤ X × Y, the corresponding matrix D_i:
D_i = [H_1, ..., H_{i-1}, H_{i+1}, ..., H_{X×Y}, I_B] ∈ R^{B×(B+X×Y-1)}
wherein I_B is an identity matrix of size B × B;
S2.2: calculating, based on the matrix D_i, the vector α_i corresponding to each hyperspectral pixel H_i:
min ||α_i||_1  s.t.  H_i = D_i α_i
wherein α_i has shape (B + X × Y - 1) × 1;
S2.3: constructing a weight matrix W ∈ R^{(X×Y)×(X×Y)} and assigning the element W_ij in the i-th row and j-th column of the weight matrix according to the sparse coefficients α_i (the assignment formula is reproduced as an image in the source publication);
calculating the matrix G = (I_{X×Y} - W^T)(I_{X×Y} - W) + δ I_{X×Y} based on the weight matrix W, wherein I_{X×Y} is an identity matrix of size (X × Y) × (X × Y), δ is a constant, and T denotes the matrix transpose;
S2.4: flattening the hyperspectral image H into a two-dimensional matrix and taking the logarithm to obtain log(flatten(H)), and calculating the matrix K = (I_B - 1_B 1_B^T / B) log(flatten(H)) (I_{X×Y} - 1_{X×Y} 1_{X×Y}^T / (X × Y)), wherein I_B and I_{X×Y} are identity matrices of size B × B and (X × Y) × (X × Y), and 1_B and 1_{X×Y} are all-ones vectors of size B × 1 and (X × Y) × 1, respectively;
S2.5: calculating the matrix ρ = δKG^{-1} based on the matrices G and K, and obtaining, based on ρ, the intrinsic image RE = e^ρ and the illumination image SH = e^{log(H)-ρ} decomposed from the hyperspectral image H, wherein e is the natural constant and both images have size X × Y × B.
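For clarity, a rough NumPy/scikit-learn sketch of steps S2.1 to S2.5 follows; it is a conceptual illustration only, not a scalable implementation. The Lasso call stands in for the equality-constrained L1 problem of S2.2, and the rule W[i, j] = |α_i[j]| is an assumed reading of the weight-assignment formula that appears only as an image in the source.

```python
import numpy as np
from sklearn.linear_model import Lasso

def intrinsic_decomposition(H, delta=1e-3, lasso_alpha=1e-3):
    """Sketch of the intrinsic/illumination decomposition (S2.1-S2.5).
    H: hyperspectral cube of shape (X, Y, B); delta, lasso_alpha and the
    weight-assignment rule are assumptions."""
    X, Y, B = H.shape
    N = X * Y
    P = H.reshape(N, B).T + 1e-12                 # B x N, one pixel per column

    # S2.1-S2.3: sparse self-representation of every pixel and weight matrix W
    W = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)
        D_i = np.hstack([P[:, others], np.eye(B)])            # B x (B + N - 1)
        alpha_i = Lasso(alpha=lasso_alpha, fit_intercept=False,
                        max_iter=5000).fit(D_i, P[:, i]).coef_
        W[i, others] = np.abs(alpha_i[: N - 1])               # assumed assignment rule

    I_N = np.eye(N)
    G = (I_N - W.T) @ (I_N - W) + delta * I_N

    # S2.4: doubly centred log-image matrix K
    logH = np.log(P)                                          # B x N
    C_B = np.eye(B) - np.ones((B, B)) / B
    C_N = I_N - np.ones((N, N)) / N
    K = C_B @ logH @ C_N

    # S2.5: rho = delta * K * G^{-1}; reflectance (intrinsic) and shading images
    rho = delta * K @ np.linalg.inv(G)                        # B x N
    RE = np.exp(rho).T.reshape(X, Y, B)
    SH = np.exp(logH - rho).T.reshape(X, Y, B)
    return RE, SH
```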
In an embodiment of the present invention, the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 in step S3 each comprise multiple two-dimensional convolution layers; each two-dimensional convolution layer has d convolution kernels, the convolution kernel size is [3, 3], and the convolution kernel sliding stride is [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
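As a concrete reference for the branch structure just described, here is a minimal PyTorch sketch of one branch (three 3 × 3 convolution layers with d kernels, stride [1, 1], 'Same' padding and Tanh activation, per the embodiment's settings); the class name, the concrete values of B, d and s, and the weight-sharing illustration are assumptions made for the example.

```python
import torch
import torch.nn as nn

class BranchCNN(nn.Module):
    """One deep network branch: three 2-D convolution layers, each with d
    3x3 kernels, stride 1 and 'same' padding, followed by Tanh, so an
    s x s x C input block maps to an s x s x d output."""

    def __init__(self, in_channels: int, d: int):
        super().__init__()
        layers, c = [], in_channels
        for _ in range(3):
            layers += [nn.Conv2d(c, d, kernel_size=3, stride=1, padding=1), nn.Tanh()]
            c = d
        self.body = nn.Sequential(*layers)

    def forward(self, x):          # x: (batch, C, s, s)
        return self.body(x)        # (batch, d, s, s)

# Branch pairs that share all weights (L2/L3 and L4/L5) can simply reuse
# one module instance on their respective inputs:
B, d, s = 63, 120, 11                       # embodiment values, used here for illustration
branch_L2_L3 = BranchCNN(in_channels=B, d=d)
blocks = torch.randn(4, B, s, s)            # a mini-batch of neighborhood blocks
out = branch_L2_L3(blocks)                  # shape: (4, d, s, s)
```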
In an embodiment of the present invention, the loss function used when training the deep network branches in step S3 compares label, the input image category, with the output image category predicted by the network (the loss formula itself is reproduced only as an image in the source publication).
In an embodiment of the present invention, in step S4 the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 are concatenated pairwise by the concatenation layer to obtain O_12, O_34 and O_56, expressed as O_12 = Concatenation(O_1, O_2), O_34 = Concatenation(O_3, O_4) and O_56 = Concatenation(O_5, O_6).
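In a channel-first tensor layout this pairwise splicing is simply a channel-wise concatenation; a short PyTorch illustration (tensor shapes taken from the embodiment, variable names assumed):

```python
import torch

O1 = torch.randn(4, 120, 11, 11)    # output of branch L1: (batch, d, s, s)
O2 = torch.randn(4, 120, 11, 11)    # output of branch L2
O12 = torch.cat([O1, O2], dim=1)    # Concatenation(O1, O2): (batch, 2d, s, s)
```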
In addition, the invention also provides a fusion classification device of the remote sensing hyperspectral image and the laser radar image, which comprises:
the data acquisition module, which is used for acquiring a hyperspectral image and a lidar image, wherein the category of each ground object in the two images is denoted label;
the image decomposition module, which is used for performing intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, and for selecting the surrounding neighborhood of size s × s as the neighborhood block of each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, wherein the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s × s × B and the neighborhood block of each lidar pixel in the lidar image L has size s × s;
the deep network training module, which is used for training the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 using the neighborhood blocks, wherein the inputs of L_1 and L_2 are hyperspectral intrinsic pixel blocks of size s × s × B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixel blocks of size s × s × B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixel blocks of size s × s × B from the hyperspectral illumination image; the outputs are O_1, O_2, O_3, O_4, O_5 and O_6, each of size s × s × d;
the image stitching module, which is used for concatenating the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise using a concatenation layer to obtain O_12, O_34 and O_56;
the multi-modal fusion module, which is used for inputting O_34 and O_56 into the 1st multi-modal grouped convolution layer to obtain the output O_3456^1; inputting O_34 into the 2nd multi-modal grouped convolution layer to obtain the output O_34^1; inputting O_56 into the 3rd multi-modal grouped convolution layer to obtain the output O_56^1; inputting O_34^1, O_3456^1 and O_56^1 into the 4th multi-modal grouped convolution layer to obtain the output O_3456^2; inputting O_34^1 into the 5th multi-modal grouped convolution layer to obtain the output O_34^2; inputting O_56^1 into the 6th multi-modal grouped convolution layer to obtain the output O_56^2; inputting O_34^2, O_3456^2 and O_56^2 into the 7th multi-modal grouped convolution layer to obtain the output O_3456^3; inputting O_34^2 into the 8th multi-modal grouped convolution layer to obtain the output O_34^3; inputting O_56^2 into the 9th multi-modal grouped convolution layer to obtain the output O_56^3; inputting O_34^3, O_3456^3 and O_56^3 into the 10th multi-modal grouped convolution layer to obtain the output O_3456^4; inputting O_12 and O_3456^1 into the 11th multi-modal grouped convolution layer to obtain the output O_12^1; inputting O_12^1 and O_3456^2 into the 12th multi-modal grouped convolution layer to obtain the output O_12^2; inputting O_12^2 and O_3456^3 into the 13th multi-modal grouped convolution layer to obtain the output O_12^3; and inputting O_3456^4 and O_12^3, each of size s × s × d, into a two-dimensional average pooling layer of size s × s to obtain O_3456^5 and O_12^4, each of size 1 × d;
the image classification module, which is used for inputting O_12^4 and O_3456^5 into a concatenation layer to obtain the output O_123456 of size 1 × 2d, and for inputting O_123456 into a fully connected layer to obtain the final output category.
In an embodiment of the present invention, the data acquisition module includes a data preprocessing submodule, and the data preprocessing submodule is configured to perform normalization preprocessing on the hyperspectral image and the lidar image after the hyperspectral image and the lidar image are selected.
In an embodiment of the present invention, the image decomposition module performing intrinsic image decomposition on the hyperspectral image to obtain the intrinsic image and the illumination image includes:
calculating, for each hyperspectral pixel H_i, where 1 ≤ i ≤ X × Y, the corresponding matrix D_i:
D_i = [H_1, ..., H_{i-1}, H_{i+1}, ..., H_{X×Y}, I_B] ∈ R^{B×(B+X×Y-1)}
wherein I_B is an identity matrix of size B × B;
calculating, based on the matrix D_i, the vector α_i corresponding to each hyperspectral pixel H_i:
min ||α_i||_1  s.t.  H_i = D_i α_i
wherein α_i has shape (B + X × Y - 1) × 1;
constructing a weight matrix W ∈ R^{(X×Y)×(X×Y)} and assigning the element W_ij in the i-th row and j-th column of the weight matrix according to the sparse coefficients α_i (the assignment formula is reproduced as an image in the source publication);
calculating the matrix G = (I_{X×Y} - W^T)(I_{X×Y} - W) + δ I_{X×Y} based on the weight matrix W, wherein I_{X×Y} is an identity matrix of size (X × Y) × (X × Y), δ is a constant, and T denotes the matrix transpose;
flattening the hyperspectral image H into a two-dimensional matrix and taking the logarithm to obtain log(flatten(H)), and calculating the matrix K = (I_B - 1_B 1_B^T / B) log(flatten(H)) (I_{X×Y} - 1_{X×Y} 1_{X×Y}^T / (X × Y)), wherein I_B and I_{X×Y} are identity matrices of size B × B and (X × Y) × (X × Y), and 1_B and 1_{X×Y} are all-ones vectors of size B × 1 and (X × Y) × 1, respectively;
calculating the matrix ρ = δKG^{-1} based on the matrices G and K, and obtaining, based on ρ, the intrinsic image RE = e^ρ and the illumination image SH = e^{log(H)-ρ} decomposed from the hyperspectral image H, wherein e is the natural constant and both images have size X × Y × B.
In an embodiment of the present invention, the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 each comprise multiple two-dimensional convolution layers; each two-dimensional convolution layer has d convolution kernels, the convolution kernel size is [3, 3], and the convolution kernel sliding stride is [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
Compared with the prior art, the technical scheme of the invention has the following advantages:
1. The remote sensing hyperspectral image and lidar image fusion classification method provided by the invention can fully fuse the important discrimination information in multi-source remote sensing images, achieve high-precision classification of target pixels, largely avoid the omission and loss of important information during fusion, and reduce problems such as decreased classification accuracy caused by information loss;
2. The invention applies the intrinsic image decomposition theory of hyperspectral images to the fusion classification research of hyperspectral and lidar images, effectively combining the two, avoiding the common practice of discarding the illumination image obtained by the decomposition, and reducing the loss of information;
3. The invention fully fuses the illumination image obtained by decomposing the hyperspectral image with the lidar image, so that the correlation between the illumination information of the hyperspectral image and the elevation information of the lidar image is fully mined and utilized, the advantages of the illumination image in model construction are fully exploited, and the final classification performance is improved.
Drawings
In order that the present disclosure may be more readily and clearly understood, reference will now be made in detail to the present disclosure, examples of which are illustrated in the accompanying drawings.
FIG. 1 is a flow chart of a fusion classification method of a remote sensing hyperspectral image and a laser radar image provided by the invention.
FIG. 2 is a schematic frame diagram of a remote sensing hyperspectral image and lidar image fusion classification device provided by the invention.
Wherein the reference numerals are as follows: 10. a data acquisition module; 20. an image decomposition module; 30. a deep network training module; 40. an image stitching module; 50. a multimodal fusion module; 60. and an image classification module.
Detailed Description
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
Referring to fig. 1, an embodiment of the present invention provides a method for fusion classification of a remote sensing hyperspectral image and a lidar image, including:
S1: acquiring a hyperspectral image and a lidar image, wherein the category of each ground object in the two images is denoted label;
S2: performing intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, and selecting the surrounding neighborhood of size s × s as the neighborhood block of each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, wherein the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s × s × B and the neighborhood block of each lidar pixel in the lidar image L has size s × s;
S3: training the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 using the neighborhood blocks, wherein the inputs of L_1 and L_2 are hyperspectral intrinsic pixel blocks of size s × s × B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixel blocks of size s × s × B from the lidar image L, and the inputs of L_5 and L_6 are hyperspectral illumination pixel blocks of size s × s × B from the hyperspectral illumination image; the outputs are O_1, O_2, O_3, O_4, O_5 and O_6, each of size s × s × d;
S4: concatenating the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise using a concatenation layer to obtain O_12, O_34 and O_56;
S5: inputting O_34 and O_56 into the 1st multi-modal grouped convolution layer to obtain the output O_3456^1; inputting O_34 into the 2nd multi-modal grouped convolution layer to obtain the output O_34^1; inputting O_56 into the 3rd multi-modal grouped convolution layer to obtain the output O_56^1; inputting O_34^1, O_3456^1 and O_56^1 into the 4th multi-modal grouped convolution layer to obtain the output O_3456^2; inputting O_34^1 into the 5th multi-modal grouped convolution layer to obtain the output O_34^2; inputting O_56^1 into the 6th multi-modal grouped convolution layer to obtain the output O_56^2; inputting O_34^2, O_3456^2 and O_56^2 into the 7th multi-modal grouped convolution layer to obtain the output O_3456^3; inputting O_34^2 into the 8th multi-modal grouped convolution layer to obtain the output O_34^3; inputting O_56^2 into the 9th multi-modal grouped convolution layer to obtain the output O_56^3; inputting O_34^3, O_3456^3 and O_56^3 into the 10th multi-modal grouped convolution layer to obtain the output O_3456^4; inputting O_12 and O_3456^1 into the 11th multi-modal grouped convolution layer to obtain the output O_12^1; inputting O_12^1 and O_3456^2 into the 12th multi-modal grouped convolution layer to obtain the output O_12^2; inputting O_12^2 and O_3456^3 into the 13th multi-modal grouped convolution layer to obtain the output O_12^3; and inputting O_3456^4 and O_12^3, each of size s × s × d, into a two-dimensional average pooling layer of size s × s to obtain O_3456^5 and O_12^4, each of size 1 × d;
S6: inputting O_12^4 and O_3456^5 into a concatenation layer to obtain the output O_123456 of size 1 × 2d, and inputting O_123456 into a fully connected layer to obtain the final output category.
Specifically, in step S1 a hyperspectral image H and a lidar image L are selected according to the practical problem, wherein the hyperspectral image has size X × Y × B, X and Y being the spatial sizes of the hyperspectral image in each band and B being the number of bands, and the lidar image has size X × Y, X and Y being its spatial sizes; the spatial sizes of the two images are the same. The hyperspectral image and the lidar image are normalized; the neighborhood size s is set (s is an odd number greater than 0); the number of convolution kernels of each two-dimensional convolution layer is set to d, the convolution kernel size to [3, 3], and the convolution kernel sliding stride to [1, 1]; the padding parameter of each two-dimensional convolution layer is set to 'Same'; the Tanh function is selected as the activation function; the category of each ground object in the two images is label, with size 1 × (X × Y), and the number of categories is c.
The method for performing intrinsic image decomposition on the hyperspectral image to obtain the intrinsic image and the illumination image in the step S2 comprises the following steps:
S2.1: calculating, for each hyperspectral pixel H_i, where 1 ≤ i ≤ X × Y, the corresponding matrix D_i:
D_i = [H_1, ..., H_{i-1}, H_{i+1}, ..., H_{X×Y}, I_B] ∈ R^{B×(B+X×Y-1)}
wherein I_B is an identity matrix of size B × B;
S2.2: calculating, based on the matrix D_i, the vector α_i corresponding to each hyperspectral pixel H_i:
min ||α_i||_1  s.t.  H_i = D_i α_i
wherein α_i has shape (B + X × Y - 1) × 1;
S2.3: constructing a weight matrix W ∈ R^{(X×Y)×(X×Y)} and assigning the element W_ij in the i-th row and j-th column of the weight matrix according to the sparse coefficients α_i (the assignment formula is reproduced as an image in the source publication);
calculating the matrix G = (I_{X×Y} - W^T)(I_{X×Y} - W) + δ I_{X×Y} based on the weight matrix W, wherein I_{X×Y} is an identity matrix of size (X × Y) × (X × Y), δ is a constant, and T denotes the matrix transpose;
S2.4: flattening the hyperspectral image H into a two-dimensional matrix and taking the logarithm to obtain log(flatten(H)), and calculating the matrix K = (I_B - 1_B 1_B^T / B) log(flatten(H)) (I_{X×Y} - 1_{X×Y} 1_{X×Y}^T / (X × Y)), wherein I_B and I_{X×Y} are identity matrices of size B × B and (X × Y) × (X × Y), and 1_B and 1_{X×Y} are all-ones vectors of size B × 1 and (X × Y) × 1, respectively;
S2.5: calculating the matrix ρ = δKG^{-1} based on the matrices G and K, and obtaining, based on ρ, the intrinsic image RE = e^ρ and the illumination image SH = e^{log(H)-ρ} decomposed from the hyperspectral image H, wherein e is the natural constant and both images have size X × Y × B.
In step S2, for each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel (each of the three images contains X × Y pixels), the surrounding neighborhood of size s × s is selected as the neighborhood block of the pixel, wherein the neighborhood block of each hyperspectral pixel in the hyperspectral image H has size s × s × B and the neighborhood block of each lidar pixel in the lidar image L has size s × s.
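A minimal sketch of the neighborhood-block selection is shown below; reflection padding at the image borders and the function name are assumptions, since the patent does not state how border pixels are handled.

```python
import numpy as np

def extract_blocks(img, s):
    """Return an array of shape (X*Y, s, s, C) holding the s x s neighborhood
    block of every pixel. img has shape (X, Y, C) (C = B for the intrinsic and
    illumination images) or (X, Y) for the lidar image."""
    if img.ndim == 2:
        img = img[..., None]
    X, Y, C = img.shape
    r = s // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="reflect")  # assumed border handling
    blocks = np.empty((X * Y, s, s, C), dtype=img.dtype)
    k = 0
    for i in range(X):
        for j in range(Y):
            blocks[k] = padded[i:i + s, j:j + s, :]
            k += 1
    return blocks
```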
In step S3, six deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 are first constructed, wherein the inputs of L_1 and L_2 are hyperspectral intrinsic pixel blocks of size s × s × B from the hyperspectral intrinsic image RE, the inputs of L_3 and L_4 are lidar pixel blocks of size s × s × B from the lidar image L, and the inputs of L_5 and L_6 are hyperspectral illumination pixel blocks of size s × s × B from the hyperspectral illumination image SH. The six deep network branches each consist of three two-dimensional convolution layers; the number of convolution kernels in each two-dimensional convolution layer is d and the convolution kernel sliding stride is [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights. The outputs of the six deep network branches are O_1, O_2, O_3, O_4, O_5 and O_6, each of size s × s × d.
The loss function used when training the deep network branches in step S3 compares label, the input image category, with the output image category predicted by the network (the loss formula itself is reproduced only as an image in the source publication).
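Since the loss formula is available only as an image in the source, the sketch below shows one plausible reading, a standard cross-entropy between the ground-truth category label and the network's predicted category; this choice is an assumption, not a reconstruction of the patent's formula.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                    # assumed form of the training loss
logits = torch.randn(512, 6, requires_grad=True)     # network outputs for one mini-batch (c = 6 classes assumed)
labels = torch.randint(0, 6, (512,))                 # ground-truth categories ("label" in the text)
loss = criterion(logits, labels)
loss.backward()
```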
In step S4, the outputs of the six deep network branches are concatenated pairwise using a concatenation layer, according to the following formulas:
O_12 = Concatenation(O_1, O_2)
O_34 = Concatenation(O_3, O_4)
O_56 = Concatenation(O_5, O_6).
In step S5, Concatenation(O_34, O_56) is input into the 1st multi-modal grouped convolution layer to obtain the output O_3456^1; O_34 is input into the 2nd multi-modal grouped convolution layer to obtain the output O_34^1; O_56 is input into the 3rd multi-modal grouped convolution layer to obtain the output O_56^1; Concatenation(O_34^1, O_3456^1, O_56^1) is input into the 4th multi-modal grouped convolution layer to obtain the output O_3456^2; O_34^1 is input into the 5th multi-modal grouped convolution layer to obtain the output O_34^2; O_56^1 is input into the 6th multi-modal grouped convolution layer to obtain the output O_56^2; Concatenation(O_34^2, O_3456^2, O_56^2) is input into the 7th multi-modal grouped convolution layer to obtain the output O_3456^3; O_34^2 is input into the 8th multi-modal grouped convolution layer to obtain the output O_34^3; O_56^2 is input into the 9th multi-modal grouped convolution layer to obtain the output O_56^3; Concatenation(O_34^3, O_3456^3, O_56^3) is input into the 10th multi-modal grouped convolution layer to obtain the output O_3456^4; O_3456^4, of size s × s × d, is input into a two-dimensional average pooling layer of size s × s to obtain O_3456^5 of size 1 × d; Concatenation(O_12, O_3456^1) is input into the 11th multi-modal grouped convolution layer to obtain the output O_12^1; Concatenation(O_12^1, O_3456^2) is input into the 12th multi-modal grouped convolution layer to obtain the output O_12^2; Concatenation(O_12^2, O_3456^3) is input into the 13th multi-modal grouped convolution layer to obtain the output O_12^3, of size s × s × d; and O_12^3 is input into a two-dimensional average pooling layer of size s × s to obtain O_12^4 of size 1 × d.
In step S6, O_12^4 and O_3456^5 are input into a concatenation layer to obtain O_123456 = Concatenation(O_12^4, O_3456^5), of size 1 × 2d. O_123456 is then input into a fully connected layer whose number of nodes is c and whose activation function is the Softmax function, giving the final output category.
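To make the data flow of steps S5 and S6 concrete, the following PyTorch sketch wires the thirteen layers, the average pooling and the fully connected layer together. The patent does not define the internal structure of a "multi-modal grouped convolution layer"; here it is approximated by channel-wise concatenation of the inputs followed by a grouped 3 × 3 convolution (one group per fused input) and Tanh, and all channel widths are assumptions chosen so that O_3456^4 and O_12^3 end up with d channels as stated.

```python
import torch
import torch.nn as nn

d, s, c = 120, 11, 6          # embodiment values for kernels and neighborhood; c = 6 classes assumed

class MMGroupConv(nn.Module):
    """Assumed stand-in for one 'multi-modal grouped convolution layer':
    concatenate the inputs channel-wise, then apply a grouped 3x3 convolution
    (one group per fused input) and a Tanh activation."""

    def __init__(self, in_ch, out_ch, groups):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1, groups=groups)
        self.act = nn.Tanh()

    def forward(self, *xs):
        return self.act(self.conv(torch.cat(xs, dim=1)))

# Channel budgets are assumptions: O12/O34/O56 carry 2d channels and every
# grouped layer emits d channels.
fuse = nn.ModuleDict({
    "l1": MMGroupConv(4 * d, d, 2),  "l2": MMGroupConv(2 * d, d, 2),  "l3": MMGroupConv(2 * d, d, 2),
    "l4": MMGroupConv(3 * d, d, 3),  "l5": MMGroupConv(d, d, 1),      "l6": MMGroupConv(d, d, 1),
    "l7": MMGroupConv(3 * d, d, 3),  "l8": MMGroupConv(d, d, 1),      "l9": MMGroupConv(d, d, 1),
    "l10": MMGroupConv(3 * d, d, 3), "l11": MMGroupConv(3 * d, d, 3),
    "l12": MMGroupConv(2 * d, d, 2), "l13": MMGroupConv(2 * d, d, 2),
})
pool = nn.AvgPool2d(kernel_size=s)
fc = nn.Linear(2 * d, c)

def fuse_and_classify(O12, O34, O56):                  # each: (batch, 2d, s, s)
    a1 = fuse["l1"](O34, O56); b1 = fuse["l2"](O34); c1 = fuse["l3"](O56)
    a2 = fuse["l4"](b1, a1, c1); b2 = fuse["l5"](b1); c2 = fuse["l6"](c1)
    a3 = fuse["l7"](b2, a2, c2); b3 = fuse["l8"](b2); c3 = fuse["l9"](c2)
    a4 = fuse["l10"](b3, a3, c3)                       # O_3456^4: (batch, d, s, s)
    p1 = fuse["l11"](O12, a1); p2 = fuse["l12"](p1, a2); p3 = fuse["l13"](p2, a3)
    v1 = pool(a4).flatten(1)                           # O_3456^5: (batch, d)
    v2 = pool(p3).flatten(1)                           # O_12^4:   (batch, d)
    logits = fc(torch.cat([v1, v2], dim=1))            # fully connected layer with c nodes
    return torch.softmax(logits, dim=1)                # predicted category probabilities

O12 = torch.randn(4, 2 * d, s, s); O34 = torch.randn(4, 2 * d, s, s); O56 = torch.randn(4, 2 * d, s, s)
probs = fuse_and_classify(O12, O34, O56)               # shape: (4, c)
```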
The remote sensing hyperspectral image and lidar image fusion classification method efficiently combines intrinsic image decomposition theory with multi-modal remote sensing image fusion classification research, gives full play to the advantages of the illumination image in model construction, and reduces the omission and loss of important discrimination information.
The invention introduces the intrinsic image decomposition of the hyperspectral image into multi-modal remote sensing image fusion classification research for the first time, balances the illumination information against the elevation information of the lidar image, fuses the illumination and elevation information to guide the mining of the intrinsic information, and thereby improves the classifiability of the samples.
The invention fully fuses the illumination image obtained by decomposing the hyperspectral image with the lidar image, so that the correlation between the illumination information of the hyperspectral image and the elevation information of the lidar image is fully mined and utilized, and the advantages of the illumination image in model construction are fully exploited.
The invention provides a discriminative feature extraction method for hyperspectral and lidar images that can greatly improve the correlation between the two images during information mining and reduce the occurrence of information imbalance.
The invention applies multi-modal grouped convolution layers to hyperspectral and lidar image fusion classification research, fully exploiting their value in this field, greatly improving the joint capability between different modalities, reducing unnecessary redundant information, and strengthening the expression of important information.
The hyperspectral image and lidar image used by the remote sensing hyperspectral image and lidar image fusion classification method and device were captured over Trento, Italy; the hyperspectral image has size 166 × 600 × 63 and the lidar image has size 166 × 600.
(I) Inputs of this embodiment
The input hyperspectral image is a 166 × 600 × 63 image, and the input lidar image is a 166 × 600 image.
(II) parameter setting
The neighborhood size is 11 and the number of convolution kernels per two-dimensional convolution layer is 120.
Decomposing the hyperspectral image to obtain an intrinsic image with the size of 166 multiplied by 600 multiplied by 63 and an illumination image with the size of 166 multiplied by 600 multiplied by 63.
And selecting neighborhood information, obtaining a neighborhood block with the size of 11 multiplied by 63 aiming at each pixel, and inputting the neighborhood block into a depth network for training.
(III) training the deep network model
From the total of 99,600 sample neighborhood blocks, 10% are randomly selected for training the deep network model; the selected sample neighborhood blocks are randomly shuffled and packed into mini-batches of 512 sample neighborhood blocks, and only one mini-batch is used for each training step. After training, all 99,600 sample neighborhood blocks are input into the deep network model for testing to obtain the classification results of all samples, and the overall classification accuracy and the average classification accuracy are used to evaluate the classification results. The overall classification accuracy is the number of correctly classified samples among all samples divided by the total number of samples. The average classification accuracy is obtained by dividing, for each class, the number of correctly classified samples by the number of samples in that class, and then averaging these per-class ratios.
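The two evaluation measures can be computed directly from the true and predicted labels of all samples; a small sketch (the function name is an assumption):

```python
import numpy as np

def overall_and_average_accuracy(y_true, y_pred):
    """Overall accuracy: correctly classified samples / total samples.
    Average accuracy: mean over classes of the per-class accuracy."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    oa = float(np.mean(y_true == y_pred))
    per_class = [float(np.mean(y_pred[y_true == c] == c)) for c in np.unique(y_true)]
    aa = float(np.mean(per_class))
    return oa, aa
```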
(IV) results of this example
The classification results obtained by the remote sensing hyperspectral image and lidar image fusion classification method and device provided by the invention and by a common multi-stream encoder are shown in Table 1 below.

Table 1
Method                          Overall classification accuracy    Average classification accuracy
The method of the invention     92.84%                             90.74%
Common multi-stream encoder     86.23%                             83.79%
The method of the invention fuses and classifies the hyperspectral image and the lidar image better, with fewer misclassified samples. In addition, when the hyperspectral image decomposition part of the method is removed and the experiment is repeated, the overall classification accuracy obtained is 85.49%, which shows that the method has strong information mining capability. In summary, the method can effectively improve the classifiability and the classification accuracy of multi-source remote sensing images.
In the following, a remote sensing hyperspectral image and lidar image fusion classification device disclosed by the second embodiment of the invention is introduced, and a remote sensing hyperspectral image and lidar image fusion classification device described below and a remote sensing hyperspectral image and lidar image fusion classification method described above can be referred to correspondingly.
Referring to fig. 2, an embodiment of the present invention provides a device for fusion and classification of a remote sensing hyperspectral image and a lidar image, including:
the data acquisition module 10, which is used for acquiring a hyperspectral image and a lidar image, wherein the category of each ground object in the two images is denoted label;
the image decomposition module 20, which is used for performing intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, and for selecting the surrounding neighborhood of size s × s as the neighborhood block of each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, wherein the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s × s × B and the neighborhood block of each lidar pixel in the lidar image L has size s × s;
the deep network training module 30, which is used for training the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 using the neighborhood blocks, wherein the inputs of L_1 and L_2 are hyperspectral intrinsic pixel blocks of size s × s × B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixel blocks of size s × s × B from the lidar image, and the inputs of L_5 and L_6 are hyperspectral illumination pixel blocks of size s × s × B from the hyperspectral illumination image; the outputs are O_1, O_2, O_3, O_4, O_5 and O_6, each of size s × s × d;
the image stitching module 40, which is used for concatenating the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise using a concatenation layer to obtain O_12, O_34 and O_56;
the multi-modal fusion module 50, which is used for inputting O_34 and O_56 into the 1st multi-modal grouped convolution layer to obtain the output O_3456^1; inputting O_34 into the 2nd multi-modal grouped convolution layer to obtain the output O_34^1; inputting O_56 into the 3rd multi-modal grouped convolution layer to obtain the output O_56^1; inputting O_34^1, O_3456^1 and O_56^1 into the 4th multi-modal grouped convolution layer to obtain the output O_3456^2; inputting O_34^1 into the 5th multi-modal grouped convolution layer to obtain the output O_34^2; inputting O_56^1 into the 6th multi-modal grouped convolution layer to obtain the output O_56^2; inputting O_34^2, O_3456^2 and O_56^2 into the 7th multi-modal grouped convolution layer to obtain the output O_3456^3; inputting O_34^2 into the 8th multi-modal grouped convolution layer to obtain the output O_34^3; inputting O_56^2 into the 9th multi-modal grouped convolution layer to obtain the output O_56^3; inputting O_34^3, O_3456^3 and O_56^3 into the 10th multi-modal grouped convolution layer to obtain the output O_3456^4; inputting O_12 and O_3456^1 into the 11th multi-modal grouped convolution layer to obtain the output O_12^1; inputting O_12^1 and O_3456^2 into the 12th multi-modal grouped convolution layer to obtain the output O_12^2; inputting O_12^2 and O_3456^3 into the 13th multi-modal grouped convolution layer to obtain the output O_12^3; and inputting O_3456^4 and O_12^3, each of size s × s × d, into a two-dimensional average pooling layer of size s × s to obtain O_3456^5 and O_12^4, each of size 1 × d;
the image classification module 60, which is used for inputting O_12^4 and O_3456^5 into a concatenation layer to obtain the output O_123456 of size 1 × 2d, and for inputting O_123456 into a fully connected layer to obtain the final output category.
In an embodiment of the present invention, the data obtaining module 10 includes a data preprocessing sub-module, and the data preprocessing sub-module is configured to perform normalization preprocessing on the hyperspectral image and the lidar image after selecting the hyperspectral image and the lidar image.
In an embodiment of the present invention, the image decomposition module 20 performs intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, including:
calculating, for each hyperspectral pixel H_i, where 1 ≤ i ≤ X × Y, the corresponding matrix D_i:
D_i = [H_1, ..., H_{i-1}, H_{i+1}, ..., H_{X×Y}, I_B] ∈ R^{B×(B+X×Y-1)}
wherein I_B is an identity matrix of size B × B;
calculating, based on the matrix D_i, the vector α_i corresponding to each hyperspectral pixel H_i:
min ||α_i||_1  s.t.  H_i = D_i α_i
wherein α_i has shape (B + X × Y - 1) × 1;
constructing a weight matrix W ∈ R^{(X×Y)×(X×Y)} and assigning the element W_ij in the i-th row and j-th column of the weight matrix according to the sparse coefficients α_i (the assignment formula is reproduced as an image in the source publication);
calculating the matrix G = (I_{X×Y} - W^T)(I_{X×Y} - W) + δ I_{X×Y} based on the weight matrix W, wherein I_{X×Y} is an identity matrix of size (X × Y) × (X × Y), δ is a constant, and T denotes the matrix transpose;
flattening the hyperspectral image H into a two-dimensional matrix and taking the logarithm to obtain log(flatten(H)), and calculating the matrix K = (I_B - 1_B 1_B^T / B) log(flatten(H)) (I_{X×Y} - 1_{X×Y} 1_{X×Y}^T / (X × Y)), wherein I_B and I_{X×Y} are identity matrices of size B × B and (X × Y) × (X × Y), and 1_B and 1_{X×Y} are all-ones vectors of size B × 1 and (X × Y) × 1, respectively;
calculating the matrix ρ = δKG^{-1} based on the matrices G and K, and obtaining, based on ρ, the intrinsic image RE = e^ρ and the illumination image SH = e^{log(H)-ρ} decomposed from the hyperspectral image H, wherein e is the natural constant and both images have size X × Y × B.
In an embodiment of the present invention, the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 each comprise multiple two-dimensional convolution layers; each two-dimensional convolution layer has d convolution kernels, the convolution kernel size is [3, 3], and the convolution kernel sliding stride is [1, 1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
The remote sensing hyperspectral image and lidar image fusion classification device is used for realizing the remote sensing hyperspectral image and lidar image fusion classification method, so that the specific implementation mode of the device can be seen from the embodiment part of the remote sensing hyperspectral image and lidar image fusion classification method in the foregoing, and therefore, the specific implementation mode can refer to the description of the corresponding partial embodiments, and the description is not expanded here.
In addition, since the remote sensing hyperspectral image and lidar image fusion classification device is used for realizing the remote sensing hyperspectral image and lidar image fusion classification method, the functions of the remote sensing hyperspectral image and lidar image fusion classification device correspond to those of the system, and the details are not repeated here.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Various other modifications and alterations will occur to those skilled in the art upon reading the foregoing description. It is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications may be made without departing from the spirit or scope of the invention.

Claims (10)

1. A remote sensing hyperspectral image and laser radar image fusion classification method is characterized by comprising the following steps:
s1: acquiring a hyperspectral image and a laser radar image, wherein the category of each object in the two images is label;
s2: performing intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, and selecting a neighborhood with the surrounding size of s multiplied by s as a neighborhood block of each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel, wherein the neighborhood block size of each hyperspectral pixel in the hyperspectral image is s multiplied by B, and the neighborhood block size of the lidar pixel in the lidar image L is s multiplied by s;
s3: training deep network branch L using neighborhood blocks 1 ,L 2 ,L 3 ,L 4 ,L 5 And L 6 Wherein L is 1 And L 2 The input of (a) is a hyperspectral intrinsic pixel with size of s × s × B in a hyperspectral intrinsic image, L 3 And L 4 The input of (a) is a lidar pixel of size sxsxsxsxb in a lidar image, L 5 And L 6 The input of the high spectrum illumination image is a high spectrum illumination pixel with the size of s multiplied by B, and the output is O 1 ,O 2 ,O 3 ,O 4 ,O 5 And O 6 The sizes are s multiplied by d;
s4: deep network tributaries L using a splice layer 1 ,L 2 ,L 3 ,L 4 ,L 5 And L 6 The output of (A) is spliced two by two to obtain O 12 、O 34 And O 56
S5: mixing O with 34 And O 56 Input to the 1 st multi-mode packet convolutional layer to obtain output O 34561 Introducing O into 34 Input into the 2 nd multi-mode packet convolution layer to obtain output O 34 1 Introducing O 56 Inputting to the 3 rd multi-mode packet convolution layer to obtain output O 56 1 Introducing O 34 1 、O 3456 1 And O 56 1 Inputting to the 4 th multi-mode packet convolution layer to obtain output O 3456 2 Introducing O 34 1 Input to the 5 th multi-mode packet convolution layer to obtain output O 34 2 Introducing O 56 1 Input into the 6 th multi-mode packet convolution layer to obtain output O 56 2 Introducing O 34 2 、O 3456 2 And O 56 2 Inputting to 7 th multi-mode packet convolution layer to obtain output O 3456 3 Introducing O into 34 2 Inputting into 8 th multi-mode packet convolution layer to obtain output O 34 3 Introducing O 56 2 Inputting the data into the 9 th multi-mode packet convolution layer to obtain the output O 56 3 Introducing O 34 3 、O 3456 3 And O 56 3 Input into the 10 th multi-mode packet convolution layer to obtain the output O 3456 4 Introducing O 12 And O 3456 1 Input into the 11 th multi-mode packet convolution layer to obtain the output O 12 1 Introducing O 12 1 And O 3456 2 Inputting into 12 th multi-mode packet convolution layer to obtain output O 12 2 Introducing O into 12 2 And O 3456 3 Inputting into 13 th multi-mode packet convolution layer to obtain output O 12 3 O of size s.times.s.times.d 3456 4 And O 12 3 Inputting a two-dimensional average pooling layer with the size of s multiplied by s to obtain O with the size of 1 multiplied by d 3456 5 And O 12 4
S6: inputting O_12^4 and O_3456^5 into a concatenation layer to obtain output O_123456 of size 1 × 2d, and inputting O_123456 into a fully connected layer to obtain the final predicted class (formula given as image FDA0003818659220000011).
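For illustration only, the following is a minimal PyTorch-style sketch of the fusion topology recited in steps S4–S6 of claim 1. The class and variable names (MMGroupConv, FusionHead), the 3 × 3 convolutions with padding, the ReLU activations, the channel counts and the use of the groups argument to stand in for a "multi-modal grouped convolution layer" are all assumptions made for this sketch, not details taken from the patent; only the wiring of the 13 layers, the s × s average pooling and the final fully connected layer follows the claim.

```python
# Sketch only: the grouped-convolution realisation and channel counts are assumptions.
import torch
import torch.nn as nn


class MMGroupConv(nn.Module):
    """One assumed 'multi-modal grouped convolution layer': concatenate the incoming
    feature maps along the channel axis, then apply a grouped 3x3 convolution."""

    def __init__(self, in_ch, out_ch, groups):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, groups=groups)

    def forward(self, *features):
        return torch.relu(self.conv(torch.cat(features, dim=1)))


class FusionHead(nn.Module):
    def __init__(self, d, s, num_classes):
        super().__init__()
        # O_12, O_34, O_56 each carry 2*d channels after the pairwise concatenation (S4).
        self.g1 = MMGroupConv(4 * d, d, groups=2)   # (O_34, O_56)              -> O_3456^1
        self.g2 = MMGroupConv(2 * d, d, groups=2)   # O_34                      -> O_34^1
        self.g3 = MMGroupConv(2 * d, d, groups=2)   # O_56                      -> O_56^1
        self.g4 = MMGroupConv(3 * d, d, groups=3)   # (O_34^1, O_3456^1, O_56^1) -> O_3456^2
        self.g5 = MMGroupConv(d, d, groups=1)       # O_34^1                    -> O_34^2
        self.g6 = MMGroupConv(d, d, groups=1)       # O_56^1                    -> O_56^2
        self.g7 = MMGroupConv(3 * d, d, groups=3)   #                           -> O_3456^3
        self.g8 = MMGroupConv(d, d, groups=1)       #                           -> O_34^3
        self.g9 = MMGroupConv(d, d, groups=1)       #                           -> O_56^3
        self.g10 = MMGroupConv(3 * d, d, groups=3)  #                           -> O_3456^4
        self.g11 = MMGroupConv(3 * d, d, groups=3)  # (O_12, O_3456^1)          -> O_12^1
        self.g12 = MMGroupConv(2 * d, d, groups=2)  # (O_12^1, O_3456^2)        -> O_12^2
        self.g13 = MMGroupConv(2 * d, d, groups=2)  # (O_12^2, O_3456^3)        -> O_12^3
        self.pool = nn.AvgPool2d(kernel_size=s)     # s x s average pooling (S5)
        self.fc = nn.Linear(2 * d, num_classes)     # final fully connected layer (S6)

    def forward(self, o12, o34, o56):
        a1 = self.g1(o34, o56)
        b1, c1 = self.g2(o34), self.g3(o56)
        a2 = self.g4(b1, a1, c1)
        b2, c2 = self.g5(b1), self.g6(c1)
        a3 = self.g7(b2, a2, c2)
        b3, c3 = self.g8(b2), self.g9(c2)
        a4 = self.g10(b3, a3, c3)
        e1 = self.g11(o12, a1)
        e2 = self.g12(e1, a2)
        e3 = self.g13(e2, a3)
        z = torch.cat([self.pool(e3).flatten(1), self.pool(a4).flatten(1)], dim=1)
        return self.fc(z)


# Toy shapes only: d = 12 keeps every channel count divisible by its group count.
d, s = 12, 7
net = FusionHead(d=d, s=s, num_classes=15)
o12, o34, o56 = (torch.randn(4, 2 * d, s, s) for _ in range(3))
print(net(o12, o34, o56).shape)  # torch.Size([4, 15])
```

Setting groups equal to the number of fused branches means each modality's channels are first filtered within their own group, which is one plausible reading of "multi-modal grouped convolution"; the patent may realise the grouping differently.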
2. The remote sensing hyperspectral image and lidar image fusion classification method according to claim 1, characterized in that: in step S1, after the hyperspectral image and the laser radar image are selected, both images are subjected to normalization preprocessing.
3. The remote sensing hyperspectral image and lidar image fusion classification method according to claim 1 or 2, characterized in that: the method for performing intrinsic image decomposition on the hyperspectral image to obtain the intrinsic image and the illumination image in the step S2 comprises the following steps:
S2.1: calculating, for each hyperspectral pixel H_i with 1 ≤ i ≤ X × Y, the corresponding matrix D_i:
D_i = [H_1, ..., H_(i-1), H_(i+1), ..., H_(X×Y), I_B] ∈ R^(B×(B+X×Y-1)),
where I_B is the identity matrix of size B × B;
S2.2: calculating, based on the matrix D_i, the vector α_i corresponding to each hyperspectral pixel H_i by solving
min ||α_i||_1  s.t.  H_i = D_i α_i,
where α_i has size (B+X×Y-1) × 1;
S2.3: constructing a weight matrix W ∈ R^((X×Y)×(X×Y)), assigning to the element W_ij in the i-th row and j-th column of the weight matrix the value given by formula image FDA0003818659220000021, and calculating, based on the weight matrix W, the matrix G = (I_(X×Y) - W^T)(I_(X×Y) - W) + δI_(X×Y), where I_(X×Y) is the identity matrix of size (X×Y)×(X×Y), δ is a constant, and the superscript T denotes the matrix transpose;
S2.4: transforming the hyperspectral image H into a two-dimensional matrix and taking the logarithm to obtain log(flatten(H)), then calculating the matrix K = (I_B - 1_B 1_B^T / B) log(flatten(H)) (I_(X×Y) - 1_(X×Y) 1_(X×Y)^T / (X×Y)), where I_B and I_(X×Y) are identity matrices of sizes B × B and (X×Y)×(X×Y), and 1_B and 1_(X×Y) are all-ones vectors of sizes B × 1 and (X×Y) × 1, respectively;
S2.5: calculating, based on the matrices G and K, the matrix ρ = δKG^(-1), and obtaining from ρ the intrinsic image RE = e^ρ and the illumination image SH = e^(log(H)-ρ) into which the hyperspectral image H is decomposed, where e is the natural constant and both images have size X × Y × B.
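As a worked illustration of the closed-form part of claim 3 (steps S2.3–S2.5), the following NumPy sketch builds W, G, K and ρ and recovers the intrinsic and illumination images. The sparse codes α_i from step S2.2 are assumed to be supplied (an L1 solver would normally produce them), the constant δ is chosen arbitrarily, and the W_ij assignment rule, which the claim gives only as formula image FDA0003818659220000021, is approximated here by copying the coefficients of α_i that correspond to the other pixels; all of these are assumptions for illustration, not the patented rule.

```python
# Sketch under stated assumptions; H is assumed strictly positive so the log is defined.
import numpy as np

def decompose(H, alphas, delta=1e-3):
    """H: hyperspectral cube of shape (X, Y, B); alphas[i]: sparse code of pixel i."""
    X, Y, B = H.shape
    n = X * Y
    Hf = H.reshape(n, B).T                      # flatten(H): B x (X*Y), one pixel per column

    # S2.3: weight matrix W and graph matrix G
    W = np.zeros((n, n))
    for i in range(n):
        coeff = alphas[i][: n - 1]              # coefficients for the other n-1 pixels
        W[i, :] = np.insert(coeff, i, 0.0)      # assumed rule: pixel i gets weight 0 for itself
    I_n = np.eye(n)
    G = (I_n - W.T) @ (I_n - W) + delta * I_n

    # S2.4: doubly centred log-image K
    I_B, one_B, one_n = np.eye(B), np.ones((B, 1)), np.ones((n, 1))
    logH = np.log(Hf)
    K = (I_B - one_B @ one_B.T / B) @ logH @ (I_n - one_n @ one_n.T / n)

    # S2.5: rho = delta * K * G^{-1}, then RE = e^rho and SH = e^{log(H) - rho}
    rho = delta * K @ np.linalg.inv(G)
    RE = np.exp(rho).T.reshape(X, Y, B)
    SH = np.exp(logH - rho).T.reshape(X, Y, B)
    return RE, SH

# Toy data only, to show the expected shapes (X=Y=8, B=5).
H = np.abs(np.random.rand(8, 8, 5)) + 0.1
alphas = [0.01 * np.random.rand(8 * 8 - 1 + 5) for _ in range(64)]
RE, SH = decompose(H, alphas)
print(RE.shape, SH.shape)  # (8, 8, 5) (8, 8, 5)
```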
4. The remote sensing hyperspectral image and lidar image fusion classification method according to claim 1, characterized in that: the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 in step S3 each comprise multiple two-dimensional convolution layers, each layer having d convolution kernels of size [3,3] with sliding stride [1,1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
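A hedged sketch of the branch structure in claim 4 follows. The kernel count d, kernel size [3,3], stride [1,1] and the L_2/L_3 and L_4/L_5 weight sharing follow the claim; the number of layers, the padding, the ReLU activations and the choice of B input channels for every branch are assumptions.

```python
# Sketch only: layer count, padding, activation and input channel counts are assumed.
import torch.nn as nn

def make_branch(in_channels: int, d: int, num_layers: int = 3) -> nn.Sequential:
    layers, c = [], in_channels
    for _ in range(num_layers):
        layers += [nn.Conv2d(c, d, kernel_size=3, stride=1, padding=1), nn.ReLU()]
        c = d
    return nn.Sequential(*layers)

B, d = 144, 12                  # hypothetical band count and kernel count
L1 = make_branch(B, d)          # hyperspectral intrinsic input
L2 = L3 = make_branch(B, d)     # one instance reused: L2 and L3 share all weights
L4 = L5 = make_branch(B, d)     # one instance reused: L4 and L5 share all weights
L6 = make_branch(B, d)          # hyperspectral illumination input
# Weight sharing is realised by reusing a single module instance, so the paired
# branches literally hold the same parameter tensors; B input channels are assumed
# for every branch, consistent with the claimed s x s x B input sizes.
```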
5. The remote sensing hyperspectral image and lidar image fusion classification method according to claim 1, characterized in that: the loss function constructed for training the deep network branches in step S3 is given by formula image FDA0003818659220000022.
6. The remote sensing hyperspectral image and lidar image fusion classification method according to claim 1 or 4, characterized in that: in step S4, splicing the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise using the concatenation layer to obtain O_12, O_34 and O_56 is expressed as O_12 = Concatenation(O_1, O_2), O_34 = Concatenation(O_3, O_4), O_56 = Concatenation(O_5, O_6).
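For concreteness, the pairwise splicing of claim 6 can be read as channel-axis concatenation of the branch outputs; the batch size, the shapes and the choice of the channel axis below are assumptions for illustration.

```python
# Assumed reading of Concatenation(.,.) as channel-axis concatenation.
import torch

d, s = 12, 7
O1, O2, O3, O4, O5, O6 = (torch.randn(4, d, s, s) for _ in range(6))
O12 = torch.cat([O1, O2], dim=1)   # -> [4, 2*d, s, s]
O34 = torch.cat([O3, O4], dim=1)
O56 = torch.cat([O5, O6], dim=1)
```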
7. A remote sensing hyperspectral image and laser radar image fusion classification device, characterized by comprising:
a data acquisition module for acquiring a hyperspectral image and a laser radar image, wherein the category of each object in the two images is labeled;
an image decomposition module for performing intrinsic image decomposition on the hyperspectral image to obtain an intrinsic image and an illumination image, and for selecting the surrounding s × s neighborhood of each hyperspectral intrinsic pixel, hyperspectral illumination pixel and lidar pixel as its neighborhood block, wherein the neighborhood block of each hyperspectral pixel in the hyperspectral image has size s × s × B and the neighborhood block of each lidar pixel in the lidar image L has size s × s;
a deep network training module for training deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 using the neighborhood blocks, wherein the inputs of L_1 and L_2 are hyperspectral intrinsic pixels of size s × s × B from the hyperspectral intrinsic image, the inputs of L_3 and L_4 are lidar pixels of size s × s × B from the lidar image, the inputs of L_5 and L_6 are hyperspectral illumination pixels of size s × s × B from the hyperspectral illumination image, and the outputs are O_1, O_2, O_3, O_4, O_5 and O_6, each of size s × s × d;
an image splicing module for splicing the outputs of the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 pairwise using a concatenation layer to obtain O_12, O_34 and O_56;
a multi-modal fusion module for inputting O_34 and O_56 into the 1st multi-modal grouped convolution layer to obtain output O_3456^1; inputting O_34 into the 2nd multi-modal grouped convolution layer to obtain output O_34^1; inputting O_56 into the 3rd multi-modal grouped convolution layer to obtain output O_56^1; inputting O_34^1, O_3456^1 and O_56^1 into the 4th multi-modal grouped convolution layer to obtain output O_3456^2; inputting O_34^1 into the 5th multi-modal grouped convolution layer to obtain output O_34^2; inputting O_56^1 into the 6th multi-modal grouped convolution layer to obtain output O_56^2; inputting O_34^2, O_3456^2 and O_56^2 into the 7th multi-modal grouped convolution layer to obtain output O_3456^3; inputting O_34^2 into the 8th multi-modal grouped convolution layer to obtain output O_34^3; inputting O_56^2 into the 9th multi-modal grouped convolution layer to obtain output O_56^3; inputting O_34^3, O_3456^3 and O_56^3 into the 10th multi-modal grouped convolution layer to obtain output O_3456^4; inputting O_12 and O_3456^1 into the 11th multi-modal grouped convolution layer to obtain output O_12^1; inputting O_12^1 and O_3456^2 into the 12th multi-modal grouped convolution layer to obtain output O_12^2; inputting O_12^2 and O_3456^3 into the 13th multi-modal grouped convolution layer to obtain output O_12^3; and inputting O_3456^4 and O_12^3, each of size s × s × d, into a two-dimensional average pooling layer of size s × s to obtain O_3456^5 and O_12^4, each of size 1 × d;
an image classification module for inputting O_12^4 and O_3456^5 into a concatenation layer to obtain output O_123456 of size 1 × 2d, and inputting O_123456 into a fully connected layer to obtain the final predicted class (formula given as image FDA0003818659220000031).
8. The remote sensing hyperspectral image and lidar image fusion classification device according to claim 7, characterized in that: the data acquisition module comprises a data preprocessing submodule for performing normalization preprocessing on the hyperspectral image and the laser radar image after the two images are selected.
9. The remote sensing hyperspectral image and lidar image fusion classification device according to claim 7 or 8, characterized in that: the image decomposition module performs intrinsic image decomposition on the hyperspectral image to obtain the intrinsic image and the illumination image by:
calculating, for each hyperspectral pixel H_i with 1 ≤ i ≤ X × Y, the corresponding matrix D_i:
D_i = [H_1, ..., H_(i-1), H_(i+1), ..., H_(X×Y), I_B] ∈ R^(B×(B+X×Y-1)),
where I_B is the identity matrix of size B × B;
calculating, based on the matrix D_i, the vector α_i corresponding to each hyperspectral pixel H_i by solving
min ||α_i||_1  s.t.  H_i = D_i α_i,
where α_i has size (B+X×Y-1) × 1;
constructing a weight matrix W ∈ R^((X×Y)×(X×Y)), assigning to the element W_ij in the i-th row and j-th column of the weight matrix the value given by formula image FDA0003818659220000041, and calculating, based on the weight matrix W, the matrix G = (I_(X×Y) - W^T)(I_(X×Y) - W) + δI_(X×Y), where I_(X×Y) is the identity matrix of size (X×Y)×(X×Y), δ is a constant, and the superscript T denotes the matrix transpose;
transforming the hyperspectral image H into a two-dimensional matrix and taking the logarithm to obtain log(flatten(H)), then calculating the matrix K = (I_B - 1_B 1_B^T / B) log(flatten(H)) (I_(X×Y) - 1_(X×Y) 1_(X×Y)^T / (X×Y)), where I_B and I_(X×Y) are identity matrices of sizes B × B and (X×Y)×(X×Y), and 1_B and 1_(X×Y) are all-ones vectors of sizes B × 1 and (X×Y) × 1, respectively;
calculating, based on the matrices G and K, the matrix ρ = δKG^(-1), and obtaining from ρ the intrinsic image RE = e^ρ and the illumination image SH = e^(log(H)-ρ) into which the hyperspectral image H is decomposed, where e is the natural constant and both images have size X × Y × B.
10. The remote sensing hyperspectral image and lidar image fusion classification device according to claim 7, characterized in that: the deep network branches L_1, L_2, L_3, L_4, L_5 and L_6 each comprise multiple two-dimensional convolution layers, each layer having d convolution kernels of size [3,3] with sliding stride [1,1]; branches L_2 and L_3 share all weights, and branches L_4 and L_5 share all weights.
CN202211037953.6A 2022-08-26 2022-08-26 Fusion classification method and device for remote sensing hyperspectral image and laser radar image Pending CN115331110A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211037953.6A CN115331110A (en) 2022-08-26 2022-08-26 Fusion classification method and device for remote sensing hyperspectral image and laser radar image
PCT/CN2022/142160 WO2024040828A1 (en) 2022-08-26 2022-12-27 Method and device for fusion and classification of remote sensing hyperspectral image and laser radar image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211037953.6A CN115331110A (en) 2022-08-26 2022-08-26 Fusion classification method and device for remote sensing hyperspectral image and laser radar image

Publications (1)

Publication Number Publication Date
CN115331110A true CN115331110A (en) 2022-11-11

Family

ID=83928217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211037953.6A Pending CN115331110A (en) 2022-08-26 2022-08-26 Fusion classification method and device for remote sensing hyperspectral image and laser radar image

Country Status (2)

Country Link
CN (1) CN115331110A (en)
WO (1) WO2024040828A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116167955A (en) * 2023-02-24 2023-05-26 苏州大学 Hyperspectral and laser radar image fusion method and system for remote sensing field
CN116740457A (en) * 2023-06-27 2023-09-12 苏州大学 Hyperspectral image and laser radar image fusion classification method and system
WO2024040828A1 (en) * 2022-08-26 2024-02-29 苏州大学 Method and device for fusion and classification of remote sensing hyperspectral image and laser radar image

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117830752B (en) * 2024-03-06 2024-05-07 昆明理工大学 Self-adaptive space-spectrum mask graph convolution method for multi-spectrum point cloud classification
CN117934978B (en) * 2024-03-22 2024-06-11 安徽大学 Hyperspectral and laser radar multilayer fusion classification method based on countermeasure learning


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090318815A1 (en) * 2008-05-23 2009-12-24 Michael Barnes Systems and methods for hyperspectral medical imaging
CN112967350B (en) * 2021-03-08 2022-03-18 哈尔滨工业大学 Hyperspectral remote sensing image eigen decomposition method and system based on sparse image coding
CN114742985A (en) * 2022-03-17 2022-07-12 苏州大学 Hyperspectral feature extraction method and device and storage medium
CN115331110A (en) * 2022-08-26 2022-11-11 苏州大学 Fusion classification method and device for remote sensing hyperspectral image and laser radar image

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200083902A1 (en) * 2016-12-13 2020-03-12 Idletechs As Method for handling multidimensional data
CN109993220A (en) * 2019-03-23 2019-07-09 西安电子科技大学 Multi-source Remote Sensing Images Classification method based on two-way attention fused neural network
CN112130169A (en) * 2020-09-23 2020-12-25 广东工业大学 Point cloud level fusion method for laser radar data and hyperspectral image
CN112819959A (en) * 2021-01-22 2021-05-18 哈尔滨工业大学 Hyperspectral image and laser radar data intrinsic hyperspectral point cloud generation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIN XD, GU YF: "Intrinsic Scene Properties from Hyperspectral Images and LiDAR", 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 9 April 2020 (2020-04-09), pages 1423 - 1431 *
TONG Qingxi; ZHANG Bing; ZHANG Lifu: "Current progress of hyperspectral remote sensing in China", Journal of Remote Sensing (遥感学报), no. 05, 25 September 2016 (2016-09-25) *


Also Published As

Publication number Publication date
WO2024040828A1 (en) 2024-02-29

Similar Documents

Publication Publication Date Title
CN115331110A (en) Fusion classification method and device for remote sensing hyperspectral image and laser radar image
CN108154194B (en) Method for extracting high-dimensional features by using tensor-based convolutional network
CN108491849B (en) Hyperspectral image classification method based on three-dimensional dense connection convolution neural network
CN111626300B (en) Image segmentation method and modeling method of image semantic segmentation model based on context perception
CN111325155B (en) Video motion recognition method based on residual difference type 3D CNN and multi-mode feature fusion strategy
CN108615010B (en) Facial expression recognition method based on parallel convolution neural network feature map fusion
WO2021082480A1 (en) Image classification method and related device
CN109063724B (en) Enhanced generation type countermeasure network and target sample identification method
CN111126256A (en) Hyperspectral image classification method based on self-adaptive space-spectrum multi-scale network
CN111310598B (en) Hyperspectral remote sensing image classification method based on 3-dimensional and 2-dimensional mixed convolution
CN113486851A (en) Hyperspectral image classification method based on double-branch spectrum multi-scale attention network
CN109934295B (en) Image classification and reconstruction method based on transfinite hidden feature learning model
CN107679539B (en) Single convolution neural network local information and global information integration method based on local perception field
CN110706214A (en) Three-dimensional U-Net brain tumor segmentation method fusing condition randomness and residual error
CN111915545A (en) Self-supervision learning fusion method of multiband images
CN111860683A (en) Target detection method based on feature fusion
CN113780147A (en) Lightweight hyperspectral ground object classification method and system with dynamic fusion convolution network
CN114463341A (en) Medical image segmentation method based on long and short distance features
CN113420838A (en) SAR and optical image classification method based on multi-scale attention feature fusion
CN114742985A (en) Hyperspectral feature extraction method and device and storage medium
CN116416441A (en) Hyperspectral image feature extraction method based on multi-level variational automatic encoder
CN116563606A (en) Hyperspectral image classification method based on dual-branch spatial spectrum global feature extraction network
CN108564116A (en) A kind of ingredient intelligent analysis method of camera scene image
CN113450297A (en) Fusion model construction method and system for infrared image and visible light image
Picard et al. An efficient system for combining complementary kernels in complex visual categorization tasks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination