CN117036879A - Method and device for identifying ground objects by spatial-spectral fusion of hyperspectral and laser radar multimodal images

Info

Publication number: CN117036879A
Application number: CN202310910549.3A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: layer, hyperspectral, image, samples, fusion
Inventors: 李树涛 (Li Shutao), 丁可心 (Ding Kexin), 卢婷 (Lu Ting), 付巍 (Fu Wei)
Assignee (current and original): Hunan University
Application filed by Hunan University; priority to CN202310910549.3A
Legal status: Pending

Classifications

    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/194 Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning


Abstract

The application discloses a method and device for ground-object recognition by spatial-spectral fusion of hyperspectral and laser radar (LiDAR) multimodal images. The method first generates a hyperspectral spatial image block and a spectral vector from a hyperspectral image, and a radar spatial image block from a radar image. A multi-level spatial-spectral fusion encoder network then performs spatial-spectral feature-level fusion to obtain multi-level fused encoding features, which a classifier network classifies to produce the recognition result for each ground-object sample. The multi-level spatial-spectral fusion encoder network consists of a dual-branch spatial feature encoder branch and a spectral feature encoder branch whose outputs are multiplied; each spatial feature encoder branch, as well as the spectral feature encoder branch, comprises a feature embedding layer followed by several cascaded encoders. The application aims to solve the problem of fusing heterogeneous multi-source remote sensing data such as hyperspectral and laser radar multimodal images and to mine multi-level spatial-spectral fusion features, so that the performance of the classifier network does not depend heavily on the number of labeled samples.

Description

Method and device for identifying ground objects by spatial-spectral fusion of hyperspectral and laser radar multimodal images
Technical Field
The application relates to the technical field of remote sensing image processing, and in particular to a method and device for ground-object recognition by spatial-spectral fusion of hyperspectral and laser radar multimodal images.
Background
With the rapid development of remote sensing imaging technology, the amount of available multimodal remote sensing data has grown, driving further improvements in remote sensing image classification performance. Remote sensing hyperspectral images not only represent the spatial structure of ground objects but also offer the unique advantages of high spectral resolution and integrated spatial-spectral information, and are widely used in land-cover classification tasks. Laser radar (LiDAR) data, constructed from reflected-light information, can measure the elevation of targets, offers long coverage distance and strong penetration capability, and is little affected by environmental factors such as weather and occlusion. Because the ground-object information contained in remote sensing hyperspectral images and LiDAR data is complementary, jointly exploiting the two can further improve ground-object recognition accuracy, and this has become a research hotspot in the field.
However, along with the opportunity of large-scale multimodal remote sensing images becoming available for research, the ground-object recognition task faces new problems and challenges. First, the heterogeneity of multimodal data aggravates the difficulty of information fusion, and each single modality has drawbacks of its own to overcome. Remote sensing hyperspectral images have high spectral dimensionality and highly similar spectral information in adjacent bands, which causes data redundancy and easily leads to the "curse of dimensionality", where recognition performance degrades as the data dimension grows. LiDAR data contain only limited spatial information and cannot by themselves reflect ground-object categories effectively. Moreover, remote sensing hyperspectral and LiDAR data reflect different aspects of ground-object properties, so the modeling process must account for the differences between heterogeneous data before fusing the complementary multimodal information. Second, remote sensing image recognition has long faced the "small sample" problem: the scarcity of labeled samples limits performance gains in joint hyperspectral and LiDAR recognition tasks. Therefore, how to fully fuse the spatial-spectral information of hyperspectral images with the elevation information of LiDAR and achieve high-accuracy ground-object recognition with low dependence on labeled samples is a front-line challenge to be solved in remote sensing image recognition.
In recent years, deep learning models have been widely applied to joint remote sensing hyperspectral and LiDAR image recognition owing to their strong feature extraction capability, and have achieved good performance. However, for multimodal image fusion and recognition, how to deeply mine and jointly use the complementary information of the multimodal images remains a difficult problem. Zhao et al., in "Joint Classification of Hyperspectral and LiDAR Data Using Hierarchical Random Walk and Deep CNN Architecture, IEEE Transactions on Geoscience and Remote Sensing, 2020, DOI: 10.1109/TGRS.2020.2982064", first designed a dual-branch convolutional neural network (CNN) structure to extract the spatial and spectral features of hyperspectral images, then proposed a pixel-similarity branch dedicated to analyzing the elevation information in LiDAR, and finally adopted a hierarchical random walk layer to jointly optimize the global prior information from the dual-channel CNN and the local similarity information from the pixel-similarity branch, enhancing the spatial consistency of the deep layers of the network. Hong et al., in "Deep Encoder-Decoder Networks for Classification of Hyperspectral and LiDAR Data, IEEE Geoscience and Remote Sensing Letters, 2022, DOI: 10.1109/LGRS.2020.3017414", proposed a system based on an encoder-decoder network architecture, in which the feature information of hyperspectral and LiDAR images is extracted and fused in an encoder and the fused encoded features are finally reconstructed by a decoder, achieving more compact information fusion and more efficient information transfer. Considering the limited receptive field of deep learning algorithms, Swalpa Kumar Roy et al., in "Multimodal Fusion Transformer for Remote Sensing Image Classification, arXiv e-prints, 2022, DOI: 10.48550/arxiv.2203.16952", introduced a Transformer architecture to model hyperspectral images globally, using the feature embedding of LiDAR as an external class token, improving multimodal information interaction through an attention mechanism and enhancing the generalization capability of the model. Hang et al., in "Classification of Hyperspectral and LiDAR Data Using Coupled CNNs, IEEE Transactions on Geoscience and Remote Sensing, 2020, DOI: 10.1109/TGRS.2020.2969024", proposed a parameter-sharing strategy that couples two CNNs, one learning spatial-spectral features from hyperspectral images and the other capturing elevation information from LiDAR, improving recognition accuracy and model efficiency through the joint use of feature-level and decision-level fusion. However, the above classification methods neither fully explore and exploit the unsupervised information in unlabeled samples nor sufficiently mine the multimodal deep fusion features, so model performance depends heavily on the number of labeled samples and the overall recognition performance is poor.
Disclosure of Invention
The technical problem to be solved by the application is as follows: in view of the problems in the prior art, the application provides a method and device for ground-object recognition by spatial-spectral fusion of hyperspectral and laser radar multimodal images, which aim to solve the problem of fusing heterogeneous multi-source remote sensing data such as hyperspectral and laser radar multimodal images and to mine multi-level spatial-spectral fusion features, so that the performance of the classifier network does not depend heavily on the number of labeled samples, thereby improving both the training efficiency and the recognition accuracy of the classifier network.
To solve the above technical problems, the application adopts the following technical solution:
a hyperspectral and laser radar multimode image space spectrum fusion ground object identification method comprises the following steps:
s101, according to hyperspectral image T h Generating hyperspectral spatial image block X h And spectral vector V h The method comprises the steps of carrying out a first treatment on the surface of the From radar image T l Generating a radar aerial image block X l
S102, utilizing a multistage space-spectrum fusion encoder network to perform image block X on hyperspectral space h Spectral vector V h Radar aerial image block X l Proceeding withSpace-spectrum characteristic level fusion to obtain multi-level fusion coding characteristic F k The method comprises the steps of carrying out a first treatment on the surface of the The multi-stage space-spectrum fusion encoder network consists of a dual-branch space feature encoder branch and a spectrum feature encoder branch, wherein the dual-branch space feature encoder branch comprises two space feature encoder branches with output weighted summation, the space feature encoder branch comprises a feature embedding layer and a plurality of cascaded space feature encoders, the space feature encoders of the two space feature encoder branches are identical in structure, the weights of the space feature encoders of the same stage are shared, and the space feature encoders are respectively used for hyperspectral space image blocks X h Radar aerial image block X l As input, the spectral signature encoder branch comprises a signature embedding layer and a plurality of spectral signature encoders in cascade and in a spectral vector V h As input;
s103, fusing the multi-level fusion coding characteristic F k Classifying by using classifier network to obtain prediction probability P k And taking the ground object category corresponding to the maximum prediction probability as a recognition result to be output.
Optionally, the spatial feature encoder consists of a depth-separable convolution module and a downsampling module. The depth-separable convolution module comprises a channel-by-channel convolution layer, a layer normalization layer, a point-by-point convolution layer, a Gaussian error linear unit activation layer and a fully connected layer connected in sequence; the downsampling module comprises a layer normalization layer and a two-dimensional convolution layer with a 2×2 convolution kernel.
Optionally, the spectral feature encoder consists of a depth-separable convolution module and a channel transformation module. The depth-separable convolution module comprises a channel-by-channel convolution layer, a layer normalization layer, a point-by-point convolution layer, a Gaussian error linear unit activation layer and a fully connected layer connected in sequence; the channel transformation module comprises a layer normalization layer and a two-dimensional convolution layer with a 1×1 convolution kernel.
Optionally, the feature embedding layers of the dual-branch spatial feature encoder branch and of the spectral feature encoder branch consist of a two-dimensional convolution module with a 1×1 convolution kernel and a layer normalization module.
Optionally, the classifier network is a multi-level joint classification head comprising k cascaded classification heads, one depth-separable convolution module, a global average pooling module, a fully connected module and a Softmax normalization module. Each of the k classification heads consists of a depth-separable convolution module and a channel transformation module; the depth-separable convolution module comprises a channel-by-channel convolution layer, a layer normalization layer, a point-by-point convolution layer, a Gaussian error linear unit activation layer and a fully connected layer connected in sequence, and the channel transformation module comprises a layer normalization layer and a two-dimensional convolution layer with a 1×1 convolution kernel.
Optionally, step S101 is preceded by the steps of constructing a training data set of labeled and unlabeled samples from the hyperspectral image T_h and the radar image T_l, and training the multi-level spatial-spectral fusion encoder network and the classifier network on the training data set in a semi-supervised manner. Training includes mapping the multi-level fused encoding features F_k through a depth mapping head to obtain depth mapping features f, and performing unlabeled-sample contrastive learning on f to compute the unlabeled-sample contrastive losses. The loss function used in training is

$$\mathcal{L} = \mathcal{L}_{cls} + \lambda\left(\mathcal{L}_{easy} + \mathcal{L}_{hard}\right),$$

where $\mathcal{L}$ is the total loss, $\mathcal{L}_{cls}$ is the labeled-sample classification loss, and $\lambda$ is a weight coefficient; the classification loss is the multi-level cross-entropy

$$\mathcal{L}_{cls} = -\sum_{k}\frac{1}{N_b}\sum_{i=1}^{N_b}\sum_{c=1}^{C} y_{ic}\,\log p^{k}_{ic},$$

where k indexes the levels of the encoding features, $N_b$ is the number of labeled samples in a batch, c denotes the true label of each ground-object category, C is the total number of ground-object categories, $y_{ic}$ is a sign function that takes 1 if the label of sample i equals c and 0 otherwise, and $p^{k}_{ic}$ is the probability, predicted from the level-k encoding features, that sample i belongs to class c. $\mathcal{L}_{easy}$ and $\mathcal{L}_{hard}$ are the contrastive losses of the easy (simple) samples and of the hard-to-separate samples, respectively; they are computed from the number of easy samples $N_{easy}$ and of hard samples $N_{hard}$ in a batch, the masks $w^{easy}_{ij}$ and $w^{hard}_{ij}$ between the i-th and j-th easy or hard samples, the pairwise distances $d_{ij}$ between those samples, and a margin hyperparameter m. The easy and hard samples are obtained by dividing the depth mapping features f of the unlabeled samples according to their prediction probabilities P_k, based on a probability distribution criterion and a feature learning criterion.
Optionally, the depth mapping head comprises cascaded multi-level classification heads, one depth-separable convolution module and a global average pooling module. Each level of classification head consists of a depth-separable convolution module and a channel transformation module; the depth-separable convolution module comprises a channel-by-channel convolution layer, a layer normalization layer, a point-by-point convolution layer, a Gaussian error linear unit activation layer and a fully connected layer connected in sequence, and the channel transformation module comprises a layer normalization layer and a two-dimensional convolution layer with a 1×1 convolution kernel.
Optionally, constructing the training data set of labeled and unlabeled samples from the hyperspectral image T_h and the radar image T_l comprises: randomly selecting N labeled samples and aN unlabeled samples from T_h and T_l, where a is a multiple parameter greater than 1 and N = M × C, with M the number of ground-object samples of each category randomly selected from the labeled samples and C the total number of ground-object categories; taking, around the center pixels of the labeled and unlabeled samples of the hyperspectral image T_h, hyperspectral spatial image blocks X_h with square cross-section of the specified spatial size; taking, around the center pixels of the labeled and unlabeled samples of the radar image T_l, radar spatial image blocks X_l with square cross-section of the specified spatial size; taking, along the spectral dimension of the hyperspectral image T_h, the spectral vectors V_h corresponding to the pixel locations of the labeled and unlabeled samples, whose spatial size is 1 × 1 and which reflect the spectral properties of the ground object; and constructing the training data set from the X_h, V_h and X_l of the labeled and unlabeled samples.
In addition, the application provides a hyperspectral and laser radar multimodal image spatial-spectral fusion ground-object recognition system, comprising an interconnected microprocessor and memory, the microprocessor being programmed or configured to execute the above ground-object recognition method.

In addition, the application provides a computer-readable storage medium storing a computer program to be programmed or configured by a microprocessor to execute the above ground-object recognition method.
Compared with the prior art, the application has the following advantages. The application performs spatial-spectral feature-level fusion of the hyperspectral spatial image block X_h, the spectral vector V_h and the radar spatial image block X_l with a multi-level spatial-spectral fusion encoder network to obtain multi-level fused encoding features F_k. The multi-level spatial-spectral fusion encoder network consists of a dual-branch spatial feature encoder branch and a spectral feature encoder branch whose outputs are multiplied; the dual-branch spatial feature encoder branch comprises two spatial feature encoder branches whose outputs are combined by weighted summation, each comprising a feature embedding layer and several cascaded spatial feature encoders, with identical encoder structures and shared weights between encoders of the same level; the spectral feature encoder branch comprises a feature embedding layer and several cascaded spectral feature encoders. This solves the problem of fusing heterogeneous multi-source remote sensing data such as hyperspectral and laser radar multimodal images and mines multi-level spatial-spectral fusion features, and the multi-level fused encoding features F_k can be classified by the classifier network to obtain the prediction probabilities P_k and recognize the ground-object samples. As a result, the performance of the classifier network does not depend heavily on the number of labeled samples, which improves both the training efficiency and the recognition accuracy of the classifier network. The method can efficiently extract joint spatial-spectral feature representations of multimodal remote sensing images such as remote sensing hyperspectral and laser radar images, fully mine the semantic information of labeled and unlabeled samples, and achieve high-accuracy multi-level spatial-spectral fusion recognition of multimodal remote sensing images under the "small sample" problem.
Drawings
Fig. 1 is a schematic diagram of a network structure according to a method of an embodiment of the present application.
Fig. 2 is a schematic diagram of a basic flow of a method according to an embodiment of the present application.
FIG. 3 compares the overall accuracy (%) of each method for different numbers of training samples per class in an embodiment of the present application.

FIG. 4 compares the average accuracy (%) of each method for different numbers of training samples per class in an embodiment of the present application.

FIG. 5 compares the Kappa coefficient (%) of each method for different numbers of training samples per class in an embodiment of the present application.
Detailed Description
Referring to fig. 1, the method for ground-object recognition by spatial-spectral fusion of hyperspectral and laser radar multimodal images in this embodiment comprises the following steps:
s101, according to hyperspectral image T h Generating hyperspectral spatial image block X h And spectral vector V h The method comprises the steps of carrying out a first treatment on the surface of the From radar image T l Generating a radar aerial image block X l
S102, utilizing a multistage space-spectrum fusion encoder network to perform image block X on hyperspectral space h Spectral vector V h Radar aerial image block X l Performing space-spectrum characteristic level fusion to obtain a multi-level fusion coding characteristic F k The method comprises the steps of carrying out a first treatment on the surface of the The multi-stage spatial-spectral fusion encoder network consists of two branches of output-multiplied spatial signature encoders and a branch of spectral signature encoders, the branch of two branches of output-weighted summation of spatial signature encoders comprising a signature embedding layer and a plurality of concatenated spatial signature encoders (3 x in fig. 1, three spatial signature encoders are concatenated, sok=3, multilevel fusion coding feature F k Representable multi-level fusion coding feature F 3 ) And the spatial feature encoders of the two spatial feature encoder branches have the same structure, and the weights of the spatial feature encoders of the same level are shared and respectively used as hyperspectral spatial image blocks X h Radar aerial image block X l As input, the spectral signature encoder branch comprises a signature embedding layer and a plurality of spectral signature encoders in cascade and in a spectral vector V h As input;
s103, fusing the multi-level fusion coding characteristic F k Classifying by using classifier network to obtain prediction probability P k And the ground object category corresponding to the maximum prediction probability is taken as the recognition result to be output, which can be expressed as:
in the above-mentioned method, the step of,is the firstiClassification result of individual samples,/->Is the firstiIndividual samplesBelonging to the firstcThe prediction probability of the individual ground object category,Cis the total number of the ground object categories.
As shown in fig. 1, the spatial feature encoder of this embodiment consists of a depth-separable convolution module and a downsampling module. The depth-separable convolution module comprises a channel-by-channel convolution layer (DConv 7×7), a layer normalization layer (LayerNorm), a point-by-point convolution layer (Conv 1×1), a Gaussian error linear unit activation layer (GELU) and a fully connected layer (Conv 1×1) connected in sequence; the downsampling module comprises a layer normalization layer (LayerNorm) and a two-dimensional convolution layer with a 2×2 convolution kernel.

As shown in fig. 1, the spectral feature encoder of this embodiment consists of a depth-separable convolution module and a channel transformation module. The depth-separable convolution module comprises a channel-by-channel convolution layer (DConv 7×7), a layer normalization layer (LayerNorm), a point-by-point convolution layer (Conv 1×1), a Gaussian error linear unit activation layer (GELU) and a fully connected layer (Conv 1×1) connected in sequence; the channel transformation module comprises a layer normalization layer (LayerNorm) and a two-dimensional convolution layer with a 1×1 convolution kernel (Conv 1×1).
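A minimal PyTorch sketch of these building blocks follows. The class names, the 4× channel expansion in the point-by-point convolution, and the residual connection are assumptions about the implementation, not details stated in the patent text.

```python
import torch
import torch.nn as nn

class LayerNorm2d(nn.Module):
    """LayerNorm over the channel dimension of an NCHW tensor."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

class DepthSeparableBlock(nn.Module):
    """DConv 7x7 -> LayerNorm -> Conv 1x1 -> GELU -> Conv 1x1."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)  # channel-by-channel conv
        self.norm = LayerNorm2d(dim)                                 # layer normalization
        self.pwconv = nn.Conv2d(dim, 4 * dim, 1)                     # point-by-point conv
        self.act = nn.GELU()                                         # GELU activation
        self.fc = nn.Conv2d(4 * dim, dim, 1)                         # fully connected layer as 1x1 conv

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.fc(self.act(self.pwconv(self.norm(self.dwconv(x)))))

class Downsample(nn.Module):
    """Spatial feature encoder tail: LayerNorm, then a 2x2 strided convolution."""
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.norm = LayerNorm2d(dim_in)
        self.conv = nn.Conv2d(dim_in, dim_out, kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(self.norm(x))

class ChannelTransform(nn.Module):
    """Spectral feature encoder tail: LayerNorm, then a 1x1 convolution."""
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.norm = LayerNorm2d(dim_in)
        self.conv = nn.Conv2d(dim_in, dim_out, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(self.norm(x))
```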
As shown in fig. 1, the feature embedding layers of the dual-branch spatial feature encoder branch and of the spectral feature encoder branch of this embodiment consist of a two-dimensional convolution module with a 1×1 convolution kernel (Conv 1×1) and a layer normalization module (LayerNorm).
As shown in fig. 1, the classifier network of this embodiment is a multi-level joint classification head comprising k cascaded classification heads (levels 1, 2, 3, etc.), one depth-separable convolution module, a global average pooling module (Avgpool), a fully connected module (Conv 1×1) and a Softmax normalization module. Each of the k classification heads consists of a depth-separable convolution module and a channel transformation module; the depth-separable convolution module comprises a channel-by-channel convolution layer (DConv 7×7), a layer normalization layer (LayerNorm), a point-by-point convolution layer (Conv 1×1), a Gaussian error linear unit activation layer (GELU) and a fully connected layer (Conv 1×1) connected in sequence, and the channel transformation module comprises a layer normalization layer (LayerNorm) and a two-dimensional convolution layer with a 1×1 convolution kernel (Conv 1×1).
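A hedged sketch of this classification head follows, reusing DepthSeparableBlock and ChannelTransform from the sketch above. Keeping a constant channel width across the k heads and feeding only the fused feature into the cascade are assumptions; the text does not specify these details.

```python
import torch
import torch.nn as nn

class JointClassificationHead(nn.Module):
    def __init__(self, dim: int, num_classes: int, k: int = 3):
        super().__init__()
        # k cascaded classification heads: DS conv module + channel transform
        self.heads = nn.Sequential(*[
            nn.Sequential(DepthSeparableBlock(dim), ChannelTransform(dim, dim))
            for _ in range(k)])
        self.block = DepthSeparableBlock(dim)         # one more DS conv module
        self.pool = nn.AdaptiveAvgPool2d(1)           # global average pooling
        self.fc = nn.Conv2d(dim, num_classes, 1)      # fully connected module

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.block(self.heads(f)))             # (B, dim, 1, 1)
        return torch.softmax(self.fc(x).flatten(1), dim=1)   # probabilities P_k, (B, C)
```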
In this embodiment, the multi-level spatial-spectral fusion encoder network consists of a dual-branch spatial feature encoder branch and a spectral feature encoder branch whose outputs are multiplied; the output multiplication has the functional expression

$$F_k = F^{spa}_k \otimes F^{spe}_k,$$

where $F^{spa}_k$ is the multi-level spatial fusion encoding feature output by the dual-branch spatial feature encoder branch, $F^{spe}_k$ is the multi-level spectral encoding feature output by the spectral feature encoder branch, and $\otimes$ denotes tensor multiplication.
In this embodiment, the dual-branch spatial feature encoder branch comprises two spatial feature encoder branches whose outputs are combined by weighted summation with the functional expression

$$F^{spa}_k = \alpha\, F^{h}_k + (1-\alpha)\, F^{l}_k,$$

where $\alpha$ is an adaptive weight coefficient, and $F^{h}_k$ and $F^{l}_k$ are the spatial encoding features obtained by passing the hyperspectral spatial image block X_h and the radar spatial image block X_l through their respective spatial feature encoder branches: $F^{h}_k$ is the hyperspectral spatial encoding feature and $F^{l}_k$ is the laser radar spatial encoding feature.
As shown in fig. 2, in this embodiment step S101 is preceded by the steps of constructing a training data set of labeled and unlabeled samples from the hyperspectral image T_h and the radar image T_l, and training the multi-level spatial-spectral fusion encoder network and the classifier network on the training data set in a semi-supervised manner. Training includes mapping the multi-level fused encoding features F_k through a depth mapping head to obtain depth mapping features f, and performing unlabeled-sample contrastive learning on f to compute the unlabeled-sample contrastive losses. The loss function used in training is

$$\mathcal{L} = \mathcal{L}_{cls} + \lambda\left(\mathcal{L}_{easy} + \mathcal{L}_{hard}\right),$$

where $\mathcal{L}$ is the total loss, $\mathcal{L}_{cls}$ is the labeled-sample classification loss, and $\lambda$ is a weight coefficient balancing the contributions of the classification task and the contrastive-learning task, whose value can be chosen empirically (for example $\lambda = 0.1$). The classification loss is the multi-level cross-entropy

$$\mathcal{L}_{cls} = -\sum_{k}\frac{1}{N_b}\sum_{i=1}^{N_b}\sum_{c=1}^{C} y_{ic}\,\log p^{k}_{ic},$$

where k indexes the levels of the encoding features, $N_b$ is the number of labeled samples in a batch, c denotes the true label of each ground-object category, C is the total number of ground-object categories, $y_{ic}$ is a sign function that takes 1 if the label of sample i equals c and 0 otherwise, and $p^{k}_{ic}$ is the probability, predicted from the level-k encoding features, that sample i belongs to class c. $\mathcal{L}_{easy}$ and $\mathcal{L}_{hard}$ are the contrastive losses of the easy samples and of the hard-to-separate samples, respectively; they are computed from the number of easy samples $N_{easy}$ and of hard samples $N_{hard}$ in a batch, the masks $w^{easy}_{ij}$ and $w^{hard}_{ij}$ between the i-th and j-th easy or hard samples, the pairwise distances $d_{ij}$ between those samples, and a margin hyperparameter m whose value can be chosen empirically (for example m = 2). The easy and hard samples are obtained by dividing the depth mapping features f of the unlabeled samples according to their prediction probabilities P_k, based on a probability distribution criterion and a feature learning criterion. Aiming at the problems of heterogeneous multi-source remote sensing data fusion and of the deep model's dependence on the number of labeled samples, the method combines advanced machine-learning paradigms such as deep learning and contrastive learning to extract more compact and more discriminative multimodal multi-level spatial-spectral fusion features, explores an organic combination of supervised and unsupervised information, and provides a semi-supervised contrastive-learning framework for accurate recognition, realizing high-accuracy multi-level spatial-spectral fusion recognition of multimodal remote sensing images such as remote sensing hyperspectral and laser radar images.
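A hedged sketch of this training loss follows. The multi-level cross-entropy and the weighting L = L_cls + lambda * (L_easy + L_hard) follow the text; the exact pairwise form of the contrastive terms is not reproduced in the text, so a standard margin-based contrastive loss over the masked pairwise distances is assumed here.

```python
import torch
import torch.nn.functional as F

def contrastive(d, w, margin=2.0):
    # d: pairwise distances between mapped features; w: pairwise mask
    pos = w * d.pow(2)                          # pull masked positives together
    neg = (1 - w) * F.relu(margin - d).pow(2)   # push the rest beyond the margin
    return (pos + neg).mean()

def total_loss(level_logits, labels, d_easy, w_easy, d_hard, w_hard, lam=0.1):
    # labeled-sample classification loss, summed over the k feature levels;
    # per-level logits (rather than probabilities) are assumed as input here
    l_cls = sum(F.cross_entropy(logits, labels) for logits in level_logits)
    return l_cls + lam * (contrastive(d_easy, w_easy)
                          + contrastive(d_hard, w_hard))
```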
In this embodiment, when the depth mapping features f are divided into easy samples and hard-to-separate samples according to the prediction probabilities P_k of the unlabeled samples, the features corresponding to the easy samples are the easy-sample features f_easy and the features corresponding to the hard samples are the hard-sample features f_hard; the division is based on a probability distribution criterion and a feature learning criterion.
The easy-sample features satisfying the probability distribution criterion are

$$f^{p}_{easy} = \left\{ f_i \;\middle|\; \max_c p^{3}_{ic} > \mu + \sigma \right\},$$

where $\max_c p^{3}_{ic}$ is the maximum of the prediction probabilities obtained from the deep encoding features F_3 (i.e. F_k with k = 3) of the i-th sample, and $\mu$ and $\sigma$ are its mean and standard deviation over the batch.

The easy-sample features satisfying the feature learning criterion are

$$f^{f}_{easy} = \left\{ f_i \;\middle|\; \arg\max_c p^{1}_{ic} = \arg\max_c p^{2}_{ic} \right\},$$

that is, the pseudo-labels given by the maximum of the prediction probabilities obtained from the encoding features F_1 (i.e. F_k, k = 1) and from the encoding features F_2 (i.e. F_k, k = 2) of the i-th sample are the same. The easy-sample features satisfying both the probability distribution criterion and the feature learning criterion are the intersection

$$f_{easy} = f^{p}_{easy} \cap f^{f}_{easy}.$$

The remaining unlabeled-sample features are assigned to the hard-sample features f_hard.
The mask matrix W_easy corresponding to the easy-sample features f_easy is computed as

$$w^{easy}_{ij} = \begin{cases} 1, & \hat{y}_i = \hat{y}_j, \\ 0, & \hat{y}_i \neq \hat{y}_j, \end{cases} \qquad i, j \in \{1,\dots,N_{easy}\},$$

where $w^{easy}_{ij}$ is the element in row i, column j of W_easy, and $\hat{y}_i$ and $\hat{y}_j$ are the pseudo-labels of the i-th and j-th easy samples, the pseudo-label being the category corresponding to the maximum of the prediction probability P_3 (i.e. P_k with k = 3). When $\hat{y}_i = \hat{y}_j$, the j-th easy sample is regarded as a positive sample of the i-th easy sample; otherwise it is regarded as a negative sample.

The mask matrix W_hard corresponding to the hard-sample features f_hard is computed as follows. Let $c^{1}_i$ be the class label corresponding to the largest probability value predicted for the i-th hard sample, and $c^{s}_j$ the class label corresponding to the s-th largest probability value predicted for the j-th hard sample: sorting the prediction of the j-th hard sample in descending order gives $\{p^{(1)}_j, p^{(2)}_j, \dots\}$, where $p^{(s)}_j$ denotes its s-th largest probability value, $s \in \{1,\dots,C\}$. The element $w^{hard}_{ij}$ in row i, column j of W_hard is taken from a preset weight dictionary indexed by the rank s at which $c^{1}_i$ appears among the sorted predictions of the j-th hard sample, i.e. $w^{hard}_{ij} = \omega_s$ when $c^{1}_i = c^{s}_j$. The value of $w^{hard}_{ij}$ can be adjusted dynamically and adaptively according to $p^{(s)}_j$, and its range is [-1, 1], reflecting the uncertainty of positive and negative pairs among the hard unlabeled samples and encouraging more discriminative feature learning.
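A sketch of the two mask computations follows. The rank-indexed weight dictionary for hard samples is represented as a length-C tensor `weights` with values in [-1, 1]; its contents are a preset design choice not given in the text.

```python
import torch

def easy_mask(pseudo):
    # pseudo: (n_easy,) pseudo-labels from P_3; 1 marks positive pairs, 0 negative
    return (pseudo[:, None] == pseudo[None, :]).float()

def hard_mask(p_hard, weights):
    # p_hard: (n_hard, C) predictions of the hard samples
    order = p_hard.argsort(dim=1, descending=True)  # classes by descending probability
    rank = order.argsort(dim=1)                     # rank[j, c] = s - 1 for class c in sample j
    top1 = p_hard.argmax(dim=1)                     # c_i^1 for every hard sample i
    s = rank[:, top1].t()                           # s[i, j]: rank of c_i^1 in sample j
    return weights[s]                               # w_hard[i, j] = omega_s
```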
As shown in fig. 1, the depth mapping head in this embodiment comprises cascaded multi-level classification heads (3× in fig. 1, i.e. three levels of classification heads), one depth-separable convolution module and a global average pooling module. Each level of classification head consists of a depth-separable convolution module and a channel transformation module; the depth-separable convolution module comprises a channel-by-channel convolution layer, a layer normalization layer, a point-by-point convolution layer, a Gaussian error linear unit activation layer and a fully connected layer connected in sequence, and the channel transformation module comprises a layer normalization layer and a two-dimensional convolution layer with a 1×1 convolution kernel.
In this embodiment, constructing the training data set of labeled and unlabeled samples from the hyperspectral image T_h and the radar image T_l comprises: randomly selecting N labeled samples and aN unlabeled samples from T_h and T_l, where a is a multiple parameter greater than 1 (10 in this embodiment), N is the number of labeled samples, and N = M × C, with M the number of ground-object samples of each category randomly selected from the labeled samples (20 in this embodiment) and C the total number of ground-object categories; taking, around the center pixels of the labeled and unlabeled samples of the hyperspectral image T_h, hyperspectral spatial image blocks X_h with square cross-section of the specified spatial size (32 × 32 in this embodiment); taking, around the center pixels of the labeled and unlabeled samples of the radar image T_l, radar spatial image blocks X_l with square cross-section of the specified spatial size (32 × 32 in this embodiment); taking, along the spectral dimension of the hyperspectral image T_h, the spectral vectors V_h corresponding to the pixel locations of the labeled and unlabeled samples, whose spatial size is 1 × 1 and which reflect the spectral properties of the ground object; and constructing the training data set from the X_h, V_h and X_l of the labeled and unlabeled samples.
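A sketch of the sample construction follows: a 32×32 spatial block around each chosen center pixel from both modalities plus the 1×1 spectral vector. The array names are assumptions, and reflection padding at the image borders is an implementation choice not specified in the patent.

```python
import numpy as np

def extract_sample(t_h, t_l, row, col, size=32):
    # t_h: (H, W, B_h) hyperspectral image; t_l: (H, W, B_l) radar image
    half = size // 2
    pad_h = np.pad(t_h, ((half, half), (half, half), (0, 0)), mode="reflect")
    pad_l = np.pad(t_l, ((half, half), (half, half), (0, 0)), mode="reflect")
    x_h = pad_h[row:row + size, col:col + size, :]   # hyperspectral block X_h
    x_l = pad_l[row:row + size, col:col + size, :]   # radar block X_l
    v_h = t_h[row, col, :]                           # spectral vector V_h (1x1 spatial size)
    return x_h, x_l, v_h
```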
To verify the effectiveness of the network implemented with semi-supervised contrastive learning (SSCL for short) in the method of this embodiment, the model is trained and tested on the public MUUFL data set and compared with four existing methods. The MUUFL data set consists of a hyperspectral image and corresponding LiDAR data, both with a spatial size of 325 × 220 pixels. The hyperspectral image contains 64 spectral bands covering wavelengths from 380 to 1050 nm and thus rich spatial-spectral information; the LiDAR data contain 2 channels at a wavelength of 1060 nm with ground-object elevation information. The MUUFL data set provides 53687 labeled samples covering 11 ground-object classes. In this embodiment, M = 20 labeled samples of each of the 11 classes are randomly selected as training samples, unlabeled samples amounting to 10 times the total number of labeled training samples are randomly selected for training, and the remaining labeled samples are used for testing. The Adam optimizer is used to optimize the network parameters, the batch size is set to 64, the initial learning rate is set to 0.0005, and training runs for 100 epochs. The final experimental results are the average of ten runs. Four existing joint hyperspectral and laser radar image recognition methods, HRWN (hierarchical random walk network), EndNet (encoder-decoder network), MFT (multimodal fusion Transformer) and CoupledCNN (coupled convolutional neural network), are compared with the implemented network (SSCL), with the parameters of all comparison methods tuned to their best. For a fair comparison, all methods use the same labeled training samples and test samples; the specific results with 20 training samples per class are shown in table 1.
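The training configuration reported above (Adam optimizer, batch size 64, initial learning rate 0.0005, 100 epochs) can be sketched as follows; `model`, `train_set` and `compute_loss` are hypothetical placeholders for the semi-supervised network and the loss routine sketched earlier.

```python
import torch

def train(model, train_set, compute_loss):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)
    loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
    for epoch in range(100):
        for batch in loader:
            optimizer.zero_grad()
            loss = compute_loss(model, batch)  # classification + contrastive terms
            loss.backward()
            optimizer.step()
```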
Table 1: the execution network (SSCL) of this example compares the results (%) with the four existing methods.
Referring to table 1, three evaluation indexes are used in this embodiment: overall accuracy (OA), average accuracy (AA) and the Kappa coefficient, each taken as the average of 10 recognition runs. As can be seen from table 1 (the best result in each row is marked in bold), the implemented network (SSCL) achieves the best results on all three evaluation indexes. With 20 training samples per class, the OA obtained by the implemented network (SSCL) is 7.58%, 7.17%, 3.87% and 2.98% higher than that of HRWN, EndNet, MFT and CoupledCNN, respectively, and the implemented network (SSCL) achieves the highest recognition accuracy on six of the eleven ground-object classes, verifying its effectiveness. Further, to verify the robustness of the implemented network (SSCL), independent repeated tests are carried out with the number of training samples per class M ranging from 20 to 100 in steps of 20; for a fair comparison, the three evaluation indexes are again averaged over 10 recognition runs. Referring to fig. 3 to 5, as the number of training samples per class M increases from 20 to 100, the overall accuracy, average accuracy and Kappa coefficient of the implemented network (SSCL) and of the other hyperspectral and laser radar image recognition methods all increase with M. The implemented network (SSCL) outperforms HRWN, EndNet, MFT and CoupledCNN on all three objective indexes, and its advantage is most obvious when the number of training samples per class M is small, effectively relieving the dependence of the deep model on the number of labeled samples.
In summary, the method for ground-object recognition by spatial-spectral fusion of hyperspectral and laser radar multimodal images designs a multi-level spatial-spectral fusion encoder for remote sensing hyperspectral images rich in spectral information and laser radar data containing complementary elevation information, enabling efficient fused extraction of multimodal multi-scale spatial-spectral features and solving the problem of fusing heterogeneous multi-source remote sensing data. In addition, the method fully mines the unsupervised information in unlabeled samples through a contrastive learning strategy and improves the discriminability and semantic distinctiveness of the multimodal deep fusion features through semi-supervised learning over labeled and unlabeled samples, thereby improving the overall recognition performance, solving the problem of the deep model's dependence on the number of labeled samples, and finally realizing high-accuracy multi-level spatial-spectral fused recognition of multimodal remote sensing images such as remote sensing hyperspectral and laser radar images. The method can efficiently extract joint spatial-spectral feature representations of such multimodal remote sensing images, fully mine the semantic information of labeled and unlabeled samples, and achieve multi-level spatial-spectral fused recognition of multimodal remote sensing images under the "small sample" problem.
In addition, this embodiment provides a hyperspectral and laser radar multimodal image spatial-spectral fusion ground-object recognition system comprising an interconnected microprocessor and memory, the microprocessor being programmed or configured to execute the ground-object recognition method described above. This embodiment also provides a computer-readable storage medium storing a computer program to be programmed or configured by a microprocessor to execute the ground-object recognition method described above.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present application, and the protection scope of the present application is not limited to the above examples, and all technical solutions belonging to the concept of the present application belong to the protection scope of the present application. It should be noted that modifications and adaptations to the present application may occur to one skilled in the art without departing from the principles of the present application and are intended to be within the scope of the present application.

Claims (10)

1. A method for ground-object recognition by spatial-spectral fusion of hyperspectral and laser radar multimodal images, characterized by comprising the following steps:

S101: generating a hyperspectral spatial image block X_h and a spectral vector V_h from a hyperspectral image T_h, and generating a radar spatial image block X_l from a radar image T_l;

S102: performing spatial-spectral feature-level fusion of the hyperspectral spatial image block X_h, the spectral vector V_h and the radar spatial image block X_l with a multi-level spatial-spectral fusion encoder network to obtain multi-level fused encoding features F_k, wherein the multi-level spatial-spectral fusion encoder network consists of a dual-branch spatial feature encoder branch and a spectral feature encoder branch; the dual-branch spatial feature encoder branch comprises two spatial feature encoder branches whose outputs are combined by weighted summation, each comprising a feature embedding layer and several cascaded spatial feature encoders, the spatial feature encoders of the two branches being identical in structure with weights shared between encoders of the same level and taking the hyperspectral spatial image block X_h and the radar spatial image block X_l as input, respectively; and the spectral feature encoder branch comprises a feature embedding layer and several cascaded spectral feature encoders and takes the spectral vector V_h as input;

S103: classifying the multi-level fused encoding features F_k with a classifier network to obtain prediction probabilities P_k, and outputting the ground-object category with the largest prediction probability as the recognition result.
2. The method for ground-object recognition by spatial-spectral fusion of hyperspectral and laser radar multimodal images according to claim 1, characterized in that the spatial feature encoder consists of a depth-separable convolution module and a downsampling module, the depth-separable convolution module comprising a channel-by-channel convolution layer, a layer normalization layer, a point-by-point convolution layer, a Gaussian error linear unit activation layer and a fully connected layer connected in sequence, and the downsampling module comprising a layer normalization layer and a two-dimensional convolution layer with a 2×2 convolution kernel.

3. The method for ground-object recognition by spatial-spectral fusion of hyperspectral and laser radar multimodal images according to claim 2, characterized in that the spectral feature encoder consists of a depth-separable convolution module and a channel transformation module, the depth-separable convolution module comprising a channel-by-channel convolution layer, a layer normalization layer, a point-by-point convolution layer, a Gaussian error linear unit activation layer and a fully connected layer connected in sequence, and the channel transformation module comprising a layer normalization layer and a two-dimensional convolution layer with a 1×1 convolution kernel.

4. The method for ground-object recognition by spatial-spectral fusion of hyperspectral and laser radar multimodal images according to claim 3, characterized in that the feature embedding layers of the dual-branch spatial feature encoder branch and of the spectral feature encoder branch consist of a two-dimensional convolution module with a 1×1 convolution kernel and a layer normalization module.

5. The method for ground-object recognition by spatial-spectral fusion of hyperspectral and laser radar multimodal images according to claim 1, characterized in that the classifier network is a multi-level joint classification head comprising k cascaded classification heads, one depth-separable convolution module, a global average pooling module, a fully connected module and a Softmax normalization module, each of the k classification heads consisting of a depth-separable convolution module and a channel transformation module, the depth-separable convolution module comprising a channel-by-channel convolution layer, a layer normalization layer, a point-by-point convolution layer, a Gaussian error linear unit activation layer and a fully connected layer connected in sequence, and the channel transformation module comprising a layer normalization layer and a two-dimensional convolution layer with a 1×1 convolution kernel.
6. The hyperspectral and laser radar multimodal image spatial-spectral fusion ground object identification method according to claim 1, wherein step S101 is preceded by: constructing a training data set of labeled samples and unlabeled samples based on the hyperspectral image $T_h$ and the radar image $T_l$, and training the multi-stage spatial-spectral fusion encoder network and the classifier network with the training data set in a semi-supervised learning manner; the training comprises using a depth mapping head to perform feature mapping on the multi-stage fusion coding features $F_k$ to obtain depth mapping features $f$, and performing unlabeled-sample contrastive learning on the depth mapping features $f$ to compute the unlabeled-sample contrastive losses. The loss function adopted in training is:

$$\mathcal{L} = \mathcal{L}_{ce} + \lambda\left(\mathcal{L}_{es} + \mathcal{L}_{hs}\right)$$

where $\mathcal{L}$ is the loss function, $\mathcal{L}_{ce}$ is the labeled-sample classification loss, $\lambda$ is a weight coefficient, and $\mathcal{L}_{es}$ and $\mathcal{L}_{hs}$ are the contrastive losses of the easy samples and of the hard-to-separate samples, respectively, with:

$$\mathcal{L}_{ce} = -\frac{1}{K}\sum_{k=1}^{K}\frac{1}{N_l}\sum_{i=1}^{N_l}\sum_{c=1}^{C} y_{ic}\,\log p_{ic}^{k}$$

$$\mathcal{L}_{es} = \frac{1}{N_{es}}\sum_{i,j}\left[m_{ij}^{es}\,d_{ij}^{es} + \left(1 - m_{ij}^{es}\right)\max\left(0,\ \tau - d_{ij}^{es}\right)\right]$$

$$\mathcal{L}_{hs} = \frac{1}{N_{hs}}\sum_{i,j}\left[m_{ij}^{hs}\,d_{ij}^{hs} + \left(1 - m_{ij}^{hs}\right)\max\left(0,\ \tau - d_{ij}^{hs}\right)\right]$$

where $k = 1, \dots, K$ indexes the level of the coding features, $N_l$ is the number of labeled samples in a batch, $c$ denotes the true class label of each feature, $C$ is the total number of ground-object classes, $y_{ic}$ is a sign function taking the value 1 if the label of sample $i$ equals $c$ and 0 otherwise, and $p_{ic}^{k}$ is the predicted probability that the $k$-th-stage coding feature of sample $i$ belongs to class $c$; $N_{es}$ is the number of easy samples in a batch, $m_{ij}^{es}$ is the mask between the $i$-th and $j$-th easy samples, $d_{ij}^{es}$ is the distance between the $i$-th and $j$-th easy samples, and $\tau$ is a hyperparameter; $N_{hs}$ is the number of hard-to-separate samples in a batch, $m_{ij}^{hs}$ is the mask between the $i$-th and $j$-th hard-to-separate samples, and $d_{ij}^{hs}$ is the distance between the $i$-th and $j$-th hard-to-separate samples; the easy samples and the hard-to-separate samples are obtained by partitioning the depth mapping features $f$ according to the prediction probabilities $P_k$ of the unlabeled samples, based on a probability distribution criterion and a feature learning criterion.
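As a concrete reading of this objective, the following PyTorch sketch assembles the total loss; the masked margin form of contrastive_term is a generic stand-in consistent with the masks, distances and hyperparameter $\tau$ named in the claim (the patent's exact formula images are not reproduced here), and the function names and default values of tau and lam are assumptions:

```python
import torch
import torch.nn.functional as F

def contrastive_term(feats, masks, tau):
    """Generic masked margin contrastive term: same-class pairs (mask=1)
    are pulled together, different-class pairs (mask=0) are pushed at
    least tau apart."""
    d = torch.cdist(feats, feats)                       # pairwise distances
    pull = masks * d
    push = (1.0 - masks) * torch.clamp(tau - d, min=0.0)
    return (pull + push).mean()

def total_loss(stage_logits, labels, f_easy, m_easy, f_hard, m_hard,
               tau=1.0, lam=0.1):
    """L = L_ce + lam * (L_es + L_hs); stage_logits holds the per-stage
    logits (pre-Softmax) of the labeled samples, averaged over K stages."""
    l_ce = sum(F.cross_entropy(z, labels) for z in stage_logits) / len(stage_logits)
    l_es = contrastive_term(f_easy, m_easy, tau)        # easy samples
    l_hs = contrastive_term(f_hard, m_hard, tau)        # hard-to-separate samples
    return l_ce + lam * (l_es + l_hs)
```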
7. The hyperspectral and laser radar multimodal image spatial-spectral fusion ground object identification method according to claim 6, wherein the depth mapping head comprises cascaded multi-stage classification heads, one depth separable convolution module layer and a global average pooling module; each stage classification head in the multi-stage classification heads consists of a depth separable convolution module and a channel transformation module, the depth separable convolution module comprises a channel-by-channel convolution layer, a layer normalization layer, a point-by-point convolution layer, a Gaussian error linear unit activation function layer and a fully connected layer which are sequentially connected, and the channel transformation module comprises a layer normalization layer and a two-dimensional convolution layer with a convolution kernel size of 1×1.
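Structurally this is the head of claim 5 without the fully connected and Softmax modules; a matching sketch under the same assumptions as above:

```python
class DepthMappingHead(nn.Module):
    """Depth mapping head: cascaded stage heads and one depth separable
    convolution module, ending in global average pooling (no classifier),
    producing the depth mapping feature f used for contrastive learning."""
    def __init__(self, dim, k=3):
        super().__init__()
        self.stages = nn.Sequential(*[
            nn.Sequential(DepthSeparableBlock(dim), ChannelTransform(dim, dim))
            for _ in range(k)])
        self.block = DepthSeparableBlock(dim)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):                          # x: (B, dim, H, W)
        return self.pool(self.block(self.stages(x))).flatten(1)  # f: (B, dim)
```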
8. The hyperspectral and laser radar multimodal image spatial-spectral fusion ground object identification method according to claim 6, wherein constructing the training data set of labeled samples and unlabeled samples based on the hyperspectral image $T_h$ and radar image $T_l$ comprises: randomly selecting $N$ labeled samples and $aN$ unlabeled samples from the hyperspectral image $T_h$ and radar image $T_l$, where $a$ is a multiple parameter greater than 1 and $N = M \times C$, $M$ being the number of feature samples of each class randomly selected from the labeled samples and $C$ being the total number of ground-object classes; around the center pixels of the labeled and unlabeled samples of the hyperspectral image $T_h$, taking hyperspectral spatial image blocks $X_h$ with square cross-sections based on a specified image-block spatial size; around the center pixels of the labeled and unlabeled samples of the radar image $T_l$, taking radar spatial image blocks $X_l$ with square cross-sections based on the specified image-block spatial size; along the spectral dimension of the hyperspectral image $T_h$, taking the spectral vectors $V_h$ corresponding to the pixel points of the labeled and unlabeled samples, with a spatial size of 1×1, reflecting the spectral properties of the ground objects; and constructing the training data set from the hyperspectral spatial image blocks $X_h$, spectral vectors $V_h$ and radar spatial image blocks $X_l$ of the labeled and unlabeled samples.
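A minimal NumPy sketch of this sample-cutting step; the function names, the default image-block size of 11 and the omission of border handling and random coordinate selection are assumptions:

```python
import numpy as np

def extract_sample(T_h, T_l, row, col, patch):
    """Cut one sample around a centre pixel: a square hyperspectral block
    X_h, a square radar block X_l, and the 1x1 spectral vector V_h.
    T_h: (H, W, B_h) hyperspectral cube; T_l: (H, W, B_l) radar image."""
    r = patch // 2
    x_h = T_h[row - r:row + r + 1, col - r:col + r + 1, :]  # hyperspectral block
    x_l = T_l[row - r:row + r + 1, col - r:col + r + 1, :]  # radar block
    v_h = T_h[row, col, :]               # spectral vector, spatial size 1x1
    return x_h, x_l, v_h

def build_dataset(T_h, T_l, labeled_coords, unlabeled_coords, patch=11):
    """Assemble the training set from the N labeled and aN unlabeled
    centre-pixel coordinates (selection and border handling omitted)."""
    coords = list(labeled_coords) + list(unlabeled_coords)
    return [extract_sample(T_h, T_l, r, c, patch) for r, c in coords]
```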
9. A hyperspectral and laser radar multimodal image spatial-spectral fusion ground object identification system comprising a microprocessor and a memory connected to each other, characterized in that the microprocessor is programmed or configured to perform the hyperspectral and laser radar multimodal image spatial-spectral fusion ground object identification method according to any one of claims 1 to 8.
10. A computer-readable storage medium having a computer program stored therein, characterized in that the computer program is programmed or configured to be executed by a microprocessor to perform the hyperspectral and laser radar multimodal image spatial-spectral fusion ground object identification method according to any one of claims 1 to 8.
CN202310910549.3A 2023-07-24 2023-07-24 Method and device for identifying ground object by fusion of hyperspectral and laser radar multimode images and spatial spectrum Pending CN117036879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310910549.3A CN117036879A (en) 2023-07-24 2023-07-24 Method and device for identifying ground object by fusion of hyperspectral and laser radar multimode images and spatial spectrum

Publications (1)

Publication Number Publication Date
CN117036879A (en) 2023-11-10

Family

ID=88642139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310910549.3A Pending CN117036879A (en) 2023-07-24 2023-07-24 Method and device for identifying ground object by fusion of hyperspectral and laser radar multimode images and spatial spectrum

Country Status (1)

Country Link
CN (1) CN117036879A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422711A (en) * 2023-12-14 2024-01-19 武汉理工大学三亚科教创新园 Ocean vortex hyperspectral change detection method, device, equipment and medium
CN117422711B (en) * 2023-12-14 2024-03-26 武汉理工大学三亚科教创新园 Ocean vortex hyperspectral change detection method, device, equipment and medium
CN117893816A (en) * 2024-01-18 2024-04-16 安徽大学 Hyperspectral image classification method of hierarchical residual spectrum space convolution network
CN117934978A (en) * 2024-03-22 2024-04-26 安徽大学 Hyperspectral and laser radar multilayer fusion classification method based on countermeasure learning
CN117934978B (en) * 2024-03-22 2024-06-11 安徽大学 Hyperspectral and laser radar multilayer fusion classification method based on countermeasure learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination