CN114758170A - Three-branch three-attention mechanism hyperspectral image classification method combined with D3D - Google Patents

Three-branch three-attention mechanism hyperspectral image classification method combined with D3D

Info

Publication number
CN114758170A
CN114758170A (application CN202210344115.7A; granted as CN114758170B)
Authority
CN
China
Prior art keywords
branch
model
attention
spatial
training
Prior art date
Legal status
Granted
Application number
CN202210344115.7A
Other languages
Chinese (zh)
Other versions
CN114758170B (en)
Inventor
潘新 (Pan Xin)
唐婷 (Tang Ting)
刘江平 (Liu Jiangping)
罗小玲 (Luo Xiaoling)
Current Assignee
Inner Mongolia Agricultural University
Original Assignee
Inner Mongolia Agricultural University
Priority date
Filing date
Publication date
Application filed by Inner Mongolia Agricultural University filed Critical Inner Mongolia Agricultural University
Priority: CN202210344115.7A
Publication of CN114758170A
Application granted
Publication of CN114758170B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture


Abstract

The invention belongs to the technical field of spectral image classification and discloses a three-branch, three-attention-mechanism hyperspectral image classification method combined with deformable 3D convolution (D3D). A three-branch three-attention network combined with deformable 3D convolution, D3DTBTA-Net, is constructed to extract the spectral and spatial information of a hyperspectral image. D3DTBTA-Net is divided into a spectral branch, a spatial X branch, and a spatial Y branch; after the spectral, spatial X, and spatial Y feature maps are respectively extracted, the feature maps from the three branches are fused for classification. The method classifies automatically with a trained deep learning model, requiring no manual parameter input and avoiding the large time and labor cost of labeling data. The deformable 3D convolution and the three-branch three-attention mechanism extract more discriminative features, improving classification accuracy and maintaining good classification performance even when the number of training samples is limited.

Description

Three-branch three-attention mechanism hyperspectral image classification method combined with D3D
Technical Field
The invention belongs to the technical field of spectral image classification, and particularly relates to a three-branch three-attention mechanism hyperspectral image classification method combined with D3D.
Background
At present, hyperspectral images offer nanoscale spectral resolution, can reflect subtle differences between ground objects in the spectral dimension, and greatly improve the ability to distinguish and identify ground objects. Hyperspectral image classification uses the rich information contained in a hyperspectral image to assign a unique class label to each pixel, and is an important aspect of hyperspectral image application. However, hyperspectral data is high-dimensional, and the phenomena of "same object, different spectra" and "same spectrum, different objects" make the image data structure highly nonlinear, with strong correlation between adjacent bands and between adjacent pixels. Meanwhile, labels in hyperspectral images are scarce, the number of training samples is often limited, and the curse of dimensionality easily arises. Therefore, how to extract strongly discriminative features and achieve accurate classification under small-sample conditions is the key to hyperspectral image classification.
Traditional machine learning methods generally use only spectral information and ignore the rich spatial information of hyperspectral images, so classification accuracy is low; in addition, labeling data requires a great deal of time and labor. When methods based on convolutional neural networks (CNNs), or on improved deeper networks, extract features, the sampling positions of the convolution kernel are usually fixed and the receptive field cannot be dynamically adjusted to the actual conditions of the image, so features cannot be extracted well and classification performance is limited. Existing deep-learning-based hyperspectral image classification methods also achieve low accuracy on small-sample data. Therefore, it is necessary to design a new hyperspectral image classification method to overcome these defects of the prior art.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) Traditional machine learning methods use only spectral information and ignore the rich spatial information of hyperspectral images, so classification accuracy is low, and labeling data requires a great deal of time and labor.
(2) Deep-learning-based hyperspectral image classification methods achieve low classification accuracy on small-sample data.
(3) When methods based on CNNs or improved deeper networks extract features, the sampling positions of the convolution kernel are usually fixed and the receptive field cannot be dynamically adjusted to the actual conditions of the image, so features cannot be extracted well and classification performance is limited.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a three-branch, three-attention-mechanism hyperspectral image classification method combined with D3D (D3DTBTA-Net), together with a corresponding system, medium, device, and terminal, with the goal of solving the low classification accuracy of small-sample hyperspectral image classification algorithms in the prior art.
The invention is realized in such a way that a three-branch three-attention mechanism hyperspectral image classification method combined with D3D comprises the following steps: constructing a three-branch three-attention mechanism network D3DTBTA-Net combined with deformable 3D convolution for extracting spectral information and spatial information of the hyperspectral image; the three-branch three-attention mechanism network D3DTBTA-Net respectively extracts a spectral feature map, a spatial X feature map and a spatial Y feature map by utilizing three branches, and performs feature map fusion and classification; the three branches are a spectral branch, a spatial X branch and a spatial Y branch.
Further, the three-branch three-attention hyperspectral image classification method combined with the D3D comprises the following steps of:
step one, generating a data set: generating a set of three-dimensional cubes, and randomly dividing the set of three-dimensional cubes into a training set, a verification set and a test set;
step two, training a model and verifying the model: the training set is used for updating parameters of multiple iterations, and the verification set is used for monitoring the performance of the model and selecting the model which is best trained;
step three, prediction: and selecting a test set to verify the effectiveness of the training model to obtain a classification result.
Further, the data set generation in the first step includes:
selecting a central pixel $x_i$ and its $p \times p$ neighboring pixels from the raw data, generating a set of three-dimensional cube blocks $Z = \{z_1, z_2, \ldots, z_N\} \in \mathbb{R}^{p \times p \times b}$; if the target pixel is located at the edge of the image, setting the missing neighboring pixel values to zero; in the D3DTBTA-Net algorithm, p is the patch size, set to 9, and b is the number of spectral bands; randomly dividing the three-dimensional cube set into a training set $X_{train}$, a validation set $X_{val}$, and a test set $X_{test}$, the corresponding label vectors being divided into $Y_{train}$, $Y_{val}$, and $Y_{test}$; only spatial information around the target pixel is used.
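The patch-generation step above can be sketched as follows. This is a minimal illustration in Python/NumPy with hypothetical names (`extract_cubes`, `random_split`), assuming the HSI is stored as an H × W × b array and label 0 marks unlabeled background:

```python
import numpy as np

def extract_cubes(hsi, labels, patch_size=9):
    """Cut a p x p x b cube around every labeled pixel; zero-pad at image edges."""
    h, w, b = hsi.shape
    r = patch_size // 2
    padded = np.pad(hsi, ((r, r), (r, r), (0, 0)), mode="constant")  # zeros at edges
    cubes, ys = [], []
    for i in range(h):
        for j in range(w):
            if labels[i, j] == 0:            # 0 = unlabeled background
                continue
            cubes.append(padded[i:i + patch_size, j:j + patch_size, :])
            ys.append(labels[i, j] - 1)
    return np.stack(cubes), np.array(ys)

def random_split(n, train_frac=0.03, val_frac=0.03, seed=0):
    """Random train / validation / test split (e.g. 3% / 3% / 94%)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_tr, n_val = int(n * train_frac), int(n * val_frac)
    return idx[:n_tr], idx[n_tr:n_tr + n_val], idx[n_tr + n_val:]
```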
Further, the training model and the verification model in the second step include:
training a model and verifying the model by using a D3DTBTA-Net algorithm, wherein the D3DTBTA-Net algorithm is divided into three branches: the spectrum branch, the space X branch and the space Y branch are respectively used for capturing a spectrum characteristic diagram, a space X characteristic diagram and a space Y characteristic diagram and fusing the three acquired characteristic diagrams for classification; wherein the spectrum branch comprises a Dense spectrum block and a spectrum attention block; the space X branch comprises a Dense space X block and a space X attention block; the space Y branch contains a Dense space Y block and a space Y attention block.
Further, the following basic modules are used in the second step:
(1) 3D-CNN with BN: the 3D-CNN with batch normalization (BN) is a common element in deep learning models based on 3D cube blocks. For $n_m$ input feature maps of size $p_m \times p_m \times b_m$, a 3D-CNN layer containing $k_{m+1}$ kernels of size $\alpha_{m+1} \times \alpha_{m+1} \times d_{m+1}$ produces $n_{m+1}$ output feature maps of size $p_{m+1} \times p_{m+1} \times b_{m+1}$. The i-th output of the (m+1)-th 3D-CNN layer with BN is calculated as:

$$\hat{x}_j^m = \frac{x_j^m - \mathrm{E}(x_j^m)}{\sqrt{\mathrm{Var}(x_j^m)}}$$

$$x_i^{m+1} = R\Big(\sum_{j=1}^{n_m} \hat{x}_j^m * w_i^{m+1} + b_i^{m+1}\Big)$$

where $x_j^m$ is the j-th input feature map of layer (m+1) and $\hat{x}_j^m$ is its output after the BN of layer m; $\mathrm{E}(\cdot)$ and $\mathrm{Var}(\cdot)$ represent the expectation and variance functions of the input, respectively; $w_i^{m+1}$ and $b_i^{m+1}$ respectively represent the weights and bias of the (m+1)-th 3D-CNN layer; $*$ is the 3D convolution operation; and $R(\cdot)$ is the activation function that introduces the nonlinearity of the network.
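This unit maps directly onto standard PyTorch primitives. A minimal sketch follows (the class name `Conv3dBN` is ours; Mish is used as the activation R(·), matching the activation adopted later in the document):

```python
import torch
import torch.nn as nn

class Conv3dBN(nn.Module):
    """One 3D convolution + batch norm + activation, as in the formulas above."""
    def __init__(self, in_ch, out_ch, kernel, stride=(1, 1, 1), padding=0):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel, stride=stride, padding=padding)
        self.bn = nn.BatchNorm3d(out_ch)   # normalizes with E(x) and Var(x)
        self.act = nn.Mish()               # the activation R(.)

    def forward(self, x):                  # x: (batch, in_ch, D, H, W)
        return self.act(self.bn(self.conv(x)))
```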
(2) DenseNet dense connection: the dense block is the basic unit in DenseNet, and the output of the l-th dense block is calculated as:

$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$$

where $H_l$ is a block containing a convolutional layer, an activation layer, and a BN layer, and $x_0, x_1, \ldots, x_{l-1}$ are the feature maps generated by the preceding dense blocks; the more connections, the more information flows in the dense network. A dense network with L layers has L(L+1)/2 connections, whereas a conventional convolutional network with the same number of layers has only L direct connections.
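A sketch of such a dense stack, reusing the `Conv3dBN` sketch above (the class name `DenseBlock3d` is ours; with 24 input channels, growth 12, and 3 layers, it reproduces the 24 → 60 channel growth described later in the embodiments; mapping the spectral axis to the depth dimension of Conv3d is an assumption):

```python
class DenseBlock3d(nn.Module):
    """l layers; layer i consumes the channel-wise concat of all previous outputs."""
    def __init__(self, in_ch=24, growth=12, n_layers=3,
                 kernel=(7, 1, 1), padding=(3, 0, 0)):
        super().__init__()
        self.layers = nn.ModuleList(
            Conv3dBN(in_ch + i * growth, growth, kernel, padding=padding)
            for i in range(n_layers)
        )

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            # x_l = H_l([x_0, x_1, ..., x_{l-1}])
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)      # 24 + 3*12 = 60 channels out
```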
(3) An attention mechanism is as follows:
spectral attention mapping
Figure RE-GDA0003666577280000041
Is directly inputted from the initial
Figure RE-GDA0003666577280000042
Calculating, wherein p × p is the size of the input block, and c represents the number of input channels; a and A are reactedTPerforming matrix multiplication to obtain channel attention mapping
Figure RE-GDA0003666577280000043
Connecting the softmax layer as:
Figure RE-GDA0003666577280000044
wherein x isjiRepresenting the influence of the ith channel on the jth channel; mixing XTThe result of the matrix multiplication with A is transformed into
Figure RE-GDA0003666577280000045
Weighting the reconstructed result through the parameter of the scale alpha, and adding the input A to obtain a final spectrum attention chart
Figure RE-GDA0003666577280000046
Figure RE-GDA0003666577280000047
Where α is initialized to zero and learned step by step. The final plot E contains a weighted sum of all channel features to describe a dependency relationship that enhances the discriminability of the features.
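A sketch of this channel-wise (spectral) attention in PyTorch, following the two formulas above (the class name `SpectralAttention` is assumed; the max-subtraction stability trick used by some implementations is omitted for brevity):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralAttention(nn.Module):
    """Channel attention: E = alpha * softmax(A A^T) A + A."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(1))   # initialized to zero, learned

    def forward(self, a):                   # a: (batch, c, p, p)
        b, c, h, w = a.shape
        flat = a.view(b, c, h * w)          # (b, c, n)
        attn = torch.bmm(flat, flat.transpose(1, 2))   # (b, c, c): A . A^T
        attn = F.softmax(attn, dim=-1)      # x_ji over source channels
        out = torch.bmm(attn, flat).view(b, c, h, w)
        return self.alpha * out + a
```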
The spatial attention block: given an input feature map $A \in \mathbb{R}^{c \times p \times p}$, two convolution layers respectively generate new feature maps $B, C \in \mathbb{R}^{c \times p \times p}$, which are reshaped to $\mathbb{R}^{c \times n}$, where $n = p \times p$ is the number of pixels. Matrix multiplication is performed between $B^T$ and C, a softmax layer is added, and the spatial attention map $S \in \mathbb{R}^{n \times n}$ is calculated:

$$s_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{n} \exp(B_i \cdot C_j)}$$

where $s_{ji}$ represents the influence of the i-th pixel on the j-th pixel; the closer the feature representations of two pixels, the stronger the correlation between them.

At the same time, the initial input feature A is fed into a convolution layer to obtain a new feature map $D \in \mathbb{R}^{c \times p \times p}$, which is reshaped to $\mathbb{R}^{c \times n}$. Matrix multiplication is performed between D and $S^T$, and the result is reshaped back to $\mathbb{R}^{c \times p \times p}$:

$$E_j = \beta \sum_{i=1}^{n} (s_{ji} D_i) + A_j$$

where the initial value of $\beta$ is zero, and more weight is gradually learned and assigned. The final feature $E \in \mathbb{R}^{c \times p \times p}$ is obtained by adding the weighted features at all positions to the original features, so the context information in the spatial dimension is modeled by E.
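A matching sketch of the position (spatial) attention block (the class name `SpatialAttention` is assumed; 1 × 1 convolutions stand in for the unspecified convolution layers):

```python
class SpatialAttention(nn.Module):
    """Position attention: E = beta * D softmax(B^T C)^T + A."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, 1)   # produces B
        self.key = nn.Conv2d(channels, channels, 1)     # produces C
        self.value = nn.Conv2d(channels, channels, 1)   # produces D
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, a):                    # a: (batch, c, p, p)
        b, c, h, w = a.shape
        n = h * w
        B = self.query(a).view(b, c, n)
        C = self.key(a).view(b, c, n)
        D = self.value(a).view(b, c, n)
        s = F.softmax(torch.bmm(B.transpose(1, 2), C), dim=-1)  # (b, n, n): s_ji
        out = torch.bmm(D, s.transpose(1, 2)).view(b, c, h, w)
        return self.beta * out + a
```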
(4) Deformable 3D convolution: a deformable convolution dynamically adjusts the size of the receptive field according to the actual conditions of the image. An input feature of size C × H × W is passed through a 3D-CNN with kernel size p × q × r to generate an offset feature of size 3N × C × H × W, where N = p × q × r is the size of the sampling grid; along the channel dimension there are 3N values, which represent the deformation of the D3D sampling grid. The learned offset features are applied to deform the 3D-CNN sampling grid into the D3D sampling grid, which is then used to generate the output features.
D3D is represented by the following formula:

$$y(p_0) = \sum_{n=1}^{N} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where $\Delta p_n$ represents the offset corresponding to the n-th value in the p × q × r convolutional sampling grid. Since the offsets are usually fractional, bilinear interpolation is used to generate exact values. The bilinear interpolation formula is:

$$x(p) = \sum_{q} G(q, p) \cdot x(q)$$

where q enumerates the integral spatial positions of the feature map and $G(\cdot, \cdot)$ is the bilinear interpolation kernel.
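A full deformable 3D convolution is involved to implement; the sketch below is a simplified, assumption-laden illustration of the mechanism only: an offset convolution predicts 3 displacements per sampling-grid point, the input is resampled with trilinear `grid_sample`, and (for brevity) the channel-mixing of a true convolution is reduced to a per-channel weighting. The (d, h, w) offset channel ordering is an assumption.

```python
class DeformableConv3d(nn.Module):
    """Simplified D3D sketch: learned offsets deform the sampling grid."""
    def __init__(self, channels, kernel=(3, 3, 3)):
        super().__init__()
        self.kernel = kernel
        n = kernel[0] * kernel[1] * kernel[2]           # N grid points
        pad = tuple(k // 2 for k in kernel)
        self.offset_conv = nn.Conv3d(channels, 3 * n, kernel, padding=pad)
        nn.init.zeros_(self.offset_conv.weight)          # start as a regular grid
        nn.init.zeros_(self.offset_conv.bias)
        self.weight = nn.Parameter(torch.randn(n, channels) * 0.01)

    def forward(self, x):                                # x: (b, c, D, H, W)
        b, c, D, H, W = x.shape
        n = self.weight.shape[0]
        offsets = self.offset_conv(x).view(b, n, 3, D, H, W)

        # Base sampling grid in normalized [-1, 1] coordinates (x, y, z order).
        zs = torch.linspace(-1, 1, D, device=x.device)
        ys = torch.linspace(-1, 1, H, device=x.device)
        xs = torch.linspace(-1, 1, W, device=x.device)
        gz, gy, gx = torch.meshgrid(zs, ys, xs, indexing="ij")
        base = torch.stack((gx, gy, gz), dim=-1)         # (D, H, W, 3)

        out = 0.0
        kd, kh, kw = self.kernel
        idx = 0
        for dz in range(kd):
            for dy in range(kh):
                for dx in range(kw):
                    off = offsets[:, idx]                # (b, 3, D, H, W), (d,h,w)
                    # fixed kernel displacement + learned offset, normalized
                    disp = torch.stack((
                        (dx - kw // 2 + off[:, 2]) * 2 / max(W - 1, 1),
                        (dy - kh // 2 + off[:, 1]) * 2 / max(H - 1, 1),
                        (dz - kd // 2 + off[:, 0]) * 2 / max(D - 1, 1),
                    ), dim=-1)                           # (b, D, H, W, 3)
                    grid = base.unsqueeze(0) + disp
                    sampled = F.grid_sample(x, grid, align_corners=True)
                    out = out + sampled * self.weight[idx].view(1, c, 1, 1, 1)
                    idx += 1
        return out
```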
(5) Mish activation function: the activation function adopted by D3DTBTA is Mish, a self-regularized non-monotonic activation function, rather than the traditional relu(x) = max(0, x). The formula of Mish is:

$$\mathrm{mish}(x) = x \times \tanh(\mathrm{softplus}(x)) = x \times \tanh(\ln(1 + e^x))$$

where x represents the input. Mish is unbounded above, and its lower bound is approximately −0.31, i.e. its range is [≈ −0.31, ∞). The derivative of Mish is defined as:

$$\frac{d}{dx}\,\mathrm{mish}(x) = \frac{e^x \omega}{\delta^2}$$

where $\omega = 4(x+1) + 4e^{2x} + e^{3x} + e^x(4x + 6)$ and $\delta = 2e^x + e^{2x} + 2$.
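Mish is a one-liner on top of standard primitives (equivalent to PyTorch's built-in nn.Mish):

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    """mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))."""
    return x * torch.tanh(F.softplus(x))
```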
(6) Regarding the selection of the optimal weights: during training, the model with the highest accuracy on the validation set is selected as the output; if validation accuracies are equal, the model with the smallest loss on the validation set is selected. The best model is saved at each iteration; if the model of the next iteration is better, it replaces the previously saved model, and otherwise the saved model is kept.
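A sketch of this selection rule as a checkpointing helper (the names are ours):

```python
import torch

best_acc, best_loss = -1.0, float("inf")

def maybe_save(model, val_acc, val_loss, path="best_model.pt"):
    """Keep the checkpoint with the highest val accuracy; break ties by lower loss."""
    global best_acc, best_loss
    if val_acc > best_acc or (val_acc == best_acc and val_loss < best_loss):
        best_acc, best_loss = val_acc, val_loss
        torch.save(model.state_dict(), path)   # replaces the previous best
```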
The learning rate is dynamically adjusted by cosine annealing, as shown in the following formula:

$$\eta_t = \eta_{\min}^i + \frac{1}{2}\left(\eta_{\max}^i - \eta_{\min}^i\right)\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right)$$

where $\eta_t$ is the learning rate within the i-th run and lies in the range $[\eta_{\min}^i, \eta_{\max}^i]$; $T_{cur}$ counts the number of iterations that have been performed, and $T_i$ controls the number of iterations in one adjustment period.
Further, the predicting in step three includes:
An HSI data set $A = \{x_1, x_2, \ldots, x_N\} \in \mathbb{R}^{1 \times 1 \times p}$ is composed of N labeled pixels, where p is the number of bands, and the corresponding class label set is $Y = \{y_1, y_2, \ldots, y_N\}$, where each label is a one-hot vector over the q land-cover categories.
In HSI classification, the quantitative measure of the difference between the predicted result and the true value is the cross-entropy loss function, defined as:

$$\mathcal{L}(y, \hat{y}) = -\sum_{i=1}^{L} y_i \log \hat{y}_i$$

where $\hat{y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_L]$ is the label vector predicted by the model, $y = [y_1, y_2, \ldots, y_L]$ is the true label vector, and L is the length of the label vector.
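The loss can be written explicitly for one-hot labels; in practice PyTorch's nn.CrossEntropyLoss, which fuses the softmax and the log, would normally be used instead. This explicit form mirrors the formula:

```python
import torch

def cross_entropy(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    """y_true: one-hot labels (batch, q); y_pred: softmax probabilities (batch, q)."""
    return -(y_true * torch.log(y_pred + 1e-12)).sum(dim=1).mean()
```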
Another object of the present invention is to provide a three-branch three-attention hyperspectral image classification system combined with D3D, which includes:
the data set generating module is used for generating a set of three-dimensional cubes and randomly dividing the set of three-dimensional cubes into a training set, a verification set and a test set;
the model training and verifying module is used for updating parameters of multiple iterations through a training set, monitoring the performance of the model by using a verifying set and selecting the model with the best training;
and the prediction module is used for selecting the test set to verify the effectiveness of the training model and obtain a classification result.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
constructing a three-branch three-attention mechanism network D3DTBTA-Net combined with deformable 3D convolution for extracting the spectral and spatial information of the hyperspectral image; the D3DTBTA-Net is divided into three branches: a spectral branch, a spatial X branch, and a spatial Y branch; after the spectral, spatial X, and spatial Y feature maps are respectively extracted, the feature maps extracted by the three branches are fused for classification.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
constructing a three-branch three-attention mechanism network D3DTBTA-Net combined with deformable 3D convolution for extracting the spectral and spatial information of a hyperspectral image; the D3DTBTA-Net is divided into three branches: the spectral branch, the spatial X branch, and the spatial Y branch respectively extract the spectral, spatial X, and spatial Y feature maps, after which the feature maps extracted by the three branches are fused for classification.
Another object of the present invention is to provide an information data processing terminal for implementing the three-branch three-attention hyperspectral image classification system in combination with D3D.
With regard to the technical solutions and the technical problems to be solved, the advantages and positive effects of the technical solutions to be protected by the present invention are analyzed from the following aspects:
First, regarding the technical problems in the prior art and the difficulty of solving them: the technical problems solved by the technical solution of the present invention are closely combined with the results and data obtained during research and development, bringing creative technical effects once solved. The specific description is as follows:
the invention provides a three-branch three-attention mechanism network D3DTBTA-Net combined with deformable 3D convolution, which can enhance feature extraction and fully extract spectral information and spatial information of a hyperspectral image, thereby improving the classification precision of the hyperspectral image on the premise of a small sample. The D3DTBTA-Net of the invention is divided into three branches: and respectively extracting a spectral feature map, a spatial X feature map and a spatial Y feature map from the spectral branch, the spatial X branch and the spatial Y branch, and fusing the feature maps extracted from the three branches for classification. A comparison experiment with other classification methods shows that the D3DTBTA-Net is suitable for classifying the hyperspectral images of small samples and can obtain better classification performance.
Secondly, considering the technical solution as a whole or from the perspective of products, the technical effects and advantages of the technical solution to be protected by the present invention are specifically described as follows:
the method can automatically classify according to the trained deep learning model without inputting any parameter and consuming a large amount of time cost and labor cost to label data; the characteristics with more discriminative power can be extracted through the deformable 3D convolution and the three-branch three-attention mechanism, so that the classification precision is improved, the problem of dimension disasters is solved, and the good classification performance can be still kept under the condition that the number of training samples is limited.
Third, as an inventive supplementary proof of the claims of the present invention, there are also presented several important aspects:
the expected income and commercial value after the technical scheme of the invention is converted are as follows: at present, the remote sensing technology is widely applied to the fields of agriculture, forestry, geology, oceans, meteorology, hydrology, military affairs, environmental protection and the like. The method improves the classification precision of the remote sensing image, can be applied to various fields, for example, the method is applied to agricultural production, can dynamically monitor the growth vigor of crops, monitor the diseases and the insect pests of the crops, estimate the yield of the crops and the like, and in the agricultural production, the remote sensing technology can periodically observe and cover a large area to obtain ground information, thereby greatly saving the labor cost and reducing the errors caused by the labor factors, and further promoting the agricultural modernization process of China.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a three-branch three-attention hyperspectral image classification method in combination with D3D according to an embodiment of the invention;
FIG. 2 is a block diagram of a three-branch three-attention hyperspectral image classification system combined with D3D according to an embodiment of the invention;
FIG. 3 is a flow chart of the D3DTBTA-Net algorithm provided by the embodiment of the present invention;
FIG. 4 is a schematic diagram of a calculation process of a spectral attention map provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a deformable 3D convolution provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of a network structure of a D3DTBTA according to an embodiment of the present invention;
FIG. 7 is an experimental plot on an Indian Pines (IP) data set provided by an embodiment of the present invention; wherein FIG. 7(a) is a pseudo-color image; FIG. 7(b) corresponds to a label; fig. 7(c) SVM (68.75%); FIG. 7(d) CDCNN (64.21%); FIG. 7(e) SSRN (91.59%); FIG. 7(f) FDSSC (93.85%); FIG. 7(g) DBDA (91.32%); FIG. 7(h) D3DTBTA (95.74%);
in the figure: 1. a data set generation module; 2. a model training and verification module; 3. and a prediction module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to solve the problems in the prior art, the invention provides a three-branch three-attention hyperspectral image classification method combined with D3D, and the invention is described in detail with reference to the accompanying drawings.
First, the embodiments are explained. This section provides explanatory embodiments expanding on the claims so that those skilled in the art can fully understand how the present invention is realized.
Example 1
This embodiment addresses the low classification accuracy of small-sample hyperspectral image classification algorithms in the prior art. The invention provides a three-branch three-attention mechanism network combined with deformable 3D convolution, D3DTBTA-Net, which enhances feature extraction and fully extracts the spectral and spatial information of a hyperspectral image, thereby improving the classification accuracy of hyperspectral images under small-sample conditions. The D3DTBTA-Net is divided into three branches: the spectral branch, the spatial X branch, and the spatial Y branch, which respectively extract the spectral, spatial X, and spatial Y feature maps; the feature maps extracted by the three branches are then fused for classification. Comparison experiments with other classification methods show that D3DTBTA-Net is suitable for classifying small-sample hyperspectral images and achieves better classification performance.
As shown in fig. 1, the three-branch three-attention hyperspectral image classification method combined with D3D according to the embodiment of the invention includes the following steps:
s101, generating a data set: randomly dividing a three-dimensional cube set into a training set, a verification set and a test set;
s102, training a model and verifying the model: the training set is used for updating parameters of multiple iterations, and the verification set is used for monitoring the performance of the model and selecting the model which is best trained;
s103, predicting: and selecting a test set to verify the effectiveness of the training model to obtain a classification result.
As shown in fig. 2, the three-branch three-attention hyperspectral image classification system combined with D3D provided by the embodiment of the invention includes:
the data set generating module 1 is used for generating a set of three-dimensional cubes and randomly dividing the set of three-dimensional cubes into a training set, a verification set and a test set;
the model training and verifying module 2 is used for updating parameters of multiple iterations through a training set, monitoring the performance of the model by using a verifying set and selecting the model with the best training;
and the prediction module 3 is used for selecting the test set to verify the effectiveness of the training model and obtain a classification result.
The process of the D3DTBTA-Net algorithm provided by the embodiment of the invention comprises three steps: data set generation, training and validation, and prediction. Fig. 3 illustrates the overall algorithm flow of the method of the present invention.
Suppose the HSI data set $A = \{x_1, x_2, \ldots, x_N\} \in \mathbb{R}^{1 \times 1 \times p}$ is made up of N labeled pixels, where p is the number of bands, and the corresponding class label set is $Y = \{y_1, y_2, \ldots, y_N\}$, where each label is a one-hot vector over the q land-cover categories.
Step 1, generating a data set. Select a central pixel $x_i$ and its $p \times p$ neighboring pixels from the raw data, generating a set of three-dimensional cube blocks $Z = \{z_1, z_2, \ldots, z_N\} \in \mathbb{R}^{p \times p \times b}$. If the target pixel is located at an edge of the image, the missing neighboring pixel values are set to zero. In the D3DTBTA-Net algorithm, p is the patch size, set to 9 in this method, and b is the number of spectral bands. The three-dimensional cube set is then randomly divided into a training set $X_{train}$, a validation set $X_{val}$, and a test set $X_{test}$; the corresponding label vectors are divided into $Y_{train}$, $Y_{val}$, and $Y_{test}$. Since the labels of neighboring pixels are not visible to the network, only spatial information around the target pixel is used.
Step 2, training the model and validating the model. The training set is used to update the parameters over multiple iterations, while the validation set is used to monitor the performance of the model and select the best-trained model. In step 2, training and validation use the D3DTBTA-Net algorithm of the invention, which is divided into three branches: the spectral branch, the spatial X branch, and the spatial Y branch, used respectively to capture the spectral, spatial X, and spatial Y feature maps, after which the three feature maps are fused for classification. The spectral branch contains a Dense spectral block and a spectral attention block; the spatial X branch contains a Dense spatial X block and a spatial X attention block; the spatial Y branch contains a Dense spatial Y block and a spatial Y attention block.
Step 3, prediction. The test set is used to verify the effectiveness of the trained model, obtaining the classification result. In HSI classification, a common quantitative measure of the difference between the predicted result and the true value is the cross-entropy loss function, defined as:

$$\mathcal{L}(y, \hat{y}) = -\sum_{i=1}^{L} y_i \log \hat{y}_i$$

where $\hat{y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_L]$ is the label vector predicted by the model and $y = [y_1, y_2, \ldots, y_L]$ is the true label vector.
Further, the following basic modules are used in step 2.
(1) 3D-CNN with BN. The 3D-CNN with BN is a common element in deep learning models based on 3D cube blocks. For $n_m$ input feature maps of size $p_m \times p_m \times b_m$, a 3D-CNN layer containing $k_{m+1}$ kernels of size $\alpha_{m+1} \times \alpha_{m+1} \times d_{m+1}$ produces $n_{m+1}$ output feature maps of size $p_{m+1} \times p_{m+1} \times b_{m+1}$. The i-th output of the (m+1)-th 3D-CNN layer with BN is calculated as:

$$\hat{x}_j^m = \frac{x_j^m - \mathrm{E}(x_j^m)}{\sqrt{\mathrm{Var}(x_j^m)}}$$

$$x_i^{m+1} = R\Big(\sum_{j=1}^{n_m} \hat{x}_j^m * w_i^{m+1} + b_i^{m+1}\Big)$$

where $x_j^m$ is the j-th input feature map of layer (m+1) and $\hat{x}_j^m$ is its output after the BN of layer m. $\mathrm{E}(\cdot)$ and $\mathrm{Var}(\cdot)$ represent the expectation and variance functions of the input, respectively. $w_i^{m+1}$ and $b_i^{m+1}$ represent the weights and bias of the (m+1)-th 3D-CNN layer, $*$ is the 3D convolution operation, and $R(\cdot)$ represents the activation function that introduces the nonlinearity of the network.
(2) DenseNet dense connections. Generally, the more convolutional layers, the better the performance of the network. However, once the network reaches a certain depth, adding layers no longer improves performance but degrades the network: the accuracy on the training set gradually saturates, or even decreases, as the number of layers grows. DenseNet is an effective way to solve this problem.
The dense block is the basic unit in DenseNet, and the output of the l-th dense block is calculated as:

$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}]) \qquad (4)$$

where $H_l$ is a block containing a convolutional layer, an activation layer, and a BN layer, and $x_0, x_1, \ldots, x_{l-1}$ are the feature maps generated by the preceding dense blocks; the more connections, the more information flows in the dense network. Specifically, a dense network with L layers has L(L+1)/2 connections, whereas a conventional convolutional network with the same number of layers has only L direct connections.
(3) Attention mechanism. One drawback of 3D-CNN is that all spatial pixels and spectral bands carry equal weight in the spatial and spectral domains. Obviously, different spectral bands and spatial pixels contribute differently to the extracted features.
As shown in FIG. 4, the spectral attention map $X \in \mathbb{R}^{c \times c}$ is calculated directly from the initial input $A \in \mathbb{R}^{c \times p \times p}$, where $p \times p$ is the size of the input block and c represents the number of input channels. First, A (reshaped to $\mathbb{R}^{c \times n}$, with $n = p \times p$) and $A^T$ are matrix-multiplied to obtain the channel attention map $X \in \mathbb{R}^{c \times c}$, to which a softmax layer is connected:

$$x_{ji} = \frac{\exp(A_i \cdot A_j)}{\sum_{i=1}^{c} \exp(A_i \cdot A_j)} \qquad (5)$$

where $x_{ji}$ indicates the influence of the i-th channel on the j-th channel. Second, the result of the matrix multiplication of $X^T$ with A is reshaped back to $\mathbb{R}^{c \times p \times p}$. Finally, the reshaped result is weighted by the scale parameter $\alpha$ and added to the input A to obtain the final spectral attention map $E \in \mathbb{R}^{c \times p \times p}$:

$$E_j = \alpha \sum_{i=1}^{c} (x_{ji} A_i) + A_j \qquad (6)$$

where $\alpha$ is initialized to zero and can be learned step by step. The final map E contains a weighted sum of all channel features, which describes the dependencies between channels and enhances the discriminability of the features.
The spatial attention block. Given an input feature map $A \in \mathbb{R}^{c \times p \times p}$, two convolution layers respectively generate new feature maps $B, C \in \mathbb{R}^{c \times p \times p}$. First, B and C are reshaped to $\mathbb{R}^{c \times n}$, where $n = p \times p$ is the number of pixels. Second, matrix multiplication is performed between $B^T$ and C, a softmax layer is added, and the spatial attention map $S \in \mathbb{R}^{n \times n}$ is calculated:

$$s_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{n} \exp(B_i \cdot C_j)} \qquad (7)$$

where $s_{ji}$ represents the influence of the i-th pixel on the j-th pixel; the closer the feature representations of two pixels, the stronger the correlation between them.

At the same time, the initial input feature A is fed into a convolution layer to obtain a new feature map $D \in \mathbb{R}^{c \times p \times p}$, which is reshaped to $\mathbb{R}^{c \times n}$. Finally, matrix multiplication is performed between D and $S^T$, and the result is reshaped back to $\mathbb{R}^{c \times p \times p}$:

$$E_j = \beta \sum_{i=1}^{n} (s_{ji} D_i) + A_j \qquad (8)$$

where the initial value of $\beta$ is zero, and more weight can be learned and assigned step by step. According to equation (8), the final feature $E \in \mathbb{R}^{c \times p \times p}$ is obtained by adding the weighted features at all positions to the original features. Thus, the context information in the spatial dimension is modeled by E.
(4) Deformable 3D convolution. When CNN-based methods, or improved deeper networks, extract features, the sampling positions of the convolution kernels are usually fixed grids; for very complex objects with different scales or shapes, traditional convolutional-neural-network-based methods cannot effectively extract features from complex structures, which limits classification performance. A deformable convolution can dynamically adjust the size of the receptive field according to the actual conditions of the image, and thus extract features better. Deformable convolution is generally two-dimensional; Deformable 3D Convolution (D3D Convolution, D3D) fuses deformable convolution and the 3D-CNN together, thereby significantly improving the deformation modeling capability of CNNs. D3D can expand the spatial field of view through learnable offset variables. As shown in fig. 5, an input feature of size C × H × W is first passed through a 3D-CNN with kernel size p × q × r to generate an offset feature of size 3N × C × H × W (where N = p × q × r is the size of the sampling grid). Along its channel dimension there are 3N values, representing the deformation of the D3D sampling grid. The learned offset features are then used to deform the 3D-CNN sampling grid to generate the D3D sampling grid. Finally, the D3D sampling grid is used to produce the output features. D3D can be expressed by the following formula:

$$y(p_0) = \sum_{n=1}^{N} w(p_n) \cdot x(p_0 + p_n + \Delta p_n) \qquad (9)$$

where $\Delta p_n$ represents the offset corresponding to the n-th value in the p × q × r convolutional sampling grid. Since offset variables are usually fractional, bilinear interpolation is used to generate accurate values. The bilinear interpolation formula is:

$$x(p) = \sum_{q} G(q, p) \cdot x(q) \qquad (10)$$

where q enumerates the integral spatial positions of the feature map and $G(\cdot, \cdot)$ is the bilinear interpolation kernel.
(5) The Mish activation function. The activation function adopted by D3DTBTA is Mish, a self-regularized non-monotonic activation function, rather than the traditional relu(x) = max(0, x). The formula of Mish is:

$$\mathrm{mish}(x) = x \times \tanh(\mathrm{softplus}(x)) = x \times \tanh(\ln(1 + e^x)) \qquad (11)$$

where x represents the input. Mish is unbounded above, and its lower bound is approximately −0.31. The derivative of Mish is defined as:

$$\frac{d}{dx}\,\mathrm{mish}(x) = \frac{e^x \omega}{\delta^2} \qquad (12)$$

where $\omega = 4(x+1) + 4e^{2x} + e^{3x} + e^x(4x + 6)$ and $\delta = 2e^x + e^{2x} + 2$.
(6) Regarding the selection of the optimal weights: during training, the model with the highest accuracy on the validation set is selected as the output; if validation accuracies are equal, the model with the lowest validation loss is selected. The best model is saved at each iteration; if the model of the next iteration is better, it replaces the previously saved model, and otherwise the saved model is kept.
The learning rate is an important hyperparameter for training the network, and a dynamic learning rate can help the network avoid local minima. The learning rate is dynamically adjusted by cosine annealing as follows:

$$\eta_t = \eta_{\min}^i + \frac{1}{2}\left(\eta_{\max}^i - \eta_{\min}^i\right)\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right) \qquad (13)$$

where $\eta_t$ is the learning rate within the i-th run and lies in the range $[\eta_{\min}^i, \eta_{\max}^i]$; $T_{cur}$ counts the number of iterations that have been performed, and $T_i$ controls the number of iterations in one adjustment period.
Example 2
The network structure of the D3DTBTA is shown in fig. 6. For convenience, the upper branch is called the spectral branch, and the lower branch is called the spatial X branch and the spatial Y branch, respectively. And respectively inputting the spectrum branch, the space X branch and the space Y branch to obtain a spectrum characteristic diagram and a space characteristic diagram. And then, obtaining a classification result by adopting the fusion operation of the spectrum, the space X characteristic diagram and the space Y characteristic diagram.
The following section describes the spectral branch, the spatial X branch, the spatial Y branch, and the operation of fusing spectral and spatial features, taking the Indian Pines (IP) data set as an example. The sample cube size is 9 × 9 × 200. In matrices such as (9 × 9 × 97, 24) mentioned below, 9 × 9 × 97 denotes the height, width, and depth of the 3D cube, and 24 denotes the number of 3D cubes generated by the 3D-CNN. The IP data set contains 145 × 145 pixels with 200 spectral bands, i.e. the size of IP is 145 × 145 × 200. Only 10249 pixels have a corresponding label; the other pixels are background.
Because an HSI has a very large number of spectral channels, many of which are redundant for classification, HSI classification algorithms generally perform a dimensionality reduction operation first, reducing redundancy and improving classification accuracy. D3DTBTA first uses a 3D-CNN layer with convolution kernel size 1 × 1 × 7 and stride (1, 1, 2) to reduce the number of channels, obtaining a (9 × 9 × 97, 8) feature map; it then enhances the features with a deformable 3D convolution with kernel size 3 × 3 × 3, and then uses a 3D-CNN layer with kernel size 1 × 1 × 7 to capture a (9 × 9 × 97, 24) feature map as the input feature map of the three branches.
The captured feature map of size (9 × 9 × 97, 24) is input to the spectral branch, first passing through 3D-CNN Dense spectral blocks with BN; each Dense spectral block has 12 channels in its 3D-CNN, with convolution kernel size 1 × 1 × 7. After the Dense spectral blocks, the number of channels of the feature map, following the dense connection of equation (4), increases to 60, and the size of the feature map is (9 × 9 × 97, 60). Next, after a last 3D-CNN with convolution kernel size 1 × 1 × 97, a (9 × 9 × 1, 60) feature map is generated. However, these 60 channels contribute differently to the classification. To refine the spectral features, a spectral attention block is employed, which emphasizes the weight of useful information and reduces the weight of redundant information. After the weighted spectral feature map is obtained, the features are enhanced by a deformable 3D convolution of size 3 × 3 × 1, and BN and dropout layers are then used to improve stability and robustness. Finally, a 1 × 60 feature map is obtained through a global average pooling layer. The implementation details of the spectral branch are shown in Table 1.
Table 1. Implementation details of the spectral branch.
Meanwhile, the (9 × 9 × 97, 24) feature map is input to the spatial X branch and passed through 3D-CNN Dense spatial X blocks with BN. Each 3D-CNN in a Dense spatial X block has 12 channels, with convolution kernel size 3 × 1 × 1. Next, the (9 × 9 × 1, 60) feature map is input to the spatial X attention block, which weights the coefficients of each pixel, thereby obtaining a more discriminative spatial X feature map. After the weighted spatial X feature map is obtained, the features are enhanced by a deformable 3D convolution of size 3 × 3 × 1, and a 1 × 60 spatial X feature map is then obtained through the BN layer, the dropout layer, and the global average pooling layer. The implementation details of the spatial X branch are shown in Table 2.
Table 2. Implementation details of the spatial X branch.
Likewise, the (9 × 9 × 97, 24) feature map is input to the spatial Y branch and passed through 3D-CNN Dense spatial Y blocks with BN. Each 3D-CNN in a Dense spatial Y block has 12 channels, with convolution kernel size 1 × 3 × 1. The (9 × 9 × 1, 60) feature map is input to the spatial Y attention block, which weights the coefficients of each pixel, thereby obtaining a more discriminative spatial Y feature map. After the weighted spatial Y feature map is obtained, the features are enhanced by a deformable 3D convolution of size 3 × 3 × 1, and a 1 × 60 spatial Y feature map is then obtained through the BN layer, the dropout layer, and the global average pooling layer. The implementation details of the spatial Y branch are shown in Table 3.
Table 3. Implementation details of the spatial Y branch.
The spectral, spatial X, and spatial Y feature maps are obtained through the spectral, spatial X, and spatial Y branches, and the three feature maps are then concatenated for classification. Concatenation is used rather than addition because the spectral, spatial X, and spatial Y features lie in unrelated domains: concatenation keeps them independent, whereas addition would mix them together. Finally, the classification result is obtained through a fully connected layer and a softmax layer, as sketched below.
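A sketch of this fusion head (the class name `FusionHead` is ours; each branch supplies a 1 × 60 feature vector, as described above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """Concatenate the three 1 x 60 branch features, then FC + softmax."""
    def __init__(self, n_classes, feat_dim=60):
        super().__init__()
        self.fc = nn.Linear(3 * feat_dim, n_classes)

    def forward(self, f_spec, f_x, f_y):               # each: (batch, 60)
        fused = torch.cat([f_spec, f_x, f_y], dim=1)   # keeps the domains independent
        return F.softmax(self.fc(fused), dim=1)
```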
The method of the invention was tested on 4 public hyperspectral data sets: the Indian Pines (IP) data set, the Pavia University (UP) data set, the Salinas Valley (SV) data set, and the Kennedy Space Center (KSC) data set. It was compared with 5 other methods: SVM, CDCNN, SSRN, FDSSC, and DBDA. These are effective, well-recognized methods for small-sample hyperspectral image classification.
The experiments were all performed on the same platform, configured with 16 GB of memory and an NVIDIA GeForce RTX 1080 Ti GPU. All deep-learning-based classifiers were implemented using PyTorch, and the support vector machine was implemented using scikit-learn.
Since the SVM directly uses spectral information for classification, the input sample size is 1 × 1 × p. For better comparative experiments, other deep learning based methods use the same input sample size of 9 × 9 × p, where p is the number of spectral bands.
The batch size of CDCNN, SSRN, FDSSC, DBDA, and the proposed D3DTBTA was set to 16, the optimizer was Adam, and the learning rate was 0.0005. Each method was run 10 times independently, and the experimental result is the average of the 10 runs. The total number of epochs was set to 150, with a step size of 30 per epoch. The experiments used the optimal-weight selection method described above.
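These settings translate directly into a PyTorch training loop; in the sketch below, `train_set` and `D3DTBTANet` are assumed names for the cube data set and the network:

```python
import torch
import torch.nn as nn

# Assumed: `train_set` yields (cube, label) pairs; `D3DTBTANet` is the network
# assembled from the sketches above. Both names are hypothetical.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)
model = D3DTBTANet(n_classes=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)
criterion = nn.CrossEntropyLoss()

for epoch in range(150):
    for cubes, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(cubes), labels)
        loss.backward()
        optimizer.step()
```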
The size of the training and validation samples was 3% of the total sample. The number of training, validation and test samples in the Indian Pines (IP) dataset is shown in table 4.
Table 4. Number of training, validation, and test samples in the IP data set.
II. Application embodiment. To demonstrate the creativity and technical value of the technical solution of the invention, this part gives application examples of the claimed technical solution on specific products or related technologies.
Embodiments of the present invention may be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
III. Evidence of the relevant effects of the embodiments. The embodiments of the invention achieved positive effects during research, development, and use, and offer significant advantages over the prior art; these are described below in combination with data and figures from the testing process.
In the examples, the experimental results on the Indian Pines (IP) data set are shown in fig. 7 and in table 5 below, where the training set size is 3%.
Table 5. Classification results on the IP data set (training set size 3%).
Wherein FIG. 7(a) is a pseudo-color image; FIG. 7(b) corresponds to a label; fig. 7(c) SVM (68.75%); FIG. 7(d) CDCNN (64.21%); FIG. 7(e) SSRN (91.59%); FIG. 7(f) FDSSC (93.85%); FIG. 7(g) DBDA (91.32%); fig. 7(h) D3DTBTA (95.74%).
The Overall Accuracy (OA) of the four data sets at different training sample ratios is shown in table 6.
Table 6. Overall accuracy (OA) at different training sample ratios.
Under each training sample proportion, the best classification result is shown in bold. As the table shows, the classification performance of the proposed method is superior to that of the other methods. Except on the IP data set with a training sample proportion of 1%, where D3DTBTA is not the best but differs only slightly from the best classification accuracy, the proposed method achieves the best classification accuracy on all other data sets and training sample proportions. Moreover, as the proportion of training samples increases, the classification accuracy becomes higher and higher. Even with few training samples, the proposed method maintains good classification performance.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A three-branch three-attention hyperspectral image classification method combined with D3D is characterized by comprising the following steps of: constructing a three-branch three-attention mechanism network D3DTBTA-Net combined with deformable 3D convolution for extracting spectral information and spatial information of the hyperspectral image; the three-branch three-attention mechanism network D3DTBTA-Net respectively extracts a spectral feature map, a spatial X feature map and a spatial Y feature map by utilizing three branches, and performs feature map fusion and classification.
2. The three-branch three-attention mechanism hyperspectral image classification method combined with D3D according to claim 1, wherein the three-branch three-attention mechanism hyperspectral image classification method combined with D3D comprises the following steps:
step one, generating a data set: generating a set of three-dimensional cubes, and randomly dividing the set of three-dimensional cubes into a training set, a verification set and a test set;
step two, training a model and verifying the model: the training set is used for updating parameters of multiple iterations, and the verification set is used for monitoring the performance of the model and selecting the model which is best trained;
step three, prediction: and selecting a test set to verify the effectiveness of the training model to obtain a classification result.
3. The three-branch three-attention mechanism hyperspectral image classification method in combination with D3D according to claim 2, wherein the data set generation in the first step comprises:
selecting a central pixel $x_i$ and its $p \times p$ neighboring pixels from the raw data, generating a set of three-dimensional cube blocks $Z = \{z_1, z_2, \ldots, z_N\} \in \mathbb{R}^{p \times p \times b}$; if the target pixel is located at the edge of the image, setting the missing neighboring pixel values to zero; in the D3DTBTA-Net algorithm, p is the patch size, set to 9, and b is the number of spectral bands; randomly dividing the three-dimensional cube set into a training set $X_{train}$, a validation set $X_{val}$, and a test set $X_{test}$, the corresponding label vectors being divided into $Y_{train}$, $Y_{val}$, and $Y_{test}$; only spatial information around the target pixel is used.
4. The method for classifying hyperspectral images with a three-branch three-attention mechanism combined with D3D according to claim 2, wherein the training model and the verification model in the second step comprise:
training a model and a verification model by using a D3DTBTA-Net algorithm, wherein the D3DTBTA-Net algorithm is divided into three branches: the spectrum branch, the space X branch and the space Y branch are respectively used for capturing a spectrum characteristic diagram, a space X characteristic diagram and a space Y characteristic diagram and fusing the three acquired characteristic diagrams for classification; wherein the spectrum branch comprises a Dense spectrum block and a spectrum attention block; the space X branch comprises a Dense space X block and a space X attention block; the space Y branch contains a Dense space Y block and a space Y attention block.
5. The method for classifying hyperspectral images by using a three-branch three-attention mechanism in combination with D3D according to claim 2, wherein the following basic modules are used in the second step:
(1) 3D-CNN with BN: the 3D-CNN with BN is a common element in a depth learning model based on 3D cubic blocks; for pm×pm×bmN of sizemFeature map, in a 3D-CNN layer, comprising a size of αm+1×αm+1×dm+1K of (a)m+1A channel of size pm+1×pm+1×bm+1N of (A) to (B)m+1Outputting a characteristic diagram; the ith output of the (m +1) th 3D-CNN layer with BN is calculated as:
Figure FDA0003580396290000021
Figure FDA0003580396290000022
wherein the content of the first and second substances,
Figure FDA0003580396290000023
is the jth input feature map of the (m +1) layer,
Figure FDA0003580396290000024
is the output after BN of m layers; e (-) and Var (-) represent the expectation and variance functions of the input, respectively; hi m+1And bi m+1Respectively representing the weight and bias of (m +1) layer 3D-CNN, which is a 3D convolution operation, and R (-) is an activation function introduced into a network nonlinear unit;
(2) DenseNet dense connection: the dense block is the basic unit in DenseNet, and the output of the lth layer in a dense block is calculated as:
$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$$
wherein $H_l$ is a composite block containing a convolutional layer, an activation layer and a BN layer, and $x_0, x_1, \ldots, x_{l-1}$ represent the feature maps generated by the preceding layers; the more connections, the more information flows through the dense network; a dense network with L layers has L(L+1)/2 connections, while a traditional convolutional network with the same number of layers has only L direct connections;
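A sketch of such a dense block in PyTorch, reusing the `conv3d_bn` unit above; the growth rate and depth are illustrative:

```python
import torch
import torch.nn as nn

class DenseBlock3d(nn.Module):
    def __init__(self, in_ch, growth=12, layers=3):
        super().__init__()
        # Layer l sees in_ch + l * growth channels: the concatenation of
        # the block input and all previously generated feature maps.
        self.layers = nn.ModuleList(
            conv3d_bn(in_ch + i * growth, growth) for i in range(layers))

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            # x_l = H_l([x_0, x_1, ..., x_{l-1}])
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)
```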
(3) Attention mechanism:
the spectral attention block: the spectral attention map $X \in \mathbb{R}^{c \times c}$ is calculated directly from the initial input $A \in \mathbb{R}^{p \times p \times c}$, where $p \times p$ is the size of the input block and c represents the number of input channels; A and $A^T$ are matrix-multiplied to obtain the channel attention map $X \in \mathbb{R}^{c \times c}$, and a softmax layer is connected:
$$x_{ji} = \frac{\exp(A_i \cdot A_j)}{\sum_{i=1}^{c} \exp(A_i \cdot A_j)}$$
wherein $x_{ji}$ represents the influence of the ith channel on the jth channel; the result of the matrix multiplication between $X^T$ and A is reshaped into $\mathbb{R}^{p \times p \times c}$; the reshaped result is weighted by the scale parameter α and the input A is added to obtain the final spectral attention map $E \in \mathbb{R}^{p \times p \times c}$:
$$E_j = \alpha \sum_{i=1}^{c} (x_{ji} A_i) + A_j$$
wherein α is initialized to zero and gradually learned; the final map E contains a weighted sum of the features of all channels plus the original features, which models the dependencies between channels and enhances the discriminability of the features;
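A PyTorch sketch of this spectral (channel) attention block following the formulas above; shapes assume a (B, c, p, p) input:

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(1))   # alpha starts at zero

    def forward(self, A):                           # A: (B, c, p, p)
        b, c, h, w = A.shape
        a = A.view(b, c, -1)                        # (B, c, n), n = p * p
        attn = torch.softmax(torch.bmm(a, a.transpose(1, 2)), dim=-1)  # x_ji
        out = torch.bmm(attn, a).view(b, c, h, w)   # weighted channel sum
        return self.alpha * out + A                 # E_j = alpha * sum + A_j
```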
the spatial attention block: given an input feature map $A \in \mathbb{R}^{c \times p \times p}$, two convolutional layers are used to generate new feature maps B and C respectively, where $B, C \in \mathbb{R}^{c \times p \times p}$; B and C are reshaped into $\mathbb{R}^{c \times n}$, where $n = p \times p$ is the number of pixels; matrix multiplication is performed between B and C and a softmax layer is added to calculate the spatial attention map $S \in \mathbb{R}^{n \times n}$:
$$s_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{n} \exp(B_i \cdot C_j)}$$
wherein $s_{ji}$ represents the influence of the ith pixel on the jth pixel; the closer the feature representations of two pixels are, the stronger the correlation between them; meanwhile, the initial input feature A is fed into a convolutional layer to obtain a new feature map $D \in \mathbb{R}^{c \times p \times p}$, which is reshaped into $\mathbb{R}^{c \times n}$; a matrix multiplication operation is performed between D and $S^T$, and the result is reshaped into $\mathbb{R}^{c \times p \times p}$:
$$E_j = \beta \sum_{i=1}^{n} (s_{ji} D_i) + A_j$$
wherein the initial value of β is zero and more weight is gradually learned and assigned; the features of all positions, each given a certain weight, are added to the original features to obtain the final feature $E \in \mathbb{R}^{c \times p \times p}$, so the contextual information in the spatial dimension is modeled in E;
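A PyTorch sketch of this spatial (position) attention block; the channel reduction by a factor of 8 in the B and C convolutions is a common choice, not mandated by the text:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.conv_b = nn.Conv2d(c, c // 8, 1)      # produces B
        self.conv_c = nn.Conv2d(c, c // 8, 1)      # produces C
        self.conv_d = nn.Conv2d(c, c, 1)           # produces D
        self.beta = nn.Parameter(torch.zeros(1))   # beta starts at zero

    def forward(self, A):                          # A: (B, c, p, p)
        bsz, c, h, w = A.shape
        b = self.conv_b(A).view(bsz, -1, h * w)    # (B, c/8, n)
        cm = self.conv_c(A).view(bsz, -1, h * w)   # (B, c/8, n)
        # s_ji: pairwise pixel affinities, softmax-normalized (n x n)
        S = torch.softmax(torch.bmm(b.transpose(1, 2), cm), dim=-1)
        d = self.conv_d(A).view(bsz, c, h * w)     # (B, c, n)
        out = torch.bmm(d, S.transpose(1, 2)).view(bsz, c, h, w)
        return self.beta * out + A                 # E_j = beta * sum + A_j
```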
(4) Deformable 3D convolution: the deformable convolution dynamically adjusts the size of the receptive field according to the actual content of the image; an input feature of size C × H × W is passed through a 3D-CNN with a p × q × r kernel to generate an offset feature of size 3N × C × H × W, where N is the size of the sampling grid; the 3N values along the channel dimension represent the deformation values of the D3D sampling grid; the learned offset features are applied to deform the 3D-CNN sampling grid, generating the D3D sampling grid, which is then used to generate the output features;
D3D is represented by the following formula:
$$y(p_0) = \sum_{n=1}^{N} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$
wherein $\Delta p_n$ represents the offset corresponding to the nth value in the p × q × r convolutional sampling grid; since the learned offsets are fractional, bilinear interpolation is used to generate accurate values; the bilinear interpolation formula is:
$$x(p) = \sum_{q} G(q, p) \cdot x(q)$$
wherein q enumerates the integral positions of the feature map neighboring the fractional position p and G(·,·) is the bilinear interpolation kernel.
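A compact PyTorch sketch of a deformable 3D convolution under these formulas; the (dz, dy, dx) ordering of the predicted offset channels is an assumption, and `F.grid_sample` performs the (tri)linear interpolation of the sampling formula above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableConv3d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k, self.N = k, k ** 3
        # Offset branch: 3N channels, one (dz, dy, dx) triple per grid tap.
        self.offset_conv = nn.Conv3d(in_ch, 3 * self.N, k, padding=k // 2)
        nn.init.zeros_(self.offset_conv.weight)  # start as a regular conv
        nn.init.zeros_(self.offset_conv.bias)
        # w(p_n): mixes the N sampled maps, applied as a 1x1x1 convolution.
        self.weight_conv = nn.Conv3d(in_ch * self.N, out_ch, 1)

    def forward(self, x):                          # x: (B, C, D, H, W)
        B, C, D, H, W = x.shape
        off = self.offset_conv(x).view(B, self.N, 3, D, H, W)
        zs, ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, D), torch.linspace(-1, 1, H),
            torch.linspace(-1, 1, W), indexing="ij")
        base = torch.stack((xs, ys, zs), dim=-1).to(x.device)  # (D, H, W, 3)
        r, taps, n = self.k // 2, [], 0
        for dz in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Fixed grid displacement p_n plus learned offset dp_n,
                    # converted to grid_sample's [-1, 1] coordinates.
                    disp = torch.stack(
                        ((dx + off[:, n, 2]) * 2 / max(W - 1, 1),
                         (dy + off[:, n, 1]) * 2 / max(H - 1, 1),
                         (dz + off[:, n, 0]) * 2 / max(D - 1, 1)), dim=-1)
                    grid = base.unsqueeze(0) + disp      # (B, D, H, W, 3)
                    taps.append(F.grid_sample(x, grid, mode="bilinear",
                                              align_corners=True))
                    n += 1
        return self.weight_conv(torch.cat(taps, dim=1))
```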
(5) Mish activation function: the activation function adopted by D3DTBTA is Mish, and the formula of Mish is as follows:
$$\mathrm{mish}(x) = x \times \tanh(\mathrm{softplus}(x)) = x \times \tanh(\ln(1 + e^x))$$
wherein x represents the input; Mish is unbounded above and bounded below, with a lower bound of approximately −0.31; the derivative of Mish is defined as:
$$\mathrm{mish}'(x) = \frac{e^x \omega}{\delta^2}$$
wherein $\omega = 4(x + 1) + 4e^{2x} + e^{3x} + e^x(4x + 6)$ and $\delta = 2e^x + e^{2x} + 2$.
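A two-line PyTorch sketch of Mish; the sample points are arbitrary:

```python
import torch
import torch.nn.functional as F

def mish(x):
    # mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))
    return x * torch.tanh(F.softplus(x))

x = torch.linspace(-4.0, 4.0, 9)
print(mish(x))   # the minimum of about -0.31 is reached near x = -1.19
```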
(6) Selection of the optimal weights: during training, the model with the highest accuracy on the verification set is selected as the output; if the accuracies on the verification set are equal, the model with the smallest loss on the verification set is selected; the best model is saved at each iteration: if the model of the next iteration is better, the previously saved model is replaced, otherwise it is kept;
dynamically adjusting the learning rate by adopting a cosine annealing method as shown in the following formula:
$$\eta_t = \eta_{min}^i + \frac{1}{2}\left(\eta_{max}^i - \eta_{min}^i\right)\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right)$$
wherein $\eta_t$ is the learning rate, which lies within the range $[\eta_{min}^i, \eta_{max}^i]$ during the ith run; $T_{cur}$ accounts for the number of iterations performed since the last restart, and $T_i$ controls the number of iterations in one adjustment period.
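A sketch of this bookkeeping in PyTorch, assuming `model`, `train_one_epoch`, and `evaluate` exist; PyTorch's built-in `CosineAnnealingWarmRestarts` implements the cosine formula above, with `T_0` playing the role of $T_i$:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=15, eta_min=1e-5)

best_acc, best_loss = 0.0, float("inf")
for epoch in range(200):
    train_one_epoch(model, optimizer)
    scheduler.step()
    val_acc, val_loss = evaluate(model)
    # Keep the checkpoint with the highest verification accuracy,
    # breaking ties by the lower verification loss.
    if val_acc > best_acc or (val_acc == best_acc and val_loss < best_loss):
        best_acc, best_loss = val_acc, val_loss
        torch.save(model.state_dict(), "best_model.pt")
```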
6. The three-branch three-attention mechanism hyperspectral image classification method in combination with D3D according to claim 2, wherein the prediction in step three comprises:
the HSI data set A is composed of N labeled pixels $\{x_1, x_2, \ldots, x_N\} \subset \mathbb{R}^{1 \times 1 \times b}$, where b is the number of spectral bands, and the corresponding class label set is $Y = \{y_1, y_2, \ldots, y_N\} \subset \mathbb{R}^{1 \times 1 \times q}$, where q is the number of land cover categories;
in HSI classification, the quantitative measure of the difference between the predicted result and the true value is the cross-entropy loss function, defined as:
$$\mathcal{L}(\hat{y}, y) = -\sum_{i=1}^{L} y_i \log(\hat{y}_i)$$
wherein $\hat{y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_L]$ represents the label vector predicted by the model, and $y = [y_1, y_2, \ldots, y_L]$ represents the true label vector.
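A short PyTorch check of this loss; with one-hot true labels, `F.cross_entropy` on raw logits computes exactly $-\sum_i y_i \log(\hat{y}_i)$ (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 16)             # 8 samples, q = 16 classes
targets = torch.randint(0, 16, (8,))    # true class indices
loss = F.cross_entropy(logits, targets)
print(loss.item())
```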
7. A three-branch three-attention hyperspectral image classification system combined with D3D applying the three-branch three-attention hyperspectral image classification method combined with D3D according to any of claims 1 to 6, wherein the three-branch three-attention hyperspectral image classification system combined with D3D comprises:
the data set generating module is used for generating a set of three-dimensional cubes and randomly dividing the set of three-dimensional cubes into a training set, a verification set and a test set;
the model training and verifying module is used for updating the parameters over multiple iterations with the training set, monitoring the performance of the model with the verification set, and selecting the best-trained model;
and the prediction module is used for selecting the test set to verify the effectiveness of the training model and obtain a classification result.
8. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
constructing a three-branch three-attention mechanism network D3DTBTA-Net combined with deformable 3D convolution for extracting the spectral information and the spatial information of a hyperspectral image, wherein the D3DTBTA-Net is divided into three branches; after the spectral feature map, the spatial X feature map and the spatial Y feature map are respectively extracted, the feature maps extracted by the three branches are fused for classification.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
constructing a three-branch three-attention mechanism network D3DTBTA-Net combined with deformable 3D convolution for extracting the spectral information and the spatial information of the hyperspectral image, wherein the D3DTBTA-Net is divided into three branches; after the spectral feature map, the spatial X feature map and the spatial Y feature map are respectively extracted, the feature maps extracted by the three branches are fused for classification.
10. An information data processing terminal characterized by being configured to implement the three-branch three-attention hyperspectral image classification system in combination with D3D of claim 7.
CN202210344115.7A 2022-04-02 2022-04-02 Three-branch three-attention mechanism hyperspectral image classification method combined with D3D Active CN114758170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210344115.7A CN114758170B (en) 2022-04-02 2022-04-02 Three-branch three-attention mechanism hyperspectral image classification method combined with D3D

Publications (2)

Publication Number Publication Date
CN114758170A true CN114758170A (en) 2022-07-15
CN114758170B CN114758170B (en) 2023-04-18

Family

ID=82329787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210344115.7A Active CN114758170B (en) 2022-04-02 2022-04-02 Three-branch three-attention mechanism hyperspectral image classification method combined with D3D

Country Status (1)

Country Link
CN (1) CN114758170B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170090068A1 (en) * 2014-09-12 2017-03-30 The Climate Corporation Estimating soil properties within a field using hyperspectral remote sensing
CN109978041A (en) * 2019-03-19 2019-07-05 上海理工大学 A kind of hyperspectral image classification method based on alternately update convolutional neural networks
CN111191736A (en) * 2020-01-05 2020-05-22 西安电子科技大学 Hyperspectral image classification method based on depth feature cross fusion
CN111539447A (en) * 2020-03-17 2020-08-14 广东省智能制造研究所 Hyperspectrum and terahertz data depth fusion-based classification method
WO2021262129A1 (en) * 2020-06-26 2021-12-30 Ceylan Murat An artificial intelligence analysis based on hyperspectral imaging for a quick determination of the health conditions of newborn premature babies without any contact
CN112052755A (en) * 2020-08-24 2020-12-08 西安电子科技大学 Semantic convolution hyperspectral image classification method based on multi-path attention mechanism
CN112116563A (en) * 2020-08-28 2020-12-22 南京理工大学 Hyperspectral image target detection method and system based on spectral dimension and space cooperation neighborhood attention
CN112232280A (en) * 2020-11-04 2021-01-15 安徽大学 Hyperspectral image classification method based on self-encoder and 3D depth residual error network
CN113139515A (en) * 2021-05-14 2021-07-20 辽宁工程技术大学 Hyperspectral image classification method based on conditional random field and depth feature learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Kang Yongchao et al.: "Research progress of hyperspectral image classification methods", 《新产经》 (New Industrial Economy) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310810A (en) * 2022-12-06 2023-06-23 青岛柯锐思德电子科技有限公司 Cross-domain hyperspectral image classification method based on spatial attention-guided variable convolution
CN116310810B (en) * 2022-12-06 2023-09-15 青岛柯锐思德电子科技有限公司 Cross-domain hyperspectral image classification method based on spatial attention-guided variable convolution
CN116883726A (en) * 2023-06-25 2023-10-13 内蒙古农业大学 Hyperspectral image classification method and system based on multi-branch and improved Dense2Net
CN117036821A (en) * 2023-08-22 2023-11-10 翔鹏佑康(北京)科技有限公司 Single-cell rapid detection and identification method based on laser Raman spectrum

Also Published As

Publication number Publication date
CN114758170B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
Shook et al. Crop yield prediction integrating genotype and weather variables using deep learning
Folberth et al. Spatio-temporal downscaling of gridded crop model yield estimates based on machine learning
Srivastava et al. Winter wheat yield prediction using convolutional neural networks from environmental and phenological data
CN114758170B (en) Three-branch three-attention mechanism hyperspectral image classification method combined with D3D
e Lucas et al. Reference evapotranspiration time series forecasting with ensemble of convolutional neural networks
Rußwurm et al. Multi-temporal land cover classification with long short-term memory neural networks
Wang et al. Evaluation of a deep-learning model for multispectral remote sensing of land use and crop classification
Plaza et al. A new approach to mixed pixel classification of hyperspectral imagery based on extended morphological profiles
CN102938072B (en) A kind of high-spectrum image dimensionality reduction and sorting technique based on the tensor analysis of piecemeal low-rank
Wylie et al. Geospatial data mining for digital raster mapping
Sun et al. Mapping plant functional types from MODIS data using multisource evidential reasoning
Hu et al. Integrating coarse-resolution images and agricultural statistics to generate sub-pixel crop type maps and reconciled area estimates
Adeluyi et al. Estimating the phenological dynamics of irrigated rice leaf area index using the combination of PROSAIL and Gaussian Process Regression
Cheng et al. Wheat yield estimation using remote sensing data based on machine learning approaches
Lin et al. Large-scale rice mapping using multi-task spatiotemporal deep learning and sentinel-1 sar time series
Ayaz et al. Estimation of reference evapotranspiration using machine learning models with limited data
Liu et al. An algorithm for early rice area mapping from satellite remote sensing data in southwestern Guangdong in China based on feature optimization and random Forest
von Bloh et al. Machine learning for soybean yield forecasting in Brazil
Zhang et al. Enhancing model performance in detecting lodging areas in wheat fields using UAV RGB Imagery: Considering spatial and temporal variations
Lang et al. Integrating environmental and satellite data to estimate county-level cotton yield in Xinjiang Province
Saravi et al. Reducing deep learning network structure through variable reduction methods in crop modeling
Zhong et al. Detect and attribute the extreme maize yield losses based on spatio-temporal deep learning
Haining Specification and estimation problems in models of spatial dependence.
Toomula et al. An Extensive Survey of Deep learning-based Crop Yield Prediction Models for Precision Agriculture
Hosseini et al. Areal precipitation coverage ratio for enhanced AI modelling of monthly runoff: a new satellite data-driven scheme for semi-arid mountainous climate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant