CN115545166A - Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof - Google Patents

Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof

Info

Publication number
CN115545166A
CN115545166A
Authority
CN
China
Prior art keywords
remote sensing
features
layer
sensing image
inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211342737.2A
Other languages
Chinese (zh)
Inventor
王坤
杜景林
高文凯
杨陆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202211342737.2A priority Critical patent/CN115545166A/en
Publication of CN115545166A publication Critical patent/CN115545166A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an improved ConvNeXt convolutional neural network and a method for classifying remote sensing images with it. The method comprises: obtaining a remote sensing image and processing it into a 224 × 224 × 3 image; performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000; in parallel, performing context information modeling, random node sampling and graph convolution on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000; fusing the first feature vector and the second feature vector using an addition strategy to obtain fused features; and inputting the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.

Description

Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof
Technical Field
The invention relates to an improved ConvNeXt convolutional neural network and a remote sensing image classification method thereof, and belongs to the technical field of image classification.
Background
High-resolution remote sensing image scene classification, that is, automatically assigning a fixed semantic label to each scene image, is an important component of remote sensing data processing and is widely applied in urban planning, disaster emergency response, land use, environmental monitoring and other fields. Early remote sensing image classification mainly relied on hand-crafted features: experts had to carefully design and explicitly extract features suited to the characteristics of different scenes, which were then encoded for the classification task. Such features, however, are typically low-level dense features containing a large amount of redundant information, which degrades classification accuracy.
ConvNeXt is currently one of the best-performing models in image classification. In its macroscopic design, the computation distribution across stages is optimized, and the Patchify operation from ViT replaces the initial down-sampling operation. Following the grouped convolution of ResNeXt, depthwise separable convolution is used to reduce the parameter count while the number of channels is widened to compensate for the capacity loss. The inverted bottleneck structure of MobileNetV2 is adopted to avoid information loss, and a 7 × 7 convolution kernel replaces the 3 × 3 kernel to obtain a larger receptive field. However, when extracting features, ConvNeXt assigns the same weight to all channels, which limits the classification performance of the algorithm, and it cannot accurately extract local features and long-distance spatial features.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an improved ConvNeXt convolutional neural network and a remote sensing image classification method based on it, which can effectively fuse local key features and long-distance spatial features for classification.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the invention provides an improved ConvNeXt convolutional neural network which consists, from front to back, of a Conv1 layer, layer normalization, a ConvNeXt network, an attention mechanism module and a fully connected layer, with a parallel branch of a random node sampler, a graph convolution network and a fully connected layer;
the ConvNeXt network comprises Stage 1 to Stage 4, each of which comprises a plurality of ConvNeXt blocks.
In a second aspect, the invention provides a method for classifying remote sensing images, applied to the above improved ConvNeXt convolutional neural network, comprising:
acquiring a remote sensing image and processing it into a 224 × 224 × 3 image;
performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000;
performing context information modeling, random node sampling and graph convolution on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000;
fusing the first feature vector and the second feature vector using an addition strategy to obtain fused features;
and inputting the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.
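The two-branch pipeline above can be sketched end to end at the shape level; the weights, class count and random branch outputs below are hypothetical placeholders standing in for the trained branches, not the patent's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical branch outputs: each branch ends in a 1 x 1 x 1000 vector.
f_cnn = rng.standard_normal(1000)   # ConvNeXt + attention branch
f_gcn = rng.standard_normal(1000)   # graph-convolution branch

# Addition fusion strategy described above.
f_fused = f_cnn + f_gcn

# Final fully connected layer (placeholder weights) and Softmax prediction.
n_classes = 21                       # e.g. the UCM dataset has 21 scene classes
W = rng.standard_normal((1000, n_classes)) * 0.01
logits = f_fused @ W
probs = np.exp(logits - logits.max())
probs /= probs.sum()
pred = int(np.argmax(probs))         # predicted scene class index
```

The addition fusion keeps the fused vector at the same 1000 dimensions as each branch, so the final classifier size is independent of the number of branches.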
Further, performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image and then inputting the result into the average pooling layer and the fully connected layer to obtain a first feature vector of size 1 × 1 × 1000 comprises:
down-sampling the processed remote sensing image to obtain a 56 × 56 × 96 feature map;
extracting global features from the down-sampled 56 × 56 × 96 feature map using depthwise convolution;
extracting local features from the feature map using an attention mechanism after the global features are extracted, obtaining a feature map containing both global and local features;
and inputting the feature map containing global and local features into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000.
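The 224 × 224 × 3 to 56 × 56 × 96 reduction follows from a Patchify-style 4 × 4, stride-4 stem (the ConvNeXt design referenced earlier); a minimal shape check, where the 96 output channels are taken as the ConvNeXt-T default rather than a value derived here:

```python
# Output spatial size of a convolution: floor((size + 2*pad - kernel)/stride) + 1.
def conv_out(size, kernel, stride, pad=0):
    return (size + 2 * pad - kernel) // stride + 1

# Patchify stem: 4x4 kernel, stride 4, no padding -> 224 becomes 56.
assert conv_out(224, kernel=4, stride=4) == 56
# Later 2x2 stride-2 down-sampling layers would halve the map again: 56 -> 28.
assert conv_out(56, kernel=2, stride=2) == 28
```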
Further, performing context information modeling, random node sampling and graph convolution on the processed remote sensing image and then inputting the result into the average pooling layer and the fully connected layer to obtain a second feature vector of size 1 × 1 × 1000 comprises:
modeling the context information of the processed remote sensing image through a graph structure to obtain image spatial information;
constructing a vertex set from the pixel points in the remote sensing image, determining the relations between vertices according to the image spatial information, and constructing an adjacency graph;
inputting the adjacency graph into a random node sampler, and repeatedly sampling vertices in the adjacency graph until all vertices have been sampled, generating a group of subgraphs;
inputting the subgraphs into a graph convolution network and extracting their context features;
and inputting the subgraphs with extracted context features into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000.
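The sampler step above can be sketched as follows. The patent does not specify the sampling distribution, so drawing batches of at most M vertices from the not-yet-covered set (which guarantees every vertex is eventually sampled) is an assumption:

```python
import numpy as np

def sample_subgraphs(num_vertices, m, rng):
    # Shuffle all vertex indices, then cut them into chunks of at most m;
    # every vertex ends up in exactly one subgraph, so the whole graph
    # is covered after the loop.
    order = rng.permutation(num_vertices).tolist()
    return [sorted(order[i:i + m]) for i in range(0, num_vertices, m)]

subgraphs = sample_subgraphs(num_vertices=10, m=4,
                             rng=np.random.default_rng(0))
```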
Further, extracting local features from the feature map using an attention mechanism after the global features are extracted, obtaining a feature map containing both global and local features, comprises:
splitting the feature map with extracted global features into S parts and then extracting spatial information by grouped convolution with multi-scale convolution kernels, where the convolution kernel size K and the number of groups G are set as follows:
Figure BDA0003916972800000031
after extracting the spatial information, concatenating the feature maps of all parts to obtain a multi-scale fusion feature map; the whole process is calculated as:
F_i = Conv(K_i, K_i, G_i)(X_i),  i = 0, ..., S−1
F = Concat([F_0, ..., F_{S−1}])
obtaining channel-level attention vector scale features using ECA after the attention mechanism module;
re-calibrating the channel-level attention vector scale features with a Softmax function and applying the corrected attention vector to the multi-scale fusion feature map to obtain a feature map with richer multi-scale information;
and feeding the feature map with richer multi-scale information into a pooling layer and a fully connected layer, finally obtaining a feature vector F_{CNN,AM} ∈ R^1000 containing global and local features.
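The split-and-concatenate flow of the multi-scale step can be sketched as below. The (K, G) schedule used here is the one published for the SPC module in EPSANet (K_i = 2(i+1)+1, G_i = 2^((K_i−1)/2)); the patent defines its own schedule only in an unreproduced figure, so this schedule is an assumption:

```python
import numpy as np

C, S = 96, 4                          # channels and number of splits
x = np.zeros((C, 56, 56))             # placeholder feature map
parts = np.split(x, S, axis=0)        # S parts of C/S channels each

# Per-part kernel sizes and group counts (EPSANet schedule, assumed).
kernels = [2 * (i + 1) + 1 for i in range(S)]     # 3, 5, 7, 9
groups = [2 ** ((k - 1) // 2) for k in kernels]   # 2, 4, 8, 16

# After each part is convolved at its own scale, the parts are concatenated
# back into the multi-scale fusion feature map (convolutions omitted here).
fused = np.concatenate(parts, axis=0)
```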
Further, constructing a vertex set from the pixel points in the remote sensing image, determining the relations between vertices according to the image spatial information and constructing an adjacency graph comprises:
constructing a vertex set V from the pixel points in the remote sensing image, where the edge set E consists of the relations between any two vertices v_i and v_j, and constructing the adjacency graph G(V, E);
describing the relations between vertices with an adjacency matrix A, where the weight a_{i,j} of an edge in the adjacency matrix is obtained from the following Gaussian kernel function:
a_{i,j} = exp(−‖x_i − x_j‖² / σ²)
where x_i and x_j are the feature vectors associated with vertices v_i and v_j, and σ is the width parameter of the function.
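A minimal sketch of the edge-weight computation, assuming the weight function is the standard Gaussian (RBF) kernel exp(−‖x_i − x_j‖² / σ²) that the description of σ as a width parameter suggests:

```python
import numpy as np

def adjacency(X, sigma=1.0):
    # Pairwise squared distances between vertex feature vectors, mapped
    # through the Gaussian kernel to edge weights a_ij.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma ** 2)

# Two vertices at unit distance: weight exp(-1) off the diagonal.
A = adjacency(np.array([[0.0, 0.0], [1.0, 0.0]]))
```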
Further, inputting the subgraphs into a graph convolution network and extracting their context features comprises:
inputting each subgraph into the graph convolution network, which aggregates the features between a vertex v and all vertices u ∈ V_s to propagate neighborhood relations; the propagation equation of vertex v at the l-th layer is defined as:
H^(l+1) = h( D̂^(−1/2) Â D̂^(−1/2) H^(l) W^(l) + b )
where s denotes the s-th subgraph and the s-th batch of network training, W is a parameter matrix, h(·) is an activation function, b is a bias parameter, Â = A + I is the adjacency matrix with self-connections, and D̂ is the degree matrix of Â, defined as:
D̂_{i,i} = Σ_j Â_{i,j}
where i, j index rows and columns;
and concatenating the output results of all subgraphs to obtain the subgraphs after context feature extraction.
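One propagation step can be sketched with the standard Kipf and Welling renormalisation, which matches the self-connected adjacency and degree matrices described above; ReLU stands in for the unspecified activation h(·):

```python
import numpy as np

def gcn_layer(A, H, W, b=0.0):
    # H' = h(D^{-1/2} (A + I) D^{-1/2} H W + b) with h = ReLU.
    A_hat = A + np.eye(A.shape[0])            # adjacency with self-connections
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W + b, 0.0)

# Two connected vertices, 3-dim input features, 4-dim output features.
out = gcn_layer(np.array([[0.0, 1.0], [1.0, 0.0]]),
                np.ones((2, 3)), np.ones((3, 4)))
```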
In a third aspect, the invention provides a device for classifying remote sensing images, comprising modules configured for:
acquiring a remote sensing image and processing it into a 224 × 224 × 3 image;
performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000;
performing context information modeling, random node sampling and graph convolution on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000;
fusing the first feature vector and the second feature vector using an addition strategy to obtain fused features;
and inputting the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.
In a fourth aspect, the present invention provides a device for classifying remote sensing images, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate according to the instructions to execute the steps of the method of any of the above aspects.
In a fifth aspect, the invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the method of any of the above aspects.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides an improved ConvNeXt convolutional neural network and a classification method of remote sensing images thereof, which can fuse global feature information of different scales and assign different weights to the importance degrees of different channel feature maps, so that a model can more easily extract distinguishable features, and long-distance spatial information in the remote sensing images can be effectively modeled; the invention has the characteristics of high classification accuracy, small calculation parameter quantity and high speed.
Drawings
FIG. 1 is a structural diagram of SPCECA provided in an embodiment of the present invention;
FIG. 2 is a block diagram of an improved ConvNeXt convolutional neural network provided by an embodiment of the present invention;
FIG. 3 is a flow chart of an improved ConvNeXt convolutional neural network and a method for classifying remote sensing images thereof according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a ConvNeXt convolution structure and a downsampling layer structure according to an embodiment of the present invention;
fig. 5 is a structural diagram of an improved ConvNeXt convolutional neural network according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1
As shown in fig. 5, this embodiment introduces an improved ConvNeXt convolutional neural network consisting, from front to back, of a Conv1 layer, layer normalization, a ConvNeXt network, an attention mechanism module and a fully connected layer, with a parallel branch of a random node sampler, a graph convolution network and a fully connected layer;
the ConvNeXt network comprises Stage 1 to Stage 4, each of which comprises a plurality of ConvNeXt blocks.
The network comprises two branches. Specifically, an image of size 224 × 224 × 3 is input into the convolution branch, where a first down-sampling layer produces a 56 × 56 × 96 feature map that is fed into a ConvNeXt block; the ConvNeXt block extracts global features through depthwise convolution, after which the feature map containing global features is input into the local feature extraction module, which uses the attention mechanism (AM) to extract local features. Features are then extracted in the same way through 3 further layers of this module, after which an average pooling layer and a fully connected layer produce a feature vector of size 1 × 1 × 1000. In parallel, the image is input into the context feature extraction module, which models the context information of the deep image through a graph structure; after a random node sampler and two GCN layers, an average pooling layer and a fully connected layer produce a feature vector of size 1 × 1 × 1000. Finally, the classification output is obtained by feature fusion.
Example 2
This embodiment provides a method for classifying remote sensing images, applied to the improved ConvNeXt convolutional neural network of embodiment 1, comprising:
acquiring a remote sensing image and processing it into a 224 × 224 × 3 image;
performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000;
performing context information modeling, random node sampling and graph convolution on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000;
fusing the first feature vector and the second feature vector using an addition strategy to obtain fused features;
and inputting the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.
As shown in fig. 1 to fig. 3, the application process of the improved ConvNeXt convolutional neural network and the classification method of the remote sensing image thereof provided by this embodiment specifically involves the following steps:
the method comprises the following steps: input of 224X 3 remote sensing image
The invention is tested on remote sensing scene classification data sets of different scales, namely UC Merced Land-Use (UCM) and the Aerial Image Dataset (AID). The selected data sets contain multiple types of scene images, each scene class containing up to thousands of images, and all selected images are processed into 224 × 224 × 3 images for input.
Step two: down-sample the input image
The image from step one is input into a down-sampling layer, yielding a 56 × 56 × 96 feature map. The down-sampling layer structure is shown in fig. 4 (right).
Step three: model deep image context information through a graph structure
After the image is input in step one, it is fed into the context feature extraction module in parallel with the down-sampling, and long-distance spatial information is modeled with a graph structure using the GCN method. As can be seen from fig. 4, a vertex set V is constructed from the pixel points in the remote sensing image, the edge set E consists of the relations between any two vertices v_i and v_j, and the adjacency graph G(V, E) is constructed. The relations between vertices are described with an adjacency matrix A, where the weight a_{i,j} of an edge in the adjacency matrix is obtained from the following Gaussian kernel function:
a_{i,j} = exp(−‖x_i − x_j‖² / σ²)
where x_i and x_j are the feature vectors associated with vertices v_i and v_j, and σ is the width parameter of the function.
Step four: extract global features using depthwise convolution
After the down-sampling of step two, a 56 × 56 × 96 feature map is obtained. Following the grouped convolution of ResNeXt, depthwise separable convolution is used to reduce the parameter count while the number of channels is widened to compensate for the capacity loss. The inverted bottleneck structure of MobileNetV2 is adopted to avoid information loss, and a 7 × 7 convolution kernel replaces the 3 × 3 kernel to obtain a larger receptive field. Meanwhile, the smoother GELU activation is adopted, fewer activation and normalization functions are used, and layer normalization is employed; this design lets the model train more accurately and efficiently. The network depth is increased and feature maps with a larger receptive field are extracted, better expressing the overall content of the scene image.
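The parameter saving from the depthwise-separable design above is easy to quantify (bias terms omitted; the 96 channels and the 7 × 7 kernel are taken from the text):

```python
# Standard KxK convolution vs. depthwise KxK + pointwise 1x1 convolution.
def standard_params(k, c_in, c_out):
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out

std = standard_params(7, 96, 96)              # 451,584 weights
sep = depthwise_separable_params(7, 96, 96)   # 13,920 weights
```

Even with the large 7 × 7 kernel, the separable form needs roughly 3% of the weights of the standard convolution, which is why the kernel can be enlarged without blowing up the parameter count.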
Step five: extract local features using the attention mechanism (AM)
The AM noticeably improves the performance of ConvNeXt. ECA is a local cross-channel interaction attention mechanism without dimensionality reduction; the range of local cross-channel interaction is determined by a one-dimensional convolution. However, ground-object targets in remote sensing images are usually small and dispersed, and judging key areas from each feature map alone often leads to misjudgment. ECA can only capture channel information and neglects the importance of spatial information, so it needs improvement. An SPCECA mechanism is therefore proposed: an SPC module is added before ECA to split the channels and perform multi-scale feature extraction of spatial information on each channel feature map, effectively combining channel and spatial information. The channels of the input feature map are split into S parts, and spatial information is then extracted by grouped convolution with multi-scale kernels; the grouped convolution reduces the parameter count. The convolution kernel size K and the number of groups G are set as follows:
Figure BDA0003916972800000091
after extraction, all feature maps are concatenated to obtain the multi-scale fusion feature map; the whole process is calculated as:
F_i = Conv(K_i, K_i, G_i)(X_i),  i = 0, ..., S−1
F = Concat([F_0, ..., F_{S−1}])
after the SPC module, the ECA is used for obtaining the scale characteristics of the channel-level attention vector, the Softmax function is used for re-correcting the attention vector, the corrected attention vector acts on the multi-scale characteristic diagram, and the characteristic diagram with rich multi-scale information is obtained. Adding a GAP layer and a full connection layer (Fc layer) at the end of the network to finally obtain a feature vector F containing global features and local features CNN,AM ∈R 1000
Step six: sample with the random node sampler
After the context information modeling of step three, to reduce the computational cost of the GCN, a random node sampler of size M is used before each GCN iteration; the vertices in graph G are repeatedly sampled until all vertices have been sampled, generating a group of subgraphs.
Step seven: extract image context features with the GCN
After the sampling of step six, the subgraphs are input into the GCN, which comprises a GCN layer and an Fc layer. To improve the stability of the model, the adjacency matrix with self-connections Â = A + I and its degree matrix D̂ are used. The GCN propagates neighborhood relations by aggregating the features between a vertex v and all vertices u ∈ V_s; the propagation equation of vertex v at the l-th layer is defined as:
H^(l+1) = h( D̂^(−1/2) Â D̂^(−1/2) H^(l) W^(l) + b )
where s denotes the s-th subgraph and the s-th batch of network training, W is a parameter matrix, h(·) is an activation function, b is a bias parameter, and the degree matrix is defined as:
D̂_{i,i} = Σ_j Â_{i,j}
where i, j index rows and columns. The output results of all subgraphs are concatenated to obtain the final output. The miniGCN reduces the GCN complexity from O(NDP + N²D) to O(NDP + NMD), where N is the number of GCN vertices, D and P are the input and output feature dimensions, and M is the number of subgraph vertices, while reaching a better local optimum of the network. The context features output by the GCN layer are further compressed through the Fc layer, finally yielding the 1000-dimensional long-distance spatial feature F_GCN ∈ R^1000.
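The complexity reduction can be checked with illustrative sizes; the values of N, D, P and M below are hypothetical, not figures from the patent:

```python
# Full-graph GCN cost O(N*D*P + N^2*D) versus miniGCN cost O(N*D*P + N*M*D).
N, D, P, M = 10_000, 96, 96, 64   # vertices, in/out feature dims, subgraph size
full_cost = N * D * P + N * N * D
mini_cost = N * D * P + N * M * D
reduction = full_cost / mini_cost  # sampling replaces the quadratic N^2 term
```

Because M is fixed while N grows with image size, the miniGCN term N·M·D scales linearly in N instead of quadratically.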
Step eight: obtain the final classification result by feature fusion
Different network structures extract different feature representations from remote sensing images; in general, owing to a lack of feature diversity, a single model often cannot reach the best performance. Joint training of the AM-augmented ConvNeXt and the GCN enhances feature discrimination, and the features extracted by the two are fused with an addition strategy; the fused feature F_fusion is expressed as:
F_fusion = F_{CNN,AM} + F_{GCN}
The result of this formula is input into the Fc layer and the Softmax classification layer to predict the final classification result.
The experimental results of the invention compared with baseline models are shown in table 1; the baseline models comprise a classic CNN network and a vision Transformer network:
TABLE 1

Model          Accuracy   Model parameters
ViT            96.40%     89,237,152
ConvNeXt       97.25%     87,597,214
The invention  99.18%     32,754,842
The invention provides an improved ConvNeXt convolutional neural network and a remote sensing image classification method based on it. The network fuses global feature information at different scales and assigns different weights to channel feature maps according to their importance, so the model extracts distinguishable features more easily, and long-distance spatial information in remote sensing images is modeled effectively. The invention features high classification accuracy, a small number of parameters and high speed.
Example 3
This embodiment provides a remote sensing image classification device comprising modules configured to:
acquire a remote sensing image and process it into a 224 × 224 × 3 image;
sequentially perform down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, then input the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000;
sequentially perform context information modeling, random node sampling and graph convolution on the processed remote sensing image, then input the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000;
fuse the first feature vector and the second feature vector using an addition strategy to obtain fused features;
and input the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.
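The data flow of the device above can be traced at the shape level. The two branch functions below are hypothetical stubs standing in for the learned ConvNeXt + AM and graph paths; only the input and output shapes are taken from the text:

```python
import numpy as np

def convnext_am_branch(img):
    # Stub: stem (224 -> 56, 96 ch), four stages with attention,
    # average pooling + Fc -> 1 x 1 x 1000 feature vector.
    assert img.shape == (224, 224, 3)
    return np.zeros((1, 1, 1000))

def graph_branch(img):
    # Stub: graph modeling, random node sampling, GCN,
    # average pooling + Fc -> 1 x 1 x 1000 feature vector.
    assert img.shape == (224, 224, 3)
    return np.zeros((1, 1, 1000))

image = np.zeros((224, 224, 3))                           # preprocessed input
fused = convnext_am_branch(image) + graph_branch(image)   # addition fusion
print(fused.shape)                                        # (1, 1, 1000)
```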
Example 4
The embodiment provides a classification device of remote sensing images, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of embodiment 2.
Example 5
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the steps of the method of embodiment 2.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various improvements and modifications without departing from the technical principle of the present invention, and those improvements and modifications should be considered as the protection scope of the present invention.

Claims (10)

1. An improved ConvNeXt convolutional neural network, characterized in that the improved ConvNeXt convolutional neural network comprises, from front to back, a Conv1 layer, layer normalization, a ConvNeXt network, an attention mechanism module and a fully connected layer, together with a parallel branch consisting of a random node sampler, a graph convolution network and a fully connected layer;
the ConvNeXt network comprises Stage1, Stage2, Stage3 and Stage4, each of which comprises a plurality of ConvNeXt blocks.
2. A method for classifying remote sensing images, applied to the improved ConvNeXt convolutional neural network of claim 1, characterized by comprising the following steps:
acquiring a remote sensing image and processing it into a 224 × 224 × 3 image;
sequentially performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, and then inputting the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000;
sequentially performing context information modeling, random node sampling and graph convolution on the processed remote sensing image, and then inputting the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000;
fusing the first feature vector and the second feature vector using an addition strategy to obtain fused features;
and inputting the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.
3. The method for classifying remote sensing images according to claim 2, wherein performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, and then inputting the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000 comprises:
down-sampling the processed remote sensing image to obtain a feature map of size 56 × 56 × 96;
extracting global features from the down-sampled 56 × 56 × 96 feature map using depthwise convolution;
extracting local features from the feature map after global feature extraction using an attention mechanism, to obtain a feature map containing both global and local features;
and inputting the feature map containing global and local features into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000.
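The first two steps of claim 3 follow the ConvNeXt design: a 4 × 4, stride-4 patchify stem takes the 224 × 224 × 3 image to a 56 × 56 × 96 feature map, and a 7 × 7 depthwise convolution mixes spatial information per channel. A minimal NumPy sketch with random illustrative weights (only the sizes and shapes are taken from the text):

```python
import numpy as np

def patchify_stem(img, W, b):
    """ConvNeXt-style stem: a 4x4 convolution with stride 4 is equivalent to
    splitting the image into non-overlapping 4x4 patches and applying one
    linear map, taking 224x224x3 -> 56x56x96."""
    H, Wd, C = img.shape                     # 224, 224, 3
    p = 4
    patches = img.reshape(H // p, p, Wd // p, p, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(H // p, Wd // p, p * p * C)   # (56, 56, 48)
    return patches @ W + b                                  # (56, 56, 96)

def depthwise_conv7x7(x, k):
    """7x7 depthwise convolution (one 7x7 kernel per channel, 'same' padding),
    the large-kernel spatial mixing used for global feature extraction."""
    Hc, Wc, Cc = x.shape
    pad = 3
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(7):
        for j in range(7):
            # shift-and-add form of per-channel cross-correlation
            out += xp[i:i + Hc, j:j + Wc, :] * k[i, j, :]
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))
W = rng.standard_normal((48, 96)) * 0.1
fmap = patchify_stem(img, W, np.zeros(96))               # 56 x 56 x 96
dw = depthwise_conv7x7(fmap, rng.standard_normal((7, 7, 96)) * 0.1)
print(fmap.shape, dw.shape)                              # (56, 56, 96) (56, 56, 96)
```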
4. The method for classifying remote sensing images according to claim 2, wherein performing context information modeling, random node sampling and graph convolution on the processed remote sensing image and then inputting the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000 comprises:
performing context information modeling on the processed remote sensing image through a graph structure to obtain image spatial information;
constructing a vertex set from the pixel points in the remote sensing image, determining the relations between vertices according to the image spatial information, and constructing an adjacency graph;
inputting the adjacency graph into a random node sampler, and repeatedly sampling vertices in the adjacency graph until all vertices have been sampled, to generate a group of subgraphs;
inputting the subgraphs into a graph convolution network and extracting context features of the subgraphs;
and inputting the subgraphs after context feature extraction into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000.
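The repeated-sampling step in claim 4 can be sketched as follows. The vertex count and subgraph size are illustrative; a real sampler would also extract each subgraph's induced adjacency submatrix (e.g. `A[np.ix_(nodes, nodes)]`):

```python
import numpy as np

def random_node_sampler(num_vertices, subgraph_size, seed=0):
    """Repeatedly draw vertices at random (without replacement within a draw)
    until every vertex has been sampled at least once, yielding a group of
    possibly overlapping subgraph vertex lists."""
    rng = np.random.default_rng(seed)
    unseen = set(range(num_vertices))
    subgraphs = []
    while unseen:
        nodes = rng.choice(num_vertices, size=subgraph_size, replace=False)
        subgraphs.append(sorted(nodes.tolist()))
        unseen -= set(nodes.tolist())   # stop once all vertices are covered
    return subgraphs

subs = random_node_sampler(num_vertices=100, subgraph_size=20)
covered = set().union(*subs)
print(len(subs), len(covered))          # every vertex appears in some subgraph
```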
5. The method for classifying remote sensing images according to claim 3, wherein the step of extracting local features from the feature map after global features are extracted by adopting an attention mechanism to obtain the feature map containing the global features and the local features comprises the following steps:
the feature map from which global features have been extracted is split into S parts, and spatial information is then extracted by grouped convolution with multi-scale convolution kernels; the convolution kernel size K and group number G are set as follows:
K_i = 2 × (i + 1) + 1
G_i = 2^((K_i − 1) / 2)
after the spatial information is extracted, the feature maps of all parts are concatenated to obtain a multi-scale fused feature map; the whole process is calculated as follows:
F_i = Conv(K_i, K_i, G_i)(X_i),  i = 0, ..., S − 1
F = Concat([F_0, ..., F_{S−1}])
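A sketch of the split-and-concatenate structure above. The per-part grouped convolutions Conv(K_i, K_i, G_i) are stubbed out as identities to keep the sketch short; the point is the split into S channel groups, the K_i/G_i schedule, and the final concatenation. The schedule K_i = 2(i+1)+1, G_i = 2^((K_i−1)/2) is an assumption consistent with multi-scale split-attention designs:

```python
import numpy as np

def split_conv_params(S=4):
    """Kernel size and group count per split:
    K_i = 2*(i+1) + 1 and G_i = 2**((K_i - 1) / 2), for i = 0..S-1."""
    Ks = [2 * (i + 1) + 1 for i in range(S)]
    Gs = [2 ** ((k - 1) // 2) for k in Ks]
    return Ks, Gs

def multi_scale_split(x, S=4):
    """Split the C channels into S equal parts X_0..X_{S-1}; each part would be
    processed by a grouped convolution Conv(K_i, K_i, G_i) at its own scale
    (stubbed here as identity), then all parts are concatenated back."""
    parts = np.split(x, S, axis=-1)     # X_i, each H x W x C/S
    Ks, Gs = split_conv_params(S)
    feats = [p for p in parts]          # F_i = Conv(K_i, K_i, G_i)(X_i), stubbed
    return np.concatenate(feats, axis=-1), list(zip(Ks, Gs))

x = np.zeros((56, 56, 96))
F, kg = multi_scale_split(x)
print(F.shape, kg)   # (56, 56, 96) [(3, 2), (5, 4), (7, 8), (9, 16)]
```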
after passing through the attention mechanism module, a channel-level attention vector is obtained using ECA;
the channel-level attention vector is re-corrected with a Softmax function, and the corrected attention vector is applied to the multi-scale fused feature map to obtain a feature map with richer multi-scale information;
and inputting the feature map with richer multi-scale information into a pooling layer and a fully connected layer finally yields a feature vector F_CNN,AM ∈ R^1000 containing global and local features.
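ECA computes channel attention by global average pooling followed by a small 1-D convolution across neighbouring channels; the Softmax re-correction then normalises the attention vector before it reweights the fused feature map. A sketch with a fixed illustrative 1-D kernel (a trained ECA layer would learn it):

```python
import numpy as np

def eca_softmax_attention(F, k=3):
    """ECA-style channel attention: global average pooling gives one descriptor
    per channel; a 1-D convolution of size k over neighbouring channels yields
    a channel attention vector, which is re-corrected with Softmax and applied
    back onto the multi-scale fused feature map F (H x W x C)."""
    C = F.shape[-1]
    y = F.mean(axis=(0, 1))                     # squeeze: one value per channel
    pad = k // 2
    yp = np.pad(y, pad, mode='edge')
    w = np.ones(k) / k                          # illustrative fixed 1-D kernel
    att = np.array([yp[c:c + k] @ w for c in range(C)])
    att = np.exp(att - att.max())
    att /= att.sum()                            # Softmax re-correction
    return F * att                              # reweight the channels

F = np.random.default_rng(0).standard_normal((56, 56, 96))
out = eca_softmax_attention(F)
print(out.shape)   # (56, 56, 96)
```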
6. The method for classifying remote sensing images according to claim 4, wherein the step of constructing a vertex set by using pixel points in the remote sensing images, determining the relation between the vertices according to the image space information and constructing an adjacency graph comprises the steps of:
constructing a vertex set V from the pixel points in the remote sensing image, the edge set E consisting of the edges between any two vertices v_i and v_j, to construct an adjacency graph G(V, E);
describing the relations between vertices with an adjacency matrix A, the weight a_i,j of an edge in the adjacency matrix being obtained from the following function:
a_i,j = exp(−‖x_i − x_j‖² / σ²)
where x_i and x_j are the feature vectors associated with vertices v_i and v_j, and σ is the width parameter of the function.
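The weight function above is a Gaussian kernel on the vertex feature vectors. A direct NumPy implementation (the vertex count, feature dimension and σ are illustrative):

```python
import numpy as np

def gaussian_adjacency(X, sigma=1.0):
    """Adjacency weights a_ij = exp(-||x_i - x_j||^2 / sigma^2) between the
    feature vectors x_i attached to each vertex (pixel); sigma is the width
    parameter of the kernel."""
    # pairwise squared Euclidean distances via broadcasting
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / sigma ** 2)
    np.fill_diagonal(A, 0.0)   # no self-loops here; self-connections added later as A + I
    return A

X = np.random.default_rng(0).standard_normal((6, 3))   # 6 vertices, 3-d features
A = gaussian_adjacency(X)
print(A.shape)   # (6, 6) symmetric weight matrix
```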
7. The method for classifying remote sensing images according to claim 4, wherein inputting the subgraph into a graph convolution network and extracting context features of the subgraph comprises the following steps:
inputting the subgraph into a graph convolution network, the graph convolution network aggregating the features between a vertex v and all vertices u ∈ V_s to realize the transfer of neighborhood relations; the conduction equation of vertex v at the l-th layer is defined as follows:
x_v^(l+1) = h( Σ_{u ∈ V_s} Â_{v,u} W^(l) x_u^(l) + b^(l) )
where s denotes the s-th subgraph and the s-th batch of network training, W is a parameter matrix, h(·) is an activation function, b is a bias parameter, Â = D̃^(−1/2) Ã D̃^(−1/2) is the normalized adjacency matrix with self-connections Ã = A + I, and D̃ is the degree matrix of Ã, defined as follows:
D̃_{i,i} = Σ_j Ã_{i,j}
D̃_{i,j} = 0, i ≠ j
where i and j are the row and column indices;
and cascading the output results of all the subgraphs to obtain the subgraphs after context feature extraction.
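One propagation step of the conduction equation above, for a single sampled subgraph, can be sketched as follows; ReLU stands in for the activation h(·), and the graph and weights are random illustrative values:

```python
import numpy as np

def gcn_layer(A, X, W, b):
    """One graph-convolution step on a sampled subgraph: add self-connections
    (A_tilde = A + I), symmetrically normalise with the degree matrix D_tilde,
    then propagate features and apply a ReLU activation h(.)."""
    A_tilde = A + np.eye(A.shape[0])               # self-connections
    d = A_tilde.sum(axis=1)                        # diagonal of D_tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt      # normalised adjacency
    return np.maximum(A_hat @ X @ W + b, 0.0)      # h(A_hat X W + b)

rng = np.random.default_rng(0)
A = (rng.random((8, 8)) > 0.5).astype(float)       # random 8-vertex subgraph
A = np.triu(A, 1); A = A + A.T                     # symmetric, no self-loops
X = rng.standard_normal((8, 16))                   # 16-d vertex features
W = rng.standard_normal((16, 32)) * 0.1
out = gcn_layer(A, X, W, np.zeros(32))
print(out.shape)   # (8, 32) updated vertex features
```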
8. A device for classifying remote sensing images, characterized by comprising modules configured to:
acquire a remote sensing image and process it into a 224 × 224 × 3 image;
sequentially perform down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, then input the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000;
sequentially perform context information modeling, random node sampling and graph convolution on the processed remote sensing image, then input the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000;
fuse the first feature vector and the second feature vector using an addition strategy to obtain fused features;
and input the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.
9. A classification device for remote sensing images is characterized in that: comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 2 to 7.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the program when executed by a processor implements the steps of the method of any one of claims 2 to 7.
CN202211342737.2A 2022-10-31 2022-10-31 Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof Pending CN115545166A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211342737.2A CN115545166A (en) 2022-10-31 2022-10-31 Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211342737.2A CN115545166A (en) 2022-10-31 2022-10-31 Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof

Publications (1)

Publication Number Publication Date
CN115545166A true CN115545166A (en) 2022-12-30

Family

ID=84718320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211342737.2A Pending CN115545166A (en) 2022-10-31 2022-10-31 Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof

Country Status (1)

Country Link
CN (1) CN115545166A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051913A (en) * 2023-04-03 2023-05-02 吉林农业大学 Pilose antler decoction piece classification recognition model, method and system
CN116051913B (en) * 2023-04-03 2023-05-30 吉林农业大学 Pilose antler decoction piece classification recognition model, method and system
CN116403056A (en) * 2023-06-07 2023-07-07 吉林农业大学 Ginseng grading system and method
CN116403056B (en) * 2023-06-07 2023-10-20 无锡学院 Ginseng grading system and method
CN116758360A (en) * 2023-08-21 2023-09-15 江西省国土空间调查规划研究院 Land space use management method and system thereof
CN116758360B (en) * 2023-08-21 2023-10-20 江西省国土空间调查规划研究院 Land space use management method and system thereof

Similar Documents

Publication Publication Date Title
CN110188685B (en) Target counting method and system based on double-attention multi-scale cascade network
CN111259905B (en) Feature fusion remote sensing image semantic segmentation method based on downsampling
Wang et al. Ultra-dense GAN for satellite imagery super-resolution
CN115545166A (en) Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof
CN108288270B (en) Target detection method based on channel pruning and full convolution deep learning
CN114255238A (en) Three-dimensional point cloud scene segmentation method and system fusing image features
CN112381097A (en) Scene semantic segmentation method based on deep learning
CN108875076B (en) Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
CN109284741A (en) A kind of extensive Remote Sensing Image Retrieval method and system based on depth Hash network
CN114332104B (en) Power grid power transmission scene RGB point cloud semantic segmentation multi-stage model joint optimization method
CN112329801A (en) Convolutional neural network non-local information construction method
CN115527036A (en) Power grid scene point cloud semantic segmentation method and device, computer equipment and medium
CN112766102A (en) Unsupervised hyperspectral video target tracking method based on space-spectrum feature fusion
CN115410087A (en) Transmission line foreign matter detection method based on improved YOLOv4
CN110264483B (en) Semantic image segmentation method based on deep learning
CN115272670A (en) SAR image ship instance segmentation method based on mask attention interaction
CN114359902A (en) Three-dimensional point cloud semantic segmentation method based on multi-scale feature fusion
CN116434039B (en) Target detection method based on multiscale split attention mechanism
CN114494284B (en) Scene analysis model and method based on explicit supervision area relation
Babaee et al. Assessment of dimensionality reduction based on communication channel model; application to immersive information visualization
CN116797640A (en) Depth and 3D key point estimation method for intelligent companion line inspection device
CN112990336B (en) Deep three-dimensional point cloud classification network construction method based on competitive attention fusion
CN115527082A (en) Deep learning small target detection method based on image multi-preprocessing
CN115165363A (en) CNN-based light bearing fault diagnosis method and system
CN115424012A (en) Lightweight image semantic segmentation method based on context information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination