CN115545166A - Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof - Google Patents
- Publication number: CN115545166A
- Application number: CN202211342737.2A
- Authority: CN (China)
- Prior art keywords: remote sensing, features, layer, sensing image, inputting
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses an improved ConvNeXt convolutional neural network and a classification method of remote sensing images thereof. The method comprises: acquiring a remote sensing image and processing it into an image of 224 × 224 × 3; performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000; performing context information modeling, random node sampling and graph convolution on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000; fusing the first feature vector and the second feature vector using an addition strategy to obtain fused features; and inputting the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.
Description
Technical Field
The invention relates to an improved ConvNeXt convolutional neural network and a remote sensing image classification method thereof, and belongs to the technical field of image classification.
Background
High-resolution remote sensing image scene classification, i.e., automatically assigning a fixed semantic label to each scene image, is an important component of remote sensing data processing and is widely applied in fields such as urban planning, disaster emergency response, land use and environmental monitoring. Early remote sensing image classification mainly used hand-crafted features, which required experts to carefully design and explicitly extract features for the characteristics of different scenes, encoded before use in classification tasks. However, such features are typically low-level dense features that contain a large amount of redundant information, which degrades classification accuracy.
ConvNeXt is one of the best-performing models in the field of image classification at present. In its macroscopic design, the computation distribution across stages is optimized, and the Patchify operation from ViT replaces the initial down-sampling operation. Following the grouped convolution of ResNeXt, depthwise separable convolution is used to reduce the number of parameters, while the number of channels is widened to compensate for the capacity loss. The inverted bottleneck structure of MobileNetV2 is adopted to avoid information loss, and a 7 × 7 convolution kernel replaces the 3 × 3 convolution kernel to obtain a larger receptive field. However, when extracting features, ConvNeXt gives the same weight to all channels, which limits the classification performance of the algorithm and prevents accurate extraction of local features and long-distance spatial features.
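The parameter saving of the depthwise separable design mentioned above can be illustrated with a short sketch (the helper names are hypothetical, not from the patent):

```python
# Illustrative sketch: parameter counts of a standard k x k convolution versus
# the depthwise separable convolution used in ConvNeXt-style blocks.
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per channel) + 1 x 1 pointwise conv."""
    return c_in * k * k + c_in * c_out

# e.g. a 7 x 7 convolution over 96 channels
standard = conv_params(96, 96, 7)                   # 451,584 parameters
separable = depthwise_separable_params(96, 96, 7)   # 13,920 parameters
print(standard, separable)
```

The large gap is what allows the channel count to be widened at little cost, as described above.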
Disclosure of Invention
The invention aims to overcome the defects in the prior art, provides an improved ConvNeXt convolutional neural network and a remote sensing image classification method thereof, and can effectively fuse local key features and long-distance spatial features to realize classification.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides an improved ConvNeXt convolutional neural network, which comprises, from front to back, a Conv1 layer, layer normalization, a ConvNeXt network, an attention mechanism module and a fully connected layer, together with a parallel branch consisting of a random node sampler, a graph convolution network and a fully connected layer;
the ConvNeXt network comprises Stage 1, Stage 2, Stage 3 and Stage 4, each of which comprises a plurality of ConvNeXt blocks.
In a second aspect, the present invention provides a method for classifying remote sensing images, applied to the above improved ConvNeXt convolutional neural network, comprising:
acquiring a remote sensing image and processing it into an image of 224 × 224 × 3;
performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000;
performing context information modeling, random node sampling and graph convolution on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000;
fusing the first feature vector and the second feature vector using an addition strategy to obtain fused features;
and inputting the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.
Further, the performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image and then inputting the result into the average pooling layer and the fully connected layer to obtain a first feature vector of size 1 × 1 × 1000 comprises:
down-sampling the processed remote sensing image to obtain a 56 × 56 × 96 feature map;
extracting global features from the 56 × 56 × 96 feature map using depthwise convolution;
performing local feature extraction, with an attention mechanism, on the feature map from which the global features have been extracted, to obtain a feature map containing both global and local features;
and inputting the feature map containing the global and local features into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000.
Further, the performing context information modeling, random node sampling and graph convolution on the processed remote sensing image and then inputting the result into the average pooling layer and the fully connected layer to obtain a second feature vector of size 1 × 1 × 1000 comprises:
modeling the context information of the processed remote sensing image through a graph structure to obtain image space information;
constructing a vertex set from the pixel points of the remote sensing image, determining the relations between vertices according to the image space information, and constructing an adjacency graph;
inputting the adjacency graph into a random node sampler and repeatedly sampling the vertices of the adjacency graph until all vertices have been sampled, generating a group of subgraphs;
inputting the subgraphs into a graph convolution network and extracting their context features;
and inputting the subgraphs with context features extracted into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000.
Further, the performing local feature extraction with an attention mechanism on the feature map from which the global features have been extracted, to obtain a feature map containing both global and local features, comprises:
splitting the feature map from which the global features have been extracted into S parts along the channel dimension, and then extracting spatial information by grouped convolution with multi-scale convolution kernels; the convolution kernel size K and the number of groups G are set as follows:
after extracting the spatial information, concatenating the feature maps of all parts to obtain a multi-scale fused feature map; the whole process is calculated as:
F_i = Conv(K_i, K_i, G_i)(X_i), i = 0, ..., S−1
F = Concat([F_0, ..., F_{S−1}])
after the attention mechanism module, obtaining channel-level attention vector scale features using ECA;
re-correcting the channel-level attention vector scale features with a Softmax function and applying the corrected attention vector to the multi-scale fused feature map to obtain a feature map with richer multi-scale information;
and inputting the feature map with richer multi-scale information into a pooling layer and a fully connected layer, finally obtaining a feature vector F_{CNN,AM} ∈ ℝ^1000 containing both global and local features.
Further, the constructing a vertex set from the pixel points of the remote sensing image, determining the relations between vertices according to the image space information, and constructing an adjacency graph comprises:
constructing a vertex set V from the pixel points of the remote sensing image, the edge set E consisting of the relations between any two vertices v_i and v_j, thereby constructing an adjacency graph G(V, E);
describing the relations between vertices with an adjacency matrix A, the weight a_{i,j} of an edge in the adjacency matrix being obtained from the Gaussian kernel function
a_{i,j} = exp(−‖x_i − x_j‖² / σ²)
where x_i and x_j denote the feature vectors associated with vertices v_i and v_j, and σ is the width parameter of the function.
Further, the inputting the subgraphs into a graph convolution network and extracting their context features comprises:
inputting the subgraphs into a graph convolution network, which realizes the transfer of neighborhood relations by aggregating the features between a vertex v and all vertices u ∈ V_s; the propagation equation at the l-th layer is defined as
H^(l+1) = h( D̂^(−1/2) Â D̂^(−1/2) H^(l) W^(l) + b^(l) )
where s denotes the s-th subgraph and the s-th batch of network training, W is a parameter matrix, h(·) is the activation function, b is a bias parameter, Â = A + I is the self-connected adjacency matrix, and D̂ is the degree matrix of Â, defined as
D̂_{i,i} = Σ_j Â_{i,j}
where i and j index rows and columns;
and concatenating the output results of all subgraphs to obtain the subgraphs with context features extracted.
In a third aspect, the present invention provides a device for classifying remote sensing images, configured to perform the following steps:
acquiring a remote sensing image and processing it into an image of 224 × 224 × 3;
performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000;
performing context information modeling, random node sampling and graph convolution on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000;
fusing the first feature vector and the second feature vector using an addition strategy to obtain fused features;
and inputting the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.
In a fourth aspect, the present invention provides a device for classifying remote sensing images, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of the preceding claims.
In a fifth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of any one of the preceding claims.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides an improved ConvNeXt convolutional neural network and a classification method of remote sensing images thereof, which can fuse global feature information of different scales and assign different weights to the importance degrees of different channel feature maps, so that a model can more easily extract distinguishable features, and long-distance spatial information in the remote sensing images can be effectively modeled; the invention has the characteristics of high classification accuracy, small calculation parameter quantity and high speed.
Drawings
FIG. 1 is a structural diagram of SPCECA provided in an embodiment of the present invention;
FIG. 2 is a block diagram of an improved ConvNeXt convolutional neural network provided by an embodiment of the present invention;
FIG. 3 is a flow chart of an improved ConvNeXt convolutional neural network and a method for classifying remote sensing images thereof according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a ConvNeXt convolution structure and a downsampling layer structure according to an embodiment of the present invention;
fig. 5 is a structural diagram of an improved ConvNeXt convolutional neural network according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1
As shown in fig. 5, the present embodiment introduces an improved ConvNeXt convolutional neural network, which comprises, from front to back, a Conv1 layer, layer normalization, a ConvNeXt network, an attention mechanism module and a fully connected layer, together with a parallel branch consisting of a random node sampler, a graph convolution network and a fully connected layer;
the ConvNeXt network comprises Stage 1, Stage 2, Stage 3 and Stage 4, each of which comprises a plurality of ConvNeXt blocks.
The network comprises two branches. Specifically, an image of size 224 × 224 × 3 is input into the convolution branch, where a first down-sampling layer produces a feature map of size 56 × 56 × 96; the feature map is then input into a ConvNeXt block, which extracts global features from the image through depthwise convolution, and the feature map containing global features is input into a local feature extraction module, which uses the attention mechanism (AM) to extract local features. Features are then extracted in the same way; after passing through three layers of this module, the result enters an average pooling layer and a fully connected layer to obtain a feature vector of size 1 × 1 × 1000. In parallel, the image is input into a context feature extraction module, which models the context information of the deep image through a graph structure; after a random node sampler and a two-layer GCN, the result enters an average pooling layer and a fully connected layer to obtain a feature vector of size 1 × 1 × 1000. Finally, the final classification output is obtained through feature fusion.
Example 2
The embodiment provides a method for classifying remote sensing images, applied to the improved ConvNeXt convolutional neural network of embodiment 1, comprising:
acquiring a remote sensing image and processing it into an image of 224 × 224 × 3;
performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000;
performing context information modeling, random node sampling and graph convolution on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000;
fusing the first feature vector and the second feature vector using an addition strategy to obtain fused features;
and inputting the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.
As shown in fig. 1 to fig. 3, the application process of the improved ConvNeXt convolutional neural network and the classification method of the remote sensing image thereof provided by this embodiment specifically involves the following steps:
the method comprises the following steps: input of 224X 3 remote sensing image
The invention selects the method for testing the remote sensing scene classification data sets with different scales, namely UCMercered Land-Use (UCM) and initial Image Dataset (AID). The selected data set contains multiple types of scene images, each type of scene contains thousands of images, and all the selected images are processed into 224 multiplied by 3 images for input.
Step two: downsampling an input image
The image from step one is input into a down-sampling layer for sampling, obtaining a 56 × 56 × 96 feature map. The down-sampling layer structure is shown in fig. 4 (right).
Step three: modeling deep image context information by graph structure
After the image is input in step one, it is fed into the context feature extraction module in parallel with the down-sampling, and long-distance spatial information is modeled with a graph structure using the GCN method. As can be seen from FIG. 4, a vertex set V is constructed from the pixel points of the remote sensing image, and the edge set E consists of the relations between any two vertices v_i and v_j, constructing the adjacency graph G(V, E). The relations between vertices are described with an adjacency matrix A; the weight a_{i,j} of an edge in the adjacency matrix is obtained from the Gaussian kernel function
a_{i,j} = exp(−‖x_i − x_j‖² / σ²)
where x_i and x_j denote the feature vectors associated with vertices v_i and v_j, and σ is the width parameter of the function.
Step four: extracting global features using deep convolution
After the down-sampling in step two, a 56 × 56 × 96 feature map is obtained. Following the grouped convolution of ResNeXt, depthwise separable convolution is used to reduce the number of parameters while widening the number of channels to compensate for the capacity loss. Meanwhile, the inverted bottleneck structure of MobileNetV2 is adopted to avoid information loss, and a 7 × 7 convolution kernel is used instead of a 3 × 3 convolution kernel to obtain a larger receptive field. A smoother GELU function is adopted, fewer activation and normalization functions are used, and layer normalization is employed; this design allows the model to be trained more accurately and efficiently. Increasing the network depth extracts feature maps with a larger receptive field and better expresses the overall content of the scene image.
Step five: local feature extraction using AM
The AM can markedly improve the performance of ConvNeXt. ECA is a local cross-channel interaction attention mechanism without dimensionality reduction, which determines the range of local cross-channel interaction through a one-dimensional convolution. However, the ground-object targets in remote sensing images are usually small and dispersed, and judging key areas from each feature map alone often leads to misjudgment. ECA can only capture channel information and neglects the importance of spatial information, so it needs to be improved. An SPCECA mechanism is therefore proposed: an SPC module is added before ECA to split the channels and perform multi-scale feature extraction of the spatial information on each channel feature map, effectively combining channel and spatial information. The channels of the input feature map are split into S parts, and spatial information is then extracted by grouped convolution with multi-scale convolution kernels, the grouping reducing the number of parameters. The convolution kernel size K and the number of groups G are set as follows:
After extraction, all feature maps are concatenated to obtain a multi-scale fused feature map; the whole process is calculated as:
F_i = Conv(K_i, K_i, G_i)(X_i), i = 0, ..., S−1
F = Concat([F_0, ..., F_{S−1}])
After the SPC module, ECA is used to obtain the channel-level attention vector scale features, the Softmax function re-corrects the attention vector, and the corrected attention vector is applied to the multi-scale feature map, yielding a feature map rich in multi-scale information. A GAP layer and a fully connected layer (Fc layer) are added at the end of the network, finally obtaining a feature vector F_{CNN,AM} ∈ ℝ^1000 containing both global and local features.
Step six: sampling by a random node sampler
After the context information modeling of step three, in order to reduce the computational cost of the GCN, before each GCN iteration a random node sampler of size M repeatedly samples the vertices of graph G until all vertices have been sampled, generating a group of subgraphs.
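One simple way to realize such a size-M sampler is to draw vertices without replacement until the graph is covered. This sketch (hypothetical names) partitions a random permutation of the vertex indices into subgraph index sets:

```python
import numpy as np

def sample_subgraphs(num_vertices, M, rng=None):
    """Repeatedly draw up to M vertices without replacement until every vertex
    of the graph has been sampled once; returns the vertex-index subsets."""
    rng = np.random.default_rng(rng)
    perm = rng.permutation(num_vertices)    # one pass covering all vertices
    return [perm[i:i + M] for i in range(0, num_vertices, M)]

subgraphs = sample_subgraphs(10, M=4, rng=0)
```

Every vertex appears in exactly one subgraph, matching the requirement that sampling continues until all vertices are sampled.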
Step seven: GCN extraction of image context features
After the sampling in step six, the subgraphs are input into the GCN, which comprises a GCN layer and an Fc layer. To improve the stability of the model, the self-connected adjacency matrix Â and its degree matrix D̂ are used. The GCN realizes the transfer of neighborhood relations by aggregating the features between a vertex v and all vertices u ∈ V_s; the propagation equation at the l-th layer is defined as
H^(l+1) = h( D̂^(−1/2) Â D̂^(−1/2) H^(l) W^(l) + b^(l) )
where s denotes the s-th subgraph and the s-th batch of network training, W is a parameter matrix, h(·) is the activation function, b is a bias parameter, Â = A + I is the self-connected adjacency matrix, and D̂ is the degree matrix of Â, defined as
D̂_{i,i} = Σ_j Â_{i,j}
where i and j index rows and columns. The output results of all subgraphs are concatenated to obtain the final output. miniGCN reduces the GCN complexity from O(NDP + N²D) to O(NDP + NMD), where N is the number of GCN vertices, D and P are the input and output feature dimensions, and M is the number of subgraph vertices, while achieving a better local optimum for the network. The Fc layer further compresses the context features output by the GCN layer, finally obtaining the long-distance spatial feature F_GCN ∈ ℝ^1000.
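A single graph-convolution layer of the form described above can be sketched in NumPy, assuming the standard symmetric normalization with self-connections and a ReLU activation (the names are illustrative):

```python
import numpy as np

def gcn_layer(A, H, W, b=0.0):
    """One graph-convolution layer on a subgraph:
    H' = relu(D_hat^{-1/2} A_hat D_hat^{-1/2} H W + b),
    with A_hat = A + I (self-connections) and D_hat its degree matrix."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)                      # vertex degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W + b, 0.0)
```

With an empty adjacency (no edges), the layer reduces to a per-vertex linear map followed by ReLU, which is a convenient sanity check on the normalization.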
Step eight: obtaining the final classification result by feature fusion
Different network structures extract different feature representations from remote sensing images, and a single model, lacking feature diversity, often cannot achieve the best performance. Joint training of the AM-augmented ConvNeXt and the GCN enhances feature recognition capability; the features extracted by ConvNeXt and the GCN are fused using an addition strategy, and the fused feature F_fusion is represented as
F_fusion = F_{CNN,AM} + F_GCN
The result obtained by this formula is input into the Fc layer and the Softmax classification layer to predict the final classification result.
The experimental results of the invention compared with the baseline models are shown in Table 1; the baseline models comprise a classic CNN network and a visual Transformer network:
TABLE 1
Model | Accuracy | Number of parameters
ViT | 96.40% | 89,237,152
ConvNeXt | 97.25% | 87,597,214
The invention | 99.18% | 32,754,842
The invention provides an improved ConvNeXt convolutional neural network and a classification method of remote sensing images thereof, which can fuse global feature information of different scales and assign different weights to channel feature maps according to their importance, so that the model can more easily extract distinguishable features and effectively model long-distance spatial information in remote sensing images; the invention features high classification accuracy, a small number of parameters and high speed.
Example 3
The embodiment provides a device for classifying remote sensing images, configured to perform the following steps:
acquiring a remote sensing image and processing it into an image of 224 × 224 × 3;
performing down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a first feature vector of size 1 × 1 × 1000;
performing context information modeling, random node sampling and graph convolution on the processed remote sensing image, then inputting the result into an average pooling layer and a fully connected layer to obtain a second feature vector of size 1 × 1 × 1000;
fusing the first feature vector and the second feature vector using an addition strategy to obtain fused features;
and inputting the fused features into a fully connected layer and a Softmax classification layer to predict the final classification result.
Example 4
The embodiment provides a classification device of remote sensing images, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any of embodiment 2.
Example 5
The present embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the method of embodiment 2.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various improvements and modifications without departing from the technical principle of the present invention, and those improvements and modifications should be considered as the protection scope of the present invention.
Claims (10)
1. An improved ConvNeXt convolutional neural network, characterized in that the improved ConvNeXt convolutional neural network comprises, from front to back, a Conv1 layer, layer normalization, a ConvNeXt network, an attention mechanism module and a fully connected layer, together with a parallel branch consisting of a random node sampler, a graph convolution network and a fully connected layer;
the ConvNeXt network comprises Stage 1, Stage 2, Stage 3 and Stage 4, each of which comprises a plurality of ConvNeXt blocks.
2. A method for classifying remote sensing images, applied to the improved ConvNeXt convolutional neural network of claim 1, characterized by comprising the following steps:
acquiring a remote sensing image, and processing the remote sensing image into an image of 224 × 224 × 3;
sequentially carrying out down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, and then inputting the result into an average pooling layer and a full-connection layer to obtain a first feature vector of size 1 × 1 × 1000;
sequentially carrying out context information modeling, random node sampling and graph convolution on the processed remote sensing image, and then inputting the result into an average pooling layer and a full-connection layer to obtain a second feature vector of size 1 × 1 × 1000;
fusing the first feature vector and the second feature vector by using an addition strategy to obtain fused features;
and inputting the fused features into a full-connection layer and a Softmax classification layer for prediction to obtain a final classification result.
3. The method for classifying remote sensing images according to claim 2, wherein sequentially carrying out down-sampling, global feature extraction and local feature extraction on the processed remote sensing image and then inputting the result into an average pooling layer and a full-connection layer to obtain a first feature vector of size 1 × 1 × 1000 comprises:
down-sampling the processed remote sensing image to obtain a feature map of 56 × 56 × 96;
extracting global features from the 56 × 56 × 96 feature map obtained by down-sampling by using depthwise convolution;
extracting local features from the feature map after global feature extraction by using an attention mechanism to obtain a feature map containing both global and local features;
and inputting the feature map containing the global features and the local features into an average pooling layer and a full-connection layer to obtain a first feature vector with the size of 1 × 1 × 1000.
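The shape bookkeeping of this branch can be checked with a small sketch. The 4 × 4 stride-4 stem convolution is an assumption (it is the standard ConvNeXt "patchify" stem, and is consistent with 224 × 224 × 3 mapping to 56 × 56 × 96):

```python
def downsample_shape(h, w, kernel, stride):
    # Output spatial size of a valid (no-padding) strided convolution;
    # for the assumed patchify stem, kernel == stride.
    return (h - kernel) // stride + 1, (w - kernel) // stride + 1

def cnn_branch_shapes():
    # 224x224x3 input -> assumed 4x4 stride-4 stem -> 56x56x96 feature map
    h, w = downsample_shape(224, 224, 4, 4)
    feat = (h, w, 96)
    pooled = (1, 1, feat[2])   # global average pooling collapses H and W
    fc_out = (1, 1, 1000)      # full-connection layer maps to 1000 dims
    return feat, pooled, fc_out
```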
4. The method for classifying remote sensing images according to claim 2, wherein sequentially carrying out context information modeling, random node sampling and graph convolution on the processed remote sensing image and then inputting the result into an average pooling layer and a full-connection layer to obtain a second feature vector of size 1 × 1 × 1000 comprises:
carrying out context information modeling on the processed remote sensing image through a graph structure to obtain image space information;
constructing a vertex set by using pixel points in the remote sensing image, determining the relation between vertices according to the image space information, and constructing an adjacency graph;
inputting the adjacency graph into a random node sampler, and repeatedly sampling vertices in the adjacency graph until all vertices have been sampled, so as to generate a group of subgraphs;
inputting the subgraph into a graph convolution network, and extracting context characteristics of the subgraph;
and inputting the subgraphs after context feature extraction into an average pooling layer and a full-connection layer to obtain a second feature vector of size 1 × 1 × 1000.
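The random node sampling step can be sketched as follows: vertices are drawn without replacement until every vertex has been visited, so the resulting subgraphs jointly cover the adjacency graph. The fixed `subgraph_size` and `seed` parameters are illustrative assumptions, not part of the claim:

```python
import random

def sample_subgraphs(vertices, subgraph_size, seed=0):
    # Shuffle all vertices once, then partition them into chunks of at
    # most `subgraph_size` vertices. Every vertex lands in exactly one
    # subgraph, so sampling continues "until all vertices are sampled".
    rng = random.Random(seed)
    remaining = list(vertices)
    rng.shuffle(remaining)
    return [remaining[i:i + subgraph_size]
            for i in range(0, len(remaining), subgraph_size)]
```

Each returned chunk corresponds to one subgraph (and, per claim 7, one training batch).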
5. The method for classifying remote sensing images according to claim 3, wherein extracting local features from the feature map after global feature extraction by using an attention mechanism to obtain the feature map containing both global and local features comprises the following steps:
segmenting the feature map after global feature extraction into S parts, and then extracting spatial information by means of grouped convolution with multi-scale convolution kernels, with the convolution kernel size K and the group number G set as follows:
after the spatial information is extracted, cascading the feature maps of all parts to obtain a multi-scale fusion feature map, the whole process being calculated as follows:
F_i = Conv(K_i, K_i, G_i)(X_i), i = 0, ..., S-1
F = Concat([F_0, ..., F_{S-1}])
after passing through the attention mechanism module, obtaining channel-level attention vectors for each scale by using ECA;
re-correcting the channel-level attention vectors with a Softmax function, and applying the corrected attention vectors to the multi-scale fusion feature map to obtain a feature map with richer multi-scale information;
and inputting the feature map with richer multi-scale information into a pooling layer and a full-connection layer to finally obtain a feature vector F_{CNN,AM} ∈ R^1000 containing both global and local features.
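The Softmax re-correction can be illustrated on per-channel vectors. This sketch assumes ECA has already produced one channel-attention vector per scale; it shows only the cross-scale Softmax weighting and the final concatenation, not the ECA computation itself:

```python
import math

def softmax(v):
    # Numerically stable Softmax over a list of values.
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def recalibrate(branch_features, branch_attention):
    # branch_features: S per-channel response vectors, one per split part;
    # branch_attention: the S channel attention vectors (assumed ECA output).
    # For each channel, Softmax across the S scales re-corrects the
    # attention; each branch is then re-weighted and all branches are
    # concatenated into the fused multi-scale feature vector.
    S = len(branch_features)
    C = len(branch_features[0])
    out = []
    for s in range(S):
        weights = [softmax([branch_attention[t][c] for t in range(S)])[s]
                   for c in range(C)]
        out.append([f * w for f, w in zip(branch_features[s], weights)])
    return [x for part in out for x in part]
```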
6. The method for classifying remote sensing images according to claim 4, wherein constructing a vertex set by using pixel points in the remote sensing image, determining the relation between vertices according to the image space information and constructing an adjacency graph comprises the following steps:
constructing a vertex set V from the pixel points in the remote sensing image, with the edge set E consisting of the edges between any two vertices v_i and v_j, so as to construct an adjacency graph G(V, E);
describing the relation between vertices by using an adjacency matrix A, wherein the weight a_{i,j} of an edge in the adjacency matrix is obtained from the following function:
in the formula, x_i and x_j are the feature vectors associated with vertices v_i and v_j, and σ is the width parameter of the function.
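The weight function itself is given by a formula not reproduced here; since the claim states only that it depends on the feature vectors x_i, x_j and a width parameter σ, a Gaussian (RBF) kernel of that form is assumed in the sketch below:

```python
import math

def edge_weight(x_i, x_j, sigma):
    # Weight a_{i,j} between vertices v_i and v_j. The Gaussian kernel
    # exp(-||x_i - x_j||^2 / sigma^2) is an ASSUMED form consistent with
    # the stated inputs (feature vectors and a width parameter sigma).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (sigma ** 2))
```

Under this assumption the weight is symmetric, equals 1 for identical features, and decays as the feature distance grows.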
7. The method for classifying remote sensing images according to claim 2, wherein inputting the subgraphs into a graph convolution network and extracting context features of the subgraphs comprises the following steps:
inputting the subgraph into a graph convolution network, wherein the graph convolution network aggregates the features between a vertex v and all vertices u ∈ V_s to realize the transfer of neighborhood relations, and the conduction equation of vertex v at the l-th layer is defined as follows:
in the formula, s denotes the s-th subgraph and the s-th batch of network training, W is a parameter matrix, h(·) is an activation function, b is a bias parameter, Ã is the adjacency matrix with self-connections, and D̃ is the degree matrix of Ã, defined as follows:
wherein i and j index the rows and columns;
and performing a cascade operation on the output results of all subgraphs to obtain the subgraphs after context feature extraction.
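The layer-wise conduction can be sketched as a standard graph-convolution layer with self-connections and symmetric degree normalization, H' = h(D̃^(-1/2) Ã D̃^(-1/2) H W + b); ReLU is assumed for the activation h(·), which the claim leaves unspecified:

```python
import math

def gcn_layer(A, H, W, b):
    # One graph-convolution layer on a subgraph with n vertices:
    # A: n x n adjacency matrix, H: n x f vertex features,
    # W: f x g parameter matrix, b: length-g bias.
    n = len(A)
    # Add self-connections: A_hat = A + I
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    d = [sum(row) for row in A_hat]          # degrees of A_hat
    # Symmetric normalization D^-1/2 A_hat D^-1/2
    norm = [[A_hat[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)]
            for i in range(n)]
    f, g = len(W), len(W[0])
    # Aggregate neighborhood features, then apply weights, bias and ReLU
    agg = [[sum(norm[i][k] * H[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][c] * W[c][j] for c in range(f)) + b[j])
             for j in range(g)] for i in range(n)]
```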
8. A device for classifying remote sensing images, characterized by comprising:
acquiring a remote sensing image, and processing the remote sensing image into an image of 224 × 224 × 3;
sequentially carrying out down-sampling, global feature extraction and local feature extraction on the processed remote sensing image, and then inputting the result into an average pooling layer and a full-connection layer to obtain a first feature vector of size 1 × 1 × 1000;
sequentially carrying out context information modeling, random node sampling and graph convolution on the processed remote sensing image, and then inputting the result into an average pooling layer and a full-connection layer to obtain a second feature vector of size 1 × 1 × 1000;
fusing the first feature vector and the second feature vector by using an addition strategy to obtain fused features;
and inputting the fused features into a full-connection layer and a Softmax classification layer for prediction to obtain a final classification result.
9. A classification device for remote sensing images, characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 2 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the method of any one of claims 2 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211342737.2A CN115545166A (en) | 2022-10-31 | 2022-10-31 | Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115545166A true CN115545166A (en) | 2022-12-30 |
Family
ID=84718320
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211342737.2A Pending CN115545166A (en) | 2022-10-31 | 2022-10-31 | Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115545166A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116051913A (en) * | 2023-04-03 | 2023-05-02 | 吉林农业大学 | Pilose antler decoction piece classification recognition model, method and system |
CN116051913B (en) * | 2023-04-03 | 2023-05-30 | 吉林农业大学 | Pilose antler decoction piece classification recognition model, method and system |
CN116403056A (en) * | 2023-06-07 | 2023-07-07 | 吉林农业大学 | Ginseng grading system and method |
CN116403056B (en) * | 2023-06-07 | 2023-10-20 | 无锡学院 | Ginseng grading system and method |
CN116758360A (en) * | 2023-08-21 | 2023-09-15 | 江西省国土空间调查规划研究院 | Land space use management method and system thereof |
CN116758360B (en) * | 2023-08-21 | 2023-10-20 | 江西省国土空间调查规划研究院 | Land space use management method and system thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110188685B (en) | Target counting method and system based on double-attention multi-scale cascade network | |
CN111259905B (en) | Feature fusion remote sensing image semantic segmentation method based on downsampling | |
Wang et al. | Ultra-dense GAN for satellite imagery super-resolution | |
CN115545166A (en) | Improved ConvNeXt convolutional neural network and remote sensing image classification method thereof | |
CN108288270B (en) | Target detection method based on channel pruning and full convolution deep learning | |
CN114255238A (en) | Three-dimensional point cloud scene segmentation method and system fusing image features | |
CN112381097A (en) | Scene semantic segmentation method based on deep learning | |
CN108875076B (en) | Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network | |
CN109284741A (en) | A kind of extensive Remote Sensing Image Retrieval method and system based on depth Hash network | |
CN114332104B (en) | Power grid power transmission scene RGB point cloud semantic segmentation multi-stage model joint optimization method | |
CN112329801A (en) | Convolutional neural network non-local information construction method | |
CN115527036A (en) | Power grid scene point cloud semantic segmentation method and device, computer equipment and medium | |
CN112766102A (en) | Unsupervised hyperspectral video target tracking method based on space-spectrum feature fusion | |
CN115410087A (en) | Transmission line foreign matter detection method based on improved YOLOv4 | |
CN110264483B (en) | Semantic image segmentation method based on deep learning | |
CN115272670A (en) | SAR image ship instance segmentation method based on mask attention interaction | |
CN114359902A (en) | Three-dimensional point cloud semantic segmentation method based on multi-scale feature fusion | |
CN116434039B (en) | Target detection method based on multiscale split attention mechanism | |
CN114494284B (en) | Scene analysis model and method based on explicit supervision area relation | |
Babaee et al. | Assessment of dimensionality reduction based on communication channel model; application to immersive information visualization | |
CN116797640A (en) | Depth and 3D key point estimation method for intelligent companion line inspection device | |
CN112990336B (en) | Deep three-dimensional point cloud classification network construction method based on competitive attention fusion | |
CN115527082A (en) | Deep learning small target detection method based on image multi-preprocessing | |
CN115165363A (en) | CNN-based light bearing fault diagnosis method and system | |
CN115424012A (en) | Lightweight image semantic segmentation method based on context information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||