CN112183360B - Lightweight semantic segmentation method for high-resolution remote sensing image - Google Patents

Lightweight semantic segmentation method for high-resolution remote sensing image

Info

Publication number
CN112183360B
CN112183360B (application CN202011049591.3A)
Authority
CN
China
Prior art keywords
convolution
network
remote sensing
sensing image
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011049591.3A
Other languages
Chinese (zh)
Other versions
CN112183360A (en)
Inventor
霍宏
吕亮
傅陈钦
沙拉依丁·斯热吉丁
方涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202011049591.3A
Publication of CN112183360A
Application granted
Publication of CN112183360B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 20/13: Satellite images (under G06V 20/10 Terrestrial scenes; G06V 20/00 Scenes, scene-specific elements)
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/253: Fusion techniques of extracted features
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/40: Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

A lightweight semantic segmentation method for high-resolution remote sensing images comprises the following steps: building, training and testing a network. Specifically, a deep semantic segmentation network with an encoder-decoder structure is built under the PyTorch deep learning framework; after the network is trained on a remote sensing image data sample set, the remote sensing image to be tested is fed to the network as input to obtain its segmentation result. On the one hand, the method reduces the model parameters through factorized depthwise separable convolution, lowering the computational complexity, shortening the semantic segmentation time for high-resolution remote sensing images and improving segmentation efficiency. On the other hand, segmentation accuracy is improved through multi-scale feature aggregation, a spatial attention module and gated convolution, so that the proposed lightweight deep semantic segmentation network can segment high-resolution remote sensing images both accurately and efficiently.

Description

Lightweight semantic segmentation method for high-resolution remote sensing image
Technical Field
The invention relates to a technology in the field of remote sensing image processing, and in particular to a lightweight semantic segmentation method for high-resolution remote sensing images.
Background
With the development of aerospace technology, high-resolution remote sensing images have become easier to acquire in large quantities. Extracting ground-object boundaries from these images through segmentation is the basis for their further analysis and use. Traditional high-resolution remote sensing image segmentation algorithms usually extract ground-object boundaries by means of hand-crafted features such as texture and color, but they can only recover the boundaries themselves; they cannot simultaneously obtain the semantics of the regions those boundaries enclose, i.e. the ground-object categories. In recent years, semantic segmentation based on deep networks has attracted much attention because it extracts ground-object boundaries and determines their semantics at the same time. Since the Fully Convolutional Network (FCN) for semantic segmentation proposed by Jonathan Long et al. in 2015, a large number of semantic segmentation methods such as UNet, PSPNet and the DeepLab series have been widely demonstrated to outperform traditional remote sensing image segmentation algorithms, and have been widely used for automatic extraction of remote sensing image information.
However, high-resolution remote sensing images are typically very large, so applying these methods to them often suffers from slow training and low efficiency. A lightweight semantic segmentation method designed specifically for high-resolution remote sensing images can therefore greatly improve segmentation efficiency while preserving segmentation accuracy.
Disclosure of Invention
The invention provides a lightweight semantic segmentation method for high-resolution remote sensing images, which addresses the low runtime efficiency that existing semantic segmentation networks exhibit on large high-resolution remote sensing images due to their parameter counts and computational cost. At the same time, segmentation accuracy is improved through multi-scale feature aggregation, a spatial attention module and gated convolution, so that the proposed lightweight deep network can segment high-resolution remote sensing images both accurately and efficiently.
The invention is realized by the following technical scheme:
the invention relates to a lightweight semantic segmentation method for a high-resolution remote sensing image, which comprises the following steps: the method comprises the steps of building, training and testing a network, wherein the network specifically constructs a deep semantic segmentation network of an encoder-decoder structure for a pytorch deep learning framework, and after network training is carried out based on a remote sensing image data sample set, a remote sensing image to be detected is used as network input to obtain a segmentation result of the remote sensing image.
The encoder is built with multi-scale feature fusion and an attention mechanism, and comprises two sub-networks of identical structure plus an attention module that captures the context information of the feature maps, wherein: the image data is input into the first of the two sub-networks; the low-level feature map output by the first sub-network is upsampled 4x and fused with the stage-1 feature map to form the input of the second sub-network; at every stage, the input of the second sub-network is fused with the same-scale feature map of the first sub-network; the high-level feature map output by the second sub-network is fed to the spatial attention module, and the output of the spatial attention module is fed to the decoder.
The first and second sub-networks each comprise three feature extraction layers, and each feature extraction layer consists of one downsampling layer and four factorized depthwise separable convolution residual blocks.
The downsampling layer consists of a convolutional layer with 1 × 1 kernels and stride 2, a batch normalization layer and a ReLU activation layer.
The factorized depthwise separable convolution residual block extracts features through two groups of factorized depthwise separable convolution kernels of sizes 3 × 1 and 1 × 3, and a residual connection is added to reduce gradient vanishing and ease network training. When the number of input feature map channels is c_in and c_out convolution kernels are used, the parameter counts are as follows. A standard 3 × 3 convolution kernel operates on all channels, with 3 × 3 × c_in × c_out parameters. Factorized convolution decomposes a standard 3 × 3 kernel into 3 × 1 and 1 × 3 kernels, with 2 × 3 × c_in × c_out parameters. Depthwise separable convolution consists of:
i) depthwise convolution: each 3 × 3 kernel convolves a single channel only, giving 3 × 3 × c_in parameters;
ii) pointwise convolution: a 1 × 1 kernel convolves across channels to exchange information between them, giving 1 × 1 × c_in × c_out parameters.
The total parameter count of depthwise separable convolution is therefore 3 × 3 × c_in + c_in × c_out. The method combines factorized convolution with depthwise separable convolution, yielding a factorized depthwise separable convolution kernel with 2 × 3 × c_in + c_in × c_out parameters. This kernel effectively reduces both the parameter count and the computation; however, because each channel is convolved independently, cross-channel information exchange is lacking, so a channel shuffle step is finally introduced to restore inter-channel information flow and improve network performance.
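As a concrete check of these parameter counts, the short Python sketch below tallies the four variants; the channel widths c_in = c_out = 64 are chosen purely for illustration and are not values prescribed by the invention.

def conv_params(c_in: int, c_out: int) -> dict:
    # Weight counts for a 3x3 receptive field; biases are ignored.
    return {
        "standard 3x3 convolution": 3 * 3 * c_in * c_out,
        "factorized (3x1 + 1x3)": 2 * 3 * c_in * c_out,
        "depthwise separable": 3 * 3 * c_in + c_in * c_out,        # depthwise + pointwise
        "factorized depthwise sep.": 2 * 3 * c_in + c_in * c_out,  # two depthwise strips + pointwise
    }

for name, n in conv_params(64, 64).items():
    print(f"{name:26} {n:7d}")
# standard 3x3 convolution     36864
# factorized (3x1 + 1x3)       24576
# depthwise separable           4672
# factorized depthwise sep.     4480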
The calculation process of the spatial attention module is:

S = softmax(θ(X)ᵀ · φ(X)),  Y = δ(S · g(X))

wherein: θ(X), φ(X) and g(X) are all new feature maps generated from the input feature map X by 1 × 1 convolutions; the product of θ(X) and φ(X) is fed into the softmax layer to obtain the spatial correlation coefficient matrix S; S is multiplied with g(X), the result is restored to the size of the input feature map, and δ, a 1 × 1 convolution, recovers the channel number. The final result Y retains the global context information.
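A minimal PyTorch sketch of such a spatial attention module follows. The reduced internal width c // 8 and the class name are assumptions made for illustration; theta, phi, g and delta correspond to the four 1 × 1 convolutions named above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.theta = nn.Conv2d(c, c // 8, 1)   # theta: 1x1 convolution
        self.phi = nn.Conv2d(c, c // 8, 1)     # phi: 1x1 convolution
        self.g = nn.Conv2d(c, c // 8, 1)       # g: 1x1 convolution
        self.delta = nn.Conv2d(c // 8, c, 1)   # delta: recovers the channel number

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # B x HW x c'
        k = self.phi(x).flatten(2)                    # B x c' x HW
        v = self.g(x).flatten(2).transpose(1, 2)      # B x HW x c'
        s = F.softmax(q @ k, dim=-1)                  # spatial correlation matrix S
        y = (s @ v).transpose(1, 2).reshape(b, -1, h, w)
        return self.delta(y)                          # Y keeps the global context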
The decoder comprises three gated convolution modules for fusing high-level features with low-level features and four upsampling units, wherein: the input of the decoder comes from the output of the encoder; each gated convolution module receives the same-scale low-level feature map from the first sub-network and refines it before the information is fed to the corresponding upsampling unit; the first upsampling unit takes the encoder output as input, while each remaining upsampling unit receives the feature map obtained by fusing the gated-convolution output with the high-level features, applies 2x bilinear interpolation, and passes the interpolated feature map to the next upsampling unit, until the original image size is recovered and the semantic segmentation result map is output.
Each upsampling unit comprises a 1 × 1 convolutional layer, a batch normalization layer, an activation layer and a 2x bilinear interpolation layer; a feature map input to an upsampling unit is decoded and bilinearly interpolated by a factor of two, producing a higher-resolution feature map that serves as the input of the next upsampling unit.
The calculation process of the gated convolution module is:

Y(i, j) = X(i, j) · σ(X(i, j))

wherein: X is the input feature map, (i, j) indexes each pixel, and σ is the sigmoid function, which sets the weight of each pixel between 0 and 1; through learning, ground-object edges and small-size ground objects in the image obtain higher weights, which helps improve semantic segmentation accuracy.
The training of the lightweight semantic segmentation network comprises the following steps:
A1, dividing the remote sensing image data sample set into a training set, a validation set and a test set.
A2, reading in the training-set remote sensing images and their corresponding label images; to make full use of the training sample set, randomly sampling the original large-format remote sensing images and label data, setting the number of samples per training round, setting the sampling-size parameter according to the available GPU memory, and sampling the remote sensing image and the label image at the same random positions.
The sampling-size parameter is the size of the image patch to be cropped.
A3, setting the data enhancement parameters and applying the same data enhancement to the remote sensing image and its corresponding label image (a sketch of steps A2 and A3 follows after step A4).
The data enhancement parameters comprise an image rotation angle, an image turning angle, a brightness enhancement coefficient, a contrast enhancement coefficient, a chrominance enhancement coefficient and a scaling coefficient.
A4, setting the learning rate, exponential decay rates and regularization coefficient, training the deep network, and selecting the network with the highest validation-set accuracy as the trained lightweight semantic segmentation network.
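The sketch below illustrates steps A2 and A3 for one image/label pair, assuming numpy arrays; the crop size of 512 and the enhancement ranges are placeholders standing in for the parameters set above.

import random
import numpy as np

def random_crop_pair(image: np.ndarray, label: np.ndarray, size: int = 512):
    # Sample the same random window from image (H x W x C) and label (H x W).
    h, w = label.shape[:2]
    top, left = random.randint(0, h - size), random.randint(0, w - size)
    return (image[top:top + size, left:left + size],
            label[top:top + size, left:left + size])

def augment_pair(image: np.ndarray, label: np.ndarray):
    # Apply identical geometric transforms to image and label (step A3).
    k = random.randint(0, 3)                  # rotation by k x 90 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    if random.random() < 0.5:                 # horizontal flip
        image, label = np.fliplr(image), np.fliplr(label)
    if random.random() < 0.5:                 # vertical flip
        image, label = np.flipud(image), np.flipud(label)
    # Photometric changes apply to the image only (brightness as an example).
    image = np.clip(image.astype(np.float32) * random.uniform(0.5, 1.5), 0, 255)
    return image.copy(), label.copy()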
Technical effects
The invention as a whole solves the low runtime efficiency that existing semantic segmentation networks face on large high-resolution remote sensing images due to their parameter counts and computational cost.
Compared with the prior art, the lightweight semantic segmentation network built from factorized depthwise separable convolution residual blocks reduces the parameter count and computation, greatly increasing the speed of semantic segmentation. The invention adopts multi-scale feature aggregation in both the encoder and the decoder, aggregating low-level and high-level feature maps to encode and decode multi-scale ground objects in the high-resolution remote sensing image; a spatial attention module captures context information, and gated convolution emphasizes ground-object edges and small-size ground objects when aggregating low-level feature maps, improving semantic segmentation accuracy. The method therefore maintains high segmentation accuracy while improving the efficiency of the segmentation algorithm, and is an effective solution for semantic segmentation of high-resolution remote sensing images.
Drawings
FIG. 1 is a flow chart of the method;
FIG. 2 is an exemplary diagram of a lightweight semantic segmentation network according to an embodiment;
FIG. 3 is an example diagram of a factorized depthwise separable convolution residual block;
FIG. 4 is a schematic diagram of an embodiment of a semantic segmentation dataset of a remote sensing image;
FIG. 5 is a diagram illustrating a comparison of semantic segmentation results of an embodiment;
in the figure: columns 1-5 are: the original image, the label map, the DFANet prediction, the ENet prediction, and the prediction of the network of the embodiment.
Detailed Description
As shown in fig. 1, the lightweight semantic segmentation method for high-resolution remote sensing images according to the present invention includes the following steps:
step A, dividing a remote sensing image sample data set into a training set, a verification set and a test set according to the proportion of 0.5.
Step B, build and train the deep semantic segmentation network under the PyTorch deep learning framework. The original large-format high-resolution remote sensing images and label data are read into memory; to improve data utilization, the large images are randomly sampled into small patches for batch training. The number of samples per training round is set to 450; the sampling size and training batch size are set according to the available GPU memory, with a default input image size of 512 × 512 and a default batch size of 10. The original large-format remote sensing image data and the corresponding label data are randomly sampled, each draw yielding a 512 × 512 remote sensing image and its corresponding label map, and the training samples of each round are obtained after repeated sampling. The training-sample enhancement parameter ranges are set as follows: random contrast enhancement by 0.5 to 1.5 times, random saturation enhancement by 0.5 to 1.5 times, random brightness enhancement by 0.5 to 1.5 times, and random scaling by 0.5 to 1.5 times, which enlarges the diversity of the training samples and improves the generalization of the deep semantic segmentation network (a training sketch follows this paragraph). After each iteration, the accuracy of the deep semantic segmentation network is verified on the validation data set, and the network with the highest accuracy is retained. The high-resolution remote sensing images in the test set are then input into the obtained deep semantic segmentation network to produce their semantic segmentation results.
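The sketch below shows one training round under this configuration. The Adam optimizer is an assumption inferred from the exponential decay rates (0.9, 0.99); model and sample_batch are placeholders for the network of FIG. 2 and the random patch sampler described above.

import torch
import torch.nn as nn

def make_optimizer(model: nn.Module):
    # lr, betas and weight decay follow the values stated in the embodiment.
    return torch.optim.Adam(model.parameters(), lr=1e-4,
                            betas=(0.9, 0.99), weight_decay=2e-4)

def train_epoch(model, optimizer, sample_batch, device="cuda", iters=450):
    # One round: 450 randomly sampled batches of 10 patches of 512 x 512.
    criterion = nn.CrossEntropyLoss()        # pixel-wise cross-entropy loss
    model.train()
    for _ in range(iters):
        images, labels = sample_batch()      # 10 x 3 x 512 x 512, 10 x 512 x 512
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# usage: opt = make_optimizer(model); run train_epoch repeatedly over the 1500
# iterations and keep the weights with the best validation accuracy.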
As shown in fig. 2, the lightweight semantic segmentation network has an encoder-decoder structure.
The encoder is built with multi-scale feature fusion and an attention mechanism, and comprises two sub-networks of identical structure, a first sub-network and a second sub-network, plus an attention module that captures the context information of the feature maps: the data is first input into the first sub-network; its output is upsampled 4x and fused with the stage-1 feature map to form the input of the second sub-network; at every stage, the input of the second sub-network is fused with the same-scale feature map of the first sub-network; the output of the second sub-network is fed to the spatial attention module, whose output is fed to the decoder.
Each sub-network comprises three feature extraction layers. Each feature extraction layer consists of one downsampling layer and four factorized depthwise separable convolution residual blocks. The downsampling layer consists of a convolutional layer with 1 × 1 kernels and stride 2, a batch normalization layer and a ReLU activation function. The factorized depthwise separable convolution residual block extracts features with 3 × 1 and 1 × 3 factorized depthwise separable kernels; a residual connection is used to reduce gradient vanishing and ease network training, and a channel shuffle step is introduced at the end to restore information exchange between channels and improve network performance. The operator sequence of the block is: 1 × 3 depthwise separable convolution -> ReLU -> 3 × 1 depthwise separable convolution -> ReLU + batch normalization -> 1 × 1 convolution -> 1 × 3 depthwise separable convolution -> ReLU -> 3 × 1 depthwise separable convolution -> ReLU + batch normalization -> residual connection -> channel shuffle. A sketch of this block follows.
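The following PyTorch sketch implements this operator sequence. The constant channel width c across the block is an assumption so that the residual addition is well-defined, and the two shuffle groups are an arbitrary illustrative choice (c must be even here).

import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    # Interleave channels across groups to restore cross-channel information flow.
    b, c, h, w = x.shape
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2).reshape(b, c, h, w))

class FactorizedDSResBlock(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        def dw(k):  # factorized depthwise convolution (groups=c) with kernel k
            return nn.Conv2d(c, c, k, padding=(k[0] // 2, k[1] // 2),
                             groups=c, bias=False)
        self.body = nn.Sequential(
            dw((1, 3)), nn.ReLU(inplace=True),
            dw((3, 1)), nn.ReLU(inplace=True), nn.BatchNorm2d(c),
            nn.Conv2d(c, c, 1, bias=False),   # 1x1 pointwise convolution
            dw((1, 3)), nn.ReLU(inplace=True),
            dw((3, 1)), nn.ReLU(inplace=True), nn.BatchNorm2d(c),
        )

    def forward(self, x):
        return channel_shuffle(x + self.body(x))  # residual add, then shuffle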
To keep a sufficiently large receptive field, the output of the first sub-network is upsampled 4x before serving as the input of the second sub-network, while same-scale features of the two sub-networks are fused to learn the high-dimensional structural information of different ground-object targets; aggregating low-level and high-level features encodes ground objects of different scales in the remote sensing image. The output of the second sub-network is sent to the spatial attention module.
The spatial attention module captures the context information of the feature map, focusing on the parts most important for semantic segmentation and suppressing useless information to improve segmentation performance. Its input is the feature map X extracted by the sub-network. Three 1 × 1 convolutional layers θ, φ and g produce three new feature maps; the product of θ(X) and φ(X) is fed into the softmax layer to obtain the spatial correlation coefficient matrix S; S is multiplied with g(X), the result is restored to the size of the input feature map, and a 1 × 1 convolution δ recovers the channel number. The whole process is:

S = softmax(θ(X)ᵀ · φ(X)),  Y = δ(S · g(X))

The resulting output Y retains the global context information and is input to the decoder.
The decoder comprises three gated convolution modules for fusing high-level features with low-level features and four upsampling units: the input of the decoder comes from the output of the encoder; each gated convolution module receives the same-scale low-level feature map from the first sub-network, refines it, and aggregates it with the deep features; the aggregated feature maps are fed to the upsampling units, and the semantic segmentation result map is output once the original image size is recovered.
To retain the detail information lost during the repeated downsampling of the deep feature maps, while preventing the over-segmentation easily caused by redundant information in the low-level features, a gated convolution module dynamically weights the features to refine the low-level feature map during fusion. The input of the gated convolution is the same-scale low-level feature map of the first sub-network; the refined feature map is aggregated with the deep feature map and fed to the upsampling unit. The whole process is:

Y(i, j) = X(i, j) · σ(X(i, j))

wherein: X is the input feature map, (i, j) indexes each pixel, and σ is the sigmoid function, setting the pixel weights between 0 and 1; through automatic learning, the module gives higher weights to target edges and small-size targets.
The computation order of the upsampling unit is: 1 × 1 convolutional layer, batch normalization layer, activation layer, and 2x bilinear interpolation layer, wherein: the input of the first upsampling unit comes from the output of the encoder, while the remaining upsampling units receive the feature maps produced by fusing the gated-convolution output with the high-level features; after bilinear interpolation, each interpolated feature map is passed to the next upsampling unit, and finally a segmentation result map of the same size as the original image is output. A sketch of the unit follows.
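A minimal sketch of the upsampling unit; the channel widths c_in and c_out are placeholders.

import torch.nn as nn
import torch.nn.functional as F

class UpsamplingUnit(nn.Module):
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Conv2d(c_in, c_out, 1, bias=False),
                                  nn.BatchNorm2d(c_out),
                                  nn.ReLU(inplace=True))

    def forward(self, x):
        x = self.proj(x)                     # 1x1 convolution + BN + activation
        return F.interpolate(x, scale_factor=2, mode="bilinear",
                             align_corners=False)  # 2x bilinear interpolation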
This embodiment preferably adopts the Potsdam data set of the ISPRS (International Society for Photogrammetry and Remote Sensing) 2D semantic segmentation contest for remote sensing images. The data set consists of aerial images with three bands (red, green, blue), and the ground objects are divided into six categories: impervious surfaces, buildings, low vegetation, trees, cars and clutter. Each image has a pixel-wise semantic truth map usable for accuracy evaluation of the segmentation results, as shown in fig. 4: impervious surfaces are white (RGB 255, 255, 255), buildings blue (RGB 0, 0, 255), low vegetation cyan (RGB 0, 255, 255), trees green (RGB 0, 255, 0), cars yellow (RGB 255, 255, 0) and clutter red (RGB 255, 0, 0). Semantic segmentation accuracy is evaluated with the overall pixel accuracy and the average F1 score, while segmentation efficiency is evaluated with the model parameter count and the model prediction time.
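The two accuracy measures can be computed from predicted and true label maps as in the following sketch, a straightforward confusion-matrix implementation rather than code from the patent.

import numpy as np

def segmentation_metrics(pred: np.ndarray, truth: np.ndarray, n_classes: int = 6):
    # pred and truth are H x W integer label maps with classes 0..n_classes-1.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (truth.ravel(), pred.ravel()), 1)   # confusion matrix
    overall_acc = np.trace(cm) / cm.sum()
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)    # per-class precision
    recall = tp / np.maximum(cm.sum(axis=1), 1)       # per-class recall
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return overall_acc, f1.mean()                     # overall accuracy, average F1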
Taking the pixel-wise semantic truth maps as reference, the method is compared on 14 test-set remote sensing images of 6000 × 6000 pixels with four semantic segmentation methods, ENet, DFANet, PSPNet and DeepLabV3+, in terms of both accuracy and efficiency. The overall pixel accuracy and the average F1 score serve as the accuracy criteria: the higher their values, the closer the segmentation result is to the truth map and the higher the accuracy. The model parameter count (unit: million, M) and the prediction time (unit: seconds, s) for a 512 × 512 remote sensing image serve as the efficiency criteria: the smaller the parameter count and the shorter the prediction time, the higher the efficiency. The results of the different semantic segmentation methods are shown in Table 1:
TABLE 1 Comparison of the present method with existing methods

Method        Overall pixel accuracy   Average F1   Time (s)   Parameters (M)
ENet          82.2%                    82.8%        0.26       0.36
DFANet        83.3%                    83.6%        0.23       7.8
PSPNet        86.9%                    87.6%        1.03       48.7
DeepLabV3+    88.1%                    89.1%        4.13       56.7
This method   86.7%                    87.2%        0.32       1.29
As the table shows, the method achieves an overall pixel accuracy of 86.7% and an average F1 of 87.2%, with a prediction time of 0.32 s and a parameter count of 1.29 M. Although ENet has the fewest parameters and DFANet the shortest prediction time, their overall pixel accuracies (82.2%, 83.3%) and average F1 scores (82.8%, 83.6%) are much lower than those of this embodiment. The overall pixel accuracy and average F1 of the method are slightly lower than those of DeepLabV3+ (88.1%, 89.1%) and PSPNet (86.9%, 87.6%), but the parameter counts of DeepLabV3+ and PSPNet (56.7 M, 48.7 M) are tens of times that of the method (1.29 M) and their prediction times (4.13 s, 1.03 s) are several times longer than the method's 0.32 s. Considering semantic segmentation accuracy and efficiency together, the method is therefore superior to the other deep semantic segmentation networks.
In terms of visual quality, as shown in fig. 5, the embodiment accurately extracts the boundaries of the various ground objects and determines their semantics; compared with DFANet and ENet it effectively reduces false extractions and is closer to the truth map.
In a concrete experiment, the model was built and trained under the PyTorch deep learning framework with a learning rate of 0.0001, 1500 iterations, exponential decay rates of (0.9, 0.99), a regularization coefficient of 0.0002 and a cross-entropy loss function; the number of samples per training round was 450, the input image size 512 × 512 and the batch size 10. The data enhancement parameter ranges comprised random rotation by n × 90° (n = 0, 1, 2, 3), random horizontal and vertical flipping, random scaling by 0.5 to 1.5 times, random brightness enhancement by 0.5 to 1.5 times, random contrast enhancement by 0.5 to 1.5 times and random saturation enhancement by 0.5 to 1.5 times. On the Potsdam data set of the ISPRS 2D semantic segmentation contest for remote sensing images, the method achieved an overall pixel accuracy of 86.7% and an average F1 of 87.2%, needing only 0.32 s to segment a 512 × 512 remote sensing image with a parameter count of only 1.29 M.
Compared with the prior art, the method achieves an overall pixel accuracy of 86.7% and an average F1 of 87.2% on the data set of the ISPRS 2D semantic segmentation contest for remote sensing images, with a prediction time of only 0.32 s and a parameter count of only 1.29 M: it is faster and far smaller than DeepLabV3+ and PSPNet, and more accurate than ENet and DFANet, striking a good balance between semantic segmentation accuracy and efficiency.
The foregoing specific embodiments may be modified locally in different ways by those skilled in the art without departing from the principle and spirit of the invention; the scope of protection of the invention is defined by the appended claims and is not limited by the foregoing specific embodiments, and every implementation within that scope is bound by the invention.

Claims (1)

1. A lightweight semantic segmentation method oriented to high-resolution remote sensing images, characterized by comprising: building, training and testing a network, wherein a deep semantic segmentation network with an encoder-decoder structure is built under the PyTorch deep learning framework, and after network training based on a remote sensing image data sample set, the remote sensing image to be tested is taken as network input to obtain its segmentation result;
the encoder is built with multi-scale feature fusion and an attention mechanism, and comprises two sub-networks of identical structure and an attention module for capturing the context information of the feature maps, wherein: the image data is input into the first of the two sub-networks; the low-level feature map output by the first sub-network is upsampled and fused with the stage-1 feature map to form the input of the second sub-network; at every stage, the input of the second sub-network is fused with the same-scale feature map of the first sub-network; the high-level feature map output by the second sub-network is input into the spatial attention module, and the output of the spatial attention module is input into the decoder;
said first and second sub-networks each comprise three feature extraction layers, wherein each feature extraction layer consists of one downsampling layer and four factorized depthwise separable convolution residual blocks;
the downsampling layer consists of a convolutional layer with 1 × 1 kernels and stride 2, a batch normalization layer and a ReLU activation layer;
the factorized depthwise separable convolution residual block extracts features through two groups of factorized depthwise separable convolution kernels of sizes 3 × 1 and 1 × 3, with a residual connection added to reduce gradient vanishing and ease network training;
when the number of input feature map channels is c_in and c_out convolution kernels are used for the convolution operation:
a standard 3 × 3 convolution kernel operates on all channels, with a parameter count of 3 × 3 × c_in × c_out;
factorized convolution decomposes the standard 3 × 3 kernel into 3 × 1 and 1 × 3 kernels, with a parameter count of 2 × 3 × c_in × c_out;
the depthwise separable convolution comprises:
i) depthwise convolution: each 3 × 3 kernel convolves one channel only, with a parameter count of 3 × 3 × c_in;
ii) pointwise convolution: a 1 × 1 kernel convolves across channels to exchange information between them, with a parameter count of 1 × 1 × c_in × c_out;
the total parameter count of the depthwise separable convolution is 3 × 3 × c_in + c_in × c_out;
the calculation process of the spatial attention module is:

S = softmax(θ(X)ᵀ · φ(X)),  Y = δ(S · g(X))

wherein: θ(X), φ(X) and g(X) are all new feature maps generated from the input feature map X by 1 × 1 convolutions; the product of θ(X) and φ(X) is fed into the softmax layer to obtain the spatial correlation coefficient matrix S; S is multiplied with g(X), the result is restored to the size of the input feature map, and δ, a 1 × 1 convolution, recovers the channel number; the obtained final result Y keeps the global context information;
the decoder comprises three gated convolution modules for fusing high-level features and low-level features and four upsampling units, wherein: the input of the decoder comes from the output of the encoder; each gated convolution module receives the same-scale low-level feature map from the first sub-network and refines it before the information is fed to the corresponding upsampling unit; the first upsampling unit takes the encoder output as input, while the remaining upsampling units receive the feature maps formed by fusing the gated-convolution output with the high-level features, apply 2x bilinear interpolation, and pass the interpolated feature maps to the next upsampling unit, until the original image size is recovered and the semantic segmentation result map is output;
each upsampling unit comprises a 1 × 1 convolutional layer, a batch normalization layer, an activation layer and a 2x bilinear interpolation layer; the feature map input into an upsampling unit is decoded and bilinearly interpolated by a factor of two, producing a higher-resolution feature map as the input of the next upsampling unit;
the calculation process of the gated convolution module is:

Y(i, j) = X(i, j) · σ(X(i, j))

wherein: X is the input feature map, i and j denote the position of each pixel, and σ is the sigmoid function, setting the weight of each pixel between 0 and 1; through learning, ground-object edges and small-size ground objects in the image obtain higher weights, which helps improve semantic segmentation accuracy;
the network training comprises the following steps:
A1, dividing the remote sensing image data sample set into a training set, a validation set and a test set;
A2, reading in the training-set remote sensing images and the corresponding label images; in order to make full use of the training sample set, randomly sampling the original large-format remote sensing images and label data, setting the number of samples per training round, setting the sampling-size parameter according to the available GPU memory, and sampling the remote sensing image and the label image at the same random positions;
A3, setting the data enhancement parameters, and performing the same data enhancement on the remote sensing image and the corresponding label image;
A4, setting the learning rate, the exponential decay rates and the regularization coefficient to train the deep network, and selecting the deep network with the highest validation-set accuracy for semantic segmentation of the high-resolution remote sensing image to be tested;
the data enhancement parameters comprise an image rotation angle, an image turning angle, a brightness enhancement coefficient, a contrast enhancement coefficient, a chroma enhancement coefficient and a scaling coefficient.
CN202011049591.3A 2020-09-29 2020-09-29 Lightweight semantic segmentation method for high-resolution remote sensing image Active CN112183360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011049591.3A CN112183360B (en) 2020-09-29 2020-09-29 Lightweight semantic segmentation method for high-resolution remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011049591.3A CN112183360B (en) 2020-09-29 2020-09-29 Lightweight semantic segmentation method for high-resolution remote sensing image

Publications (2)

Publication Number Publication Date
CN112183360A (en) 2021-01-05
CN112183360B (en) 2022-11-08

Family

ID=73946579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011049591.3A Active CN112183360B (en) 2020-09-29 2020-09-29 Lightweight semantic segmentation method for high-resolution remote sensing image

Country Status (1)

Country Link
CN (1) CN112183360B (en)

Families Citing this family (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800963A (en) * 2021-01-28 2021-05-14 新华三大数据技术有限公司 Layout analysis method, model and electronic equipment based on deep neural network
CN112837320B (en) * 2021-01-29 2023-10-27 华中科技大学 Remote sensing image semantic segmentation method based on parallel hole convolution
CN112948604A (en) * 2021-02-01 2021-06-11 西北工业大学 Remote sensing image text description generation method with multi-semantic-level attention capability
CN112861727A (en) * 2021-02-09 2021-05-28 北京工业大学 Real-time semantic segmentation method based on mixed depth separable convolution
CN112927255B (en) * 2021-02-22 2022-06-21 武汉科技大学 Three-dimensional liver image semantic segmentation method based on context attention strategy
CN112966580B (en) * 2021-02-25 2022-07-12 山东科技大学 Remote sensing image green tide information extraction method based on deep learning and super-resolution
CN112819837B (en) * 2021-02-26 2024-02-09 南京大学 Semantic segmentation method based on multi-source heterogeneous remote sensing image
CN112950655A (en) * 2021-03-08 2021-06-11 甘肃农业大学 Land use information automatic extraction method based on deep learning
CN113065578B (en) * 2021-03-10 2022-09-23 合肥市正茂科技有限公司 Image visual semantic segmentation method based on double-path region attention coding and decoding
CN113012175B (en) * 2021-03-15 2022-10-25 南京理工大学 Road airborne scene semantic segmentation method with infrared image enhancement
CN112949549B (en) * 2021-03-19 2023-04-18 中山大学 Super-resolution-based change detection method for multi-resolution remote sensing image
CN112926533A (en) * 2021-04-01 2021-06-08 北京理工大学重庆创新中心 Optical remote sensing image ground feature classification method and system based on bidirectional feature fusion
CN113111835B (en) * 2021-04-23 2022-08-02 长沙理工大学 Semantic segmentation method and device for satellite remote sensing image, electronic equipment and storage medium
CN113159051B (en) * 2021-04-27 2022-11-25 长春理工大学 Remote sensing image lightweight semantic segmentation method based on edge decoupling
CN113205051B (en) * 2021-05-10 2022-01-25 中国科学院空天信息创新研究院 Oil storage tank extraction method based on high spatial resolution remote sensing image
CN113205524B (en) * 2021-05-17 2023-04-07 广州大学 Blood vessel image segmentation method, device and equipment based on U-Net
CN113239815B (en) * 2021-05-17 2022-09-06 广东工业大学 Remote sensing image classification method, device and equipment based on real semantic full-network learning
CN113255676A (en) * 2021-05-21 2021-08-13 福州大学 High-resolution remote sensing image semantic segmentation model and method based on multi-source data fusion
CN113362338B (en) * 2021-05-24 2022-07-29 国能朔黄铁路发展有限责任公司 Rail segmentation method, device, computer equipment and rail segmentation processing system
CN113327304A (en) * 2021-05-28 2021-08-31 北京理工大学重庆创新中心 Hyperspectral image saliency map generation method based on end-to-end neural network
CN113326847B (en) * 2021-06-04 2023-07-14 天津大学 Remote sensing image semantic segmentation method and device based on full convolution neural network
CN113240683B (en) * 2021-06-08 2022-09-20 北京航空航天大学 Attention mechanism-based lightweight semantic segmentation model construction method
CN113436204A (en) * 2021-06-10 2021-09-24 中国地质大学(武汉) High-resolution remote sensing image weak supervision building extraction method
CN113436198A (en) * 2021-06-15 2021-09-24 华东师范大学 Remote sensing image semantic segmentation method for collaborative image super-resolution reconstruction
CN113409322B (en) * 2021-06-18 2022-03-08 中国石油大学(华东) Deep learning training sample enhancement method for semantic segmentation of remote sensing image
CN113362343A (en) * 2021-06-22 2021-09-07 北京邮电大学 Lightweight image semantic segmentation algorithm suitable for operating at Android end
CN113326799A (en) * 2021-06-22 2021-08-31 长光卫星技术有限公司 Remote sensing image road extraction method based on EfficientNet network and direction learning
CN113298817A (en) * 2021-07-02 2021-08-24 贵阳欧比特宇航科技有限公司 High-accuracy semantic segmentation method for remote sensing image
CN113642390B (en) * 2021-07-06 2024-02-13 西安理工大学 Street view image semantic segmentation method based on local attention network
CN113591608A (en) * 2021-07-12 2021-11-02 浙江大学 High-resolution remote sensing image impervious surface extraction method based on deep learning
CN113763386B (en) * 2021-07-13 2024-04-19 合肥工业大学 Surgical instrument image intelligent segmentation method and system based on multi-scale feature fusion
CN113469094B (en) * 2021-07-13 2023-12-26 上海中科辰新卫星技术有限公司 Surface coverage classification method based on multi-mode remote sensing data depth fusion
CN113436243A (en) * 2021-07-30 2021-09-24 济宁安泰矿山设备制造有限公司 Depth information recovery method for intelligent pump cavity endoscope image
CN113688696B (en) * 2021-08-04 2023-07-18 南京信息工程大学 Ultrahigh-resolution remote sensing image earthquake damage building detection method
CN113642456B (en) * 2021-08-11 2023-08-11 福州大学 Remote sensing image scene classification method based on jigsaw-guided depth feature fusion
CN113781489B (en) * 2021-08-25 2024-03-29 浙江工业大学 Polyp image semantic segmentation method and device
CN113850818A (en) * 2021-08-27 2021-12-28 北京工业大学 Ear CT image vestibule segmentation method mixing 2D and 3D convolutional neural networks
CN113807210B (en) * 2021-08-31 2023-09-15 西安理工大学 Remote sensing image semantic segmentation method based on pyramid segmentation attention module
CN113780296B (en) * 2021-09-13 2024-02-02 山东大学 Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN113554032B (en) * 2021-09-22 2021-12-14 南京信息工程大学 Remote sensing image segmentation method based on multi-path parallel network of high perception
CN113887470B (en) * 2021-10-15 2024-06-14 浙江大学 High-resolution remote sensing image ground object extraction method based on multitask attention mechanism
CN114092801A (en) * 2021-10-28 2022-02-25 国家卫星气象中心(国家空间天气监测预警中心) Remote sensing image cloud detection method and device based on depth semantic segmentation
CN113887517B (en) * 2021-10-29 2024-04-09 桂林电子科技大学 Crop remote sensing image semantic segmentation method based on parallel attention mechanism
CN114120102A (en) * 2021-11-03 2022-03-01 中国华能集团清洁能源技术研究院有限公司 Boundary-optimized remote sensing image semantic segmentation method, device, equipment and medium
CN113887524B (en) * 2021-11-04 2024-06-25 华北理工大学 Magnetite microscopic image segmentation method based on semantic segmentation
CN114067116B (en) * 2021-11-25 2024-05-17 天津理工大学 Real-time semantic segmentation system and method based on deep learning and weight distribution
CN114092815B (en) * 2021-11-29 2022-04-15 自然资源部国土卫星遥感应用中心 Remote sensing intelligent extraction method for large-range photovoltaic power generation facility
CN114399519B (en) * 2021-11-30 2023-08-22 西安交通大学 MR image 3D semantic segmentation method and system based on multi-modal fusion
CN114119621A (en) * 2021-11-30 2022-03-01 云南电网有限责任公司输电分公司 SAR remote sensing image water area segmentation method based on depth coding and decoding fusion network
CN114140755A (en) * 2022-01-28 2022-03-04 北京文安智能技术股份有限公司 Conversion method of image semantic segmentation model and traffic road scene analysis platform
CN114842333B (en) * 2022-04-14 2022-10-28 湖南盛鼎科技发展有限责任公司 Remote sensing image building extraction method, computer equipment and storage medium
CN114898110B (en) * 2022-04-25 2023-05-09 四川大学 Medical image segmentation method based on full-resolution representation network
CN114723760B (en) * 2022-05-19 2022-08-23 北京世纪好未来教育科技有限公司 Portrait segmentation model training method and device and portrait segmentation method and device
CN115063685B (en) * 2022-07-11 2023-10-03 河海大学 Remote sensing image building feature extraction method based on attention network
CN114998363B (en) * 2022-08-03 2022-10-11 南京信息工程大学 High-resolution remote sensing image progressive segmentation method
CN115049936B (en) * 2022-08-12 2022-11-22 深圳市规划和自然资源数据管理中心(深圳市空间地理信息中心) High-resolution remote sensing image-oriented boundary enhanced semantic segmentation method
CN115393733B (en) * 2022-08-22 2023-08-18 河海大学 Automatic water body identification method and system based on deep learning
CN115082490B (en) * 2022-08-23 2022-11-15 腾讯科技(深圳)有限公司 Abnormity prediction method, and abnormity prediction model training method, device and equipment
CN115375922B (en) * 2022-09-03 2023-08-25 杭州电子科技大学 Light-weight significance detection method based on multi-scale spatial attention
CN115620013B (en) * 2022-12-14 2023-03-14 深圳思谋信息科技有限公司 Semantic segmentation method and device, computer equipment and computer readable storage medium
CN116310543B (en) * 2023-03-14 2023-09-22 自然资源部第一海洋研究所 GF-1WFV satellite red tide deep learning detection model, construction method and equipment
CN116665065B (en) * 2023-07-28 2023-10-17 山东建筑大学 Cross attention-based high-resolution remote sensing image change detection method
CN117274608B (en) * 2023-11-23 2024-02-06 太原科技大学 Remote sensing image semantic segmentation method based on space detail perception and attention guidance


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110136145A (en) * 2019-05-17 2019-08-16 东北大学 The MR brain image dividing method of convolutional neural networks is separated based on multichannel
CN110246145A (en) * 2019-06-21 2019-09-17 福州大学 A kind of dividing method of abdominal CT images
CN110516670A (en) * 2019-08-26 2019-11-29 广西师范大学 Suggested based on scene grade and region from the object detection method for paying attention to module
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN111179273A (en) * 2019-12-30 2020-05-19 山东师范大学 Method and system for automatically segmenting leucocyte nucleoplasm based on deep learning
CN111179372A (en) * 2019-12-31 2020-05-19 上海联影智能医疗科技有限公司 Image attenuation correction method, device, computer equipment and storage medium
CN111160356A (en) * 2020-01-02 2020-05-15 博奥生物集团有限公司 Image segmentation and classification method and device
CN111445418A (en) * 2020-03-31 2020-07-24 联想(北京)有限公司 Image defogging method and device and computer equipment
CN111639677A (en) * 2020-05-07 2020-09-08 齐齐哈尔大学 Garbage image classification method based on multi-branch channel capacity expansion network
CN111598174A (en) * 2020-05-19 2020-08-28 中国科学院空天信息创新研究院 Training method of image ground feature element classification model, image analysis method and system
CN113269787A (en) * 2021-05-20 2021-08-17 浙江科技学院 Remote sensing image semantic segmentation method based on gating fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liang Lv et al. "MFALNet: A Multiscale Feature Aggregation Lightweight Network for Semantic Segmentation of High-Resolution Remote Sensing Images." IEEE, 2020. *
Liang Lv et al. "MFALNet: A Multiscale Feature Aggregation Lightweight Network for Semantic Segmentation of High-Resolution Remote Sensing Images." IEEE, 2020-08-06; abstract, sections I-III, figs. 1-3, table 1. *

Also Published As

Publication number Publication date
CN112183360A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN112183360B (en) Lightweight semantic segmentation method for high-resolution remote sensing image
CN113159051B (en) Remote sensing image lightweight semantic segmentation method based on edge decoupling
CN109190752B (en) Image semantic segmentation method based on global features and local features of deep learning
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN108171701B (en) Significance detection method based on U network and counterstudy
CN109035267B (en) Image target matting method based on deep learning
CN114022785A (en) Remote sensing image semantic segmentation method, system, equipment and storage medium
CN112862774B (en) Accurate segmentation method for remote sensing image building
CN112634296A (en) RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism
CN113743417B (en) Semantic segmentation method and semantic segmentation device
CN113408398B (en) Remote sensing image cloud detection method based on channel attention and probability up-sampling
CN111597920A (en) Full convolution single-stage human body example segmentation method in natural scene
CN111178438A (en) ResNet 101-based weather type identification method
CN112016400A (en) Single-class target detection method and device based on deep learning and storage medium
CN113034506A (en) Remote sensing image semantic segmentation method and device, computer equipment and storage medium
CN113743505A (en) Improved SSD target detection method based on self-attention and feature fusion
CN115908772A (en) Target detection method and system based on Transformer and fusion attention mechanism
CN110852369A (en) Hyperspectral image classification method combining 3D/2D convolutional network and adaptive spectrum unmixing
CN116612283A (en) Image semantic segmentation method based on large convolution kernel backbone network
CN117726954B (en) Sea-land segmentation method and system for remote sensing image
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN115049945A (en) Method and device for extracting lodging area of wheat based on unmanned aerial vehicle image
CN114037893A (en) High-resolution remote sensing image building extraction method based on convolutional neural network
CN117727046A (en) Novel mountain torrent front-end instrument and meter reading automatic identification method and system
CN115661673A (en) Image target detection method based on YOLOv4 and attention mechanism

Legal Events

Code  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant