CN113159051B - Remote sensing image lightweight semantic segmentation method based on edge decoupling - Google Patents


Info

Publication number
CN113159051B
CN113159051B (application CN202110456921.9A)
Authority
CN
China
Prior art keywords
edge
semantic segmentation
feature
feature map
remote sensing
Prior art date
Legal status
Active
Application number
CN202110456921.9A
Other languages
Chinese (zh)
Other versions
CN113159051A (en)
Inventor
段锦
刘高天
祝勇
赵言
***
Current Assignee
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Changchun University of Science and Technology filed Critical Changchun University of Science and Technology
Priority to CN202110456921.9A priority Critical patent/CN113159051B/en
Publication of CN113159051A publication Critical patent/CN113159051A/en
Application granted granted Critical
Publication of CN113159051B publication Critical patent/CN113159051B/en
Legal status: Active

Classifications

    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253 Pattern recognition: fusion techniques of extracted features
    • G06T7/13 Image analysis: segmentation; edge detection
    • G06V10/267 Image or video recognition: segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a lightweight semantic segmentation method for remote sensing images based on edge decoupling. It belongs to the field of computer vision and can be used for intelligent interpretation of remote sensing images. On one hand, the Ghost bottleneck module and depthwise separable convolutions reduce the number of model parameters and the computational overhead of the network, effectively improving the efficiency of remote sensing image semantic segmentation and making the proposed semantic segmentation network lightweight; on the other hand, the multi-scale feature pyramid, the global context module and the edge decoupling module improve segmentation precision, so that the proposed lightweight network can segment remote sensing images accurately and efficiently while further refining their edge details.

Description

Remote sensing image lightweight semantic segmentation method based on edge decoupling
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a light-weight semantic segmentation method for a remote sensing image based on edge decoupling, which can be used for intelligent interpretation in the field of remote sensing images.
Background
The high-resolution remote sensing image contains information such as detailed color and texture characteristics of targets such as roads and buildings, and the intelligent interpretation of the information has important significance in various fields such as military, agriculture and environmental science. To accomplish the task of analytically classifying the remote sensing image, each pixel in the image should be assigned a label associated with the class to which it belongs, which is consistent with the purpose of semantic segmentation of the image.
Deep learning has given this task a promising direction; in particular, since the fully convolutional network was proposed, deep-learning-based image semantic segmentation has become mainstream. Methods such as UNet, SegNet, PSPNet and the DeepLab series have appeared and offer clear advantages over traditional remote sensing image segmentation algorithms. When these algorithms are applied to semantic segmentation of high-resolution remote sensing images, they can guarantee a relatively good segmentation effect, but the large image size and complex network structures make training and inference slow. In addition, remote sensing images contain highly diverse targets with unbalanced class distributions, and the edges of different target classes easily overlap, so such images cannot be finely segmented.
Disclosure of Invention
The invention aims to provide a lightweight semantic segmentation method for remote sensing images based on edge decoupling, solving two technical problems of existing semantic segmentation methods on high-resolution remote sensing images: slow inference and low segmentation efficiency caused by large parameter counts and computational overhead, and unsatisfactory edge segmentation caused by the easily overlapping edges of different classes of targets.
In order to achieve the purpose, the invention provides a remote sensing image light-weight semantic segmentation method based on edge decoupling, which has the following specific technical scheme:
a remote sensing image lightweight semantic segmentation method based on edge decoupling comprises the steps of building, training and testing a semantic segmentation network, wherein the semantic segmentation network is a lightweight coding and decoding network with a double-branch structure, after training of the semantic segmentation network is completed based on training samples, a remote sensing image to be tested is input into the semantic segmentation network, and a final remote sensing image semantic segmentation result is output;
comprises the following steps which are carried out in sequence:
step 1, acquiring a remote sensing image data set, and preparing a training and testing sample;
step 2, constructing a lightweight coding and decoding semantic segmentation network with a double-branch structure;
step 3, inputting the training samples into an encoder, performing feature encoding through feature extraction, and obtaining an encoded feature map F_E;
step 4, inputting the encoded feature map F_E into a decoder to perform edge feature refinement and up-sampling, obtaining a decoded feature map F_D;
Step 5, inputting the decoding characteristic graph into a classifier to perform pixel-level classification prediction, outputting a segmentation result, and performing supervised training on the semantic segmentation network through a supervision mechanism;
step 6, training the semantic segmentation network built in step 2 with the training samples according to steps 3 to 5;
and 7, inputting the sample to be tested into the trained semantic segmentation network, outputting a final remote sensing image semantic segmentation result, and completing the test of the semantic segmentation network.
Further, the step 2 builds a lightweight coding and decoding semantic segmentation network with a double-branch structure, and the network comprises an encoder, a decoder and a classifier;
the encoder is of a double-branch structure and comprises a global downsampling block, a lightweight double-branch sub-network and a global feature fusion module;
the decoder consists of a lightweight edge decoupling module and an up-sampling module;
the classifier is composed of a conventional convolutional layer and a SoftMax layer.
Further, obtaining the encoded feature map F_E in step 3 comprises the following steps:
step 3.1, inputting the training sample into a global downsampling block of an encoder to obtain a low-level feature map;
step 3.2, inputting the low-level feature map into a lightweight double-branch sub-network in an encoder to obtain a space detail feature map and an abstract semantic feature map;
step 3.3, performing multi-level feature fusion on the obtained spatial detail feature map and abstract semantic feature map through the global feature fusion block of the encoder, and outputting the encoded feature map F_E.
Further, the global downsampling block in step 3.1 is composed of 3 parts: 1 conventional convolution, 1 Ghost bottleneck module and 1 global context module;
after the input sample passes through the global downsampling block, a low-level feature map with the output resolution of 1/4 of the original input is generated and used as the input of the subsequent process.
Further, the lightweight dual-branch sub-network in step 3.2 comprises two branches, namely a trunk deep branch for obtaining abstract semantic features and a spatial preservation branch for obtaining spatial detail features, and the two branches share the low-level feature map output by the global downsampling block;
the trunk deep branch is constructed based on the GhostNet feature extraction network and comprises two structures: one is the branch main body composed of 16 Ghost bottleneck modules, which performs 4 downsampling steps to extract deep features; the other is a lightweight feature pyramid composed of four parts, namely depthwise separable convolution, an up-sampling block, a lightweight atrous spatial pyramid pooling (ASPP) module and element fusion, which takes the 4 deep feature maps of different scales formed by the main body as input and finally outputs abstract semantic features with an enlarged receptive field and multi-scale information;
the spatial preservation branch is composed of 3 depthwise separable convolutions, performs downsampling 1 time on the input low-level features, and outputs a spatial detail feature map whose resolution is 1/2 of the input.
Further, the global feature fusion module of step 3.3 comprises 3 parts: two parallel depthwise separable convolutions with 1 × 1 kernels; element fusion; and a global context module;
the input abstract semantic features and spatial detail features are dimension-adjusted by the two parallel convolutions, element fusion outputs a feature map rich in spatial detail and abstract semantic information, and finally lightweight context modeling is performed by the global context module to form the encoded feature map F_E, which better fuses global information.
Further, in step 4 the decoded feature map F_D is obtained as follows: first, the encoded feature map F_E is input into the lightweight edge decoupling module of the decoder for edge feature refinement, generating a fine feature map with refined edges; the fine feature map is then input into the up-sampling module of the decoder, up-sampled, and restored to the size of the original input remote sensing image, and the restored fine feature map serves as the decoded feature map F_D output by the decoder.
Further, the lightweight edge decoupling module consists of 3 parts, namely a lightweight atrous spatial pyramid pooling (ASPP) module, a main body feature generator and an edge preserver. First, the encoded features pass through the lightweight ASPP to generate a feature map F_aspp with multi-scale information and a larger receptive field; then the main body generator produces a more consistent feature representation for pixels within the same object, forming the main body feature map F_body of the target object; F_body, F_aspp and F_E are input into the edge preserver, which outputs a refined edge feature map F_edge through an explicit subtraction operation, channel-stacking fusion and 1 × 1 conventional convolution dimensionality reduction; finally, the main body feature map and the refined edge feature map are fused, and the refined output feature map used for up-sampling recovery is output and denoted F_final. The overall process can be expressed as:

F_aspp = f_dsaspp(F_E)
F_body = φ(F_aspp)
F_edge = ψ(F_body, F_aspp, F_E)
F_final = F_body + F_edge

where f_dsaspp denotes the lightweight atrous spatial pyramid pooling function, φ the main body feature generating function, and ψ the edge-preserving function;
the up-sampling module comprises two steps, a 1 × 1 conventional convolution operation and an up-sampling operation; after the fine feature map F_final is output by this module, it is restored to the size of the original input remote sensing image, i.e., the decoder output feature map F_D.
Further, regarding the supervision mechanism in step 5: after the decoded feature map F_D is processed by the classifier, pixel-level classification prediction is completed and the semantic segmentation result is output; the network is trained under supervision formed by the semantic segmentation result and the real labels, so that the semantic segmentation network achieves optimal segmentation performance.
Further, the supervision mechanism in step 5 is an edge-based supervision method implemented by a designed loss function; the total loss function, denoted L, is:

L = λ1·L_body + λ2·L_edge + λ3·L_final + λ4·L_G

where L_body, L_edge, L_final and L_G denote the main body feature loss, edge feature loss, fine feature loss and global encoding loss respectively; the inputs of these 4 loss functions are the segmentation results formed from the main body feature map, the refined edge feature map, the refined output feature map and the encoded feature map after up-sampling recovery and a SoftMax layer, together with the corresponding real labels;
the loss function L_edge is a comprehensive loss function that obtains a boundary edge prior based on the edge prediction part and comprises two terms: the binary cross-entropy loss L_bce for boundary pixel classification and the cross-entropy loss L_ce of the edge parts in the scene, i.e. L_edge = λ5·L_bce + λ6·L_ce; the hyperparameters λ1, λ2, λ3, λ4, λ5, λ6 control the weighting between the losses.
The method of the invention has the following advantages: it fully considers the total parameter count and total computation of the semantic segmentation network and the influence of a large number of redundant features on segmentation efficiency and accuracy, and fully exploits the relation between a target's body and its edge to refine the segmentation result;
firstly, the invention combines the idea of feature sharing and designs a global down-sampling block based on the global context module and the Ghost bottleneck module as the first part of the encoder in the semantic segmentation network, which effectively reduces the parameter scale of early low-level feature extraction, lowers the computation cost, and better fuses global context information into the low-level features.
Secondly, the invention combines a dual-branch structure with a global feature fusion mode based on the global context module. A lightweight dual-branch sub-network is first built from the Ghost bottleneck module and depthwise separable convolutions, which markedly reduces the parameter scale and computational complexity of the feature extraction stage while keeping the output encoded features rich in spatial detail and abstract semantic information. The outputs of the two branches are then fused through global feature fusion based on the global context module, so that the finally output encoded features deepen the understanding of global information and reduce the network's loss of weak feature information.
Thirdly, the lightweight edge decoupling module is built with depthwise separable convolutions, and the relation between an object's body and its edge is introduced by modeling the main body and edge of the target object; this effectively alleviates the coarse edge segmentation of existing remote sensing image semantic segmentation algorithms and improves the segmentation of edge details in remote sensing images.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a semantic segmentation network structure constructed by the method of the present invention.
Fig. 3 is a schematic structural diagram of a Ghost bottleneck module.
FIG. 4 is a block diagram of a global context module.
Fig. 5 is a schematic diagram of a lightweight feature pyramid structure in a trunk feature extraction branch.
Fig. 6 is a schematic structural view of a lightweight edge decoupling module.
Fig. 7 is an exemplary diagram of a remote sensing image in a data set and corresponding semantic tags.
FIG. 8 is a comparison of semantic segmentation results in an embodiment of the method of the present invention ((a) and (b) are an input sample and the corresponding label; (c) to (g) are the semantic segmentation results of Fast-SCNN, Sem-FPN, the method of the invention, UNet and PSPNet in sequence).
Detailed Description
In order to better understand the purpose, structure and function of the invention, the following describes a remote sensing image lightweight semantic segmentation method based on edge decoupling in further detail with reference to the accompanying drawings.
As shown in FIG. 1, the invention designs a remote sensing image lightweight semantic segmentation method based on edge decoupling, which is applied to a high-resolution remote sensing image, so that the edge segmentation effect is refined while the precision is ensured, and the segmentation efficiency is greatly improved.
As shown in fig. 1, the invention relates to a remote sensing image light-weight semantic segmentation method based on edge decoupling, which specifically comprises the following steps:
step 1, acquiring a remote sensing image data set, and preparing a training and testing sample;
firstly, a high-resolution remote sensing image dataset with semantic annotations is obtained, and the labels and data are cropped correspondingly: sliding-window cropping is performed with a fixed 512 × 512 window and a sliding step of 384 pixels (a coverage ratio of 0.75). The cropped data and labels stay in correspondence, and data augmentation is performed by rotation, color enhancement and similar means; sufficient samples effectively weaken the influence of overfitting. Finally, the training and test sets are divided in a 4:1 ratio;
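As a concrete illustration, the cropping scheme can be sketched as follows (a minimal NumPy sketch under the stated window and stride; the function and variable names are ours, and handling of the border remainder is omitted):

```python
import numpy as np

def sliding_window_crops(image, label, window=512, stride=384):
    """Crop aligned (image, label) patches with a fixed window and stride.

    A stride of 384 over a 512-pixel window corresponds to the coverage
    ratio of 0.75 described above (each step advances 75% of the window).
    """
    h, w = image.shape[:2]
    patches = []
    for top in range(0, max(h - window, 0) + 1, stride):
        for left in range(0, max(w - window, 0) + 1, stride):
            img_patch = image[top:top + window, left:left + window]
            lbl_patch = label[top:top + window, left:left + window]
            patches.append((img_patch, lbl_patch))
    return patches

# Example: one 6000 x 6000 tile yields a regular grid of 512 x 512 samples.
image = np.zeros((6000, 6000, 3), dtype=np.uint8)
label = np.zeros((6000, 6000), dtype=np.uint8)
samples = sliding_window_crops(image, label)
```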
step 2, building a lightweight coding and decoding semantic segmentation network with a double-branch structure;
the structure of the constructed semantic segmentation network is shown in FIG. 2. The network is a lightweight coding and decoding semantic segmentation network with a double-branch structure and comprises an encoder, a decoder and a classifier. The encoder is of a double-branch structure and comprises a global downsampling block, a lightweight double-branch sub-network and a global feature fusion module; the decoder is composed of a lightweight edge decoupling module and an up-sampling module. The classifier is composed of a conventional convolution layer and a SoftMax layer;
step 3, inputting the training samples into the encoder and performing feature extraction and encoding to obtain the encoded feature map F_E; this involves the following three substeps:
step 3.1, inputting the training sample into a global downsampling block of an encoder to obtain a low-level feature map;
the obtained training samples, with an input scale of 512 × 512, are input into the encoder of the network and first pass through the global downsampling block. This block consists of 3 parts: 1 conventional convolution, 1 Ghost bottleneck module and 1 global context module. The low-level feature map output by the global downsampling block better fuses global context information and also contains rich spatial detail information;
the conventional convolution is a convolution block with a 3 × 3 kernel and a stride of 2, followed by a batch normalization layer and a ReLU activation layer; after this first down-sampling, the training sample yields a feature map with a resolution of 256 × 256;
the Ghost bottleneck module is a lightweight module originating from the GhostNet network. It is composed of Ghost modules, which can generate feature maps of greater depth with fewer parameters; its structure is shown in FIG. 3. The module's structure depends on the stride: with a stride of 1 it contains two stacked Ghost modules, and with a stride of 2 a channel-by-channel (depthwise) convolution with stride 2 is inserted between the two Ghost modules. In the global downsampling block this module uses stride 2, implementing the second down-sampling and further reducing the feature map resolution to 128 × 128;
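For reference, a minimal PyTorch sketch of a Ghost module and a stride-2 Ghost bottleneck of the kind described above (the channel widths, expansion ratio and shortcut form are illustrative assumptions, not the patent's exact configuration):

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """A small primary convolution produces 'intrinsic' feature maps; a
    cheap depthwise convolution derives extra 'ghost' maps from them, and
    the two sets are concatenated along the channel dimension."""
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        init_ch = out_ch // ratio               # intrinsic maps
        cheap_ch = out_ch - init_ch             # ghost maps
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),  # depthwise: one filter per channel
            nn.BatchNorm2d(cheap_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

class GhostBottleneck(nn.Module):
    """Stride-2 variant: two Ghost modules with a depthwise stride-2
    convolution between them, plus a matching projection shortcut."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=2):
        super().__init__()
        self.ghost1 = GhostModule(in_ch, mid_ch)
        self.down = nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1,
                              groups=mid_ch, bias=False)
        self.ghost2 = GhostModule(mid_ch, out_ch)
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch))

    def forward(self, x):
        return self.ghost2(self.down(self.ghost1(x))) + self.shortcut(x)
```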
the global context module, whose structure is shown in FIG. 4, comprises 3 processes. The first is a global attention-pooling mechanism for context modeling: a 1 × 1 conventional convolution and a SoftMax layer obtain self-attention weights of the input feature map, and an attention-pooling operation on the input feature map then yields a global background feature map. The second is feature transformation to capture channel dependence, consisting of two 1 × 1 convolution layers connected by a batch normalization layer and a ReLU activation function. The third is element fusion: the original input feature map is fused with the channel-dependent feature map so that the global context features are aggregated onto the features of every position. The output keeps the same size as the input, so the final low-level feature map has a scale of 128 × 128;
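The three processes can be sketched as follows (a PyTorch sketch written from the description above, with an assumed channel-reduction ratio; it is not tied to any particular published implementation):

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """1x1 conv + softmax form a global attention-pooling weight, the
    pooled feature passes through a two-layer 1x1 transform, and the
    result is broadcast-added back to every position."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # attention logits
        mid = channels // reduction
        self.transform = nn.Sequential(
            nn.Conv2d(channels, mid, 1),
            nn.BatchNorm2d(mid),           # the text describes BN + ReLU here
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        weights = self.attn(x).view(b, 1, h * w).softmax(dim=-1)        # (B,1,HW)
        context = torch.bmm(x.view(b, c, h * w), weights.transpose(1, 2))  # (B,C,1)
        context = self.transform(context.view(b, c, 1, 1))
        return x + context   # element fusion: aggregate context at each position
```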
step 3.2, inputting the low-level feature map into a lightweight double-branch subnetwork in the encoder to obtain a space detail feature map and an abstract semantic feature map;
the dual-branch sub-network comprises two branches: a trunk deep branch for acquiring abstract semantic features and a spatial preservation branch for acquiring spatial detail features. The two branches share the low-level features output by the global downsampling block; compared with a traditional dual-branch network, one input path is removed, reducing the parameter scale and computation cost of early low-level feature extraction;
the trunk deep branch is constructed on the GhostNet network, whose main body comprises 16 Ghost bottleneck modules performing 4 down-sampling stages to extract deep features. The method keeps the 16 Ghost bottleneck modules of GhostNet and converts them into a fully convolutional network serving as the main body of the trunk deep branch. The input low-level feature map processed by this branch finally yields deep feature maps at 4 scales: 64 × 64, 32 × 32, 16 × 16 and 8 × 8. The 4 scales correspond to 4 stages, with [3, 2, 6, 5] Ghost bottleneck modules and convolution kernel sizes [3, 5, 3, 5] respectively. Since down-sampling is performed 4 times, each stage contains one Ghost bottleneck module with stride 2.
Meanwhile, to obtain rich abstract semantic features, the method combines a depthwise separable convolution module, an up-sampling block module and a lightweight atrous spatial pyramid pooling (ASPP) module to build a lightweight feature pyramid from these 4 feature maps; its structure is shown in FIG. 5. The newly generated 4 levels are closely connected, the receptive field is enlarged, the feature maps carrying multi-scale information are up-sampled to the 64 × 64 scale, and the final abstract semantic feature group formed by element fusion is output by the trunk deep branch;
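A minimal sketch of such a pyramid fusion (the lightweight ASPP stage is omitted for brevity, and the stage channel widths are assumptions):

```python
import torch.nn as nn
import torch.nn.functional as F

def depthwise_separable(in_ch, out_ch, stride=1):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                  groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class LightFeaturePyramid(nn.Module):
    """Unify the channel widths of the four stage outputs with depthwise
    separable convs, upsample everything to the 64x64 grid and fuse by
    element-wise addition."""
    def __init__(self, in_chs=(40, 80, 160, 320), out_ch=128):
        super().__init__()
        self.laterals = nn.ModuleList(
            [depthwise_separable(c, out_ch) for c in in_chs])

    def forward(self, feats):   # feats: the 64x64, 32x32, 16x16, 8x8 maps
        target = feats[0].shape[-2:]
        fused = 0
        for lateral, f in zip(self.laterals, feats):
            y = lateral(f)
            fused = fused + F.interpolate(y, size=target, mode='bilinear',
                                          align_corners=False)
        return fused
```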
the spatial preservation branch consists of 3 depthwise separable convolutions, each with a 3 × 3 kernel and with strides [1, 2, 1] respectively. It performs 1 down-sampling of the input low-level feature map, and the output spatial detail feature map has a resolution of 64 × 64; this branch preserves the spatial scale of the input image with few parameters and little computation, and can encode rich spatial information;
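The branch itself is then only three layers (a sketch reusing the depthwise_separable helper from the pyramid sketch above; the channel widths are again assumptions):

```python
import torch.nn as nn

# Strides [1, 2, 1] give a single 2x downsampling: 128x128 -> 64x64.
# depthwise_separable is the helper defined in the previous sketch.
spatial_branch = nn.Sequential(
    depthwise_separable(16, 32, stride=1),
    depthwise_separable(32, 64, stride=2),
    depthwise_separable(64, 64, stride=1))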
step 3.3, performing multi-level feature fusion on the obtained spatial detail feature map and abstract semantic feature map through the global feature fusion block of the encoder, and outputting the encoded feature map F_E;
The global feature fusion module comprises 3 parts: two parallel 1 × 1 depthwise separable convolutions; element fusion; and a global context module. The abstract semantic features and spatial detail features from the dual-branch sub-network are dimension-adjusted by the two parallel 1 × 1 convolutions, and element fusion outputs a feature map rich in spatial detail and abstract semantic information. Finally, lightweight context modeling is performed by the global context module to form the encoded feature map F_E, which better fuses global information. Because no down-sampling is involved, the output encoded feature map is consistent with the input size;
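A sketch of this fusion step, reusing the GlobalContextBlock class sketched earlier (the channel widths are placeholders):

```python
import torch.nn as nn

class GlobalFeatureFusion(nn.Module):
    """Two parallel 1x1 convolutions align the channel widths of the two
    branch outputs, element-wise addition fuses them, and a global context
    block models long-range dependencies on the fused map."""
    def __init__(self, semantic_ch, spatial_ch, out_ch):
        super().__init__()
        self.proj_semantic = nn.Conv2d(semantic_ch, out_ch, 1, bias=False)
        self.proj_spatial = nn.Conv2d(spatial_ch, out_ch, 1, bias=False)
        self.context = GlobalContextBlock(out_ch)  # defined in the earlier sketch

    def forward(self, f_semantic, f_spatial):
        fused = self.proj_semantic(f_semantic) + self.proj_spatial(f_spatial)
        return self.context(fused)    # encoded feature map F_E, 64 x 64
```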
finally, through the above steps the encoder generates an encoded feature map with a scale of 64 × 64, used as the input of the subsequent process;
step 4, inputting the encoded feature map F_E into the decoder, performing edge feature refinement and up-sampling to obtain the decoded feature map F_D;
The decoder consists of two modules, namely the edge decoupling module and the up-sampling module. The lightweight edge decoupling module comprises 3 parts, with the structure shown in FIG. 6: a lightweight atrous spatial pyramid pooling (ASPP) module, a main body feature generator and an edge preserver. The specific process of acquiring the decoded feature map is as follows: first, the encoded feature map F_E is input into the lightweight edge decoupling module of the decoder for edge feature refinement, generating a fine feature map with refined edges; the fine feature map is then input into the up-sampling module of the decoder, up-sampled, and restored to the size of the original input remote sensing image as the decoded feature map F_D output by the decoder;
The main body feature generator comprises two processes of flow field generation and feature deformation, wherein the flow field generation is composed of a micro coding and decoding structure containing one-time down-sampling and one-time up-sampling and a conventional convolution with a convolution kernel of 3 x 3 and is used for generating flow field feature representation with prominent features at the central part of a target object. The characteristic deformation is to obtain the obvious main characteristic representation of the target object by carrying out deformation operation on the flow field characteristics; therefore, the main feature generator is responsible for generating more consistent feature representation for pixels in the same object, and the extracted main feature is the main feature of the target object;
the edge preserver comprises two steps. The first is a subtracter, which performs an explicit subtraction between the receptive-field-expanded encoded feature map and the main body feature map to obtain a coarse edge feature map. The second is an edge feature refiner, which supplements the edge features with a low-level feature map containing fine details: the low-level feature map from the encoder is fused into the coarse edge feature map by channel stacking, supplementing high-frequency information, after which a 1 × 1 conventional convolution reduces the dimensionality and the refined edge feature map is output;
the input encoded feature map F_E first passes through the lightweight ASPP to generate a feature map F_aspp with multi-scale information and a larger receptive field, and the main body generator then forms the main body feature map F_body of the target. F_body, F_aspp and F_E are input into the edge preserver: F_aspp and F_body undergo an explicit subtraction to generate a preliminary edge feature map, which is channel-stacked with F_E, reduced in dimensionality by a 1 × 1 conventional convolution, and output as the refined edge feature map F_edge. Finally, F_body and F_edge are fused element-wise to obtain the fine output feature map F_final used for up-sampling recovery; the whole process can be expressed as:
F_aspp = f_dsaspp(F_E)
F_body = φ(F_aspp)
F_edge = ψ(F_body, F_aspp, F_E)
F_final = F_body + F_edge

where f_dsaspp denotes the lightweight atrous spatial pyramid pooling function, φ the main body feature generating function, and ψ the edge-preserving function;
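The edge preserver portion of these equations can be sketched as follows (assuming, for simplicity, that F_aspp, F_body and the stacked encoder features share one channel width):

```python
import torch
import torch.nn as nn

class EdgePreserver(nn.Module):
    """Explicit subtraction -> channel stacking -> 1x1 dimensionality reduction."""
    def __init__(self, channels):
        super().__init__()
        self.reduce = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, f_aspp, f_body, f_enc):
        coarse_edge = f_aspp - f_body                     # explicit subtraction
        stacked = torch.cat([coarse_edge, f_enc], dim=1)  # channel stacking with F_E
        return self.reduce(stacked)                       # refined F_edge

# F_final = F_body + F_edge (element fusion); the up-sampling module
# (1x1 conv + interpolation) then restores the 512 x 512 resolution.
```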
to obtain the final decoded feature map F_D, F_final is input into the up-sampling module, which contains a 1 × 1 conventional convolution and an up-sampling operation, and is restored to the size of the original input image; the finally generated feature map is the decoded feature map F_D with a scale of 512 × 512;
step 5, inputting the decoding characteristic graph into a classifier to perform pixel level classification prediction, outputting a segmentation result, and performing supervised training on the semantic segmentation network through a supervision mechanism;
the main body of the classifier is a SoftMax layer. After the decoded feature map F_D is processed by the SoftMax layer, pixel-level classification prediction is completed and the semantic segmentation result is obtained. Network training is supervised by a mechanism formed by the segmentation result and the real labels, so that the semantic segmentation network achieves optimal segmentation performance;
the supervision mechanism does not supervise only the final segmentation result; F_body, F_edge, F_final and F_E are supervised jointly. The mechanism is realized by a designed loss function, and the total loss function, denoted L, is given by:
L = λ1·L_body + λ2·L_edge + λ3·L_final + λ4·L_G, with L_edge = λ5·L_bce + λ6·L_ce
where L_body, L_edge, L_final and L_G denote the main body feature loss, edge feature loss, fine feature loss and global encoding loss respectively. L_final and L_G adopt the cross-entropy loss commonly used in semantic segmentation tasks. L_body adopts a boundary relaxation loss, which relaxes the classification of boundary pixels during training and allows the segmentation network to predict a boundary pixel as several classes. L_edge is a comprehensive loss function that obtains a boundary edge prior based on the edge prediction part and comprises two terms: the binary cross-entropy loss L_bce for boundary pixel classification, and the cross-entropy loss L_ce of the edge parts in the scene. The hyperparameters λ1, λ2, λ3, λ4, λ5, λ6 control the weighting between the losses; the first three default to 1 and the last three to 0.4, 20 and 1 respectively.
Here, y denotes the real semantic label, the binary boundary mask generated from y serves as the ground truth for the boundary prediction result b, and s_body, s_final and s_E denote the segmentation map predictions obtained from F_body, F_final and F_E respectively;
step 6, training the semantic segmentation network built in step 2 with the training samples according to steps 3 to 5;
according to the above process, after the semantic segmentation network is built, training samples are continuously input to train the network following steps 3 to 5; before training, relevant training parameters such as the network input scale, batch size and learning rate need to be set.
Step 7, inputting a sample to be tested into the trained semantic segmentation network, outputting a final semantic segmentation result of the remote sensing image, and completing the test of the semantic segmentation network;
the following is a specific example experiment, which is not intended to limit the use of the method of the present invention, but is merely a better example for analysis.
The experiment used the Vaihingen dataset provided by ISPRS, which contains 3-channel IRRG images, DSM images and NDSM images: 16 remote sensing images of size 6000 × 6000 with corresponding labels. The corresponding visualization results are shown in FIG. 7. The semantic labels of the 6 target classes contained in the annotations are determined by RGB values, as shown in Table 1 below:
TABLE 1 Semantic annotation information (the class-to-RGB mapping is reproduced as an image in the original publication)
In this embodiment, the dataset is preprocessed according to the sliding-window cropping and data augmentation described in step 1, and the obtained data are multi-channel images of 512 × 512 × 3. Training and test samples are divided in a 4:1 ratio;
and then the semantic segmentation network of the method is built, and relevant parameters are set before training. The input scale of the network is 512 × 512, the batch size is set to 10 (according to available GPU memory), the optimizer is SGD with an initial learning rate of 0.001, a minimum learning rate of 0.00001, momentum of 0.9 and a weight decay coefficient of 0.0005.
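In PyTorch terms this setup might look as follows (the decay schedule between the initial and minimum learning rates is not specified in the text, so cosine annealing is an assumption; the model is a stand-in):

```python
import torch

model = torch.nn.Conv2d(3, 6, 1)   # stand-in for the segmentation network
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=0.00001)  # decay toward the minimum lr
```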
In this embodiment, the selected semantic segmentation evaluation indexes are mean intersection over union (mIoU), mean pixel accuracy (mAcc), GFLOPs (floating-point operations), parameter count, and single-image segmentation inference time. 4 semantic segmentation methods are selected for comparison in terms of segmentation precision and efficiency: UNet, PSPNet, Fast-SCNN and Sem-FPN. mIoU and mAcc serve as measures of segmentation accuracy: the higher they are, the closer the segmentation result is to the real label and the higher the precision. GFLOPs, parameter count and single-image inference time serve as measures of segmentation efficiency: the smaller they are, the higher the efficiency. The experimental results of the different methods are shown in Table 2:
TABLE 2 Comparison of the proposed method with existing methods

Method       mIoU (%)   mAcc (%)   GFLOPs   Params (M)   Inference time (s)
UNet          86.19      91.16     203.04     29.06          0.067
PSPNet        86.40      92.19     178.48     48.98          0.066
Fast-SCNN     76.23      83.83       0.91      1.21          0.015
Sem-FPN       83.57      90.91      45.48     28.50          0.029
Proposed      85.33      90.98       6.63      4.17          0.031
As can be seen from the results in Table 2, the method achieves 85.33% mIoU and 90.98% mAcc, with 6.63 GFLOPs, 4.17M parameters and a single-image segmentation inference time of 0.031 s. Fast-SCNN has the fewest parameters, the lowest floating-point computation and the shortest inference time, but its precision is far below that of the method. Compared with Sem-FPN, the method is slightly inferior in inference time but higher in both mIoU and mAcc, while Sem-FPN's parameter count and GFLOPs are far higher. UNet and PSPNet are classical semantic segmentation networks; the method is slightly inferior to them in precision, but their parameter counts, GFLOPs and inference times are several times those of the method. Considering segmentation precision and efficiency together, the proposed semantic segmentation network is therefore superior to the other networks; moreover, its parameter count, GFLOPs and inference speed verify that the invention is a lightweight semantic segmentation method for remote sensing images;
FIG. 8 shows the visualized semantic segmentation results obtained after inputting a test sample. Compared with the results of Fast-SCNN and Sem-FPN, the method classifies pixels more accurately, effectively reducing segmentation errors caused by misclassification, and it handles edge details more accurately, coming closer to the real semantic labels. Compared with UNet and PSPNet, although its overall segmentation precision is lower, the method is closer to the semantic labels in the segmentation of edge details.
It is to be understood that the present invention has been described with reference to certain embodiments, and that various changes in the features and embodiments, or equivalent substitutions may be made therein by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (7)

1. A remote sensing image lightweight semantic segmentation method based on edge decoupling is characterized by comprising the steps of building, training and testing a semantic segmentation network, wherein the semantic segmentation network is a lightweight coding and decoding network with a double-branch structure, after training of the semantic segmentation network is completed based on a training sample, a remote sensing image to be tested is input into the semantic segmentation network, and a final remote sensing image semantic segmentation result is output;
the method comprises the following steps in sequence:
step 1, acquiring a remote sensing image data set, and preparing a training and testing sample;
step 2, building a lightweight coding and decoding semantic segmentation network with a double-branch structure;
step 2, building a lightweight coding and decoding semantic segmentation network with a double-branch structure, wherein the network comprises an encoder, a decoder and a classifier;
the encoder is of a double-branch structure and comprises a global downsampling block, a lightweight double-branch sub-network and a global feature fusion module;
the decoder consists of a lightweight edge decoupling module and an up-sampling module;
the classifier is composed of a conventional convolutional layer and a SoftMax layer;
step 3, inputting the training samples into the encoder, performing feature encoding through feature extraction, and obtaining the encoded feature map F_E;
obtaining the encoded feature map F_E in step 3 comprises the following steps:
step 3.1, inputting the training sample into a global downsampling block of an encoder to obtain a low-level feature map;
the global downsampling block in the step 3.1 consists of 3 parts, wherein one part is 1 conventional convolution, the other part is 1 Ghost bottleneck module, and the third part is 1 global context module;
after an input sample passes through a global downsampling block, a low-level feature map with the output resolution being 1/4 of the original input is generated and used as the input of the subsequent process;
step 3.2, inputting the low-level feature map into a lightweight double-branch sub-network in an encoder to obtain a space detail feature map and an abstract semantic feature map;
step 3.3, performing multi-level feature fusion on the obtained spatial detail feature map and abstract semantic feature map through the global feature fusion block of the encoder, and outputting the encoded feature map F_E;
step 4, inputting the encoded feature map F_E into the decoder, performing edge feature refinement and up-sampling to obtain the decoded feature map F_D;
Step 5, inputting the decoding characteristic graph into a classifier to perform pixel-level classification prediction, outputting a segmentation result, and performing supervised training on the semantic segmentation network through a supervision mechanism;
step 6, training the semantic segmentation network built in step 2 with the training samples according to steps 3 to 5;
and 7, inputting the sample to be tested into the trained semantic segmentation network, outputting a final remote sensing image semantic segmentation result, and completing the test of the semantic segmentation network.
2. The remote sensing image lightweight semantic segmentation method based on edge decoupling as claimed in claim 1, wherein the lightweight dual-branch sub-network in step 3.2 comprises two branches, namely a trunk deep branch for obtaining abstract semantic features and a spatial preservation branch for obtaining spatial detail features, and the two branches share the low-level feature map output by the global downsampling block;
the trunk deep branch is constructed based on the GhostNet feature extraction network and comprises two structures: one is the branch main body composed of 16 Ghost bottleneck modules, which performs 4 downsampling steps to extract deep features; the other is a lightweight feature pyramid composed of four parts, namely depthwise separable convolution, an up-sampling block, a lightweight atrous spatial pyramid pooling (ASPP) module and element fusion, which takes the 4 deep feature maps of different scales formed by the main body as input and finally outputs abstract semantic features with an enlarged receptive field and multi-scale information;
the spatial preservation branch is composed of 3 depthwise separable convolutions, performs downsampling 1 time on the input low-level feature map, and outputs a spatial detail feature map whose resolution is 1/2 of the input.
3. The remote sensing image lightweight semantic segmentation method based on edge decoupling as claimed in claim 1, wherein the global feature fusion module of step 3.3 comprises 3 parts: two parallel depthwise separable convolutions with 1 × 1 kernels; element fusion; and 1 global context module;
the input abstract semantic features and spatial detail features are dimension-adjusted by the two parallel convolutions, element fusion outputs a feature map rich in spatial detail and abstract semantic information, and finally lightweight context modeling is performed by the global context module to form the encoded feature map F_E, which better fuses global information.
4. The remote sensing image lightweight semantic segmentation method based on edge decoupling as claimed in claim 1, wherein in step 4 the decoded feature map F_D is obtained as follows: first, the encoded feature map F_E is input into the lightweight edge decoupling module of the decoder for edge feature refinement, generating a fine feature map with refined edges; the fine feature map is then input into the up-sampling module of the decoder, up-sampled, and restored to the size of the original input remote sensing image as the decoded feature map F_D output by the decoder.
5. The remote sensing image lightweight semantic segmentation method based on edge decoupling as claimed in claim 1, wherein the lightweight edge decoupling module consists of 3 parts, namely a lightweight atrous spatial pyramid pooling (ASPP) module, a main body feature generator and an edge preserver; first, the encoded features pass through the lightweight ASPP to generate a feature map F_aspp with multi-scale information and a larger receptive field; then the main body generator produces a more consistent feature representation for pixels within the same object, forming the main body feature map F_body of the target object; F_body, F_aspp and F_E are input into the edge preserver, which outputs a refined edge feature map F_edge through an explicit subtraction operation, channel-stacking fusion and 1 × 1 conventional convolution dimensionality reduction; finally, the main body feature map and the refined edge feature map are fused, and the refined output feature map F_final used for up-sampling recovery is output; the whole process can be expressed as:

F_aspp = f_dsaspp(F_E)
F_body = φ(F_aspp)
F_edge = ψ(F_body, F_aspp, F_E)
F_final = F_body + F_edge

where f_dsaspp denotes the lightweight atrous spatial pyramid pooling function, φ the main body feature generating function, and ψ the edge-preserving function;
the up-sampling module comprises two steps, a 1 × 1 conventional convolution operation and an up-sampling operation; after the fine feature map F_final is output by this module, it is restored to the size of the original input remote sensing image, i.e., the decoder output feature map F_D.
6. The remote sensing image lightweight semantic segmentation method based on edge decoupling as claimed in claim 1, wherein, under the supervision mechanism in step 5, after the decoded feature map F_D is processed by the classifier, pixel-level classification is completed and the semantic segmentation result is output; the network is trained under supervision formed by the semantic segmentation result and the real labels, so that the semantic segmentation network achieves the best segmentation performance.
7. The remote sensing image lightweight semantic segmentation method based on edge decoupling as claimed in claim 1, wherein the supervision mechanism in step 5 is an edge-based supervision method implemented by a designed loss function; the total loss function, denoted L, is:

L = λ1·L_body + λ2·L_edge + λ3·L_final + λ4·L_G

where L_body, L_edge, L_final and L_G denote the main body feature loss, edge feature loss, fine feature loss and global encoding loss respectively; the inputs of these 4 loss functions are the segmentation results formed from the main body feature map, the refined edge feature map, the refined output feature map and the encoded feature map after up-sampling recovery and a SoftMax layer, together with the corresponding real labels;
wherein the loss function L_edge is a comprehensive loss function that obtains a boundary edge prior based on the edge prediction part and comprises two terms: the binary cross-entropy loss L_bce for boundary pixel classification and the cross-entropy loss L_ce of the edge parts in the scene, i.e. L_edge = λ5·L_bce + λ6·L_ce; the hyperparameters λ1, λ2, λ3, λ4, λ5, λ6 control the weighting between the losses.
CN202110456921.9A 2021-04-27 2021-04-27 Remote sensing image lightweight semantic segmentation method based on edge decoupling Active CN113159051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110456921.9A CN113159051B (en) 2021-04-27 2021-04-27 Remote sensing image lightweight semantic segmentation method based on edge decoupling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110456921.9A CN113159051B (en) 2021-04-27 2021-04-27 Remote sensing image lightweight semantic segmentation method based on edge decoupling

Publications (2)

Publication Number Publication Date
CN113159051A CN113159051A (en) 2021-07-23
CN113159051B true CN113159051B (en) 2022-11-25

Family

ID=76871278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110456921.9A Active CN113159051B (en) 2021-04-27 2021-04-27 Remote sensing image lightweight semantic segmentation method based on edge decoupling

Country Status (1)

Country Link
CN (1) CN113159051B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658200B (en) * 2021-07-29 2024-01-02 东北大学 Edge perception image semantic segmentation method based on self-adaptive feature fusion
CN113706546B (en) * 2021-08-23 2024-03-19 浙江工业大学 Medical image segmentation method and device based on lightweight twin network
CN113762396A (en) * 2021-09-10 2021-12-07 西南科技大学 Two-dimensional image semantic segmentation method
CN113706561B (en) * 2021-10-29 2022-03-29 华南理工大学 Image semantic segmentation method based on region separation
CN114426069B (en) * 2021-12-14 2023-08-25 哈尔滨理工大学 Indoor rescue vehicle based on real-time semantic segmentation and image semantic segmentation method
CN114398979A (en) * 2022-01-13 2022-04-26 四川大学华西医院 Ultrasonic image thyroid nodule classification method based on feature decoupling
CN114463542A (en) * 2022-01-22 2022-05-10 仲恺农业工程学院 Orchard complex road segmentation method based on lightweight semantic segmentation algorithm
CN114863094A (en) * 2022-05-31 2022-08-05 征图新视(江苏)科技股份有限公司 Industrial image region-of-interest segmentation algorithm based on double-branch network
CN115240041A (en) * 2022-07-13 2022-10-25 北京理工大学 Shale electron microscope scanning image crack extraction method based on deep learning segmentation network
CN115147703B (en) * 2022-07-28 2023-11-03 广东小白龙环保科技有限公司 Garbage segmentation method and system based on GinTrans network
CN115272681B (en) * 2022-09-22 2022-12-20 中国海洋大学 Ocean remote sensing image semantic segmentation method and system based on high-order feature class decoupling
CN116342884B (en) * 2023-03-28 2024-02-06 阿里云计算有限公司 Image segmentation and model training method and server
CN117078967B (en) * 2023-09-04 2024-03-01 石家庄铁道大学 Efficient and lightweight multi-scale pedestrian re-identification method
CN117475305B (en) * 2023-10-26 2024-07-19 广西壮族自治区自然资源遥感院 Multi-class building contour intelligent extraction and regularization method and application system
CN117475155B (en) * 2023-12-26 2024-04-02 厦门瑞为信息技术有限公司 Lightweight remote sensing image segmentation method based on semi-supervised learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330681A (en) * 2020-11-06 2021-02-05 北京工业大学 Attention mechanism-based lightweight network real-time semantic segmentation method
CN112580654A (en) * 2020-12-25 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) Semantic segmentation method for ground objects of remote sensing image

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8001527B1 (en) * 2004-12-21 2011-08-16 Zenprise, Inc. Automated root cause analysis of problems associated with software application deployments
CN104392209B (en) * 2014-11-07 2017-09-29 长春理工大学 A kind of image complexity evaluation method of target and background
CN104574296B (en) * 2014-12-24 2017-07-04 长春理工大学 A kind of method for polarizing the m ultiwavelet fusion treatment picture for removing haze
CN113424154A (en) * 2019-05-23 2021-09-21 西门子股份公司 Method of edge side model processing, edge calculation apparatus, and computer readable medium
CN110674866B (en) * 2019-09-23 2021-05-07 兰州理工大学 Method for detecting X-ray breast lesion images by using transfer learning characteristic pyramid network
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN111079649B (en) * 2019-12-17 2023-04-07 西安电子科技大学 Remote sensing image ground feature classification method based on lightweight semantic segmentation network
CN111797676B (en) * 2020-04-30 2022-10-28 南京理工大学 High-resolution remote sensing image target on-orbit lightweight rapid detection method
CN111666836B (en) * 2020-05-22 2023-05-02 北京工业大学 High-resolution remote sensing image target detection method of M-F-Y type light convolutional neural network
CN112149547B (en) * 2020-09-17 2023-06-02 南京信息工程大学 Remote sensing image water body identification method based on image pyramid guidance and pixel pair matching
CN112183360B (en) * 2020-09-29 2022-11-08 上海交通大学 Lightweight semantic segmentation method for high-resolution remote sensing image

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330681A (en) * 2020-11-06 2021-02-05 北京工业大学 Attention mechanism-based lightweight network real-time semantic segmentation method
CN112580654A (en) * 2020-12-25 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) Semantic segmentation method for ground objects of remote sensing image

Also Published As

Publication number Publication date
CN113159051A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN113159051B (en) Remote sensing image lightweight semantic segmentation method based on edge decoupling
CN112183360B (en) Lightweight semantic segmentation method for high-resolution remote sensing image
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN108596248B (en) Remote sensing image classification method based on improved deep convolutional neural network
CN111695467B (en) Spatial spectrum full convolution hyperspectral image classification method based on super-pixel sample expansion
CN111612807B (en) Small target image segmentation method based on scale and edge information
CN113888550B (en) Remote sensing image road segmentation method combining super-resolution and attention mechanism
CN114120102A (en) Boundary-optimized remote sensing image semantic segmentation method, device, equipment and medium
CN113807210A (en) Remote sensing image semantic segmentation method based on pyramid segmentation attention module
CN111369563A (en) Semantic segmentation method based on pyramid void convolutional network
CN112381097A (en) Scene semantic segmentation method based on deep learning
CN109035267B (en) Image target matting method based on deep learning
CN110852369B (en) Hyperspectral image classification method combining 3D/2D convolutional network and adaptive spectrum unmixing
CN110517272B (en) Deep learning-based blood cell segmentation method
CN112560966B (en) Polarized SAR image classification method, medium and equipment based on scattering map convolution network
CN111178438A (en) ResNet 101-based weather type identification method
CN104700100A (en) Feature extraction method for high spatial resolution remote sensing big data
CN110930378A (en) Emphysema image processing method and system based on low data demand
CN114359297A (en) Attention pyramid-based multi-resolution semantic segmentation method and device
CN111815526B (en) Rain image rainstrip removing method and system based on image filtering and CNN
CN114821340A (en) Land utilization classification method and system
CN116935043A (en) Typical object remote sensing image generation method based on multitasking countermeasure network
CN111242028A (en) Remote sensing image ground object segmentation method based on U-Net
CN114511785A (en) Remote sensing image cloud detection method and system based on bottleneck attention module
CN111179272A (en) Rapid semantic segmentation method for road scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant