CN115205672A - Remote sensing building semantic segmentation method and system based on multi-scale regional attention

Publication number
CN115205672A
Authority
CN
China
Prior art keywords
remote sensing
semantic segmentation
attention
scale
layer
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210577106.2A
Other languages
Chinese (zh)
Inventor
徐胜军
邓博文
孟月波
刘光辉
赵敏华
韩九强
钟德星
吕红强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Architecture and Technology
Original Assignee
Xian University of Architecture and Technology
Application filed by Xian University of Architecture and Technology
Priority to CN202210577106.2A
Publication of CN115205672A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/176 Urban or other man-made structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

A remote sensing building semantic segmentation method and system based on multi-scale regional attention, the method comprising the following steps: step 1, obtaining images containing remote sensing buildings and constructing a remote sensing building data set; step 2, training a pre-constructed semantic segmentation network with the acquired remote sensing building data set to obtain a trained semantic segmentation network, wherein the semantic segmentation network comprises a region-attention-based encoder-decoder network and a multi-scale regional attention module; step 3, using the trained semantic segmentation network to segment and extract buildings in the remote sensing image to be processed. The method can effectively locate discriminative feature regions and extract global node features and local semantic information, so that it segments high-resolution remote sensing building images more effectively and with better robustness.

Description

Remote sensing building semantic segmentation method and system based on multi-scale regional attention
Technical Field
The invention belongs to the technical field of high-resolution remote sensing building image extraction, and particularly relates to a remote sensing building semantic segmentation method and system based on multi-scale regional attention.
Background
Semantic segmentation of high-resolution remote sensing buildings is an important component of remote sensing earth observation technology. Its main task is to extract building-related feature information from an acquired remote sensing image and classify the target represented by each pixel, thereby completing the extraction of buildings from the image. With the development of computer vision, more and more researchers have studied this problem in depth. The research methods fall mainly into two groups: methods based on traditional machine learning and methods based on deep learning. Methods based on traditional machine learning mainly train classifiers on hand-crafted features, such as basic image feature information including shape, texture, color, spectrum and spatial detail. Although hand-crafted features can effectively represent various attributes of an image, classical algorithms built on them often suffer from poor generalization and complex design, and struggle to solve semantic segmentation of remote sensing images in real, complex environments.
In recent years, deep learning, with its strong generalization and its ability to learn target features by itself, has achieved good results in semantic segmentation of high-resolution remote sensing building images. To enhance the network's ability to represent the target to be extracted in different scenes, the mainstream idea is to add an attention mechanism to the encoder-decoder modules, so that the network can capture long-range dependencies and context information and thereby improve segmentation and classification accuracy. Although deep-learning and attention-based methods have produced significant results on this task, mainstream attention-based methods are still limited to associating and classifying pixels at different positions. They pay little attention to consistency within local sub-regions and to correlation between regions, so the network lacks learning and supervision of the target region to be segmented and of edge consistency, which greatly affects the accuracy of the segmentation result. Therefore, how to design an effective regional attention mechanism that strengthens the network's attention to the correlation between neighborhoods of a remote sensing building image and to the consistency of pixels within each neighborhood remains a very challenging problem.
Disclosure of Invention
The invention aims to provide a remote sensing building semantic segmentation method and system based on multi-scale regional attention, and overcomes the defects in the prior art.
In order to achieve the purpose, the invention adopts the technical scheme that:
the invention provides a remote sensing building semantic segmentation method based on multi-scale regional attention, which comprises the following steps of:
step 1, obtaining an image containing a remote sensing building, and constructing a remote sensing building data set;
step 2, training a pre-constructed semantic segmentation network by using the acquired remote sensing building data set to obtain the trained semantic segmentation network, wherein the semantic segmentation network comprises a coding and decoding structure network based on regional attention and a multi-scale regional attention module;
step 3, segmenting and extracting buildings in the remote sensing image to be extracted by utilizing the trained semantic segmentation network.
Preferably, the regional attention-based codec structure network comprises an encoder and a decoder, wherein the encoder comprises a convolution block and four residual blocks based on a residual structure; the decoder comprises four upsampling blocks, four multi-scale region attention modules and a convolution block.
Preferably, the convolution block of the encoder comprises two convolution layers, each followed by a batch normalization layer and a leaky rectified linear unit (Leaky ReLU).
Preferably, each residual block comprises a max pooling layer whose output end is connected with two convolution layers, and the output end of each convolution layer is connected in sequence with a batch normalization layer and a Leaky ReLU.
Preferably, the four upsampling blocks and the four multi-scale region attention modules of the decoder are connected alternately, and the output end of the last multi-scale region attention module is connected with a convolution block.
Preferably, each upsampling block comprises two convolution layers, each followed by a batch normalization layer and a Leaky ReLU.
Preferably, the multi-scale region attention module comprises a multi-scale neighborhood extraction module, a region embedding module, a self-attention module and a local weighting module.
Preferably, the multi-scale neighborhood extraction module comprises two stages, wherein the first stage comprises four dilated (atrous) convolution layers; the second stage comprises a convolution layer whose output is connected in sequence with a batch normalization layer and a leaky rectified linear unit (Leaky ReLU);
the region embedding module comprises a max pooling layer and a convolution layer, and the output of the convolution layer is connected with a batch normalization layer;
the self-attention module comprises three convolution layers and a Softmax layer;
the local weighting module comprises an upsampling layer and two convolution layers, and the output of each convolution layer is connected with a batch normalization layer and a Leaky ReLU.
Preferably, in step 2, the pre-constructed semantic segmentation network is trained with the acquired remote sensing building data set to obtain the trained semantic segmentation network, specifically as follows:
performing iterative optimization training on the pre-constructed semantic segmentation network by using the acquired remote sensing building data set in combination with a loss function of regional consistency supervision to obtain the trained semantic segmentation network.
A remote sensing building semantic segmentation system based on multi-scale regional attention, comprising:
the data acquisition unit is used for acquiring images containing the remote sensing buildings and constructing a remote sensing building data set;
the network training unit is used for training a pre-constructed semantic segmentation network by utilizing the acquired remote sensing building data set to obtain the trained semantic segmentation network, wherein the semantic segmentation network comprises a coding and decoding structure network based on regional attention and a multi-scale regional attention module;
and the segmentation and extraction unit is used for segmenting and extracting buildings in the remote sensing image to be extracted by utilizing the trained semantic segmentation network.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a remote sensing building semantic segmentation method based on multi-scale regional attention, in which the network ReA-Net first uses an encoder to extract features such as the textures, boundaries and deep semantics of buildings in a remote sensing image. Second, a decoder structure with progressive upsampling restores the resolution of the extracted feature map; at the same time, a multi-scale region attention mechanism introduced at the feature fusion stage of upsampling strengthens the network's attention to the correlation between neighborhoods of the remote sensing building image and to pixel consistency within each neighborhood, which enhances the network's ability to extract region and boundary feature information of the target to be segmented. Finally, by introducing a region consistency supervision loss with a designed weighted penalty term, the neighborhood and high-order neighborhood consistency of each pixel between the observation field and the label field is approximated, which strengthens the continuity of neighborhood labels in the classification result as well as the model's sensitivity to building boundaries and the precision of pixel classification;
in conclusion, the method can effectively locate discriminative feature regions and extract global node features and local semantic information, so that it segments high-resolution remote sensing building images more effectively and with better robustness.
Drawings
FIG. 1 is a semantic segmentation network structure of a remote sensing building based on multi-scale regional attention;
FIG. 2 is a block diagram of the multi-scale region attention module;
FIG. 3 is a block diagram of the regional consistency loss;
FIG. 4 is a flow chart of the present invention;
FIG. 5 is a diagram of segmentation results.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
Referring to FIG. 1, the present invention provides a remote sensing building semantic segmentation method based on multi-scale regional attention, which includes the following steps:
Step 1, constructing a remote sensing building semantic segmentation network based on multi-scale regional attention, the structure of which is shown in FIG. 1.
Based on a semantic segmentation network with a Unet encoder-decoder structure, the method constructs a Multi-scale Region Attention (MRA) module and thereby provides a semantic segmentation network based on multi-scale regional attention (ReA-Net); to make the labels that the network assigns to pixels within a local area smoother, a local area consistency supervision module is established at the network output layer, wherein:
the proposed ReA-Net network is mainly composed of three parts: a remote sensing image feature extraction module based on the Unet codec structure, a decoder-based resolution recovery module, and a multi-scale regional attention (MRA) module, wherein:
the remote sensing image feature extraction module is used for extracting features such as the textures, boundaries and deep semantics of buildings in the remote sensing image and inputting the extracted features to the decoder-based resolution recovery module;
the decoder-based resolution recovery module is used for restoring the resolution of the input features. To enhance the network's ability to characterize the correlation between remote sensing image regions, a multi-scale region attention module is introduced into the resolution recovery module: region-level features of different scales are constructed with dilated convolution and pooling operations and input into the self-attention module to obtain a region-level correlation enhancement map of the feature map, and finally the local weighting module fuses the region correlation enhancement map with the input features, so that the region-level correlation of the remote sensing image is expressed.
Meanwhile, to improve the smoothness of the remote sensing image segmentation result, a loss function for multi-scale neighborhood consistency supervision is provided based on the assumption that adjacent pixels in a local region tend to take the same segmentation label; this strengthens the consistency constraint on the local region and makes the segmented local regions smoother.
The main points of the invention are as follows: (1) a multi-scale regional attention module is constructed based on the Unet network decoding structure, and a region-attention-based encoder-decoder network (Region-Attention Net, ReA-Net) is designed, which enhances the regional attention of the semantic features to the target region; (2) a loss function of multi-scale neighborhood consistency supervision is provided based on the spatial consistency assumption that pixels in a local area tend to take the same label; (3) the multi-scale regional attention and neighborhood consistency supervision mechanisms are fused, and a remote sensing building semantic segmentation algorithm based on multi-scale regional consistency attention supervision is provided.
Step 2, constructing a remote sensing building data set, which mainly comprises the following steps:
First, high-resolution remote sensing building image data are collected; experiments are carried out on the Inria aerial image dataset and the Massachusetts Building image dataset.
Then, data enhancement is applied to the collected remote sensing building images by jointly using random cropping, random horizontal flipping, vertical flipping and similar methods, and the images are divided into a training data set and a test data set according to a certain proportion.
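The augmentation and split described in Step 2 can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the function names, the flip probability of 0.5 and the 80/20 split ratio are assumptions, and images are represented as nested lists. The point it shows is that every geometric transform must be applied identically to an image and its label mask.

```python
import random

def hflip(grid):
    """Mirror a 2-D grid left to right."""
    return [row[::-1] for row in grid]

def vflip(grid):
    """Mirror a 2-D grid top to bottom."""
    return grid[::-1]

def random_crop_pair(image, mask, size, rng):
    """Cut the same size x size window out of the image and its label mask."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)

    def crop(grid):
        return [row[left:left + size] for row in grid[top:top + size]]

    return crop(image), crop(mask)

def augment_pair(image, mask, size, rng):
    """Apply one random crop and random flips identically to image and mask."""
    image, mask = random_crop_pair(image, mask, size, rng)
    if rng.random() < 0.5:  # assumed flip probability
        image, mask = hflip(image), hflip(mask)
    if rng.random() < 0.5:  # assumed flip probability
        image, mask = vflip(image), vflip(mask)
    return image, mask

def split_dataset(samples, train_ratio, rng):
    """Shuffle the samples and split them into training and test subsets."""
    samples = list(samples)
    rng.shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

rng = random.Random(0)
image = [[r * 10 + c for c in range(6)] for r in range(6)]
mask = [[value % 2 for value in row] for row in image]
aug_image, aug_mask = augment_pair(image, mask, size=4, rng=rng)
train, test = split_dataset(range(10), train_ratio=0.8, rng=rng)
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing training runs.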
Step 3, constructing a coder-decoder part of ReA-Net, which mainly comprises the following steps:
The encoder of ReA-Net mainly comprises two stages. The first stage uses a convolution block (Conv1) to extract low-level texture features of the remote sensing image; the convolution block mainly comprises two convolution layers with a 3 × 3 convolution kernel and a padding of 1, and each convolution layer is followed by a Batch Normalization (BN) layer and a Leaky Rectified Linear Unit (Leaky ReLU) layer. The second stage comprises 4 residual blocks (ResConv1 to ResConv4) based on a residual structure; each residual block comprises a max pooling layer with a kernel size of 2 × 2 for the down-sampling operation, followed by two convolution layers with a 3 × 3 convolution kernel and a padding of 1, and each convolution layer is followed by a BN layer and a Leaky ReLU, which increases the network's ability to extract deeper semantic features of the remote sensing image. The encoder parameters of ReA-Net are shown in Table 1.
TABLE 1. ReA-Net encoder parameter table
Figure RE-GDA0003851906800000061
In the table, Kernel denotes the convolution kernel size; H and W denote the height and width of the input image; Max denotes max pooling.
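As a worked check of the encoder description, the following sketch traces the side length of the feature map through Conv1 and ResConv1 to ResConv4 using the standard convolution and pooling size formulas. It assumes a pooling stride of 2 (the text gives only the 2 × 2 kernel) and a hypothetical 512-pixel input; the function names are illustrative.

```python
def conv_out(size, kernel=3, padding=1, stride=1, dilation=1):
    """Output side length of a convolution along one axis."""
    effective = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output side length of a pooling layer along one axis (stride 2 is assumed)."""
    return (size - kernel) // stride + 1

def encoder_sizes(size):
    """Side length after Conv1 and after each of ResConv1 to ResConv4."""
    sizes = []
    size = conv_out(conv_out(size))      # Conv1: two 3x3 convolutions, padding 1
    sizes.append(size)
    for _ in range(4):                   # ResConv1 to ResConv4
        size = pool_out(size)            # 2x2 max pooling halves the side
        size = conv_out(conv_out(size))  # two 3x3 convolutions keep the side
        sizes.append(size)
    return sizes

print(encoder_sizes(512))  # [512, 256, 128, 64, 32]
```

With padding 1, a 3 × 3 convolution leaves the spatial size unchanged, so under these assumptions all down-sampling comes from the four pooling layers, for a total reduction factor of 16.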
The decoder mainly uses a progressive upsampling strategy to restore the resolution of the extracted feature map and complete dense pixel classification.
The decoder mainly comprises 4 upsampling blocks (Upsample1 to Upsample4), 4 multi-scale region attention modules (MRA1 to MRA4) and a convolution block (Outconv), and is divided into five stages. Each of the first to fourth stages includes an upsampling block and a multi-scale region attention module. To eliminate the checkerboard effect caused by deconvolution, each upsampling block comprises a bilinear upsampling layer and two convolution layers with a 3 × 3 convolution kernel and a padding of 1, and each convolution layer is followed by a BN and a LeakyReLU. Meanwhile, at the upsampling feature fusion stage, the multi-scale regional attention module fuses region-level correlation enhancement maps of different scales, which enhances the network's ability to express large-scale spatial correlation among remote sensing image regions. The multi-scale regional attention block mainly consists of several pooling and convolution layers. Finally, the fifth stage consists of a convolution block that performs the segmentation task on the output feature map of the remote sensing image. The decoder parameters of ReA-Net are shown in Table 2.
TABLE 2. ReA-Net decoder parameter table
Figure RE-GDA0003851906800000071
In the table, Kernel denotes the convolution kernel size; H and W denote the height and width of the input image; scale_factor denotes the upsampling rate.
Step 4, constructing the multi-scale region attention (MRA) module, specifically comprising the following steps:
let the feature map input to the MRA be
Figure RE-GDA0003851906800000072
and let the output multi-scale region attention feature map be
Figure RE-GDA0003851906800000073
where $W_f$, $H_f$ and $C$ are respectively the width, height and number of channels of the feature map input to the MRA.
The proposed MRA mainly consists of a multi-scale neighborhood extraction (MNE) module, a Region Embedding (RE) module, a Self-Attention (SA) module, and a Local Weighting (LW) module.
Specifically, the multi-scale neighborhood extraction (MNE) module consists of five convolution layers divided into two stages. The first stage consists of four convolution layers with a 3 × 3 convolution kernel, dilation rates of [1,3,5,7] and paddings of [1,3,5,7]; it extracts multi-scale information, which is then concatenated. The second stage consists of a convolution layer with a 1 × 1 convolution kernel that restores the feature dimension; this convolution layer is followed by a BN and a LeakyReLU;
the region embedding (RE) module is used for constructing region-level descriptors; it mainly consists of a max pooling layer with a kernel size of 4 and a stride of 4 and a convolution layer with a 3 × 3 convolution kernel, and the convolution layer is followed by a BN;
the self-attention (SA) module is used for constructing the correlation among feature regions; it mainly comprises three convolution layers with a 1 × 1 convolution kernel and a Softmax layer;
the local weighting (LW) module is used for weighting the region-level correlation features into the original input feature map so as to generate the multi-scale region attention feature map.
The LW mainly comprises an upsampling block, which consists of an upsampling layer with a scale factor of 4 using nearest-neighbor sampling and two convolution layers with a 3 × 3 convolution kernel and a padding of 1; each convolution layer in the upsampling block is followed by a BN and a LeakyReLU;
the specific flow of the MRA can be described as follows:
first, dilated convolution blocks with different dilation rates extract neighborhood features from the input remote sensing building feature map $F_{in}$, and the neighborhood features extracted by the convolution layers with different dilation rates are concatenated to obtain the multi-scale neighborhood feature map
Figure RE-GDA0003851906800000081
Namely:
$$F_{in,d} = \mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}_{d,k,pad}(F_{in}))) \tag{1}$$
$$A = \mathrm{Concat}(F_{in,1}, F_{in,3}, F_{in,5}, F_{in,7}) \tag{2}$$
where $\mathrm{Conv}_{d,k,pad}(\cdot)$ denotes the dilated convolution layers with dilation rate $d \in [1,3,5,7]$, convolution kernel size $k = 3$ and padding $pad \in [1,3,5,7]$; BN denotes a batch normalization layer; ReLU denotes a leaky rectified linear unit layer; Concat denotes the concatenation operation.
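A short calculation shows why the paddings [1,3,5,7] pair with the dilation rates [1,3,5,7]: with a 3 × 3 kernel, setting the padding equal to the dilation rate keeps every branch at the input's spatial size, so the four outputs can be concatenated along the channel axis. The sketch below assumes each branch keeps the input channel count and uses a hypothetical 64 × 64 input with 32 channels; neither value comes from the source.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1

def dilated_conv_out(size, kernel, dilation, padding, stride=1):
    """Output side length of a dilated convolution along one axis."""
    return (size + 2 * padding - effective_kernel(kernel, dilation)) // stride + 1

def mne_first_stage(size, channels, rates=(1, 3, 5, 7), kernel=3):
    """Return the output side of each branch and the channel count after concatenation."""
    sides = [dilated_conv_out(size, kernel, d, padding=d) for d in rates]
    return sides, channels * len(rates)  # assumes each branch keeps `channels` channels

sides, concat_channels = mne_first_stage(size=64, channels=32)
print([effective_kernel(3, d) for d in (1, 3, 5, 7)])  # [3, 7, 11, 15]
print(sides)                                           # [64, 64, 64, 64]
print(concat_channels)                                 # 128
```

Under that channel assumption the concatenated map has four times the input channels, which is consistent with the 1 × 1 convolution of the second stage reducing the dimension by a factor of 0.25.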
Second, a convolution layer with a convolution kernel of 1 performs dimension selection on the feature map $A$ to obtain the feature map
Figure RE-GDA0003851906800000091
which reduces the redundancy of the multi-scale features and enhances their characterization ability. Meanwhile, to obtain feature characterization information for all regions, a region average pooling operation is applied to the feature map $B$ to obtain the region-level descriptors
Figure RE-GDA0003851906800000092
Namely:
$$C = \mathrm{Avgpool}_{k,s}(\mathrm{BN}(\mathrm{Conv}_{d,k,pad}(A))) \tag{3}$$
where $\mathrm{Conv}_{d,k,pad}(\cdot)$ denotes a convolution layer with dilation rate $d = 1$, convolution kernel size $k = 1$ and padding $pad = 0$, with a dimensionality reduction rate of 0.25; BN denotes a batch normalization layer; $\mathrm{Avgpool}_{k,s}(\cdot)$ denotes the max pooling layer with kernel size $k = 4$ and stride $s = 4$; $H_r = \tfrac{1}{4}H$, $W_r = \tfrac{1}{4}W$.
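The region embedding step can be illustrated on a single channel: pooling with kernel 4 and stride 4 collapses each non-overlapping 4 × 4 region into one descriptor, so the side length shrinks to one quarter. The text names both average and max pooling for this step; the sketch below uses max pooling, following the module description, and that choice is an assumption. The 8 × 8 input is illustrative.

```python
def max_pool2d(grid, kernel=4, stride=4):
    """Max pooling over a 2-D grid; each output cell summarizes one region."""
    height, width = len(grid), len(grid[0])
    pooled = []
    for top in range(0, height - kernel + 1, stride):
        row = []
        for left in range(0, width - kernel + 1, stride):
            window = [grid[top + i][left + j]
                      for i in range(kernel) for j in range(kernel)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Illustrative 8x8 single-channel feature map with values 0..63.
feature = [[r * 8 + c for c in range(8)] for r in range(8)]
regions = max_pool2d(feature)
print(regions)  # [[27, 31], [59, 63]]
```

Each of the four output values stands for one 4 × 4 region, which is the granularity at which the self-attention module later compares regions.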
Third, to obtain the correlation between feature regions, the region-level feature map $C$ is input to the self-attention (SA) module. Convolution layers with a convolution kernel of 1 encode and reshape the feature map $C$ to obtain three encoded feature matrices,
Figure RE-GDA0003851906800000093
The two feature matrices $V$ and $G$ are multiplied and passed through a Softmax activation function to obtain the spatial attention matrix
Figure RE-GDA0003851906800000094
Finally, the feature matrix $I$ is multiplied by the spatial attention matrix $Z$ and reshaped to obtain the inter-region correlation enhancement map
Figure RE-GDA0003851906800000095
Namely:
Figure RE-GDA0003851906800000096
Figure RE-GDA0003851906800000097
where $\mathrm{Conv}_{d,k,pad}(\cdot)$ denotes a convolution layer with dilation rate $d = 1$, convolution kernel size $k = 1$ and padding $pad = 0$;
Figure RE-GDA00038519068000000911
denotes element-wise multiplication; $V_i$, $G_i$ denote the attention scores at spatial location $i$; $M = W_p \times H_p$.
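Because equations (4) and (5) survive only as image placeholders, the following is a sketch of generic dot-product self-attention over regions, not a reconstruction of the patent's formulas. It assumes that V, G and I are M × C matrices with one row per region, that the affinity is the matrix product of V with the transpose of G, and that Softmax is taken over each row; the source's own operator may differ (it mentions element-wise multiplication).

```python
import math

def softmax(row):
    """Numerically stable softmax of a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Transpose a nested-list matrix."""
    return [list(col) for col in zip(*m)]

def region_self_attention(V, G, I):
    """Assumed form: Z = softmax(V G^T) per row, K = Z I."""
    scores = matmul(V, transpose(G))    # M x M region-to-region affinities
    Z = [softmax(r) for r in scores]    # each row sums to 1
    K = matmul(Z, I)                    # M x C correlation-enhanced descriptors
    return Z, K

# Three regions with two channels each (illustrative values).
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
G = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Z, K = region_self_attention(V, G, I)
```

Each row of `Z` weights how strongly one region attends to every other region, and `K` mixes the region descriptors accordingly; in the network, K would then be reshaped back to a spatial map.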
Then, the inter-region correlation enhancement map
Figure RE-GDA0003851906800000098
is input to the local weighting (LW) module, which reflects the region-level correlation back onto the original feature map $F_{in}$ through weighting. The enhancement map $K$ is upsampled to obtain the region correlation weighting map
Figure RE-GDA0003851906800000099
and $H$ and $F_{in}$ are multiplied pixel by pixel to obtain the regional attention feature map
Figure RE-GDA00038519068000000910
Namely:
$$H = \mathrm{Upsample}_{scale,mode}(K) \tag{6}$$
Figure RE-GDA0003851906800000101
where $\mathrm{Upsample}_{scale,mode}(\cdot)$ denotes the upsampling layer with $scale = 4$, using nearest-neighbor sampling as the upsampling mode and $pad = 0$;
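The local weighting step can be sketched on a single channel: nearest-neighbor upsampling by 4 copies each region-level value over its 4 × 4 block, and the result is multiplied pixel by pixel with the input feature map. The sketch omits the two convolution layers of the LW module, and the function names and input values are illustrative assumptions.

```python
def upsample_nearest(grid, scale=4):
    """Nearest-neighbor upsampling: repeat every value scale x scale times."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(scale)]
        out.extend([list(wide) for _ in range(scale)])
    return out

def multiply_pixelwise(a, b):
    """Element-wise product of two grids of the same shape."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

K = [[0.5, 1.0],
     [2.0, 0.0]]                      # 2x2 region-level enhancement map
H = upsample_nearest(K)               # 8x8 region correlation weighting map
F_in = [[1.0] * 8 for _ in range(8)]  # illustrative input feature map
Q = multiply_pixelwise(H, F_in)       # regional attention feature map
```

Every pixel inside a region receives the same weight, which is how the region-level correlation is carried back to the pixel-level feature map.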
finally, to integrate global context semantic information with regional correlation, the multi-scale regional attention feature map $Q$ and the global attention feature map
Figure RE-GDA0003851906800000102
are fused; the fused result is passed through the Sigmoid activation function and multiplied pixel by pixel with the original features to obtain the multi-scale region attention feature map $F_{out}$ output by the MRA module. Namely:
Figure RE-GDA0003851906800000103
Step 5, training ReA-Net, specifically comprising the following steps:
inputting the established high-resolution remote sensing building training data set into the network and computing the loss with the forward propagation algorithm; taking the partial derivatives of the objective function with respect to the features; and obtaining the gradients with the back propagation algorithm to update and learn the parameters.
The invention provides a loss function for regional consistency supervision. It aims to quantitatively evaluate the regional consistency and edge continuity of the ReA-Net segmentation result for remote sensing building images, together with its error against the true pixel labels; the network loss is back-propagated through the network model to iteratively optimize the network weights. During training, the established high-resolution remote sensing building training data set is input into the network, and the trained network is optimal when the loss function value is minimal, that is, when the difference between the network's prediction and the label of the input training image is minimal. The regional consistency loss structure is shown in FIG. 3.
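The forward pass, loss, gradient and update cycle described above can be illustrated with a deliberately tiny stand-in. This is not ReA-Net and not the region consistency loss: it assumes a one-weight per-pixel logistic classifier trained with plain binary cross-entropy, with hand-derived gradients, purely to show the order of the steps. All names and values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w, b, xs):
    """Forward propagation: predicted building probability per pixel."""
    return [sigmoid(w * x + b) for x in xs]

def bce_loss(ps, ys, eps=1e-12):
    """Mean binary cross-entropy between predictions and labels."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(ps, ys)) / len(ys)

def gradients(ps, xs, ys):
    """Back propagation for this toy model: d(loss)/dw and d(loss)/db."""
    n = len(xs)
    grad_w = sum((p - y) * x for p, x, y in zip(ps, xs, ys)) / n
    grad_b = sum(p - y for p, y in zip(ps, ys)) / n
    return grad_w, grad_b

def train(xs, ys, lr=0.5, steps=200):
    """Iteratively update the parameters by gradient descent."""
    w, b, history = 0.0, 0.0, []
    for _ in range(steps):
        ps = forward(w, b, xs)           # forward pass
        history.append(bce_loss(ps, ys)) # loss
        grad_w, grad_b = gradients(ps, xs, ys)
        w -= lr * grad_w                 # parameter update
        b -= lr * grad_b
    return w, b, history

pixels = [-2.0, -1.0, 1.0, 2.0]  # illustrative pixel features
labels = [0, 0, 1, 1]            # 1 = building, 0 = background
w, b, history = train(pixels, labels)
```

In a real setup a deep learning framework computes the gradients automatically, and the loss would be replaced by the region consistency supervision loss defined in this section.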
The loss function for region consistency supervision, $Loss_{lc}$, is defined as:
Figure RE-GDA0003851906800000104
in the formula,
Figure RE-GDA0003851906800000105
denotes the set of prediction probability maps output by ReA-Net;
Figure RE-GDA0003851906800000106
denotes the label set, where
Figure RE-GDA0003851906800000107
is the number of training images in a single input batch.
Let $S = \{s \mid s \le B \times R\}$ denote a finite set of lattice points defined on each training image
Figure RE-GDA0003851906800000108
where
Figure RE-GDA0003851906800000109
denotes the set of neighborhood nodes of node $s$, $B$ and $R$ are the set sizes, and $d$ denotes the Euclidean distance between a neighborhood node and the central node. The penalty weight term may be defined as:
Figure RE-GDA0003851906800000111
Then:
Figure RE-GDA0003851906800000112
the invention also provides a remote sensing building semantic segmentation system based on multi-scale regional attention, which comprises the following steps:
the data acquisition unit is used for acquiring images containing the remote sensing buildings and constructing a data set of the remote sensing buildings;
the network training unit is used for training a pre-constructed semantic segmentation network by utilizing the acquired remote sensing building data set to obtain the trained semantic segmentation network, wherein the semantic segmentation network comprises a coding and decoding structure network based on regional attention and a multi-scale regional attention module;
a segmentation and extraction unit for segmenting and extracting buildings from the remote sensing image to be processed by utilizing the trained semantic segmentation network.
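The way the three units hand data to one another can be sketched as follows. This is only an illustration of the data flow: the class names mirror the units named above, but the "model" is a hypothetical stand-in (a single intensity threshold) rather than the semantic segmentation network, and the type aliases and method names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Image = List[List[float]]   # grayscale intensities in [0, 1] (assumption)
Mask = List[List[int]]      # 1 = building pixel, 0 = background


@dataclass
class DataAcquisitionUnit:
    """Collects (image, label mask) pairs into a data set."""
    samples: List[Tuple[Image, Mask]] = field(default_factory=list)

    def add(self, image: Image, mask: Mask) -> None:
        self.samples.append((image, mask))

    def dataset(self) -> List[Tuple[Image, Mask]]:
        return list(self.samples)


class NetworkTrainingUnit:
    """Trains a model on the data set.

    Stand-in only: instead of a segmentation network it learns one
    threshold, the midpoint between the mean building intensity and the
    mean background intensity (assumes buildings are brighter).
    """

    def train(self, dataset: List[Tuple[Image, Mask]]) -> Callable[[Image], Mask]:
        building, background = [], []
        for image, mask in dataset:
            for img_row, mask_row in zip(image, mask):
                for value, label in zip(img_row, mask_row):
                    (building if label == 1 else background).append(value)
        if not building or not background:
            raise ValueError("data set must contain both classes")
        threshold = (sum(building) / len(building)
                     + sum(background) / len(background)) / 2.0

        def model(image: Image) -> Mask:
            return [[1 if v > threshold else 0 for v in row] for row in image]

        return model


class SegmentationExtractionUnit:
    """Applies the trained model to an image to be segmented."""

    def __init__(self, model: Callable[[Image], Mask]) -> None:
        self.model = model

    def extract(self, image: Image) -> Mask:
        return self.model(image)


acquisition = DataAcquisitionUnit()
acquisition.add([[0.9, 0.1], [0.8, 0.2]], [[1, 0], [1, 0]])
model = NetworkTrainingUnit().train(acquisition.dataset())
result = SegmentationExtractionUnit(model).extract([[0.7, 0.3]])
```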
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 5 shows the segmentation results. As fig. 5 shows, semantic segmentation is affected by adhesion between buildings and by the similarity of foreground and background colors. For example, in the remote sensing image of fig. 5 (a), the adhering part is small relative to the buildings themselves, and its color is also similar to the background color, which is the same problem of building color resembling background color that appears in fig. 5 (b), fig. 5 (d), fig. 5 (e) and fig. 5 (g). Illumination and shadow also significantly affect segmentation performance: fig. 5 (c) shows a remote sensing image in which the building boundary is affected by illumination and shadow. Fig. 5 (f) shows the segmentation result for a remote sensing image with complex foreground colors, and fig. 5 (k) mainly illustrates the segmentation result for a remote sensing image containing small target buildings and contiguous buildings.
As shown in the row of fig. 5, the ReA-Net provided by the invention extracts the spatial detail of buildings better and, being constrained by spatial consistency, further reduces the inaccurate segmentation caused by building adhesion. Where the building color is similar to the background color, as in fig. 5 (a), 5 (b), 5 (d), 5 (e) and 5 (g), the multi-scale region attention mechanism provided by the invention attends accurately to building edges and region shapes that are not salient, and thus segments such images accurately. Where building boundaries are frequently affected by illumination, shadow and complex foreground colors, as in fig. 5 (c) and 5 (f), the algorithm still attends effectively to building boundaries under these kinds of noise, and thus segments buildings under such conditions. Where differing building sizes cause small targets to be under-segmented, as in fig. 5 (k), the multi-scale regional attention and region consistency supervision provided by the algorithm let the network also attend to the region and boundary information of small targets, strengthening its ability to extract the semantic features of buildings.
In conclusion, the remote sensing building semantic segmentation method based on multi-scale regional attention provided by the invention achieves high-quality segmentation in complex scenes despite challenges such as building adhesion, illumination and shadow interference, complex foreground and background color interference, and small targets.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (10)

1. A remote sensing building semantic segmentation method based on multi-scale regional attention is characterized by comprising the following steps:
step 1, obtaining an image containing a remote sensing building, and constructing a remote sensing building data set;
step 2, training a pre-constructed semantic segmentation network by using the acquired remote sensing building data set to obtain the trained semantic segmentation network, wherein the semantic segmentation network comprises a coding and decoding structure network based on regional attention and a multi-scale regional attention module;
and step 3, segmenting and extracting buildings from the remote sensing image to be processed by utilizing the trained semantic segmentation network.
2. The remote sensing building semantic segmentation method based on multi-scale regional attention according to claim 1, wherein the regional attention-based coding and decoding structure network comprises an encoder and a decoder, wherein the encoder comprises a convolution block and four residual blocks based on a residual structure; the decoder comprises four upsampling blocks, four multi-scale region attention modules and a convolution block.
3. The remote sensing building semantic segmentation method based on multi-scale regional attention according to claim 2, wherein the convolution block of the encoder comprises two convolution layers, each convolution layer being connected to a batch normalization layer and a leaky rectified linear unit (Leaky ReLU).
4. The remote sensing building semantic segmentation method based on multi-scale regional attention according to claim 2, wherein each residual block comprises a maximum pooling layer, two convolution layers are connected to the output end of the maximum pooling layer, and a batch normalization layer and a leaky rectified linear unit (Leaky ReLU) are sequentially connected to the output end of each convolution layer.
5. The remote sensing building semantic segmentation method based on multi-scale regional attention according to claim 2, wherein the four upsampling blocks and the four multi-scale region attention modules of the decoder are alternately connected, and a convolution block is connected to the output end of the last multi-scale region attention module.
6. The remote sensing building semantic segmentation method based on multi-scale regional attention according to claim 2 or 5, wherein each upsampling block comprises two convolution layers, and each convolution layer is connected to a batch normalization layer and a leaky rectified linear unit (Leaky ReLU).
7. The method for semantic segmentation of remote sensing buildings based on multi-scale regional attention of claim 1, wherein the multi-scale regional attention module comprises a multi-scale neighborhood extraction module, a region embedding module, a self-attention module and a local weighting module.
8. The remote sensing building semantic segmentation method based on multi-scale regional attention according to claim 1, wherein the multi-scale neighborhood extraction module comprises two stages, wherein the first stage comprises four dilated (atrous) convolution layers; the second stage comprises a convolution layer, and the output of the convolution layer is sequentially connected to a batch normalization layer and a leaky rectified linear unit (Leaky ReLU);
the region embedding module comprises a maximum pooling layer and a convolution layer, and the output of the convolution layer is connected to a batch normalization layer;
the self-attention module comprises three convolution layers and a Softmax layer;
the local weighting module comprises an upsampling layer and two convolution layers, and the output of each convolution layer is connected to a batch normalization layer and a leaky rectified linear unit (Leaky ReLU).
9. The remote sensing building semantic segmentation method based on multi-scale regional attention according to claim 1, characterized in that in step 2, a pre-constructed semantic segmentation network is trained by using the obtained remote sensing building data set to obtain a trained semantic segmentation network, and the specific method is as follows:
performing iterative optimization training on the pre-constructed semantic segmentation network by using the acquired remote sensing building data set in combination with a loss function for region consistency supervision, to obtain the trained semantic segmentation network.
10. A remote sensing building semantic segmentation system based on multi-scale regional attention is characterized by comprising:
the data acquisition unit is used for acquiring images containing the remote sensing buildings and constructing a remote sensing building data set;
the network training unit is used for training a pre-constructed semantic segmentation network by utilizing the acquired remote sensing building data set to obtain the trained semantic segmentation network, wherein the semantic segmentation network comprises a coding and decoding structure network based on regional attention and a multi-scale regional attention module;
and the segmentation and extraction unit is used for segmenting and extracting buildings in the remote sensing image to be extracted by utilizing the trained semantic segmentation network.
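The layer layout recited in claims 2 to 6 can be traced as a sequence of stages with their spatial sizes. This is a sketch under assumptions the claims do not state: each maximum pooling layer is taken to halve the spatial size, each upsampling block to double it, and convolution blocks and attention modules to preserve it; the 512-pixel input size and the function name `trace_shapes` are illustrative.

```python
def trace_shapes(size=512):
    """List (stage name, spatial size) pairs for the encoder and decoder.

    Assumptions: pooling halves the size, upsampling doubles it, and
    convolution blocks and attention modules leave it unchanged.
    """
    stages = [("encoder convolution block", size)]
    # Encoder: four residual blocks, each beginning with max pooling.
    for i in range(1, 5):
        size //= 2
        stages.append((f"encoder residual block {i}", size))
    # Decoder: upsampling blocks alternate with region attention modules.
    for i in range(1, 5):
        size *= 2
        stages.append((f"decoder upsampling block {i}", size))
        stages.append((f"decoder multi-scale region attention module {i}", size))
    # A convolution block follows the last attention module.
    stages.append(("decoder convolution block", size))
    return stages


stages = trace_shapes(512)
for name, spatial_size in stages:
    print(f"{name}: {spatial_size} x {spatial_size}")
```

Under these assumptions the output of the final convolution block has the same spatial size as the input, which is what a per-pixel segmentation mask requires.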
CN202210577106.2A 2022-05-25 2022-05-25 Remote sensing building semantic segmentation method and system based on multi-scale regional attention Pending CN115205672A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210577106.2A CN115205672A (en) 2022-05-25 2022-05-25 Remote sensing building semantic segmentation method and system based on multi-scale regional attention


Publications (1)

Publication Number Publication Date
CN115205672A true CN115205672A (en) 2022-10-18

Family

ID=83576952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210577106.2A Pending CN115205672A (en) 2022-05-25 2022-05-25 Remote sensing building semantic segmentation method and system based on multi-scale regional attention

Country Status (1)

Country Link
CN (1) CN115205672A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439483A (en) * 2022-11-09 2022-12-06 四川川锅环保工程有限公司 High-quality welding seam and welding seam defect identification system, method and storage medium
CN117496353A (en) * 2023-11-13 2024-02-02 安徽农业大学 Rice seedling weed stem center distinguishing and positioning method based on two-stage segmentation model
CN117274608A (en) * 2023-11-23 2023-12-22 太原科技大学 Remote sensing image semantic segmentation method based on space detail perception and attention guidance
CN117274608B (en) * 2023-11-23 2024-02-06 太原科技大学 Remote sensing image semantic segmentation method based on space detail perception and attention guidance
CN117612024A (en) * 2023-11-23 2024-02-27 国网江苏省电力有限公司扬州供电分公司 Remote sensing image roof recognition method and system based on multi-scale attention
CN117612024B (en) * 2023-11-23 2024-06-07 国网江苏省电力有限公司扬州供电分公司 Remote sensing image roof recognition method based on multi-scale attention

Similar Documents

Publication Publication Date Title
CN110428428B (en) Image semantic segmentation method, electronic equipment and readable storage medium
CN111563902B (en) Lung lobe segmentation method and system based on three-dimensional convolutional neural network
CN111047551B (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN112966684B (en) Cooperative learning character recognition method under attention mechanism
CN112001960B (en) Monocular image depth estimation method based on multi-scale residual error pyramid attention network model
CN115205672A (en) Remote sensing building semantic segmentation method and system based on multi-scale regional attention
CN110070091B (en) Semantic segmentation method and system based on dynamic interpolation reconstruction and used for street view understanding
Zhou et al. BOMSC-Net: Boundary optimization and multi-scale context awareness based building extraction from high-resolution remote sensing imagery
CN110738697A (en) Monocular depth estimation method based on deep learning
CN111489357A (en) Image segmentation method, device, equipment and storage medium
CN112396607A (en) Streetscape image semantic segmentation method for deformable convolution fusion enhancement
CN110910437B (en) Depth prediction method for complex indoor scene
CN111626994A (en) Equipment fault defect diagnosis method based on improved U-Net neural network
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN111401380A (en) RGB-D image semantic segmentation method based on depth feature enhancement and edge optimization
CN103049340A (en) Image super-resolution reconstruction method of visual vocabularies and based on texture context constraint
CN110992366A (en) Image semantic segmentation method and device and storage medium
CN113256649A (en) Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN107506769A (en) A kind of extracting method and system of urban water-body information
CN113505670A (en) Remote sensing image weak supervision building extraction method based on multi-scale CAM and super-pixels
Xu et al. Feature-based constraint deep CNN method for mapping rainfall-induced landslides in remote regions with mountainous terrain: An application to Brazil
Zhou et al. Attention transfer network for nature image matting
CN116645592A (en) Crack detection method based on image processing and storage medium
CN113988147A (en) Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
Li et al. EAGNet: A method for automatic extraction of agricultural greenhouses from high spatial resolution remote sensing images based on hybrid multi-attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination