CN112489054A - Remote sensing image semantic segmentation method based on deep learning - Google Patents
- Publication number: CN112489054A (application CN202011359068.0A)
- Authority: CN (China)
- Prior art keywords: convolution, remote sensing, network, sensing image
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/11—Region-based segmentation (under G06T7/00—Image analysis; G06T7/10—Segmentation; Edge detection)
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045—Combinations of networks (under G06N3/04—Architecture, e.g. interconnection topology)
- G06N3/08—Learning methods
- G06T3/4007—Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
- G06T5/00—Image enhancement or restoration
- G06T7/90—Determination of colour characteristics
- G06T2207/20021—Dividing image into blocks, subimages or windows (under G06T2207/20—Special algorithmic details)
Abstract
The invention discloses a remote sensing image semantic segmentation method based on deep learning, belonging to the technical field of machine vision. Aiming at the difficulty that mainstream deep convolutional neural network segmentation methods have in capturing the features of small objects, and at their insufficient segmentation precision, the method improves the Deeplabv3 algorithm: the single upsampling layer is replaced by multi-layer upsampling that uses residual features taken from the backbone network, so that the image retains complete semantics at full resolution; meanwhile, the dilation rates of the 4 dilated convolutions in the ASPP layer are modified so that the network segments small objects better. The results show that the improved Deeplabv3 semantic segmentation algorithm achieves an mIoU of 94.92% and a pixel accuracy of 98.01% on a self-made data set, improvements of 3.77% and 2.40% respectively over the original algorithm; it is more accurate, more robust on the segmentation of various terrains, suited to complex urban remote sensing image environments, and applicable in fields such as urban planning, agricultural planning and military simulation.
Description
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to a remote sensing image semantic segmentation method based on deep learning.
Background
With the continuous development of remote sensing technology, the semantic information contained in remote sensing images is increasingly rich. How to perform semantic segmentation on remote sensing images, quickly and accurately extract the important semantic information, and put it to later use is therefore a very important research topic. Semantic segmentation of remote sensing images has a wide range of applications, including urban planning, geological disaster prevention and control, and military war simulation. In military war simulation in particular, the semantic information segmented from remote sensing images plays an extremely important role in the rapid generation of realistic battlefield terrain and the rapid construction of environments.
Semantic segmentation methods for remote sensing images generally fall into two categories: traditional graphics algorithms and deep learning-based algorithms. Traditional algorithms include edge detection-based, threshold-based and region-based image segmentation. Edge detection-based segmentation imitates the human visual process: it separates image edges from the background and perceives image details, thereby recognizing object contours. Threshold-based segmentation exploits the difference in grey-level characteristics between the target of interest and the background, using one or more thresholds to divide the grey levels of the image into several classes; pixels in the same class are identified as the same object. Region-based segmentation selects a small region inside the target to be segmented as a seed region, according to a criterion of consistent regional attribute characteristics; on that basis it determines the regional membership of each pixel, repeatedly adds surrounding pixels as new seed regions according to a given criterion, and finally merges all pixels with the specified characteristics into the region. Although these methods can segment a complete scene, their segmentation accuracy is far inferior to deep learning methods.
Disclosure of Invention
Aiming at the difficulty that mainstream deep convolutional neural network semantic segmentation methods have in capturing the features of small objects, and at their insufficient segmentation precision, the invention provides a remote sensing image semantic segmentation method based on deep learning. The method is suitable for segmenting complex urban-surface remote sensing images and is intended for machine-vision semantic segmentation.
In order to achieve the purpose, the invention adopts the following technical scheme:
a remote sensing image semantic segmentation method based on deep learning comprises the following steps:
step 1, marking collected remote sensing data by using a labelme tool to obtain a marking result;
step 2, performing data enhancement on the labeling result obtained in the step 1 to obtain a data set;
step 3, designing a network;
step 4, reading the data set obtained in the step 2 into the network designed in the step 3 for training;
step 5, selecting through evaluation the network weight trained in the step 4, reading it into the network, reading a picture to be predicted into the network, and calculating to obtain a Logit;
step 6, analyzing the Logit score, giving each pixel its corresponding color to represent a specific classification, and finally obtaining a segmentation result.
Further, the specific method for data enhancement in the step 2 is as follows: randomly crop the original remote sensing image and the labelled mask, each crop yielding a 256 × 256-pixel picture; rotate, flip, blur, Gaussian-filter, bilaterally filter and add white noise to each cropped picture to obtain the enhanced pictures, and then build the data set.
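The enhancement pipeline of step 2 can be sketched as follows. This is a minimal NumPy illustration, not the patent's actual implementation: blurring, Gaussian filtering and bilateral filtering would come from a library such as OpenCV and are omitted here, and the noise level and helper names are assumptions.

```python
import numpy as np

def random_crop(img, mask, size=256, rng=None):
    """Crop the same random size x size window from image and mask."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    y = int(rng.integers(0, h - size + 1))
    x = int(rng.integers(0, w - size + 1))
    return img[y:y + size, x:x + size], mask[y:y + size, x:x + size]

def augment(img, mask, rng=None):
    """Return rotated, flipped and white-noise variants of one crop."""
    rng = rng or np.random.default_rng()
    out = []
    for k in range(4):                                   # 0/90/180/270 degrees
        r_img, r_mask = np.rot90(img, k), np.rot90(mask, k)
        out.append((r_img, r_mask))
        out.append((np.fliplr(r_img), np.fliplr(r_mask)))
    # white noise corrupts the image only; the label mask must stay untouched
    noisy = np.clip(img.astype(np.float64) + rng.normal(0, 10, img.shape), 0, 255)
    out.append((noisy.astype(img.dtype), mask))
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (512, 512, 3), dtype=np.uint8)
mask = rng.integers(0, 5, (512, 512), dtype=np.uint8)
crop, crop_mask = random_crop(img, mask, 256, rng)
variants = augment(crop, crop_mask, rng)
print(len(variants))  # 9 variants per crop
```

Note that geometric transforms (rotation, flip) are applied identically to image and mask, while photometric corruptions touch only the image.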
Further, the specific method for designing the network in step 3 is as follows:
the backbone network is formed by a ResNet-50, starting with a 7 × 7 convolution with stride 2 and 64 output channels, followed by 3 × 3 max pooling with stride 2;
then come three bottleneck blocks whose 1 × 1, 3 × 3 and 1 × 1 convolutions (stride 1) output 64, 64 and 256 channels respectively; four blocks with output channels 128, 128 and 512; six blocks with output channels 256, 256 and 1024; and three blocks with output channels 512, 512 and 2048;
after the ASPP module with modified dilation rates, the five parallel sub-modules are respectively:
a 1 × 1 convolution with stride 1 and 256 output channels;
a 3 × 3 convolution with stride 1, dilation rate 3 and 256 output channels;
a 3 × 3 convolution with stride 1, dilation rate 6 and 256 output channels;
a 3 × 3 convolution with stride 1, dilation rate 9 and 256 output channels;
the last layer is global average pooling with 256 output channels. A larger dilation rate segments large objects better, but is a disadvantage for small objects. Moreover, a high dilation rate samples the input sparsely, so the information gathered by the long-distance convolution is uncorrelated, which harms the classification result. For remote sensing images, dilation rates such as 1, 6, 12 and 18 are too large: the large receptive field is unfavourable for segmenting the tiny objects in the image. How to adjust the dilation rates in the ASPP module to handle the relation between large and small objects is therefore the key to designing the dilated convolution network, so dilated convolutions with rates 1, 3, 6 and 9 are used respectively. The modified rates reduce the receptive field of the ASPP module to a certain extent and balance the sensitivity of the network to large and small objects. At the same time, the reduced rates make the sampled input signal dense, avoiding the convolution failure caused by an excessive dilation rate. The network can thus obtain a finer segmentation result for small objects.
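To make the dilation-rate discussion concrete, here is a single-channel NumPy sketch of the ASPP branch layout. The patent's module uses learned 256-channel convolutions; the uniform kernel below is only a stand-in to show the structure. A 3 × 3 convolution with dilation rate r covers a (2r + 1) × (2r + 1) window, so rate 9 sees 19 × 19 pixels where rate 18 would see 37 × 37.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """'Same'-padded 3 x 3 dilated convolution on a single-channel array."""
    k = kernel.shape[0]
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=np.float64)
    h, w = x.shape
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + w]
    return out

def aspp(x, rates=(3, 6, 9)):
    """Five parallel ASPP-style branches stacked on a new channel axis."""
    k3 = np.full((3, 3), 1.0 / 9.0)          # stand-in for a learned kernel
    branches = [x.astype(np.float64)]         # 1 x 1 convolution branch
    branches += [dilated_conv2d(x, k3, r) for r in rates]
    branches.append(np.full_like(x, x.mean(), dtype=np.float64))  # global avg pool
    return np.stack(branches)

feat = np.random.default_rng(1).normal(size=(64, 64))
out = aspp(feat)
print(out.shape)  # (5, 64, 64): five branches at the same spatial resolution
```

Because all five branches keep the input resolution, they can be concatenated along the channel axis, exactly as the following paragraph describes.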
Because the five sub-modules output feature maps of the same resolution, they are concatenated along the channel dimension into a 1280-channel feature, which a 1 × 1 convolution fuses down to 256 output channels; bilinear interpolation upsampling then restores the feature to 64 × 64 pixels; this is concatenated on the channel dimension with the feature map of the initial 7 × 7 convolution, giving a feature with 512 output channels. The feature map produced by the first 7 × 7 convolution of ResNet-50 has been max-pooled only once, so it retains higher resolution and more complete spatial position information; merging the post-ASPP feature map with it along the channel dimension builds a Decoder-like module, and the rich spatial position information in the lower layer gives the segmentation result a finer pixel-position recovery. Compared with the original network, the improved upsampling module adds only 256 + 256 × 3 × 3 × 2 = 4864 parameters, with negligible effect on the computational cost of the whole network.
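The feature-map sizes and the 4864-parameter figure above can be checked in a few lines. The padding values are assumptions (the patent does not state them), chosen to match the usual ResNet-50 stem.

```python
def conv_out(n, k, s, p):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * p - k) // s + 1

n = conv_out(256, 7, 2, 3)    # 7 x 7 conv, stride 2, assumed padding 3
print("after stem conv:", n)          # 128
n = conv_out(n, 3, 2, 1)      # 3 x 3 max pooling, stride 2, assumed padding 1
print("after max pooling:", n)        # 64: matches the 64 x 64 upsampling target

# parameter increase of the improved upsampling module, as stated in the text
extra = 256 + 256 * 3 * 3 * 2
print("extra parameters:", extra)     # 4864
```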
Finally, two 3 × 3 convolutions with stride 1 are applied, bilinear interpolation upsampling restores the image resolution to 256 × 256, and a 1 × 1 convolution changes the number of channels to 5, yielding the Logit.
Further, the specific method for evaluating the network training in the step 5 is as follows: the mean intersection-over-union (mIoU), the per-class IoU of buildings, vegetation, water systems and roads, and the pixel accuracy are taken as detection evaluation indexes. Since remote sensing semantic segmentation is a classification task, each prediction falls into one of four cases: true positive (TP), false positive (FP), true negative (TN) and false negative (FN). The IoU of a class is the ratio of the intersection to the union of the ground-truth and predicted sets, i.e. IoU = TP / (TP + FP + FN), and the mean over all classes is

mIoU = (1 / (k + 1)) · Σ_{i=0}^{k} [ p_{ii} / ( Σ_{j=0}^{k} p_{ij} + Σ_{j=0}^{k} p_{ji} − p_{ii} ) ]

where k + 1 is the number of categories including the background class, p_{ii} is the number of correctly predicted pixels, and p_{ij} and p_{ji} denote numbers of falsely detected pixels. mIoU considers all classes: the IoU of each class is summed and averaged to obtain a global evaluation. The larger the mIoU, the better the network is trained and the closer its output is to the correct segmentation. The convergence of the network can also be judged from the change of the mIoU: the smaller the change, the closer the network is to convergence. The invention therefore uses the mIoU to monitor the training situation and to find the optimal set of weights.
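The evaluation above can be sketched with a confusion matrix; a NumPy illustration follows, where the class count and the small sample arrays are made up for the demo.

```python
import numpy as np

def evaluate(pred, gt, num_classes):
    """mIoU and pixel accuracy from a confusion matrix.

    IoU_i = p_ii / (sum_j p_ij + sum_j p_ji - p_ii); mIoU averages over classes.
    """
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)   # rows: truth, cols: prediction
    inter = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=1) + cm.sum(axis=0) - inter
    iou = inter / np.maximum(union, 1)
    pixel_acc = inter.sum() / cm.sum()
    return iou.mean(), pixel_acc

gt   = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
miou, acc = evaluate(pred, gt, 3)
print(round(miou, 4), round(acc, 4))  # 0.7222 0.8333
```

In training, the weights of the epoch with the highest mIoU would be kept, matching the selection rule described in the text.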
Further, the specific method for reading the picture to be predicted into the network in the step 5 is as follows: cut 256 × 256-pixel pictures from the remote sensing image, moving from the upper-left corner rightwards and downwards, so that the first columns of two horizontally adjacent pictures are 256 pixels apart and the first rows of two vertically adjacent pictures are also 256 pixels apart; when a pre-cut picture at the edge of the remote sensing image is smaller than 256 × 256 pixels, cut 256 × 256 pixels in the opposite direction, taking that picture as reference. After the cut pictures have been predicted, splice them back according to the cutting rule to obtain the complete Logit score map of the remote sensing image.
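A sketch of this tiling rule (function name is illustrative, and images are assumed to be at least 256 pixels on each side): tiles are laid out at a 256-pixel stride, and an undersized edge tile is re-cut backwards so it still measures 256 × 256.

```python
def tile_origins(h, w, tile=256):
    """Top-left corners of the 256 x 256 prediction tiles for an h x w image."""
    ys = list(range(0, h - tile + 1, tile))
    xs = list(range(0, w - tile + 1, tile))
    if ys[-1] + tile < h:
        ys.append(h - tile)   # edge tile cut backwards from the bottom
    if xs[-1] + tile < w:
        xs.append(w - tile)   # edge tile cut backwards from the right
    return [(y, x) for y in ys for x in xs]

origins = tile_origins(600, 520)
print(len(origins), origins[-1])  # 9 tiles; the last starts at (344, 264)
```

Every tile lies fully inside the image, so edge tiles overlap their neighbours slightly; on splicing, later tiles simply overwrite the overlap.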
Further, the specific method in the step 6 of analysing the Logit score, giving each pixel its corresponding colour to represent a specific classification, and finally obtaining the segmentation result is as follows: the score map has 5 channels, and the 5 channels of each pixel hold the scores of the building, vegetation, water system, road and other classes respectively; the highest score gives the category of the current pixel. Create a zero matrix whose resolution is the original resolution of the test picture and whose number of channels is 3. Examine the class of each pixel in the score map: for a building the pixel value is [31,102,156]; for vegetation, [0,255,0]; for a water system, [255,0,0]; for a road, [192,192,192]; for the other class, [255,255,255]. Each pixel is coloured in this way, and exporting the resulting matrix gives the segmentation result. The segmentation result is thus well visualised and easy to read and understand.
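The colouring step can be sketched as a palette lookup. Note that the water-system and other-class colour values appear truncated in the source, so the [255,0,0] and [255,255,255] entries below are assumed reconstructions.

```python
import numpy as np

# class index -> RGB colour; water-system and other-class values are assumptions
PALETTE = np.array([
    [31, 102, 156],    # building
    [0, 255, 0],       # vegetation
    [255, 0, 0],       # water system (assumed reconstruction)
    [192, 192, 192],   # road
    [255, 255, 255],   # other (assumed reconstruction)
], dtype=np.uint8)

def colorize(logits):
    """Argmax over the 5 score channels, then paint each pixel's class colour."""
    labels = logits.argmax(axis=-1)   # (H, W) class indices
    return PALETTE[labels]            # (H, W, 3) uint8 segmentation image

scores = np.random.default_rng(0).normal(size=(256, 256, 5))
seg = colorize(scores)
print(seg.shape, seg.dtype)  # (256, 256, 3) uint8
```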
Compared with the prior art, the invention has the following advantages:
the invention provides an algorithm, which aims at optimizing the segmentation effect on small objects, and constructs a network model more suitable for the semantic segmentation of remote sensing images from changing a single up-sampling structure and reducing the overlarge receptive field of an ASPP (automatic sequence protocol) module, so that the problems of difficult segmentation, low segmentation precision and the like of the small objects are solved.
Mainstream semantic segmentation networks are usually evaluated on data sets such as MS-COCO, whose objects are large; in remote sensing images the segmentation targets are small, so these networks often segment small objects poorly. Aiming at the semantic segmentation of remote sensing images and the high difficulty of segmenting tiny objects in complex environments, the invention proposes an improved Deeplabv3 algorithm based on deep learning that modifies the upsampling module and adjusts the dilation rates of the ASPP module to construct a network model suitable for remote sensing image segmentation, thereby enhancing the ability to segment small objects in complex environments. It effectively solves the poor segmentation of small objects such as vegetation and buildings, and improves segmentation precision with a good segmentation effect.
Drawings
FIG. 1 is a data set annotation interface diagram;
FIG. 2 is a network layout of the present method;
FIG. 3 is a diagram of an ASPP module architecture;
FIG. 4 shows the mIoU convergence during network training;
FIG. 5 is an original image of a remote sensing image for testing;
FIG. 6 is the segmentation result of the present invention.
Detailed Description
Example 1
The invention relates to a remote sensing image semantic segmentation method based on deep learning, which is used for carrying out semantic segmentation on a high-precision image and comprises the following specific steps:
Step 1, labelling the data set: label the collected high-precision remote sensing images with the professional labelme software to obtain the corresponding mask images. Process the obtained masks and convert them into 8-bit grey-scale maps as the labels used by the training network.
Step 2, performing data enhancement on the labelling result obtained in the step 1: randomly crop the original remote sensing images and the labelled masks, each crop yielding a 256 × 256-pixel picture. Then rotate, flip, blur, Gaussian-filter, bilaterally filter and add white noise to each cropped picture to obtain the enhanced data set, as shown in the data set annotation interface diagram of fig. 1.
Wherein, before enhancement, part of the data is selected as the training set and the remainder as the test set. Gaussian filtering, bilateral filtering, etc. are prior art and are not described in detail here.
As shown in the network design diagram of fig. 2, the network structure is designed as follows: the backbone network consists of a 7 × 7 convolution with stride 2 and 64 output channels, followed by 3 × 3 max pooling with stride 2. Then come three bottleneck blocks whose 1 × 1, 3 × 3 and 1 × 1 convolutions (stride 1) output 64, 64 and 256 channels respectively; four blocks with output channels 128, 128 and 512; six blocks with output channels 256, 256 and 1024; and three blocks with output channels 512, 512 and 2048. After the ASPP module with modified dilation rates, the five parallel sub-modules are: a 1 × 1 convolution with stride 1 and 256 output channels; a 3 × 3 convolution with stride 1, dilation rate 3 and 256 output channels; a 3 × 3 convolution with stride 1, dilation rate 6 and 256 output channels; a 3 × 3 convolution with stride 1, dilation rate 9 and 256 output channels; and global average pooling with 256 output channels, as shown in the ASPP module structure diagram of fig. 3. Because the five sub-modules output feature maps of the same resolution, they are concatenated along the channel dimension into a 1280-channel feature, which a 1 × 1 convolution fuses down to 256 output channels. Bilinear interpolation upsampling then restores the feature to 64 × 64 pixels.
The feature is then concatenated on the channel dimension with the output of the initial 7 × 7 convolution, giving a feature with 512 output channels. Finally, two 3 × 3 convolutions with stride 1 are applied, bilinear interpolation upsampling restores the image resolution to 256 × 256, and a 1 × 1 convolution changes the number of channels to 5, yielding the Logit.
The network evaluation method is as follows: the mean intersection-over-union (mIoU), the per-class IoU of buildings, vegetation, water systems and roads, and the pixel accuracy are taken as detection evaluation indexes. The weights for which the mIoU reaches 94.92% and the pixel accuracy 98.01% are finally selected, as shown by the mIoU convergence during network training in fig. 4.
Step 5, input the high-precision remote sensing image for testing into the network for prediction: cut 256 × 256-pixel pictures from the remote sensing image, moving from the upper-left corner rightwards and downwards, so that the first columns of two horizontally adjacent pictures are 256 pixels apart and the first rows of two vertically adjacent pictures are also 256 pixels apart; when a pre-cut picture at the image edge is smaller than 256 × 256 pixels, cut 256 × 256 pixels in the opposite direction, taking that picture as reference. After the cut pictures have been predicted, splice them back according to the cutting rule to obtain the complete Logit score map of the remote sensing image.
Step 6, analyse the obtained Logit score and draw the final segmentation picture: the Logit has 5 channels, and the 5 channels of each pixel hold the scores of the building, vegetation, water system, road and other classes respectively; the highest score gives the category of the current pixel. Accordingly, create a zero matrix of size 256 × 256 with 3 channels. Examine the class of each pixel in the score map: for the other class the pixel value is [255,255,255]; for a building, [31,102,156]; for vegetation, [0,255,0]; for a water system, [255,0,0]; for a road, [192,192,192]; then draw the image segmentation result. Fig. 5 shows the original remote sensing image used for testing, and fig. 6 the segmentation result of the invention.
Aiming at the semantic segmentation of remote sensing images and the high difficulty of segmenting tiny objects in complex environments, the invention proposes an improved Deeplabv3 algorithm based on deep learning that modifies the upsampling module and adjusts the dilation rates of the ASPP module to construct a network model suitable for remote sensing image segmentation, thereby enhancing the ability to segment small objects in complex environments. The experimental results show that the proposed network model effectively solves the poor segmentation of small objects such as vegetation and buildings, improves the segmentation precision, and has a good segmentation effect.
TABLE 1 analysis of efficiency
As can be seen from Table 1, in terms of mean intersection-over-union (mIoU), the original Deeplabv3 algorithm reaches 91.15%, U-Net 87.95%, SegNet 86.88%, HR-Net 92.88% and DANet 95.16%, while the improved method of the invention reaches 94.92%, slightly lower than DANet but 3.77% and 2.04% higher than the original Deeplabv3 and HR-Net algorithms.
In terms of vegetation intersection-over-union (IoU), the original Deeplabv3 algorithm reaches 85.25%, the DANet algorithm 90.84% and the HR-Net algorithm 82.83%, while the method of the invention reaches 88.66%, improvements of 3.41% and 1.9% over the original Deeplabv3 and DANet algorithms.
In terms of building intersection-over-union (IoU), the original Deeplabv3 algorithm reaches 90.06%, the DANet algorithm 90.50% and the HR-Net algorithm 91.64%, while the improved method reaches 93.83%, which is 3.77% and 3.33% higher than the original Deeplabv3 and DANet algorithms.
In pixel accuracy, the improved method of the invention reaches 98.01%, which is 2.40%, 0.67% and 4.16% higher than the original Deeplabv3 algorithm, the DANet algorithm, and the SegNet algorithm with the worst segmentation effect, respectively.
Those skilled in the art will appreciate that the invention may be practised without these specific details. Although illustrative embodiments of the invention have been described above to help those skilled in the art understand it, the invention is not limited to the scope of these embodiments; to those skilled in the art, any change that makes use of the inventive concepts within the spirit and scope of the invention as defined and determined by the appended claims is protected.
Claims (6)
1. A remote sensing image semantic segmentation method based on deep learning, characterized by comprising the following steps:
step 1, marking collected remote sensing data by using a labelme tool to obtain a marking result;
step 2, performing data enhancement on the labeling result obtained in the step 1 to obtain a data set;
step 3, designing a network;
step 4, reading the data set in the step 2 into the design network in the step 3 for training;
step 5, selecting through evaluation the network weight trained in the step 4, reading it into the network, reading a picture to be predicted into the network, and calculating to obtain a Logit;
step 6, analyzing the Logit score, giving each pixel its corresponding color to represent a specific classification, and finally obtaining a segmentation result.
2. The remote sensing image semantic segmentation method based on deep learning of claim 1, characterized in that the specific method for data enhancement in the step 2 is as follows: randomly crop the original remote sensing image and the labelled mask, each crop yielding a 256 × 256-pixel picture; rotate, flip, blur, Gaussian-filter, bilaterally filter and add white noise to each cropped picture to obtain the enhanced pictures, and then build the data set.
3. The remote sensing image semantic segmentation method based on deep learning of claim 2, characterized in that the specific method of designing the network in step 3 is as follows:
the backbone network is built from ResNet-50: it begins with a convolution whose kernel is 7 × 7, with stride 2 and 64 output channels, followed by 3 × 3 max pooling with stride 2;
then come three bottleneck blocks whose convolution kernels are 1 × 1, 3 × 3 and 1 × 1, all with stride 1 and with 64, 64 and 256 output channels respectively; four blocks with kernels 1 × 1, 3 × 3 and 1 × 1, stride 1 and 128, 128 and 512 output channels; six blocks with kernels 1 × 1, 3 × 3 and 1 × 1, stride 1 and 256, 256 and 1024 output channels; and three blocks with kernels 1 × 1, 3 × 3 and 1 × 1, stride 1 and 512, 512 and 2048 output channels;
this is followed by the ASPP module with modified void (dilation) rates, whose five parallel sub-modules are respectively:
a convolution with a 1 × 1 kernel, stride 1 and 256 output channels;
a convolution with a 3 × 3 kernel, stride 1, void rate 3 and 256 output channels;
a convolution with a 3 × 3 kernel, stride 1, void rate 6 and 256 output channels;
a convolution with a 3 × 3 kernel, stride 1, void rate 9 and 256 output channels;
the last branch is global average pooling with 256 output channels;
because the five sub-modules output feature maps of the same resolution, they are concatenated along the channel dimension to obtain a feature with 1280 channels, which a 1 × 1 convolution fuses down to 256 output channels; bilinear interpolation then upsamples this feature to 64 × 64 pixels; the result is concatenated along the channel dimension with the feature from the initial 7 × 7 convolution, giving a feature with 512 output channels; this passes through two 3 × 3 convolutions with stride 1; finally, bilinear interpolation upsamples the feature back to the 256 × 256 image resolution and a 1 × 1 convolution changes the channel number to 5, yielding the Logit.
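The void rates 3, 6 and 9 above enlarge the receptive field without adding parameters: a k × k kernel with void rate r samples input points spaced r apart, so it covers k + (k − 1)(r − 1) pixels per side. A small sketch of that arithmetic (illustrative, not part of the claim):

```python
def effective_kernel(kernel: int, rate: int) -> int:
    """Pixels per side covered by a dilated (atrous) convolution kernel.

    A kernel x kernel filter with void rate `rate` samples input points
    spaced `rate` apart, spanning kernel + (kernel - 1) * (rate - 1) pixels.
    """
    return kernel + (kernel - 1) * (rate - 1)

# The ASPP branches above: 3 x 3 kernels with void rates 3, 6 and 9
coverage = {rate: effective_kernel(3, rate) for rate in (3, 6, 9)}
```

So the three dilated branches see 7-, 13- and 19-pixel neighborhoods respectively, which is why combining them captures objects at several scales.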
4. The remote sensing image semantic segmentation method based on deep learning of claim 3, characterized in that the method for evaluating the network training in step 5 is as follows: the mean intersection over union (mIoU) of the building, vegetation, water-system and road classes and the pixel accuracy are taken as the detection evaluation indices; since the semantic segmentation of remote sensing images is a classification task, a prediction result falls into one of four cases: True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN); the IoU is the ratio of the intersection to the union of the set of true values and the set of predicted values, i.e.

IoU_i = p_ii / (Σ_j p_ij + Σ_j p_ji − p_ii),  mIoU = (1/(k+1)) · Σ_{i=0}^{k} IoU_i,

where k+1 is the number of categories including the background class, p_ii is the number of correctly predicted pixels, and p_ij and p_ji both denote numbers of falsely detected pixels; mIoU considers all classes: the IoU of each class is summed and averaged to obtain a global evaluation.
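The mIoU and pixel-accuracy indices above can be computed from a confusion matrix; a minimal NumPy sketch (function names are illustrative):

```python
import numpy as np

def iou_per_class(confusion):
    """Per-class IoU from a (k+1) x (k+1) confusion matrix.

    confusion[i, j] counts pixels of true class i predicted as class j,
    so IoU_i = p_ii / (sum_j p_ij + sum_j p_ji - p_ii).
    """
    confusion = np.asarray(confusion, dtype=float)
    diag = np.diag(confusion)
    union = confusion.sum(axis=1) + confusion.sum(axis=0) - diag
    return diag / np.maximum(union, 1e-12)   # guard against empty classes

def mean_iou(confusion):
    """mIoU: average the per-class IoU over all k+1 classes."""
    return iou_per_class(confusion).mean()

def pixel_accuracy(confusion):
    """Fraction of all pixels predicted correctly."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion).sum() / confusion.sum()
```

For example, for two classes with confusion matrix [[3, 1], [1, 3]], each class has IoU 3/5 = 0.6 and the pixel accuracy is 6/8 = 0.75.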
5. The remote sensing image semantic segmentation method based on deep learning of claim 4, characterized in that the specific method of reading the picture to be predicted into the network in step 5 is as follows: starting from the upper-left corner of the remote sensing image and proceeding left to right and top to bottom, crop a number of pictures of 256 × 256 pixels, such that the first columns of two adjacent pictures in the same row are 256 pixels apart and the first rows of two adjacent pictures in the same column are also 256 pixels apart; when a pre-cut picture at the edge of the remote sensing image would be smaller than 256 × 256 pixels, take that pre-cut picture as the reference and crop 256 × 256 pixels in the opposite direction; after the cropped pictures have been predicted, stitch them together according to the cropping rule, thereby obtaining the complete Logit score map of the remote sensing image.
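The cropping rule of claim 5 — fixed 256-pixel steps, with edge tiles shifted back so they still measure 256 × 256 — reduces to coordinate arithmetic. A sketch under the assumption that the image is at least 256 pixels on each side (function names are illustrative):

```python
def tile_origins(length, tile=256):
    """1-D crop origins: step `tile` pixels; if a strip shorter than `tile`
    remains at the edge, shift the last origin back so the final window ends
    exactly at the border ("crop in the opposite direction").
    Assumes length >= tile."""
    origins = list(range(0, length - tile + 1, tile))
    if origins[-1] + tile < length:          # leftover strip at the edge
        origins.append(length - tile)
    return origins

def tile_boxes(height, width, tile=256):
    """(top, left) corners of every tile x tile crop of an H x W image."""
    return [(top, left)
            for top in tile_origins(height, tile)
            for left in tile_origins(width, tile)]
```

For a 512 × 600 image this yields vertical origins (0, 256) and horizontal origins (0, 256, 344): the right-edge tiles overlap their neighbors, and stitching reuses the same coordinates.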
6. The remote sensing image semantic segmentation method based on deep learning of claim 5, characterized in that the specific method in step 6 of analyzing the Logit score, assigning each pixel the color of its class to represent the specific classification, and finally obtaining the segmentation result is as follows: the score map has 5 channels, whose values at each pixel are the scores of the building, vegetation, water-system, road and other classes respectively, and the highest score determines the category of the current pixel; create a zero matrix with the resolution of the original test image and 3 channels; for each pixel, judge the class given by the score map and assign the pixel value accordingly: building [31,102,156]; vegetation [0,255,0]; water system [255, 0]; road [192,192,192]; other [255,255]; color every pixel by this method, and the finally exported matrix is the segmentation result.
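The coloring step can be sketched as an argmax over the 5 score channels followed by a palette lookup. Two pixel values in the claim text are truncated ("[255, 0]" for water systems and "[255,255]" for other), so those two palette entries below are assumed placeholders, not values from the source:

```python
import numpy as np

# Class order from the claim: building, vegetation, water system, road, other.
PALETTE = np.array([
    [31, 102, 156],     # building (from the claim)
    [0, 255, 0],        # vegetation (from the claim)
    [0, 0, 255],        # water system -- placeholder, value truncated in source
    [192, 192, 192],    # road (from the claim)
    [255, 255, 255],    # other -- placeholder, value truncated in source
], dtype=np.uint8)

def colorize(logit):
    """Map an (H, W, 5) Logit score map to an (H, W, 3) color image:
    each pixel takes the palette color of its highest-scoring channel."""
    labels = np.argmax(logit, axis=-1)   # (H, W) winning class per pixel
    return PALETTE[labels]               # fancy indexing into the palette
```

Fancy indexing with the (H, W) label array broadcasts the 3-channel palette rows over the image, so no per-pixel loop is needed.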
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011359068.0A CN112489054A (en) | 2020-11-27 | 2020-11-27 | Remote sensing image semantic segmentation method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011359068.0A CN112489054A (en) | 2020-11-27 | 2020-11-27 | Remote sensing image semantic segmentation method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112489054A true CN112489054A (en) | 2021-03-12 |
Family
ID=74936403
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011359068.0A Pending CN112489054A (en) | 2020-11-27 | 2020-11-27 | Remote sensing image semantic segmentation method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112489054A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112801929A (en) * | 2021-04-09 | 2021-05-14 | 宝略科技(浙江)有限公司 | Local background semantic information enhancement method for building change detection |
CN113256649A (en) * | 2021-05-11 | 2021-08-13 | 国网安徽省电力有限公司经济技术研究院 | Remote sensing image station selection and line selection semantic segmentation method based on deep learning |
CN113496221A (en) * | 2021-09-08 | 2021-10-12 | 湖南大学 | Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering |
CN113537033A (en) * | 2021-07-12 | 2021-10-22 | 哈尔滨理工大学 | Building rubbish remote sensing image identification method based on deep learning |
CN113688956A (en) * | 2021-10-26 | 2021-11-23 | 西南石油大学 | Sandstone slice segmentation and identification method based on depth feature fusion network |
CN113837972A (en) * | 2021-10-14 | 2021-12-24 | 中铁十九局集团矿业投资有限公司 | Mining method based on multispectral remote sensing technology |
CN114494910A (en) * | 2022-04-18 | 2022-05-13 | 陕西自然资源勘测规划设计院有限公司 | Facility agricultural land multi-class identification and classification method based on remote sensing image |
CN114782406A (en) * | 2022-05-21 | 2022-07-22 | 上海贝特威自动化科技有限公司 | RESNEXT50 deep segmentation network-based automobile gluing visual detection method |
CN115222734A (en) * | 2022-09-20 | 2022-10-21 | 山东大学齐鲁医院 | Image analysis method and system for gastric mucosa intestinal metaplasia |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110119728A (en) * | 2019-05-23 | 2019-08-13 | 哈尔滨工业大学 | Remote sensing images cloud detection method of optic based on Multiscale Fusion semantic segmentation network |
CN110390251A (en) * | 2019-05-15 | 2019-10-29 | 上海海事大学 | A kind of pictograph semantic segmentation method based on the processing of multiple neural network Model Fusion |
CN111462124A (en) * | 2020-03-31 | 2020-07-28 | 武汉卓目科技有限公司 | Remote sensing satellite cloud detection method based on DeepLabV3+ |
CN113256649A (en) * | 2021-05-11 | 2021-08-13 | 国网安徽省电力有限公司经济技术研究院 | Remote sensing image station selection and line selection semantic segmentation method based on deep learning |
- 2020-11-27 CN CN202011359068.0A patent/CN112489054A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390251A (en) * | 2019-05-15 | 2019-10-29 | 上海海事大学 | A kind of pictograph semantic segmentation method based on the processing of multiple neural network Model Fusion |
CN110119728A (en) * | 2019-05-23 | 2019-08-13 | 哈尔滨工业大学 | Remote sensing images cloud detection method of optic based on Multiscale Fusion semantic segmentation network |
CN111462124A (en) * | 2020-03-31 | 2020-07-28 | 武汉卓目科技有限公司 | Remote sensing satellite cloud detection method based on DeepLabV3+ |
CN113256649A (en) * | 2021-05-11 | 2021-08-13 | 国网安徽省电力有限公司经济技术研究院 | Remote sensing image station selection and line selection semantic segmentation method based on deep learning |
Non-Patent Citations (3)
Title |
---|
LIANG-CHIEH CHEN et al.: "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation", ECCV 2018 *
XIONG Fengguang et al.: "Research on Improved Semantic Segmentation of Remote Sensing Images", HTTPS://KNS.CNKI.NET/KCMS/DETAIL/11.2127.TP.20210327.1608.010.HTML *
QING Chen et al.: "Research Progress on Image Semantic Segmentation Based on Deep Convolutional Neural Networks", Journal of Image and Graphics *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112801929A (en) * | 2021-04-09 | 2021-05-14 | 宝略科技(浙江)有限公司 | Local background semantic information enhancement method for building change detection |
CN113256649A (en) * | 2021-05-11 | 2021-08-13 | 国网安徽省电力有限公司经济技术研究院 | Remote sensing image station selection and line selection semantic segmentation method based on deep learning |
CN113256649B (en) * | 2021-05-11 | 2022-07-01 | 国网安徽省电力有限公司经济技术研究院 | Remote sensing image station selection and line selection semantic segmentation method based on deep learning |
CN113537033A (en) * | 2021-07-12 | 2021-10-22 | 哈尔滨理工大学 | Building rubbish remote sensing image identification method based on deep learning |
CN113496221A (en) * | 2021-09-08 | 2021-10-12 | 湖南大学 | Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering |
CN113496221B (en) * | 2021-09-08 | 2022-02-01 | 湖南大学 | Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering |
CN113837972A (en) * | 2021-10-14 | 2021-12-24 | 中铁十九局集团矿业投资有限公司 | Mining method based on multispectral remote sensing technology |
CN113688956A (en) * | 2021-10-26 | 2021-11-23 | 西南石油大学 | Sandstone slice segmentation and identification method based on depth feature fusion network |
CN114494910A (en) * | 2022-04-18 | 2022-05-13 | 陕西自然资源勘测规划设计院有限公司 | Facility agricultural land multi-class identification and classification method based on remote sensing image |
CN114782406A (en) * | 2022-05-21 | 2022-07-22 | 上海贝特威自动化科技有限公司 | RESNEXT50 deep segmentation network-based automobile gluing visual detection method |
CN115222734A (en) * | 2022-09-20 | 2022-10-21 | 山东大学齐鲁医院 | Image analysis method and system for gastric mucosa intestinal metaplasia |
CN115222734B (en) * | 2022-09-20 | 2023-01-17 | 山东大学齐鲁医院 | Image analysis method and system for gastric mucosa enteroepithelization |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112489054A (en) | Remote sensing image semantic segmentation method based on deep learning | |
CN109934200B (en) | RGB color remote sensing image cloud detection method and system based on improved M-Net | |
CN111986099B (en) | Tillage monitoring method and system based on convolutional neural network with residual error correction fused | |
CN110705457A (en) | Remote sensing image building change detection method | |
CN111797716A (en) | Single target tracking method based on Siamese network | |
CN111723693B (en) | Crowd counting method based on small sample learning | |
CN112084869B (en) | Compact quadrilateral representation-based building target detection method | |
CN113011329A (en) | Pyramid network based on multi-scale features and dense crowd counting method | |
CN111611861B (en) | Image change detection method based on multi-scale feature association | |
CN110929621B (en) | Road extraction method based on topology information refinement | |
CN111738113A (en) | Road extraction method of high-resolution remote sensing image based on double-attention machine system and semantic constraint | |
CN111160205A (en) | Embedded multi-class target end-to-end unified detection method for traffic scene | |
CN114821069A (en) | Building semantic segmentation method for double-branch network remote sensing image fused with rich scale features | |
CN115471467A (en) | High-resolution optical remote sensing image building change detection method | |
CN114299383A (en) | Remote sensing image target detection method based on integration of density map and attention mechanism | |
CN116246169A (en) | SAH-Unet-based high-resolution remote sensing image impervious surface extraction method | |
CN112149526A (en) | Lane line detection method and system based on long-distance information fusion | |
CN114519819B (en) | Remote sensing image target detection method based on global context awareness | |
CN116206112A (en) | Remote sensing image semantic segmentation method based on multi-scale feature fusion and SAM | |
CN116524189A (en) | High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization | |
CN116778318A (en) | Convolutional neural network remote sensing image road extraction model and method | |
CN114926826A (en) | Scene text detection system | |
CN114943902A (en) | Urban vegetation unmanned aerial vehicle remote sensing classification method based on multi-scale feature perception network | |
CN112801021B (en) | Method and system for detecting lane line based on multi-level semantic information | |
CN113920421A (en) | Fast-classification full convolution neural network model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210312 |
|