CN112489054A - Remote sensing image semantic segmentation method based on deep learning - Google Patents

Remote sensing image semantic segmentation method based on deep learning Download PDF

Info

Publication number
CN112489054A
Authority
CN
China
Prior art keywords
convolution
remote sensing
multiplied
network
sensing image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011359068.0A
Other languages
Chinese (zh)
Inventor
熊风光
张鑫
刘欢乐
韩燮
况立群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North University of China
Original Assignee
North University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North University of China filed Critical North University of China
Priority to CN202011359068.0A
Publication of CN112489054A
Current legal status: Pending


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/10 — Segmentation; Edge detection
    • G06T7/11 — Region-based segmentation
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 — Geometric image transformations in the plane of the image
    • G06T3/40 — Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 — Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 — Image enhancement or restoration
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/90 — Determination of colour characteristics
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/20 — Special algorithmic details
    • G06T2207/20021 — Dividing image into blocks, subimages or windows

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image semantic segmentation method based on deep learning, belonging to the technical field of machine vision. Aiming at the difficulty that mainstream deep convolutional neural network semantic segmentation methods have in capturing the features of small objects, and at their insufficient segmentation precision, the method improves the Deeplabv3 algorithm: the single up-sampling layer is replaced by multi-layer up-sampling that reuses residual features from the backbone network, so that the image semantics remain complete at full resolution; at the same time, the dilation rates of the four dilated convolutions in the ASPP layer are modified, giving the network a better effect on small-object segmentation. The results show that the improved Deeplabv3 semantic segmentation algorithm achieves an mIoU of 94.92% and a pixel accuracy of 98.01% on a self-made data set, improvements of 3.77% and 2.40% respectively over the original algorithm; it has higher accuracy and better robustness in segmenting various terrains. The method is suitable for complex urban remote sensing image environments and can be applied well in urban planning, agricultural planning, military war simulation and other fields.

Description

Remote sensing image semantic segmentation method based on deep learning
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to a remote sensing image semantic segmentation method based on deep learning.
Background
With the continuous development of remote sensing technology, the semantic information contained in remote sensing images is increasingly rich. How to perform semantic segmentation on remote sensing images, quickly and accurately extract the important semantic information, and use it in later applications is therefore a very important research topic. Semantic segmentation of remote sensing images has a wide range of applications, including urban planning, geological disaster prevention and control, and military war simulation. In military war simulation in particular, the semantic information segmented from remote sensing images plays an extremely important role in the rapid generation of realistic battlefield terrain and the rapid construction of environments.
Semantic segmentation methods for remote sensing images are generally divided into two categories: traditional graphics algorithms and deep learning-based algorithms. Traditional segmentation algorithms include image segmentation based on edge detection, on thresholds, and on regions. Edge-detection-based segmentation simulates the human visual process: it separates image edges from the background and perceives image details, thereby recognizing object contours. The basic idea of threshold-based segmentation is to exploit the difference in gray-level characteristics between the target of interest and the background, using one or more thresholds to divide the gray levels of the image into several classes; pixels belonging to the same class are identified as the same object. Region-based segmentation selects, according to a criterion of consistent region attributes, a small region inside the target to be segmented as a seed region; on this basis the region membership of each pixel is determined, surrounding pixels are added according to a given criterion to form new seed regions, and this merging is repeated until all pixels with the specified characteristics form the region. Although these methods can segment a complete scene, their segmentation accuracy is far inferior to that of deep learning methods.
Disclosure of Invention
Aiming at the difficulty that mainstream deep convolutional neural network semantic segmentation methods have in obtaining the features of small objects, and at their insufficient segmentation precision, the invention provides a remote sensing image semantic segmentation method based on deep learning. The method is suitable for segmenting complex urban-surface remote sensing images and is intended for semantic segmentation in machine vision.
In order to achieve the purpose, the invention adopts the following technical scheme:
a remote sensing image semantic segmentation method based on deep learning comprises the following steps:
step 1, labeling the collected remote sensing data with the labelme tool to obtain labeling results;
step 2, performing data enhancement on the labeling results obtained in step 1 to obtain a data set;
step 3, designing a network;
step 4, reading the data set of step 2 into the network designed in step 3 for training;
step 5, selecting by evaluation the network weights trained in step 4 and reading them into the network, reading the picture to be predicted into the network, and computing the Logits;
and step 6, parsing the Logit scores, assigning each pixel the color corresponding to its specific class, and finally obtaining the segmentation result.
Further, the specific data enhancement method in step 2 is as follows: the original remote sensing images and the labeled masks are randomly cropped, each crop yielding a 256 × 256-pixel picture; each cropped picture is then rotated, flipped, blurred, Gaussian-filtered, bilateral-filtered and corrupted with white noise to obtain enhanced pictures, from which the data set is established.
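As an illustration only, the sketch below shows one way the described enhancement could be implemented with OpenCV and NumPy; the kernel sizes, noise amplitude and function names are assumptions, not taken from the patent.

```python
import cv2
import numpy as np

def random_crop_pair(image, mask, size=256):
    """Cut the same random 256 x 256 window from an image and its mask."""
    h, w = image.shape[:2]
    y = np.random.randint(0, h - size + 1)
    x = np.random.randint(0, w - size + 1)
    return image[y:y + size, x:x + size], mask[y:y + size, x:x + size]

def augment_pair(image, mask):
    """Apply the enhancement operations named in the text to one cropped pair.
    Geometric transforms (rotation, flipping) are applied to image and mask
    alike; photometric ones (blur, filtering, noise) only to the image."""
    pairs = [(image, mask)]
    for k in (1, 2, 3):  # 90/180/270 degree rotations
        pairs.append((np.rot90(image, k).copy(), np.rot90(mask, k).copy()))
    pairs.append((cv2.flip(image, 1), cv2.flip(mask, 1)))        # horizontal flip
    pairs.append((cv2.blur(image, (5, 5)), mask))                # mean blur
    pairs.append((cv2.GaussianBlur(image, (5, 5), 0), mask))     # Gaussian filtering
    pairs.append((cv2.bilateralFilter(image, 9, 75, 75), mask))  # bilateral filtering
    noise = np.random.normal(0, 10, image.shape)                 # additive white noise
    pairs.append((np.clip(image + noise, 0, 255).astype(np.uint8), mask))
    return pairs
```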
Further, the specific method for designing the network in step 3 is as follows:
the backbone network is formed by ResNet-50, beginning with a convolution with a 7 × 7 kernel, stride 2 and 64 output channels, followed by 3 × 3 max pooling with stride 2;
then come three bottleneck blocks whose 1 × 1, 3 × 3 and 1 × 1 convolution kernels have stride 1 and output channel numbers of 64, 64 and 256 respectively; four blocks with 1 × 1, 3 × 3, 1 × 1 kernels, stride 1 and output channels of 128, 128 and 512; six blocks with 1 × 1, 3 × 3, 1 × 1 kernels, stride 1 and output channels of 256, 256 and 1024; and three blocks with 1 × 1, 3 × 3, 1 × 1 kernels, stride 1 and output channels of 512, 512 and 2048;
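For illustration, a minimal PyTorch sketch of such a backbone built from torchvision's ResNet-50 is given below; the split into a low-level feature (pooled only once) and a high-level feature follows the decoder description later in this section, while everything else (module names, the absence of pretrained weights) is an assumption.

```python
import torch
import torchvision

class Backbone(torch.nn.Module):
    """ResNet-50 feature extractor: returns the low-level feature map after
    the first 7x7 convolution and single max pooling, plus the high-level
    output of the last bottleneck stage."""
    def __init__(self):
        super().__init__()
        r = torchvision.models.resnet50()
        self.stem = torch.nn.Sequential(r.conv1, r.bn1, r.relu)  # 7x7, stride 2, 64 channels
        self.pool = r.maxpool                                    # 3x3 max pool, stride 2
        self.stages = torch.nn.Sequential(r.layer1, r.layer2, r.layer3, r.layer4)

    def forward(self, x):
        low = self.pool(self.stem(x))   # 256x256 input -> 64x64x64, pooled only once
        high = self.stages(low)         # -> 2048-channel high-level feature
        return low, high
```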
the backbone is followed by the ASPP module with modified dilation rates, whose five parallel sub-modules are:
a convolution with a 1 × 1 kernel, stride 1 and 256 output channels;
a convolution with a 3 × 3 kernel, stride 1, dilation rate 3 and 256 output channels;
a convolution with a 3 × 3 kernel, stride 1, dilation rate 6 and 256 output channels;
a convolution with a 3 × 3 kernel, stride 1, dilation rate 9 and 256 output channels;
and, as the last layer, global average pooling with 256 output channels. A larger dilation rate segments large objects better but is disadvantageous for small objects. Moreover, when the dilation rate of a dilated convolution is high, the input signal is sampled sparsely, so that information obtained by long-distance convolution is uncorrelated, which affects the classification result. For remote sensing images, dilation rates such as 1, 6, 12 and 18 are too large: the resulting large receptive field is unfavorable for segmenting the tiny objects in remote sensing images. How to adjust the dilation rates in the ASPP module to balance large and small objects is therefore the key to designing the dilated convolution network, so dilated convolutions with rates of 1, 3, 6 and 9 are used. The modified rates reduce the receptive field of the ASPP module to a certain extent and balance the network's sensitivity to large and small objects. At the same time, the reduced rates make the sampled input signal dense and solve the problem of convolution failure caused by an excessive dilation rate. In the end the network obtains finer segmentation results for small objects. A sketch of the module follows.
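A minimal PyTorch sketch of the modified ASPP module, assuming a 2048-channel backbone output and omitting the batch normalization and activations a production implementation would likely include:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Five parallel branches with the modified dilation rates (1x1 conv,
    3x3 convs with rates 3/6/9, global average pooling), each producing 256
    channels, concatenated to 1280 channels and fused back to 256."""
    def __init__(self, in_ch=2048, out_ch=256):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_ch, 1, stride=1)               # 1x1 branch
        self.b2 = nn.Conv2d(in_ch, out_ch, 3, padding=3, dilation=3)  # rate 3
        self.b3 = nn.Conv2d(in_ch, out_ch, 3, padding=6, dilation=6)  # rate 6
        self.b4 = nn.Conv2d(in_ch, out_ch, 3, padding=9, dilation=9)  # rate 9
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_ch, out_ch, 1))  # global pooling branch
        self.project = nn.Conv2d(5 * out_ch, out_ch, 1)               # 1280 -> 256

    def forward(self, x):
        pooled = F.interpolate(self.image_pool(x), size=x.shape[2:],
                               mode='bilinear', align_corners=False)
        feats = torch.cat([self.b1(x), self.b2(x), self.b3(x),
                           self.b4(x), pooled], dim=1)
        return self.project(feats)
```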
Because the output resolutions of the five sub-modules are the same, they are concatenated along the channel dimension to obtain a 1280-channel feature, which is fused to 256 output channels by a 1 × 1 convolution. The feature is then restored to 64 × 64 pixels by bilinear interpolation up-sampling and concatenated along the channel dimension with the feature from the initial 7 × 7 convolution, giving a feature with 512 output channels. The feature map produced by the first-layer 7 × 7 convolution of ResNet-50 has undergone max pooling only once, so it has higher resolution and more complete spatial position information; merging the post-ASPP feature map with this first-layer feature map along the channel dimension constructs a module similar to a decoder, and the rich spatial position information in the lower layer gives the segmentation result finer pixel-position recovery. Compared with the original network, the improved up-sampling module adds only 256 + 256 × 3 × 3 × 2 = 4864 parameters and has little influence on the computational cost of the whole network.
Finally, the feature passes through two 3 × 3 convolutions with stride 1; bilinear interpolation up-sampling restores the resolution to 256 × 256 and a 1 × 1 convolution changes the number of channels to 5, yielding the Logits. A sketch of this up-sampling head follows.
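The sketch below (continuing the imports of the ASPP sketch above) assumes that the low-level feature is first widened to 256 channels so that the stated 512-channel concatenation holds, and that the two 3 × 3 convolutions keep 256 channels; neither detail is spelled out in the text.

```python
class SegHead(nn.Module):
    """Decoder-like up-sampling module: the ASPP output is up-sampled to
    64x64, fused with the low-level feature along the channel dimension,
    refined by two 3x3 convolutions, up-sampled to 256x256 and classified
    into 5 classes by a 1x1 convolution."""
    def __init__(self, low_ch=64, aspp_ch=256, num_classes=5):
        super().__init__()
        self.low_proj = nn.Conv2d(low_ch, 256, 1)   # assumed: widen low-level feature
        self.refine = nn.Sequential(
            nn.Conv2d(aspp_ch + 256, 256, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, stride=1, padding=1), nn.ReLU())
        self.classifier = nn.Conv2d(256, num_classes, 1)  # channels -> 5

    def forward(self, aspp_out, low):
        x = F.interpolate(aspp_out, size=low.shape[2:],    # restore to 64x64
                          mode='bilinear', align_corners=False)
        x = torch.cat([x, self.low_proj(low)], dim=1)      # 256 + 256 = 512 channels
        x = F.interpolate(self.refine(x), scale_factor=4,  # 64x64 -> 256x256
                          mode='bilinear', align_corners=False)
        return self.classifier(x)                          # per-pixel Logits
```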
Further, the specific method for evaluating the network training in step 5 is as follows: the mean intersection over union (mIoU), the per-class intersection over union (IoU) of buildings, vegetation, water systems and roads, and the pixel accuracy (PA) are used as detection evaluation indices. Since semantic segmentation of remote sensing images is a classification task, a prediction falls into one of four conditions: true positive (TP), false positive (FP), true negative (TN) or false negative (FN). IoU is the ratio of the intersection to the union of the sets of true values and predicted values, i.e.

$$IoU = \frac{TP}{TP + FP + FN},$$

and averaging over all classes gives

$$mIoU = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}},$$

where k + 1 is the number of categories including the background class, $p_{ii}$ is the number of correctly predicted pixels, and $p_{ij}$ and $p_{ji}$ both represent numbers of falsely detected pixels; mIoU considers all classes, adding the IoU of each class and averaging to obtain a global evaluation. The pixel accuracy is the fraction of correctly classified pixels,

$$PA = \frac{\sum_{i=0}^{k} p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k} p_{ij}}.$$

The network training is assessed with mIoU: the larger the mIoU value, the better the training effect and the closer the output is to the correct segmentation. The convergence of the network can also be judged from the change in mIoU; the smaller the change, the closer the network is to convergence. The invention therefore uses mIoU to monitor the training situation and to find a set of optimal weights.
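The mIoU computation can be sketched in a few lines of NumPy; the confusion-matrix approach below is a standard implementation consistent with the formula above, not code from the patent.

```python
import numpy as np

def mean_iou(pred, target, num_classes=5):
    """pred and target are non-negative integer label maps of equal shape;
    returns the per-class IoU and its mean (mIoU), following the formula above."""
    cm = np.bincount(num_classes * target.ravel() + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm)                              # p_ii: correctly predicted pixels
    union = cm.sum(axis=1) + cm.sum(axis=0) - tp  # sum_j p_ij + sum_j p_ji - p_ii
    iou = tp / np.maximum(union, 1)               # per-class IoU, avoiding division by zero
    return iou, iou.mean()
```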
Further, the specific method for reading the picture to be predicted into the network in step 5 is as follows: starting at the upper-left corner of the remote sensing image and moving rightwards and downwards, several 256 × 256-pixel pictures are cut; the first columns of two horizontally adjacent pictures are 256 pixels apart, and the first rows of two vertically adjacent pictures are likewise 256 pixels apart. When a pre-cut picture at the edge of the remote sensing image would be smaller than 256 × 256 pixels, a 256 × 256 window is instead cut in the opposite direction, taking the pre-cut position as reference. After the cut pictures have been predicted, they are stitched together according to the cutting rule, giving a complete Logit score map of the remote sensing image.
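The cutting-and-stitching rule can be sketched as follows (assuming the image is at least 256 pixels on each side); shifting an edge tile back inside the image implements the "cut in the opposite direction" rule, and overwriting overlapping scores is an assumption the text does not settle.

```python
import numpy as np

def predict_full_image(predict_tile, image, tile=256, num_classes=5):
    """Slide a 256 x 256 window over the image with stride 256 and stitch the
    per-tile Logit maps back into one full-resolution score map.
    predict_tile maps an HxWx3 tile to an HxWxnum_classes score map."""
    h, w = image.shape[:2]
    scores = np.zeros((h, w, num_classes), dtype=np.float32)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            y0, x0 = min(y, h - tile), min(x, w - tile)  # shift edge tiles inward
            patch = image[y0:y0 + tile, x0:x0 + tile]
            scores[y0:y0 + tile, x0:x0 + tile] = predict_tile(patch)
    return scores
```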
Further, the specific method in step 6 of parsing the Logit scores, assigning each pixel the color of its class and finally obtaining the segmentation result is as follows: the score map has 5 channels, whose values at each pixel are the scores of the building, vegetation, water-system, road and other classes respectively; the highest score determines the category of the current pixel. A zero matrix with the original resolution of the test image and 3 channels is created. For each pixel of the score map, if the class is building, the pixel value is [31, 102, 156]; if vegetation, [0, 255, 0]; if water system, [255, 0, 0]; if road, [192, 192, 192]; and if the other class, [255, 255, 255]. Each pixel is colored in this way, and exporting the resulting matrix gives the segmentation result. The segmentation result is thus well visualized and easy to read and understand.
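A sketch of the coloring step; the color triples are the ones listed above, while the mapping of channel indices to classes is an assumption.

```python
import numpy as np

# Class palette from the text; the channel order (which index is which class)
# is assumed: 0 building, 1 vegetation, 2 water system, 3 road, 4 other.
PALETTE = {0: (31, 102, 156), 1: (0, 255, 0), 2: (255, 0, 0),
           3: (192, 192, 192), 4: (255, 255, 255)}

def colorize(logits):
    """Argmax the 5-channel score map and paint each pixel with its class color."""
    labels = logits.argmax(axis=-1)                     # HxWx5 -> HxW class indices
    out = np.zeros((*labels.shape, 3), dtype=np.uint8)  # zero matrix, 3 channels
    for cls, color in PALETTE.items():
        out[labels == cls] = color
    return out
```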
Compared with the prior art, the invention has the following advantages:
the invention provides an algorithm, which aims at optimizing the segmentation effect on small objects, and constructs a network model more suitable for the semantic segmentation of remote sensing images from changing a single up-sampling structure and reducing the overlarge receptive field of an ASPP (automatic sequence protocol) module, so that the problems of difficult segmentation, low segmentation precision and the like of the small objects are solved.
Mainstream semantic segmentation networks are usually evaluated on data sets such as MS-COCO, whose objects are large; in remote sensing images the segmentation targets are small, so such networks often segment small objects poorly. Aiming at the semantic segmentation of remote sensing images and the difficulty of segmenting tiny objects in complex environments, the invention proposes a deep learning-based improvement of Deeplabv3 that modifies the up-sampling module and adjusts the dilation rates of the ASPP module to construct a network model suitable for remote sensing image segmentation, thereby strengthening the ability to segment small objects in complex environments. The poor ability to segment small objects such as vegetation and buildings is effectively resolved, the segmentation precision is improved, and a good segmentation effect is achieved.
Drawings
FIG. 1 is a data set annotation interface diagram;
FIG. 2 is a network layout of the present method;
FIG. 3 is a diagram of an ASPP module architecture;
FIG. 4 shows the mIoU convergence during network training;
FIG. 5 is an original image of a remote sensing image for testing;
FIG. 6 is the segmentation result of the present invention.
Detailed Description
Example 1
The invention relates to a remote sensing image semantic segmentation method based on deep learning, which is used for carrying out semantic segmentation on a high-precision image and comprises the following specific steps:
Step 1, labeling the data set: the collected high-precision remote sensing images are labeled with the professional labelme software to obtain the corresponding mask images. The obtained masks are processed and converted into 8-bit gray-scale maps as the labels used by the training network.
Step 2, data enhancement of the labeling results obtained in step 1: the original remote sensing images and the labeled masks are randomly cropped, each crop yielding a 256 × 256-pixel picture. Each cropped picture is then rotated, flipped, blurred, Gaussian-filtered, bilateral-filtered and corrupted with white noise to obtain the enhanced data set, as shown in the data set annotation interface diagram of FIG. 1.
Before enhancement, two thirds of the data are selected as the training set and one third as the test set. Gaussian filtering, bilateral filtering and the like are prior art and are not described in detail here.
Step 3, the training set enhanced in step 2 is read into the designed network for training: 66666 remote sensing images serve as the training set and 33333 as the test set; the batch size is 48, the learning rate is 2 × 10⁻⁴, and the weights are regularized by L2 normalization with a weight decay rate of 5 × 10⁻⁴. The mean intersection over union (mIoU) finally stabilized around 94.92%, and training stopped after 48000 iterations. A sketch of this configuration follows.
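The batch size, learning rate, L2 weight decay and iteration count in the sketch below come from the text; the optimizer (Adam) and the cross-entropy loss are assumptions the patent does not state.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, loader, max_steps=48000):
    """loader is assumed to yield batches of 48 images with per-pixel labels."""
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, weight_decay=5e-4)
    criterion = nn.CrossEntropyLoss()
    for step, (images, labels) in enumerate(loader):
        if step >= max_steps:
            break
        logits = model(images)             # N x 5 x 256 x 256 score maps
        loss = criterion(logits, labels)   # labels: N x 256 x 256 class indices
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```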
As shown in the network design diagram of FIG. 2, the network structure is designed as follows. The backbone network is ResNet-50: first a convolution with a 7 × 7 kernel, stride 2 and 64 output channels, then 3 × 3 max pooling with stride 2. The convolution is followed by three bottleneck blocks whose 1 × 1, 3 × 3 and 1 × 1 kernels have stride 1 and output channel numbers of 64, 64 and 256 respectively; four blocks with output channels of 128, 128 and 512; six blocks with output channels of 256, 256 and 1024; and three blocks with output channels of 512, 512 and 2048. Then comes the ASPP module with modified dilation rates, whose five parallel sub-modules are: a convolution with a 1 × 1 kernel, stride 1 and 256 output channels; a convolution with a 3 × 3 kernel, stride 1, dilation rate 3 and 256 output channels; the same with dilation rate 6; the same with dilation rate 9; and, as the last layer, global average pooling with 256 output channels, as shown in the ASPP module structure diagram of FIG. 3. Because the output resolutions of these 5 sub-modules are the same, they are concatenated along the channel dimension to obtain a 1280-channel feature, fused to 256 output channels by a 1 × 1 convolution. The feature is then restored to 64 × 64 pixels by bilinear interpolation up-sampling and concatenated along the channel dimension with the feature of the initial 7 × 7 convolution, giving 512 output channels. Finally, the feature passes through two 3 × 3 convolutions with stride 1; bilinear interpolation up-sampling restores the resolution to 256 × 256 and a 1 × 1 convolution changes the number of channels to 5, yielding the Logits.
The network evaluation method is as follows: the mean intersection over union (mIoU), the per-class intersection over union (IoU) of buildings, vegetation, water systems and roads, and the pixel accuracy (PA), as defined above, are used as detection evaluation indices. The finally selected weights reach an mIoU of 94.92% and a pixel accuracy of 98.01%, as shown by the mIoU convergence curve of the network training process in FIG. 4.
Step 5, the high-precision remote sensing image for testing is input into the network for prediction: starting at the upper-left corner of the remote sensing image and moving rightwards and downwards, several 256 × 256-pixel pictures are cut; the first columns of two horizontally adjacent pictures are 256 pixels apart, and the first rows of two vertically adjacent pictures are likewise 256 pixels apart. When a pre-cut picture at the edge of the remote sensing image would be smaller than 256 × 256 pixels, a 256 × 256 window is cut in the opposite direction, taking the pre-cut position as reference. After the cut pictures have been predicted, they are stitched together according to the cutting rule, giving a complete Logit score map of the remote sensing image.
Step 6, the obtained Logit scores are parsed and the final segmentation picture is drawn: the Logit map has 5 channels, whose values at each pixel are the scores of the building, vegetation, water-system, road and other classes respectively, and the highest score determines the category of the current pixel. A zero matrix of size 256 × 256 with 3 channels is therefore created. For each pixel of the score map, if the class is the other class, the pixel value is [255, 255, 255]; if building, [31, 102, 156]; if vegetation, [0, 255, 0]; if water system, [255, 0, 0]; if road, [192, 192, 192]; the image segmentation result is thus drawn. FIG. 5 shows the original remote sensing image used for testing and FIG. 6 the segmentation result of the invention.
Aiming at the semantic segmentation of remote sensing images and the difficulty of segmenting tiny objects in complex environments, the invention proposes a deep learning-based improvement of Deeplabv3 that modifies the up-sampling module and adjusts the dilation rates of the ASPP module to construct a network model suitable for remote sensing image segmentation, thereby strengthening the ability to segment small objects in complex environments. The experimental results show that the proposed network model effectively solves the poor segmentation of small objects such as vegetation and buildings, improves the segmentation precision and achieves a good segmentation effect.
TABLE 1 Efficiency analysis (values as reported in the comparison below; —: not reported)

Algorithm    mIoU (%)   Vegetation IoU (%)   Building IoU (%)   Pixel accuracy (%)
Deeplabv3    91.15      85.25                90.06              —
U-Net        87.95      —                    —                  —
SegNet       86.88      —                    —                  —
HR-Net       92.88      82.83                91.64              —
DANet        95.16      90.84                90.50              —
Ours         94.92      88.66                93.83              98.01
As can be seen from Table 1, in terms of mean intersection over union (mIoU), the original Deeplabv3 algorithm reaches 91.15%, U-Net 87.95%, SegNet 86.88%, HR-Net 92.88% and DANet 95.16%, while the improved method of the invention reaches 94.92%, slightly below DANet but 3.77% and 2.04% above the original Deeplabv3 and HR-Net algorithms respectively.
In terms of vegetation intersection over union (IoU), the original Deeplabv3 algorithm reaches 85.25%, DANet 90.84% and HR-Net 82.83%, while the method reaches 88.66%, which is 3.41% higher than the original Deeplabv3 algorithm and 5.83% higher than HR-Net, though below DANet.
In terms of building intersection over union (IoU), the original Deeplabv3 algorithm reaches 90.06%, DANet 90.50% and HR-Net 91.64%, while the improved method reaches 93.83%, 3.77% and 3.33% higher than the original Deeplabv3 and DANet algorithms respectively.
For pixel accuracy, the improved method of the invention reaches 98.01%, improvements of 2.40%, 0.67% and 4.16% over the original Deeplabv3 algorithm, the DANet algorithm and the SegNet algorithm with the worst segmentation effect, respectively.
Those skilled in the art will appreciate that the invention may be practiced without these specific details. Although illustrative embodiments of the invention have been described above to facilitate understanding by those skilled in the art, the invention is not limited to the scope of these embodiments; various changes will be apparent to those skilled in the art, and all matters utilizing the inventive concept are protected as long as they fall within the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A remote sensing image semantic segmentation method based on deep learning, characterized by comprising the following steps:
step 1, labeling the collected remote sensing data with the labelme tool to obtain labeling results;
step 2, performing data enhancement on the labeling results obtained in step 1 to obtain a data set;
step 3, designing a network;
step 4, reading the data set of step 2 into the network designed in step 3 for training;
step 5, selecting by evaluation the network weights trained in step 4 and reading them into the network, reading the picture to be predicted into the network, and computing the Logits;
and step 6, parsing the Logit scores, assigning each pixel the color corresponding to its specific class, and finally obtaining the segmentation result.
2. The remote sensing image semantic segmentation method based on deep learning of claim 1, characterized in that the specific data enhancement method in step 2 is as follows: the original remote sensing images and the labeled masks are randomly cropped, each crop yielding a 256 × 256-pixel picture; each cropped picture is rotated, flipped, blurred, Gaussian-filtered, bilateral-filtered and corrupted with white noise to obtain enhanced pictures, from which the data set is established.
3. The remote sensing image semantic segmentation method based on deep learning of claim 2, characterized in that the specific method for designing the network in step 3 is as follows:
the backbone network is formed by ResNet-50, beginning with a convolution with a 7 × 7 kernel, stride 2 and 64 output channels, followed by 3 × 3 max pooling with stride 2;
then come three bottleneck blocks whose 1 × 1, 3 × 3 and 1 × 1 convolution kernels have stride 1 and output channel numbers of 64, 64 and 256 respectively; four blocks with 1 × 1, 3 × 3, 1 × 1 kernels, stride 1 and output channels of 128, 128 and 512; six blocks with 1 × 1, 3 × 3, 1 × 1 kernels, stride 1 and output channels of 256, 256 and 1024; and three blocks with 1 × 1, 3 × 3, 1 × 1 kernels, stride 1 and output channels of 512, 512 and 2048;
the backbone is followed by the ASPP module with modified dilation rates, whose five parallel sub-modules are:
a convolution with a 1 × 1 kernel, stride 1 and 256 output channels;
a convolution with a 3 × 3 kernel, stride 1, dilation rate 3 and 256 output channels;
a convolution with a 3 × 3 kernel, stride 1, dilation rate 6 and 256 output channels;
a convolution with a 3 × 3 kernel, stride 1, dilation rate 9 and 256 output channels;
and, as the last layer, global average pooling with 256 output channels;
because the output resolutions of the 5 sub-modules are the same, they are concatenated along the channel dimension to obtain a 1280-channel feature, fused to 256 output channels by a 1 × 1 convolution; the feature is then restored to 64 × 64 pixels by bilinear interpolation up-sampling and concatenated along the channel dimension with the feature of the initial 7 × 7 convolution, giving 512 output channels; finally, the feature passes through two 3 × 3 convolutions with stride 1, bilinear interpolation up-sampling restores the resolution to 256 × 256, and a 1 × 1 convolution changes the number of channels to 5 to obtain the Logits.
4. The remote sensing image semantic segmentation method based on deep learning of claim 3, characterized in that the method for evaluating the network training in step 5 is as follows: the mean intersection over union (mIoU), the per-class intersection over union (IoU) of buildings, vegetation, water systems and roads, and the pixel accuracy (PA) are used as detection evaluation indices; since semantic segmentation of remote sensing images is a classification task, a prediction falls into one of four conditions: true positive (TP), false positive (FP), true negative (TN) or false negative (FN); IoU is the ratio of the intersection to the union of the sets of true values and predicted values, i.e.

$$IoU = \frac{TP}{TP + FP + FN}, \qquad mIoU = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}},$$

where k + 1 is the number of categories including the background class, $p_{ii}$ is the number of correctly predicted pixels, and $p_{ij}$ and $p_{ji}$ both represent numbers of falsely detected pixels; mIoU considers all classes, adding the IoU of each class and averaging to obtain a global evaluation.
5. The remote sensing image semantic segmentation method based on deep learning of claim 4, characterized in that the specific method of reading the picture to be predicted into the network in step 5 is as follows: starting at the upper-left corner of the remote sensing image and moving rightwards and downwards, several 256 × 256-pixel pictures are cut, with the first columns of two horizontally adjacent pictures 256 pixels apart and the first rows of two vertically adjacent pictures likewise 256 pixels apart; when a pre-cut picture at the edge of the remote sensing image would be smaller than 256 × 256 pixels, a 256 × 256 window is cut in the opposite direction, taking the pre-cut position as reference; after the cut pictures have been predicted, they are stitched together according to the cutting rule, giving a complete Logit score map of the remote sensing image.
6. The remote sensing image semantic segmentation method based on deep learning of claim 5, characterized in that the specific method in step 6 of parsing the Logit scores, assigning each pixel the color of its class and finally obtaining the segmentation result is as follows: the score map has 5 channels, whose values at each pixel are the scores of the building, vegetation, water-system, road and other classes respectively, and the highest score determines the category of the current pixel; a zero matrix with the original resolution of the test image and 3 channels is created; for each pixel of the score map, if the class is building, the pixel value is [31, 102, 156]; if vegetation, [0, 255, 0]; if water system, [255, 0, 0]; if road, [192, 192, 192]; and if the other class, [255, 255, 255]; each pixel is colored in this way, and exporting the resulting matrix gives the segmentation result.
CN202011359068.0A 2020-11-27 2020-11-27 Remote sensing image semantic segmentation method based on deep learning Pending CN112489054A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011359068.0A CN112489054A (en) 2020-11-27 2020-11-27 Remote sensing image semantic segmentation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011359068.0A CN112489054A (en) 2020-11-27 2020-11-27 Remote sensing image semantic segmentation method based on deep learning

Publications (1)

Publication Number Publication Date
CN112489054A — 2021-03-12

Family

ID=74936403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011359068.0A Pending CN112489054A (en) 2020-11-27 2020-11-27 Remote sensing image semantic segmentation method based on deep learning

Country Status (1)

Country Link
CN (1) CN112489054A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801929A (en) * 2021-04-09 2021-05-14 宝略科技(浙江)有限公司 Local background semantic information enhancement method for building change detection
CN113256649A (en) * 2021-05-11 2021-08-13 国网安徽省电力有限公司经济技术研究院 Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN113496221A (en) * 2021-09-08 2021-10-12 湖南大学 Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering
CN113537033A (en) * 2021-07-12 2021-10-22 哈尔滨理工大学 Building rubbish remote sensing image identification method based on deep learning
CN113688956A (en) * 2021-10-26 2021-11-23 西南石油大学 Sandstone slice segmentation and identification method based on depth feature fusion network
CN113837972A (en) * 2021-10-14 2021-12-24 中铁十九局集团矿业投资有限公司 Mining method based on multispectral remote sensing technology
CN114494910A (en) * 2022-04-18 2022-05-13 陕西自然资源勘测规划设计院有限公司 Facility agricultural land multi-class identification and classification method based on remote sensing image
CN114782406A (en) * 2022-05-21 2022-07-22 上海贝特威自动化科技有限公司 RESNEXT50 deep segmentation network-based automobile gluing visual detection method
CN115222734A (en) * 2022-09-20 2022-10-21 山东大学齐鲁医院 Image analysis method and system for gastric mucosa intestinal metaplasia

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110119728A (en) * 2019-05-23 2019-08-13 哈尔滨工业大学 Remote sensing images cloud detection method of optic based on Multiscale Fusion semantic segmentation network
CN110390251A (en) * 2019-05-15 2019-10-29 上海海事大学 A kind of pictograph semantic segmentation method based on the processing of multiple neural network Model Fusion
CN111462124A (en) * 2020-03-31 2020-07-28 武汉卓目科技有限公司 Remote sensing satellite cloud detection method based on Deep L abV3+
CN113256649A (en) * 2021-05-11 2021-08-13 国网安徽省电力有限公司经济技术研究院 Remote sensing image station selection and line selection semantic segmentation method based on deep learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390251A (en) * 2019-05-15 2019-10-29 上海海事大学 A kind of pictograph semantic segmentation method based on the processing of multiple neural network Model Fusion
CN110119728A (en) * 2019-05-23 2019-08-13 哈尔滨工业大学 Remote sensing images cloud detection method of optic based on Multiscale Fusion semantic segmentation network
CN111462124A (en) * 2020-03-31 2020-07-28 武汉卓目科技有限公司 Remote sensing satellite cloud detection method based on Deep L abV3+
CN113256649A (en) * 2021-05-11 2021-08-13 国网安徽省电力有限公司经济技术研究院 Remote sensing image station selection and line selection semantic segmentation method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIANG-CHIEH CHEN et al.: "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation", ECCV 2018 *
XIONG Fengguang et al.: "Research on Improved Semantic Segmentation of Remote Sensing Images", https://kns.cnki.net/kcms/detail/11.2127.TP.20210327.1608.010.html *
QING Chen et al.: "Research Progress on Image Semantic Segmentation with Deep Convolutional Neural Networks", Journal of Image and Graphics *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801929A (en) * 2021-04-09 2021-05-14 宝略科技(浙江)有限公司 Local background semantic information enhancement method for building change detection
CN113256649A (en) * 2021-05-11 2021-08-13 国网安徽省电力有限公司经济技术研究院 Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN113256649B (en) * 2021-05-11 2022-07-01 国网安徽省电力有限公司经济技术研究院 Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN113537033A (en) * 2021-07-12 2021-10-22 哈尔滨理工大学 Building rubbish remote sensing image identification method based on deep learning
CN113496221A (en) * 2021-09-08 2021-10-12 湖南大学 Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering
CN113496221B (en) * 2021-09-08 2022-02-01 湖南大学 Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering
CN113837972A (en) * 2021-10-14 2021-12-24 中铁十九局集团矿业投资有限公司 Mining method based on multispectral remote sensing technology
CN113688956A (en) * 2021-10-26 2021-11-23 西南石油大学 Sandstone slice segmentation and identification method based on depth feature fusion network
CN114494910A (en) * 2022-04-18 2022-05-13 陕西自然资源勘测规划设计院有限公司 Facility agricultural land multi-class identification and classification method based on remote sensing image
CN114782406A (en) * 2022-05-21 2022-07-22 上海贝特威自动化科技有限公司 RESNEXT50 deep segmentation network-based automobile gluing visual detection method
CN115222734A (en) * 2022-09-20 2022-10-21 山东大学齐鲁医院 Image analysis method and system for gastric mucosa intestinal metaplasia
CN115222734B (en) * 2022-09-20 2023-01-17 山东大学齐鲁医院 Image analysis method and system for gastric mucosa enteroepithelization

Similar Documents

Publication Publication Date Title
CN112489054A (en) Remote sensing image semantic segmentation method based on deep learning
CN109934200B (en) RGB color remote sensing image cloud detection method and system based on improved M-Net
CN111986099B (en) Tillage monitoring method and system based on convolutional neural network with residual error correction fused
CN110705457A (en) Remote sensing image building change detection method
CN111797716A (en) Single target tracking method based on Siamese network
CN111723693B (en) Crowd counting method based on small sample learning
CN112084869B (en) Compact quadrilateral representation-based building target detection method
CN113011329A (en) Pyramid network based on multi-scale features and dense crowd counting method
CN111611861B (en) Image change detection method based on multi-scale feature association
CN110929621B (en) Road extraction method based on topology information refinement
CN111738113A (en) Road extraction method of high-resolution remote sensing image based on double-attention machine system and semantic constraint
CN111160205A (en) Embedded multi-class target end-to-end unified detection method for traffic scene
CN114821069A (en) Building semantic segmentation method for double-branch network remote sensing image fused with rich scale features
CN115471467A (en) High-resolution optical remote sensing image building change detection method
CN114299383A (en) Remote sensing image target detection method based on integration of density map and attention mechanism
CN116246169A (en) SAH-Unet-based high-resolution remote sensing image impervious surface extraction method
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN116206112A (en) Remote sensing image semantic segmentation method based on multi-scale feature fusion and SAM
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN116778318A (en) Convolutional neural network remote sensing image road extraction model and method
CN114926826A (en) Scene text detection system
CN114943902A (en) Urban vegetation unmanned aerial vehicle remote sensing classification method based on multi-scale feature perception network
CN112801021B (en) Method and system for detecting lane line based on multi-level semantic information
CN113920421A (en) Fast-classification full convolution neural network model

Legal Events

Code    Description
PB01    Publication
SE01    Entry into force of request for substantive examination
RJ01    Rejection of invention patent application after publication (application publication date: 20210312)