CN113505792A - Multi-scale semantic segmentation method and model for unbalanced remote sensing image - Google Patents

Publication number
CN113505792A
Authority
CN
China
Prior art keywords
layer
segmentation
network
semantic segmentation
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110739174.XA
Other languages
Chinese (zh)
Other versions
CN113505792B (en)
Inventor
聂婕
王成龙
魏志强
时津津
叶敏
陈昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202110739174.XA
Publication of CN113505792A
Application granted
Publication of CN113505792B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-scale semantic segmentation method and a multi-scale semantic segmentation model for unbalanced remote sensing images. The model adopts a multi-level semantic segmentation network that can learn fine-grained local features, retaining small-category information, and can learn global context semantic features, retaining large-scale information. The overall network architecture is divided into three levels; each level uses a different network structure to extract features at a different scale and outputs a segmentation map at a different resolution. The outputs are then post-fused at the same level by a Bayesian fusion method, so that the multi-scale segmentation information is fused and missing information is complemented. The method adopts an optimization algorithm that pushes pixels of different classes further apart and pulls pixels of the same class closer together, so that the semantic segmentation network model achieves balanced segmentation on class-imbalanced data.

Description

Multi-scale semantic segmentation method and model for unbalanced remote sensing image
Technical Field
The invention belongs to the technical field of remote sensing image processing, and in particular relates to a multi-scale semantic segmentation method and a multi-scale semantic segmentation model for unbalanced remote sensing images.
Background
With the development of earth observation technology and the improvement of image acquisition technology, remote sensing images provide massive research data for earth observation and discovery. Analyzing the content of remote sensing images with image processing and artificial intelligence techniques is an effective way to fully mine remote sensing data. The main approaches include scene classification, target recognition and semantic segmentation. Among them, semantic segmentation is one of the key techniques for content analysis of remote sensing images: by inferring the semantic category of each pixel, it segments the objects and regions contained in the image.
Current image semantic segmentation methods are based on deep learning, and classic deep learning semantic segmentation networks include the fully convolutional network (FCN), SegNet and U-Net. FCN was the first to realize an end-to-end segmentation network built from convolutional layers; it accepts input images of any size and outputs segmentation maps of the same size. However, it does not comprehensively combine context information and the correlation between pixels, so its segmentation accuracy is insufficient. U-Net concatenates features along the channel dimension and is suitable for segmenting multi-scale and large images, but it places high demands on the computing power of the device and is relatively slow. SegNet improves memory utilization and segmentation efficiency by reusing, in the decoding stage, the max-pooling indices stored in the encoding stage. However, when the low-resolution feature maps are pooled, information from neighboring pixels is ignored, resulting in a loss of accuracy. The pyramid scene parsing network (PSPNet) uses a pyramid pooling module to learn multi-level features and can make full use of global context information, but it does not fully utilize the information of the entire scene.
Because remote sensing images differ markedly from ordinary images in resolution, spatial structure and semantics, traditional methods and general-purpose neural network methods struggle to segment them efficiently. Semantic segmentation of remote sensing images still faces the following challenges:
Firstly, existing natural image segmentation methods deal with relatively balanced semantic category distributions and do not consider that, in a remote sensing image, a certain category may occupy a large proportion of the image. Because real physical entities are unevenly distributed over the earth's surface, the foreground and background of a remote sensing image are unbalanced, and objects of different categories differ greatly in scale. Secondly, deep learning segmentation models designed for natural images are insensitive to changes in object scale, and pixel-level accuracy is lost when they are applied directly to remote sensing images. Compared with categories such as land and lakes, the area occupied by categories such as vehicles in a remote sensing image is almost negligible, so there are large scale differences between objects. Therefore, existing semantic segmentation methods are not suitable for direct application to remote sensing images, and a segmentation algorithm needs to be designed for the characteristics of remote sensing images.
Due to the diversity of remote sensing image acquisition and the particularity of the data compared with natural images, previous semantic segmentation approaches cannot solve the problem with a single method. Remote sensing images exhibit multi-scale variation of objects: large-scale objects dominate segmentation while the learning of small-scale objects is suppressed, so small-category objects are difficult to identify. In addition, because of the high resolution of remote sensing images, the information they contain is usually very dense, which leads to an uneven distribution of image categories.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a multi-scale semantic segmentation method and a multi-scale semantic segmentation model for unbalanced remote sensing images, which solve the following technical problems: (1) the large difference in object scale within remote sensing images; and (2) the unbalanced category distribution of remote sensing images. For the first problem, the invention provides a multi-scale semantic segmentation model for unbalanced remote sensing images: a multi-level semantic segmentation network is designed that extracts features at different scales and fuses them at the same level, so that missing information is complemented. The model makes full use of global context information while retaining the local detail information of the image, overcomes the mutual influence of objects at multiple scales, and improves the robustness and accuracy of remote sensing image segmentation. For the second problem, the invention designs the algorithm from two aspects: 1) constructing an inter-class loss function to maximize the class margin between samples of different classes; and 2) constructing a class-weight balanced distribution loss function to solve the imbalance between positive and negative samples of each class.
In order to solve the above technical problems, the technical solution adopted by the present invention is described in detail as follows:
Firstly, the invention provides a multi-scale semantic segmentation model for unbalanced remote sensing images. The model adopts a multi-level semantic segmentation network that can learn fine-grained local features, retaining small-category information, and can learn global context semantic features, retaining large-scale information. The overall architecture of the multi-level semantic segmentation network is divided into three levels; each level uses a different network structure to extract features at a different scale and outputs a segmentation map at a different resolution. The outputs are post-fused at the same level by a Bayesian fusion method, which fuses the multi-scale segmentation information and complements missing information.
Further, the first level (Level 1) of the multi-level semantic segmentation network model uses data at the original resolution, the second level (Level 2) uses data downsampled by a factor of 2, and the third level (Level 3) uses data downsampled by a factor of 4;
the multi-level semantic segmentation network model uses the SegNet semantic segmentation network as its backbone. The encoder is on the left side of the network and consists of 5 convolution-pooling stages; each of the first two stages contains two convolutional layers, and each of the last three stages contains three convolutional layers;
the decoder is on the right side of the network and consists of 5 upsampling-convolution stages: the first and fourth stages each consist of an upsampling layer and two convolutional layers, the second and third stages each consist of an upsampling layer and three convolutional layers, and the fifth stage consists of an upsampling layer and two convolutional layers, followed by a final Softmax layer.
Furthermore, the stages of the encoder and the decoder correspond one to one. Each upsampling layer of the decoder corresponds to the max-pooling layer of the encoder at the same level and upsamples the feature map using the indices retained during max pooling. In this way the features learned for classification in the encoding stage are reproduced as dense feature maps, the feature maps are restored to the size of the original image, and the Softmax layer then classifies them to generate the final segmentation map.
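The final classification step can be illustrated with a minimal sketch. It assumes that the decoder produces one score per class for every pixel; the function names and the three example classes are hypothetical and are not taken from the patent.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over the class scores of one pixel."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def segment(score_map: List[List[float]]) -> List[int]:
    """Return the most probable class index for every pixel."""
    labels = []
    for scores in score_map:
        probs = softmax(scores)
        labels.append(probs.index(max(probs)))
    return labels


# Three pixels, three hypothetical classes (for example land, water, vehicle).
scores = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2], [-0.5, 0.0, 1.5]]
labels = segment(scores)
```

Each pixel receives the class whose Softmax probability is highest, which is how a dense feature map becomes a segmentation map.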
Further, the three levels of the multi-level semantic segmentation network output the segmentation maps $O_1$, $O_2$ and $O_3$. When the segmentation maps output by the three levels are post-fused, the segmentation map output by any one level is selected as the prior $O_i$, $i \in \{1,2,3\}$, and any other segmentation map $O_j$, $j \neq i$, $j \in \{1,2,3\}$, is used to compute the likelihood, so the posterior probability is calculated as:
[Formula image BDA0003140839110000031, not reproduced in this text]
where $n$ denotes the category of the current pixel and $m$ denotes the number of categories; $F_{ni}$ and $B_{ni}$ denote the foreground and background regions, respectively, for category $n$; $O_{ni}$, $i \in \{1,2,3\}$, serves as the prior, and $O_{nj}$, $j \neq i$, $j \in \{1,2,3\}$, is used to compute the likelihood. In each region, the likelihood is computed by comparing $O_{ni}$ and $O_{nj}$ on the foreground and background of each category.
Preferably, when the segmentation maps output by the three levels are post-fused, the segmentation maps output by the first two levels are fused first, and the result is then fused with the segmentation map output by the third level. The specific steps are as follows:
firstly, the segmentation map output by the first level is used as the prior and the segmentation map output by the second level is used to compute the likelihood, and the information of the two segmentation maps is combined based on the Bayesian formula;
then, the two are exchanged: the segmentation map output by the second level is used as the prior and the segmentation map output by the first level is used to compute the likelihood, and the two are again combined based on the Bayesian formula;
finally, the fused segmentation map of the first two levels and the segmentation map of the third level are fused in the same manner to obtain the final fused segmentation map.
Then, the invention also provides a multi-scale semantic segmentation method for unbalanced remote sensing images, which comprises the following steps:
I. Improving the semantic segmentation network architecture: constructing a multi-level semantic segmentation network model, outputting a segmentation map at a different scale from each level of the network, and fusing the multi-scale segmentation information by a Bayesian fusion method;
II. Balanced loss function: adopting an optimization algorithm that pushes pixels of different classes further apart and pulls pixels of the same class closer together, as follows:
1) constructing a class-weight balanced distribution loss function based on the Focal loss function to solve the imbalance between positive and negative samples of each class;
2) introducing the Hinge loss function to construct an inter-class loss function that maximizes the class margin between samples of different classes;
3) balanced loss function: constructing the overall loss function.
Further, the class-weight balanced distribution loss function is
[Formula image BDA0003140839110000041, not reproduced in this text]
where $p_t$ is the probability that the sample is positive for class $t$, $M$ is the number of sample classes, $t$ denotes a given class, $\gamma$ is a hyper-parameter, and $-\log(p_t)$ is the initial cross-entropy loss function; $\lambda$ is an adjustable hyper-parameter with $0 < \lambda < 1$, whose purpose is to make the classification accuracy of different samples more adjustable, reducing the penalty weight of complex samples and increasing the penalty contribution of simple samples.
Further, the inter-class loss function is
$$\mathrm{Hinge} = \max\left(0,\ 1 + \max(w_{wrong}) - w_{correct}\right) \tag{11}$$
where $w_{wrong}$ is the number of misclassified samples and $w_{correct}$ is the number of correctly classified samples; taking the maximum over $w_{wrong}$ selects the category with the most misclassified samples.
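Formula (11) can be evaluated directly, as in the following minimal sketch. It treats the misclassification counts as a plain list with one entry per wrong category; the function name and the example numbers are hypothetical.

```python
from typing import Sequence


def inter_class_hinge(w_wrong: Sequence[float], w_correct: float) -> float:
    """Hinge = max(0, 1 + max(w_wrong) - w_correct), as in formula (11)."""
    return max(0.0, 1.0 + max(w_wrong) - w_correct)


# The worst wrong category has 7 samples; 20 samples are classified correctly.
no_penalty = inter_class_hinge([3, 7, 2], 20)   # margin satisfied
# Only 5 samples are classified correctly, so the margin is violated.
penalty = inter_class_hinge([3, 7, 2], 5)
```

The penalty is zero once the correct count exceeds the largest wrong count by at least the margin of 1, and grows linearly otherwise.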
Further, the overall loss function is as follows
[Formula image BDA0003140839110000042, not reproduced in this text]
where $\beta$ is a hyper-parameter that controls the contribution of the Hinge loss penalty term, with $\beta > 0$.
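The formula itself is only available as an image, so the following sketch rests on an explicit assumption: the overall loss is taken to be the class-weight balanced loss plus $\beta$ times the Hinge penalty. This additive form is a guess consistent with the description of $\beta$, not the patent's confirmed formula, and the function name is hypothetical.

```python
def overall_loss(class_balanced_loss: float, hinge_penalty: float, beta: float) -> float:
    """Assumed form: class-balanced loss + beta * Hinge penalty, with beta > 0."""
    if beta <= 0:
        raise ValueError("beta must be greater than 0")
    return class_balanced_loss + beta * hinge_penalty


# A larger beta gives the inter-class penalty more influence on training.
small_beta = overall_loss(0.8, 3.0, beta=0.1)
large_beta = overall_loss(0.8, 3.0, beta=1.0)
```

Under this assumption, $\beta$ simply scales how strongly a margin violation is punished relative to the class-balanced term.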
Compared with the prior art, the invention has the advantages that:
(1) The invention provides a multi-level semantic segmentation network that extracts features at different scales and fuses them at the same level, so that missing information is complemented. The network can learn fine-grained local features, retaining small-category information, and can learn global context semantic features, making full use of global context information and retaining large-scale information. While retaining the local detail information of the image, it overcomes the mutual influence of objects at multiple scales and improves the robustness and accuracy of remote sensing image segmentation.
(2) The invention also provides a Bayesian multi-scale post-fusion semantic segmentation method. Aiming at the scale dependence of remote sensing images, a multi-scale post-fusion method is studied in which the results at different scales are modeled as the prior and the likelihood, respectively, and the Bayesian principle is used to make an optimal decision; the method has been verified to improve segmentation accuracy.
The Bayesian fusion method identifies the semantic information of objects better: the contours and category distribution of whole objects are clearer and the boundaries between objects of different categories are more distinct, which also improves the quality of the segmentation maps output by the network.
(3) Aiming at the unbalanced semantic distribution of remote sensing images, in particular the foreground-background imbalance caused by spatial differences, the invention designs a balanced loss function for semantic segmentation of unbalanced remote sensing images. It balances the loss weights of samples, reducing the weight of easy-to-learn classes and increasing the weight of hard-to-learn classes, and improves training stability. Meanwhile, the Hinge loss is introduced to enlarge the inter-class distance and make the boundaries between samples of different classes more distinct, thereby improving the accuracy of the segmentation results and the clarity of local classification.
The balanced loss algorithm of the invention enables the semantic segmentation network model to segment class-imbalanced data evenly.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of a multi-level semantic segmentation network architecture of the present invention;
FIG. 2 is the encoder-decoder structure of the first-level segmentation network of the present invention;
FIG. 3 is the encoder-decoder structure of the second-level segmentation network of the present invention;
FIG. 4 is the encoder-decoder structure of the third-level segmentation network of the present invention;
FIG. 5 is a schematic diagram of a Bayesian image fusion algorithm of the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
Example 1
This embodiment designs a multi-level deep neural network model, and in particular provides a multi-scale semantic segmentation model for unbalanced remote sensing images. The model adopts a multi-level semantic segmentation network that can learn fine-grained local features, retaining small-category information, and can learn global context semantic features, retaining large-scale information. The network architecture is shown in FIG. 1. The overall architecture of the multi-level semantic segmentation network is divided into three levels: the first level corresponds to Level 1 in FIG. 1 and uses data at the original resolution; the second level corresponds to Level 2 in FIG. 1 and uses data downsampled by a factor of 2; the third level corresponds to Level 3 in FIG. 1 and uses data downsampled by a factor of 4. The remote sensing image is thus downsampled twice to obtain more local and global information.
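The three input resolutions can be sketched as a small image pyramid. The patent does not state which downsampling filter is used; averaging each 2x2 block is an assumption made here for illustration, and the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def downsample_2x(img: Image) -> Image:
    """Halve height and width by averaging each non-overlapping 2x2 block."""
    h = len(img) // 2 * 2      # ignore an odd trailing row
    w = len(img[0]) // 2 * 2   # ignore an odd trailing column
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def build_levels(img: Image) -> List[Image]:
    """Return the inputs of Level 1 (original), Level 2 (1/2) and Level 3 (1/4)."""
    level1 = img
    level2 = downsample_2x(level1)
    level3 = downsample_2x(level2)
    return [level1, level2, level3]


# An 8x8 single-channel test image with pixel value 8 * row + column.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
levels = build_levels(image)
sizes = [(len(level), len(level[0])) for level in levels]
```

Each level then feeds its own encoder-decoder network, so small objects are seen at full detail in Level 1 while Level 3 covers more context per pixel.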
To handle multi-scale information, each level uses a different network structure to extract features at a different scale, preserving as much visual information as possible during feature extraction, and outputs a segmentation map at a different resolution. The outputs are then post-fused at the same level by a Bayesian fusion method, which fuses the multi-scale segmentation information, complements missing information, and achieves accurate segmentation of the remote sensing image.
Compared with other classical deep neural network segmentation models, the multi-level segmentation network model has the advantage of retaining both local detail information and global semantic information well.
1. The multi-level network architecture of the present embodiment is described in detail below:
The backbone network adopted in this embodiment is the SegNet semantic segmentation network. FIGS. 2, 3 and 4 show the backbone networks corresponding to Level 1, Level 2 and Level 3 in FIG. 1, respectively. The left side of the network is the encoder, which consists of 5 convolution-pooling stages; each of the first two stages contains two convolutional layers, and each of the last three stages contains three convolutional layers. The convolutional layers extract image features, and the pooling layers then reduce the size of the feature maps and enlarge the receptive field. The pooling layers use max pooling, which provides invariance to small spatial shifts but causes a loss of localization accuracy and spatial detail.
The decoder is on the right side of the network and consists of 5 upsampling-convolution stages: the first and fourth stages each consist of an upsampling layer and two convolutional layers, the second and third stages each consist of an upsampling layer and three convolutional layers, and the fifth stage consists of an upsampling layer and two convolutional layers, followed by a final Softmax layer.
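The stage layout described above can be written down as a small configuration sketch. The convolution counts per stage come from the text; the 2x2 pooling with stride 2 used to compute feature-map sizes is an assumption (it is the usual SegNet setting but is not stated here), and all names are hypothetical.

```python
from typing import List, Tuple

# Convolutional layers per stage, as described in the text.
ENCODER_CONVS = [2, 2, 3, 3, 3]   # each stage is followed by max pooling
DECODER_CONVS = [2, 3, 3, 2, 2]   # each stage is preceded by upsampling


def encoder_feature_sizes(height: int, width: int) -> List[Tuple[int, int]]:
    """Feature-map size after each pooling stage, assuming 2x2 pooling, stride 2."""
    sizes = [(height, width)]
    for _ in ENCODER_CONVS:
        height, width = height // 2, width // 2
        sizes.append((height, width))
    return sizes


def decoder_layers() -> List[str]:
    """Flat list of decoder layers in order, ending with the Softmax layer."""
    layers = []
    for stage, n_conv in enumerate(DECODER_CONVS, start=1):
        layers.append(f"stage{stage}: upsample")
        layers.extend(f"stage{stage}: conv{k}" for k in range(1, n_conv + 1))
    layers.append("softmax")
    return layers


sizes = encoder_feature_sizes(256, 256)
layers = decoder_layers()
```

With a 256x256 input, five pooling stages reduce the feature map to 8x8 under this assumption, and the five upsampling stages restore it to 256x256.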
The stages of the encoder and the decoder correspond one to one, similar to the U-shaped structure of U-Net. Each upsampling layer of the decoder corresponds to the max-pooling layer of the encoder at the same level and upsamples the feature map using the indices retained during max pooling. The upsampling layers reproduce the features learned for classification in the encoding stage as dense feature maps, the feature maps are restored to the size of the original image, and the Softmax layer then classifies them to generate the final segmentation map. SegNet has few trainable parameters and a small memory footprint while maintaining segmentation accuracy, which makes it suitable for semantic segmentation of high-resolution remote sensing images.
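Index-based upsampling can be illustrated with a minimal sketch on a single-channel feature map. It assumes 2x2 max pooling with stride 2 and fills non-maximum positions with zeros, as SegNet-style unpooling does; the function names are hypothetical.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]
Index = Tuple[int, int]


def max_pool_2x2_with_indices(fmap: FeatureMap) -> Tuple[FeatureMap, List[List[Index]]]:
    """2x2 max pooling that also records where each maximum came from."""
    pooled, indices = [], []
    for r in range(0, len(fmap) - 1, 2):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]) - 1, 2):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled: FeatureMap, indices: List[List[Index]],
                   height: int, width: int) -> FeatureMap:
    """Place each pooled value back at its recorded position; the rest stays zero."""
    restored = [[0.0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            restored[r][c] = value
    return restored


fmap = [[1.0, 3.0, 2.0, 0.0],
        [4.0, 2.0, 1.0, 5.0],
        [0.0, 1.0, 7.0, 2.0],
        [6.0, 2.0, 3.0, 1.0]]
pooled, indices = max_pool_2x2_with_indices(fmap)
restored = max_unpool_2x2(pooled, indices, 4, 4)
```

The restored map is sparse; in the decoder the convolutional layers that follow each upsampling layer turn it into a dense feature map.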
2. The image post-fusion method of the present embodiment is described in detail below:
The multi-scale network outputs segmentation maps at different resolutions, and the segmentation quality differs with resolution. This patent proposes a multi-scale post-fusion semantic segmentation method based on the Bayesian principle. In saliency detection tasks, the posterior probability is computed by integrating saliency maps:
$$p(F_i \mid S_j(z)) = \frac{S_i(z)\, p(S_j(z) \mid F_i)}{S_i(z)\, p(S_j(z) \mid F_i) + (1 - S_i(z))\, p(S_j(z) \mid B_i)} \tag{1}$$
where $S_1$ and $S_2$ are both saliency maps; one of them, $S_i$ ($i \in \{1,2\}$), serves as the prior probability, and the other, $S_j$ ($j \neq i$, $j \in \{1,2\}$), is used to compute the likelihood. $F_i$ and $B_i$ denote the foreground and background regions, respectively, and the likelihood in each region is computed as:
$$p(S_j(z) \mid F_i) = \frac{N_{bF_i(S_j(z))}}{N_{F_i}}, \qquad p(S_j(z) \mid B_i) = \frac{N_{bB_i(S_j(z))}}{N_{B_i}} \tag{2}$$
where $N_{F_i}$ is the number of pixels in the foreground, and $N_{bF_i(S_j(z))}$ is the number of pixels whose color features fall into the foreground bin $bF_i(S_j(z))$ that contains the feature $S_j(z)$; likewise, $N_{B_i}$ is the number of pixels in the background, and $N_{bB_i(S_j(z))}$ is the number of pixels whose color features fall into the background bin $bB_i(S_j(z))$ that contains the feature $S_j(z)$.
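As a worked illustration of this posterior update, assume the usual Bayes-rule form in which the prior map value $S_i(z)$ weights the foreground likelihood and $1 - S_i(z)$ weights the background likelihood. The pixel counts below are invented purely for the example: the prior at pixel $z$ is 0.8, 30 of 50 foreground pixels and 20 of 100 background pixels share the bin of $S_j(z)$.

```latex
\begin{aligned}
p(S_j(z) \mid F_i) &= \frac{30}{50} = 0.6, \qquad
p(S_j(z) \mid B_i) = \frac{20}{100} = 0.2 \\[4pt]
p(F_i \mid S_j(z)) &= \frac{0.8 \times 0.6}{0.8 \times 0.6 + (1 - 0.8) \times 0.2}
                    = \frac{0.48}{0.52} \approx 0.923
\end{aligned}
```

Because the second map agrees that the pixel looks like foreground, the posterior (about 0.92) is higher than the prior (0.8).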
The three levels of the multi-level semantic segmentation network output the segmentation maps $O_1$, $O_2$ and $O_3$. When the segmentation maps output by the three levels are post-fused, the segmentation map output by any one level is selected as the prior $O_i$ ($i \in \{1,2,3\}$), and any other segmentation map $O_j$ ($j \neq i$, $j \in \{1,2,3\}$) is used to compute the likelihood, so the posterior probability is computed as:
[Formula (3): image BDA0003140839110000079, not reproduced in this text]
where $n$ denotes the category of the current pixel and $m$ denotes the number of categories; $F_{ni}$ and $B_{ni}$ denote the foreground and background regions, respectively, for category $n$; $O_{ni}$ ($i \in \{1,2,3\}$) serves as the prior, and $O_{nj}$ ($j \neq i$, $j \in \{1,2,3\}$) is used to compute the likelihood. In each region, the likelihood is computed by comparing $O_{ni}$ and $O_{nj}$ on the foreground and background of each category:
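$$p(O_{nj}(z) \mid F_{ni}) = \frac{N_{bF_{ni}(O_{nj}(z))}}{N_{F_{ni}}} \tag{4}$$
$$p(O_{nj}(z) \mid B_{ni}) = \frac{N_{bB_{ni}(O_{nj}(z))}}{N_{B_{ni}}} \tag{5}$$
where $N_{F_{ni}}$ is the number of foreground pixels of the $n$-th category, and $N_{bF_{ni}(O_{nj}(z))}$ is the number of pixels in the foreground region whose color features fall into the bin containing the feature $O_{nj}(z)$; $N_{B_{ni}}$ and $N_{bB_{ni}(O_{nj}(z))}$ are defined in the same way for the background region. The posterior probability with $O_{nj}$ as the prior is computed in the same way.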
As a preferred embodiment, when the segmentation maps output by the three levels are post-fused, the segmentation maps output by the first two levels are fused first; the resulting segmentation map is then fused with the segmentation map output by the third level, as shown in formulas (6) and (7):
$$O_{n4}(z) = O_B(O_{n1}(z), O_{n2}(z)) = p(F_{n1} \mid O_{n2}(z)) + p(F_{n2} \mid O_{n1}(z)) \tag{6}$$
$$O(z) = O_B(O_{n3}(z), O_{n4}(z)) = p(F_{n3} \mid O_{n4}(z)) + p(F_{n4} \mid O_{n3}(z)) \tag{7}$$
As shown in FIG. 5, specifically:
firstly, the segmentation map output by the first level is used as the prior and the segmentation map output by the second level is used to compute the likelihood, and the information of the two segmentation maps is combined based on the Bayesian formula;
then, the two are exchanged: the segmentation map output by the second level is used as the prior and the segmentation map output by the first level is used to compute the likelihood, and the two are again combined based on the Bayesian formula;
finally, the fused segmentation map of the first two levels and the segmentation map of the third level are fused in the same manner to obtain the final fused segmentation map.
With Bayesian fusion, the different output segmentation maps are used in turn as priors, so that the useful information in segmentation maps of different resolutions is fused and the accuracy of image segmentation is improved.
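The two-stage fusion of formulas (6) and (7) can be sketched as follows. Several points are assumptions rather than the patent's stated method: the three maps are taken to be per-class probability maps already resampled to a common resolution and flattened over pixels; the per-class posterior uses the Bayes-rule form of the saliency case; the foreground of a prior map is defined by thresholding at its mean; the likelihood histogram is built over the probability values of the other map rather than over color features; and the sum in formula (6) is halved so that it can serve as a prior in formula (7). All function names are hypothetical.

```python
from typing import List

ClassMap = List[float]   # O_n(z) for one class n, flattened over pixels z


def posterior(prior: ClassMap, other: ClassMap, n_bins: int = 4) -> ClassMap:
    """p(F_ni | O_nj(z)) with `prior` as O_ni and `other` as O_nj."""
    threshold = sum(prior) / len(prior)            # assumed foreground rule
    is_fg = [p >= threshold for p in prior]
    bins = [min(int(v * n_bins), n_bins - 1) for v in other]
    fg_hist, bg_hist = [0] * n_bins, [0] * n_bins
    for fg, b in zip(is_fg, bins):
        if fg:
            fg_hist[b] += 1
        else:
            bg_hist[b] += 1
    n_fg, n_bg = max(sum(fg_hist), 1), max(sum(bg_hist), 1)
    result = []
    for p, b in zip(prior, bins):
        numerator = p * fg_hist[b] / n_fg          # prior * foreground likelihood
        denominator = numerator + (1.0 - p) * bg_hist[b] / n_bg
        result.append(numerator / denominator if denominator > 0 else 0.0)
    return result


def fuse_pair(a: ClassMap, b: ClassMap) -> ClassMap:
    """O_B(a, b): sum of both posteriors, halved (assumption) to stay in [0, 1]."""
    return [(x + y) / 2.0 for x, y in zip(posterior(a, b), posterior(b, a))]


def fuse_three(o1: List[ClassMap], o2: List[ClassMap], o3: List[ClassMap]) -> List[ClassMap]:
    """Fuse Levels 1 and 2 first (formula 6), then the result with Level 3 (formula 7)."""
    fused = []
    for c1, c2, c3 in zip(o1, o2, o3):
        o4 = fuse_pair(c1, c2)
        fused.append(fuse_pair(c3, o4))
    return fused


def label_map(fused: List[ClassMap]) -> List[int]:
    """Assign every pixel the class with the highest fused score."""
    n_pixels = len(fused[0])
    return [max(range(len(fused)), key=lambda n: fused[n][z]) for z in range(n_pixels)]


# Two classes, four pixels; each outer list holds one map per class.
o1 = [[0.9, 0.8, 0.2, 0.1], [0.1, 0.2, 0.8, 0.9]]
o2 = [[0.8, 0.6, 0.4, 0.2], [0.2, 0.4, 0.6, 0.8]]
o3 = [[0.7, 0.7, 0.3, 0.1], [0.3, 0.3, 0.7, 0.9]]
fused = fuse_three(o1, o2, o3)
labels = label_map(fused)
```

Swapping the roles of prior and likelihood inside `fuse_pair` mirrors the exchange step described above, so neither level dominates the fused result.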
Example 2
This embodiment provides a multi-scale semantic segmentation method for unbalanced remote sensing images, which comprises the following steps:
I. Improved semantic segmentation network architecture
A multi-level semantic segmentation network model is constructed, each level of the network outputs a segmentation map at a different scale, and the multi-scale segmentation information is fused by a Bayesian fusion method.
The semantic segmentation network may be a classical semantic segmentation network. As a preferred embodiment, the multi-level semantic segmentation network model may directly adopt the model described in Example 1; refer to the description in Example 1 for details, which are not repeated here.
II. Balanced loss function
An optimization algorithm is adopted that pushes pixels of different classes further apart and pulls pixels of the same class closer together, as follows:
1) Construct a class-weight balanced distribution loss function based on the Focal loss function to solve the imbalance between positive and negative samples of each class.
2) Introduce the Hinge loss function to construct an inter-class loss function that maximizes the class margin between samples of different classes.
3) Balanced loss function: construct the overall loss function.
These are introduced separately below:
1. Class-weight balanced distribution
In multi-class problems, the sample classes of a data set are unevenly distributed: negative samples are too numerous and most of them are easy to distinguish, which often makes learning ineffective during training. This problem is particularly evident in the segmentation of remote sensing images. The Focal loss function is mainly intended to address the extreme imbalance between background and foreground in object detection scenarios. It improves on the cross-entropy loss (a known technique) by adding a modulation factor; the specific formula is as follows:
Figure BDA0003140839110000091
In this formula, p_t is the probability that the sample is positive for class t, M is the number of sample classes, t denotes a given class, and -log(p_t) is the original cross-entropy loss function; γ ≥ 0 is an adjustable hyper-parameter, and (1-p_t)^γ is the modulation factor. For easy-to-learn samples, p_t is close to 1 and the modulation factor tends toward zero. For difficult and misclassified samples, p_t is small and the modulation factor increases accordingly, which offsets the training inefficiency caused by the sample distribution. This property effectively mitigates the training failure caused by class imbalance in remote sensing images. Therefore, for the multi-class problem of remote sensing image segmentation, the loss function formula is adjusted as follows:
Figure BDA0003140839110000092
where λ is an adjustable hyper-parameter with 0<λ<1. Its purpose is to make the classification accuracy of different samples more adjustable: relative to the original formula (8), it further reduces the weight of the penalty term for complex samples and increases the weight of the penalty term for simple samples, so that samples are balanced consistently across classes. For example, when p_t is high, confidence in the sample's class is high, and with λ set to 1 the corresponding penalty weight is small, so the sample's contribution to training decreases. Likewise, when p_t is small, the sample is hard to classify and is a complex sample; with λ = 1 its training weight is large and its contribution to training is high. Setting 0<λ<1 reduces the penalty weight of complex samples and increases the penalty contribution of easy samples, which improves the classification accuracy of easy samples. An effective setting of λ can therefore strike a good balance between complex and easy samples and improve overall classification accuracy.
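The effect of the modulation factor can be checked numerically. The sketch below assumes the widely used per-sample focal loss form, -(1-p_t)^γ · log(p_t), as a stand-in for formula (8), which is available here only as an image; the sum over the M classes and the λ adjustment of formula (9) are not reproduced because their exact form cannot be read from the text. The names `focal_loss` and `cross_entropy` and the default γ = 2 are illustrative choices.

```python
import numpy as np

def cross_entropy(p_t, eps=1e-12):
    """Plain cross-entropy term for the true class: -log(p_t)."""
    return -np.log(np.clip(p_t, eps, 1.0))

def focal_loss(p_t, gamma=2.0, eps=1e-12):
    """Per-sample focal loss (assumed standard form): -(1 - p_t)**gamma * log(p_t)."""
    p_t = np.clip(p_t, eps, 1.0)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

easy, hard = 0.95, 0.10  # made-up probabilities for an easy and a hard sample

# Ratio of focal loss to cross-entropy equals the modulation factor (1 - p_t)**gamma.
ratio_easy = focal_loss(easy) / cross_entropy(easy)
ratio_hard = focal_loss(hard) / cross_entropy(hard)
```

With γ = 2 the easy sample keeps only about 0.25% of its cross-entropy loss, whereas the hard sample keeps about 81%, which is the down-weighting of easy samples that the paragraph above describes. Setting γ = 0 recovers plain cross-entropy.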
2. Balance between classes
Because adjacent samples in a remote sensing image differ only slightly, enlarging the difference between samples of different classes is another problem to be solved. This embodiment therefore introduces the Hinge Loss (HL) function. HL is typically used in the maximum-margin classification task of a Support Vector Machine (SVM); it reduces the intra-class distance and increases the inter-class distance to achieve a maximum margin. For binary classification, the formula is as follows:
Hinge = max(0, 1 - y * y_pre) (10)
In the above formula, y is the label of the real sample and can only take the value -1 or 1, and y_pre is the predicted value. When the absolute value of the predicted value is 1 or more, the distance between the sample and the boundary is 1 or more; no further reward is given in this case, because the sample is already likely to be classified correctly. The multi-class form is as follows:
Hinge = max(0, 1 + max w_wrong - w_correct) (11)
where w_wrong is the number of misclassified samples and w_correct is the number of correctly classified samples. Taking the maximum over w_wrong selects the class with the most misclassified samples. When there are many misclassified samples, formula (11) yields a large penalty term that drives training to continue; only when misclassified samples are few and correctly classified samples are many does the penalty term shrink automatically, so that training finishes sooner. In this way, samples within a class become more consistent, the margin between samples of different classes is enlarged, and classification accuracy improves.
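Formulas (10) and (11) are simple enough to evaluate directly. The sketch below implements them literally; the function names `hinge_binary` and `hinge_multiclass` are hypothetical, and `w_wrong` and `w_correct` are treated as plain numeric inputs, since the text does not specify how they are computed per pixel (in the widely used multi-class hinge loss these positions hold class scores).

```python
import numpy as np

def hinge_binary(y, y_pre):
    """Formula (10): max(0, 1 - y * y_pre), with y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * y_pre)

def hinge_multiclass(w_wrong, w_correct):
    """Formula (11): max(0, 1 + max(w_wrong) - w_correct)."""
    return max(0.0, 1.0 + float(np.max(w_wrong)) - float(w_correct))

# Binary case with made-up predictions.
confident_correct = hinge_binary(1, 2.0)   # margin already >= 1, so no penalty
weak_correct = hinge_binary(1, 0.5)        # right side of the boundary, small margin
wrong_side = hinge_binary(-1, 0.5)         # wrong side of the boundary

# Multi-class case with made-up values for the wrong classes.
no_penalty = hinge_multiclass([0.2, 1.5], 3.0)
penalty = hinge_multiclass([0.2, 1.5], 1.0)
```

The penalty is zero once the correct term exceeds the largest wrong term by at least 1, and grows linearly otherwise, which is the margin-enlarging behaviour described above.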
3. Balancing algorithm
Finally, to address class imbalance, enlarge the margin between samples of different classes, and improve segmentation accuracy, the overall loss function of the invention is as follows:
Figure BDA0003140839110000111
where β is a hyper-parameter controlling the contribution of the Hinge loss penalty term, with β > 0. Finally, the model parameters are optimized with the classical gradient descent method to obtain the optimal parameters, and the model is then tested.
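How β weights the two terms can be sketched as follows. Formula (12) is available here only as an image, so the additive form used below (class-balance loss plus β times the hinge term) is an assumption inferred from the description of β, not a transcription. The names `overall_loss` and `gradient_descent_step`, the default β, and the learning rate are likewise illustrative.

```python
import numpy as np

def overall_loss(class_balance_loss, hinge_loss, beta=0.5):
    """Assumed additive combination: class-balance loss + beta * hinge loss."""
    if beta <= 0:
        raise ValueError("beta must be greater than 0")
    return class_balance_loss + beta * hinge_loss

def gradient_descent_step(params, grads, learning_rate=0.01):
    """One classical gradient descent update: params <- params - lr * grads."""
    return params - learning_rate * grads

# Made-up loss values and gradients, only to show the arithmetic.
total = overall_loss(1.0, 2.0, beta=0.5)
params = gradient_descent_step(np.array([1.0, -2.0]),
                               np.array([0.5, -0.5]),
                               learning_rate=0.1)
```

A larger β makes the margin term dominate training, while a β close to zero reduces the objective to the class-balance term alone.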
In conclusion, the invention improves the semantic segmentation network structure and designs a loss function equalization method. It adopts a spatial multi-scale parallel post-fusion framework to capture scale differences, retaining good local detail while better preserving global semantic information. The loss function is designed around the imbalance of the pixel-level sample distribution: the loss weights of hard and easy training samples are balanced following the focal loss idea, which improves training stability, and hinge loss is introduced to enlarge the inter-class distance and improve semantic segmentation accuracy. Combining the equalization loss function with a classical semantic segmentation network improves the accuracy of the segmentation structure and the clarity of local information classification.
It is understood that the above description does not limit the present invention and that the present invention is not limited to the above examples; those skilled in the art will understand that various changes, modifications, additions, and substitutions can be made within the spirit and scope of the present invention.

Claims (9)

1. A multi-scale semantic segmentation model for unbalanced remote sensing images, characterized in that it adopts a multi-level semantic segmentation network that learns fine-grained local features and retains small-category information while also learning global context semantic features and retaining large-scale information; the overall architecture of the multi-level semantic segmentation network is divided into three levels, each level uses a different network structure to extract features at a different scale and outputs a segmentation image at a different resolution, and image-level post-fusion is performed using a Bayesian fusion method, so that multi-scale segmentation information is fused and missing information is complemented.
2. The multi-scale semantic segmentation model for unbalanced remote sensing images according to claim 1, characterized in that the first level (Level 1) of the multi-level semantic segmentation network model uses data at the original resolution, the second level (Level 2) uses data down-sampled by a factor of 2, and the third level (Level 3) uses data down-sampled by a factor of 4;
the multi-level semantic segmentation network model adopts the SegNet semantic segmentation network as its backbone network; the encoder is on the left side of the network and consists of 5 convolution-pooling stages, where each of the first two stages contains two convolution layers and each of the last three stages contains three convolution layers;
the decoder is on the right side of the network and consists of 5 up-sampling and convolution stages: the first and fourth stages on the right each comprise an up-sampling layer and two convolution layers, the second and third stages each comprise an up-sampling layer and three convolution layers, the fifth stage comprises an up-sampling layer and two convolution layers, and a Softmax layer is added at the end.
3. The multi-scale semantic segmentation model for unbalanced remote sensing images according to claim 2, characterized in that the layers of the encoding stage correspond one-to-one to the layers of the decoding stage; each up-sampling layer of the decoder corresponds to the max-pooling layer of the encoder at the same level and up-samples the feature map using the indices retained during max pooling, so that the features classified in the encoding stage are reproduced as dense feature maps; the feature maps are restored to the size of the original image and then classified by the softmax layer to produce the final segmentation map.
4. The multi-scale semantic segmentation model for unbalanced remote sensing images according to any one of claims 1 to 3, characterized in that the three levels of the multi-level semantic segmentation network output segmentation maps O_1, O_2 and O_3; when image post-fusion is performed on the segmentation maps output by the three levels, the segmentation map output by any one level is selected as the prior O_i, i = 1,2,3, and any other segmentation map O_j, j ≠ i, j = 1,2,3, is used to calculate the likelihood, so that the posterior probability is calculated as:
Figure FDA0003140839100000011
where n denotes the category of the current pixel and m denotes the number of categories; F_ni and B_ni denote the foreground region and the background region, respectively, when the category is n; O_ni serves as the prior, i = {1,2,3}, and O_nj is used to calculate the likelihood, j ≠ i, j = {1,2,3}; in each region, the likelihood is calculated by comparing O_ni and O_nj over the foreground and background of each class.
5. The multi-scale semantic segmentation model for unbalanced remote sensing images according to claim 4, characterized in that, when the segmentation maps output by the three levels undergo image post-fusion, the segmentation maps output by the first two levels are fused first, and the fused result is then fused with the segmentation map output by the third level, specifically as follows:
first, the segmentation map output by the first level is used as the prior and the segmentation map output by the second level is used to calculate the likelihood, and the information of the two segmentation maps is combined using the Bayesian formula;
then the two are exchanged: the segmentation map output by the second level is used as the prior and the segmentation map output by the first level is used to calculate the likelihood, and the two are again combined using the Bayesian formula;
finally, the fused segmentation map of the first two levels is fused with the segmentation map of the third level in the same manner to obtain the final integrated segmentation map.
6. A multi-scale semantic segmentation method for unbalanced remote sensing images, characterized by comprising the following steps:
I. improving the semantic segmentation network architecture: constructing a multi-level semantic segmentation network model, outputting a segmentation map of a different scale from each level of the network, and fusing the multi-scale segmentation information using a Bayesian fusion method;
II. equalization loss function: adopting an optimization algorithm that separates pixels of different classes further and aggregates pixels of the same class more tightly, as follows:
1) constructing a class-weight balanced distribution loss function based on the focal loss function (Focal loss) to address the imbalance between positive and negative samples of every class;
2) introducing the hinge loss function (Hinge loss) to construct an inter-class loss function that maximizes the margin between samples of different classes;
3) equalization loss function: constructing the overall loss function.
7. The multi-scale semantic segmentation method for unbalanced remote sensing images according to claim 6, characterized in that the class-weight balanced distribution loss function is
Figure FDA0003140839100000021
where p_t is the probability that the sample is positive for class t, M is the number of sample classes, t denotes a given class, γ is a hyper-parameter, and -log(p_t) is the original cross-entropy loss function; λ is an adjustable hyper-parameter with 0<λ<1, whose purpose is to make the classification accuracy of different samples more adjustable, reducing the penalty weight of complex samples and increasing the penalty contribution of easy samples.
8. The multi-scale semantic segmentation method for unbalanced remote sensing images according to claim 7, characterized in that the inter-class loss function is
Hinge = max(0, 1 + max w_wrong - w_correct) (11)
where w_wrong is the number of misclassified samples and w_correct is the number of correctly classified samples; taking the maximum over w_wrong selects the class with the most misclassified samples.
9. The multi-scale semantic segmentation method for unbalanced remote sensing images according to claim 8, characterized in that the overall loss function is as follows
Figure FDA0003140839100000031
where β is a hyper-parameter controlling the contribution of the Hinge loss penalty term, with β > 0.
CN202110739174.XA 2021-06-30 2021-06-30 Multi-scale semantic segmentation method and model for unbalanced remote sensing image Active CN113505792B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110739174.XA CN113505792B (en) 2021-06-30 2021-06-30 Multi-scale semantic segmentation method and model for unbalanced remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110739174.XA CN113505792B (en) 2021-06-30 2021-06-30 Multi-scale semantic segmentation method and model for unbalanced remote sensing image

Publications (2)

Publication Number Publication Date
CN113505792A (en) 2021-10-15
CN113505792B (en) 2023-10-27

Family

ID=78009460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110739174.XA Active CN113505792B (en) 2021-06-30 2021-06-30 Multi-scale semantic segmentation method and model for unbalanced remote sensing image

Country Status (1)

Country Link
CN (1) CN113505792B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241339A (en) * 2022-02-28 2022-03-25 山东力聚机器人科技股份有限公司 Remote sensing image recognition model, method and system, server and medium
CN114322793A (en) * 2022-03-16 2022-04-12 科大天工智能装备技术(天津)有限公司 Workpiece size measuring method and device based on global segmentation network and storage medium
CN115131307A (en) * 2022-06-23 2022-09-30 腾讯科技(深圳)有限公司 Article defect detection method and related device
CN115272681A (en) * 2022-09-22 2022-11-01 中国海洋大学 Ocean remote sensing image semantic segmentation method and system based on high-order feature class decoupling
CN115374859A (en) * 2022-08-24 2022-11-22 东北大学 Method for classifying unbalanced and multi-class complex industrial data
CN115953582A (en) * 2023-03-08 2023-04-11 中国海洋大学 Image semantic segmentation method and system
CN115984281A (en) * 2023-03-21 2023-04-18 中国海洋大学 Multi-task completion method of time sequence sea temperature image based on local specificity deepening
CN116434037A (en) * 2023-04-21 2023-07-14 大连理工大学 Multi-mode remote sensing target robust recognition method based on double-layer optimization learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN111797779A (en) * 2020-07-08 2020-10-20 兰州交通大学 Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion
WO2020233129A1 (en) * 2019-05-17 2020-11-26 深圳先进技术研究院 Image super-resolution and coloring method and system, and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
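韩铮 (Han Zheng); 肖志涛 (Xiao Zhitao): "Weakly supervised image semantic segmentation method based on texton forests and saliency prior", 电子与信息学报 (Journal of Electronics & Information Technology), no. 03, pages 106 - 113 *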

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241339A (en) * 2022-02-28 2022-03-25 山东力聚机器人科技股份有限公司 Remote sensing image recognition model, method and system, server and medium
CN114322793A (en) * 2022-03-16 2022-04-12 科大天工智能装备技术(天津)有限公司 Workpiece size measuring method and device based on global segmentation network and storage medium
CN114322793B (en) * 2022-03-16 2022-07-15 科大天工智能装备技术(天津)有限公司 Workpiece size measuring method and device based on global segmentation network and storage medium
CN115131307A (en) * 2022-06-23 2022-09-30 腾讯科技(深圳)有限公司 Article defect detection method and related device
CN115374859A (en) * 2022-08-24 2022-11-22 东北大学 Method for classifying unbalanced and multi-class complex industrial data
CN115272681A (en) * 2022-09-22 2022-11-01 中国海洋大学 Ocean remote sensing image semantic segmentation method and system based on high-order feature class decoupling
CN115272681B (en) * 2022-09-22 2022-12-20 中国海洋大学 Ocean remote sensing image semantic segmentation method and system based on high-order feature class decoupling
CN115953582A (en) * 2023-03-08 2023-04-11 中国海洋大学 Image semantic segmentation method and system
CN115984281A (en) * 2023-03-21 2023-04-18 中国海洋大学 Multi-task completion method of time sequence sea temperature image based on local specificity deepening
CN116434037A (en) * 2023-04-21 2023-07-14 大连理工大学 Multi-mode remote sensing target robust recognition method based on double-layer optimization learning
CN116434037B (en) * 2023-04-21 2023-09-22 大连理工大学 Multi-mode remote sensing target robust recognition method based on double-layer optimization learning

Also Published As

Publication number Publication date
CN113505792B (en) 2023-10-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant