CN113505792A

CN113505792A - Multi-scale semantic segmentation method and model for unbalanced remote sensing image

Info

Publication number: CN113505792A
Application number: CN202110739174.XA
Authority: CN
Inventors: 聂婕; 王成龙; 魏志强; 时津津; 叶敏; 陈昊
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2021-10-15
Anticipated expiration: 2041-06-30
Also published as: CN113505792B

Abstract

The invention discloses a multi-scale semantic segmentation method and model for unbalanced remote sensing images. The model adopts a multi-level semantic segmentation network that can learn fine-grained local features and retain small-category information while also learning global context semantic features and retaining large-scale information. The overall network architecture is divided into three levels; each level adopts a different network structure to extract features at a different scale and outputs a segmentation map at a different resolution. The features are then post-fused at the same level by a Bayesian fusion method, which fuses the multi-scale segmentation information and complements missing information. The method adopts an optimization algorithm that separates pixels of different classes more widely and aggregates pixels of the same class more tightly, so that the semantic segmentation network model can segment class-imbalanced data evenly.

Description

Multi-scale semantic segmentation method and model for unbalanced remote sensing image

Technical Field

The invention belongs to the technical field of remote sensing image processing, and in particular relates to a multi-scale semantic segmentation method and model for unbalanced remote sensing images.

Background

With the development of earth observation technology and the improvement of image acquisition technology, remote sensing images provide massive research data for earth observation and discovery. Analyzing the content of remote sensing images with image processing and artificial intelligence techniques is an effective way to fully mine remote sensing data. The main techniques include scene classification, target recognition and semantic segmentation. Among them, semantic segmentation is one of the important techniques for content analysis of remote sensing images: by inferring the semantic category of each pixel, it segments the objects and regions contained in the image.

At present, image semantic segmentation is mainly performed with deep learning methods. Classic deep learning semantic segmentation networks include the fully convolutional network (FCN), SegNet and U-Net. FCN was the first to realize an end-to-end segmentation network built from convolution layers; it accepts input images of any size and outputs segmentation maps of the same size. However, it does not comprehensively combine context information and the correlation between pixels, so its segmentation accuracy is insufficient. U-Net concatenates features along the channel dimension and is suitable for segmenting multi-scale and large images. However, U-Net places high demands on the computing power of the device and is relatively slow. SegNet improves memory utilization and segmentation efficiency by reusing, in the decoding stage, the max-pooling indices stored in the encoding stage. However, when the low-resolution feature map is unpooled, information from neighboring pixels is ignored, resulting in a loss of accuracy. The pyramid scene parsing network (PSPNet) makes full use of global context information by using a pyramid pooling module to learn multi-level features, but it does not fully utilize the information of the entire scene.

Because remote sensing images differ markedly from ordinary images in resolution, spatial structure and semantics, traditional methods and general-purpose neural network methods struggle to segment them efficiently. Semantic segmentation of remote sensing images still faces the following challenges:

First, existing natural image segmentation methods handle relatively balanced semantic category distributions and do not consider the case, common in remote sensing images, where one category occupies a large proportion of the image. Because real physical entities are distributed unevenly over the earth's surface, the foreground and background of remote sensing images are unbalanced, and the scales of different types of objects differ greatly. Second, deep learning segmentation models designed for natural images are insensitive to changes in object scale, and pixel accuracy is lost when they are applied directly to semantic segmentation of remote sensing images. Compared with categories such as land and lakes, the area occupied by categories such as vehicles in a remote sensing image can be almost negligible, so there are large scale variations between objects. Existing semantic segmentation methods are therefore not suitable for direct application to remote sensing images, and a segmentation algorithm needs to be designed around their characteristics.

Because remote sensing images are acquired in diverse ways and their data differ from natural images, previous semantic segmentation methods cannot solve the problem with a single approach. Remote sensing images exhibit multi-scale variation of objects: large-scale objects dominate the segmentation and suppress the learning of small-scale objects, so small-category objects are difficult to identify. In addition, because of the high resolution of remote sensing images, the information contained in an image is usually very dense, which leads to an uneven distribution of image categories.

Disclosure of Invention

In view of the deficiencies of the prior art, the invention provides a multi-scale semantic segmentation method and model for unbalanced remote sensing images, which address two technical problems: (1) the scale of objects in remote sensing images varies greatly; (2) the category distribution of remote sensing images is unbalanced. For the first problem, the invention provides a multi-scale semantic segmentation model for unbalanced remote sensing images: a multi-level semantic segmentation network is designed that extracts features at different scales and fuses them at the same level, so that missing information is complemented and global context information is fully used. While retaining the local detail of the image, this overcomes the mutual influence of multi-scale objects and improves the robustness and accuracy of remote sensing image segmentation. For the second problem, the invention designs the algorithm from two aspects: 1) constructing an inter-class loss function that maximizes the class margin between samples of different classes; 2) constructing a class-weight balanced distribution loss function that addresses the imbalance between positive and negative samples of every class.

In order to solve the above technical problems, the technical solution adopted by the present invention is described in detail as follows:

First, the invention provides a multi-scale semantic segmentation model for unbalanced remote sensing images. The model adopts a multi-level semantic segmentation network that can learn fine-grained local features and retain small-category information while also learning global context semantic features and retaining large-scale information. The overall architecture of the multi-level semantic segmentation network is divided into three levels; each level adopts a different network structure to extract features at a different scale and outputs a segmentation map at a different resolution. The features are then post-fused at the same level by a Bayesian fusion method, which fuses the multi-scale segmentation information and complements missing information.

Further, the first level (Level 1) of the multi-level semantic segmentation network model uses data at the original resolution, the second level (Level 2) uses data downsampled by a factor of 2, and the third level (Level 3) uses data downsampled by a factor of 4;

the multi-level semantic segmentation network model adopts the SegNet semantic segmentation network as its backbone. The encoder is on the left side of the network and consists of 5 convolution-pooling stages: each of the first two stages contains two convolution layers, and each of the last three stages contains three convolution layers;

the decoder is on the right side of the network and consists of 5 upsampling-convolution stages: the first and fourth stages each contain an upsampling layer and two convolution layers, the second and third stages each contain an upsampling layer and three convolution layers, the fifth stage contains an upsampling layer and two convolution layers, and a Softmax layer is added at the end.

Furthermore, the stages of the encoder and the decoder correspond one to one. Each upsampling layer of the decoder corresponds to the max-pooling layer of the encoder at the same level and upsamples the feature map using the indices retained during max pooling. In this way the features extracted for classification in the encoding stage are reproduced as dense feature maps, the feature maps are restored to the size of the original image, and they are then classified by the softmax layer to generate the final segmentation map.

Further, the three levels of the multi-level semantic segmentation network output segmentation maps O₁, O₂ and O₃. When the segmentation maps output by the three levels are post-fused, the segmentation map output by any one level is selected as the prior O_i, i = {1,2,3}, and any other segmentation map O_j, j ≠ i, j = {1,2,3}, is used to calculate the likelihood, so the posterior probability is calculated as:

$$p(F_{ni} \mid O_{nj}(z)) = \frac{O_{ni}(z)\, p(O_{nj}(z) \mid F_{ni})}{O_{ni}(z)\, p(O_{nj}(z) \mid F_{ni}) + (1 - O_{ni}(z))\, p(O_{nj}(z) \mid B_{ni})} \quad (4)$$

where n represents the category of the current pixel and m represents the number of categories; F_ni and B_ni respectively represent the foreground region and the background region for category n; O_ni, i = {1,2,3}, serves as the prior, and O_nj, j ≠ i, j = {1,2,3}, is used to calculate the likelihood; in each region, the likelihood is calculated by comparing O_ni and O_nj over the foreground and background of each class.

Preferably, when the segmentation maps output by the three levels are post-fused, the segmentation maps output by the first two levels are fused first, and the result is then fused with the segmentation map output by the third level. The specific steps are as follows:

first, the segmentation map output by the first level is used as the prior and the segmentation map output by the second level is used to calculate the likelihood, and the information of the two maps is combined based on the Bayesian formula;

then the roles are exchanged: the segmentation map output by the second level is used as the prior and the segmentation map output by the first level is used to calculate the likelihood, and the two are again integrated based on the Bayesian formula;

finally, the fused segmentation map of the first two levels and the segmentation map of the third level are fused in the same manner to obtain the final fused segmentation map.

Secondly, the invention also provides a multi-scale semantic segmentation method for unbalanced remote sensing images, which comprises the following steps:

I. Improved semantic segmentation network architecture: a multi-level semantic segmentation network model is constructed, each level of the network outputs a segmentation map at a different scale, and the multi-scale segmentation information is fused by a Bayesian fusion method;

II. Equalization loss function: an optimization algorithm is adopted that separates pixels of different classes more widely and aggregates pixels of the same class more tightly, as follows:

1) constructing a class-weight balanced distribution loss function based on the focal loss, to address the imbalance between positive and negative samples of every class;

2) introducing the hinge loss to construct an inter-class loss function that maximizes the class margin between samples of different classes;

3) equalization loss function: constructing an overall loss function.

Further, in the class-weight balanced distribution loss function, p_t is the probability that the sample is positive for class t, M is the number of sample classes, t denotes a given class, γ is a hyper-parameter, and −log(p_t) is the initial cross-entropy loss; λ is an adjustable hyper-parameter, set to 0 < λ < 1, whose purpose is to make the loss more adjustable to the classification accuracy of different samples, reducing the penalty weight of complex samples and increasing the penalty contribution of easy samples.

Further, the inter-class loss function is

$$\mathrm{Hinge} = \max\left(0,\; 1 + \max w_{wrong} - w_{correct}\right) \quad (11)$$

where w_wrong is the number of misclassified samples and w_correct is the number of correctly classified samples; taking the maximum over w_wrong selects the category with the most misclassified samples.

Further, the overall loss function combines the class-weight balanced distribution loss with the hinge loss penalty term, where β is a hyper-parameter controlling the contribution of the hinge loss penalty term, and β > 0.

Compared with the prior art, the invention has the advantages that:

(1) The invention provides a multi-level semantic segmentation network that extracts features at different scales and fuses them at the same level so that missing information is complemented. It learns fine-grained local features and retains small-category information, and it also learns global context semantic features, makes full use of global context information and retains large-scale information. While retaining the local detail of the image, it overcomes the mutual influence of multi-scale objects and improves the robustness and accuracy of remote sensing image segmentation.

(2) The invention also provides a Bayesian multi-scale post-fusion semantic segmentation method. To handle the scale dependency of remote sensing images, the results at different scales are modeled as the prior and the likelihood respectively, and the Bayesian principle is used to make the optimal decision; the method has been verified to improve segmentation accuracy.

With Bayesian fusion, the semantic information of objects is identified better, the outline and class distribution of whole objects are clearer, and the boundaries between objects of different classes are more distinct; the method also improves on the segmentation maps output by the individual networks.

(3) For the unbalanced semantic distribution of remote sensing images, in particular the foreground-background imbalance caused by spatial differences, the invention designs an equalization loss function for semantic segmentation of unbalanced remote sensing images. It balances the loss weights of hard and easy samples by reducing the weight of easy-to-learn classes and increasing the weight of hard-to-learn classes, which improves training stability. Hinge loss is also introduced to enlarge the inter-class distance and make the boundaries between samples of different classes more distinct, which improves the accuracy of the segmentation result and the clarity of local classification.

The balanced loss algorithm of the invention can realize uniform segmentation of the semantic segmentation network model on the class unbalanced data.

Drawings

To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a diagram of a multi-level semantic segmentation network architecture of the present invention;

FIG. 2 is the encoder-decoder structure of the first-level segmentation network of the present invention;

FIG. 3 is the encoder-decoder structure of the second-level segmentation network of the present invention;

FIG. 4 is the encoder-decoder structure of the third-level segmentation network of the present invention;

FIG. 5 is a schematic diagram of a Bayesian image fusion algorithm of the present invention.

Detailed Description

The invention is further described with reference to the following figures and specific embodiments.

Example 1

This embodiment designs a multi-level deep neural network model, specifically a multi-scale semantic segmentation model for unbalanced remote sensing images. It adopts a multi-level semantic segmentation network that can learn fine-grained local features and retain small-category information while also learning global context semantic features and retaining large-scale information. The network architecture is shown in FIG. 1. The overall architecture is divided into three levels: the first level corresponds to Level 1 in FIG. 1 and uses data at the original resolution; the second level corresponds to Level 2 in FIG. 1 and uses data downsampled by a factor of 2; the third level corresponds to Level 3 in FIG. 1 and uses data downsampled by a factor of 4. The remote sensing image is thus downsampled twice so that both more local information and more global information are obtained.
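The three input resolutions can be sketched as follows. The text does not state which downsampling operator is used, so the 2×2 average pooling below, and the helper names `downsample2x` and `build_levels`, are illustrative assumptions rather than the patented implementation.

```python
def downsample2x(image):
    """Halve height and width by averaging each 2x2 block (assumed operator)."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, w - 1, 2)
        ]
        for r in range(0, h - 1, 2)
    ]


def build_levels(image):
    """Return the inputs fed to Level 1, Level 2 and Level 3."""
    level1 = image                 # original resolution
    level2 = downsample2x(level1)  # downsampled by a factor of 2
    level3 = downsample2x(level2)  # downsampled by a factor of 4
    return level1, level2, level3


# Toy single-channel 8x8 "image".
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
levels = build_levels(image)
sizes = [(len(lv), len(lv[0])) for lv in levels]
print(sizes)  # [(8, 8), (4, 4), (2, 2)]
```

Each level would then be passed to its own encoder-decoder network, and the three outputs brought back to a common resolution before fusion.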

To handle multi-scale information, each level adopts a different network structure to extract features at a different scale, keeps as much visual information as possible during feature extraction, and outputs a segmentation map at a different resolution. The features are then post-fused at the same level by a Bayesian fusion method, which fuses the multi-scale segmentation information, complements missing information and yields an accurate segmentation of the remote sensing image.

Compared with other classical deep neural network segmentation models, the multi-level network model has the advantage of retaining both good local detail information and global semantic information.

1. The multi-level network architecture of the present embodiment is described in detail below:

The backbone network adopted in this embodiment is the SegNet semantic segmentation network; FIGS. 2, 3 and 4 show the networks corresponding to Level 1, Level 2 and Level 3 in FIG. 1, respectively. The left side of the network is the encoder, which consists of 5 convolution-pooling stages: each of the first two stages contains two convolution layers, and each of the last three stages contains three convolution layers. The convolution layers extract image features, and the pooling layers then reduce the size of the feature map and enlarge the receptive field. The pooling layers use max pooling to achieve invariance to small spatial shifts, at the cost of reduced localization accuracy and lost spatial detail.

The stages of the encoder and the decoder correspond one to one, similar to the U-shaped structure of U-Net. Each upsampling layer of the decoder corresponds to the max-pooling layer of the encoder at the same level and upsamples the feature map using the indices retained during max pooling. The upsampling layers reproduce the features extracted for classification in the encoding stage as dense feature maps, the feature maps are restored to the size of the original image, and a softmax layer then classifies them to generate the final segmentation map. SegNet has few trainable parameters and a small memory footprint while still maintaining segmentation accuracy, which makes it suitable for semantic segmentation of high-resolution remote sensing images.
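The index-based upsampling described above can be illustrated on a single-channel toy feature map. This is a minimal sketch in plain Python; the function names `max_pool_with_indices` and `max_unpool` are hypothetical, and a real network would apply the same idea per channel and follow the unpooling with convolution layers that densify the sparse result.

```python
def max_pool_with_indices(fmap, k=2):
    """k x k max pooling that also records where each maximum came from."""
    h, w = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for r in range(0, h - k + 1, k):
        prow, irow = [], []
        for c in range(0, w - k + 1, k):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in range(k) for dc in range(k)]
            value, pos = max(window, key=lambda item: item[0])
            prow.append(value)
            irow.append(pos)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Put each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for prow, irow in zip(pooled, indices):
        for value, (r, c) in zip(prow, irow):
            out[r][c] = value
    return out


fmap = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 5.0],
    [0.0, 1.0, 7.0, 2.0],
    [6.0, 2.0, 3.0, 1.0],
]
pooled, indices = max_pool_with_indices(fmap)   # encoder side
restored = max_unpool(pooled, indices, 4, 4)    # decoder side
print(pooled)    # [[4.0, 5.0], [6.0, 7.0]]
print(restored)
```

Only the positions of the maxima need to be stored, not the full encoder feature map, which is where the memory saving comes from.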

2. The image post-fusion method of the present embodiment is described in detail below:

The multi-scale network outputs segmentation maps at different resolutions, and because of the difference in resolution their segmentation quality differs. This patent provides a multi-scale post-fusion semantic segmentation method based on the Bayesian principle. In the saliency detection task, the posterior probability is calculated by integrating saliency maps:

$$p(F_i \mid S_j(z)) = \frac{S_i(z)\, p(S_j(z) \mid F_i)}{S_i(z)\, p(S_j(z) \mid F_i) + (1 - S_i(z))\, p(S_j(z) \mid B_i)} \quad (1)$$

S₁ and S₂ are both saliency maps; one of them, S_i (i = {1,2}), is used as the prior probability, and the other, S_j (j ≠ i, j = {1,2}), is used to calculate the likelihood. F_i and B_i represent the foreground and background regions, respectively, and the likelihood in each region is calculated by:

$$p(S_j(z) \mid F_i) = \frac{N_{b_{F_i}(S_j(z))}}{N_{F_i}} \quad (2)$$

$$p(S_j(z) \mid B_i) = \frac{N_{b_{B_i}(S_j(z))}}{N_{B_i}} \quad (3)$$

where $N_{F_i}$ denotes the number of pixels in the foreground and $N_{b_{F_i}(S_j(z))}$ is the number of pixels whose color features fall into the foreground bin containing the feature $S_j(z)$; $N_{B_i}$ denotes the number of pixels in the background and $N_{b_{B_i}(S_j(z))}$ is the number of pixels whose color features fall into the background bin containing the feature $S_j(z)$.

The three levels of the multi-level semantic segmentation network output segmentation maps O₁, O₂ and O₃. When the segmentation maps output by the three levels are post-fused, the segmentation map output by any one level is selected as the prior O_i (i = {1,2,3}), and any other segmentation map O_j (j ≠ i, j = {1,2,3}) is used to calculate the likelihood, so the posterior probability is calculated as:

$$p(F_{ni} \mid O_{nj}(z)) = \frac{O_{ni}(z)\, p(O_{nj}(z) \mid F_{ni})}{O_{ni}(z)\, p(O_{nj}(z) \mid F_{ni}) + (1 - O_{ni}(z))\, p(O_{nj}(z) \mid B_{ni})} \quad (4)$$

where n represents the category of the current pixel and m represents the number of categories; F_ni and B_ni respectively represent the foreground region and the background region for category n; O_ni (i = {1,2,3}) serves as the prior, and O_nj (j ≠ i, j = {1,2,3}) is used to calculate the likelihood. In each region, the likelihood is calculated by comparing O_ni and O_nj over the foreground and background of each class:

$$p(O_{nj}(z) \mid F_{ni}) = \frac{N_{b_{F_{ni}}(O_{nj}(z))}}{N_{F_{ni}}} \quad (5)$$

where $N_{F_{ni}}$ represents the number of foreground pixels in the n-th class and $N_{b_{F_{ni}}(O_{nj}(z))}$ is the number of pixels in the foreground region whose color features fall into the bin containing the feature $O_{nj}(z)$. The posterior probability with O_nj as the prior is calculated in the same way.
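The prior-times-likelihood computation for one class can be sketched on flattened per-pixel probability maps. Several details here are assumptions made only for illustration: the foreground region is obtained by thresholding the prior map at 0.5, the value of the other map (rather than a color feature) is what gets binned, and the name `bayes_posterior` and the bin count are hypothetical.

```python
def bayes_posterior(prior, other, bins=4, threshold=0.5):
    """Posterior p(F_i | O_j(z)) for one class.

    prior : per-pixel probabilities O_i(z), used as the prior
    other : per-pixel probabilities O_j(z), used for the likelihood
    """
    def bin_of(v):
        return min(int(v * bins), bins - 1)  # value 1.0 goes to the last bin

    # Foreground/background split of the prior map (assumed: fixed threshold).
    fg = [p >= threshold for p in prior]
    n_fg = sum(fg)
    n_bg = len(prior) - n_fg

    # Histograms of the other map's values inside each region.
    fg_hist = [0] * bins
    bg_hist = [0] * bins
    for is_fg, v in zip(fg, other):
        (fg_hist if is_fg else bg_hist)[bin_of(v)] += 1

    posterior = []
    for p, v in zip(prior, other):
        b = bin_of(v)
        like_fg = fg_hist[b] / n_fg if n_fg else 0.0  # p(O_j(z) | F_i)
        like_bg = bg_hist[b] / n_bg if n_bg else 0.0  # p(O_j(z) | B_i)
        denom = p * like_fg + (1.0 - p) * like_bg
        posterior.append(p * like_fg / denom if denom > 0 else p)
    return posterior


prior = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
other = [0.95, 0.85, 0.30, 0.90, 0.15, 0.05]
post = bayes_posterior(prior, other)
print([round(x, 3) for x in post])
```

In this toy case the fourth pixel has a weak prior (0.4) but a value in the other map that is typical of the foreground, so its posterior rises above the prior; this is the complementing effect the fusion relies on.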

In a preferred embodiment, when the segmentation maps output by the three levels are post-fused, the segmentation maps output by the first two levels are fused first; the result is then fused with the segmentation map output by the third level, as shown in formulas (6) and (7):

$$O_{n4}(z) = O_B(O_{n1}(z), O_{n2}(z)) = p(F_{n1} \mid O_{n2}(z)) + p(F_{n2} \mid O_{n1}(z)) \quad (6)$$

$$O(z) = O_B(O_{n3}(z), O_{n4}(z)) = p(F_{n3} \mid O_{n4}(z)) + p(F_{n4} \mid O_{n3}(z)) \quad (7)$$

As shown in FIG. 5, Bayesian fusion uses each output segmentation map in turn as the prior, so that the effective information of segmentation maps at different resolutions is fused and the precision of image segmentation is improved.
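The two-stage order of formulas (6) and (7) can be sketched as below. The sketch assumes the three maps have already been brought to a common resolution and flattened. `simple_posterior` is only a stand-in for the bin-based posterior (it uses the other map's value directly as the foreground likelihood), and halving the intermediate result so that it can serve as a prior in the second stage is an assumption that the text does not specify.

```python
def simple_posterior(prior, other):
    """Stand-in posterior: likelihoods are v (foreground) and 1 - v (background)."""
    out = []
    for p, v in zip(prior, other):
        denom = p * v + (1.0 - p) * (1.0 - v)
        out.append(p * v / denom if denom > 0 else p)
    return out


def bayes_fuse(a, b, posterior=simple_posterior):
    """O_B(a, b): sum of the two posteriors with the prior role swapped."""
    return [x + y for x, y in zip(posterior(a, b), posterior(b, a))]


def fuse_three(o1, o2, o3, posterior=simple_posterior):
    """Fuse Level 1 with Level 2 first, then fuse the result with Level 3."""
    o4 = bayes_fuse(o1, o2, posterior)   # formula (6), values in [0, 2]
    o4 = [v / 2.0 for v in o4]           # assumed rescaling back to [0, 1]
    return bayes_fuse(o3, o4, posterior)  # formula (7)


# Per-pixel probabilities of one class from the three levels (toy values).
o1 = [0.9, 0.2]
o2 = [0.8, 0.4]
o3 = [0.7, 0.1]
final = fuse_three(o1, o2, o3)
print([round(v, 4) for v in final])
```

Pixels on which all three levels agree are pushed toward the extremes, while the symmetric sum keeps either map from dominating simply because it was chosen as the prior.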

Example 2

The embodiment provides a multiscale semantic segmentation method for unbalanced remote sensing images, which comprises the following steps:

I. Improved semantic segmentation network architecture

A multi-level semantic segmentation network model is constructed, each level of the network outputs a segmentation map at a different scale, and the multi-scale segmentation information is fused by a Bayesian fusion method.

The semantic segmentation network may be a classical semantic segmentation network. In a preferred embodiment, the multi-level semantic segmentation network model is the model described in Example 1; refer to Example 1 for details, which are not repeated here.

II. Equalization loss function

An optimization algorithm is adopted that separates pixels of different classes more widely and aggregates pixels of the same class more tightly, as follows:

1) A class-weight balanced distribution loss function is constructed based on the focal loss, to address the imbalance between positive and negative samples of every class.

2) The hinge loss is introduced to construct an inter-class loss function that maximizes the class margin between samples of different classes.

3) Equalization loss function: an overall loss function is constructed.

These are introduced in turn below:

1. Class-weight balanced distribution

In multi-class problems, the sample classes of a data set are often unevenly distributed: there are too many negative samples, most of which are easy to distinguish, and this often makes learning ineffective during training. The problem is particularly evident in remote sensing image segmentation. The focal loss was introduced mainly to address the extreme imbalance between background and foreground in object detection. It improves on the cross-entropy loss by adding a modulation factor (taken directly from the prior art), and the specific formula is as follows:

in this formula, p_t is the probability that the sample is positive for class t, M is the number of sample classes, t denotes a given class, and −log(p_t) is the initial cross-entropy loss; γ ≥ 0 is an adjustable hyper-parameter and (1 − p_t)^γ is the modulation factor. For easy-to-learn samples, p_t is close to 1 and the modulation factor tends to zero. For hard and misclassified samples, p_t is small and the modulation factor increases accordingly, which offsets the inefficient training caused by the sample imbalance. This property effectively addresses the training failure caused by class imbalance in remote sensing images. Therefore, for the multi-class problem of remote sensing image segmentation, the loss function formula is adjusted as follows:

where λ is an adjustable hyper-parameter, set to 0 < λ < 1. Its purpose is to make the loss more adjustable to the classification accuracy of different samples: on the basis of the original formula (8), it further reduces the weight of the penalty term for complex samples during training and increases the weight of the penalty term for easily classified samples, so that samples are balanced consistently across classes. For example, when p_t is high, the confidence in the sample's class is high, and with λ set to 1 the corresponding penalty weight is small, so the sample's contribution to training decreases. Likewise, when p_t is small, the sample is hard to classify and is a complex sample; with λ = 1 its training weight is large and its contribution to training is high. Setting 0 < λ < 1 reduces the penalty weight of complex samples and increases the penalty contribution of easy samples, which improves the classification accuracy of easy samples. An effective setting of λ therefore strikes a good balance between complex and easy samples and improves the overall classification accuracy.
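The effect of the modulation factor can be checked numerically. The sketch below assumes that formula (8) follows the standard focal loss, written here for a single pixel; the λ adjustment of formula (9) is not sketched, and the function name `focal_loss` is illustrative.

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Standard focal loss for one pixel (assumed form of formula (8)).

    probs  : predicted class probabilities for the pixel
    target : index of the true class
    gamma  : focusing hyper-parameter, gamma >= 0
    """
    p_t = probs[target]
    modulation = (1.0 - p_t) ** gamma   # tends to 0 for easy samples
    return -modulation * math.log(p_t)  # -log(p_t) is the cross entropy


easy = focal_loss([0.95, 0.03, 0.02], target=0)
hard = focal_loss([0.30, 0.50, 0.20], target=0)
plain_ce = focal_loss([0.30, 0.50, 0.20], target=0, gamma=0.0)
print(easy < hard)  # True: the easy sample contributes far less
```

With `gamma=0.0` the modulation factor is 1 and the loss reduces to the plain cross entropy, which is a quick sanity check for any implementation.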

2. Balance between classes

Because adjacent samples in a remote sensing image differ only slightly, enlarging the difference between samples of different classes is also a problem to be solved. This embodiment therefore introduces the hinge loss (HL). HL is generally used in the maximum-margin classification task of support vector machines (SVM), where it reduces the intra-class distance and increases the inter-class distance to achieve a maximum margin. For binary classification, the formula is:

$$\mathrm{Hinge} = \max(0,\; 1 - y \cdot y_{pre}) \quad (10)$$

In the above formula, y is the label of the real sample and can only take the value −1 or 1, and y_pre is the predicted value. When the absolute value of the predicted value is 1 or more, the distance between the sample and the boundary is 1 or more; this case is not rewarded further, because the sample is very likely to be classified correctly. The multi-class form is as follows:

$$\mathrm{Hinge} = \max\left(0,\; 1 + \max w_{wrong} - w_{correct}\right) \quad (11)$$

where w_wrong is the number of misclassified samples and w_correct is the number of correctly classified samples. Taking the maximum over w_wrong selects the category with the most misclassified samples. When many samples are misclassified, formula (11) yields a large penalty term that drives training onward; only when few samples are misclassified and many are classified correctly does the penalty term shrink automatically, so that training can finish sooner. In this way samples within a class become more consistent, the margin between samples of different classes is enlarged, and classification accuracy is improved.
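Formulas (10) and (11) translate directly into code. In the sketch below the multi-class inputs are treated as one value per class, as in the standard multi-class hinge loss where they are class scores; that reading, and the function names, are assumptions made for illustration.

```python
def hinge_binary(y, y_pred):
    """Formula (10): y is the true label (-1 or 1), y_pred the predicted value."""
    return max(0.0, 1.0 - y * y_pred)


def hinge_multiclass(w, correct):
    """Formula (11): max(0, 1 + max w_wrong - w_correct).

    w       : one value per class (read here as class scores)
    correct : index of the true class
    """
    w_wrong = [v for k, v in enumerate(w) if k != correct]
    return max(0.0, 1.0 + max(w_wrong) - w[correct])


print(hinge_binary(1, 2.5))                    # 0.0, beyond the margin
print(hinge_binary(-1, 0.3))                   # 1.3, wrong side of the boundary
print(hinge_multiclass([3.0, 0.5, 1.0], 0))    # 0.0, margin of at least 1
print(hinge_multiclass([1.0, 2.0, 0.5], 0))    # 2.0, a wrong class wins
```

The loss is zero once the correct class leads the strongest competitor by at least 1, and it grows linearly as that lead shrinks or reverses.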

3. Balancing algorithm

Finally, in order to solve the problem of category imbalance, increase sample intervals of different categories and improve the accuracy of segmentation, the overall loss function of the invention is as follows:

where β is a hyper-parameter controlling the contribution of the hinge loss penalty term, and β > 0. Finally, the model parameters are optimized with the classical gradient descent method to obtain the optimal parameters, and the trained model is then tested.
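How β trades off the two terms can be sketched for a single pixel. The additive combination below (balance term plus β times the hinge penalty) is an assumed form, the balance term uses the standard focal loss without the λ adjustment, and feeding class probabilities into the hinge term is likewise an illustrative choice; `total_loss` is a hypothetical name.

```python
import math


def total_loss(probs, target, gamma=2.0, beta=0.5):
    """Assumed combination: focal-style balance term + beta * hinge penalty."""
    p_t = probs[target]
    balance = -((1.0 - p_t) ** gamma) * math.log(p_t)
    strongest_wrong = max(p for k, p in enumerate(probs) if k != target)
    hinge = max(0.0, 1.0 + strongest_wrong - p_t)
    return balance + beta * hinge


confident = total_loss([0.90, 0.05, 0.05], target=0)
uncertain = total_loss([0.40, 0.50, 0.10], target=0)
no_hinge = total_loss([0.40, 0.50, 0.10], target=0, beta=0.0)
print(confident < uncertain)  # True
print(no_hinge < uncertain)   # True: the hinge term adds a margin penalty
```

In training, this per-pixel value would be averaged over all pixels and minimized by gradient descent, with β tuned on validation data.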

In conclusion, the invention improves the semantic segmentation network structure and designs a loss-function equalization method. A spatial multi-scale parallel post-fusion framework captures scale differences and retains good local detail while better preserving global semantic information. The loss function is designed around the imbalance of the pixel-level sample distribution: following the focal loss idea, the loss weights of hard and easy training samples are balanced, which improves training stability; hinge loss is introduced to enlarge the inter-class distance and improve semantic segmentation accuracy. Combining the equalization loss function with a classical semantic segmentation network improves the accuracy of the segmentation result and the clarity of local classification.

It should be understood that the above description does not limit the present invention, and the present invention is not limited to the above examples; those skilled in the art may make various changes, modifications, additions, and substitutions within the spirit and scope of the present invention.

Claims

1. A multi-scale semantic segmentation model for unbalanced remote sensing images, characterized in that it adopts a multi-level semantic segmentation network that can both learn fine-grained local features, retaining small-category information, and learn global context semantic features, retaining large-scale information; the overall architecture of the multi-level semantic segmentation network is divided into three layers, each layer uses a different network structure to extract features at a different scale and outputs a segmentation map at a different resolution, and the segmentation maps are then combined by image post-fusion using a Bayesian fusion method, thereby fusing the multi-scale segmentation map information and complementing missing information.

2. The multi-scale semantic segmentation model for unbalanced remote sensing images according to claim 1, characterized in that the first layer (Level 1) of the multi-level semantic segmentation network uses data at the original resolution, the second layer (Level 2) uses data down-sampled by a factor of 2, and the third layer (Level 3) uses data down-sampled by a factor of 4.

3. The multi-scale semantic segmentation model for unbalanced remote sensing images according to claim 2, wherein each layer of the encoding stage corresponds one-to-one to a layer of the decoding stage; each upsampling layer of the decoder corresponds to the max pooling layer of the encoder at the same level and upsamples the feature map using the indices retained during max pooling, so that the image classification features of the encoding stage are reproduced to generate dense feature maps; the feature maps are restored to the size of the original image and then classified by a softmax layer to generate the final segmentation map.

4. The multi-scale semantic segmentation model for unbalanced remote sensing images according to any one of claims 1 to 3, characterized in that the three layers of the multi-level semantic segmentation network output segmentation maps $O_1$, $O_2$ and $O_3$; when image post-fusion is performed on the segmentation maps output by the three layers, the segmentation map output by any one layer is selected as the prior $O_i$, $i = 1, 2, 3$, and any other segmentation map $O_j$, $j \neq i$, $j = 1, 2, 3$, is used to compute the likelihood, so that the posterior probability is calculated as:

5. The multi-scale semantic segmentation model for unbalanced remote sensing images according to claim 4, wherein, when image post-fusion is performed on the segmentation maps output by the three layers, the segmentation maps output by the first two layers are fused first, and the resulting fused map is then fused with the segmentation map output by the third layer, specifically as follows:

6. A multi-scale semantic segmentation method for unbalanced remote sensing images, characterized by comprising the following steps:

2) introducing the Hinge loss function to construct an inter-class loss function that maximizes the class margin between samples of different classes;

3) balanced loss function: constructing an overall loss function.

7. The multi-scale semantic segmentation method for unbalanced remote sensing images according to claim 6, characterized in that the class-weight balanced distribution loss function is

where $p_t$ is the predicted probability that the sample belongs to class $t$, $M$ is the number of sample classes, $t$ denotes a class, $\gamma$ is a hyperparameter, and $-\log(p_t)$ is the original cross-entropy loss function; $\lambda$ is an adjustable hyperparameter with $0 < \lambda < 1$, intended to make the loss more adjustable with respect to the classification accuracy of different samples, reducing the penalty weight of complex samples and increasing the penalty contribution of good samples.

8. The multi-scale semantic segmentation method for unbalanced remote sensing images according to claim 7, wherein the inter-class loss function is

$$\mathrm{Hinge} = \max\left(0,\ 1 + \max w_{\mathrm{wrong}} - w_{\mathrm{correct}}\right) \tag{11}$$

9. The multi-scale semantic segmentation method for unbalanced remote sensing images according to claim 8, wherein the overall loss function is as follows

where $\beta$ is a hyperparameter controlling the contribution of the Hinge loss penalty term, and $\beta > 0$.