CN111797712A

CN111797712A - Remote sensing image cloud and cloud shadow detection method based on multi-scale feature fusion network

Info

Publication number: CN111797712A
Application number: CN202010545643.XA
Authority: CN
Inventors: 张秀再; 胡敬锋; 周丽娟
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2020-06-16
Filing date: 2020-06-16
Publication date: 2020-10-20
Anticipated expiration: 2040-06-16
Also published as: CN111797712B

Abstract

The invention discloses a remote sensing image cloud and cloud shadow detection method based on a multi-scale feature fusion network, which comprises the steps of constructing a training set comprising a plurality of remote sensing image pictures, manufacturing labels of all pixel points in the remote sensing image pictures of the training set, enabling the labels to form mask images corresponding to the remote sensing image pictures, inputting the mask images and the remote sensing image pictures into a supervision network for training, storing optimal models through 50 epochs and automatically stopping training when cost function loss of the supervision network is convergent and tends to be stable and all indexes of the training set obtain the highest values, obtaining detection models, and adopting the images to be detected in a detection model test set to carry out cloud and cloud shadow detection so as to quickly and efficiently carry out corresponding detection on cloud and cloud shadows in the images to be detected, and improving the accuracy of corresponding detection results.

Description

Remote sensing image cloud and cloud shadow detection method based on multi-scale feature fusion network

Technical Field

The invention relates to the technical field of deep learning and image recognition, in particular to a remote sensing image cloud and cloud shadow detection method based on a multi-scale feature fusion network.

Background

In recent years, with the rapid development of remote sensing technology, information contained in various remote sensing satellite images is more and more abundant, and the remote sensing satellite images are more and more widely applied to the fields of agricultural production, natural disaster prediction, military science and technology, geographical mapping, change detection, water conservancy and traffic and the like. The cloud and the cloud shadow blur the spectral information of the optical remote sensing sensor, and are not beneficial to observing the ground object; in addition, long-term observation and practice has shown that cloud generation and dissipation, and evolution and transformation between various types of clouds, are performed under certain conditions of moisture and atmospheric movement. The evolution of the cloud can be seen as the movement of water vapor and atmospheric motion, so that the cloud is closely and inseparably connected with the occurrence of various weather phenomena. Therefore, the detection of cloud and cloud shadow in the remote sensing image is of great significance to research in many fields. At present, various cloud detection methods have been proposed, and these methods can be mainly classified into two types, a multi-image based method and a single-image based method. The multi-image based method requires a set of images to be taken at different times in the same background and requires more images to be acquired in a short time to ensure that the underground surface does not change too much, in which process it is very difficult to acquire a clear reference image. For single image detection methods, early cloud detection methods were based primarily on thresholds, by extracting various spectral features of each pixel, and then using several thresholds to determine the cloud pixels. The threshold-based cloud detection method usually only utilizes low-level spectral information, but ignores spatial information, and has poor cloud detection effect under different imaging conditions due to sensitivity to the underlying surface and sensitivity to the cloud coverage; with the rapid development of big data and machine learning technology, the complex work in cloud detection can be realized by a computer, so that the labor cost can be greatly saved, and the cloud detection method based on machine learning is more and more widely applied. Such as Xu et al extract cloud boundaries using a decision tree. Hu et al combine computer vision with a random forest algorithm to obtain a cloud-overlaid image map. In general, machine learning methods yield more accurate cloud detection results than threshold methods. However, these methods based on machine learning use a hand-made feature and a simple classifier, and do not use semantic-level information. These artificially designed features rely on expert a priori knowledge and sensors, it is difficult to accurately segment the features of the cloud with a complex underlying surface, and their detection performance depends mainly on the choice of manual features.

Recently, deep neural networks have been rapidly developed in many fields, the powerful segmentation capability thereof has been widely recognized, and cloud detection methods based on deep learning have attracted attention. Shi and S et al used superpixel segmentation and Full Convolution Neural Networks (FCNN) to detect clouds in Quickbird, *** earth images, and Landsat8 images. Chai et al used an adaptive SegNet to detect clouds and cloud shadows in Landsat images. Sorour et al use a full convolution neural network (FCN) for cloud detection, the algorithm includes a pre-processing thresholding step for ice and snow retrieval, and uses the existing landsat8 cloud mask as a premise for improved performance. Xu et al use a depth residual network to detect clouds and cloud shadows in remote sensing images. Wu and the like acquire a cloud probability map by using a deep convolutional neural network, and then obtain a refined high-resolution first-cloud mask by using a composite image filtering technology. The traditional technology only uses spectral features to hardly distinguish the cloud from some bright ground objects (such as snow and white buildings), so that the detection accuracy of the cloud and cloud shadow in the remote sensing image is low.

Disclosure of Invention

Aiming at the problems, the invention provides a remote sensing image cloud and cloud shadow detection method based on a multi-scale feature fusion network.

In order to realize the purpose of the invention, the invention provides a remote sensing image cloud and cloud shadow detection method based on a multi-scale feature fusion network, which comprises the following steps:

s10, constructing a training set comprising a plurality of remote sensing image pictures, making labels of all pixel points in the remote sensing image pictures of the training set, enabling the labels to form mask images corresponding to the remote sensing image pictures, and inputting the mask images and the remote sensing image pictures into a supervision network for training;

s20, when the loss of the cost function of the supervision network is convergent and tends to be stable, and each index of the training set obtains the highest value, storing the optimal model through 50 epochs and automatically stopping training to obtain a detection model;

and S30, carrying out cloud and cloud shadow detection by using the to-be-detected images in the detection model test set.

In one embodiment, the supervisory network includes a res.block module to prevent network degradation, a multi-scale convolution module to increase the network's receptive field, and a multi-scale feature module to extract different scale information.

Specifically, the experimental parameters of the supervision network during training include: learning rate of 0.0001, batch size of 8, first momentum size beta₁0.9 and a second momentum magnitude beta₂At 0.999, the learning rate is set to 0.0001, and the minimum learning rate is set to 0.0000001.

Specifically, in the res.block module, a convolution of 1 × 1 is added before and after the convolution of 3 × 3.

Specifically, the multi-scale convolution module includes 3 convolution kernels, each convolution kernel having a size of 1 × 1, 3 × 3, and 5 × 5, respectively, and each convolution kernel uses an expansion rate of 1, 2, and 4, respectively.

Specifically, in the training process of the supervision network, the information extracted by the encoder is processed by each convolution kernel to obtain a plurality of sampling features, the plurality of sampling features are subjected to feature fusion with the decoder, and the fusion features are extracted by each convolution kernel.

In an embodiment, the method for detecting cloud and cloud shadow of remote sensing image based on multi-scale feature fusion network further includes:

obtaining average cross-over ratio, precision rate, recall rate, precision rate and harmonic mean value generated in the process of detecting each image to be detected of the detection model, determining the detection performance of the detection model according to the average cross-over ratio, precision rate, recall rate, precision rate and harmonic mean value,

specifically, the average intersection ratio includes:

the essenceThe accuracy rate comprises:

the recall rate includes:

the accuracy rate comprises:

the blended mean includes:

wherein mIoU represents the average cross-over ratio, C_CCNumber of correctly detected cloud pixels, C_NCNumber of cloud pixels representing false detection as non-cloud pixels, C_NNRepresenting the number of correctly detected non-cloud pixels, Precison representing Accuracy, Recall representing Recall, Accuracy representing Accuracy, F_1scoreRepresenting the harmonic mean.

According to the remote sensing image cloud and cloud shadow detection method based on the multi-scale feature fusion network, a training set comprising a plurality of remote sensing image pictures is constructed, labels of all pixel points in the remote sensing image pictures of the training set are made, the labels form mask images corresponding to the remote sensing image pictures, the mask images and the remote sensing image pictures are input into a monitoring network for training, when the loss of cost functions of the monitoring network is convergent and tends to be stable, and all indexes of the training set obtain the highest values, the optimal model is stored through 50 epochs and the training is automatically stopped, a detection model is obtained, the detection model is adopted to test images to be detected in the set for cloud and cloud shadow detection, so that the cloud and cloud shadows in the images to be detected are correspondingly detected quickly and efficiently, and the accuracy of corresponding detection results can be improved.

Drawings

Fig. 1 is a flowchart of a remote sensing image cloud and cloud shadow detection method based on a multi-scale feature fusion network according to an embodiment;

fig. 2 is a flowchart of a remote sensing image cloud and cloud shadow detection method based on a multi-scale feature fusion network according to another embodiment;

fig. 3 is a block module schematic diagram of an embodiment;

FIG. 4 is a schematic diagram of a multi-scale convolution module of an embodiment;

FIG. 5 is a multi-scale feature module diagram of an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

The technical problem that this application will solve is to overcome prior art and only use spectral feature to be difficult to distinguish cloud and some bright ground thing (e.g. snow, white building). Due to the limited spectral range (including blue, green, red, near infrared bands) of clouds and cloud shadows, the complexity of the underlying surface, accurate detection of clouds and cloud shadows is difficult for optical high resolution images. The existence of clouds and cloud shadows has a significant impact on the efficient late-stage application of optical high-resolution images, and manual labeling of cloud and cloud shadow pixels requires a great deal of time and human resources. To solve this problem, a detection algorithm based on a Multi-scale Feature Fusion Network (MFFN) model is proposed. The algorithm mainly comprises a Res.block module, a multi-scale convolution (MCM) and a multi-scale feature module (MFM), can extract rich spatial information and semantic information, can detect the cloud and cloud shadow in the remote sensing image at a pixel level, and can obtain a relatively fine edge.

Referring to fig. 1, fig. 1 is a flowchart of a method for detecting cloud and cloud shadow of a remote sensing image based on a multi-scale feature fusion network according to an embodiment, and includes the following steps:

s10, constructing a training set comprising a plurality of remote sensing image pictures, making labels of all pixel points in the remote sensing image pictures of the training set, enabling the labels to form mask images corresponding to the remote sensing image pictures, and inputting the mask images and the remote sensing image pictures into a supervision network for training.

The training set may include a plurality of remote sensing imagery pictures. In consideration of the video memory of a computer and the calculation speed, the picture (remote sensing image picture) is cut into 256 multiplied by 256 pixels, so that a training set is constructed, and the training speed can be improved. In the label manufacturing process, whether each pixel point is 'cloud', 'cloud shadow' or 'non-cloud' is respectively represented by 255, 128 and 0, and a mask image corresponding to the original image is formed for supervising the training of the network.

And S20, when the loss of the cost function of the supervision network is convergent and tends to be stable, and each index of the training set obtains the highest value, storing the optimal model through 5 epochs and automatically stopping training to obtain the detection model.

In one embodiment, the supervisory network includes a res.block (residual module) to prevent network degradation, a multi-scale convolution module to increase the network's receptive field, and a multi-scale feature module to extract different scale information.

In one example, the experimental parameters of the supervising network while training include: learning rate of 0.0001, batch size of 8, first momentum size beta₁0.9 and a second momentum magnitude beta₂At 0.999, the learning rate is set to 0.0001, and the minimum learning rate is set to 0.0000001.

In this example, the experimental parameter settings are: learning rate of 0.0001, batch size of 8, set momentum size beta₁0.9 and β₂The learning rate was attenuated to 0.999 to prevent overfitting, and was set to0.0001. the learning rate is reduced to one tenth of the original one every 10 epochs of iteration (one epoch after all data sets are iterated), and the minimum learning rate is set to 0.0000001. When the cost function loss is convergent and tends to be stable, and each index (accuracy rate and average cross-over ratio) of the training set and the verification set obtains the highest value, the optimal model is stored through 50 epochs, and the training is automatically stopped.

In one example, the res.block module adds a 1 × 1 convolution before and after a 3 × 3 convolution.

In res.block used in the constructed network model constructed in this example, one convolution of 1 × 1 is added before and after a convolution of 3 × 3, which can effectively reduce network parameters and suppress degradation with the depth of the network.

In one example, the multi-scale convolution module includes 3 convolution kernels, each convolution kernel being 1 × 1, 3 × 3, and 5 × 5 in size, respectively, and each convolution kernel using a dilation rate of 1, 2, and 4, respectively.

The model obtains different sampling characteristics from information extracted by an encoder through different convolution kernels, then performs characteristic fusion with a decoder, and extracts characteristics through three different convolution kernels, so that richer spatial information and semantic information are obtained, and the boundary of a segmentation result can be finer.

In one example, in the training process of the supervision network, the information extracted by the encoder is processed by each convolution kernel to obtain a plurality of sampling features, the plurality of sampling features are subjected to feature fusion with the decoder, and the fusion features are extracted by each convolution kernel.

The example adopts a multi-scale feature module similar to a PPM (pyramid pooling model) to extract multi-scale features, performs down-sampling through four parallel average pooling, obtains feature maps with sizes of 8 × 8, 6 × 6, 2 × 2 and 1 × 1 respectively, and then performs up-sampling on the pooled feature maps to the same size through a bilinear interpolation method to realize the function of multi-scale feature fusion.

Further, in the process of making the label, "cloud", "cloud shadow" and "non-cloud" are respectively represented by 255, 128 and 0, so that when the model is called, 2, 1 and 0 corresponding to each pixel recognized by the model are respectively converted into 255, 128 and 0, that is, the cloud is white, the cloud shadow is gray and the non-cloud is black in the recognized image.

In the above steps, the trained semantic segmentation model (detection model) can be used for performing cloud and cloud shadow detection on the images in the test set.

specifically, the average intersection ratio includes:

the accuracy rate includes:

the recall rate includes:

the accuracy rate comprises:

the blended mean includes:

Compared with the traditional scheme, the remote sensing image cloud and cloud shadow detection method based on the multi-scale feature fusion network has the following technical effects:

(1) the Res.block modules used in the constructed network model respectively increase a convolution of 1 multiplied by 1 before and after a convolution of 3 multiplied by 3, and an activation function used after three parallel convolutions is leak _ ReLU, so that compared with the use of two continuous convolutions of 3 multiplied by 3, the network model can effectively reduce network parameters and can inhibit the phenomenon of degradation along with the deepening of the network depth;

(2) the multi-scale convolution (MCM) module used in the model comprises three convolutions, the sizes of convolution kernels are respectively 1 × 1, 3 × 3 and 5 × 5, the receptive field of a neural network can be increased by expanding the convolutions, the expansion rates used by the three convolutions are respectively 1, 2 and 4, the model obtains different sampling characteristics from information extracted by an encoder through different convolution kernels, then the sampling characteristics are fused with a decoder, and the characteristics are extracted by using the three different convolution kernels, so that richer space information and semantic information are obtained, and the boundary of a segmentation result can be finer;

(3) the multi-scale feature module adopted in the model is similar to a PPM (pyramid pooling model) to extract multi-scale features, down-sampling is carried out through four parallel average pooling, the sizes of obtained feature maps are respectively 8 multiplied by 8, 6 multiplied by 6, 2 multiplied by 2 and 1 multiplied by 1, and then the pooled feature maps are up-sampled to the same size through a bilinear interpolation method so as to realize the function of multi-scale feature fusion.

(4) The cloud and cloud shadow with the complex underlying surface can be well identified, and the robustness is good.

In an embodiment, the method for detecting cloud and cloud shadow of remote sensing images based on the multi-scale feature fusion network can be as shown in fig. 2, firstly, inputting a data set and a corresponding label into a network, wherein the label is used for supervising network training, calculating a probability value of distribution of each pixel class by using a feature input loss function extracted finally, calculating a loss value according to the probability value, and stopping training when the loss value tends to be stable and an mlio u obtains a highest value. And after the network training is finished, calling the trained model, and inputting the model into a classifier to perform pixel-level cloud, cloud shadow and clear sky recognition. In one example, fig. 3 is a res.block block schematic; FIG. 4 is a schematic diagram of a multi-scale convolution module; FIG. 5 is a schematic diagram of a multi-scale feature module.

The data used in the experiment were Landsat8 remote sensing satellite images. Because the data contains limited information, the data is enhanced firstly, and the data set is expanded in modes of cutting, rotating, turning, adding noise and the like. And cutting the enhanced Landsat8 remote sensing data into 256 multiplied by 256 image blocks, wherein 6800 images are obtained, 6000 images are used as a training set, and 800 images are used as a test set. In the process of making the label image, the label image and the training data are subjected to the same manufacturing operations of rotation, overturning and the like so as to realize supervision training on each pixel, and gray values of 0, 128 and 255 are respectively used for representing a clear sky pixel, a cloud shadow pixel and a cloud pixel, namely three labels corresponding to three categories.

In the obtained deep semantic segmentation network model (detection), the overall framework of the multi-feature fusion segmentation network comprises four Res.block modules, four MCM modules, an MFM module and a decoder. The input of the network is 256 × 256 × N (the first two numbers are the dimensionality of data, N is the number of channels of data, e.g., RGB image N is 3) Landsat8 remote sensing image. Two convolution kernels with step size 1, size 3 × 3 and 1 × 1 are used in extracting image features, and an activation function ReLU is used after each convolution. The average pooling with a pooling step of 2 and a filter size of 2 is used in the network. The last layer of the encoder is input into an MFM module, the dimensions of four downsampling features passing through the MFM are reduced to 1/4 when the MFM is input through convolution of 1 multiplied by 1, then upsampling is carried out through a bilinear interpolation method, the sizes of four feature mapping graphs are the same, and then feature fusion is carried out on the output of the last Res.block module and the output feature graph of the MFM module, so that multi-scale context information is obtained. The output of each Res.block module passes through an MCM module to obtain the characteristics after multi-scale convolution, so that the cloud and cloud shadow boundary information can be more completely extracted. The feature graph obtained by the encoder module is subjected to up-sampling through bilinear interpolation, then is subjected to dimensionality reduction through 1 x 1 convolution, and then is subjected to feature fusion with the output feature of the MCM module. And finally, performing feature fusion on the output of each layer of the decoder through rich jump connection, and inputting the fused features into a classifier for 3-class classification. The end-to-end pixel-to-pixel semantic segmentation function can be realized, namely the input data and the output characteristic diagram are the same in size, and each pixel in the image is divided into a cloud pixel, a cloud shadow pixel and a clear sky pixel.

As the algorithm needs to solve the problem of multi-classification, a Softmax function is used when the probability value of the distribution of each pixel class is judged, a loss value is calculated by using the Softmax function, and a loss function J (theta) formula is defined as

In the formula, y_iIs asThe identification labels have K different values, and the remote sensing image is divided into cloud, cloud shadow and clear sky, which belong to the three classification problems, so that K is 3, h_iObserving vector H ═ H for input image pixel₁,h₂,h₃...h_mThe element in the equation, theta is a model parameter, m is the pixel number of the image, and 1 is an explicit function.

In the visual comparison of the detection of Landsat8 remote sensing image cloud and cloud shadow by the algorithm of the embodiment, the K-means algorithm and the reset algorithm, it can be found that more accumulated clouds in the original image accompany with the cloud shadow, the shape is irregular, and partial high-brightness objects exist in the underlying surface, so that the detection result is easily interfered. The K-means algorithm is easily interfered by a highlight object on the underlying surface, the highlight object is falsely detected as cloud, the cloud and cloud shadow edge detection is not fine enough, and the detail loss is serious in the cloud shadow detection process; the resuret algorithm is susceptible to the influence of a small amount of highlight ground objects, and partial details of cloud shadow are lost; the algorithm can effectively detect the cloud and the cloud shadow, has a fine detection result on the edge of the cloud and the cloud shadow, is not easily interfered by a white object on an underlying surface, can obtain a better detection result on a fine cloud area, and has a lower final false detection rate. The algorithm can extract more effective features, so that the visual detection result is closer to a true value.

In the visual comparison process of the algorithm of the embodiment with the K-means algorithm and the reset algorithm for the detection of Landsat8 and cloud shadow, the original image contains a large amount of ice/snow which easily interferes with the detection of a cloud area, and the detection difficulty of cloud and cloud shadow is increased. The K-means algorithm cannot distinguish ice/snow from clouds, and details in cloud shadow results are seriously lost; the resurset can better distinguish the cloud from the ice/snow, but the cloud and cloud shadow detection result has smooth edges and more lost details, and partial cloud shadow false detection and a small amount of thin cloud missing detection exist in the image. The algorithm can effectively distinguish the cloud from the ice/snow, is not easy to be interfered, obtains a better result for detecting the cloud shadow, and is generally superior to a K-means algorithm and a reset algorithm.

In the process of visually comparing the algorithm of the embodiment with the K-means algorithm and the reset to the cloud shadow detection of the Landsat8 image, it can be known that the K-means algorithm has a large error in the result of the cloud shadow detection, and the details are seriously lost; the algorithm of the embodiment obtains a better result for the detection of the cloud shadow, the detection result is obviously superior to the K-means algorithm and the reset algorithm, the edge details of the cloud shadow can be accurately detected, and the edge details are closer to the true value visually.

Table 1 is an average value of quantitative comparison indexes of cloud detection of Landsat8 remote sensing images by three algorithms. As can be seen from Table 1, the K-means algorithm is easily interfered by the objects on the underlying surface, and all detection indexes are the lowest; compared with a reset algorithm, the MFFN has obvious advantages for cloud detection, and each index is far higher than that of the reset algorithm, so that the MFFN is more suitable for cloud detection. In the process of extracting information, the MFFN algorithm uses multi-scale sampling, hole convolution, residual modules and rich jump connection to ensure that less useful information is lost, and a network structure is richer than spatial information and semantic information fused by a reset structure.

TABLE 1

Table 2 is an average value of the cloud shadow detection quantitative evaluation indexes of the Landsat8 remote sensing image by the three algorithms. As can be seen from table 2, the MFFN algorithm can obtain a better result for detecting cloud shadows, which proves that the model can well extract cloud and cloud shadow information; due to the lack of multi-scale feature extraction in the reset algorithm and the fact that multi-scale convolution is not used in the process of jump connection, although feature fusion exists, a large amount of information is still lost, and all indexes of cloud shadow detection are lower than those of the MFFN algorithm. Compared with the two algorithms, the cloud shadow detection indexes are lower than those of the cloud shadow detection, the cloud shadow learning of the model is insufficient due to relatively less cloud shadow information in the data set, and meanwhile, the cloud shadow is not irregular relative to the cloud and is easily interfered by black objects (such as shadows of buildings, trees, oceans and the like) on the underlying surface, so that the cloud shadow detection result is not ideal.

TABLE 2

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

It should be noted that the terms "first \ second \ third" referred to in the embodiments of the present application merely distinguish similar objects, and do not represent a specific ordering for the objects, and it should be understood that "first \ second \ third" may exchange a specific order or sequence when allowed. It should be understood that "first \ second \ third" distinct objects may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be implemented in an order other than those illustrated or described herein.

The terms "comprising" and "having" and any variations thereof in the embodiments of the present application are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules but may alternatively include other steps or modules not listed or inherent to such process, method, product, or device.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A remote sensing image cloud and cloud shadow detection method based on a multi-scale feature fusion network is characterized by comprising the following steps:

2. The method for detecting the cloud and cloud shadow of the remote sensing image based on the multi-scale feature fusion network as claimed in claim 1, wherein the supervision network comprises a Res.block module for preventing network degradation, a multi-scale convolution module for increasing a network reception field, and a multi-scale feature module for extracting different scale information.

3. The method for detecting cloud and cloud shadow of remote sensing images based on the multi-scale feature fusion network as claimed in claim 2, wherein experimental parameters of the supervision network during training comprise: learning rate of 0.0001, batch size of 8, first momentum size beta₁0.9 and a second momentum magnitude beta₂At 0.999, the learning rate is set to 0.0001, and the minimum learning rate is set to 0.0000001.

4. The method for detecting the cloud and cloud shadow of the remote sensing image based on the multi-scale feature fusion network as claimed in claim 2, wherein a convolution of 1 x 1 is respectively added before and after a convolution of 3 x 3 in the Res.

5. The method for detecting the cloud and the cloud shadow of the remote sensing image based on the multi-scale feature fusion network as claimed in claim 2, wherein the multi-scale convolution module comprises 3 convolution kernels, the sizes of the convolution kernels are 1 x 1, 3 x 3 and 5 x 5 respectively, and the expansion rates used by the convolution kernels are 1, 2 and 4 respectively.

6. The method for detecting the cloud and the cloud shadow of the remote sensing image based on the multi-scale feature fusion network as claimed in claim 4, wherein in the training process of the supervision network, the information extracted by the encoder is processed by each convolution kernel to obtain a plurality of sampling features, the plurality of sampling features are subjected to feature fusion with the decoder, and the features after fusion are extracted by each convolution kernel.

7. The method for detecting the cloud and cloud shadow of the remote sensing image based on the multi-scale feature fusion network according to any one of claims 1 to 6, further comprising:

and obtaining an average intersection ratio, an accuracy rate, a recall rate, an accuracy rate and a harmonic mean value generated in the detection process of each image to be detected of the detection model, and determining the detection performance of the detection model according to the average intersection ratio, the accuracy rate, the recall rate, the accuracy rate and the harmonic mean value.

8. The method for detecting cloud and cloud shadow of remote sensing image based on multi-scale feature fusion network according to claim 7,

the average intersection ratio comprises:

the accuracy rate includes:

the recall rate includes:

the accuracy rate comprises:

the blended mean includes: