CN115830471A - Multi-scale feature fusion and alignment domain-adaptive cloud detection method - Google Patents

Multi-scale feature fusion and alignment domain-adaptive cloud detection method

Info

Publication number
CN115830471A
Authority
CN
China
Prior art keywords
domain
output
alignment
remote sensing
adaptive
Prior art date
2023-01-04
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310007711.0A
Other languages
Chinese (zh)
Other versions
CN115830471B (en)
Inventor
徐凯 (Xu Kai)
王文昕 (Wang Wenxin)
张飞翔 (Zhang Feixiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2023-01-04
Publication date
2023-03-21
Application filed by Anhui University
Priority to CN202310007711.0A
Publication of CN115830471A
Application granted
Publication of CN115830471B
Legal status: Active (current)

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A — TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 — Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a multi-scale feature fusion and alignment domain-adaptive cloud detection method, which comprises the following steps: preparing for multi-scale feature fusion and alignment domain-adaptive cloud detection, constructing and training the multi-scale feature fusion and alignment domain-adaptive cloud detection model, and testing and solving the model. Compared with the prior art, the invention constructs a domain-adaptive network for multi-scale feature fusion and alignment and provides a multi-scale feature fusion module and a feature alignment module. The method addresses two problems of existing cloud detection technology: fragmented clouds and cloud boundaries are difficult to detect, and the domain gap between the source-domain and target-domain datasets makes a model trained on the source difficult to generalize to the target dataset; the feature alignment module encourages the cloud detection network to produce domain-invariant features for both datasets.

Description

Multi-scale feature fusion and alignment domain-adaptive cloud detection method
Technical Field
The invention relates to the technical field of optical remote sensing image processing, and in particular to a multi-scale feature fusion and alignment domain-adaptive cloud detection method.
Background
Clouds are a natural atmospheric phenomenon and appear frequently in remote sensing images acquired by optical satellites. In the meteorological field, cloud detection serves as a preprocessing step for inverting various atmospheric and surface parameters, and the detected cloud distribution directly influences the inversion of those parameters. Because cloud is an important meteorological and climatic element, its distribution not only helps reveal hazardous weather such as rainstorms, hurricanes and tornadoes, but also allows tracking changes in meteorological conditions. For Earth-surface observation tasks, roughly 60% of the Earth's surface is covered by cloud, so optical remote sensing images are often contaminated by cloud layers; this distorts the spectra of the underlying objects, hampers interpretation of the images and derived products, and interferes with information extraction. It is therefore important to improve remote sensing image quality through cloud detection.
Cloud detection aims to identify and distinguish cloud pixels from non-cloud pixels in a remote sensing image. Most existing deep models (especially deep neural networks) are built on receptive fields of a fixed scale: they can detect most large clouds with clear boundaries, but smaller structures, such as small-scale clouds, are often missed. Moreover, the diversity of cloud scenes and the variability of cloud make cloud detection challenging. In highly mixed scenes where fragmented clouds cover heterogeneous land, it is difficult to detect and segment broken clouds and their boundaries across different sizes and scales. Furthermore, cloud shape changes dramatically with the environment, making complex cloud boundaries even harder to capture.
In addition, most current cloud detection methods based on Convolutional Neural Networks (CNNs) are built on supervised learning frameworks that require a large number of pixel-level labels. However, manually annotating pixel labels for large remote sensing images is both expensive and time-consuming. Unsupervised Domain Adaptation (UDA), which generalizes a model trained on labeled images from a source satellite to unlabeled images from a target satellite, removes the need for large amounts of training labels, but domain shift across satellite images remains a problem during adaptation.
Disclosure of Invention
The invention aims to overcome two defects of the prior art, namely domain shift in cross-satellite cloud detection and poor detection of fragmented clouds and cloud boundaries, and provides a multi-scale feature fusion and alignment domain-adaptive cloud detection method to solve these problems.
In order to achieve the purpose, the technical scheme of the invention is as follows:
A multi-scale feature fusion and alignment domain-adaptive cloud detection method comprises the following steps:
11) Preparation for multi-scale feature fusion and alignment domain-adaptive cloud detection: classify the source-domain and target-domain remote sensing satellite images by cloud content and select image data from each percentage bracket in a fixed proportion; preprocess the source-domain and target-domain images by band merging and cropping; normalize the source-domain image labels;
12) Construct and train the multi-scale feature fusion and alignment domain-adaptive cloud detection model: build the model, then input the preprocessed remote sensing satellite images and labels into it for training to obtain the trained model;
13) Test and solve the multi-scale feature fusion and alignment domain-adaptive cloud detection model: input the large number of unlabeled target-domain remote sensing satellite images into the trained model for testing and obtain the cloud detection prediction segmentation result.
The preparation for multi-scale feature fusion and alignment domain-adaptive cloud detection comprises the following steps:
21) Merge the B4, B3 and B2 bands of the source-domain and target-domain remote sensing satellite images into RGB three-channel images;
22) Crop the source-domain images, their labels and the target-domain images into non-overlapping patches of fixed size 321 × 321;
23) Normalize the pixel values of the source-domain image labels: values 0, 64 and 128 are mapped to 0 (clean pixels) and values 192 and 256 are mapped to 1 (cloud pixels), yielding binary single-channel label images;
24) Export the processed images in tif format;
25) Select the processed images evenly across cloud-content percentages to keep positive and negative samples balanced.
Constructing and training the multi-scale feature fusion and alignment domain-adaptive cloud detection model comprises the following steps:
31) The specific steps for constructing the model are as follows:
311) Construct an encoder structure for extracting abstract features; the encoder comprises four down-sampling blocks built from an ordinary 3×3 convolution layer, a batch normalization layer, a ReLU activation unit and the first four stages of a ResNet-34;
312) Construct a multi-scale fusion module for generating fused multi-scale features; the module comprises four atrous (dilated) convolution layers with different dilation rates and one image pooling operation. The four dilation rates are 1, 6, 12 and 18, producing features with different receptive fields, and the image pooling branch produces global channel-wise features; the five parallel outputs are concatenated and passed through one 1×1 convolution to obtain the multi-scale features;
313) Construct a decoder structure for recovering the cloud mask from features of different scales; the decoder comprises four up-sampling blocks built from an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit;
314) Construct a skip connection structure for combining shallow spatial information with deep semantic information at different scales; the skip connection concatenates two inputs into one output;
315) Construct a domain-adaptive framework structure for reducing the dependence on labeled data; in this framework, cloud images from two different satellites are input into a shared cloud detection model for segmentation;
316) Construct a feature selection structure for guiding class-related feature selection; the structure uses the prediction score map of the segmentation network to obtain a spatial attention map for grouped alignment;
317) Construct a grouped feature alignment structure for narrowing the feature distribution gap between the source and target domains and obtaining a domain-invariant feature representation; the structure divides the cloud-related feature maps extracted from the k-th hidden layer of the network into a series of groups, and an adversarial learning strategy is applied to each group of source-domain and target-domain features;
32) The specific steps for training the model are as follows:
321) Input the preprocessed source-domain remote sensing satellite images with their labels and the target-domain images into the cloud detection model;
322) Obtain the segmentation probability by forward propagation;
323) Compute the segmentation loss from the segmentation probability, using binary cross-entropy (BCE) loss as the loss function of the network model;
324) Compute the adversarial loss from the source-domain and target-domain discrimination results, using BCE loss as the loss function of the adversarial part of the grouped feature alignment structure;
325) Compute the alignment loss from the source-domain and target-domain discrimination results, using MAE loss as the loss function of the grouped feature alignment structure, so as to minimize the distance between the two domains;
326) Determine the gradient vectors by back propagation and update the model parameters;
327) Repeat the above process until the set number of iterations is reached or the average error loss no longer decreases; training is then complete and the domain-adaptive cloud detection model is obtained.
Testing and solving the multi-scale feature fusion and alignment domain-adaptive cloud detection model comprises the following steps:
41) Read the target-domain remote sensing satellite image and convert it into a tif-format image;
42) Merge the B4, B3 and B2 bands of the target-domain image into an RGB three-channel image;
43) Crop the target-domain image into non-overlapping patches of fixed size 321 × 321;
44) Input the preprocessed images into the trained multi-scale feature fusion and alignment domain-adaptive cloud detection model for cloud pixel detection;
45) Obtain the segmented cloud mask.
Advantageous effects
The invention provides a multi-scale feature fusion and alignment domain-adaptive cloud detection method built on a multi-scale feature fusion and alignment domain-adaptive network. In the domain-adaptive framework, cloud images from two different satellites share one cloud detection model for segmentation, which alleviates the shortage of labels for target-domain satellite images. A multi-scale feature fusion module is added to the segmentation network: convolutions with different dilation rates yield multi-scale receptive fields that capture clouds of different scales, producing a multi-scale representation of the cloudy image and improving the detection accuracy of cloud boundaries at different scales. In addition, in cross-satellite cloud detection, images acquired by different satellite sensors exhibit domain differences such as spectral and resolution gaps, which make a source-trained model hard to generalize to a target dataset. The feature alignment module in the constructed model encourages the cloud detection network to generate domain-invariant features for the source and target datasets.
Drawings
FIG. 1 is a flow chart of the multi-scale feature fusion and alignment domain-adaptive cloud detection method;
FIG. 2 is a diagram of the shared network model of the method;
FIG. 3 is a diagram of the domain-adaptive framework of the method;
FIG. 4 compares GF-2 satellite images and their labels with the segmentation results of the method.
Detailed Description
For a better understanding and appreciation of the structural features and advantages achieved by the present invention, reference will be made to the following detailed description of preferred embodiments thereof, in conjunction with the accompanying drawings, in which:
as shown in fig. 1, the multi-scale feature fusion and alignment domain-adaptive cloud detection method according to the present invention includes the following steps:
First step, preparation for multi-scale feature fusion and alignment domain-adaptive cloud detection:
classify the source-domain and target-domain remote sensing satellite images by cloud content and select image data from each percentage bracket in a fixed proportion; preprocess the source-domain and target-domain images by band merging and cropping; normalize the source-domain image labels. These steps let the model converge quickly and stably and improve segmentation accuracy. The specific steps are as follows (a preprocessing sketch follows this list):
(1) Merge the B4, B3 and B2 bands of the source-domain and target-domain remote sensing satellite images into RGB three-channel images;
(2) Crop the source-domain images, their labels and the target-domain images into non-overlapping patches of fixed size 321 × 321;
(3) Normalize the pixel values of the source-domain image labels: values 0, 64 and 128 are mapped to 0 (clean pixels) and values 192 and 256 are mapped to 1 (cloud pixels), yielding binary single-channel label images;
(4) Export the processed images in tif format;
(5) Select the training set images evenly across cloud-content percentages to keep positive and negative samples balanced.
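Below is a minimal sketch of steps (1) to (3), assuming the B4/B3/B2 bands have already been read into NumPy arrays (for example with rasterio or GDAL); the function names `merge_bands`, `crop_tiles` and `binarize_label` are illustrative, not part of the patent:

```python
import numpy as np

def merge_bands(b4: np.ndarray, b3: np.ndarray, b2: np.ndarray) -> np.ndarray:
    """Stack the B4/B3/B2 bands into an RGB three-channel image of shape (H, W, 3)."""
    return np.stack([b4, b3, b2], axis=-1)

def crop_tiles(image: np.ndarray, size: int = 321):
    """Cut an image into non-overlapping size x size patches, discarding edge remainders."""
    h, w = image.shape[:2]
    return [image[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

def binarize_label(label: np.ndarray) -> np.ndarray:
    """Map label values {0, 64, 128} -> 0 (clean) and {192, 256} -> 1 (cloud)."""
    return (label >= 192).astype(np.uint8)
```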
Second step, construct and train the multi-scale feature fusion and alignment domain-adaptive cloud detection model:
Construct the model, then input the preprocessed source-domain remote sensing satellite images, the label data and the target-domain images into it to obtain the trained cloud detection network model. As shown in fig. 2, the domain-adaptive framework lets cloud images from two different satellites share one cloud detection model for segmentation, and the multi-scale feature fusion module added to the segmentation network yields a multi-scale representation of the cloudy image, improving the detection accuracy of cloud boundaries at different scales. In addition, as shown in fig. 3, the constructed feature alignment module encourages the cloud detection network to generate domain-invariant features for the source and target datasets, eliminating domain differences, such as spectral and resolution gaps, between images acquired by different satellite sensors.
The method comprises the following specific steps:
(1) The specific steps for constructing the multi-scale feature fusion and alignment domain-adaptive cloud detection model are as follows:
(1-1) Construct an encoder structure for extracting abstract features; the encoder comprises four down-sampling blocks built from an ordinary 3×3 convolution layer, a batch normalization layer, a ReLU activation unit and the first four stages of a ResNet-34;
(1-2) Construct a multi-scale fusion module for generating fused multi-scale features (a sketch of this module follows the list). The module comprises four atrous (dilated) convolution layers with different dilation rates and one image pooling operation. The four dilation rates are 1, 6, 12 and 18, producing features with different receptive fields, and the image pooling branch produces global channel-wise features. The five parallel outputs are concatenated and passed through one 1×1 convolution to obtain the multi-scale features;
(1-3) Construct a decoder structure for recovering the cloud mask from features of different scales; the decoder comprises four up-sampling blocks built from an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit;
(1-4) Construct a skip connection structure for combining shallow spatial information with deep semantic information at different scales; the skip connection concatenates two inputs into one output;
(1-5) Construct a domain-adaptive framework structure for reducing the dependence on labeled data; in this framework, cloud images from two different satellites are input into a shared cloud detection model for segmentation;
(1-6) Construct a feature selection structure for guiding class-related feature selection; the structure uses the prediction score map of the segmentation network to obtain a spatial attention map for grouped alignment;
(1-7) Construct a grouped feature alignment structure for narrowing the feature distribution gap between the source and target domains and obtaining a domain-invariant feature representation; the structure divides the cloud-related feature maps extracted from the k-th hidden layer of the network into a series of groups, and an adversarial learning strategy is applied to each group of source-domain and target-domain features.
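The fusion module of (1-2) follows the ASPP pattern of DeepLab. Below is a minimal PyTorch sketch under that assumption; the class name `MultiScaleFusion` and the 256-channel width are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Four parallel atrous 3x3 convolutions (dilation rates 1, 6, 12, 18) plus an
    image pooling branch; the five outputs are concatenated and fused by a 1x1 conv."""

    def __init__(self, in_ch: int, out_ch: int = 256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in (1, 6, 12, 18)
        ])
        self.image_pool = nn.Sequential(      # global context branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
        )
        self.fuse = nn.Conv2d(5 * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode='bilinear', align_corners=False)
        return self.fuse(torch.cat(feats + [pooled], dim=1))
```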
The model is assembled as follows. The original source-domain and target-domain images first pass through the encoder structure and then once through the multi-scale feature fusion structure, whose output is fed to the decoder structure. The fourth down-sampling of the encoder is connected to the first up-sampling of the decoder through a skip connection; likewise, the third down-sampling is connected to the second up-sampling, the second down-sampling to the third up-sampling, and the first down-sampling to the fourth up-sampling. After skip connection and concatenation, the last two layers of source-domain output and target-domain output pass through the feature alignment module, and a small ordinary convolution structure completes the whole model. An encoder skeleton in this style is sketched below.
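As one illustration of this assembly, the encoder stages can be taken from a torchvision `resnet34` (torchvision 0.13+ `weights` API). The stride of the initial 3×3 convolution is an assumption, and the decoder, fusion and alignment parts are omitted for brevity:

```python
import torch.nn as nn
from torchvision.models import resnet34

class Encoder(nn.Module):
    """Four down-sampling stages: an initial 3x3 conv-BN-ReLU block followed by
    the four residual stages (layer1..layer4) of a ResNet-34."""

    def __init__(self):
        super().__init__()
        backbone = resnet34(weights=None)
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),  # stride assumed
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        self.stage1 = backbone.layer1   # first down-sampling output (with the stem)
        self.stage2 = backbone.layer2   # second down-sampling output
        self.stage3 = backbone.layer3   # third down-sampling output
        self.stage4 = backbone.layer4   # fourth down-sampling output

    def forward(self, x):
        d1 = self.stage1(self.stem(x))
        d2 = self.stage2(d1)
        d3 = self.stage3(d2)
        d4 = self.stage4(d3)
        return d1, d2, d3, d4           # all four kept for the skip connections
```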
(2) The specific steps for training the multi-scale feature fusion and alignment domain-adaptive cloud detection model are as follows:
(2-1) Input the preprocessed source-domain remote sensing satellite images, their labels and the target-domain images into the cloud detection model;
(2-2) Execute the encoder structure once to obtain four down-sampling outputs;
(2-2-1) Execute an ordinary convolution layer with a 3×3 kernel, a batch normalization layer, a ReLU activation unit and layer1 of the ResNet-34 to obtain the first down-sampling output;
(2-2-2) Execute layer2 of the ResNet-34 on the first down-sampling output to obtain the second down-sampling output;
(2-2-3) Execute layer3 of the ResNet-34 on the second down-sampling output to obtain the third down-sampling output;
(2-2-4) Execute layer4 of the ResNet-34 on the third down-sampling output to obtain the fourth down-sampling output;
(2-3) Execute the multi-scale feature fusion layer once on the fourth down-sampling output; the outputs of the fusion layer are as follows:
(2-3-1) The first path executes an atrous convolution layer with dilation rate 1 and a 3×3 kernel on the fourth down-sampling output of the encoder to obtain the first output;
(2-3-2) The second path executes an atrous convolution layer with dilation rate 6 and a 3×3 kernel on the fourth down-sampling output to obtain the second output;
(2-3-3) The third path executes an atrous convolution layer with dilation rate 12 and a 3×3 kernel on the fourth down-sampling output to obtain the third output;
(2-3-4) The fourth path executes an atrous convolution layer with dilation rate 18 and a 3×3 kernel on the fourth down-sampling output to obtain the fourth output;
(2-3-5) The fifth path executes one global average pooling on the fourth down-sampling output followed by one ordinary convolution layer with a 1×1 kernel to obtain the fifth output;
(2-4) Concatenate the results of the five paths to obtain the multi-scale fusion feature output;
(2-5) Execute once an ordinary convolution layer with a 1×1 kernel on the multi-scale fusion feature output;
(2-6) Execute three times an up-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit to obtain the first up-sampling output;
(2-7) Concatenate the first up-sampling output with the fourth down-sampling output through a skip connection;
(2-8) Execute three times on the output of the first concatenation an up-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit to obtain the second up-sampling output;
(2-9) Concatenate the second up-sampling output with the third down-sampling output through a skip connection;
(2-10) Execute three times on the output of the second concatenation an up-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit to obtain the third up-sampling output;
(2-11) Concatenate the third up-sampling output with the second down-sampling output through a skip connection;
(2-12) Execute three times on the output of the third concatenation an up-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit to obtain the fourth up-sampling output;
(2-13) Concatenate the fourth up-sampling output with the first down-sampling output through a skip connection;
(2-14) Execute four times on the output of the fourth concatenation a down-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer, a ReLU activation unit and a 2×2 max-pooling down-sampling;
(2-15) Execute four times an up-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer, a ReLU activation unit and an up-sampling operation that restores the spatial dimensions;
(2-16) Execute once an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit;
(2-17) Concatenate the result of this final ordinary 3×3 convolution layer with the output of the fourth skip-connection concatenation;
(2-18) Perform forward propagation to obtain the final segmentation probability;
(2-19) Execute the feature selection structure once: multiply the outputs of the third and fourth concatenations of the source domain and the target domain pixel-wise with the final segmentation probability, and split the result into 4 sub-feature maps along the channel dimension;
(2-20) Execute the grouped feature alignment structure once, feeding the 4 sub-feature maps of the two domains in groups to obtain the discrimination results;
(2-20-1) Execute four times on the source-domain sub-feature maps a block consisting of one ordinary convolution layer with a 4×4 kernel, one ReLU activation unit and one dropout layer with probability 0.5;
(2-20-2) Execute one adaptive global average pooling with output size 1×1 to obtain the source-domain discrimination result;
(2-20-3) Execute four times on the target-domain sub-feature maps a block consisting of one ordinary convolution layer with a 4×4 kernel, one ReLU activation unit and one dropout layer with probability 0.5;
(2-20-4) Execute one adaptive global average pooling with output size 1×1 to obtain the target-domain discrimination result;
(2-20-5) Perform forward propagation to obtain the discrimination results of the source domain and the target domain;
(2-21) Compute the adversarial loss from the source-domain and target-domain discrimination results, using binary cross-entropy (BCE) loss as the loss function of the adversarial part of the grouped feature alignment structure;
(2-22) Compute the alignment loss from the source-domain and target-domain discrimination results, using MAE loss as the loss function of the grouped feature alignment structure, so as to minimize the distance between the two domains;
(2-23) Compute the segmentation loss from the final segmentation probability, using BCE loss as the loss function of the network model;
(2-24) Determine the gradient vectors by back propagation and update the model parameters;
(2-25) Check whether the set number of epochs has been reached; if so, the trained segmentation model is obtained, otherwise return to step (2-1), reload the data and continue training. A sketch of one training iteration follows.
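The patent specifies the three losses but not their weighting, the optimizer, the discriminator strides, or the scoring head, so the sketch below is one plausible PyTorch reading of steps (2-19) to (2-25). The names `GroupDiscriminator`, `feature_select` and `train_step` are illustrative, and the segmentation model is assumed to return a sigmoid probability map together with the four selected feature groups:

```python
import torch
import torch.nn as nn

class GroupDiscriminator(nn.Module):
    """Per-group discriminator: four blocks of (4x4 conv, ReLU, dropout 0.5),
    then adaptive global average pooling to 1x1. The stride-2 convs and the
    final 1x1 scoring conv are assumptions, not stated in the patent."""

    def __init__(self, in_ch: int, base: int = 64):
        super().__init__()
        blocks, ch = [], in_ch
        for i in range(4):
            out = base * 2 ** i
            blocks += [nn.Conv2d(ch, out, kernel_size=4, stride=2, padding=1),
                       nn.ReLU(inplace=True), nn.Dropout(0.5)]
            ch = out
        self.blocks = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Conv2d(ch, 1, kernel_size=1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.pool(self.blocks(x)))).flatten(1)

def feature_select(feat, prob, groups=4):
    """Feature selection: weight the feature maps pixel-wise by the predicted
    score map, then split them into `groups` sub-feature maps channel-wise."""
    return torch.chunk(feat * prob, groups, dim=1)

bce, mae = nn.BCELoss(), nn.L1Loss()   # MAE loss is L1

def train_step(model, discs, opt_seg, opt_disc, src_img, src_lbl, tgt_img):
    """One iteration: segmentation BCE on the source domain, adversarial BCE
    for the discriminators, MAE alignment pulling the domains together."""
    src_prob, src_groups = model(src_img)   # assumed model outputs
    _, tgt_groups = model(tgt_img)
    seg_loss = bce(src_prob, src_lbl.float())
    # Adversarial loss: the discriminators learn source -> 1, target -> 0.
    adv_loss = 0.0
    for d, fs, ft in zip(discs, src_groups, tgt_groups):
        ps, pt = d(fs.detach()), d(ft.detach())
        adv_loss = adv_loss + bce(ps, torch.ones_like(ps)) \
                            + bce(pt, torch.zeros_like(pt))
    opt_disc.zero_grad(); adv_loss.backward(); opt_disc.step()
    # Alignment loss: minimize the distance between the domains' scores.
    align_loss = sum(mae(d(fs), d(ft))
                     for d, fs, ft in zip(discs, src_groups, tgt_groups))
    opt_seg.zero_grad()
    (seg_loss + align_loss).backward()   # equal loss weights are an assumption
    opt_seg.step()
    return seg_loss.item(), adv_loss.item(), align_loss.item()
```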
Third step, test and solve the multi-scale feature fusion and alignment domain-adaptive cloud detection model:
Obtain the target-domain remote sensing satellite image to be detected and input it into the trained multi-scale feature fusion and alignment domain-adaptive cloud detection model for model testing to obtain the cloud detection prediction segmentation mask. The specific steps are as follows (an inference sketch follows the list):
(1) Read the target-domain remote sensing satellite image and export it as a tif-format image;
(2) Merge the B4, B3 and B2 bands of the target-domain image into an RGB three-channel image;
(3) Crop the target-domain image into non-overlapping patches of fixed size 321 × 321;
(4) Input the preprocessed images into the trained multi-scale feature fusion and alignment domain-adaptive cloud detection model for cloud pixel detection;
(5) Obtain the segmented cloud mask.
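A sketch of this test-time flow, with the same non-overlapping tiling as the preprocessing sketch above; `predict_mask` is an illustrative name and the 1/255 input scaling is an assumed normalization:

```python
import numpy as np
import torch

@torch.no_grad()
def predict_mask(model, rgb: np.ndarray, size: int = 321, thr: float = 0.5) -> np.ndarray:
    """Cut the RGB scene into non-overlapping size x size tiles, run the model
    on each tile, and stitch the binary cloud masks into one full-scene mask."""
    model.eval()
    h, w = rgb.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            tile = rgb[r:r + size, c:c + size].astype(np.float32) / 255.0
            x = torch.from_numpy(tile).permute(2, 0, 1).unsqueeze(0)  # HWC -> NCHW
            out = model(x)
            prob = out[0] if isinstance(out, tuple) else out  # probability map
            mask[r:r + size, c:c + size] = (prob.squeeze().numpy() > thr).astype(np.uint8)
    return mask
```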
The method of the present invention is illustrated below using LANDSAT8 and GF-2 satellite images:
LANDSAT8 satellite images serve as the source-domain data and GF-2 satellite images as the target-domain data. 50 LANDSAT8 images covering different scenes, with their labels, and 34 GF-2 images were selected and preprocessed (band merging, cropping and selection) to obtain a source-domain dataset of 5,040 images of size 321 × 321 and a corresponding target-domain dataset; the LANDSAT8 image labels were normalized. The method described above was used to train the domain-adaptive cloud detection model, constructing the multi-scale feature fusion and alignment domain-adaptive network with its multi-scale feature fusion module and feature alignment module. Compared with existing cloud detection technology, this resolves the difficulty of detecting fragmented clouds and boundaries and the difficulty of generalizing a source-trained model to the target dataset caused by domain differences.
As shown in fig. 4, which compares GF-2 satellite images and their labels with the segmentation results of the model of the present invention, A is a GF-2 satellite image, B is its label, and C is the cloud segmentation result of the method. As can be seen from FIG. 4, the segmentation result obtained by the method closely matches the label, which meets the needs of practical application.
The foregoing shows and describes the general principles, essential features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above, which merely illustrate its principles; various changes and modifications may be made without departing from the spirit of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (4)

1. A multi-scale feature fusion and alignment domain-adaptive cloud detection method, characterized by comprising the following steps:
11) preparation for multi-scale feature fusion and alignment domain-adaptive cloud detection: classifying the source-domain and target-domain remote sensing satellite images by cloud content and selecting image data from each percentage bracket in a fixed proportion; preprocessing the source-domain and target-domain images by band merging and cropping; normalizing the source-domain image labels;
12) constructing and training the multi-scale feature fusion and alignment domain-adaptive cloud detection model: building the model, then inputting the preprocessed remote sensing images and labels into it for training to obtain the trained model;
13) testing and solving the multi-scale feature fusion and alignment domain-adaptive cloud detection model: inputting the large number of unlabeled target-domain remote sensing satellite images into the trained model for model testing and obtaining the cloud detection prediction segmentation result.
2. The method of claim 1, wherein the preparation for multi-scale feature fusion and alignment domain-adaptive cloud detection comprises the following steps:
21) merging the B4, B3 and B2 bands of the source-domain and target-domain remote sensing satellite images into RGB three-channel images;
22) cropping the source-domain images, their labels and the target-domain images into non-overlapping patches of fixed size 321 × 321;
23) normalizing the pixel values of the source-domain image labels: values 0, 64 and 128 are mapped to 0 (clean pixels) and values 192 and 256 are mapped to 1 (cloud pixels), yielding binary single-channel label images;
24) exporting the processed images in tif format;
25) selecting the processed images evenly across cloud-content percentages to keep positive and negative samples balanced.
3. The multi-scale feature fusion and alignment domain-adaptive cloud detection method according to claim 1, wherein constructing and training the model comprises the following steps:
31) the specific steps for constructing the model are as follows:
311) constructing an encoder structure for extracting abstract features, the encoder comprising four down-sampling blocks built from an ordinary 3×3 convolution layer, a batch normalization layer, a ReLU activation unit and the first four stages of a ResNet-34;
312) constructing a multi-scale fusion module for generating fused multi-scale features;
the multi-scale fusion module comprises four atrous (dilated) convolution layers with different dilation rates and one image pooling operation;
the four dilation rates are 1, 6, 12 and 18, producing features with different receptive fields;
the image pooling branch produces global channel-wise features;
the five parallel outputs are concatenated and passed through one 1×1 convolution to obtain the multi-scale features;
313) constructing a decoder structure for recovering the cloud mask from features of different scales, the decoder comprising four up-sampling blocks built from an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit;
314) constructing a skip connection structure for combining shallow spatial information with deep semantic information at different scales, the skip connection concatenating two inputs into one output;
315) constructing a domain-adaptive framework structure for reducing the dependence on labeled data, in which cloud images from two different satellites are input into a shared cloud detection model for segmentation;
316) constructing a feature selection structure for guiding class-related feature selection, the structure using the prediction score map of the segmentation network to obtain a spatial attention map for grouped alignment;
317) constructing a grouped feature alignment structure for narrowing the feature distribution gap between the source and target domains and obtaining a domain-invariant feature representation, the structure dividing the cloud-related feature maps extracted from the k-th hidden layer of the network into a series of groups, with an adversarial learning strategy applied to each group of source-domain and target-domain features;
318) assembling the model: the original source-domain and target-domain remote sensing satellite images first pass through the encoder structure and then once through the multi-scale feature fusion structure, whose output is fed to the decoder structure; the fourth down-sampling of the encoder is connected to the first up-sampling of the decoder through a skip connection, the third down-sampling to the second up-sampling, the second down-sampling to the third up-sampling, and the first down-sampling to the fourth up-sampling; after skip connection and concatenation, the last two layers of source-domain output and target-domain output pass through the feature alignment module, and a small ordinary convolution structure completes the whole model;
32) the specific steps for training the model are as follows:
321) inputting the preprocessed source-domain remote sensing satellite images, their labels and the target-domain images into the cloud detection model;
322) executing the encoder structure once to obtain four down-sampling outputs:
executing an ordinary convolution layer with a 3×3 kernel, a batch normalization layer, a ReLU activation unit and layer1 of the ResNet-34 to obtain the first down-sampling output;
executing layer2 of the ResNet-34 on the first down-sampling output to obtain the second down-sampling output;
executing layer3 of the ResNet-34 on the second down-sampling output to obtain the third down-sampling output;
executing layer4 of the ResNet-34 on the third down-sampling output to obtain the fourth down-sampling output;
323) executing the multi-scale feature fusion layer once on the fourth down-sampling output, the outputs of the fusion layer being as follows:
the first path executes an atrous convolution layer with dilation rate 1 and a 3×3 kernel on the fourth down-sampling output of the encoder to obtain the first output;
the second path executes an atrous convolution layer with dilation rate 6 and a 3×3 kernel on the fourth down-sampling output to obtain the second output;
the third path executes an atrous convolution layer with dilation rate 12 and a 3×3 kernel on the fourth down-sampling output to obtain the third output;
the fourth path executes an atrous convolution layer with dilation rate 18 and a 3×3 kernel on the fourth down-sampling output to obtain the fourth output;
the fifth path executes one global average pooling on the fourth down-sampling output followed by one ordinary convolution layer with a 1×1 kernel to obtain the fifth output;
324) concatenating the results of the five paths to obtain the multi-scale fusion feature output;
325) executing once an ordinary convolution layer with a 1×1 kernel on the multi-scale fusion feature output;
326) executing three times an up-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit to obtain the first up-sampling output;
327) concatenating the first up-sampling output with the fourth down-sampling output through a skip connection;
328) executing three times on the output of the first concatenation an up-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit to obtain the second up-sampling output;
329) concatenating the second up-sampling output with the third down-sampling output through a skip connection;
3210) executing three times on the output of the second concatenation an up-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit to obtain the third up-sampling output;
3211) concatenating the third up-sampling output with the second down-sampling output through a skip connection;
3212) executing three times on the output of the third concatenation an up-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit to obtain the fourth up-sampling output;
3213) concatenating the fourth up-sampling output with the first down-sampling output through a skip connection;
3214) executing four times on the output of the fourth concatenation a down-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer, a ReLU activation unit and a 2×2 max-pooling down-sampling;
3215) executing four times an up-sampling block consisting of an ordinary 3×3 convolution layer, a batch normalization layer, a ReLU activation unit and an up-sampling operation that restores the spatial dimensions;
3216) executing once an ordinary 3×3 convolution layer, a batch normalization layer and a ReLU activation unit;
3217) concatenating the result of this final ordinary 3×3 convolution layer with the output of the fourth skip-connection concatenation;
3218) performing forward propagation to obtain the final segmentation probability;
3219) executing the feature selection structure once: multiplying the outputs of the third and fourth concatenations of the source domain and the target domain pixel-wise with the final segmentation probability and splitting the result into 4 sub-feature maps along the channel dimension;
3220) executing the grouped feature alignment structure once, feeding the 4 sub-feature maps of the two domains in groups to obtain the discrimination results:
executing four times on the source-domain sub-feature maps a block consisting of one ordinary convolution layer with a 4×4 kernel, one ReLU activation unit and one dropout layer with probability 0.5;
executing one adaptive global average pooling with output size 1×1 to obtain the source-domain discrimination result;
executing four times on the target-domain sub-feature maps a block consisting of one ordinary convolution layer with a 4×4 kernel, one ReLU activation unit and one dropout layer with probability 0.5;
executing one adaptive global average pooling with output size 1×1 to obtain the target-domain discrimination result;
performing forward propagation to obtain the discrimination results of the source domain and the target domain;
3221) computing the adversarial loss from the source-domain and target-domain discrimination results, using binary cross-entropy (BCE) loss as the loss function of the adversarial part of the grouped feature alignment structure;
3222) computing the alignment loss from the source-domain and target-domain discrimination results, using MAE loss as the loss function of the grouped feature alignment structure, so as to minimize the distance between the two domains;
3223) computing the segmentation loss from the final segmentation probability, using BCE loss as the loss function of the network model;
3224) determining the gradient vectors by back propagation and updating the model parameters;
3225) checking whether the set number of epochs has been reached; if so, the trained segmentation model is obtained, otherwise returning to step 321) to reload the data and continue training.
4. The multi-scale feature fusion and alignment domain-adaptive cloud detection method according to claim 1, wherein testing and solving the model comprises the following steps:
41) reading the target-domain remote sensing satellite image and exporting it as a tif-format image;
42) merging the B4, B3 and B2 bands of the target-domain image into an RGB three-channel image;
43) cropping the target-domain image into non-overlapping patches of fixed size 321 × 321;
44) inputting the preprocessed images into the trained multi-scale feature fusion and alignment domain-adaptive cloud detection model for cloud pixel detection;
45) obtaining the segmented cloud mask.
CN202310007711.0A 2023-01-04 2023-01-04 Multi-scale feature fusion and alignment domain-adaptive cloud detection method Active CN115830471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310007711.0A CN115830471B (en) 2023-01-04 2023-01-04 Multi-scale feature fusion and alignment domain-adaptive cloud detection method

Publications (2)

Publication Number Publication Date
CN115830471A (en) 2023-03-21
CN115830471B CN115830471B (en) 2023-06-13

Family

ID=85520104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310007711.0A Active CN115830471B (en) Multi-scale feature fusion and alignment domain-adaptive cloud detection method

Country Status (1)

Country Link
CN (1) CN115830471B (en)



Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944357A (en) * 2017-11-13 2018-04-20 中国科学院合肥物质科学研究院 Multi-source Remote Sensing Images cloud detection method of optic based on evidence fusion adaptive threshold
US20190286875A1 (en) * 2018-03-19 2019-09-19 Rosemount Aerospace Limited Cloud detection in aerial imagery
CN109215038A (en) * 2018-09-29 2019-01-15 中国资源卫星应用中心 A kind of intelligent information retrieval method and system based on remote sensing image
CN111274865A (en) * 2019-12-14 2020-06-12 深圳先进技术研究院 Remote sensing image cloud detection method and device based on full convolution neural network
US20210312232A1 (en) * 2020-04-06 2021-10-07 Adobe Inc. Domain alignment for object detection domain adaptation tasks
CN111814874A (en) * 2020-07-08 2020-10-23 东华大学 Multi-scale feature extraction enhancement method and module for point cloud deep learning
CN113239830A (en) * 2021-05-20 2021-08-10 北京航空航天大学 Remote sensing image cloud detection method based on full-scale feature fusion
US20220383073A1 (en) * 2021-05-28 2022-12-01 Nvidia Corporation Domain adaptation using domain-adversarial learning in synthetic data systems and applications
CN114022408A (en) * 2021-09-22 2022-02-08 中国空间技术研究院 Remote sensing image cloud detection method based on multi-scale convolution neural network
CN114494821A (en) * 2021-12-16 2022-05-13 广西壮族自治区自然资源遥感院 Remote sensing image cloud detection method based on feature multi-scale perception and self-adaptive aggregation
CN114943876A (en) * 2022-06-20 2022-08-26 南京信息工程大学 Cloud and cloud shadow detection method and device for multi-level semantic fusion and storage medium
CN115359370A (en) * 2022-10-21 2022-11-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Remote sensing image cloud detection method and device, computer device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YAO DUAN, CHENYANG ZHU, YUQING LAN, RENJIAO YI, XINWANG LIU, KAI XU: "DisARM: Displacement Aware Relation Module for 3D Detection", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), p. 16959 *
ZHANG Libin, SHI Wenxuan, SUN Shilei, GAO Xudong: "Cloud detection in remote sensing images with multi-level feature fusion U-Net" (多层级特征融合U-Net的遥感图像云检测), Information Systems Engineering, no. 02, pp. 8-12 *
LI Jiaxin, ZHAO Peng, FANG Wei, SONG Shangxiang: "Multi-angle remote sensing image cloud detection method based on deep learning" (基于深度学习的多角度遥感影像云检测方法), Journal of Atmospheric and Environmental Optics (大气与环境光学学报), vol. 15, no. 5, pp. 380-392 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452936A (en) * 2023-04-22 2023-07-18 安徽大学 Rotation target detection method integrating optics and SAR image multi-mode information
CN116452936B (en) * 2023-04-22 2023-09-29 安徽大学 Rotation target detection method integrating optics and SAR image multi-mode information
CN116740584A (en) * 2023-06-25 2023-09-12 安徽大学 Weak supervision cloud detection method combining CNN and Transformer progressive learning gradient
CN116740584B (en) * 2023-06-25 2024-05-10 安徽大学 Weak supervision cloud detection method
CN117093929A (en) * 2023-07-06 2023-11-21 珠海市伊特高科技有限公司 Cut-off overvoltage prediction method and device based on unsupervised domain self-adaptive network
CN117093929B (en) * 2023-07-06 2024-03-29 珠海市伊特高科技有限公司 Cut-off overvoltage prediction method and device based on unsupervised domain self-adaptive network
CN117522824A (en) * 2023-11-16 2024-02-06 安徽大学 Multi-source domain generalization cloud and cloud shadow detection method based on domain knowledge base
CN117522824B (en) * 2023-11-16 2024-05-14 安徽大学 Multi-source domain generalization cloud and cloud shadow detection method based on domain knowledge base

Also Published As

Publication number Publication date
CN115830471B (en) 2023-06-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant