CN111462124A - Remote sensing satellite cloud detection method based on DeepLabV3+ - Google Patents

Remote sensing satellite cloud detection method based on DeepLabV3+

Info

Publication number
CN111462124A
CN111462124A
Authority
CN
China
Prior art keywords
convolution
deeplabv3
size
feature maps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010241130.XA
Other languages
Chinese (zh)
Inventor
Huang Yan (黄焱)
Du Feifei (杜飞飞)
Ni Yanchao (倪彦朝)
Jiang Bingqiang (姜炳强)
Wang Tianwei (王天玮)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mechanical And Electrical Engineering General Design Department
Wuhan Zmvision Technology Co ltd
Original Assignee
Beijing Mechanical And Electrical Engineering General Design Department
Wuhan Zmvision Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mechanical And Electrical Engineering General Design Department, Wuhan Zmvision Technology Co ltd filed Critical Beijing Mechanical And Electrical Engineering General Design Department
Priority to CN202010241130.XA
Publication of CN111462124A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators
    • G06T5/30Erosion or dilatation, e.g. thinning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30181Earth observation
    • G06T2207/30192Weather; Meteorology

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A remote sensing satellite cloud detection method based on DeepLabV3+ includes the following steps: S1, obtaining satellite load simulation images of different scenes; S2, inputting the image data into a deep convolutional neural network to obtain semantic information feature maps a and b of different levels; S3, passing the semantic information feature map a directly into the decoding area and subjecting the semantic information feature map b to parallel atrous convolution; S4, obtaining the output feature map b of the encoding process from the feature maps of all levels after the parallel atrous convolution of S3; S5, changing the number of channels of the low-level semantic feature map a obtained in the encoding process and enlarging the size of the encoder output feature map b; S6, splicing feature map a and the feature map b obtained in S5 and extracting features again; S7, upsampling the feature map obtained in S6 to an enlarged size again to obtain the final segmented feature map. The method solves the problem that existing CNN-based image segmentation cloud detection methods produce poor segmentation results during the image encoding-decoding process.

Description

Remote sensing satellite cloud detection method based on DeepLabV3+
Technical Field
The invention relates to the fields of deep learning and semantic segmentation in computer vision applications, and in particular to a remote sensing satellite cloud detection method based on DeepLabV3+.
Background
Semantic segmentation and tracking is an important research direction in computer vision, with extremely wide application in many fields such as land satellite image division, autonomous driving of unmanned vehicles, entity recognition in unmanned aerial vehicle imagery, face recognition, and medical imaging. However, semantic segmentation faces many challenges, such as non-uniform object appearance within images, different shooting angles, multiple poses or views of an object, changes in external illumination, and complex image semantic information. A reliable and accurate image segmentation method therefore has very important practical significance.
Conventional segmentation methods mainly perform image segmentation based on digital image processing, topology, mathematics and other techniques, for example threshold-based segmentation. The basic idea of the threshold method is to calculate one or more gray thresholds from the gray-level features of an image, compare the gray value of each pixel with the thresholds, and finally assign each pixel to the appropriate class according to the comparison result. The most critical step of this method is therefore to solve for the optimal gray threshold according to some criterion function. Thresholding is particularly suitable for images in which the object and the background occupy different gray-level ranges. If the image contains only two categories, object and background, only one threshold needs to be selected, which is called single-threshold segmentation; if multiple objects are to be extracted from the image, a single threshold will fail, and multiple thresholds must be selected to separate the objects, which is correspondingly called multi-threshold segmentation. Traditional segmentation methods are computationally simple and efficient, but they only consider the gray value of each pixel and generally ignore spatial characteristics, so they are sensitive to noise and lack robustness.
In recent years, with the increase in computing power and the continuous development of deep learning, conventional segmentation methods can no longer match the effectiveness of deep-learning-based segmentation methods. Convolutional neural networks have a remarkable effect in the field of image segmentation; for example, the well-known FCN (fully convolutional network) replaces the fully connected layers of a conventional CNN with convolutional layers, so that the network can accept pictures of any size and output segmentation maps of the same size as the original pictures, which makes per-pixel classification possible. Meanwhile, to counteract the effect of convolution and pooling on the image size, upsampling is used to recover the image size. Yet even though the FCN solves problems such as arbitrary input size and local feature extraction, conventional CNN image segmentation methods still suffer in the image decoding-encoding process: the downsampling and pooling layers lose the internal data structure and spatial hierarchy information and cannot reconstruct small-object information, and the loss of low-level semantic information leads to poor image segmentation results.
Disclosure of Invention
In view of the above, the present invention provides a remote sensing satellite cloud detection method based on DeepLabV3+ that overcomes, or at least partially solves, the above problems.
The technical scheme provided by the invention is as follows:
a remote sensing satellite cloud detection method of Deep L abV3+, the method comprises the following steps:
S1, acquiring satellite load simulation images of different scenes, reading the multi-channel images, normalizing them, and converting their data format;
S2, inputting the image data into a deep convolutional neural network, setting the numbers of convolution kernels for the serial atrous convolutions, and obtaining semantic information feature maps a and b of different levels accordingly;
S3, passing the semantic information feature map a directly into the decoding area; subjecting the semantic information feature map b to parallel atrous convolutions with set dilation rates to extract features, merging the features, compressing them with a 1x1 convolution, and finally adding an average pooling layer to obtain five levels of feature maps;
S4, splicing the feature maps of all levels after the parallel atrous convolution of S3 along the channel dimension to obtain the output feature map b of the encoding process;
S5, applying a 1x1 convolution to the low-level semantic feature map a obtained in the encoding process to change its number of channels, and upsampling the output feature map b of the encoding process by bilinear interpolation to enlarge its size;
S6, splicing feature map a and the feature map b obtained in S5, convolving with a 3x3 convolution kernel, and re-extracting features;
and S7, upsampling the feature map obtained in S6 to an enlarged size again to obtain the final segmented feature map.
Further, in S1, a Keras framework is adopted and a Keras function is used to process the input image and convert it into tensor format.
Further, in S2, the numbers of convolution kernels are 32, 64, 128 and 256 respectively, sequentially yielding semantic information feature maps of different levels with sizes 256x256x32, 128x128x64, 64x64x128 and 32x32x256.
Further, the size of the semantic information feature map a is 128x128x64, and the size of the semantic information feature map b is 32x32x256.
Further, in S3, the atrous convolution dilation rates are set to [1, 6, 12, 18], and the pooling layer is a depth-separable convolution layer with a stride of 2.
further, in S5, the bilinear interpolation method includes:
the known function f is in Q11=(x1,y1),Q12=(x1,y2),Q21(x2, y1), and Q22The pixel values of four points (x2, y2) are required to have a size of (x, y) point,
and performing linear interpolation in the x direction of the coordinate axis to obtain:
Figure BDA0002432610650000031
Figure BDA0002432610650000032
and performing linear interpolation in the y direction of the coordinate axis to obtain:
Figure BDA0002432610650000033
the size of the pixel value at the point P ═ x, y is obtained.
Further, the loss function in the neural network model training process is the binary_crossentropy loss function, calculated as:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log y'_i + (1 - y_i)\log(1 - y'_i) \,\right]$$

where i denotes the i-th point, y'_i denotes the predicted pixel value of the i-th point, y_i denotes the true pixel value of the i-th point, and N is the total number of points.
Further, the Adam function is adopted as the optimization function to reduce the value of the loss function.
Further, the intermediate convolutional layer of the DeeplabV3 model is repeated 16 times, and each convolutional layer is followed by batch normalization and a ReLU activation function.
Further, the neural network model is evaluated using the image segmentation evaluation index MIoU, calculated as:

$$MIoU = \frac{1}{k+1}\sum_{i=0}^{k}\frac{TP}{FN + FP + TP}$$

FN is a false negative, meaning the ground truth is 1 and the prediction is 0;
TP is a true positive, meaning the ground truth is 1 and the prediction is 1;
FP is a false positive, meaning the ground truth is 0 and the prediction is 1.
Compared with the prior art, the invention at least has the following beneficial effects:
the remote sensing satellite cloud detection method based on Deep L abV3+ provided by the invention introduces a hollow convolution and hollow space convolution pooling pyramid structure, increases the sense field of convolution under the condition of no information loss, and enables each convolution output to contain information in a larger range, so that the image segmentation result is more accurate and accords with space semantic logic, and the cloud detection direction has better cloud segmentation effect than the traditional cloud segmentation direction.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
Fig. 1 is a flowchart of a remote sensing satellite cloud detection method based on DeepLabV3+ in an embodiment of the invention;
Fig. 2 is a deep neural network structure diagram based on DeepLabV3+ in a second embodiment of the invention.
Detailed Description
Example 1
A remote sensing satellite cloud detection method based on DeepLabV3+, the method comprising the following steps:
S1, acquiring satellite load simulation images of different scenes, reading the multi-channel images, normalizing them, and converting their data format;
specifically, an OPENCV function library is adopted to read an image of a multi-dimensional channel, a keras framework is adopted after the image is normalized, a keras framework is adopted, an input image is processed by a keras.
In some preferred embodiments, 15530 satellite load simulation samples are acquired and 530 invalid samples are deleted; the remaining 15000 valid samples are divided in a 7:2:1 ratio, with 10500 used for model training, 3000 for model validation, and 1500 for model testing, as shown in Table 1 below:

Table 1

| Training data | Validation data | Test data |
| 10500 (70%) | 3000 (20%) | 1500 (10%) |
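As a hedged illustration of this 7:2:1 split (the index shuffling and the fixed random seed are assumptions, not part of the embodiment):

```python
import numpy as np

rng = np.random.default_rng(0)               # assumed seed for reproducibility
idx = rng.permutation(15000)                 # indices of the 15000 valid samples
train_idx = idx[:10500]                      # 70% for model training
val_idx = idx[10500:13500]                   # 20% for model validation
test_idx = idx[13500:]                       # 10% for model testing
```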
Aiming at the cloud layer characteristics of different scenes, the invention trains 4 types of models: a full-landform all-day-time infrared model, an ocean daytime model, a land daytime model, and a polar all-day-time infrared model. To simplify the models, each is a two-class model, i.e., the ground-truth mask is a binary result: "cloud" and "not cloud".
The satellite cloud data comprises 19 visible and near-infrared channels and 6 mid- and far-infrared channels, for a total of 25 channels. The data channels used by each model are shown in Table 2 below:

Table 2

| Model name | Data channels |
| Full-landform all-day-time infrared model | 6 infrared band channels, numbered 20-25 |
| Ocean daytime model | 6 infrared band channels, numbered 20-25 |
| Land daytime model | Visible and infrared channels, numbered 1-5, 7, 16-25 |
| Polar all-day-time infrared model | 6 infrared band channels, numbered 20-25 |
Each model targets a different scene, and in its application scene it achieves better accuracy than a single model trained on all landforms.
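The per-model band selection in Table 2 can be sketched as below; the helper name and the 0-based index arithmetic are assumptions for illustration (channel numbers in the table are 1-based):

```python
# Channel numbers in Table 2 are 1-based; NumPy arrays are 0-based.
INFRARED_CHANNELS = list(range(19, 25))                       # channels 20-25
LAND_DAY_CHANNELS = [0, 1, 2, 3, 4, 6] + list(range(15, 25))  # channels 1-5, 7, 16-25

def select_channels(cube, channels):
    """Slice a model's input bands out of the 25-channel data cube (H, W, 25)."""
    return cube[..., channels]
```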
S2, inputting the image data into a deep convolution neural network, setting the size of a convolution kernel by using serial hole convolution, and obtaining semantic information characteristic graphs a and b of different levels according to the size of the convolution kernel;
Specifically, the format of the input image in S2 is H×W×C (height × width × channel). Taking an image size of 512x512x25 as an example, the input image passes through a DCNN deep convolutional neural network and is convolved with serial atrous convolutions; the numbers of convolution kernels are 32, 64, 128 and 256, sequentially producing semantic information feature maps of 256x256x32, 128x128x64, 64x64x128 and 32x32x256 at different levels;
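A sketch of such an encoder in Keras follows. The dilation rates and the use of max pooling for the 2x downsampling at each stage are assumptions; the embodiment only specifies serial atrous convolutions and the kernel counts 32, 64, 128 and 256.

```python
from tensorflow.keras import Input, Model, layers

def build_encoder(input_shape=(512, 512, 25)):
    """Cascaded (serial) atrous-convolution backbone: each stage applies a
    dilated 3x3 convolution and halves the spatial size, yielding the
    256x256x32, 128x128x64, 64x64x128 and 32x32x256 feature maps."""
    x = inp = Input(input_shape)
    feats = []
    for filters, rate in zip([32, 64, 128, 256], [1, 2, 2, 2]):
        x = layers.Conv2D(filters, 3, padding="same",
                          dilation_rate=rate, activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)        # halve height and width
        feats.append(x)
    return Model(inp, feats)

encoder = build_encoder()
a, b = encoder.outputs[1], encoder.outputs[3]   # a: 128x128x64, b: 32x32x256
```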
S3, passing the semantic information feature map a directly into the decoding area; subjecting the semantic information feature map b to parallel atrous convolutions with set dilation rates to extract features, merging the features, compressing them with a 1x1 convolution, and finally adding an average pooling layer to obtain five levels of feature maps;
Specifically, the semantic information feature map a is 128x128x64 and the semantic information feature map b is 32x32x256; feature map a is passed directly into the decoding area. Feature map b undergoes parallel atrous convolution: features are extracted with atrous convolutions of different dilation rates, merged, and compressed with a 1x1 convolution. The dilation rates of the parallel atrous convolutions are [1, 6, 12, 18]; finally an average pooling layer is added, where the pooling layer is a depth-separable convolution layer with a stride of 2, yielding five levels of feature maps;
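A Keras sketch of this parallel atrous (ASPP) block is given below. The image-level branch here uses the standard DeepLabV3+ global-average-pooling formulation; the embodiment's stride-2 depth-separable pooling layer is only loosely specified, so treat that branch as an assumption.

```python
from tensorflow.keras import layers

def aspp(x, out_filters=256, rates=(6, 12, 18)):
    """Parallel atrous branches (a 1x1 branch plus dilated 3x3 branches at the
    given rates) and an average-pooling branch, concatenated into 5 feature
    maps of equal spatial size."""
    size = x.shape[1]                                   # e.g. 32 for a 32x32 input
    branches = [layers.Conv2D(out_filters, 1, activation="relu")(x)]   # rate 1
    for r in rates:
        branches.append(layers.Conv2D(out_filters, 3, padding="same",
                                      dilation_rate=r, activation="relu")(x))
    pool = layers.GlobalAveragePooling2D(keepdims=True)(x)   # image-level context
    pool = layers.Conv2D(out_filters, 1, activation="relu")(pool)
    pool = layers.UpSampling2D(size, interpolation="bilinear")(pool)
    branches.append(pool)
    return layers.Concatenate()(branches)    # 5 branches x 256 channels = 1280
```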
S4, splicing the feature maps of all levels after the parallel atrous convolution of S3 along the channel dimension to obtain the output feature map b of the encoding process;
In some preferred embodiments, the output feature map b of the encoding process is obtained with a size of 32x32x1280.
S5, applying a 1x1 convolution to the low-level semantic feature map a obtained in the encoding process to change its number of channels, and upsampling the output feature map b of the encoding process by bilinear interpolation to enlarge its size;
In some preferred embodiments, the encoder output feature map b is passed through a 1x1 convolution to change its number of channels, giving a feature map of size 32x32x256; bilinear-interpolation upsampling then enlarges it to 4 times its original size, giving a feature map of size 128x128x256.
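In Keras, this projection and 4x bilinear enlargement might look like the following sketch (the layer choices are assumed):

```python
from tensorflow.keras import layers

def project_and_upsample(b):
    """1x1-project the 32x32x1280 encoder output to 256 channels, then enlarge
    it 4x with bilinear upsampling to 128x128x256."""
    b = layers.Conv2D(256, 1, activation="relu")(b)             # 32x32x256
    return layers.UpSampling2D(4, interpolation="bilinear")(b)  # 128x128x256
```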
Specifically, the bilinear interpolation method includes:
Given the pixel values of a function f at the four points Q11 = (x1, y1), Q12 = (x1, y2), Q21 = (x2, y1) and Q22 = (x2, y2), the pixel value at a point P = (x, y) is required.

Performing linear interpolation in the x direction of the coordinate axis gives:

$$f(x, y_1) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{11}) + \frac{x - x_1}{x_2 - x_1} f(Q_{21})$$

$$f(x, y_2) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{12}) + \frac{x - x_1}{x_2 - x_1} f(Q_{22})$$

Performing linear interpolation in the y direction of the coordinate axis then gives:

$$f(P) \approx \frac{y_2 - y}{y_2 - y_1} f(x, y_1) + \frac{y - y_1}{y_2 - y_1} f(x, y_2)$$

which yields the pixel value at the point P = (x, y).
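The formulas above translate directly into a small helper; this is a plain illustration of the math, not the upsampling layer itself:

```python
def bilinear(f, x1, x2, y1, y2, x, y):
    """Bilinearly interpolate the value at (x, y) inside the cell whose corners
    Q11=(x1,y1), Q12=(x1,y2), Q21=(x2,y1), Q22=(x2,y2) have values f(xi, yj)."""
    fr1 = ((x2 - x) * f(x1, y1) + (x - x1) * f(x2, y1)) / (x2 - x1)  # x pass at y1
    fr2 = ((x2 - x) * f(x1, y2) + (x - x1) * f(x2, y2)) / (x2 - x1)  # x pass at y2
    return ((y2 - y) * fr1 + (y - y1) * fr2) / (y2 - y1)             # y pass
```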
S6, splicing feature map a and the feature map b obtained in S5, convolving with a 3x3 convolution kernel, and re-extracting features;
In some preferred embodiments, feature map a and the feature map b obtained in S5 are spliced to obtain a 128x128x320 feature map, and features are then re-extracted through a 3x3 convolution to obtain a feature map of size 128x128x256;
and S7, upsampling the feature map obtained in S6 to an enlarged size again to obtain the final segmented feature map.
In some preferred embodiments, the feature map obtained in S6 is upsampled by a factor of 4 again, and its dimensionality is reduced with a 1x1 convolutional layer, giving a 512x512x1 feature map of the same size as the original image; since the channel dimension is 1, the feature map is a grayscale map, and this segmentation result is the final prediction result.
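A Keras sketch of these decoder steps (S6-S7) follows; the sigmoid output activation is an assumption consistent with the binary_crossentropy loss used below:

```python
from tensorflow.keras import layers

def decoder_head(a, b_up):
    """Fuse low-level map a (128x128x64) with the upsampled encoder output b_up
    (128x128x256), refine with a 3x3 convolution, restore the 512x512 input
    resolution, and predict a single cloud-probability channel."""
    x = layers.Concatenate()([a, b_up])                              # 128x128x320
    x = layers.Conv2D(256, 3, padding="same", activation="relu")(x)  # 128x128x256
    x = layers.UpSampling2D(4, interpolation="bilinear")(x)          # 512x512x256
    return layers.Conv2D(1, 1, activation="sigmoid")(x)              # 512x512x1
```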
The ground-truth cloud detection label map is a grayscale map of size 512x512x1 whose pixel values take values in {0, 1}: 0 indicates the point is a non-cloud part, and 1 indicates the point is cloud.
In some preferred embodiments, the loss function in the neural network model training process is the binary_crossentropy loss function, calculated as:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log y'_i + (1 - y_i)\log(1 - y'_i) \,\right]$$

where i denotes the i-th point, y'_i denotes the predicted pixel value of the i-th point, y_i denotes the true pixel value of the i-th point, and N is the total number of points.
In some preferred embodiments, the Adam function is used as the optimization function to reduce the value of the loss function.
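A NumPy sketch of the loss above is shown below; in Keras the same choice corresponds to compiling the model with loss='binary_crossentropy' and optimizer='adam' (the clipping epsilon is an assumption to avoid log(0)):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all pixels, matching the formula above."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```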
In some preferred embodiments, the intermediate convolutional layer of the DeeplabV3 model is repeated 16 times, and each convolutional layer is followed by batch normalization and a ReLU activation function.
In some preferred embodiments, the neural network model is evaluated using the image segmentation evaluation index MIoU, calculated as:

$$MIoU = \frac{1}{k+1}\sum_{i=0}^{k}\frac{TP}{FN + FP + TP}$$

FN is a false negative, meaning the ground truth is 1 and the prediction is 0;
TP is a true positive, meaning the ground truth is 1 and the prediction is 1;
FP is a false positive, meaning the ground truth is 0 and the prediction is 1.
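A NumPy sketch of this evaluation for the binary cloud mask (averaging the per-class IoU over the cloud and non-cloud classes, with a smoothing epsilon, is an assumption consistent with the formula above):

```python
import numpy as np

def miou(y_true, y_pred):
    """Mean IoU over the two classes, with per-class IoU = TP / (TP + FP + FN)."""
    ious = []
    for cls in (0, 1):
        t, p = (y_true == cls), (y_pred == cls)
        tp = np.logical_and(t, p).sum()
        fp = np.logical_and(~t, p).sum()
        fn = np.logical_and(t, ~p).sum()
        ious.append(tp / (tp + fp + fn + 1e-8))
    return float(np.mean(ious))
```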
The remote sensing satellite cloud detection method based on DeepLabV3+ provided by the invention introduces atrous convolution and the atrous spatial pyramid pooling (ASPP) structure, which enlarge the receptive field of the convolutions without loss of information so that each convolution output contains information from a larger range. As a result, the image segmentation results are more accurate and conform to spatial semantic logic, and the method achieves a better cloud segmentation effect than traditional cloud segmentation approaches.
It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Of course, the processor and the storage medium may reside as discrete components in a user terminal.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to how the term "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or the claims is intended to mean a "non-exclusive or".

Claims (10)

1. A remote sensing satellite cloud detection method based on DeepLabV3+, characterized by comprising the following steps:
S1, acquiring satellite load simulation images of different scenes, reading the multi-channel images, normalizing them, and converting their data format;
S2, inputting the image data into a deep convolutional neural network, setting the numbers of convolution kernels for the serial atrous convolutions, and obtaining semantic information feature maps a and b of different levels accordingly;
S3, passing the semantic information feature map a directly into the decoding area; subjecting the semantic information feature map b to parallel atrous convolutions with set dilation rates to extract features, merging the features, compressing them with a 1x1 convolution, and finally adding an average pooling layer to obtain five levels of feature maps;
S4, splicing the feature maps of all levels after the parallel atrous convolution of S3 along the channel dimension to obtain the output feature map b of the encoding process;
S5, applying a 1x1 convolution to the low-level semantic feature map a obtained in the encoding process to change its number of channels, and upsampling the output feature map b of the encoding process by bilinear interpolation to enlarge its size;
S6, splicing feature map a and the feature map b obtained in S5, convolving with a 3x3 convolution kernel, and re-extracting features;
and S7, upsampling the feature map obtained in S6 to an enlarged size again to obtain the final segmented feature map.
2. The remote sensing satellite cloud detection method based on DeepLabV3+ according to claim 1, wherein in S1 a Keras framework is adopted and a Keras function is used to process the input image and convert it into tensor format.
3. The remote sensing satellite cloud detection method based on DeepLabV3+ according to claim 1, wherein in S2 the numbers of convolution kernels are 32, 64, 128 and 256 respectively, sequentially yielding semantic information feature maps of different levels with sizes 256x256x32, 128x128x64, 64x64x128 and 32x32x256.
4. The remote sensing satellite cloud detection method based on DeepLabV3+ according to claim 3, wherein the size of the semantic information feature map a is 128x128x64 and the size of the semantic information feature map b is 32x32x256.
5. The remote sensing satellite cloud detection method based on DeepLabV3+ according to claim 1, wherein in S3 the atrous convolution dilation rates are set to [1, 6, 12, 18], and the pooling layer is a depth-separable convolution layer with a stride of 2.
6. The remote sensing satellite cloud detection method based on DeepLabV3+ according to claim 1, wherein in S5 the bilinear interpolation method is:
given the pixel values of a function f at the four points Q11 = (x1, y1), Q12 = (x1, y2), Q21 = (x2, y1) and Q22 = (x2, y2), the pixel value at a point P = (x, y) is required;
performing linear interpolation in the x direction of the coordinate axis gives:

$$f(x, y_1) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{11}) + \frac{x - x_1}{x_2 - x_1} f(Q_{21})$$

$$f(x, y_2) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{12}) + \frac{x - x_1}{x_2 - x_1} f(Q_{22})$$

performing linear interpolation in the y direction of the coordinate axis then gives:

$$f(P) \approx \frac{y_2 - y}{y_2 - y_1} f(x, y_1) + \frac{y - y_1}{y_2 - y_1} f(x, y_2)$$

which yields the pixel value at the point P = (x, y).
7. The remote sensing satellite cloud detection method based on DeepLabV3+ according to claim 1, wherein the loss function in the neural network model training process is the binary_crossentropy loss function, calculated as:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log y'_i + (1 - y_i)\log(1 - y'_i) \,\right]$$

where i denotes the i-th point, y'_i denotes the predicted pixel value of the i-th point, y_i denotes the true pixel value of the i-th point, and N is the total number of points.
8. The remote sensing satellite cloud detection method based on DeepLabV3+ according to claim 1, wherein the Adam function is adopted as the optimization function to reduce the value of the loss function.
9. The remote sensing satellite cloud detection method based on DeepLabV3+ according to claim 1, wherein the intermediate convolutional layer of the DeeplabV3 model is repeated 16 times, and each convolutional layer is followed by batch normalization and a ReLU activation function.
10. The remote sensing satellite cloud detection method based on DeepLabV3+ according to claim 1, wherein the neural network model is evaluated using the image segmentation evaluation index MIoU, calculated as:

$$MIoU = \frac{1}{k+1}\sum_{i=0}^{k}\frac{TP}{FN + FP + TP}$$

where FN is a false negative, meaning the ground truth is 1 and the prediction is 0; TP is a true positive, meaning the ground truth is 1 and the prediction is 1; and FP is a false positive, meaning the ground truth is 0 and the prediction is 1.
CN202010241130.XA 2020-03-31 2020-03-31 Remote sensing satellite cloud detection method based on DeepLabV3+ Pending CN111462124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010241130.XA CN111462124A (en) 2020-03-31 2020-03-31 Remote sensing satellite cloud detection method based on DeepLabV3+

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010241130.XA CN111462124A (en) 2020-03-31 2020-03-31 Remote sensing satellite cloud detection method based on DeepLabV3+

Publications (1)

Publication Number Publication Date
CN111462124A true CN111462124A (en) 2020-07-28

Family

ID=71685093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010241130.XA Pending CN111462124A (en) Remote sensing satellite cloud detection method based on DeepLabV3+

Country Status (1)

Country Link
CN (1) CN111462124A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258431A (en) * 2020-09-27 2021-01-22 成都东方天呈智能科技有限公司 Image classification model based on mixed depth separable expansion convolution and classification method thereof
CN112347976A (en) * 2020-11-23 2021-02-09 腾讯科技(深圳)有限公司 Region extraction method and device for remote sensing satellite image, electronic equipment and medium
CN112487184A (en) * 2020-11-26 2021-03-12 北京智源人工智能研究院 User character judging method and device, memory and electronic equipment
CN112489054A (en) * 2020-11-27 2021-03-12 中北大学 Remote sensing image semantic segmentation method based on deep learning
CN113361662A (en) * 2021-07-22 2021-09-07 全图通位置网络有限公司 System and method for processing remote sensing image data of urban rail transit
CN113724260A (en) * 2021-08-03 2021-11-30 南京邮电大学 Satellite capturing method based on deep reinforcement learning
CN114549535A (en) * 2022-01-28 2022-05-27 北京百度网讯科技有限公司 Image segmentation method, device, equipment, storage medium and product
CN114996488A (en) * 2022-08-08 2022-09-02 北京道达天际科技股份有限公司 Skynet big data decision-level fusion method
CN117237644A (en) * 2023-11-10 2023-12-15 广东工业大学 Forest residual fire detection method and system based on infrared small target detection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097544A (en) * 2019-04-25 2019-08-06 武汉精立电子技术有限公司 A kind of display panel open defect detection method
CN110781775A (en) * 2019-10-10 2020-02-11 武汉大学 Remote sensing image water body information accurate segmentation method supported by multi-scale features

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097544A (en) * 2019-04-25 2019-08-06 武汉精立电子技术有限公司 A kind of display panel open defect detection method
CN110781775A (en) * 2019-10-10 2020-02-11 武汉大学 Remote sensing image water body information accurate segmentation method supported by multi-scale features

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
LIANG-CHIEH CHEN et al.: "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation", pages 1-4 *
ZHANG YONGHONG; CAI PENGYAN; TAO RUNZHE; WANG JIANGENG; TIAN WEI: "Remote sensing image cloud detection based on an improved U-Net network", no. 03 *
ZHANG YONGHONG et al.: "Remote sensing image cloud detection based on an improved U-Net network", no. 03, page 18 *
ZHANG XINLU; ZHANG CHONGTAO; DAI CHENGUANG; JI HONGLIANG; WANG YINGXUE: "High-resolution remote sensing image classification based on the DeepLabv3 architecture", no. 02, pages 40-100 *
SANG HONGSHI et al.: "Image Semantic Segmentation Technology Based on Deep Learning", Wuhan: Huazhong University of Science and Technology Press, pages 63-100 *
PEI LIANG; LIU YANG; GAO LIN: "Cloud detection in ZY-3 remote sensing images combining fully convolutional neural networks and conditional random fields", no. 10, pages 269-275 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258431A (en) * 2020-09-27 2021-01-22 成都东方天呈智能科技有限公司 Image classification model based on mixed depth separable expansion convolution and classification method thereof
CN112258431B (en) * 2020-09-27 2021-07-20 成都东方天呈智能科技有限公司 Image classification model based on mixed depth separable expansion convolution and classification method thereof
CN112347976A (en) * 2020-11-23 2021-02-09 腾讯科技(深圳)有限公司 Region extraction method and device for remote sensing satellite image, electronic equipment and medium
CN112487184A (en) * 2020-11-26 2021-03-12 北京智源人工智能研究院 User character judging method and device, memory and electronic equipment
CN112489054A (en) * 2020-11-27 2021-03-12 中北大学 Remote sensing image semantic segmentation method based on deep learning
CN113361662B (en) * 2021-07-22 2023-08-29 全图通位置网络有限公司 Urban rail transit remote sensing image data processing system and method
CN113361662A (en) * 2021-07-22 2021-09-07 全图通位置网络有限公司 System and method for processing remote sensing image data of urban rail transit
CN113724260A (en) * 2021-08-03 2021-11-30 南京邮电大学 Satellite capturing method based on deep reinforcement learning
CN113724260B (en) * 2021-08-03 2023-10-17 南京邮电大学 Satellite grabbing method based on deep reinforcement learning
CN114549535A (en) * 2022-01-28 2022-05-27 北京百度网讯科技有限公司 Image segmentation method, device, equipment, storage medium and product
CN114996488A (en) * 2022-08-08 2022-09-02 北京道达天际科技股份有限公司 Skynet big data decision-level fusion method
CN117237644A (en) * 2023-11-10 2023-12-15 广东工业大学 Forest residual fire detection method and system based on infrared small target detection
CN117237644B (en) * 2023-11-10 2024-02-13 广东工业大学 Forest residual fire detection method and system based on infrared small target detection

Similar Documents

Publication Publication Date Title
CN111462124A (en) Remote sensing satellite cloud detection method based on DeepLabV3+
CN112446270B (en) Training method of pedestrian re-recognition network, pedestrian re-recognition method and device
CN113033570B (en) Image semantic segmentation method for improving void convolution and multilevel characteristic information fusion
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN113743269B (en) Method for recognizing human body gesture of video in lightweight manner
CN115661144A (en) Self-adaptive medical image segmentation method based on deformable U-Net
CN117079139B (en) Remote sensing image target detection method and system based on multi-scale semantic features
CN107766864B (en) Method and device for extracting features and method and device for object recognition
CN112836625A (en) Face living body detection method and device and electronic equipment
CN113326930A (en) Data processing method, neural network training method, related device and equipment
CN112364873A (en) Character recognition method and device for curved text image and computer equipment
CN116453121B (en) Training method and device for lane line recognition model
CN116469100A (en) Dual-band image semantic segmentation method based on Transformer
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN115578615A (en) Night traffic sign image detection model establishing method based on deep learning
CN115410030A (en) Target detection method, target detection device, computer equipment and storage medium
CN114359297A (en) Attention pyramid-based multi-resolution semantic segmentation method and device
CN111524232A (en) Three-dimensional modeling method and device and server
CN110969164A (en) Low-illumination imaging license plate recognition method and device based on deep learning end-to-end
CN114299383A (en) Remote sensing image target detection method based on integration of density map and attention mechanism
CN113762039A (en) Information matching method and related device for traffic sign board
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN116645598A (en) Remote sensing image semantic segmentation method based on channel attention feature fusion
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN115861756A (en) Earth background small target identification method based on cascade combination network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Huang Yan

Inventor after: Du Feifei

Inventor after: Ni Yanchao

Inventor after: Jiang Bingqiang

Inventor after: Wang Tianwei

Inventor before: Huang Yan

Inventor before: Du Feifei

Inventor before: Ni Yanchao

Inventor before: Jiang Bingqiang

Inventor before: Wang Tianwei