CN117152616A

CN117152616A - Remote sensing image typical object extraction method based on spectrum enhancement and double-path coding

Info

Publication number: CN117152616A
Application number: CN202311169642.XA
Authority: CN
Inventors: 张柏玮; 李玉霞; 张靖霖; 司宇; 何媛; 童忠贵; 邓万涛
Original assignee: University of Electronic Science and Technology of China; South West Institute of Technical Physics
Current assignee: University of Electronic Science and Technology of China; South West Institute of Technical Physics
Priority date: 2023-09-12
Filing date: 2023-09-12
Publication date: 2023-12-01

Abstract

The invention discloses a remote sensing image typical feature extraction method based on spectrum enhancement and double-path coding, which utilizes a spectrum enhancement module to introduce external information, and enhances the utilization capacity of a network on spectrum dimension information, thereby solving the problems of complex spectrum characteristics and difficult spectrum information extraction of ground features; the two-way coding module is utilized to fuse the space dimension information and the spectrum dimension information, so that the extraction capacity and the utilization capacity of the network for complex space dimension information are maintained while the spectrum dimension information is further enhanced, the automatic and intelligent identification of multiple types of typical objects is realized, and the identification precision is higher.

Description

Remote sensing image typical object extraction method based on spectrum enhancement and double-path coding

Technical Field

The invention belongs to the technical field of remote sensing image processing, and particularly relates to a method for extracting typical objects of a remote sensing image based on spectrum enhancement and double-path coding.

Background

The multi-class ground object element extraction and recognition task belongs to a typical semantic segmentation task, and typical ground objects such as roads, buildings, vegetation, water bodies and the like in a remote sensing image are required to be segmented at a pixel level. The multi-class feature element extraction is one of the core technologies of the high-resolution remote sensing application service chain, and plays a great role in various fields such as global change, disaster detection, resource management and control and the like. In the field of global change, the spatial distribution of the types of the ground features such as buildings, vegetation, water bodies and the like is extracted, so that the change condition of each ground feature can be known; in the disaster detection field, the remote sensing image can dynamically detect disaster places and areas in real time, and information support is provided for the establishment of disaster relief schemes; in the field of resource management and control, the remote sensing technology can provide the distribution condition of resource types in real time, and is beneficial to realizing the fine management of resources. In order to obtain the space distribution information of the ground object classification, the most accurate method adopts manual labeling. However, this method has high cost and low efficiency, and cannot be applied on a large scale. Therefore, a targeted algorithm needs to be designed, and automation, intellectualization and high efficiency of the classification of the remote sensing image features are realized by utilizing a computer.

In the early days, visual methods were adopted to reduce the workload of manual labeling, such as traditional methods of threshold segmentation, classifier, index and the like by using remote sensing processing software. The method realizes semi-automation of the feature extraction of the remote sensing image to a certain extent, but still has the defects of low labeling precision, poor universality and the like. Then, object-oriented image analysis technology gradually becomes a main means for researching the extraction of the ground features of the remote sensing image. The object-oriented image analysis technology divides the spatial semantic information of the remote sensing image into three layers, namely a basic feature layer, object semantics and scene semantics, and the contained semantic information is from low to high, and the object-oriented image analysis technology aims at realizing the mapping of basic features of the image, such as spectrum, texture and the like, to high-level semantics by using a computer. However, it is difficult to establish a high-precision mapping relationship using only the manually constructed underlying features.

In recent years, with the development of deep learning technology, the field of computer vision is also making a great breakthrough, and for simple image classification scenes in daily life, the deep learning model can be comparable to human beings. The semantic segmentation task of the multi-classification ground object extraction task of the remote sensing image is greatly improved in precision under the support of the deep learning technology, but has a larger gap from the human level. Therefore, the method is based on the deep learning technology, and the method realizes high-precision and high-efficiency ground feature information extraction, and also becomes a hot research direction.

In the prior art, the patent name is: a remote sensing image typical feature extraction method based on a multi-task attention mechanism utilizes four attention modules to fuse global features from the inside and the outside, and increases the receptive field of a model, thereby solving the problems of wide distribution range of feature elements and large area of a region; constructing a multi-decoder structure by utilizing a multi-task mechanism, so that competition of different ground object types to model parameters is reduced, and misjudgment of similar ground objects is reduced; the edge extraction task and the distance graph extraction task are utilized, so that edge constraint is increased, the effect of edge extraction is improved, and finally, intelligent extraction of multiple types of typical object elements is realized; however, the typical object extraction method of the remote sensing image based on the multitasking attention mechanism has the following problems:

1. the manual intervention steps are more. The multi-decoder quad-attentive extraction model requires processing such as single-hot code and edge extraction on the label image, and all the processing needs to be manually operated in advance.

2. The accuracy is low. The multi-decoder quadruple attention extraction model has lower accuracy and larger lifting space although the extraction accuracy is improved.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, and provides a remote sensing image typical feature extraction method based on spectrum enhancement and double-path coding, which fully discovers and utilizes spectrum information of a remote sensing image by utilizing the spectrum enhancement method and fully fuses different dimensional information of the remote sensing image by double-path coding, thereby realizing intelligent extraction of various typical feature elements, avoiding manual intervention steps of a multi-decoder four-attention extraction model and further improving extraction precision to realize the purposes of the invention, and is characterized by comprising the following steps:

(1) Constructing a training data set;

(1.1) downloading a plurality of remote sensing images, and cutting each remote sensing image into a pattern block with the size of m x n;

(1.2) marking typical objects in the remote sensing image with different shapes by using a semantic segmentation marking tool, wherein the typical objects comprise a background, a watertight ground, a vehicle, trees, grasslands and buildings;

(1.3) setting the pixel value corresponding to each typical object to 0, 1,2, 3, 4, 5, thereby generating a label image, wherein the pixel value of the background area is set to 0, the pixel value of the watertight ground is set to 1, and so on;

(1.4) taking each remote sensing image and the corresponding label image as a group of training data, thereby forming a training data set;

(2) Constructing and training a spectrum enhancement two-way coding network;

taking a group of training data as input of a spectrum enhanced two-way coding network;

spectral enhancement two-way coding networkStarting with an initial information extraction module, wherein the initial information extraction module comprises three convolution modules and a 2x2 average value pooling layer; wherein each convolution module comprises a 3x3 convolution layer, a batch normalization layer and a Relu activation function; after the remote sensing image passes through the initial information extraction module, an initial feature map F is obtained ₁ ^C×H×W Wherein, C, H and W respectively represent the channel number, the height and the width of the initial characteristic diagram; then the initial feature map F ₁ ^C×H×W Inputting the spectrum enhancement operation to a spectrum enhancement module;

the spectrum enhancement module comprises a spectrum generator and a spectrum attention module; the spectrum generator comprises 6 parallel spectrum generating modules with the same structure, and each class of typical objects is distributed with one spectrum generating module; the spectrum generation module comprises a spectrum generation operation and two convolution operations;

initial feature map F ₁ ^C×H×W First, the spectrum is sent into a spectrum generator to obtain a spectrum enhancement characteristic diagramAnd 6 initial probability weight maps +.>k=1, 2, 6; then the spectrum enhancement feature map->Input to a spectral attention module; in the spectral attention module,/a->Sequentially passing through a 4x4 maximum value pooling layer, a global convolution layer and two one-dimensional convolution layers with convolution kernel size of 5 to obtain a characteristic diagram +.>Then will->And->Multiplying to obtain spectrum enhancement result characteristic diagram +.>Finally will->And F ₁ ^C×H×W Together as input to a two-way encoder;

the two-way encoder comprises two parallel branches: a spatial encoder and a channel encoder; the space encoder comprises three cascaded space encoding modules, the channel encoder comprises three groups of channel encoding modules which are connected in series, and each channel encoder module comprises two convolution modules and a channel attention module;

the input of the spatial encoder is F ₁ ^C×H×W ，F ₁ ^C×H×W Obtaining a space characteristic diagram after passing through a space encoderThe input of the channel encoder is +.>Obtaining a channel characteristic diagram after passing through a channel encoder>Then will->And->Superimposed in the channel direction, the output of the two-way encoder is obtained>Then the characteristic diagram F is obtained after the convolution operation of the two convolution modules ₃ ^C×H×W Then F is carried out ₃ ^C×H×W Sending to a decoder;

the spectrum enhancement two-way coding network comprises 6 decoders with the same structure in total, each type of typical feature is allocated with one decoder, and each decoder outputs a classification probability weight graph corresponding to the typical feature;

in the spectrum enhancement two-way coding network, the number of decoders is the same as the class number of typical feature elements, and the spectrum enhancement two-way coding network totally comprises 6 decoders with the same structure, each class of typical feature is allocated with one decoder, and each decoder outputs a classification probability weight graph corresponding to the typical feature;

wherein each decoder comprises four attention modules in parallel, denoted PAM, CAM, LAM and SAM;

in PAM, F ₃ ^C×H×W Three branches are obtained after 3 parallel convolution operations, and the results of the three branches are respectively recorded asWill->And->After dot multiplication, carrying out softmax operation, and then combining withDot product to obtain PAM output result, which is marked as +.>

In CAM, F ₃ ^C×H×W Three branches are obtained after 3 parallel convolution operations, and the results of the three branches are respectively recorded asWill->And->After dot multiplication, performing softmax operation, and then combining with +.>The point-of-time multiplication is performed,obtaining the output result of CAM, which is marked as +.>

In LAM, F ₃ ^C×H×W After 2 parallel convolution operations, two branches are obtained, and the results of the two branches are respectively recorded asWill->Firstly, performing softmax operation to obtain an attention probability map att of the LAM _L Then is combined withSumming to obtain output result of LAM, which is marked as +.>

In SAM, F ₃ ^C×H×W Sequentially passing through a 4x4 maximum value pooling layer, a global convolution layer and two one-dimensional convolution layers with convolution kernel size of 5 to obtainThen F is carried out ₃ ^C×H×W And->Multiplying to obtain spectrum enhancement result characteristic diagram

Will beAnd->After summation, a layer of convolution module is adopted to obtain an output characteristic diagram of a single channel, and then the output characteristic diagram is matched withSumming and then go through upscalingAfter sampling, the decoder outputs a classification probability weight graph of the typical object;

calculating Loss function value Loss of spectrum enhanced two-way coding model after the training of the round _total ：

Wherein,loss value representing ith preliminary probability weight map and corresponding label image, loss _att Loss of attention value, loss of attention, for LAM _seg Loss values extracted for ground objects;

finally, training the spectrum enhancement two-way coding network by utilizing each group of training data until the loss function converges, and stopping training, thereby obtaining the spectrum enhancement two-way coding network after training;

(3) Performing typical object visual extraction on the remote sensing image;

cutting a remote sensing image to be extracted into a pattern block with m x n, inputting the pattern block into a trained spectrum enhanced two-way coding network, outputting tag values 0, 1,2, 3, 4 and 5 corresponding to each typical feature in the remote sensing image, and mapping the tag values to a color range to form a visual image.

The invention aims at realizing the following steps:

according to the remote sensing image typical feature extraction method based on spectrum enhancement and double-path coding, external information is introduced by utilizing a spectrum enhancement module, and the utilization capacity of a network on spectrum dimension information is enhanced, so that the problems of complex spectrum characteristics and difficult spectrum information extraction of ground features are solved; the two-way coding module is utilized to fuse the space dimension information and the spectrum dimension information, so that the extraction capacity and the utilization capacity of the network for complex space dimension information are maintained while the spectrum dimension information is further enhanced, the automatic and intelligent identification of multiple types of typical objects is realized, and the identification precision is higher.

Meanwhile, the remote sensing image typical object extraction method based on spectrum enhancement and double-path coding has the following beneficial effects:

(1) The invention is based on the traditional deep learning network structure, and the spectrum enhancement module is introduced to improve the utilization capability of the network model on spectrum information, and can also improve the extraction precision of the network model on various typical objects;

(2) Aiming at the characteristics of complex spectrum information, useful spectrum information and interference light information in a remote sensing image, a spectrum attention module is constructed, the useful spectrum information can be enhanced by spectrum attention, and useless spectrum information is restrained;

(3) Aiming at the problem that fusion and utilization of different dimension information and different scale information in a remote sensing image are difficult, the invention introduces a two-way coding module, changes a single-way coder in a general deep learning network into two parts of a channel coder and a space coder, extracts remote sensing image information from the channel dimension and the space dimension respectively, enhances the utilization capacity of the network to spectrum information, simultaneously maintains the extraction capacity of the network to space information, and enhances the utilization capacity of the network to multi-scale information.

Drawings

FIG. 1 is an overall block diagram of a spectrally enhanced two-way coding network of the present invention;

FIG. 2 is a block diagram of a single channel coding module;

FIG. 3PAM block diagram;

FIG. 4 is a diagram of a CAM structure;

FIG. 5 is a block diagram of the LAM;

FIG. 6SAM structure diagram;

fig. 7 shows experimental results, (a) shows an original image, (b) shows a label image, (c) shows a UNet experimental result diagram, (d) shows a multi-decoder four-attention network experimental result diagram, and (e) shows a spectrum enhancement two-way coding network experimental result diagram

Detailed Description

The following description of the embodiments of the invention is presented in conjunction with the accompanying drawings to provide a better understanding of the invention to those skilled in the art. It is to be expressly noted that in the description below, detailed descriptions of known functions and designs are omitted here as perhaps obscuring the present invention.

Examples

For convenience of description, related terms appearing in the detailed description will be described first:

PAM (Position Attention Module): position attention module

CAM (Channel Attention Module): channel attention module

LAM (Label Attention Module): label attention module

SAM (Edge Attention Module): spectrum attention module

In this embodiment, the method for extracting the typical feature of the remote sensing image based on spectrum enhancement and double-path coding comprises the following steps:

(1) Constructing a training data set;

(1.1) downloading a plurality of remote sensing images, and cutting each remote sensing image into a pattern block with the size of m x n, wherein in the embodiment, each remote sensing image is cut into a pattern block with the size of 1024 x 1024;

(2) Constructing and training a spectrum enhancement two-way coding network;

as shown in fig. 1, the spectrally enhanced two-way coding network starts with an initial information extraction module comprising three convolution modules and one2x2 average pooling layer; wherein each convolution module comprises a 3x3 convolution layer, a batch normalization layer and a Relu activation function; after the remote sensing image passes through the initial information extraction module, an initial feature map F is obtained ₁ ^C×H×W Wherein, C, H and W respectively represent the channel number, the height and the width of the initial characteristic diagram; then the initial feature map F ₁ ^C×H×W Inputting the spectrum enhancement operation to a spectrum enhancement module;

the two-way encoder comprises two parallel branches: a spatial encoder and a channel encoder; wherein the space encoder comprises three cascaded space encoding modules, the structure of a single space encoding module is shown in fig. 2, the space encoding module comprises 3 convolution modules of 3x3 and 3 convolution modules of 5x5, and one PAM module, wherein the structure of the PAM module is shown in fig. 3; the channel encoder comprises three groups of channel encoding modules connected in series, each channel encoder module comprises two convolution modules and a CAM module, wherein the structure of the CAM module is shown in FIG. 4;

the input of the spatial encoder is F ₁ ^C×H×W ，F ₁ ^C×H×W Obtaining a space characteristic diagram F after passing through a space encoder _s ^C×H×W The method comprises the steps of carrying out a first treatment on the surface of the The input of the channel encoder isObtaining a channel characteristic diagram after passing through a channel encoder>Then will->And->Superimposed in the channel direction, the output of the two-way encoder is obtained>Then the characteristic diagram F is obtained after the convolution operation of the two convolution modules ₃ ^C×H×W Then F is carried out ₃ ^C×H×W Sending to a decoder;

in PAM, F is shown in FIG. 3 ₃ ^C×H×W Three branches are obtained after 3 parallel convolution operations, and the results of the three branches are respectively recorded asWill->And->After dot multiplication, performing softmax operation, and then combining with +.>Dot product to obtain PAM output result, which is marked as +.>

In CAM, F is shown in FIG. 4 ₃ ^C×H×W Three branches are obtained after 3 parallel convolution operations, and the results of the three branches are respectively recorded asWill->And->After dot multiplication, performing softmax operation, and then combining with +.>Dot product, obtain CAM output, recorded as +.>

In LAM, as shown in fig. 5, F ₃ ^C×H×W After 2 parallel convolution operations, two branches are obtained, and the results of the two branches are respectively recorded asWill->Firstly, performing softmax operation to obtain an attention probability map att of the LAM _L Then, with->Summing to obtain output result of LAM, which is marked as +.>

In SAM, as shown in FIG. 6, F ₃ ^C×H×W Sequentially passing through a 4x4 maximum value pooling layer, a global convolution layer and two one-dimensional convolution layers with convolution kernel size of 5 to obtainThen F is carried out ₃ ^C×H×W And->Multiplying to obtain spectrum enhancement result characteristic diagram +.>

Will beAnd->After summation, a layer of convolution module is adopted to obtain an output characteristic diagram of a single channel, and then the output characteristic diagram is matched withSumming, and then outputting a classification probability weight graph of the typical object by the decoder after up-sampling;

(3) Performing typical object visual extraction on the remote sensing image;

Fig. 7 is an experimental result diagram, wherein (a) is an original image, (b) is a label image, (c) is a UNet experimental result diagram, (d) is a multi-decoder four-attention network experimental result diagram, and (e) is a spectrum enhancement two-way coding network experimental result diagram. It can be seen by comparison that the (e) plot has these advantages over the (d) plot: 1. the holes in large-area extraction are fewer, and the extraction continuity is better; 2. the situation of misidentifying the ground object category is less. Overall, the spectrum enhanced two-way coding network has significantly higher accuracy in extracting multiple features of the remote sensing image than the four-fold attention network of the multiple decoders.

While the foregoing describes illustrative embodiments of the present invention to facilitate an understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of the embodiments, but is to be construed as protected by the accompanying claims insofar as various changes are within the spirit and scope of the present invention as defined and defined by the appended claims.

Claims

1. A remote sensing image typical object extraction method based on spectrum enhancement and double-path coding is characterized by comprising the following steps:

(1) Constructing a training data set;

(2) Constructing and training a spectrum enhancement two-way coding network;

the spectrum enhanced double-path coding network starts with an initial information extraction module, wherein the initial information extraction module comprises three convolution modules and a 2x2 average value pooling layer; wherein each convolution module comprises a 3x3 convolution layer, a batch normalization layer and a Relu activation function; after the remote sensing image passes through the initial information extraction module, an initial feature map F is obtained ₁ ^C ^×H×W Wherein, C, H and W are respectively as followsThe number of channels, height and width of the initial feature map are shown; then the initial feature map F ₁ ^C×H×W Inputting the spectrum enhancement operation to a spectrum enhancement module;

initial feature map F ₁ ^C×H×W First, the spectrum is sent into a spectrum generator to obtain a spectrum enhancement characteristic diagramAnd 6 initial probability weight maps +.>Then the spectrum enhancement feature map->Input to a spectral attention module; in the spectral attention module,/a->Sequentially passing through a 4x4 maximum value pooling layer, a global convolution layer and two one-dimensional convolution layers with convolution kernel size of 5 to obtain a characteristic diagram +.>Then will->And->Multiplying to obtain spectrum enhancement result characteristic diagram +.>Finally will->And F ₁ ^C×H×W Together as input to a two-way encoder;

the input of the spatial encoder is F ₁ ^C×H×W ，F ₁ ^C×H×W Obtaining a space characteristic diagram F after passing through a space encoder _s ^C×H×W The method comprises the steps of carrying out a first treatment on the surface of the The input of the channel encoder is Obtaining a channel characteristic diagram F after passing through a channel encoder _c ^12C×H×W The method comprises the steps of carrying out a first treatment on the surface of the Then willAnd F _s ^C×H×W Superimposed in the channel direction, the output of the two-way encoder is obtained>Then the characteristic diagram F is obtained after the convolution operation of the two convolution modules ₃ ^C×H×W Then F is carried out ₃ ^C×H×W Sending to a decoder;

In CAM, F ₃ ^C×H×W Three branches are obtained after 3 parallel convolution operations, and the results of the three branches are respectively recorded asWill->And->After dot multiplication, performing softmax operation, and then combining with +.>Dot product, obtain CAM output, recorded as +.>

In LAM, F ₃ ^C×H×W After 2 parallel convolution operations, two branches are obtained, and the results of the two branches are respectively recorded asWill->Firstly, performing softmax operation to obtain an attention probability map att of the LAM _L Then, with->Summing to obtain output result of LAM, which is marked as +.>

In SAM, F ₃ ^C×H×W Sequentially passing through a 4x4 maximum value pooling layer, a global convolution layer and two one-dimensional convolution layers with convolution kernel size of 5 to obtainThen F is carried out ₃ ^C×H×W And F _S ^C×1×1 Multiplying to obtain spectrum enhancement result characteristic diagram +.>

(3) Performing typical object visual extraction on the remote sensing image;

2. The method for extracting typical features from remote sensing images based on spectral enhancement and two-way coding according to claim 1, wherein the Loss is characterized in that _att The method meets the following conditions:

wherein l _i Representing the value of the i-th pixel in the label image,attention probability map att _L N represents the total number of pixels, n=m×n.

3. The method for extracting typical features from remote sensing images based on spectral enhancement and two-way coding according to claim 1, wherein the Loss is characterized in that _seg The method meets the following conditions:

wherein p is _i And the value of the ith pixel point in the classification probability weight graph is represented.

4. The method for extracting typical features from remote sensing images based on spectral enhancement and two-way coding according to claim 1, wherein the method is characterized in thatThe method meets the following conditions:

wherein,representing an initial probability weight map->The value of the i-th pixel in (c).

5. The method for extracting typical features from remote sensing images based on spectral enhancement and two-way coding according to claim 1, wherein the spectral enhancement feature map is characterized in thatThe generation process of (1) is as follows:

1) In each spectrum generation module, the initial characteristic diagram F is firstly compared with ₁ ^C×H×W Performing spectrum generation operation to obtain a characteristic diagram

Wherein α, β, γ are adaptive parameters, and concat () represents channel direction superposition;

2) Will beObtaining a characteristic diagram through two convolution operations>

3) The characteristic diagram obtained by the kth spectrum generation module is recorded asWill->Generating a preliminary probability weight map through a convolution layer and a sigmoid layer>

4) Finally six are toGenerating a spectral enhancement profile by superposition in the channel direction>