CN117152616A - Remote sensing image typical object extraction method based on spectrum enhancement and double-path coding - Google Patents


Info

Publication number
CN117152616A
Authority
CN
China
Prior art keywords
spectrum
enhancement
remote sensing
convolution
typical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311169642.XA
Other languages
Chinese (zh)
Inventor
张柏玮
李玉霞
张靖霖
司宇
何媛
童忠贵
邓万涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
South West Institute of Technical Physics
Original Assignee
University of Electronic Science and Technology of China
South West Institute of Technical Physics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China and South West Institute of Technical Physics
Priority to CN202311169642.XA
Publication of CN117152616A
Legal status: Pending


Classifications

    • G06V20/10 Terrestrial scenes (scenes; scene-specific elements)
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Neural network learning methods
    • G06V10/764 Image or video recognition using classification, e.g. of video objects
    • G06V10/806 Fusion of extracted features
    • G06V10/82 Image or video recognition using neural networks


Abstract

The invention discloses a method for extracting typical ground objects from remote sensing images based on spectrum enhancement and two-way coding. A spectrum enhancement module introduces external information and strengthens the network's use of spectral-dimension information, addressing the complex spectral characteristics of ground objects and the difficulty of extracting spectral information. A two-way coding module fuses spatial-dimension and spectral-dimension information, so that the network retains its ability to extract and exploit complex spatial information while the spectral information is further enhanced. The method achieves automatic, intelligent recognition of multiple classes of typical ground objects with high recognition accuracy.

Description

Remote sensing image typical object extraction method based on spectrum enhancement and double-path coding
Technical Field
The invention belongs to the technical field of remote sensing image processing, and particularly relates to a method for extracting typical objects of a remote sensing image based on spectrum enhancement and double-path coding.
Background
Multi-class ground-object extraction and recognition is a typical semantic segmentation task: typical ground objects in a remote sensing image, such as roads, buildings, vegetation and water bodies, must be segmented at the pixel level. Multi-class ground-object extraction is one of the core technologies of the high-resolution remote sensing application service chain and plays an important role in fields such as global-change studies, disaster monitoring and resource management. In global-change studies, extracting the spatial distribution of ground-object types such as buildings, vegetation and water bodies reveals how each type changes over time; in disaster monitoring, remote sensing images allow disaster locations and affected areas to be monitored dynamically in real time, providing information support for the design of relief schemes; in resource management, remote sensing can report the distribution of resource types in real time and supports fine-grained management of resources. The most accurate way to obtain the spatial distribution of ground-object classes is manual annotation, but manual annotation is costly and inefficient and cannot be applied at scale. A targeted algorithm is therefore needed so that a computer can classify remote sensing ground objects automatically, intelligently and efficiently.
Early on, visual-interpretation methods were used to reduce the manual labeling workload, for example traditional threshold segmentation, classifier and index methods in remote sensing processing software. These methods achieved a degree of semi-automation in remote sensing ground-object extraction, but still suffered from low labeling accuracy and poor generality. Object-oriented image analysis then gradually became the main approach to remote sensing ground-object extraction. It divides the spatial semantic information of a remote sensing image into three layers (basic features, object semantics and scene semantics), with semantic content increasing from low to high, and aims to use a computer to map basic image features, such as spectrum and texture, to high-level semantics. However, it is difficult to establish a high-precision mapping using manually constructed low-level features alone.
In recent years, deep learning has driven major breakthroughs in computer vision; for simple everyday image classification scenes, deep learning models are comparable to humans. Supported by deep learning, the accuracy of multi-class ground-object extraction from remote sensing images, itself a semantic segmentation task, has improved considerably, but it still falls well short of human performance. Achieving high-precision, high-efficiency ground-object information extraction on the basis of deep learning has therefore become a hot research direction.
In the prior art, the patent titled "A remote sensing image typical feature extraction method based on a multi-task attention mechanism" uses four attention modules to fuse global features from inside and outside the image and enlarges the model's receptive field, addressing the wide distribution range and large area of ground-object elements; it uses a multi-task mechanism to build a multi-decoder structure, reducing competition between different ground-object types for model parameters and reducing misjudgment of similar ground objects; and it uses an edge extraction task and a distance-map extraction task to add edge constraints and improve edge extraction, finally achieving intelligent extraction of multiple classes of typical ground-object elements. However, this method based on a multi-task attention mechanism has the following problems:
1. It requires many manual intervention steps. The multi-decoder quadruple-attention extraction model requires preprocessing of the label images, such as one-hot encoding and edge extraction, all of which must be performed manually in advance.
2. Its accuracy is limited. Although the multi-decoder quadruple-attention extraction model improves extraction accuracy, the accuracy is still relatively low and leaves considerable room for improvement.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for extracting typical ground objects from remote sensing images based on spectrum enhancement and two-way coding. The method uses spectrum enhancement to fully mine and exploit the spectral information of the remote sensing image, and uses two-way coding to fully fuse information of the image's different dimensions, thereby achieving intelligent extraction of multiple classes of typical ground-object elements, avoiding the manual intervention steps of the multi-decoder quadruple-attention extraction model, and further improving extraction accuracy. To achieve the above object of the invention, the method is characterized by comprising the following steps:
(1) Constructing a training data set;
(1.1) Downloading a plurality of remote sensing images, and cutting each remote sensing image into patches of size m x n;
(1.2) Labeling the typical objects of different shapes in the remote sensing images with a semantic segmentation annotation tool, the typical objects comprising background, impervious surface, car, tree, grass and building;
(1.3) Setting the pixel value corresponding to each typical object to 0, 1, 2, 3, 4 and 5 respectively, thereby generating a label image, wherein the pixel value of the background area is set to 0, the pixel value of the impervious surface is set to 1, and so on;
(1.4) taking each remote sensing image and the corresponding label image as a group of training data, thereby forming a training data set;
(2) Constructing and training a spectrum enhancement two-way coding network;
taking a group of training data as input of a spectrum enhanced two-way coding network;
The spectrum-enhanced two-way coding network starts with an initial information extraction module, which comprises three convolution modules and one 2x2 average pooling layer; each convolution module comprises a 3x3 convolution layer, a batch normalization layer and a ReLU activation function. After the remote sensing image passes through the initial information extraction module, an initial feature map F1 of size C×H×W is obtained, wherein C, H and W respectively denote the number of channels, the height and the width of the initial feature map. The initial feature map F1 is then input to the spectrum enhancement module for the spectrum enhancement operation;
The spectrum enhancement module comprises a spectrum generator and a spectrum attention module; the spectrum generator comprises 6 parallel spectrum generation modules with the same structure, and each class of typical objects is assigned one spectrum generation module; each spectrum generation module comprises one spectrum generation operation and two convolution operations;
The initial feature map F1 is first fed into the spectrum generator to obtain a spectrum enhancement feature map F_spe of size C×H×W and 6 initial probability weight maps P_k, k = 1, 2, ..., 6. The spectrum enhancement feature map F_spe is then input to the spectrum attention module, where it passes sequentially through a 4x4 max pooling layer, a global convolution layer and two one-dimensional convolution layers with kernel size 5 to obtain an attention feature map F_att. F_att and F_spe are multiplied element-wise to obtain the spectrum enhancement result feature map F2. Finally, F2 and F1 together serve as the input of the two-way encoder;
The two-way encoder comprises two parallel branches: a spatial encoder and a channel encoder. The spatial encoder comprises three cascaded spatial encoding modules; the channel encoder comprises three channel encoding modules connected in series, each comprising two convolution modules and a channel attention module;
The input of the spatial encoder is F1; after passing through the spatial encoder, a spatial feature map F_s of size C×H×W is obtained. The input of the channel encoder is F2; after passing through the channel encoder, a channel feature map F_c of size C×H×W is obtained. F_s and F_c are then superimposed along the channel direction to give the two-way encoder output F_sc of size 2C×H×W, which passes through two convolution modules to obtain the feature map F3 of size C×H×W; F3 is then sent to the decoders;
In the spectrum-enhanced two-way coding network, the number of decoders equals the number of classes of typical ground-object elements: the network comprises 6 decoders with the same structure in total, each class of typical object is assigned one decoder, and each decoder outputs the classification probability weight map of the corresponding typical object;
wherein each decoder comprises four attention modules in parallel, denoted PAM, CAM, LAM and SAM;
In PAM, F3 passes through 3 parallel convolution operations to obtain three branches, denoted Q_P, K_P and V_P; Q_P and K_P are dot-multiplied, a softmax operation is applied to the product, and the result is dot-multiplied with V_P to obtain the PAM output, denoted F_PAM;
In CAM, F3 passes through 3 parallel convolution operations to obtain three branches, denoted Q_C, K_C and V_C; Q_C and K_C are dot-multiplied, a softmax operation is applied to the product, and the result is dot-multiplied with V_C to obtain the CAM output, denoted F_CAM;
In LAM, F3 passes through 2 parallel convolution operations to obtain two branches, denoted A_L and B_L; a softmax operation is first applied to A_L to obtain the attention probability map att_L of the LAM, which is then summed with B_L to obtain the LAM output, denoted F_LAM;
In SAM, F3 passes sequentially through a 4x4 max pooling layer, a global convolution layer and two one-dimensional convolution layers with kernel size 5 to obtain an attention feature map S_att; F3 and S_att are then multiplied element-wise to obtain the SAM output, denoted F_SAM;
F_PAM, F_CAM, F_LAM and F_SAM are summed, and one convolution module is applied to the sum to obtain a single-channel output feature map; this map is summed with the corresponding initial probability weight map P_k and, after up-sampling, the decoder outputs the classification probability weight map of its typical object;
After each training round, the loss function value Loss_total of the spectrum-enhanced two-way coding model is calculated as
Loss_total = Σ_{i=1}^{6} loss_{P_i} + loss_att + loss_seg
wherein loss_{P_i} denotes the loss between the i-th initial probability weight map and the corresponding label image, loss_att is the attention loss of the LAM, and loss_seg is the ground-object extraction loss;
finally, training the spectrum enhancement two-way coding network by utilizing each group of training data until the loss function converges, and stopping training, thereby obtaining the spectrum enhancement two-way coding network after training;
(3) Performing typical object visual extraction on the remote sensing image;
Cutting the remote sensing image to be processed into patches of size m x n and inputting the patches into the trained spectrum-enhanced two-way coding network, which outputs the label values 0, 1, 2, 3, 4 and 5 corresponding to each typical object in the remote sensing image; the label values are then mapped to a color palette to form a visualized image.
The object of the invention is achieved as follows:
The remote sensing image typical feature extraction method based on spectrum enhancement and double-path coding uses the spectrum enhancement module to introduce external information and to strengthen the network's use of spectral-dimension information, addressing the complex spectral characteristics of ground objects and the difficulty of extracting spectral information; the two-way coding module fuses spatial-dimension and spectral-dimension information, so that the network retains its ability to extract and exploit complex spatial information while the spectral information is further enhanced, achieving automatic, intelligent recognition of multiple classes of typical objects with high recognition accuracy.
Meanwhile, the remote sensing image typical object extraction method based on spectrum enhancement and double-path coding has the following beneficial effects:
(1) Building on a conventional deep learning network structure, the invention introduces a spectrum enhancement module, which improves the network model's use of spectral information and also improves its extraction accuracy for multiple classes of typical objects;
(2) Since the spectral information in a remote sensing image is complex and contains both useful spectral information and interfering spectral information, a spectrum attention module is constructed; spectral attention enhances the useful spectral information and suppresses the useless spectral information;
(3) To address the difficulty of fusing and exploiting information of different dimensions and scales in remote sensing images, the invention introduces a two-way coding module that replaces the single-path encoder of a typical deep learning network with a channel encoder and a spatial encoder, extracting image information from the channel dimension and the spatial dimension respectively; this strengthens the network's use of spectral information while preserving its ability to extract spatial information and enhancing its use of multi-scale information.
Drawings
FIG. 1 is an overall block diagram of a spectrally enhanced two-way coding network of the present invention;
FIG. 2 is a block diagram of a single channel coding module;
FIG. 3 is a PAM block diagram;
FIG. 4 is a diagram of a CAM structure;
FIG. 5 is a block diagram of the LAM;
FIG. 6 is a SAM block diagram;
FIG. 7 shows experimental results: (a) the original image, (b) the label image, (c) the UNet result, (d) the multi-decoder quadruple-attention network result, and (e) the spectrum-enhanced two-way coding network result.
Detailed Description
The following description of embodiments of the invention, taken in conjunction with the accompanying drawings, is provided so that those skilled in the art can better understand the invention. Note that in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the substance of the invention.
Examples
For convenience of description, related terms appearing in the detailed description will be described first:
PAM (Position Attention Module): position attention module
CAM (Channel Attention Module): channel attention module
LAM (Label Attention Module): label attention module
SAM (Spectral Attention Module): spectrum attention module
In this embodiment, the method for extracting the typical feature of the remote sensing image based on spectrum enhancement and double-path coding comprises the following steps:
(1) Constructing a training data set;
(1.1) Downloading a plurality of remote sensing images, and cutting each remote sensing image into patches of size m x n; in this embodiment, each remote sensing image is cut into patches of size 1024 x 1024;
(1.2) Labeling the typical objects of different shapes in the remote sensing images with a semantic segmentation annotation tool, the typical objects comprising background, impervious surface, car, tree, grass and building;
(1.3) Setting the pixel value corresponding to each typical object to 0, 1, 2, 3, 4 and 5 respectively, thereby generating a label image, wherein the pixel value of the background area is set to 0, the pixel value of the impervious surface is set to 1, and so on;
(1.4) taking each remote sensing image and the corresponding label image as a group of training data, thereby forming a training data set;
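Steps (1.1)-(1.4) above can be sketched as follows. This is an illustrative reading, not code from the patent: the class names, array layout and stand-in image are assumptions.

```python
import numpy as np

# Hypothetical mapping of step (1.3)'s classes to label values 0-5.
CLASS_VALUES = {"background": 0, "impervious_surface": 1, "car": 2,
                "tree": 3, "grass": 4, "building": 5}

def tile_image(img, m, n):
    """Split an H x W x C array into non-overlapping m x n patches (step 1.1)."""
    h, w = img.shape[:2]
    return [img[i:i + m, j:j + n]
            for i in range(0, h - m + 1, m)
            for j in range(0, w - n + 1, n)]

image = np.zeros((2048, 2048, 3), dtype=np.uint8)   # stand-in remote sensing image
patches = tile_image(image, 1024, 1024)             # 1024 x 1024 as in this embodiment
print(len(patches), patches[0].shape)
```

Each patch would then be paired with its label image to form one group of training data, as in step (1.4).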
(2) Constructing and training a spectrum enhancement two-way coding network;
taking a group of training data as input of a spectrum enhanced two-way coding network;
As shown in fig. 1, the spectrum-enhanced two-way coding network starts with an initial information extraction module comprising three convolution modules and one 2x2 average pooling layer; each convolution module comprises a 3x3 convolution layer, a batch normalization layer and a ReLU activation function. After the remote sensing image passes through the initial information extraction module, an initial feature map F1 of size C×H×W is obtained, wherein C, H and W respectively denote the number of channels, the height and the width of the initial feature map. The initial feature map F1 is then input to the spectrum enhancement module for the spectrum enhancement operation;
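A minimal NumPy sketch of the initial information extraction module follows. It is single-channel and uses random stand-in weights, purely to illustrate the conv-BN-ReLU-pool flow; it is not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3(x, w):
    """Naive 'same' 3x3 convolution on a 2-D array."""
    h, wd = x.shape
    p = np.pad(x, 1)
    out = np.empty_like(x, dtype=float)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * w)
    return out

def conv_module(x, w, eps=1e-5):
    """One convolution module: conv -> normalization stand-in -> ReLU."""
    y = conv3x3(x, w)
    y = (y - y.mean()) / np.sqrt(y.var() + eps)   # batch-norm stand-in
    return np.maximum(y, 0.0)                      # ReLU

def avg_pool2(x):
    """2x2 average pooling."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

x = rng.standard_normal((8, 8))
for _ in range(3):                # three cascaded convolution modules
    x = conv_module(x, rng.standard_normal((3, 3)))
f1 = avg_pool2(x)                 # initial feature map F1 (spatially halved)
print(f1.shape)
```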
The spectrum enhancement module comprises a spectrum generator and a spectrum attention module; the spectrum generator comprises 6 parallel spectrum generation modules with the same structure, and each class of typical objects is assigned one spectrum generation module; each spectrum generation module comprises one spectrum generation operation and two convolution operations;
The initial feature map F1 is first fed into the spectrum generator to obtain a spectrum enhancement feature map F_spe of size C×H×W and 6 initial probability weight maps P_k, k = 1, 2, ..., 6. The spectrum enhancement feature map F_spe is then input to the spectrum attention module, where it passes sequentially through a 4x4 max pooling layer, a global convolution layer and two one-dimensional convolution layers with kernel size 5 to obtain an attention feature map F_att. F_att and F_spe are multiplied element-wise to obtain the spectrum enhancement result feature map F2. Finally, F2 and F1 together serve as the input of the two-way encoder;
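The spectrum attention chain (4x4 max pooling, a global convolution, two 1-D convolutions of kernel size 5, then an element-wise product) can be sketched as below. The global squeeze to one value per channel and the sigmoid gate are our assumptions, since the patent text does not spell them out.

```python
import numpy as np

rng = np.random.default_rng(1)

def max_pool4(x):
    """4x4 max pooling on a C x H x W array."""
    c, h, w = x.shape
    return x.reshape(c, h // 4, 4, w // 4, 4).max(axis=(2, 4))

def conv1d_same(v, k):
    """1-D convolution with kernel size 5 and 'same'-length output."""
    return np.convolve(np.pad(v, 2, mode="edge"), k, mode="valid")

def spectral_attention(f_spe, k1, k2):
    pooled = max_pool4(f_spe)            # 4x4 max pooling
    squeezed = pooled.mean(axis=(1, 2))  # global squeeze per channel (assumption)
    v = conv1d_same(conv1d_same(squeezed, k1), k2)  # two 1-D convs, kernel 5
    weights = 1.0 / (1.0 + np.exp(-v))   # sigmoid gate per channel (assumption)
    return f_spe * weights[:, None, None]  # element-wise re-weighting

f_spe = rng.standard_normal((16, 8, 8))
f2 = spectral_attention(f_spe, rng.standard_normal(5), rng.standard_normal(5))
print(f2.shape)
```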
The two-way encoder comprises two parallel branches: a spatial encoder and a channel encoder. The spatial encoder comprises three cascaded spatial encoding modules; the structure of a single spatial encoding module is shown in fig. 2, and it comprises 3 convolution modules of 3x3, 3 convolution modules of 5x5 and one PAM module, the structure of which is shown in fig. 3. The channel encoder comprises three channel encoding modules connected in series; each channel encoding module comprises two convolution modules and one CAM module, the structure of which is shown in fig. 4;
The input of the spatial encoder is F1; after passing through the spatial encoder, a spatial feature map F_s of size C×H×W is obtained. The input of the channel encoder is F2; after passing through the channel encoder, a channel feature map F_c of size C×H×W is obtained. F_s and F_c are then superimposed along the channel direction to give the two-way encoder output F_sc of size 2C×H×W, which passes through two convolution modules to obtain the feature map F3 of size C×H×W; F3 is then sent to the decoders;
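The two-branch fusion can be sketched as follows, modeling the two convolution modules as 1x1 channel-mixing matrices. This is an illustrative simplification with random stand-in weights, not the patent's layers.

```python
import numpy as np

rng = np.random.default_rng(2)
C, H, W = 8, 16, 16

f_s = rng.standard_normal((C, H, W))       # spatial-encoder output F_s
f_c = rng.standard_normal((C, H, W))       # channel-encoder output F_c
f_sc = np.concatenate([f_s, f_c], axis=0)  # superimpose along channels: 2C x H x W

w1 = rng.standard_normal((2 * C, 2 * C))
w2 = rng.standard_normal((C, 2 * C))
h1 = np.maximum(np.einsum("oc,chw->ohw", w1, f_sc), 0.0)  # conv module 1 (1x1 + ReLU)
f3 = np.einsum("oc,chw->ohw", w2, h1)                     # conv module 2 -> F3
print(f_sc.shape, f3.shape)
```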
In the spectrum-enhanced two-way coding network, the number of decoders equals the number of classes of typical ground-object elements: the network comprises 6 decoders with the same structure in total, each class of typical object is assigned one decoder, and each decoder outputs the classification probability weight map of the corresponding typical object;
wherein each decoder comprises four attention modules in parallel, denoted PAM, CAM, LAM and SAM;
In PAM, as shown in fig. 3, F3 passes through 3 parallel convolution operations to obtain three branches, denoted Q_P, K_P and V_P; Q_P and K_P are dot-multiplied, a softmax operation is applied to the product, and the result is dot-multiplied with V_P to obtain the PAM output, denoted F_PAM;
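The PAM computation reads as a standard position (self-) attention over spatial locations. A sketch under that reading, with 1x1 channel-mixing matrices standing in for the three parallel convolutions:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(f3, wq, wk, wv):
    """Spatial self-attention sketch of PAM (weights are stand-ins)."""
    c, h, w = f3.shape
    x = f3.reshape(c, h * w)
    q, k, v = wq @ x, wk @ x, wv @ x
    att = softmax(q.T @ k, axis=1)        # (HW x HW) spatial affinity map
    return (v @ att.T).reshape(c, h, w)   # re-aggregated value branch

f3 = rng.standard_normal((4, 6, 6))
ws = [rng.standard_normal((4, 4)) * 0.1 for _ in range(3)]
f_pam = position_attention(f3, *ws)
print(f_pam.shape)
```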
In CAM, as shown in fig. 4, F3 passes through 3 parallel convolution operations to obtain three branches, denoted Q_C, K_C and V_C; Q_C and K_C are dot-multiplied, a softmax operation is applied to the product, and the result is dot-multiplied with V_C to obtain the CAM output, denoted F_CAM;
In LAM, as shown in fig. 5, F3 passes through 2 parallel convolution operations to obtain two branches, denoted A_L and B_L; a softmax operation is first applied to A_L to obtain the attention probability map att_L of the LAM, which is then summed with B_L to obtain the LAM output, denoted F_LAM;
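The LAM can be sketched in the same spirit. The per-class channel count of the two branches and the softmax being taken over the class axis are our assumptions, since the text does not fix them.

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_attention(f3, wa, wb):
    """LAM sketch: softmax of branch A gives att_L, summed with branch B."""
    c, h, w = f3.shape
    x = f3.reshape(c, h * w)
    a, b = wa @ x, wb @ x
    att_l = softmax(a, axis=0)            # attention probability map att_L
    out = b + att_l                       # summed with the second branch
    return att_l.reshape(-1, h, w), out.reshape(-1, h, w)

f3 = rng.standard_normal((4, 6, 6))
att_l, f_lam = label_attention(f3,
                               rng.standard_normal((6, 4)),
                               rng.standard_normal((6, 4)))
print(att_l.shape, f_lam.shape)
```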
In SAM, as shown in fig. 6, F3 passes sequentially through a 4x4 max pooling layer, a global convolution layer and two one-dimensional convolution layers with kernel size 5 to obtain an attention feature map S_att; F3 and S_att are then multiplied element-wise to obtain the SAM output, denoted F_SAM;
F_PAM, F_CAM, F_LAM and F_SAM are summed, and one convolution module is applied to the sum to obtain a single-channel output feature map; this map is summed with the corresponding initial probability weight map P_k and, after up-sampling, the decoder outputs the classification probability weight map of its typical object;
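One decoder head, under the description above, can be sketched as follows; the 1x1 collapse to one channel and nearest-neighbor up-sampling are illustrative choices with random stand-in weights.

```python
import numpy as np

rng = np.random.default_rng(5)
C, H, W = 4, 8, 8

def upsample2(x):
    """Nearest-neighbor 2x up-sampling of a 2-D map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Stand-ins for the four attention-module outputs of one decoder.
f_pam, f_cam, f_lam, f_sam = (rng.standard_normal((C, H, W)) for _ in range(4))
w = rng.standard_normal(C)

fused = f_pam + f_cam + f_lam + f_sam        # sum of the four modules
single = np.einsum("c,chw->hw", w, fused)    # single-channel output feature map
p_k = rng.standard_normal((H, W))            # initial probability weight map P_k
prob_map = upsample2(single + p_k)           # decoder output for one class
print(prob_map.shape)
```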
After each training round, the loss function value Loss_total of the spectrum-enhanced two-way coding model is calculated as
Loss_total = Σ_{i=1}^{6} loss_{P_i} + loss_att + loss_seg
wherein loss_{P_i} denotes the loss between the i-th initial probability weight map and the corresponding label image, loss_att is the attention loss of the LAM, and loss_seg is the ground-object extraction loss;
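Since the loss equation itself did not survive extraction from the published text, the sketch below simply sums the three kinds of terms the text names: one loss per initial probability weight map, the LAM attention loss, and the ground-object extraction loss. The cross-entropy form and unit weighting are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

def pixel_ce(prob, target):
    """Mean binary cross-entropy between a probability map and a 0/1 target."""
    p = np.clip(prob, 1e-7, 1 - 1e-7)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

# Stand-in targets and predictions for the 6 per-class probability weight maps.
targets = [rng.integers(0, 2, (8, 8)).astype(float) for _ in range(6)]
probs = [np.clip(t + rng.normal(0, 0.1, t.shape), 0.01, 0.99) for t in targets]

loss_p = sum(pixel_ce(p, t) for p, t in zip(probs, targets))  # 6 map losses
loss_att = pixel_ce(probs[0], targets[0])   # LAM attention loss (stand-in)
loss_seg = pixel_ce(probs[1], targets[1])   # extraction loss (stand-in)
loss_total = loss_p + loss_att + loss_seg
print(float(loss_total))
```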
finally, training the spectrum enhancement two-way coding network by utilizing each group of training data until the loss function converges, and stopping training, thereby obtaining the spectrum enhancement two-way coding network after training;
(3) Performing typical object visual extraction on the remote sensing image;
Cutting the remote sensing image to be processed into patches of size m x n and inputting the patches into the trained spectrum-enhanced two-way coding network, which outputs the label values 0, 1, 2, 3, 4 and 5 corresponding to each typical object in the remote sensing image; the label values are then mapped to a color palette to form a visualized image.
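The final label-to-color mapping can be sketched with a palette lookup; the specific colors below are arbitrary choices for illustration, not taken from the patent.

```python
import numpy as np

# Hypothetical palette, one RGB color per label value 0-5.
PALETTE = np.array([
    [0, 0, 0],        # 0 background
    [255, 255, 255],  # 1 impervious surface
    [255, 255, 0],    # 2 car
    [0, 128, 0],      # 3 tree
    [0, 255, 128],    # 4 grass
    [0, 0, 255],      # 5 building
], dtype=np.uint8)

def colorize(label_img):
    """Turn an H x W array of label values 0-5 into an H x W x 3 RGB image."""
    return PALETTE[label_img]

labels = np.array([[0, 1], [5, 3]])
rgb = colorize(labels)
print(rgb.shape)
```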
Fig. 7 shows the experimental results: (a) the original image, (b) the label image, (c) the UNet result, (d) the multi-decoder quadruple-attention network result, and (e) the spectrum-enhanced two-way coding network result. Comparison shows that result (e) has the following advantages over result (d): 1. fewer holes in large-area extraction and better extraction continuity; 2. fewer misclassified ground-object categories. Overall, the spectrum-enhanced two-way coding network extracts multiple ground objects from remote sensing images with significantly higher accuracy than the multi-decoder quadruple-attention network.
While the foregoing describes illustrative embodiments of the invention to aid understanding by those skilled in the art, the invention is not limited to the scope of these embodiments; all changes that fall within the spirit and scope of the invention as defined by the appended claims are intended to be protected.

Claims (5)

1. A remote sensing image typical object extraction method based on spectrum enhancement and double-path coding is characterized by comprising the following steps:
(1) Constructing a training data set;
(1.1) downloading a plurality of remote sensing images, and cutting each remote sensing image into blocks of size m×n;
(1.2) marking the typical objects of different shapes in the remote sensing images with a semantic segmentation marking tool, wherein the typical objects comprise background, impervious ground, vehicles, trees, grassland and buildings;
(1.3) setting the pixel value corresponding to each typical object to 0, 1, 2, 3, 4 or 5, thereby generating a label image, wherein the pixel value of the background area is set to 0, the pixel value of impervious ground is set to 1, and so on;
(1.4) taking each remote sensing image and the corresponding label image as a group of training data, thereby forming a training data set;
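A minimal sketch of the label-image generation in step (1.3), combining hypothetical per-class binary masks into one label image (the class names and the overwrite order are assumptions):

```python
import numpy as np

# Label values from step (1.3); class names here are assumed translations.
CLASS_VALUES = {"impervious": 1, "vehicle": 2, "tree": 3,
                "grass": 4, "building": 5}

def masks_to_label_image(masks, shape):
    """Combine per-class binary masks into one label image.
    masks: dict class_name -> (H, W) bool array.  Pixels covered by no
    mask stay 0 (background); later classes overwrite earlier ones."""
    label = np.zeros(shape, dtype=np.uint8)
    for name, value in CLASS_VALUES.items():
        mask = masks.get(name)
        if mask is not None:
            label[mask] = value
    return label

masks = {"tree": np.array([[True, False], [False, False]]),
         "building": np.array([[False, False], [False, True]])}
label_img = masks_to_label_image(masks, (2, 2))
```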
(2) Constructing and training a spectrum enhancement two-way coding network;
taking a group of training data as input of a spectrum enhanced two-way coding network;
the spectrum-enhanced double-path coding network starts with an initial information extraction module, which comprises three convolution modules and a 2×2 average pooling layer; each convolution module comprises a 3×3 convolution layer, a batch normalization layer and a ReLU activation function; after the remote sensing image passes through the initial information extraction module, an initial feature map F_1^{C×H×W} is obtained, wherein C, H and W respectively denote the number of channels, the height and the width of the initial feature map; the initial feature map F_1^{C×H×W} is then input to a spectrum enhancement module for the spectrum enhancement operation;
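As an illustration of the 2×2 average pooling at the end of the initial information extraction module, a minimal numpy sketch (the convolution and batch-normalization layers are omitted):

```python
import numpy as np

def avg_pool_2x2(x):
    """2x2 average pooling with stride 2 on a (C, H, W) feature map;
    H and W are assumed even, as with m x n tiles of even size."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

x = np.arange(16, dtype=float).reshape(1, 4, 4)  # one toy 4x4 channel
y = avg_pool_2x2(x)                              # -> shape (1, 2, 2)
```

Each output pixel is the mean of a 2×2 input block, so the spatial resolution halves while the channel count C is unchanged.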
the spectrum enhancement module comprises a spectrum generator and a spectral attention module; the spectrum generator comprises 6 parallel spectrum generation modules of identical structure, one spectrum generation module being assigned to each class of typical object; each spectrum generation module comprises a spectrum generation operation and two convolution operations;
the initial feature map F_1^{C×H×W} is first fed into the spectrum generator to obtain a spectrum-enhanced feature map and 6 preliminary probability weight maps; the spectrum-enhanced feature map is then input to the spectral attention module, where it sequentially passes through a 4×4 max pooling layer, a global convolution layer and two one-dimensional convolution layers with kernel size 5 to obtain an attention feature map; the spectrum-enhanced feature map and the attention feature map are multiplied to obtain the spectrum-enhancement result feature map; finally, the spectrum-enhancement result feature map and F_1^{C×H×W} together serve as input to the two-way encoder;
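The pool-then-1-D-convolution attention pattern can be sketched as follows; this is a reduced stand-in, not the patented module (the 4×4 max pooling is replaced by a global max, and the global convolution plus two 1-D convolutions are collapsed into a single np.convolve call):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, kernel):
    """Reduced stand-in for the spectral attention path: squeeze (C, H, W)
    to one descriptor per channel, run a single 1-D convolution across the
    channel axis, squash to (0, 1) weights and rescale the input."""
    desc = x.max(axis=(1, 2))                     # global max per channel, (C,)
    att = np.convolve(desc, kernel, mode="same")  # 1-D conv over channels
    weights = sigmoid(att)                        # per-channel weights in (0, 1)
    return x * weights[:, None, None]             # broadcast back to (C, H, W)

x = np.random.default_rng(0).random((8, 4, 4))   # toy (C, H, W) feature map
out = channel_attention(x, np.ones(5) / 5.0)     # assumed kernel of size 5
```

The 1-D convolution over the channel descriptor lets each channel weight depend on its neighbouring channels, which is the point of using kernel-size-5 one-dimensional convolutions rather than per-channel scalars.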
the two-way encoder comprises two parallel branches: a spatial encoder and a channel encoder; the spatial encoder comprises three cascaded spatial encoding modules, and the channel encoder comprises three channel encoding modules connected in series, each channel encoding module comprising two convolution modules and a channel attention module;
the input of the spatial encoder is F_1^{C×H×W}, which yields the spatial feature map F_s^{C×H×W} after passing through the spatial encoder; the input of the channel encoder is the spectrum-enhancement result feature map, which yields the channel feature map F_c^{12C×H×W} after passing through the channel encoder; F_c^{12C×H×W} and F_s^{C×H×W} are then superimposed in the channel direction to obtain the output of the two-way encoder, which, after the convolution operations of two convolution modules, gives the feature map F_3^{C×H×W}; F_3^{C×H×W} is then sent to the decoders;
in the spectrum-enhanced two-way coding network, the number of decoders equals the number of typical-feature classes: the network comprises 6 decoders of identical structure in total, one decoder is assigned to each class of typical feature, and each decoder outputs the classification probability weight map of its typical feature;
wherein each decoder comprises four parallel attention modules, denoted PAM, CAM, LAM and SAM;
in the PAM, F_3^{C×H×W} passes through 3 parallel convolution operations to obtain three branches; the results of the first two branches are dot-multiplied, a softmax operation is applied, and the result is then dot-multiplied with the third branch to obtain the PAM output;
in the CAM, F_3^{C×H×W} passes through 3 parallel convolution operations to obtain three branches; the results of the first two branches are dot-multiplied, a softmax operation is applied, and the result is then dot-multiplied with the third branch to obtain the CAM output;
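Both PAM and CAM follow the same dot-product-then-softmax pattern; a minimal numpy sketch on flattened branches (the branch shapes and names are illustrative, standing in for the convolution outputs):

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(q, k, v):
    """Shared PAM/CAM pattern: dot-multiply two branches, apply softmax,
    then apply the normalised map to the third branch.  q, k, v are
    (N, C) matrices standing in for flattened convolution branches."""
    att = softmax(q @ k.T, axis=-1)   # (N, N) attention map, rows sum to 1
    return att @ v                    # (N, C) re-weighted features

rng = np.random.default_rng(1)
q, k, v = (rng.random((6, 4)) for _ in range(3))
out = dot_product_attention(q, k, v)
```

In PAM the attention map relates spatial positions, in CAM it relates channels; the difference is only which axis is flattened into N before the matrix products.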
in the LAM, F_3^{C×H×W} passes through 2 parallel convolution operations to obtain two branches; a softmax operation is first applied to the first branch to obtain the attention probability map att_L of the LAM, which is then summed with the second branch to obtain the LAM output;
in the SAM, F_3^{C×H×W} sequentially passes through a 4×4 max pooling layer, a global convolution layer and two one-dimensional convolution layers with kernel size 5 to obtain F_S^{C×1×1}; F_3^{C×H×W} and F_S^{C×1×1} are then multiplied to obtain the SAM output;
the PAM, CAM and SAM outputs are summed, and one convolution module is applied to obtain a single-channel output feature map; this is summed with the LAM output, and after up-sampling the decoder outputs the classification probability weight map of the typical feature;
calculating the loss function value Loss_total of the spectrum-enhanced two-way coding model after the training round:

Loss_total = Σ_{i=1}^{6} Loss_pw^i + Loss_att + Loss_seg

wherein Loss_pw^i denotes the loss value between the i-th preliminary probability weight map and the corresponding label image, Loss_att the attention loss value of the LAM, and Loss_seg the ground-object extraction loss value;
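A sketch of the loss combination, assuming binary cross-entropy for the individual terms (the exact per-term loss form is not legible in the source, so bce here is an assumption):

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy; assumed form for the per-term
    losses, with clipping for numerical stability."""
    p = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

def total_loss(preliminary_maps, att_map, seg_map, target):
    """Sum of the six preliminary-map losses, the LAM attention loss
    and the ground-object extraction loss."""
    loss_pw = sum(bce(pm, target) for pm in preliminary_maps)
    return loss_pw + bce(att_map, target) + bce(seg_map, target)

t = np.array([[1.0, 0.0], [0.0, 1.0]])
loss_perfect = total_loss([t] * 6, t, t, t)   # near zero for perfect predictions
```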
finally, training the spectrum-enhanced two-way coding network with each group of training data until the loss function converges, then stopping training, thereby obtaining a trained spectrum-enhanced two-way coding network;
(3) Performing typical object visual extraction on the remote sensing image;
cutting the remote sensing image to be extracted into blocks of size m×n, inputting the blocks into the trained spectrum-enhanced two-way coding network, which outputs the label values 0, 1, 2, 3, 4 and 5 corresponding to each typical feature in the remote sensing image, and then mapping the label values to a color range to form a visualized image.
2. The method for extracting typical features from remote sensing images based on spectral enhancement and two-way coding according to claim 1, wherein Loss_att satisfies:

Loss_att = -(1/N) Σ_{i=1}^{N} [ l_i·log(att_L,i) + (1 - l_i)·log(1 - att_L,i) ]

wherein l_i represents the value of the i-th pixel in the label image, att_L,i the value of the i-th pixel in the attention probability map att_L, and N the total number of pixels, N = m×n.
3. The method for extracting typical features from remote sensing images based on spectral enhancement and two-way coding according to claim 1, wherein Loss_seg satisfies:

Loss_seg = -(1/N) Σ_{i=1}^{N} [ l_i·log(p_i) + (1 - l_i)·log(1 - p_i) ]

wherein p_i represents the value of the i-th pixel in the classification probability weight map, and l_i and N are as defined above.
4. The method for extracting typical features from remote sensing images based on spectral enhancement and two-way coding according to claim 1, wherein Loss_pw^i satisfies:

Loss_pw^i = -(1/N) Σ_{j=1}^{N} [ l_j·log(pw_i,j) + (1 - l_j)·log(1 - pw_i,j) ]

wherein pw_i,j represents the value of the j-th pixel in the i-th preliminary probability weight map.
5. The method for extracting typical features from remote sensing images based on spectral enhancement and two-way coding according to claim 1, wherein the spectrum-enhanced feature map is generated as follows:
1) in each spectrum generation module, a spectrum generation operation is first performed on the initial feature map F_1^{C×H×W}: the channels of F_1^{C×H×W} are weighted by the adaptive parameters α, β and γ, and the results are superimposed in the channel direction by concat() to obtain a first feature map;
2) the first feature map is passed through two convolution operations to obtain a second feature map;
3) the second feature map obtained by the k-th spectrum generation module is recorded as S_k; S_k is passed through a convolution layer and a sigmoid layer to generate the preliminary probability weight map pw_k;
4) finally, the six feature maps S_1, ..., S_6 are superimposed in the channel direction to generate the spectrum-enhanced feature map.
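A toy numpy sketch of the generation steps in this claim; the α·F_1/β·F_1/γ·F_1 concatenation and the mean-then-sigmoid reduction are illustrative stand-ins for the spectrum generation formula and the convolution operations, which are not legible in the source:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spectrum_generator(f1, weight_sets):
    """Toy sketch of the spectrum generator: each of the six generation
    modules derives channels from F_1 via its adaptive parameters
    (alpha, beta, gamma), produces a preliminary probability map via a
    sigmoid, and the six module outputs are superimposed along the
    channel axis."""
    enhanced, prob_maps = [], []
    for alpha, beta, gamma in weight_sets:          # one triple per class
        g = np.concatenate([alpha * f1, beta * f1, gamma * f1], axis=0)
        enhanced.append(g)                          # stand-in for S_k
        prob_maps.append(sigmoid(g.mean(axis=0)))   # (H, W) preliminary map
    return np.concatenate(enhanced, axis=0), prob_maps

f1 = np.ones((2, 3, 3))                  # stand-in for F_1 with C = 2
feat, probs = spectrum_generator(f1, [(1.0, 0.5, 0.2)] * 6)
```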
CN202311169642.XA 2023-09-12 2023-09-12 Remote sensing image typical object extraction method based on spectrum enhancement and double-path coding Pending CN117152616A (en)

Publications (1)

Publication Number Publication Date
CN117152616A true CN117152616A (en) 2023-12-01



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination