CN110490081B - Remote sensing object interpretation method based on focusing weight matrix and variable-scale semantic segmentation neural network - Google Patents



Publication number
CN110490081B
CN110490081B (application CN201910660740.0A)
Authority
CN
China
Prior art keywords
scale
remote sensing
image
network
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910660740.0A
Other languages
Chinese (zh)
Other versions
CN110490081A (en)
Inventor
崔巍
何新
姚勐
王梓溦
郝元洁
穆力玮
马力
陈先锋
史燕娟
胡颖
申雪皎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN201910660740.0A priority Critical patent/CN110490081B/en
Publication of CN110490081A publication Critical patent/CN110490081A/en
Application granted granted Critical
Publication of CN110490081B publication Critical patent/CN110490081B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing object interpretation method based on a focusing weight matrix and a variable-scale semantic segmentation neural network, which comprises the following steps: data acquisition and preprocessing; making a thematic map; cutting samples; designing a multi-spatial-scale remote sensing image annotation strategy; making labels for the sample set; constructing a multi-scale remote sensing image semantic interpretation model; selecting a training set and a validation set; setting training parameters; training the model; and designing a remote sensing object recognition algorithm based on a focusing weight matrix and verifying the effect of the variable-scale remote sensing image semantic interpretation model. By constructing an LSTM, the invention transfers the learned association between nouns in the semantic description and the object mask maps obtained by semantic segmentation to the spatial relations between those mask maps, thereby realizing variable-scale semantic segmentation and end-to-end recognition of the spatial relations of remote sensing objects, and advancing image classification and recognition in remote sensing applications to a higher level.

Description

Remote sensing object interpretation method based on focusing weight matrix and variable-scale semantic segmentation neural network
Technical Field
The invention relates to the technical field of image processing, in particular to a remote sensing object interpretation method based on a focusing weight matrix and a variable-scale semantic segmentation neural network.
Background
Remote sensing image classification and remote sensing object identification are current research hotspots in remote sensing technology. With the development of artificial intelligence, deep neural networks have been widely applied to high-resolution remote sensing image analysis and have increasingly become an effective processing method.
At present, the conventional Attention-based LSTM model is mainly applied to the semantic description of ordinary digital images. In the course of implementing the present invention, the inventors found that the prior-art method has at least the following technical problems:
Uncertainty of spatial position: at each time step, the focus area mechanism generates an image feature matrix of size 14 × 14, corresponding to 196 spatial positions in the remote sensing image; these positions often deviate from the actual objects, which limits the application of the focus area mechanism in remote sensing object identification.
Boundary uncertainty: nouns (labels of objects) in the semantic description cannot accurately segment the boundaries of remotely sensed objects in the image and therefore cannot identify spatial relationships between objects.
Uncertainty of spatial scale: the contextual information surrounding an object is complex and variable, so a single-scale model has difficulty identifying remote sensing objects; sometimes larger-scale semantic information allows the remote sensing object to be identified more accurately.
Therefore, the method in the prior art has the technical problem of inaccurate identification.
Disclosure of Invention
In view of the above, the invention provides a remote sensing object interpretation method based on a focus weight matrix and a variable-scale semantic segmentation neural network, which is used for solving or at least partially solving the technical problem of inaccurate identification in the method in the prior art.
The invention provides a remote sensing object interpretation method based on a focus weight matrix and a variable-scale semantic segmentation neural network, which comprises the following steps:
step S1: acquiring a high-resolution remote sensing image of a preset research area, and preprocessing the acquired high-resolution remote sensing image;
step S2: vectorizing by using professional GIS software to obtain a thematic map layer of a research area, and rasterizing the vector thematic map to obtain a corresponding grid gray map;
step S3: cutting the preprocessed remote sensing image and the raster gray map, and extracting data sample sets at two spatial scales, wherein one set comprises the original images paired with large-scale GT images and the other comprises the original images paired with small-scale GT images;
step S4: carrying out content annotation on each remote sensing image in the two sets of spatial scale data sample sets according to a multi-spatial scale remote sensing image annotation strategy to obtain sample set annotations;
step S5: constructing a variable-scale remote sensing image semantic interpretation model, obtaining multi-scale semantic segmentation images through the interpretation model, extracting masks of objects at the two scales through a mask extraction algorithm, and associating the small-scale mask objects segmented by the U-Net network with nouns in the semantic description through variable-scale object identification, wherein the variable-scale remote sensing image semantic interpretation model comprises: an FCN fully convolutional network, a U-Net semantic segmentation network, and an LSTM network based on the Attention mechanism, the FCN network being used for large-scale object segmentation, the U-Net network for small-scale object segmentation, and the LSTM for generating semantic descriptions containing objects at the two spatial scales and their spatial relations;
step S6: training the FCN (fully convolutional) network, the U-Net semantic segmentation network and the LSTM (long short-term memory) network in the constructed variable-scale remote sensing image semantic interpretation model to obtain a trained model;
step S7: recognizing the remote sensing object with the trained model, specifically: locating the focusing weight matrix of the noun generated by the LSTM network at the current time step onto the corresponding small-scale object in the mask image obtained by the U-Net semantic segmentation, and completing the identification of the object if the object class label is the same as the noun.
In one embodiment, when the object class label differs from the noun, the method further includes starting a multi-scale remote sensing object rectification algorithm, specifically: first, the large-scale mask object obtained by FCN semantic segmentation that covers the current region of interest is located by a scale-up step, and then a small-scale object whose class label matches the noun is located within the candidate large-scale object by a scale-down step, thereby completing the identification of the object.
In one embodiment, the method further comprises: and performing effect verification analysis on the multi-scale remote sensing image semantic interpretation model.
In one embodiment, the multi-spatial-scale remote sensing image annotation strategy in step S4 is: each descriptive statement is composed of small-scale remote sensing objects and their spatial relations, with the large-scale object left implicit.
In one embodiment, step S6 specifically includes:
step S6.1: dividing a training set and a verification set from a data sample set according to a preset proportion;
step S6.2: respectively setting training parameters of an FCN network, a U-Net network and an LSTM network;
step S6.3: adding the original image and the large-scale GT image as input data into an FCN, performing iterative training on the FCN, and storing a corresponding result and an optimal model weight obtained after the training is completed;
step S6.4: adding the original image and the small-scale GT image as input data into a U-Net network, performing iterative training on the U-Net network, and storing a corresponding result and an optimal model weight obtained after the training is finished;
step S6.5: LSTM network training: and adding features extracted from the original image by VGG-19 and multi-scale semantic labels as input data into the LSTM network, performing iterative training on the LSTM network, and storing corresponding results and optimal model weights obtained after training.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the invention provides a remote sensing object interpretation method based on a focusing weight matrix and a variable-scale semantic segmentation neural network, which comprises the following steps of firstly, obtaining a high-resolution remote sensing image of a preset research area, and preprocessing the high-resolution remote sensing image; then, making a thematic map layer of the research area, and rasterizing the vector thematic map to obtain a corresponding grid gray map; then, cutting the preprocessed remote sensing image and the grid gray level image, and extracting two sets of data sample sets with spatial scales; then, carrying out content annotation on each remote sensing image in the two sets of spatial scale data sample sets according to a multi-spatial scale remote sensing image annotation strategy to obtain sample set annotations; then constructing a variable-scale remote sensing image semantic interpretation model, obtaining a multi-scale semantic segmentation image through the interpretation model, extracting masks of two scale objects through a mask extraction algorithm, and associating small-scale mask objects segmented by the U-Net network with nouns in semantic description through variable-scale object identification; training an FCN (fuzzy C-means) network, a U-Net semantic segmentation network and an LSTM (least Square TM) network in the constructed variable-scale remote sensing image semantic interpretation model to obtain a trained model; and finally, recognizing the remote sensing object by using the trained model and adopting a remote sensing object recognition algorithm based on a focusing weight matrix.
Compared with the prior art, the invention constructs a variable-scale semantic interpretation model for remote sensing images based on FCN, U-Net and LSTM networks, which can generate remote sensing image descriptions at multiple spatial scales while segmenting objects in the image and identifying their spatial relations end to end. First, the remote sensing image is fed into the FCN and U-Net networks respectively for semantic segmentation at two spatial scales, so that each pixel of the original image carries semantic labels at two scales, forming a hierarchical relation of multi-scale remote sensing objects. Second, the features extracted from the same image by the pre-trained VGG-19 are fed into the LSTM network, which outputs semantic descriptions of the remote sensing objects and their spatial relations at the two scales. Finally, the relation between nouns in the semantic description and the object mask maps is established through the focusing weight matrix, improving the accuracy of object identification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for interpreting a remote sensing object based on a focusing weight matrix and a variable-scale semantic segmentation neural network according to the present invention;
FIG. 2 is a schematic diagram of variable scale object segmentation and image semantic annotation according to the present invention;
FIG. 3 is a network model structure diagram of the remote sensing object interpretation method based on the focus weight matrix and the variable scale semantic segmentation neural network.
Detailed Description
The invention aims to solve the technical problem that the identification is inaccurate due to the fact that the spatial relationship of a remote sensing object cannot be accurately identified by the method in the prior art, and provides a method for constructing the link between a noun in semantic description obtained by LSTM and an object mask image obtained by semantic segmentation and transferring the spatial relationship in the semantic description to the object mask images, so that the semantic segmentation of the remote sensing object and the end-to-end identification of the spatial relationship are realized.
In order to achieve the above purpose, the main concept of the invention is as follows:
by designing a remote sensing image variable-scale semantic interpretation model based on FCN, U-Net and LSTM networks, the remote sensing image description with multiple spatial scales can be generated, simultaneously, objects in the image are segmented, and the spatial relationship is identified end to end. Firstly, respectively inputting a remote sensing image into an FCN and a U-Net network to carry out semantic segmentation of two spatial scales, so that each pixel of an original image has a semantic label of two scales, and a hierarchical relation of multi-scale remote sensing objects can be formed; secondly, inputting the features extracted from the same image after the pre-trained VGG-19 into an LSTM network, and outputting semantic descriptions of the remote sensing objects and the spatial relationship thereof in two scales; and finally, establishing the relationship between the nouns and the object mask graph in the semantic description through a focusing weight matrix.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment provides a method for identifying a remote sensing object based on a focus weight matrix and a variable-scale semantic segmentation neural network, please refer to fig. 1, and the method comprises the following steps:
step S1: and acquiring a high-resolution remote sensing image of a preset research area, and preprocessing the acquired high-resolution remote sensing image.
Specifically, the acquired remote sensing image data is preprocessed, including geometric correction, atmospheric correction, clipping processing and the like. The preset study area can be selected according to the needs and the actual situation. In the embodiment, a Quickbird remote sensing image with the resolution of 60cm in a certain area of a certain city is obtained.
Step S2: and vectorizing by using professional GIS software to obtain a thematic map layer of the research area, and rasterizing the vector thematic map to obtain a corresponding grid gray map.
In particular, the professional GIS software may be ArcGIS software or other processing software.
Step S3: and cutting the preprocessed remote sensing image and the grid gray image, and extracting two sets of data sample sets with spatial scales, wherein the two sets of data sample sets with the spatial scales respectively comprise an original image, a large-scale GT image, the original image and a small-scale GT image.
In a specific implementation, a suitable cutting scale is selected, the remote sensing image and the raster gray map of the study area are cut with an ArcGIS script, and each cut sample is named by its ID plus the image-format suffix. Through step S3, two sets of spatial-scale data sets are extracted: one contains the original images and the large-scale GT images, the other the original images and the small-scale GT images.
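The cutting step can be sketched with a few lines of NumPy (a minimal illustration only: the 210 × 210 tile size, the synthetic arrays, and the function name `tile_pairs` are assumptions, since the original workflow uses an ArcGIS script and names samples by ID):

```python
import numpy as np

def tile_pairs(image, gt, tile=210):
    """Cut an image and its ground-truth raster into aligned square tiles.

    Tiles that would extend past the border are discarded. Returns a list of
    (sample_id, image_tile, gt_tile) triples; the integer id stands in for
    the ID-based file name used in the original workflow.
    """
    h, w = image.shape[:2]
    samples, sid = [], 0
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            samples.append((sid, image[y:y + tile, x:x + tile],
                            gt[y:y + tile, x:x + tile]))
            sid += 1
    return samples

# A synthetic 3-band "scene" and its single-band raster gray map.
img = np.zeros((420, 630, 3), dtype=np.uint8)
gt = np.zeros((420, 630), dtype=np.uint8)
samples = tile_pairs(img, gt)
print(len(samples))  # 2 rows x 3 columns = 6 aligned sample pairs
```

Because the image and gray map are cut with identical offsets, every sample pair stays pixel-aligned, which the per-pixel GT labels require.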
Step S4: and carrying out content annotation on each remote sensing image in the two sets of spatial scale data sample sets according to a multi-spatial scale remote sensing image annotation strategy to obtain sample set annotations.
Specifically, step S4 produces the ground-truth labels (GT) of the sample set: multi-scale semantic labels are made for each image in the sample set according to the semantic annotation strategy, and the annotation results are written into an Excel table, in which the first column of each row is the image file name and the following columns contain the corresponding multi-scale annotation statements.
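The layout of the annotation table can be illustrated as follows (a hypothetical sketch: the file names and sentences are invented, and CSV is used in place of Excel to keep the example dependency-free):

```python
import csv
import io

# One row per sample: first column the image file name, the remaining
# column the multi-scale annotation statement, as described above.
rows = [
    ["0001.png", "a building with a road on the left"],
    ["0002.png", "a pond with a tree on the right"],
]
buf = io.StringIO()
csv.writer(buf).writerows(rows)
table = buf.getvalue()
print(table.splitlines()[0])  # 0001.png,a building with a road on the left
```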
In one embodiment, the spatial scale remote sensing image labeling strategy in step S4 is: each descriptive statement is composed of a small-scale remote sensing object and a spatial relation thereof, and a large-scale object is hidden.
Specifically, the multi-spatial scale remote sensing image labeling strategy of the invention is as follows:
(1) Each description is composed of small-scale remote sensing objects and their spatial relations, and the small-scale remote sensing objects imply a large-scale object. That is, the image contains several large-scale objects, each large-scale object contains several small-scale objects, and spatial relations exist between objects of the same scale. The annotation strategy is to describe the scale and spatial-relation information contained in the image as completely as possible, as shown in FIG. 2, in which O_i, O_j represent large-scale objects and O_i1, O_i2, O_j1, O_j2, …, O_jn represent small-scale objects.
(2) In small scale labeling, one object is usually selected as the primary object to which the other objects are attached through spatial relationships. In this way, homogeneous small-scale objects do not appear repeatedly in one large object.
(3) If there are two or more large-scale objects, the corresponding sub-descriptions (small-scale objects and their spatial relations, e.g. O_i1 R_i12 O_i2 …) are joined with a connecting word.
Step S5: constructing a variable-scale remote sensing image semantic interpretation model, obtaining a multi-scale semantic segmentation image through the interpretation model, extracting masks of two scale objects through a mask extraction algorithm, and associating a small-scale mask object segmented by a U-Net network with a noun in semantic description through variable-scale object identification, wherein the variable-scale remote sensing image semantic interpretation model comprises the following steps: the system comprises an FCN full convolution network, a U-Net semantic segmentation network and an LSTM network based on an Attention mechanism, wherein the FCN network is used for large-scale object segmentation, the U-Net network is used for small-scale object segmentation, and the LSTM is used for generating semantic description containing two space scale objects and space relation thereof.
Specifically, the FCN and U-Net semantic segmentation networks and the Attention-based LSTM network can each be constructed in TensorFlow. During training, the FCN network takes the original images and the large-scale GT images as input, the U-Net network takes the original images and the small-scale GT images, and the LSTM network takes the original images and the multi-scale semantic annotation GT, i.e., the image semantic annotations made in S4, including the manual annotation statement of each image. In this way, at the verification stage the model yields a large-scale semantic segmentation image, a small-scale semantic segmentation image and a multi-scale semantic description. Object masks are extracted from the segmentation images through a mask algorithm, and finally the small-scale mask objects segmented by the U-Net network are associated with nouns in the semantic description through the variable-scale remote sensing object recognition algorithm, so that the spatial relations between remote sensing objects are obtained from the semantic description. The specific model structure is shown in FIG. 3.
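The mask-extraction step can be sketched as follows (a minimal NumPy sketch under the assumption that a mask image holds the object's class index value on its pixels and 0 elsewhere; the function name is invented, and splitting a class into individual connected instances is omitted):

```python
import numpy as np

def extract_class_masks(seg):
    """Split a semantic segmentation label map into per-class mask images.

    For each class index c present in the label map (0 = background), build
    a mask whose pixels equal c where the class occurs and 0 elsewhere --
    the format later intersected with the focusing weight matrix.
    """
    masks = {}
    for c in np.unique(seg):
        if c == 0:
            continue
        m = np.zeros_like(seg)
        m[seg == c] = c
        masks[int(c)] = m
    return masks

seg = np.array([[0, 1, 1],
                [2, 2, 0],
                [2, 0, 1]])
masks = extract_class_masks(seg)
print(sorted(masks))  # [1, 2]
```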
Step S6: and training an FCN (fuzzy C-means) network, a U-Net semantic segmentation network and an LSTM (least Square TM) network in the constructed variable-scale remote sensing image semantic interpretation model to obtain a trained model.
Wherein, step S6 specifically includes:
step S6.1: dividing a training set and a verification set from a data sample set according to a preset proportion;
step S6.2: respectively setting training parameters of an FCN network, a U-Net network and an LSTM network;
step S6.3: adding the original image and the large-scale GT image as input data into an FCN, performing iterative training on the FCN, and storing a corresponding result and an optimal model weight obtained after the training is completed;
step S6.4: adding the original image and the small-scale GT image as input data into a U-Net network, performing iterative training on the U-Net network, and storing a corresponding result and an optimal model weight obtained after the training is finished;
step S6.5: LSTM network training: and adding features extracted from the original image by VGG-19 and multi-scale semantic labels as input data into the LSTM network, performing iterative training on the LSTM network, and storing corresponding results and optimal model weights obtained after training.
In a specific implementation, step S6.1 randomly divides the 1835 study samples into a training set and a validation set in a certain proportion, for example 1167 training samples and 668 validation samples.
Step S6.2 sets the training parameters: for the FCN network, the learning rate is 1e-5, the batch_size is 1 and the number of iterations is 60000; for the U-Net network, the learning rate is 1e-4, the batch_size is 20, the number of iterations is 120, and the Dropout parameter is set to 0.7 to prevent overfitting; for the LSTM network, image features are extracted with a VGG-19 pre-trained model (feature map size 14 × 14 × 512), the number of hidden-layer neurons is set to 1024, the word-embedding dimension is 512, the learning rate is 0.001, the batch_size is 20, and the number of iterations is 120.
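Gathered in one place, the reported hyper-parameters look as follows (the dictionary layout and key names are invented for illustration; the values are those stated above):

```python
# Training hyper-parameters of the three sub-networks, as reported above.
params = {
    "FCN":   {"lr": 1e-5, "batch_size": 1,  "iterations": 60000},
    "U-Net": {"lr": 1e-4, "batch_size": 20, "iterations": 120, "dropout": 0.7},
    "LSTM":  {"lr": 1e-3, "batch_size": 20, "iterations": 120,
              "hidden_units": 1024, "embedding_dim": 512},
}
print(params["U-Net"]["dropout"])  # 0.7
```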
In step S6.3, the FCN segmentation precision reaches 0.89 through analysis, and in step S6.4, the U-Net segmentation precision reaches 0.93.
In step S6.5, the evaluation index values are shown in table 1 by analysis:
TABLE 1. LSTM evaluation indexes

                    Bleu_1  Bleu_2  Bleu_3  Bleu_4  METEOR  ROUGE_L  CIDEr
The present method  0.893   0.744   0.655   0.587   0.455   0.779    5.044
In Table 1, BLEU is a common machine-translation evaluation criterion based on n-gram precision, with n usually taken from 1 to 4. ROUGE_L is computed from recall and is an evaluation criterion for automatic summarization. METEOR, used to evaluate machine translation, aligns the words of the model's output with a reference translation and computes precision, recall and F-measure over exact word matches, stem matches, synonym matches and other cases. The CIDEr metric treats each sentence as a "document", represents it as a tf-idf vector, and scores the cosine similarity between the reference caption and the model-generated caption.
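For the BLEU columns, the core computation at n = 1 is clipped unigram precision, which can be sketched as follows (single reference, brevity penalty omitted; the sentences are invented examples, not taken from the data set):

```python
from collections import Counter

def bleu1(candidate, reference):
    """Clipped unigram precision: BLEU-1 with one reference and no brevity
    penalty. Each candidate word is credited at most as many times as it
    appears in the reference."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    clipped = sum(min(n, ref[w]) for w, n in cand.items())
    return clipped / max(sum(cand.values()), 1)

ref = "a road with a building on the left"
print(round(bleu1("a road with a building", ref), 3))  # 1.0
print(round(bleu1("a a a", ref), 3))                   # 0.667
```

The second call shows the clipping: "a" occurs twice in the reference, so only 2 of the 3 candidate occurrences are credited.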
Step S7: the method for recognizing the remote sensing object by using the trained model specifically comprises the following steps: and positioning a focusing weight matrix of the noun generated by the LSTM network at the current moment to a corresponding small-scale object in a mask image obtained by the U-Net semantic segmentation, and finishing the identification of the object if the object class label is the same as the noun. Where the focus weight matrix is generated by the LSTM network at each time instant when a word is generated, which represents the region of interest (focus position) in the image for the currently generated word.
Specifically, by designing a remote sensing object recognition algorithm based on a focus weight matrix, the trained model can be used for recognizing the remote sensing object.
The identification of the remote sensing object is based on the focusing weight matrix generated by the LSTM network and the mask objects extracted, via the mask algorithm, from the semantic segmentation map produced by the U-Net network. First, in the present embodiment, the 14 × 14 weight matrix (i.e., the focusing weight matrix) is resampled to 210 × 210. Let α_ij denote the weight of the focusing weight matrix at position (i, j), and m_ij the pixel value at position (i, j) of a mask object image obtained from the U-Net segmentation; within each mask object image, the pixels covered by the object take the object's class index value C, and all remaining positions are 0.
The intersection region of the region-of-interest weight matrix and the object mask map is computed with a first formula, in which C is a normalization factor; the average weight value over the intersection region is then computed with a second formula. (Both formulas appear only as equation images in the original and are not reproduced here.)
and if the class label of the remote sensing object is the same as the noun generated at the moment t, the position and the boundary of the remote sensing object can be identified through an object mask diagram.
Generally speaking, the invention designs a variable-scale semantic interpretation model for remote sensing images based on FCN, U-Net and LSTM networks, which can generate remote sensing image descriptions at multiple spatial scales while segmenting objects in the image and identifying their spatial relations end to end. First, the remote sensing image is fed into the FCN and U-Net networks respectively for semantic segmentation at two spatial scales, so that each pixel of the original image carries semantic labels at two scales, forming a hierarchical relation of multi-scale remote sensing objects. Second, the features extracted from the same image by the pre-trained VGG-19 are fed into the LSTM network, which outputs semantic descriptions of the remote sensing objects and their spatial relations at the two scales. Finally, the relation between nouns in the semantic description and the object mask maps is established through the focusing weight matrix.
In order to further improve identification accuracy, in one embodiment, when the object class label differs from the noun, the method further includes starting a multi-scale remote sensing object correction algorithm, specifically: first, the large-scale mask object obtained by FCN semantic segmentation that covers the current region of interest is located by a scale-up step, and then a small-scale object whose class label matches the noun is located within the candidate large-scale object by a scale-down step, thereby completing the identification of the object.
Specifically, the class label of the remote sensing object obtained directly by the object-identification method of step S7 often differs from the noun generated at time t; to address this, the invention further provides the multi-scale remote sensing object correction algorithm.
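Under the same simplified mask representation, the scale-up/scale-down correction can be sketched as follows (illustrative only: locating the large-scale object via the attention peak and matching by class label are assumed readings of the algorithm, whose exact criteria are not given in the text):

```python
import numpy as np

def correct(att, large_mask, small_mask, small_names, noun):
    """Scale-up: find the large-scale (FCN) object covering the attention
    peak; scale-down: search inside it for a small-scale (U-Net) object
    whose class label equals the generated noun. Returns that object's
    class index, or None if no match exists."""
    y, x = np.unravel_index(np.argmax(att), att.shape)
    big = large_mask[y, x]                     # scale-up step
    if big == 0:                               # peak falls on background
        return None
    inside = large_mask == big
    for c in np.unique(small_mask[inside]):    # scale-down step
        if c != 0 and small_names.get(int(c)) == noun:
            return int(c)
    return None

att = np.zeros((6, 6))
att[2, 2] = 1.0                                # attention peak at (2, 2)
large = np.zeros((6, 6), dtype=int)
large[:4, :4] = 7                              # one large-scale object (id 7)
small = np.zeros((6, 6), dtype=int)
small[0, 0], small[3, 3] = 1, 2                # two small-scale objects inside
print(correct(att, large, small, {1: "tree", 2: "road"}, "road"))  # 2
```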
In one embodiment, the method further comprises: and performing effect verification analysis on the multi-scale remote sensing image semantic interpretation model.
In a specific implementation, the effect verification analysis of the multi-scale remote sensing image semantic interpretation model comprises: analyzing and verifying the model's remote sensing object identification and correction results on the validation-set samples, so as to test the identification and correction effect. The analysis shows that among the 668 validation samples, 300 ground-truth (GT) sentences contain "with", and 256 of the corresponding generated description sentences contain "with", a rate of about 85%, indicating that the multi-scale semantic annotation strategy is feasible.
This embodiment analyzes the reliability of the description sentences generated for the 668 validation samples; the results are shown in Tables 2 and 3:
Table 2: Reliability analysis of the generated description sentences (table content reproduced as an image in the original)

Table 3: Noun matching before and after correction (table content reproduced as an image in the original)
With the correction algorithm provided by the invention, the noun matching rate rises from 41.87% to 83.64%, an improvement of nearly 42 percentage points; this experimental result shows that the correction algorithm is sound and feasible.
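Note that the reported gain is a difference in percentage points, not a relative increase; a one-line check using the figures quoted above:

```python
before, after = 41.87, 83.64        # noun matching rates (%) before/after correction
gain = after - before               # absolute improvement in percentage points
print(f"{gain:.2f} percentage points")  # prints "41.77 percentage points"
```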
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (5)

1. A remote sensing object interpretation method based on a focus weight matrix and a variable-scale semantic segmentation neural network is characterized by comprising the following steps:
step S1: acquiring a high-resolution remote sensing image of a predetermined study area, and preprocessing the acquired high-resolution remote sensing image;
step S2: vectorizing with professional GIS software to obtain a thematic map layer of the study area, and rasterizing the vector thematic map to obtain the corresponding raster grayscale image;
step S3: cropping the preprocessed remote sensing image and the raster grayscale image, and extracting data sample sets at two spatial scales, which respectively comprise the original images paired with large-scale GT images and the original images paired with small-scale GT images;
step S4: annotating the content of each remote sensing image in the two spatial-scale data sample sets according to a multi-spatial-scale remote sensing image annotation strategy, to obtain the sample-set annotations;
step S5: constructing a variable-scale remote sensing image semantic interpretation model, obtaining multi-scale semantic segmentation images through the interpretation model, extracting the masks of the objects at the two scales through a mask extraction algorithm, and associating the small-scale mask objects segmented by the U-Net network with the nouns in the semantic description through variable-scale object identification; the variable-scale remote sensing image semantic interpretation model comprises an FCN fully convolutional network, a U-Net semantic segmentation network, and an LSTM network based on the attention mechanism, wherein the FCN network is used for large-scale object segmentation, the U-Net network for small-scale object segmentation, and the LSTM for generating semantic descriptions containing the objects at the two spatial scales and their spatial relations;
step S6: training the FCN fully convolutional network, the U-Net semantic segmentation network, and the LSTM network in the constructed variable-scale remote sensing image semantic interpretation model, to obtain the trained model;
step S7: recognizing the remote sensing object with the trained model, specifically: locating, through the focusing weight matrix of the noun generated by the LSTM network at the current moment, the corresponding small-scale object in the mask image obtained by U-Net semantic segmentation; if the object class label is the same as the noun, the identification of the object is completed; the focusing weight matrix is generated by the LSTM network each time it generates a word, and represents the region of interest in the image of the currently generated word; locating the corresponding small-scale object in the U-Net mask image through the focusing weight matrix of the noun generated at the current moment comprises:
obtaining the pixel values at positions (i, j) of the mask object images obtained after U-Net segmentation, wherein in each mask object the pixel value at the positions covered by the object is the class index value C of that object, and the remaining positions are 0;
obtaining the intersection region of the focusing weight matrix and each object mask image;
calculating the average weight value over each intersection region, and selecting the remote sensing object with the maximum average weight value, i.e., locating the corresponding small-scale object in the mask image obtained by U-Net semantic segmentation.
2. The method of claim 1, wherein when the object class label is not the same as the noun, the method further comprises invoking a multi-scale remote sensing object correction algorithm, specifically: a scale-up step first locates the large-scale mask object, obtained by FCN semantic segmentation, that covers the current region of interest; a scale-down step then searches within that candidate large-scale object for a small-scale object whose class label matches the noun, thereby completing the identification of the object.
3. The method of claim 1, wherein the method further comprises: and performing effect verification analysis on the multi-scale remote sensing image semantic interpretation model.
4. The method of claim 1, wherein the multi-spatial-scale remote sensing image annotation strategy in step S4 is: each description sentence is composed of small-scale remote sensing objects and their spatial relations, with the large-scale objects left implicit.
5. The method according to claim 1, wherein step S6 specifically comprises:
step S6.1: dividing a training set and a verification set from a data sample set according to a preset proportion;
step S6.2: respectively setting training parameters of an FCN network, a U-Net network and an LSTM network;
step S6.3: feeding the original images and the large-scale GT images as input data into the FCN network, iteratively training the FCN network, and saving the corresponding results and the optimal model weights obtained after training;
step S6.4: feeding the original images and the small-scale GT images as input data into the U-Net network, iteratively training the U-Net network, and saving the corresponding results and the optimal model weights obtained after training;
step S6.5: LSTM network training: feeding the features extracted from the original images by VGG-19 together with the multi-scale semantic labels as input data into the LSTM network, iteratively training the LSTM network, and saving the corresponding results and the optimal model weights obtained after training.
CN201910660740.0A 2019-07-22 2019-07-22 Remote sensing object interpretation method based on focusing weight matrix and variable-scale semantic segmentation neural network Active CN110490081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910660740.0A CN110490081B (en) 2019-07-22 2019-07-22 Remote sensing object interpretation method based on focusing weight matrix and variable-scale semantic segmentation neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910660740.0A CN110490081B (en) 2019-07-22 2019-07-22 Remote sensing object interpretation method based on focusing weight matrix and variable-scale semantic segmentation neural network

Publications (2)

Publication Number Publication Date
CN110490081A CN110490081A (en) 2019-11-22
CN110490081B true CN110490081B (en) 2022-04-01

Family

ID=68547555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910660740.0A Active CN110490081B (en) 2019-07-22 2019-07-22 Remote sensing object interpretation method based on focusing weight matrix and variable-scale semantic segmentation neural network

Country Status (1)

Country Link
CN (1) CN110490081B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021059572A1 (en) * 2019-09-27 2021-04-01 富士フイルム株式会社 Information processing device, method for operating information processing device, and program for operating information processing device
CN111666849B (en) * 2020-05-28 2022-02-01 武汉大学 Multi-source remote sensing image water body detection method based on multi-view depth network iterative evolution
CN112651314A (en) * 2020-12-17 2021-04-13 湖北经济学院 Automatic landslide disaster-bearing body identification method based on semantic gate and double-temporal LSTM
CN112906627B (en) * 2021-03-15 2022-11-15 西南大学 Green pricklyash peel identification method based on semantic segmentation
CN113362287B (en) * 2021-05-24 2022-02-01 江苏星月测绘科技股份有限公司 Man-machine cooperative remote sensing image intelligent interpretation method
CN113313180B (en) * 2021-06-04 2022-08-16 太原理工大学 Remote sensing image semantic segmentation method based on deep confrontation learning
CN113435284B (en) * 2021-06-18 2022-06-28 武汉理工大学 Post-disaster road extraction method based on dynamic filtering and multi-direction attention fusion
CN113591633B (en) * 2021-07-18 2024-04-30 武汉理工大学 Object-oriented land utilization information interpretation method based on dynamic self-attention transducer
CN113591685B (en) * 2021-07-29 2023-10-27 武汉理工大学 Geographic object spatial relationship identification method and system based on multi-scale pooling
CN114882292B (en) * 2022-05-31 2024-04-12 武汉理工大学 Remote sensing image ocean target identification method based on cross-sample attention mechanism graph neural network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101692224A (en) * 2009-07-08 2010-04-07 南京师范大学 High-resolution remote sensing image search method fused with spatial relation semantics
CN105740901A (en) * 2016-01-29 2016-07-06 武汉理工大学 Geographic ontology based variable scale object-oriented remote sensing classification correction method
CN109086770A (en) * 2018-07-25 2018-12-25 成都快眼科技有限公司 A kind of image, semantic dividing method and model based on accurate scale prediction

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010132731A1 (en) * 2009-05-14 2010-11-18 Lightner Jonathan E Inverse modeling for characteristic prediction from multi-spectral and hyper-spectral remote sensed datasets

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101692224A (en) * 2009-07-08 2010-04-07 南京师范大学 High-resolution remote sensing image search method fused with spatial relation semantics
CN105740901A (en) * 2016-01-29 2016-07-06 武汉理工大学 Geographic ontology based variable scale object-oriented remote sensing classification correction method
CN109086770A (en) * 2018-07-25 2018-12-25 成都快眼科技有限公司 A kind of image, semantic dividing method and model based on accurate scale prediction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A fractal and entropy-based model for selecting the optimum spatial scale of soil erosion; Cui Wei; Springer; 2018-04-12; pp. 1-7 *
Image classification and recognition based on local features and weak annotation information; Wu Lu; Wanfang Data; 2018-12-15; pp. 1-114 *

Also Published As

Publication number Publication date
CN110490081A (en) 2019-11-22

Similar Documents

Publication Publication Date Title
CN110490081B (en) Remote sensing object interpretation method based on focusing weight matrix and variable-scale semantic segmentation neural network
RU2661750C1 (en) Symbols recognition with the use of artificial intelligence
RU2699687C1 (en) Detecting text fields using neural networks
RU2701995C2 (en) Automatic determination of set of categories for document classification
CA3124358C (en) Method and system for identifying citations within regulatory content
US20190294921A1 (en) Field identification in an image using artificial intelligence
RU2760471C1 (en) Methods and systems for identifying fields in a document
CN116861014B (en) Image information extraction method and device based on pre-training language model
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN108681735A (en) Optical character recognition method based on convolutional neural networks deep learning model
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
Karunarathne et al. Recognizing ancient sinhala inscription characters using neural network technologies
CN111666937A (en) Method and system for recognizing text in image
CN112000809A (en) Incremental learning method and device for text categories and readable storage medium
CN116311310A (en) Universal form identification method and device combining semantic segmentation and sequence prediction
Van Hoai et al. Text recognition for Vietnamese identity card based on deep features network
Ahmed et al. Recognition of Urdu Handwritten Alphabet Using Convolutional Neural Network (CNN).
Sharma et al. [Retracted] Optimized CNN‐Based Recognition of District Names of Punjab State in Gurmukhi Script
CN117115565B (en) Autonomous perception-based image classification method and device and intelligent terminal
Al Ghamdi A novel approach to printed Arabic optical character recognition
CN112836709A (en) Automatic image description method based on spatial attention enhancement mechanism
CN109657710B (en) Data screening method and device, server and storage medium
Zhou et al. SRRNet: A Transformer Structure with Adaptive 2D Spatial Attention Mechanism for Cell Phone-Captured Shopping Receipt Recognition
Dadi Tifinagh-IRCAM Handwritten character recognition using Deep learning
Su et al. FPRNet: end-to-end full-page recognition model for handwritten Chinese essay

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant