CN115937693A - Road identification method and system based on remote sensing image - Google Patents

Road identification method and system based on remote sensing image

Info

Publication number
CN115937693A
Authority
CN
China
Prior art keywords
model
road
feature
remote sensing
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310134104.0A
Other languages
Chinese (zh)
Inventor
张凌涛
段晨博
严浩然
薛帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University of Forestry and Technology
Original Assignee
Central South University of Forestry and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University of Forestry and Technology filed Critical Central South University of Forestry and Technology
Priority to CN202310134104.0A priority Critical patent/CN115937693A/en
Publication of CN115937693A publication Critical patent/CN115937693A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a road identification method and system based on remote sensing images. The method comprises the following steps: S1, constructing a road identification model; S2, inputting the training set remote sensing images into the road identification model to obtain a road identification training prediction map; S3, calculating a loss value from the training set label images and the training prediction map, and updating the model parameters; S4, repeating S2 and S3 to obtain a trained road identification model; S5, inputting the test set remote sensing images into the trained road identification model to obtain a road identification test prediction map; and S6, comparing the test prediction map with the test set label images to obtain the evaluation indexes, and evaluating the trained road identification model according to the evaluation index values. The method uses a multi-scale feature extraction module to strengthen the feature extraction capability of U-Net, and adds an attention mechanism at the skip connections of the U-Net model so that the model can make full use of the effective feature information.

Description

Road identification method and system based on remote sensing image
Technical Field
The application relates to the field of semantic segmentation, in particular to a road identification method and system based on a remote sensing image.
Background
Deep learning has progressed rapidly in recent years. Compared with traditional semantic segmentation approaches, deep-learning-based semantic segmentation can learn useful feature information from massive data and achieves higher segmentation accuracy. The fully convolutional network (FCN) was a milestone in semantic segmentation research, but its repeated down-sampling loses feature information; the FCN up-samples the output feature map of the last convolutional layer by deconvolution to restore the size of the input image, and the high up-sampling rate causes a loss of spatial information that ultimately degrades the segmentation accuracy. The DeepLab v3 network improves the atrous spatial pyramid pooling structure, constructing the pyramid module with atrous convolution kernels of different rates to capture multi-scale feature information, and adds batch normalization (BN) layers to avoid overfitting; nevertheless, DeepLab v3 can produce discontinuous segmentations and rough segmentation boundaries. SegNet is a symmetrical network structure whose design follows the fully convolutional network, but it suffers from a large parameter count and low speed. The U-Net network, building on SegNet, splices the image feature information extracted in the encoder stage with the feature information in the decoder stage along the channel dimension, i.e., the skip connection operation; this compensates for part of the feature information lost during down-sampling and improves the segmentation accuracy of the model, but the effect is still limited.
To address the adverse effects on the prediction result of the U-Net model losing feature information during down-sampling and failing to use useful feature information effectively, a road segmentation method for remote sensing images based on multi-scale feature extraction and an attention mechanism is proposed. The method uses a multi-scale feature extraction module to strengthen the U-Net feature extraction capability and adds an attention mechanism at the skip connections of the U-Net model so that the model can make full use of the effective feature information. Experimental results show that the method outperforms common semantic segmentation methods.
Disclosure of Invention
The application discloses a road identification method and system based on remote sensing images. The method and system use a multi-scale feature extraction module to strengthen the U-Net feature extraction capability, and add an attention mechanism at the skip connections of the U-Net model so that the model can make full use of the effective feature information.
In order to achieve the above purpose, the present application provides the following solutions:
a road identification method based on remote sensing images comprises the following steps:
s1, constructing a road identification model;
s2, inputting the training set remote sensing image into the road recognition model to obtain a road recognition training prediction image;
s3, calculating a loss value according to the training set label image and the training prediction image, and updating model parameters;
s4, repeating the S2 and the S3 to obtain a trained road recognition model;
and S5, inputting the remote sensing image of the test set into the trained road recognition model to obtain a road recognition test prediction map.
Optionally, the method for identifying a road based on a remote sensing image further includes S6:
and comparing the test prediction graph with the test set label graph to obtain each evaluation index, and evaluating the trained road identification model according to the evaluation index value.
Optionally, the method for obtaining the road recognition training prediction graph includes:
carrying out convolution, multi-scale feature extraction and down-sampling on the training set remote sensing image to obtain a feature map;
carrying out up-sampling on the feature map to obtain a feature map with a converted size;
splicing the size-transformed feature map with the feature map obtained from the attention mechanism to obtain a spliced feature map;
and performing convolution on the spliced characteristic diagram to obtain the road recognition training prediction diagram.
Optionally, the method for obtaining the feature map includes:
carrying out convolution on the input remote sensing image to obtain a feature map with the preset number of channels;
performing multi-scale feature extraction on the feature maps with the preset number of channels to obtain feature maps with extracted features;
performing maximum pooling on the feature map after feature extraction to obtain a feature map with a preset size;
and iterating the feature graph after the features are extracted by adopting the convolution operation, the multi-scale feature extraction operation and the maximum pooling operation until the feature graph reaches a preset length, width and number of channels.
Optionally, the multi-scale feature extraction specifically includes:
and carrying out pyramid convolution on the feature graph after feature extraction to obtain a multi-scale feature extraction feature graph with the same length, width and number of channels as the feature graph after feature extraction.
Optionally, the method for obtaining the feature map after size transformation includes:
and performing up-sampling operation on the characteristic diagrams reaching the preset length and width and the preset number of channels to obtain characteristic diagrams of up-sampling preset sizes.
Optionally, the method for obtaining the road recognition training prediction graph includes:
splicing the characteristic diagram of the up-sampling preset size and the characteristic diagram obtained by the attention mechanism to obtain a spliced characteristic diagram;
performing two 3 × 3 convolutions on the spliced feature map to obtain a feature map with a preset number of channels;
obtaining a feature map of a preset size by iterating the up-sampling to a preset size, the splicing with the feature map obtained by the attention mechanism, and the two 3 × 3 convolutions;
and convolving the feature map with the preset size to obtain the road recognition training prediction map.
Optionally, the attention mechanism specifically includes:
y_c(i, j) = x_c(i, j) × s_c^h(i) × s_c^w(j)
wherein x_c(i, j) is the input feature map, s_c^h(i) is the attention weight along one spatial direction, and s_c^w(j) is the attention weight along the other spatial direction.
Optionally, the method for calculating the loss value includes:
calculating the loss values of the road recognition training label image and the road recognition training prediction map by adding the CrossEntropyLoss and DiceLoss loss functions;
the CrossEntropyLoss loss function is:
L_CEL = −(1/N) Σ_{i=1}^{N} [g_i log(p_i) + (1 − g_i) log(1 − p_i)]
wherein p_i is the training prediction map output by the model, g_i is the training label image, and N is the number of samples;
the DiceLoss loss function is:
L_DL = 1 − (2 Σ_{i=1}^{N} p_i g_i) / (Σ_{i=1}^{N} p_i + Σ_{i=1}^{N} g_i)
wherein p_i is the training prediction map output by the model, g_i is the training label image, and N is the number of samples.
A remote sensing image-based road identification system, comprising: the system comprises a model construction module, a training prediction diagram acquisition module, a parameter updating module, a model optimization module, a test prediction diagram acquisition module and a network model evaluation module;
the model construction module is used for constructing a road identification model;
the training prediction image acquisition module is used for inputting the training set remote sensing images into the road recognition model to obtain a road recognition training prediction image;
the parameter updating module is used for calculating a loss value according to the training set label image and the training prediction image and updating the model parameters;
the model optimization module is used for repeating the training prediction image acquisition module and the parameter updating module to obtain a trained road identification model;
the test prediction image acquisition module is used for inputting the test set remote sensing image into the trained road identification model to obtain a road identification test prediction image;
and the network model evaluation module is used for comparing the test prediction graph with the test set label graph to obtain each evaluation index, and evaluating the trained road identification model according to the evaluation index values.
The beneficial effects of the present application are:
A road identification method and system based on remote sensing images are disclosed, in which a multi-scale feature extraction module strengthens the U-Net feature extraction capability and an attention mechanism is added at the skip connections of the U-Net model so that the model can make full use of the effective feature information. Experimental results show that the method outperforms common semantic segmentation methods. Road segmentation aims at identifying the road portions of a remote sensing image at the pixel level, and the segmentation result is of practical significance: it can serve as one of the bases for judging evacuation routes in disasters such as earthquakes, since segmenting the road portions of an image makes it possible to judge road conditions accurately, evacuate people quickly along roads in good condition, and reduce losses.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings needed to be used in the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for a person skilled in the art to obtain other drawings without any inventive exercise.
FIG. 1 is a method step diagram of a method and system for road identification based on remote sensing images according to an embodiment of the present application;
FIG. 2 is a method main structure diagram of a road identification method and system based on remote sensing images according to an embodiment of the present application;
FIG. 3 is a structural diagram of a multi-scale feature extraction module in the method and system for road identification based on remote sensing images according to the embodiment of the present application;
FIG. 4 is a structural diagram of an attention mechanism in a method and system for road identification based on remote sensing images according to an embodiment of the present application;
FIG. 5 is a comparison diagram of road segmentation effects of a road identification method and system based on remote sensing images according to an embodiment of the present application;
fig. 6 is a comparison diagram of segmentation details of a road identification method and system based on a remote sensing image according to an embodiment of the present application.
Detailed Description of the Embodiments
the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
In this embodiment, a method and a system for road identification based on remote sensing images, as shown in fig. 1 to 6, specifically include:
a road identification method based on remote sensing images comprises the following steps:
s1, constructing a road identification model;
in the embodiment, a remote sensing original image with the size of 3 × 256 × 256 is input into a model, where C =3 is the number of image channels, i.e., R, G, and B channels, H =256 is the length of the image, and W =256 is the width of the image;
s2, inputting the training set remote sensing image into the road recognition model to obtain a road recognition training prediction image;
s3, calculating a loss value according to the training set label image and the training prediction image, and updating model parameters;
s4, repeating the S2 and the S3 to obtain a trained road recognition model;
the method for obtaining the road recognition training prediction graph comprises the following steps:
carrying out convolution, multi-scale feature extraction and downsampling on the training set remote sensing image to obtain a feature map;
carrying out up-sampling on the feature map to obtain a feature map with a converted size;
splicing according to the feature diagram after size conversion and the feature diagram obtained according to the attention mechanism to obtain a spliced feature diagram;
and performing convolution on the spliced characteristic diagram to obtain the road recognition training prediction diagram.
The method for obtaining the characteristic diagram comprises the following steps:
carrying out convolution on the input remote sensing image to obtain a feature map with the preset number of channels;
performing multi-scale feature extraction on the feature maps with the preset number of channels to obtain feature maps with extracted features;
performing maximum pooling on the feature map after feature extraction to obtain a feature map with a preset size;
and iterating the feature graph after the features are extracted by adopting the convolution operation, the multi-scale feature extraction operation and the maximum pooling operation until the feature graph reaches a preset length, width and number of channels.
In the present embodiment, the size of the input image is 3 × 256 × 256, and this image is passed through a 3 × 3 convolution to obtain a 64 × 256 × 256 feature map; after this initial convolution the length and width of the image are unchanged, but the number of channels changes from 3 to 64.
Multi-scale feature extraction is then performed on the 64 × 256 × 256 feature map. Feature information is lost during down-sampling, and the model cannot effectively exploit useful feature information, which harms the prediction result; adopting multi-scale feature extraction strengthens the feature extraction capability. The U-Net down-sampling module extracts the features of the target, and EPSANet uses the SPC module to enhance the feature extraction capability of a model; on the basis of the U-Net down-sampling module and the SPC, the second 3 × 3 convolutional layer in the U-Net down-sampling module is replaced with the multi-scale feature extraction module.
The process of the multi-scale feature extraction module is as follows: let the feature map X input to the module have size C × H × W, where C is the number of channels, H the length and W the width of the feature map. After a 3 × 3 convolution, the feature map X is first divided by pyramid convolution into S groups of sub-feature maps [F_0, F_1, ..., F_{S-1}], compressing the number of channels of each group to C/S (C is an integer multiple of S), so that each sub-feature map satisfies F_i ∈ R^((C/S)×H×W), i = 0, 1, ..., S−1. This process is expressed by equation (1):
F_i = Conv(n_i × n_i, G_i)(X), i = 0, 1, ..., S−1 (1)
wherein n_i is the convolution kernel size and G_i is the group number of the grouped convolution, which effectively reduces the parameter count of the model and improves efficiency. Through these steps, S groups of sub-feature maps are obtained, each with C/S channels and length and width H and W. The S groups of sub-feature maps are then spliced along the channel dimension to obtain the output feature map, as expressed by equation (2):
F = Cat([F_0, F_1, ..., F_{S-1}]) (2)
and then inputting the result into a pooling layer, and changing the length and width of an output characteristic diagram into half of the input characteristic diagram after passing through the pooling layer. In this embodiment, a multi-scale feature extraction module is used to perform feature extraction on an input with a size of 64 × 256 × 256 to obtain a 64 × 256 × 256 feature map.
The obtained 64 × 256 × 256 feature map is pooled to obtain a 64 × 128 × 128 feature map. The sequence of 3 × 3 convolution, multi-scale feature extraction and max pooling is then repeated:
    • the 64 × 128 × 128 feature map yields a 128 × 64 × 64 feature map;
    • the 128 × 64 × 64 feature map yields a 256 × 32 × 32 feature map;
    • the 256 × 32 × 32 feature map yields a 512 × 16 × 16 feature map;
finally, 3 × 3 convolution and multi-scale feature extraction are applied to the 512 × 16 × 16 feature map, giving a 512 × 16 × 16 feature map.
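The down-sampling schedule just described (64 at 256, 128 at 128, 256 at 64, 512 at 32, and a 512-channel bottleneck at 16) can be sketched as below. This is a structural sketch only: a plain second 3 × 3 convolution stands in for the multi-scale extraction stage, and all names are illustrative:

```python
import torch
import torch.nn as nn

def conv_stage(c_in, c_out):
    # 3x3 convolution followed by the multi-scale extraction step;
    # a plain 3x3 conv stands in for the SPC-style module here.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    """Downsampling path with the channel schedule of the embodiment:
    64 -> 128 -> 256 -> 512 -> 512, halving length/width between stages."""

    def __init__(self, in_channels=3, widths=(64, 128, 256, 512, 512)):
        super().__init__()
        chans = (in_channels,) + widths
        self.stages = nn.ModuleList(
            conv_stage(chans[i], chans[i + 1]) for i in range(len(widths))
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skips = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i < len(self.stages) - 1:
                skips.append(x)  # kept for the attention-weighted skip connections
                x = self.pool(x)
        return x, skips
```

For a 3 × 256 × 256 input this produces the 512 × 16 × 16 bottleneck together with the four skip feature maps the decoder later splices in.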
The size-transformed feature map and the feature map obtained from the attention mechanism are spliced to obtain a spliced feature map, and the road recognition training prediction map is obtained from the spliced feature map. The feature map up-sampled to the preset size is spliced with the feature map obtained by the attention mechanism; it is worth noting that the sizes of the two feature maps to be spliced must be consistent. Two 3 × 3 convolutions are performed on the spliced feature map to obtain a feature map with the preset number of channels; a feature map of the preset size is obtained by iterating the up-sampling, the attention-mechanism splicing and the two 3 × 3 convolutions; and a 1 × 1 convolution is applied to the feature map of the preset size to obtain the road recognition training prediction map.
The attention mechanism adopted in the present embodiment is as follows: the attention structure used in this application, shown in fig. 4, is modified from the Coordinate Attention (CA) mechanism.
For an input F ∈ R^(C×H×W), each channel is encoded along the horizontal and vertical coordinate directions using pooling kernels of sizes (H, 1) and (1, W); this transformation aggregates the input features along the two directions into a pair of direction-aware feature maps. For the input F ∈ R^(512×32×32), the resulting direction-aware feature maps have sizes 512 × 32 × 1 and 512 × 1 × 32 respectively. Each feature map captures the long-range dependence of the input feature map along one spatial direction and preserves precise position information along the other spatial direction, which helps the network locate the target of interest more precisely.
After the transformation in information embedding, the generated output feature maps are spliced and transformed by a shared 1 × 1 convolution transformation function F_1, as shown in equation (3):
f = δ(F_1([z^h, z^w])) (3)
wherein f ∈ R^((512/r)×(32+32)) is the intermediate feature map encoding the spatial information in the horizontal and vertical directions; r is the down-sampling (reduction) ratio, and using a suitable ratio r to reduce the number of channels of f lowers the complexity and computational cost of the model; z^h and z^w are the direction-aware feature maps; [·, ·] is the splicing operation along the spatial dimension; and δ is a non-linear activation function.
After normalization and non-linear processing, f is decomposed along the spatial dimension into 2 independent tensors f^h ∈ R^((512/r)×32) and f^w ∈ R^((512/r)×32). Two 1 × 1 convolution transforms F_h and F_w then map the feature maps f^h and f^w to tensors with the same number of channels as the input feature layer F ∈ R^(512×32×32). This process can be expressed as equations (4) and (5):
g^h = σ(F_h(f^h)) (4)
g^w = σ(F_w(f^w)) (5)
to improve the result accuracy, 2S =4 SPC modules are used in this application instead of 2 1 × 1 convolution transforms in the above steps and a BN layer is added. The modified attention mechanism structure diagram is shown in fig. 4:
the results of equations (4) and (5) are converted into equations (6) and (7) at this time:
s h =σ(S h (f h )) (6)
s w =σ(S w (f w )) (7)
wherein sigma represents a sigmoid activation function, the complexity of the model can be reduced, and the calculation cost can be reduced. S h (. And S) w (. Each represents a characteristic diagram f h And f w The output obtained after passing through the SPC module is specifically shown in equations (8) and (9):
S_h(f^h) = Cat([F_0^h, F_1^h, F_2^h, F_3^h]) (8)
S_w(f^w) = Cat([F_0^w, F_1^w, F_2^w, F_3^w]) (9)
wherein:
F_i^h = Conv(n_i × n_i, G_i)(f^h), i = 0, ..., 3 (10)
F_i^w = Conv(n_i × n_i, G_i)(f^w), i = 0, ..., 3 (11)
finally, the obtained result s is compared h And s w Expanding, and obtaining a final attention weight matrix by using a matrix multiplication method, wherein the final output of the attention mechanism is shown as an equation (12):
Figure BDA0004084851350000111
the attention output at this time is Y e 512×32×32
The method for obtaining the road recognition training prediction graph according to the spliced feature graph comprises the following steps of:
splicing the characteristic diagram of the up-sampling preset size with a characteristic diagram obtained by an attention mechanism to obtain a spliced characteristic diagram;
performing two 3 × 3 convolutions on the spliced feature map to obtain a feature map with a preset number of channels;
obtaining a feature map of a preset size by iterating the up-sampling to a preset size, the splicing with the feature map obtained by the attention mechanism, and the two 3 × 3 convolutions;
and convolving the feature map with the preset size to obtain the road recognition training prediction map.
The upsampling comprises the following steps: and performing up-sampling operation on the characteristic diagram reaching the preset length and width and the preset number of channels.
And splicing the feature map obtained by up-sampling with the attention mechanism feature map with the same size to obtain the feature map with the preset size.
Performing 3 × 3 convolution on the feature map with the preset size to obtain a feature map with the preset number of channels;
the above up-sampling, splicing and convolution operations are repeated until the length and width of the image and the number of channels reach the final preset values, in this embodiment, the final preset values are 64 × 256 × 256.
In this embodiment, the decoder proceeds as follows:
    • the 512 × 16 × 16 feature map obtained by multi-scale feature extraction is up-sampled to transform its size, giving a 512 × 32 × 32 feature map; the corresponding feature map from the multi-scale feature extraction channel is passed through the attention mechanism to give a 512 × 32 × 32 feature map, and the two feature maps are spliced to obtain a 1024 × 32 × 32 feature map; two 3 × 3 convolutions transform the number of channels, the first giving a 512 × 32 × 32 feature map and the second a 256 × 32 × 32 feature map;
    • the 256 × 32 × 32 feature map is up-sampled to transform its length and width, giving a 256 × 64 × 64 feature map; splicing it with the 256 × 64 × 64 attention feature map gives a 512 × 64 × 64 feature map; two 3 × 3 convolutions give a 256 × 64 × 64 and then a 128 × 64 × 64 feature map;
    • the 128 × 64 × 64 feature map is up-sampled, giving a 128 × 128 × 128 feature map; splicing it with the 128 × 128 × 128 attention feature map gives a 256 × 128 × 128 feature map; two 3 × 3 convolutions give a 128 × 128 × 128 and then a 64 × 128 × 128 feature map;
    • the 64 × 128 × 128 feature map is up-sampled, giving a 64 × 256 × 256 feature map; splicing it with the 64 × 256 × 256 attention feature map gives a 128 × 256 × 256 feature map; two 3 × 3 convolutions give a 64 × 256 × 256 feature map twice;
    • finally, a 1 × 1 convolution is applied to the 64 × 256 × 256 feature map to obtain a 2 × 256 × 256 feature map, i.e., the road recognition training prediction map.
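One decoder step (up-sample, splice with the attention-weighted skip feature map, then two 3 × 3 convolutions) can be sketched as a reusable block; the channel arguments follow the sizes listed in the embodiment, while the class name and bilinear up-sampling mode are assumptions:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """Upsample by 2, splice the (equal-sized) skip feature map along the
    channel dimension, then two 3x3 convolutions that transform channels
    as in the embodiment (e.g. 512+512 -> 512 -> 256 at the 32x32 stage)."""

    def __init__(self, c_in, c_skip, c_out):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(c_in + c_skip, c_in, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)
        # sizes must match before splicing along the channel dimension
        return self.conv(torch.cat([x, skip], dim=1))
```

Chaining UpBlock(512, 512, 256), UpBlock(256, 256, 128), UpBlock(128, 128, 64) and UpBlock(64, 64, 64), followed by a final 1 × 1 convolution to 2 channels, reproduces the size walk-through above.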
The loss values of the training set label image and the training prediction map are calculated by adding the CrossEntropyLoss and DiceLoss loss functions:
L = L_CEL + L_DL (13)
The CrossEntropyLoss loss function is:
L_CEL = −(1/N) Σ_{i=1}^{N} [g_i log(p_i) + (1 − g_i) log(1 − p_i)] (14)
wherein p_i is the training prediction map output by the model, g_i is the training label image, and N is the number of samples.
The DiceLoss loss function is:
L_DL = 1 − (2 Σ_{i=1}^{N} p_i g_i) / (Σ_{i=1}^{N} p_i + Σ_{i=1}^{N} g_i) (15)
wherein p_i is the training prediction map output by the model, g_i is the training label image, and N is the number of samples.
The loss function measures the difference between the prediction map output by the model and the real label image; an optimizer is used to update and adjust the model parameters so that the loss value gradually decreases.
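A sketch of the combined loss of equation (13), assuming the binary (road versus background) form of CrossEntropyLoss; the small smoothing constant added to the Dice ratio is an implementation assumption that keeps the ratio defined when both maps are empty:

```python
import torch
import torch.nn.functional as F

def combined_loss(pred, target, eps=1e-6):
    """L = L_CEL + L_DL: binary cross-entropy plus Dice loss.

    `pred` holds predicted road probabilities in [0, 1]; `target` holds
    the 0/1 label image; `eps` is an assumed smoothing term."""
    p, g = pred.reshape(-1), target.reshape(-1)
    cross_entropy = F.binary_cross_entropy(p, g)  # CrossEntropyLoss term
    dice = 1 - (2 * (p * g).sum() + eps) / (p.sum() + g.sum() + eps)  # DiceLoss term
    return cross_entropy + dice
```

Since 2·p·g ≤ p + g for values in [0, 1], the Dice term is non-negative, so the combined loss is bounded below by the cross-entropy term.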
S5, inputting the remote sensing image of the test set into the trained road recognition model to obtain a road recognition test prediction image;
and S6, comparing the test prediction graph with the test set label graph to obtain each evaluation index, and evaluating the trained road recognition model according to the evaluation index values.
The 230 test set images (together with their 230 corresponding label images) are put into the trained road recognition model to obtain road recognition test prediction maps.
In this embodiment, for the road segmentation data set, three indexes — pixel accuracy (PA), class-average pixel accuracy (mPA) and mean intersection-over-union (mIoU) — are used to evaluate the performance of the segmentation model. The three indexes are calculated as follows:
PA = Σ_{i=0}^{k} p_ii / Σ_{i=0}^{k} Σ_{j=0}^{k} p_ij  (16)
mPA = (1/(k+1)) · Σ_{i=0}^{k} (p_ii / Σ_{j=0}^{k} p_ij)  (17)
mIoU = (1/(k+1)) · Σ_{i=0}^{k} p_ii / (Σ_{j=0}^{k} p_ij + Σ_{j=0}^{k} p_ji − p_ii)  (18)
In the equations, k represents the number of prediction classes, p_ij is the number of pixels of class i predicted as class j, and p_ii and p_jj are the numbers of pixels of class i and class j that are judged correctly. PA is the proportion of correctly predicted pixels in the total number of pixels; mPA is the proportion of correctly predicted pixels of each class within that class, accumulated over the classes and then averaged; mIoU is the ratio of the intersection to the union of the prediction result and the real result for each class, accumulated and then averaged.
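A NumPy sketch of the three indexes computed from a confusion matrix, following equations (16)–(18); the function name and the two-class default (road / background) are assumptions:

```python
import numpy as np

def segmentation_metrics(pred, label, num_classes=2):
    """Compute PA, mPA and mIoU (eqs. 16-18) from integer class maps.
    cm[g, p] counts pixels of true class g predicted as class p."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(pred.ravel(), label.ravel()):
        cm[g, p] += 1
    pa = np.diag(cm).sum() / cm.sum()                       # eq. 16
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class_acc = np.diag(cm) / cm.sum(axis=1)        # eq. 17 terms
        iou = np.diag(cm) / (cm.sum(axis=1) + cm.sum(axis=0) - np.diag(cm))  # eq. 18 terms
    mpa = np.nanmean(per_class_acc)   # nanmean skips classes absent from the labels
    miou = np.nanmean(iou)
    return pa, mpa, miou
```

For a perfect prediction all three indexes equal 1; mIoU penalizes false positives and false negatives jointly, which is why it is the strictest of the three.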
To compare with other semantic segmentation methods, the model network structure is built on the PyTorch deep learning framework and the Adam optimizer algorithm is used for weight updating. The data set samples come from the Massachusetts Roads Dataset; after cropping, a new data set is obtained comprising 5800 training images, 230 validation images and 230 test images, each image being 256 × 256 in length and width.
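The cropping step can be sketched as a simple non-overlapping tiling; `crop_tiles` is an illustrative helper, and the original tiling scheme (overlap, border handling) is not specified in this excerpt:

```python
import numpy as np

def crop_tiles(image, tile=256):
    """Cut a large remote-sensing image into non-overlapping tile x tile
    crops, discarding any ragged border that does not fill a full tile."""
    h, w = image.shape[:2]
    return [image[i:i + tile, j:j + tile]
            for i in range(0, h - tile + 1, tile)
            for j in range(0, w - tile + 1, tile)]
```

Applied to the 1500 × 1500 Massachusetts Roads images, such a tiling produces a pool of 256 × 256 crops from which the training, validation and test splits can be drawn.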
1. Comparative experiment:
The network framework proposed by the application is compared with other commonly used semantic segmentation networks. As shown in Table 1, the proposed method is superior to the existing common methods on all indexes. On mIoU, it improves by 1.8% over DeepLabv3, 1.7% over FCN, 0.6% over SegNet and 0.4% over U-Net; on PA, it improves by 0.5% over FCN and DeepLabv3 and by 0.1% over SegNet and U-Net; on mPA, it is about 0.3–0.7% higher than all the methods in the table. Compared with the other methods, the proposed method therefore achieves higher segmentation precision. Although the improvement of the three evaluation indexes over U-Net is less significant than over the other networks, the method of the present application clearly performs better in some segmentation details, as shown in fig. 6. The road segmentation index comparison is shown in table 1:
TABLE 1
(Table 1 is reproduced as an image in the original publication.)
2. Ablation experiment
In order to verify the effectiveness of each module of the network proposed by the present application, ablation experiments were performed on the different modules in this subsection. U-Net is selected as the basic network (baseline) for the ablation experiments.
(1) Multi-scale feature extraction module
A multi-scale feature extraction module is added to the basic network U-Net and compared with the basic network. As shown in Table 2, after the module is added the PA of the network increases by 0.1% over U-Net and the mIoU index increases by 0.3%. This shows that adding a multi-scale feature extraction module (MFE) to the down-sampling module allows the road features to be extracted better.
(2) Attention mechanism
On the basis of using the multi-scale feature extraction module in the U-Net down-sampling module, the attention mechanism CA is added at the middle skip connection; the resulting 0.3% increase in the index mPA in Table 2 shows that CA is effective. On this basis, using the modified attention mechanism proposed by the application, the table shows that, compared with the unmodified attention mechanism CA, the method improves mIoU by 0.1% and mPA by 0.5%. This shows that the modified attention mechanism has a significant effect on improving the network segmentation performance. A performance comparison of the different added modules is shown in table 2:
TABLE 2
(Table 2 is reproduced as an image in the original publication.)
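For reference, a sketch of the standard (unmodified) coordinate attention CA used as the ablation baseline; the patent's modified variant is not fully specified in this section, so this follows the usual CA design and all names are illustrative:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Standard CA sketch: pool along each spatial direction, encode the two
    pooled strips with a shared 1x1 conv, then produce per-direction sigmoid
    weights g_h and g_w that rescale the input feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(channels // reduction, 4)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)                    # N,C,H,1
        pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # N,C,W,1
        y = self.act(self.conv1(torch.cat([pool_h, pool_w], dim=2)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        g_h = torch.sigmoid(self.conv_h(y_h))                    # N,C,H,1
        g_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # N,C,1,W
        return x * g_h * g_w  # broadcast weights over the feature map
```

Direction-wise pooling lets the block keep long-range positional cues along rows and columns, which is what makes CA attractive for elongated targets such as roads.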
Example two:
a remote sensing image-based road identification system, comprising: the device comprises a model construction module, a training prediction diagram acquisition module, a parameter updating module, a model optimization module, a test prediction diagram acquisition module and a network model evaluation module;
the model construction module is used for constructing a road identification model;
the training prediction image acquisition module is used for inputting a training set remote sensing image into the road recognition model to obtain a road recognition training prediction image;
the parameter updating module is used for calculating a loss value according to the training set label image and the training prediction image and updating the model parameters;
the model optimization module is used for repeating the training prediction image acquisition module and the parameter updating module to obtain a trained road identification model;
the test prediction image acquisition module is used for inputting the test set remote sensing image into the trained road identification model to obtain a road identification test prediction image;
and the network model evaluation module is used for comparing the test prediction graph with the test set label graph to obtain each evaluation index, and evaluating the trained road identification model according to the evaluation index values.
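The module layout of Example two can be sketched as a thin orchestration class; every callable here is an assumption supplied by the caller, standing in for the concrete model, loss, optimizer and evaluation indexes described above:

```python
class RoadRecognitionSystem:
    """Illustrative sketch of the six modules of Example two; the
    concrete model/loss/update/evaluation callables are assumptions."""
    def __init__(self, build_model, compute_loss, update_params, evaluate):
        self.model = build_model()          # model construction module
        self.compute_loss = compute_loss    # part of the parameter-updating module
        self.update_params = update_params  # parameter-updating module
        self.evaluate = evaluate            # network model evaluation module

    def train(self, train_images, train_labels, epochs):
        # model optimization module: repeat prediction + parameter update
        for _ in range(epochs):
            for img, lbl in zip(train_images, train_labels):
                pred = self.model(img)                 # training prediction module
                loss = self.compute_loss(pred, lbl)
                self.update_params(self.model, loss)
        return self.model

    def test(self, test_images, test_labels):
        preds = [self.model(img) for img in test_images]  # test prediction module
        return self.evaluate(preds, test_labels)          # evaluation indexes
```

The sketch only fixes the data flow between the modules; any segmentation model, loss and metric set can be plugged in.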
The above-described embodiments are merely illustrative of the preferred embodiments of the present application, and do not limit the scope of the present application, and various modifications and improvements made to the technical solutions of the present application by those skilled in the art without departing from the spirit of the present application should fall within the protection scope defined by the claims of the present application.

Claims (10)

1. A road identification method based on remote sensing images is characterized by comprising the following steps:
s1, constructing a road identification model;
s2, inputting the training set remote sensing image into the road recognition model to obtain a road recognition training prediction chart;
s3, calculating a loss value according to the training set label image and the training prediction image, and updating model parameters;
s4, repeating the S2 and the S3 to obtain a trained road recognition model;
and S5, inputting the remote sensing image of the test set into the trained road recognition model to obtain a road recognition test prediction map.
2. The remote sensing image-based road recognition method according to claim 1, further comprising S6:
and comparing the test prediction graph with the test set label graph to obtain each evaluation index, and evaluating the trained road identification model according to the evaluation index value.
3. The method for road recognition based on remote sensing images as claimed in claim 1, wherein the method for obtaining the road recognition training prediction map comprises the following steps:
carrying out convolution, multi-scale feature extraction and down-sampling on the training set remote sensing image to obtain a feature map;
up-sampling the characteristic diagram to obtain a characteristic diagram after size conversion;
splicing according to the feature diagram after size conversion and the feature diagram obtained according to the attention mechanism to obtain a spliced feature diagram;
and performing convolution on the spliced characteristic diagram to obtain the road recognition training prediction diagram.
4. The method for road identification based on remote sensing images as claimed in claim 3, wherein the method for obtaining the feature map comprises:
carrying out convolution on the input remote sensing image to obtain a feature map with the preset number of channels;
performing multi-scale feature extraction on the feature maps with the preset number of channels to obtain feature maps with extracted features;
performing maximum pooling on the feature map after feature extraction to obtain a feature map with a preset size;
and iterating the feature map after the features are extracted by adopting the convolution operation, the multi-scale feature extraction operation and the maximum pooling operation until the feature map reaches a preset length, width and number of channels.
5. The method for road identification based on remote sensing images according to claim 4, wherein the multi-scale feature extraction specifically comprises:
and carrying out pyramid convolution on the feature graph after feature extraction to obtain a multi-scale feature extraction feature graph with the same length, width and number of channels as the feature graph after feature extraction.
6. The method for road identification based on remote sensing images as claimed in claim 4, wherein the method for obtaining the feature map after size transformation comprises:
and performing up-sampling operation on the characteristic diagrams reaching the preset length and width and the preset number of channels to obtain the characteristic diagrams of up-sampling preset sizes.
7. The method for road recognition based on remote sensing images as claimed in claim 6, wherein the method for obtaining the road recognition training prediction map comprises the following steps:
splicing the characteristic diagram of the up-sampling preset size and the characteristic diagram obtained by the attention mechanism to obtain a spliced characteristic diagram;
performing two 3 × 3 convolutions on the spliced feature map to obtain a feature map with a preset number of channels;
obtaining a feature map of a preset size by iterating the up-sampling to the preset size, the feature map obtained by the attention mechanism, and the two 3 × 3 convolutions;
and performing convolution on the feature map with the preset size to obtain the road recognition training prediction map.
8. The method for road identification based on remote sensing images according to claim 7, wherein the attention mechanism specifically comprises:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j)
wherein x_c(i, j) is the input feature map, g_c^h(i) is the attention weight in the height direction, and g_c^w(j) is the attention weight in the width direction.
9. The method for road identification based on remote sensing images according to claim 1, wherein the method for calculating the loss value comprises:
calculating the loss values of the road recognition training label image and the road recognition training prediction image by adding the CrossEntropyLoss and DiceLoss loss functions;
the CrossEntropyLoss loss function is:
L_CEL = −(1/N) · Σ_{i=1}^{N} [g_i·log(p_i) + (1 − g_i)·log(1 − p_i)]
wherein p_i is the training prediction map output by the model, g_i is the training label image, and N is the number of samples;
the DiceLoss loss function is:
L_DL = 1 − (2 · Σ_{i=1}^{N} p_i·g_i) / (Σ_{i=1}^{N} p_i + Σ_{i=1}^{N} g_i)
wherein p_i is the training prediction map output by the model, g_i is the training label image, and N is the number of samples.
10. A road recognition system based on remote sensing images, comprising: the system comprises a model construction module, a training prediction diagram acquisition module, a parameter updating module, a model optimization module, a test prediction diagram acquisition module and a network model evaluation module;
the model construction module is used for constructing a road identification model;
the training prediction image acquisition module is used for inputting a training set remote sensing image into the road recognition model to obtain a road recognition training prediction image;
the parameter updating module is used for calculating a loss value according to the training set label image and the training prediction image and updating the model parameters;
the model optimization module is used for repeating the training prediction image acquisition module and the parameter updating module to obtain a trained road identification model;
the test prediction image acquisition module is used for inputting a test set remote sensing image into the trained road recognition model to obtain a road recognition test prediction image;
and the network model evaluation module is used for comparing the test prediction graph with the test set label graph to obtain each evaluation index, and evaluating the trained road identification model according to the evaluation index value.
CN202310134104.0A 2023-02-20 2023-02-20 Road identification method and system based on remote sensing image Pending CN115937693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310134104.0A CN115937693A (en) 2023-02-20 2023-02-20 Road identification method and system based on remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310134104.0A CN115937693A (en) 2023-02-20 2023-02-20 Road identification method and system based on remote sensing image

Publications (1)

Publication Number Publication Date
CN115937693A true CN115937693A (en) 2023-04-07

Family

ID=86550884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310134104.0A Pending CN115937693A (en) 2023-02-20 2023-02-20 Road identification method and system based on remote sensing image

Country Status (1)

Country Link
CN (1) CN115937693A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116597422A (en) * 2023-06-14 2023-08-15 中瑞科技术有限公司 Road early warning method and device based on image data and electronic equipment
CN117612024A (en) * 2023-11-23 2024-02-27 国网江苏省电力有限公司扬州供电分公司 Remote sensing image roof recognition method and system based on multi-scale attention
CN117612024B (en) * 2023-11-23 2024-06-07 国网江苏省电力有限公司扬州供电分公司 Remote sensing image roof recognition method based on multi-scale attention

Similar Documents

Publication Publication Date Title
CN107564025B (en) Electric power equipment infrared image semantic segmentation method based on deep neural network
CN112991354B (en) High-resolution remote sensing image semantic segmentation method based on deep learning
CN111695467B (en) Spatial spectrum full convolution hyperspectral image classification method based on super-pixel sample expansion
CN111523546B (en) Image semantic segmentation method, system and computer storage medium
CN111325165B (en) Urban remote sensing image scene classification method considering spatial relationship information
CN115937693A (en) Road identification method and system based on remote sensing image
CN113674334B (en) Texture recognition method based on depth self-attention network and local feature coding
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
CN111814685A (en) Hyperspectral image classification method based on double-branch convolution self-encoder
CN111476133B (en) Unmanned driving-oriented foreground and background codec network target extraction method
CN114022770A (en) Mountain crack detection method based on improved self-attention mechanism and transfer learning
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN112766283B (en) Two-phase flow pattern identification method based on multi-scale convolution network
CN114332133A (en) New coronary pneumonia CT image infected area segmentation method and system based on improved CE-Net
CN111666852A (en) Micro-expression double-flow network identification method based on convolutional neural network
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN116778318A (en) Convolutional neural network remote sensing image road extraction model and method
CN114581789A (en) Hyperspectral image classification method and system
CN115761888A (en) Tower crane operator abnormal behavior detection method based on NL-C3D model
CN115965864A (en) Lightweight attention mechanism network for crop disease identification
CN113538402B (en) Crowd counting method and system based on density estimation
CN116740362B (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
CN114037893A (en) High-resolution remote sensing image building extraction method based on convolutional neural network
CN117058542A (en) Multi-scale high-precision light-weight target detection method based on large receptive field and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination