CN112084901A - GCAM-based high-resolution SAR image airport runway area automatic detection method and system - Google Patents
- Publication number
- CN112084901A (application CN202010871235.3A)
- Authority
- CN
- China
- Prior art keywords
- sar image
- runway area
- gcam
- convolution
- pooling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/88—Radar or analogous systems specially adapted for specific applications
- G01S13/89—Radar or analogous systems specially adapted for specific applications for mapping or imaging
- G01S13/90—Radar or analogous systems specially adapted for specific applications for mapping or imaging using synthetic aperture techniques, e.g. synthetic aperture radar [SAR] techniques
- G01S13/9021—SAR image post-processing techniques
- G01S13/9027—Pattern recognition for feature extraction
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/88—Radar or analogous systems specially adapted for specific applications
- G01S13/89—Radar or analogous systems specially adapted for specific applications for mapping or imaging
- G01S13/90—Radar or analogous systems specially adapted for specific applications for mapping or imaging using synthetic aperture techniques, e.g. synthetic aperture radar [SAR] techniques
- G01S13/9094—Theoretical aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a GCAM-based high-resolution SAR image airport runway area automatic detection method and system. The method comprises: downsampling a high-resolution SAR image to generate a medium-resolution image; inputting the medium-resolution image into the geospatial context attention mechanism network GCAM to extract the runway area; and performing coordinate mapping on the extracted runway area to obtain the final runway area detection result on the high-resolution SAR image. Experiments show that, compared with the DeepLabv3+, RefineNet, and MDDA networks, the method achieves higher precision and shorter runtime, fully learns the geospatial information of SAR image airports, and realizes high-precision, rapid, automatic extraction of high-resolution SAR image airport runway areas.
Description
Technical Field
The invention relates to automatic airport runway area detection technology, and in particular to a GCAM-based high-resolution SAR image airport runway area automatic detection method and system.
Background
Airports are important transportation hubs and military facilities, and detecting airport targets from Synthetic Aperture Radar (SAR) images has become an important application. SAR offers all-day, all-weather imaging and penetrates cloud and fog, but SAR images are harder to read than optical images and their interpretation is more complex, so most airport detection has been based on optical remote sensing images. As SAR image resolution and data volume have grown, research on extracting airports from SAR images has gradually increased in recent years and continues to deepen. Traditional airport extraction methods are time-consuming and labor-intensive; most work well only on optical images and perform poorly on SAR images. Realizing automatic, rapid extraction of airport runway areas from high-resolution SAR images therefore has far-reaching and urgent practical significance. In addition, masking aircraft detection with the airport runway area can greatly reduce the false alarms generated in aircraft detection and improve aircraft detection precision.
Airport detection has wide applications in aerial navigation, accident search and rescue, aircraft positioning, and the like. The runway area is one of the main components of an airport, and within airport detection research, most work targets optical remote sensing images. The prior art performs airport detection with a traditional method that extracts airport edge line segments, but line-segment extraction requires the airport to have obvious linear characteristics, which is unsuitable for large civil airports with many terminals and weak runway linearity. Other schemes complete airport detection with a sparse reconstruction saliency model (SRS) and a target-aware active contour model (TAACM), which enhances the extraction of airport details. Still other schemes combine a visual saliency analysis model, a bidirectional complementary saliency analysis module, and a saliency active contour model (SOACM) to extract the airport contour, an approach suitable for most optical remote sensing images. Because SAR has strong penetration capability, operates without interference, and acquires abundant ground-object information, SAR images have gradually become an experimental object for airport detection. Some schemes combine the traditional line-segment grouping method with a saliency analysis model to detect airports in small SAR images, but this does not suit airport detection in large SAR images; others propose a PolSAR airport runway detection algorithm combining optimized polarization features and random forests, but that method can only effectively extract the parallel runway features of an airport.
In recent years, deep learning has achieved excellent results in semantic segmentation. Semantic segmentation is a deep learning approach that learns features at the image pixel level so as to partition an image into different classes. Airport detection needs to extract all airport features, a principle consistent with the semantic segmentation idea, so methods combining deep learning with airport detection have begun to appear. For example: one prior art provides an airport detection method combining the deep learning YOLO model with a saliency analysis model; another detects airports by combining a deep learning Goole-LF network with a Support Vector Machine (SVM); another performs airport extraction by combining a deep learning Faster R-CNN network with spatial analysis; yet another constructs an end-to-end transferable deep convolutional network to detect airports. However, the above methods apply deep learning to optical remote sensing images, and because airport sample data is scarce, the deep learning models often overfit during training. For high-resolution SAR image airport extraction, one prior art provides MDDA (Multi-level and Densely Dual Attention), a deep learning network for the high-resolution SAR image runway area that achieves high-precision airport extraction but requires a large data set and a long training time. Therefore, finding a deep learning method that suits small sample data sets and extracts airports efficiently is of great practical value.
Deep learning networks have developed rapidly, and the DeepLab series performs excellently in the semantic segmentation field. DeepLabv1, proposed in 2014, introduced atrous convolution (Atrous Conv) for the first time, addressing the signal downsampling and spatial invariance problems that traditional CNN algorithms suffer from in pixel labeling, and used a Conditional Random Field (CRF) to improve the model's ability to capture fine details; DeepLabv1 took second place in the PASCAL semantic segmentation challenge. DeepLabv2, proposed in 2016, further added the ASPP (Atrous Spatial Pyramid Pooling) module on the basis of DeepLabv1 to capture contextual semantic information at multiple scales, and changed the backbone network from VGG-16 to ResNet, mitigating the feature-resolution reduction caused by pooling in traditional CNNs. DeepLabv3 appeared in 2017 and improved ASPP on the basis of DeepLabv2, yielding better network performance. In 2018, DeepLabv3+ improved further on DeepLabv3 by introducing an encoder-decoder structure: DeepLabv3 serves as the encoder, a simple and effective decoder block is designed, and depthwise separable convolution is added to the backbone network, effectively reducing computation and parameter counts while maintaining model performance.
Therefore, in view of the problems in SAR image airport extraction, how to realize high-precision, rapid, automatic extraction of the high-resolution SAR image airport runway area based on deep learning is a key technical problem to be solved urgently.
Disclosure of Invention
The technical problems to be solved by the invention are as follows: aiming at the problems in the prior art, the invention provides the high-resolution SAR image airport runway area automatic detection method and system based on GCAM, which can fully learn the geographic spatial information of the SAR image airport and can realize the high-precision, quick and automatic extraction of the high-resolution SAR image airport runway area.
In order to solve the technical problems, the invention adopts the technical scheme that:
a GCAM-based high-resolution SAR image airport runway area automatic detection method comprises the following steps:
1) down-sampling the high-resolution SAR image to generate a medium-resolution image;
2) inputting the medium-resolution image into a geographic space context attention mechanism network GCAM to extract a runway area;
3) performing coordinate mapping on the extracted runway area to obtain the final runway area detection result on the high-resolution SAR image.
Optionally, downsampling the high-resolution SAR image in step 1) specifically refers to performing 5-fold downsampling of the SAR image by a pixel value extraction method.
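As a sketch of how the 5-fold pixel-value-extraction downsampling of step 1) and the coordinate mapping of step 3) fit together, the following minimal NumPy example keeps every fifth pixel and maps detected coordinates back to the high-resolution grid. The function names and the corner-aligned sampling grid are illustrative assumptions, not the patent's exact implementation.

```python
import numpy as np

def downsample_by_extraction(img: np.ndarray, factor: int = 5) -> np.ndarray:
    """Pixel-value-extraction downsampling: keep every `factor`-th pixel.

    Sketch of the patent's step 1); the exact sampling grid (corner vs.
    centre of each block) is an assumption.
    """
    return img[::factor, ::factor]

def map_coords_back(rows, cols, factor: int = 5):
    """Sketch of step 3): map pixel coordinates detected on the
    medium-resolution image back onto the original high-resolution grid."""
    return [r * factor for r in rows], [c * factor for c in cols]
```

For example, a 25 × 30 image downsampled 5-fold becomes 5 × 6, and a runway pixel detected at medium-resolution coordinate (2, 3) maps back to (10, 15) on the high-resolution image.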
Optionally, the GCAM includes a coding block and a decoding block, where the coding block includes a residual error network ResNet, a multi-scale extrusion pyramid MSP and an edge refinement module EDM, where the residual error network ResNet is used to perform feature extraction on an input data set to obtain a preliminary feature, the multi-scale extrusion pyramid MSP is used to obtain global context information from different resolutions by using different pooling convolutional layers for the preliminary feature, the edge refinement module EDM is used to enhance network edge extraction capability for the preliminary feature, and outputs of the multi-scale extrusion pyramid MSP and the edge refinement module EDM are further fused to obtain a multi-level feature; the decoding block is used for carrying out semantic segmentation on the runway area of the airport by combining the preliminary features and the multi-level features to extract the runway area.
Optionally, the residual error network ResNet is an improved residual error network obtained by replacing a common two-dimensional convolution with hole convolutions with hole rates of 2, 4, 8, and 16 on the basis of the residual error network ResNet _ 101.
Optionally, the multi-scale extrusion pyramid MSP comprises a multi-receptive-field parallel pooling layer and an effective attention module eSE. The parallel pooling layer is built in parallel from a 1 × 1 convolution with hole rate 1, three 3 × 3 convolutions with hole rates 6, 12, and 18, a global average pooling module GAP, and a stripe pooling module SP. For an input two-dimensional feature tensor of size H × W, the stripe pooling module SP pools in the horizontal direction with a 1 × W stripe pooling window and in the vertical direction with an H × 1 stripe pooling window, averaging the element values inside each pooling kernel to obtain the horizontal and vertical stripe pooling outputs; it then expands these outputs in the left-right and up-down directions respectively with two one-dimensional convolutions so that the two expanded feature maps have the same size, fuses the two expanded feature maps, and finally multiplies the original data by the Sigmoid-processed data to obtain the H × W output tensor. The effective attention module eSE first applies global average pooling to the input feature map X_i to learn the feature F_avg, processes F_avg with a fully connected layer to obtain a weight matrix W_C, rescales W_C through a Sigmoid function into the channel attention feature A_eSE, applies A_eSE to the input feature map X_i to obtain the refined feature map X_refine, and finally re-screens the features of X_refine to obtain the global context information.
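The eSE channel-attention chain described above (global average pooling, one fully connected layer, Sigmoid, channel reweighting) can be sketched in NumPy as follows; the single (C, C) weight matrix `w_fc` is a hypothetical stand-in for the learned fully connected layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ese_attention(x: np.ndarray, w_fc: np.ndarray) -> np.ndarray:
    """Effective squeeze-excitation (eSE) sketch for a (C, H, W) feature map.

    Chain per the description: global average pooling -> one fully
    connected layer -> Sigmoid -> channel reweighting. `w_fc` is a
    hypothetical (C, C) matrix standing in for the learned FC layer.
    """
    f_avg = x.mean(axis=(1, 2))        # F_avg: global average pooling, (C,)
    w_c = w_fc @ f_avg                 # W_C: fully connected layer output, (C,)
    a_ese = sigmoid(w_c)               # A_eSE: channel attention weights
    return x * a_ese[:, None, None]    # X_refine: reweighted feature map
```

With zero FC weights every channel gets the neutral weight sigmoid(0) = 0.5, which makes the chain easy to sanity-check before training.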
Optionally, the edge refinement module EDM comprises a global convolution module GCB for enhancing the affinity of the feature map to the pixel classification layer and the ability to process feature maps of different resolutions to obtain global information, and an edge refinement submodule BR for enhancing the edge extraction capability of the coding block from the global information. The global convolution module GCB comprises a large k × k convolution kernel and a feature combination module: the large kernel is factorized into two paths, one path consisting of a k × 1 convolution followed by a 1 × k convolution, the other of a 1 × k convolution followed by a k × 1 convolution, where c is the number of channels; the output results of the two paths are input together into the feature combination module to obtain the feature Sum_{W×H×C}. The edge refinement submodule BR processes the feature Sum_{W×H×C} sequentially with a small convolution kernel, an activation function, and another small convolution kernel, superimposes the processing result onto the original feature Sum_{W×H×C}, and finally obtains the feature map with refined runway area edges.
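One motivation for the GCB's two factorized paths is parameter economy: a k × 1 plus 1 × k pair costs far fewer parameters than a dense k × k kernel. A small arithmetic sketch, under the assumption that every convolution keeps c channels and biases are ignored:

```python
def dense_kxk_params(k: int, c: int) -> int:
    """Parameter count of a plain k x k convolution, c channels in and out."""
    return k * k * c * c

def gcb_params(k: int, c: int) -> int:
    """Parameter count of the two-path GCB factorisation: path 1 is a
    (k x 1) then a (1 x k) convolution, path 2 is a (1 x k) then a
    (k x 1) convolution; each of the four separable convolutions costs
    k * c * c parameters (biases ignored)."""
    return 4 * k * c * c
```

For an illustrative k = 7, c = 256, the dense kernel needs 3,211,264 parameters while the factorized GCB needs 1,835,008, yet both cover the same 7 × 7 receptive field.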
Optionally, the decoding block applies a 1 × 1 convolution to reduce the dimensionality of the coding-block output features, decodes edge information from the feature map whose runway area edges were refined by the edge refinement module EDM, and performs bilinear 4-fold upsampling; the upsampled result is concatenated with the preliminary features output by the residual network ResNet after their own 1 × 1 convolution dimensionality reduction; a 3 × 3 convolution is then applied to the concatenated features to refine them, followed by a final simple bilinear 4-fold upsampling that yields the segmentation result.
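The decoder's size bookkeeping can be checked with a short trace. The 512-pixel input size and encoder output stride 16 are illustrative assumptions (the text only fixes the two bilinear 4-fold upsampling steps); under them, the two 4× upsamplings exactly recover the input resolution.

```python
def decoder_shape_trace(input_hw: int = 512, output_stride: int = 16):
    """Trace spatial sizes through the decoding block described above.

    Returns (encoder output size, size after the first bilinear 4x
    upsampling, size after the final bilinear 4x upsampling).
    """
    encoder_hw = input_hw // output_stride
    after_first_up = encoder_hw * 4
    after_final_up = after_first_up * 4
    return encoder_hw, after_first_up, after_final_up
```

For a 512 × 512 input at stride 16 the trace is 32 → 128 → 512, confirming that a stride-16 encoder and two 4× upsamplings are mutually consistent.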
In addition, the invention also provides a GCAM-based high-resolution SAR image airport runway area automatic detection system, which comprises:
the down-sampling program unit is used for down-sampling the high-resolution SAR image to generate a medium-resolution image;
a runway area extraction program unit for inputting the medium resolution image into a geographic space context attention mechanism network GCAM to extract a runway area;
and the coordinate mapping program unit is used for carrying out coordinate mapping on the extracted runway area to obtain a final detection result.
In addition, the invention also provides a GCAM-based high-resolution SAR image airport runway area automatic detection system, comprising a computer device with an interconnected microprocessor and memory, wherein the microprocessor is programmed or configured to execute the steps of the GCAM-based high-resolution SAR image airport runway area automatic detection method, or the memory stores a computer program programmed or configured to execute the GCAM-based high-resolution SAR image airport runway area automatic detection method.
Furthermore, the present invention also provides a computer readable storage medium having stored therein a computer program programmed or configured to execute the GCAM-based high resolution SAR image airport runway area automatic detection method.
Compared with the prior art, the invention has the following advantages: the method downsamples a high-resolution SAR image to generate a medium-resolution image, inputs the medium-resolution image into the geospatial context attention mechanism network GCAM to extract the runway area, and performs coordinate mapping on the extracted runway area to obtain the final detection result on the high-resolution SAR image. By combining deep learning on SAR images with airport runway area extraction, the method fully learns the geospatial information of SAR image airports and realizes high-precision, rapid, automatic extraction of the high-resolution SAR image airport runway area.
Drawings
FIG. 1 is a schematic diagram of the basic principle of the method according to the embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an improved residual error network in the embodiment of the present invention.
Fig. 3 is a schematic structural diagram of the stripe pooling module SP in the embodiment of the present invention.
Fig. 4 is a schematic structural diagram of an effective attention module eSE according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of the global convolution module GCB and the edge refinement module BR in the embodiment of the present invention.
Fig. 6 shows the SAR image, the label, and the optical remote sensing image of a sample airport in an embodiment of the present invention.
Fig. 7 shows the runway extraction result for airport I in the embodiment of the present invention.
Fig. 8 shows runway extraction results for airport II in an embodiment of the present invention.
Fig. 9 shows runway extraction results for airport III in an embodiment of the invention.
Detailed Description
As shown in fig. 1, the GCAM-based high-resolution SAR image airport runway area automatic detection method of the embodiment includes:
1) down-sampling the high-resolution SAR image to generate a medium-resolution image;
2) inputting the medium-resolution image into a geographic space context attention mechanism network GCAM to extract a runway area;
3) performing coordinate mapping on the extracted runway area to obtain the final runway area detection result on the high-resolution SAR image.
In this embodiment, the downsampling of the high-resolution SAR image in step 1) specifically refers to 5-fold downsampling of the SAR image by a pixel value extraction method. The downsampling comprises two parts: downsampling the data set sample pictures, and downsampling the three high-resolution test SAR images, which after sampling become medium-resolution SAR images.
In order to extract the SAR image airport runway area quickly, this embodiment proposes a Geospatial Context Attention Mechanism network, GCAM. As shown in fig. 1, the GCAM includes a coding block and a decoding block. The coding block includes a residual network ResNet, a Multi-scale Squeeze Pyramid (MSP), and an Edge Refinement Module (EDM): the residual network ResNet performs feature extraction on the input data set to obtain preliminary features; the multi-scale squeeze pyramid MSP applies different pooling convolutional layers to the preliminary features to obtain global context information from different resolutions; the edge refinement module EDM enhances the network's edge extraction capability from the preliminary features; and the outputs of the MSP and the EDM are further fused to obtain multi-level features. The decoding block combines the preliminary features and the multi-level features to perform semantic segmentation of the airport runway area and extract the runway area.
Firstly, a coding block performs primary feature extraction on an input data set by using a residual error network ResNet; the multi-scale extrusion pyramid MSP and the edge refinement module EDM respectively extract and fuse the initial features, the multi-scale extrusion pyramid MSP obtains global context information from different resolutions by different pooling convolutional layer operations, and the edge refinement module EDM enhances the network edge extraction capability and further fuses multi-level features; the decoding block adopts edge refinement decoding, one part of the decoding block receives multi-level high-level features from the coding block, and the other part of the decoding block receives preliminary features from a residual error network ResNet, so that semantic segmentation of the runway area of the airport is realized.
The residual network ResNet is the backbone of the geospatial context attention mechanism network GCAM. Its skip connections and residual learning accelerate training and improve model accuracy, making it well suited to building a semantic segmentation network. To address the tendency of network pooling operations to lose detailed features, as shown in fig. 2, the residual network ResNet adopted in this embodiment is an improved residual network obtained by replacing ordinary two-dimensional convolutions with hole convolutions of hole rates 2, 4, 8, and 16 on the basis of the residual network ResNet_101. Hole convolution solves the loss of detailed features in pooling without adding extra parameters to the residual network, and the subsequent convolutional layers keep a larger feature-map size, which benefits the detection of target pixels and improves overall model performance. With hole convolution, for an arbitrary position j of the picture, applying a filter ω[k] to the input feature x[j + r·k] gives the output y[j]:

y[j] = Σ_k x[j + r·k] · ω[k]
wherein the rate r introduces r − 1 zeros between sampling points, effectively extending the receptive field from k × k to k + (k − 1)(r − 1) without increasing the number of parameters or the amount of computation. Fig. 2 shows the improved structural part of the residual network. The last block of the residual network ResNet_101 is copied 4 times and the copies are built in parallel, but purely parallel blocks do not help the network acquire deep semantic information, because the features become concentrated in the last few small feature maps, and consecutive strided convolutions are detrimental to semantic segmentation. Therefore, in this embodiment, hole convolutions with hole rates 2, 4, 8, and 16 replace the ordinary two-dimensional convolutions, improving the final output stride. Adding hole convolution changes the resolution of part of the feature maps, so the final output of ResNet_101 contains not only high-dimensional low-resolution feature maps but also some low-dimensional high-resolution features, realizing full extraction of multi-size features.
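The formula above and the receptive-field expansion k + (k − 1)(r − 1) can be illustrated with a minimal one-dimensional sketch (valid positions only, no padding); the function names are illustrative:

```python
def atrous_conv1d(x, w, rate):
    """One-dimensional atrous convolution: y[j] = sum_k x[j + rate*k] * w[k].

    Valid positions only (no padding), mirroring the formula above.
    """
    k = len(w)
    span = (k - 1) * rate  # effective receptive field minus one
    return [sum(x[j + rate * i] * w[i] for i in range(k))
            for j in range(len(x) - span)]

def effective_receptive_field(k, rate):
    """A k-tap kernel with rate r covers k + (k - 1) * (r - 1) positions."""
    return k + (k - 1) * (rate - 1)
```

A 3-tap kernel at rate 2 reads positions j, j + 2, j + 4, covering 5 input samples while still using only 3 weights, which is exactly the no-extra-parameters property the text relies on.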
Referring to fig. 1, the multi-scale extrusion pyramid MSP includes a multi-receptive-field parallel pooling working layer and an effective attention module eSE.
Referring to fig. 1, the multi-receptive-field parallel pooling working layer is built in parallel from a 1 × 1 convolution with a hole rate of 1, three 3 × 3 convolutions with hole rates of 6, 12 and 18 respectively, a global average pooling module GAP and a stripe pooling module SP. For an input two-dimensional feature tensor of size H × W, the stripe pooling module SP performs a pooling operation in the horizontal direction using a stripe pooling window H × 1 and a pooling operation in the vertical direction using a stripe pooling window 1 × W, averaging the element values in each pooling kernel to obtain the output of stripe pooling in the horizontal direction and the output of stripe pooling in the vertical direction. Two one-dimensional convolutions then expand these two outputs in the left-right and up-down directions respectively; the two expanded feature maps have the same size and are fused, and finally the original data are multiplied by the fused data after Sigmoid processing to obtain the output H × W two-dimensional feature tensor. In this embodiment, the feature map produced by the improved residual network contains 256 channels and rich semantic information, and is first input to this multi-receptive-field parallel pooling working layer.
The hole convolutions with four different hole rates can effectively capture multi-scale information from different receptive fields; the addition of global average pooling down-samples the features to prevent over-fitting of the network; stripe pooling captures local information of the features. Together, the multi-receptive-field parallel pooling working layer realizes multi-scale feature fusion.
The stripe pooling module SP (Strip Pooling) can overcome the disadvantage that general pooling is prone to false alarms. As shown in fig. 3, for an input two-dimensional feature tensor x ∈ R^{H×W}, the stripe pooling module SP performs pooling operations in the horizontal and vertical directions using the stripe pooling windows H × 1 and 1 × W, respectively, and averages the element values in the pooling kernel as the pooling output value. The output y^h ∈ R^H of stripe pooling in the horizontal direction is:

y^h_i = (1/W) · Σ_{0≤j<W} x_{i,j}

In the above formula, y^h_i is an arbitrary element of the horizontal stripe pooling output, and x_{i,j} ranges over all matrix elements within the pooling kernel.

The output y^v ∈ R^W of stripe pooling in the vertical direction is:

y^v_j = (1/H) · Σ_{0≤i<H} x_{i,j}

In the above formula, y^v_j is an arbitrary element of the vertical stripe pooling output, and x_{i,j} ranges over all matrix elements within the pooling kernel.
After the H × 1 and 1 × W pooling kernels are applied, the outputs are expanded in the left-right and up-down directions using two one-dimensional convolutions. After expansion the two feature maps have the same size; they are then fused, and finally the original data are multiplied by the data processed by the Sigmoid function to output the result. In the horizontal and vertical stripe pooling layers, dependencies are easily established between discretely distributed pixel regions and band-shaped pixel regions. Since the pooling kernel is long and narrow along one dimension and narrow along the opposite dimension, it readily captures local information of the features. These properties make stripe pooling preferable to average pooling based on square kernels.
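The stripe pooling procedure above can be sketched in numpy as follows. This is a simplified sketch: the two one-dimensional expansion convolutions are omitted (the pooled vectors are broadcast directly), so it shows only the pool–fuse–Sigmoid–multiply flow, not the embodiment's full module:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def strip_pool(x):
    """Strip pooling sketch for a 2-D feature map x of size H x W.

    Horizontal stripes average each row (output in R^H), vertical
    stripes average each column (output in R^W); both are broadcast
    back to H x W, fused by addition, squashed by a Sigmoid and used
    as a multiplicative attention map on the original input.
    """
    H, W = x.shape
    y_h = x.mean(axis=1)                  # horizontal stripe pooling, shape (H,)
    y_v = x.mean(axis=0)                  # vertical stripe pooling, shape (W,)
    fused = y_h[:, None] + y_v[None, :]   # expanded and fused, shape (H, W)
    return x * sigmoid(fused)             # multiply original by Sigmoid output

x = np.ones((4, 6))
out = strip_pool(x)
```

On the all-ones toy input, every fused value is 2, so the whole map is scaled by sigmoid(2).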
The effective attention module eSE (Effective Squeeze-and-Excitation Module) is used for receiving the multi-scale features and then re-screening the features for quality based on the channel information. Referring to fig. 4, for the input feature map X_i, the effective attention module eSE first learns a feature F_avg by global average pooling, passes the feature F_avg through a fully connected (FC) layer to obtain a weight matrix W_C, readjusts the weight matrix W_C through a Sigmoid function to extract the channel attention feature A_eSE, and then multiplies the channel attention feature A_eSE with the input feature map X_i to obtain the refined feature map X_refine. In this way each input X_i is assigned weights pixel by pixel, realizing feature re-screening. The fully connected (FC) layer and the Sigmoid function readjust the input feature map to extract the useful channel information.
When the size of the input feature map is X_i ∈ R^{C×W×H}, the effective channel attention map A_eSE(X_i) ∈ R^{C×1×1} is calculated as:

A_eSE(X_i) = σ(W_C(F_gap(X_i)))

In the above formula, A_eSE(X_i) represents the channel attention feature A_eSE extracted from the input feature map X_i, σ is the Sigmoid function, W_C is the weight matrix, and F_gap(X_i) is the feature F_avg obtained by global average pooling of the input feature map X_i; the functional expression of F_gap(X_i) is:

F_gap(X_i) = (1/(W×H)) · Σ_{0≤i<W} Σ_{0≤j<H} X_{i,j}

In the above equation, X_{i,j} represents all elements in the matrix of the feature map X_i.
Applying the channel attention feature A_eSE to the input feature map X_i to obtain the refined feature map X_refine is expressed as follows:

X_refine = A_eSE(X_i) ⊗ X_i

In the above formula, ⊗ represents channel-wise multiplication. The input feature map X_i is the multi-scale feature map output by the multi-scale extrusion pyramid MSP. Applying A_eSE(X_i) as channel attention to the multi-scale feature map makes the multi-scale features more informative. Finally, the input feature map is weighted element by element into the refined feature map X_refine, realizing feature re-screening.
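The eSE channel attention described above can be sketched in numpy as follows; the weight matrix w_c is an assumed toy stand-in for the learned FC layer, and the input values are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ese_attention(x, w_c):
    """eSE sketch: x is a feature map of shape (C, H, W); w_c is a
    (C, C) matrix standing in for the fully connected layer.

    F_avg = GAP(x); A_eSE = sigmoid(W_C @ F_avg);
    X_refine = A_eSE (x) x, applied channel-wise.
    """
    f_avg = x.mean(axis=(1, 2))          # global average pooling, shape (C,)
    a_ese = sigmoid(w_c @ f_avg)         # channel attention, shape (C,)
    return a_ese[:, None, None] * x      # channel-wise reweighting

C, H, W = 3, 4, 4
x = np.ones((C, H, W))
w_c = np.eye(C)                          # toy FC weights (identity)
x_refine = ese_attention(x, w_c)
```

With the all-ones input, every channel mean is 1 and each channel is scaled by sigmoid(1), so the reweighting is uniform here; with trained weights, strong channels would be kept and weak ones suppressed.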
Referring to fig. 1, the multi-scale extrusion pyramid MSP and the edge refinement module EDM work in parallel and simultaneously receive the output feature map from the improved residual network. As shown in fig. 1, the edge refinement module EDM comprises a global convolution module GCB (Global Convolutional Block), used for enhancing the affinity of the feature map to the pixel classification layer and the ability to process feature maps of different resolutions to obtain global information, and an edge refinement module BR (Boundary Refinement), used for enhancing the edge extraction capability of the coding block from the global information. The edge refinement module EDM can effectively solve the problems of pixel classification and localization in semantic segmentation: the global convolution module GCB increases the size of the convolution kernel to the spatial size of the feature map, so that the feature map and the pixel classification layer are closely related, thereby enhancing the ability to process different features and obtain global information; the edge refinement module BR is then introduced to further improve the network's edge extraction capability.
As shown in fig. 5, the global convolution module GCB in this embodiment includes a large k × k convolution kernel and a feature combination module. The large k × k convolution kernel comprises two paths: one path is composed of a k × 1 convolution followed by a 1 × k convolution, and the other path is composed of a 1 × k convolution followed by a k × 1 convolution, where c is the number of channels and each convolution maps c input channels to c output channels. The output results of the two paths are input to the feature combination module together to obtain the feature Sum_{W×H×C}. The edge refinement module BR processes the feature Sum_{W×H×C} sequentially through a small convolution kernel, an activation function and another small convolution kernel, then superimposes the processing result onto the original feature Sum_{W×H×C}, finally obtaining a feature map with refined runway area edges.
Referring to fig. 5, the global convolution module GCB adopts a convolutional construction to fully utilize the multi-channel information of the features. For the pixel classification problem, the global convolution module GCB adopts a large convolution kernel, so that the semantic information corresponding to each pixel is not changed by image transformations (translation, flipping, etc.) and the relationships between pixels become closer. For the pixel localization problem, the global convolution module GCB uses a fully convolutional structure and, following the matrix decomposition principle, replaces the large k × k kernel convolution with convolutions of 1 × k then k × 1, and of k × 1 then 1 × k, thereby reducing the number of parameters and the amount of computation while matching each pixel to its correct class, realizing accurate pixel segmentation. Because the global convolution module GCB has no BN layer (Batch Normalization) and no activation function, an edge refinement module BR with small convolution kernels is introduced, preventing the misclassification of object boundary pixels and achieving both classification accuracy and localization accuracy.
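The parameter saving of the GCB factorization and the residual form of the BR module can be illustrated with the following sketch. The kernel size k = 15 and channel count c = 21 are illustrative assumptions, and the "convolutions" in the BR stand-in are simplified to per-channel scalar weights, so this shows the structure, not the embodiment's actual layers:

```python
import numpy as np

def gcb_param_count(k, c):
    """Parameters of a dense k x k, c -> c convolution versus the GCB
    factorization into (k x 1 then 1 x k) and (1 x k then k x 1) paths."""
    dense = k * k * c * c            # single large square kernel
    factorized = 4 * k * c * c       # four separable kernels of length k
    return dense, factorized

def boundary_refine(feat, conv1, conv2):
    """BR sketch: feat + conv2(relu(conv1(feat))), with the two small
    convolutions modelled as scalar weights (toy stand-ins)."""
    residual = conv2 * np.maximum(conv1 * feat, 0.0)
    return feat + residual           # superimpose refinement on the input

dense, fact = gcb_param_count(k=15, c=21)
feat = np.array([[1.0, -2.0], [3.0, 4.0]])
refined = boundary_refine(feat, conv1=1.0, conv2=0.5)
```

For k = 15 the factorized form needs roughly 4/k of the dense kernel's parameters, which is the matrix-decomposition saving the paragraph describes.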
As shown in fig. 1, the decoding block performs 1 × 1 convolution dimensionality reduction on the output features of the coding block, decodes the edge information of the feature map whose runway area edges have been refined by the edge refinement module EDM, performs bilinear 4-fold upsampling, then concatenates the result with the preliminary features output by the residual network ResNet after their own 1 × 1 convolution dimensionality reduction, applies a 3 × 3 convolution to the concatenated features to refine them, and finally performs a simple bilinear 4-fold upsampling to obtain the final segmentation result. The input to the decoding block comprises two parts: the output features of the coding block and the preliminary features output by the residual network ResNet. The output features of the coding block are first reduced through 1 × 1 convolution, then decoded for edge information using the EDM, and then upsampled bilinearly by a factor of 4; this operation fully decodes the edge information while reducing the number of feature channels. The result is then concatenated with the corresponding features from the backbone network at the same spatial resolution. Since the features from the backbone network contain a portion of low-level features, which usually have a large number of channels, a 1 × 1 convolution is also applied to them to reduce the number of channels and avoid unnecessary channel computation in the network.
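The shape bookkeeping of the decoding path can be sketched as follows. This is a structural sketch only: nearest-neighbour repetition stands in for bilinear interpolation, and the 1 × 1 reductions, EDM decoding and 3 × 3 refinement convolutions are assumed to have already been applied to the toy inputs:

```python
import numpy as np

def upsample4(x):
    """4x spatial upsampling (nearest-neighbour stands in here for the
    bilinear interpolation used in the embodiment)."""
    return x.repeat(4, axis=-2).repeat(4, axis=-1)

def decode(encoder_feat, backbone_feat):
    """Decoder sketch: both inputs are (C, H, W) maps assumed to be at
    the same resolution after their channel reductions; they are
    upsampled 4x, concatenated along channels, and upsampled 4x again
    (the refining 3x3 convolution is omitted)."""
    up = upsample4(encoder_feat)                  # coding-block branch
    skip = upsample4(backbone_feat)               # backbone skip branch
    fused = np.concatenate([up, skip], axis=0)    # channel concatenation
    return upsample4(fused)                       # final 4x upsampling

enc = np.ones((2, 3, 3))   # toy encoder features
bb = np.zeros((1, 3, 3))   # toy backbone low-level features
out = decode(enc, bb)      # shape (3, 48, 48): 16x total upsampling
```

The two 4-fold upsamplings give a 16× total magnification, which matches an output stride of 16 in the encoder.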
In this embodiment, step 3) performs coordinate mapping on the extracted runway area to obtain the final detection result. The coordinate mapping is the same as in existing methods, so its description is omitted: after the geospatial context attention mechanism network GCAM segments the airport runway area of the medium-resolution SAR image, the result map is processed by the coordinate mapping method to obtain the result map for the high-resolution SAR original image. Finally, the result image and the original image are visualized, realizing runway area extraction from the high-resolution SAR image.
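Since the patent defers the coordinate mapping to existing methods, the following is only a plausible minimal sketch under the assumption of simple proportional scaling by the down-sampling factor (5 in this embodiment):

```python
def map_to_original(coords, factor=5):
    """Map pixel coordinates detected on the down-sampled medium-resolution
    result map back to the high-resolution original image, assuming plain
    proportional scaling by the down-sampling factor."""
    return [(r * factor, c * factor) for r, c in coords]

# hypothetical runway-area pixels on the medium-resolution result map
runway_pixels = [(10, 20), (11, 21)]
original_pixels = map_to_original(runway_pixels)
```

A real implementation would also account for any cropping offsets introduced when tiling the image, which this sketch ignores.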
The GCAM-based high-resolution SAR image airport runway area automatic detection method of this embodiment is experimentally verified below. The experimental environment was as follows: CPU: Intel Xeon Gold 5120; GPU (single): NVIDIA RTX 2080 Ti. The data set uses SAR images from the Gaofen-3 (GF-3) system. First, a pixel extraction method is used to perform 5-fold down-sampling on 10 airport sample images; then LaberImage software is used for pixel labeling, with pixels divided into runway area and background. We arbitrarily cut the 10 down-sampled medium-resolution SAR images into images larger than 480 × 480 to make a small sample dataset, generating 466 images. The ratio of training set to validation set was 4:1. As shown in fig. 6, sub-images (a)-(c) are respectively the SAR image, the label and the optical remote sensing image of a certain airport sample; the connected region marked a is the runway area, which comprises the airstrip, taxiways, apron and airplanes; the remaining individual connected regions are background.
The parameters are set as follows: the learning rate in the network training process is set to 0.00001, and the weight decay coefficient is 0.995. The batch size of input pictures is 1, and network training runs for 100 epochs, with the model saved every 5 epochs. The input pictures are randomly cropped during training, with a random cropping window size of 480 × 480.
In this embodiment, PA (Pixel Accuracy) and IOU (Intersection over Union) are used as metrics for verifying runway extraction accuracy. PA represents the proportion of correctly labeled pixels among all pixels; IOU represents the ratio of the intersection to the union of the segmentation result and the label; MPA (Mean Pixel Accuracy) represents the average over classes of the proportion of correctly classified pixels in each class; MIOU (Mean Intersection over Union) represents the average of the IOU over all classes. Specifically:
Assuming a total of k + 1 classes (including one background class), the metrics are:

PA = Σ_{i=0}^{k} P_ii / Σ_{i=0}^{k} Σ_{j=0}^{k} P_ij

MPA = (1/(k+1)) · Σ_{i=0}^{k} ( P_ii / Σ_{j=0}^{k} P_ij )

MIOU = (1/(k+1)) · Σ_{i=0}^{k} P_ii / ( Σ_{j=0}^{k} P_ij + Σ_{j=0}^{k} P_ji − P_ii )

In the above formulas, P_ij represents the number of pixels that originally belong to class i but are predicted as class j (a false positive sample), P_ji represents the number of pixels that originally belong to class j but are predicted as class i (a false negative sample), and P_ii represents the number of correctly predicted pixels of class i.
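The PA, MPA and MIOU metrics can be computed from a confusion matrix as below; the two-class confusion matrix shown is a toy example, not experimental data from the embodiment:

```python
import numpy as np

def seg_metrics(conf):
    """Compute PA, MPA and MIOU from a (k+1) x (k+1) confusion matrix,
    where conf[i, j] is the number of pixels of true class i that were
    predicted as class j."""
    tp = np.diag(conf).astype(float)        # P_ii, correct pixels per class
    row = conf.sum(axis=1).astype(float)    # pixels belonging to each class
    col = conf.sum(axis=0).astype(float)    # pixels predicted as each class
    pa = tp.sum() / conf.sum()              # overall pixel accuracy
    mpa = np.mean(tp / row)                 # mean per-class pixel accuracy
    miou = np.mean(tp / (row + col - tp))   # mean intersection over union
    return pa, mpa, miou

# toy example with two classes: background (0) and runway (1)
conf = np.array([[90, 10],
                 [ 5, 95]])
pa, mpa, miou = seg_metrics(conf)
```

Note how false alarms (off-diagonal entries in a class's column) lower that class's IOU without affecting PA as strongly, which is the effect discussed for DeepLabV3+ in the results below.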
In order to verify the efficiency of SAR image airport runway area extraction, three groups of comparison experiments were performed, comparing the method of this embodiment with DeepLabV3+, RefineNet and MDDA. Three airports were used: airport I of 12000 × 15000, airport II of 9600 × 9600, and airport III of 15000 × 17500, none of which were used in data collection. MDDA is a previously proposed deep learning network suited to airport runway area extraction from SAR images, while DeepLabV3+ and RefineNet are mainstream semantic segmentation networks. The dataset used for the experiments was the manually annotated 466-image small sample dataset. Because the network outputs down-sampled medium-resolution images, the sizes of the down-sampled airports I, II and III are 2400 × 3000, 2000 × 2000 and 3000 × 3500, respectively. Finally, coordinate mapping is applied to the result map to directly obtain the result map before down-sampling. Network training time, picture testing time and runway area extraction accuracy before and after sampling are analyzed.
Figs. 7 to 9 show the airport runway area extraction results for airports I, II and III, respectively. In each figure, (a) is the high-resolution SAR image, (b) is the medium-resolution SAR image obtained by 5-fold down-sampling, and (c) is the category label of the runway area, where red is the runway area and black is the non-runway area, i.e. the background; (d) is the extraction result of RefineNet on the medium-resolution SAR image, (e) the extraction result of MDDA, (f) the extraction result of DeepLabV3+, and (g) the extraction result of the method (GCAM) of this embodiment; (h) is the fusion of the RefineNet result (d) with the medium-resolution SAR image (b), (i) the fusion of the MDDA result (e) with (b), (j) the fusion of the DeepLabV3+ result (f) with (b), and (k) the fusion of the result (g) of the method (GCAM) of this embodiment with (b); (l)-(o) are the fusions of the results (d)-(g), after coordinate mapping processing, with the high-resolution original image (a). The area marked 1 is the runway area; an area box marked 2 is a false detection box, i.e. background falsely detected as part of the runway area; an area box marked 3 is a missed detection box, i.e. a portion of the runway area that was not detected.
I. Airport I: experimental results and analysis.
As shown in sub-image (a) of fig. 7, airport I mainly comprises a large-area long runway area and an apron; the many airplanes in the airport show obvious bright airplane target spots. The background area contains a dense residential area and intricate traffic lines.
We tested the medium-resolution SAR image of airport I, with a test image size of 2400 × 3000. As shown in sub-images (d)-(g) of fig. 7, the extraction result of the method of this embodiment is closest to the label; MDDA does not completely extract parts of the runway area edge; DeepLabV3+ shows a small amount of missed detection in the runway area; the RefineNet extraction is least effective. In the visualized views (h)-(k) we mark the main missed detection boxes. The method of this embodiment has no large missed detection area; MDDA has 2 main missed detection areas; DeepLabV3+ has 4 obvious missed detection areas; RefineNet has the most false detection boxes and considerable edge missed detection, with the missed areas all lying at the edge of the runway area. Comparing the DeepLabV3+ result (j) with the result (k) of the method of this embodiment, it can be seen that the addition of the edge refinement module EDM in the method of this embodiment enhances the network's learning of edge features.
II. Airport II: experimental results and analysis.
Airport II has simpler features than airport I. The runway area of airport II is mainly composed of long straight runways; there are small building clusters near the airport edge area, no large residential areas, and several water areas around the runway area. Under synthetic aperture radar imaging, water presents the same dark black characteristics as a runway, which interferes with the network's ability to distinguish features.
We tested the medium-resolution SAR image of airport II, of size 2000 × 2000. Fig. 8 shows the runway area extraction for airport II. Comparing sub-images (d), (e), (f), (g) with (c) in fig. 8, it can be seen that the method of this embodiment has no false alarms and the best extraction effect. As can be seen from sub-images (h)-(k) in fig. 8, the method of this embodiment has only one small missed detection box; MDDA has 1 false detection box and 4 obvious missed detection boxes; DeepLabV3+ has more false alarms and the most missed detection boxes; RefineNet has several missed detection areas at the left edge of airport II; their extraction performance needs improvement. The edge extraction capability and false alarm removal capability of our method are the best, which also demonstrates the superiority of the multi-scale extrusion pyramid MSP in the method.
III. Airport III: experimental results and analysis.
The runway area structure and surrounding terrain of airport III are the most complex, with many runways, taxiways, rest stops and aprons. Airport III is a civil airport, and its runway area consists mostly of short runways without large-area long straight runways. In the SAR image the surrounding ground features appear mostly gray with bright spots, contrasting obviously with the characteristics of the airport runway area, which reduces the probability of network misjudgment. However, the edge features of airport III are complex and it has the most edge information, so the network is required to have good global semantic information learning capability and to decode the edge information effectively.
We tested the medium-resolution SAR image of airport III, of size 3000 × 3500. Comparing sub-images (d)-(g) in fig. 9, the method of this embodiment again has the best extraction effect, with only some small areas missed; MDDA has two significant false alarms; DeepLabV3+ has a large number of missed detections, indicating that its ability to learn edge information is not strong; RefineNet has a large number of false alarms and the worst extraction effect. This again demonstrates the effectiveness of the edge decoding of the method of this embodiment.
To demonstrate more intuitively the efficiency of our method for airport runway area extraction, table 1 shows the extraction accuracy of the medium-resolution SAR images of the three airports under the different algorithms. The average extraction accuracy of our method over the three airport runway areas reaches 0.9823, and the average IOU reaches 0.9665, higher than MDDA, DeepLabV3+ and RefineNet. According to table 1, the difference between the PA and IOU values for the same airport runway area is small for our method, indicating that it can extract the runway area almost completely and without false alarms. DeepLabV3+ is prone to false alarms, so its PA and IOU values for the same airport runway area differ noticeably, since false alarms reduce the IOU value of the runway area. The overall extraction effect of MDDA is acceptable, but it has defects in detail learning on a small sample data set. Both the PA and IOU values of RefineNet are the lowest.
Table 1: Analysis of the extraction accuracy of the different networks.
Table 2 gives the training times of the different algorithms on the small sample dataset and the test times for the medium-resolution SAR images of the three airports. According to table 2, in terms of training time on the small sample dataset, our network needs only about 2 hours. The training time of MDDA is the longest, nearly 8 hours; MDDA is clearly less effective at training on small samples than on large samples. The training times of DeepLabV3+ and RefineNet are almost the same as that of the method of this embodiment, but the accuracy differs considerably. In terms of the test times for the medium-resolution SAR images of the three airports, the smaller the picture size, the shorter the test time; the average test time of the method of this embodiment is only 16.95 s, the average test time of RefineNet is 16.69 s, the average test time of DeepLabV3+ is 15.89 s, and the test time of MDDA is approximately 2.5 times that of the method of this embodiment. The addition of the MSP and the EDM adds a certain number of parameters to the network, which is why the training and test times of the method of this embodiment are slightly longer than those of DeepLabV3+; the shorter the network training time and picture test time, the higher the efficiency in actual engineering. In summary, the method achieves high-accuracy and fast extraction on the small sample SAR image dataset, demonstrating its efficiency.
Table 2: data sets of different networks train time and test time of medium resolution airport images.
Therefore, the method can realize fast automatic extraction of the airport runway area from high-resolution SAR images. The network design is lightweight, greatly shortening network iteration time and reducing network training time and picture testing time. MSP enables the network to learn global features and encode effective features at multiple scales and in all directions; the parallel working mode of EDM and MSP enhances the learning between context and semantic information; and EDM enables the edge information to be completely decoded and extracted. Meanwhile, the network is well suited to training on small sample datasets: there is currently no large public SAR airport dataset for semantic segmentation, as such data can only be annotated manually, and the small-sample approach saves manual time and cost. In general, in terms of extraction accuracy, dataset training time and picture test time, the network is superior to the mainstream algorithm DeepLabV3+, and the performance of GCAM is superior to the previously proposed algorithm MDDA, realizing efficient automation.
In summary, in order to realize fast automatic extraction of airports from high-resolution SAR images, this embodiment provides an automatic airport runway area detection method based on the GCAM (geospatial context attention mechanism network), comprising three parts: down-sampling the original high-resolution SAR image, extracting the airport runway area with the GCAM, and coordinate-mapping the result image generated by the GCAM. The down-sampling allows a single training sample to contain more airport information, which is beneficial for making a small sample dataset; in the MSP, stripe pooling and four parallel convolutions work together so that features can be learned at multiple scales, and the eSE module screens out useful features; the EDM helps the network learn edge semantic information, and the coordinate mapping processing yields the extraction result on the original high-resolution SAR image. In the tests on three airport runway areas, our network performed best compared with DeepLabV3+, RefineNet and MDDA: MPA reaches 0.98 and MIOU reaches 0.96. In addition, the network's training time on the dataset is only 2.25 h, and the average image test time is only 16.94 s. From the extraction results, the GCAM has no false alarms and little missed detection, and can efficiently realize airport runway area extraction. Furthermore, the GCAM can improve detection efficiency in actual engineering; after the airport runway area is extracted, the detection range for subsequent airplane extraction can be narrowed, saving time.
In addition, the embodiment further provides a GCAM-based high-resolution SAR image airport runway area automatic detection system, which includes:
the down-sampling program unit is used for down-sampling the high-resolution SAR image to generate a medium-resolution image;
a runway area extraction program unit for inputting the medium resolution image into a geographic space context attention mechanism network GCAM to extract a runway area;
and the coordinate mapping program unit is used for carrying out coordinate mapping on the extracted runway area to obtain a detection result of the final high-resolution SAR image.
In addition, the embodiment also provides a GCAM-based high-resolution SAR image airport runway area automatic detection system, which comprises a computer device, wherein the computer device comprises a microprocessor and a memory which are connected with each other, the microprocessor is programmed or configured to execute the steps of the GCAM-based high-resolution SAR image airport runway area automatic detection method, or a computer program which is programmed or configured to execute the GCAM-based high-resolution SAR image airport runway area automatic detection method is stored in the memory.
Furthermore, the present embodiment also provides a computer-readable storage medium having stored therein a computer program programmed or configured to execute the aforementioned GCAM-based high resolution SAR image airport runway area automatic detection method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application; instructions executed via a processor of a computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.
Claims (10)
1. A GCAM-based high-resolution SAR image airport runway area automatic detection method is characterized by comprising the following steps:
1) down-sampling the high-resolution SAR image to generate a medium-resolution image;
2) inputting the medium-resolution image into a geographic space context attention mechanism network GCAM to extract a runway area;
3) and carrying out coordinate mapping on the extracted runway area to obtain a final detection result of the high-resolution SAR image.
2. The GCAM-based high-resolution SAR image airport runway area automatic detection method according to claim 1, characterized in that, the down-sampling of the high-resolution SAR image in step 1) is specifically a 5-fold down-sampling processing of the SAR image by adopting a pixel value extraction method.
3. The GCAM-based high-resolution SAR image airport runway area automatic detection method according to claim 1, wherein the geospatial context attention mechanism network GCAM comprises a coding block and a decoding block; the coding block comprises a residual network ResNet, a multi-scale squeeze pyramid MSP and an edge refinement module EDM, wherein the residual network ResNet is used for performing feature extraction on the input data set to obtain preliminary features, the multi-scale squeeze pyramid MSP is used for obtaining global context information from the preliminary features at different resolutions through different pooling and convolution operations, and the edge refinement module EDM is used for enhancing the network's edge extraction capability on the preliminary features; the outputs of the multi-scale squeeze pyramid MSP and the edge refinement module EDM are then fused to obtain multi-level features; the decoding block is used for performing semantic segmentation of the airport runway area by combining the preliminary features and the multi-level features to extract the runway area.
4. The GCAM-based high-resolution SAR image airport runway area automatic detection method according to claim 3, wherein the residual network ResNet is a modified residual network obtained, on the basis of ResNet-101, by replacing ordinary two-dimensional convolutions with dilated (atrous) convolutions having dilation rates of 2, 4, 8 and 16.
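Dilated convolution enlarges the receptive field without adding parameters by spacing kernel taps `dilation` samples apart. The 1-D toy below is a stand-in to illustrate the mechanism behind claim 4's dilation rates of 2, 4, 8 and 16; it is not the patented ResNet layers.

```python
# Toy 1-D dilated ("atrous") convolution, valid padding: kernel taps are
# spaced `dilation` samples apart, widening the receptive field for free.
def dilated_conv1d(signal, kernel, dilation):
    span = (len(kernel) - 1) * dilation  # receptive field size minus 1
    return [sum(kernel[j] * signal[i + j * dilation]
                for j in range(len(kernel)))
            for i in range(len(signal) - span)]

x = list(range(10))
# With kernel [1, 1, 1] and dilation 2, each output sums x[i], x[i+2], x[i+4].
print(dilated_conv1d(x, [1, 1, 1], dilation=2))  # -> [6, 9, 12, 15, 18, 21]
```

A 3-tap kernel with dilation 16 covers 33 input samples, which is why stacking rates 2, 4, 8, 16 gives the encoder a large effective receptive field with no extra weights.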
5. The GCAM-based high-resolution SAR image airport runway area automatic detection method according to claim 3, wherein the multi-scale squeeze pyramid MSP comprises a multi-receptive-field parallel pooling layer and an effective attention module eSE, wherein the multi-receptive-field parallel pooling layer is built in parallel from a 1 × 1 convolution with a dilation rate of 1, three 3 × 3 convolutions with dilation rates of 6, 12 and 18 respectively, a global average pooling module GAP and a strip pooling module SP; for a two-dimensional feature tensor of input size H × W, the strip pooling module SP performs pooling in the horizontal direction with a 1 × W strip pooling window and pooling in the vertical direction with an H × 1 strip pooling window, averaging the element values inside each pooling kernel to obtain the horizontal and vertical strip pooling outputs; it then expands these two outputs in the left-right and up-down directions respectively with two one-dimensional convolutions so that the two expanded feature maps have the same size, fuses the two expanded feature maps, and finally multiplies the original data by the Sigmoid-processed data to obtain the H × W two-dimensional output tensor; the effective attention module eSE first learns a feature F_avg by global average pooling of the input feature map X_i, processes the feature F_avg through a fully connected layer to obtain a weight matrix W_C, rescales the weight matrix W_C through a Sigmoid function into the channel attention feature A_eSE, applies the channel attention feature A_eSE to the input feature map X_i to obtain a refined feature map X_refine, and finally performs feature re-screening on the refined feature map X_refine to obtain the global context information.
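The strip pooling of claim 5 can be sketched with plain lists: a 1 × W window averages each row, an H × 1 window averages each column, and the two strips are broadcast back to H × W and fused. This toy uses additive fusion and omits the one-dimensional convolutions and the Sigmoid gating; it only illustrates the pooling geometry.

```python
# Toy strip pooling on an H x W map: row means (1 x W window) and column
# means (H x 1 window), expanded back to H x W and fused by addition.
# The learned 1-D convolutions and Sigmoid gate of the SP module are omitted.
def strip_pool(feat):
    H, W = len(feat), len(feat[0])
    row_means = [sum(r) / W for r in feat]                         # H values
    col_means = [sum(feat[i][j] for i in range(H)) / H
                 for j in range(W)]                                # W values
    return [[row_means[i] + col_means[j] for j in range(W)]
            for i in range(H)]

print(strip_pool([[1, 2], [3, 4]]))  # -> [[3.5, 4.5], [5.5, 6.5]]
```

Long, thin strips match elongated structures such as runways better than square pooling windows, which is the motivation for using SP here.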
6. The GCAM-based high-resolution SAR image airport runway area automatic detection method according to claim 3, wherein the edge refinement module EDM comprises a global convolution module GCB and a boundary refinement module BR; the global convolution module GCB is used for strengthening the close coupling between feature maps and the pixel classification layer and for processing feature maps of different resolutions to obtain global information, and the boundary refinement module BR is used for enhancing the edge extraction capability of the coding block from the global information; the global convolution module GCB comprises a k × k large convolution kernel and a feature combination module, the k × k large convolution kernel comprising two paths, one path consisting of a k × 1 × c × c convolution followed by a 1 × k × c × c convolution, and the other path consisting of a 1 × k × c × c convolution followed by a k × 1 × c × c convolution, where c is the number of channels; the output results of the two paths are input together into the feature combination module to obtain the feature Sum_{W×H×C}; the boundary refinement module BR processes the feature Sum_{W×H×C} sequentially with a small convolution kernel, an activation function and another small convolution kernel, then superimposes the processing result onto the original feature Sum_{W×H×C}, finally obtaining a feature map with refined runway area edges.
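The point of GCB's two-path factorization is parameter economy: a dense k × k kernel over c input and c output channels costs k² · c² weights, while two separable paths of two one-dimensional convolutions each cost 4 · k · c². The arithmetic below is illustrative only.

```python
# Back-of-the-envelope weight counts for GCB's large-kernel factorization:
# dense k x k x c x c vs. two paths of (k x 1) + (1 x k) 1-D convolutions.
def gcb_params(k, c):
    full = k * k * c * c                       # single dense k x k kernel
    separable = 2 * (k * c * c + k * c * c)    # two paths, two 1-D convs each
    return full, separable

full, sep = gcb_params(k=7, c=64)
print(full, sep)  # -> 200704 114688
```

The saving grows with k, which is what makes large effective kernels (and hence dense global connections to the classification layer) affordable.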
7. The GCAM-based high-resolution SAR image airport runway area automatic detection method according to claim 3, wherein the decoding block performs 1 × 1 convolution dimensionality reduction on the output features of the coding block, performs edge information decoding and bilinear 4-fold upsampling on the feature map with refined runway area edges obtained by the edge refinement module EDM, concatenates the result with the preliminary features output by the residual network ResNet after 1 × 1 convolution dimensionality reduction and bilinear 4-fold upsampling, then applies a 3 × 3 convolution to the concatenated features to refine them, and finally performs a simple bilinear 4-fold upsampling to obtain the final segmentation result.
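The bilinear 4-fold upsampling in the decoder reduces, along one axis, to linear interpolation between neighbouring samples. The 1-D sketch below shows that operation; a full bilinear upsample applies it along rows and then columns. It is an illustrative analogue, not the decoder itself.

```python
# 1-D analogue of the decoder's bilinear 4x upsampling: insert `factor - 1`
# linearly interpolated samples between each pair of neighbours.
def upsample_linear(row, factor=4):
    out = []
    for a, b in zip(row, row[1:]):
        for t in range(factor):
            out.append(a + (b - a) * t / factor)
    out.append(row[-1])  # keep the final original sample
    return out

print(upsample_linear([0, 4], 4))  # -> [0.0, 1.0, 2.0, 3.0, 4]
```

Applying this twice per upsampling stage (rows, then columns) recovers segmentation maps at the input resolution without any learned parameters.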
8. A GCAM-based high-resolution SAR image airport runway area automatic detection system is characterized by comprising:
the down-sampling program unit is used for down-sampling the high-resolution SAR image to generate a medium-resolution image;
a runway area extraction program unit for inputting the medium resolution image into a geographic space context attention mechanism network GCAM to extract a runway area;
and the coordinate mapping program unit is used for carrying out coordinate mapping on the extracted runway area to obtain the final detection result of the high-resolution SAR image.
9. A GCAM-based high resolution SAR image airport runway area automatic detection system comprising a computer device comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to perform the steps of the GCAM-based high resolution SAR image airport runway area automatic detection method according to any one of claims 1 to 7, or the memory has stored therein a computer program programmed or configured to perform the GCAM-based high resolution SAR image airport runway area automatic detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program programmed or configured to perform the GCAM-based high resolution SAR image airport runway area automatic detection method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010871235.3A CN112084901B (en) | 2020-08-26 | 2020-08-26 | GCAM-based high-resolution SAR image airport runway area automatic detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112084901A true CN112084901A (en) | 2020-12-15 |
CN112084901B CN112084901B (en) | 2024-03-01 |
Family
ID=73728710
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010871235.3A Active CN112084901B (en) | 2020-08-26 | 2020-08-26 | GCAM-based high-resolution SAR image airport runway area automatic detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112084901B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112528896A (en) * | 2020-12-17 | 2021-03-19 | 长沙理工大学 | SAR image-oriented automatic airplane target detection method and system |
CN112598003A (en) * | 2020-12-18 | 2021-04-02 | 燕山大学 | Real-time semantic segmentation method based on data expansion and full-supervision preprocessing |
CN112950477A (en) * | 2021-03-15 | 2021-06-11 | 河南大学 | High-resolution saliency target detection method based on dual-path processing |
CN113240040A (en) * | 2021-05-27 | 2021-08-10 | 西安理工大学 | Polarized SAR image classification method based on channel attention depth network |
CN113496221A (en) * | 2021-09-08 | 2021-10-12 | 湖南大学 | Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering |
CN113567984A (en) * | 2021-07-30 | 2021-10-29 | 长沙理工大学 | Method and system for detecting artificial small target in SAR image |
CN113673417A (en) * | 2021-08-19 | 2021-11-19 | 中国商用飞机有限责任公司 | Method and system for assisting airplane ground taxiing based on image comparison |
CN113780241A (en) * | 2021-09-29 | 2021-12-10 | 北京航空航天大学 | Acceleration method and device for detecting salient object |
CN113887373A (en) * | 2021-09-27 | 2022-01-04 | 中关村科学城城市大脑股份有限公司 | Attitude identification method and system based on urban intelligent sports parallel fusion network |
CN114022751A (en) * | 2021-11-04 | 2022-02-08 | 中国人民解放军国防科技大学 | SAR target detection method, device and equipment based on feature refinement deformable network |
CN114066822A (en) * | 2021-10-27 | 2022-02-18 | 的卢技术有限公司 | Zebra crossing detection method based on deep learning |
CN114202733A (en) * | 2022-02-18 | 2022-03-18 | 青岛海信网络科技股份有限公司 | Video-based traffic fault detection method and device |
CN114387439A (en) * | 2022-01-13 | 2022-04-22 | 中国电子科技集团公司第五十四研究所 | Semantic segmentation network based on fusion of optical and PolSAR (polar synthetic Aperture Radar) features |
CN114820652A (en) * | 2022-04-07 | 2022-07-29 | 北京医准智能科技有限公司 | Method, device and medium for segmenting local quality abnormal region of mammary X-ray image |
CN114842206A (en) * | 2022-07-04 | 2022-08-02 | 江西师范大学 | Remote sensing image semantic segmentation model and method based on double-layer global convolution |
CN115131682A (en) * | 2022-07-19 | 2022-09-30 | 云南电网有限责任公司电力科学研究院 | Power grid distribution condition drawing method and system based on remote sensing image |
CN116343113A (en) * | 2023-03-09 | 2023-06-27 | 中国石油大学(华东) | Method and system for detecting oil spill based on polarized SAR characteristics and coding and decoding network |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105095914A (en) * | 2015-08-13 | 2015-11-25 | 中国民航大学 | Airport runway detection method based on combination of h/q decomposition and Bayesian iterative classification |
US20160131739A1 (en) * | 2007-09-06 | 2016-05-12 | Rockwell Collins, Inc. | Display system and method using weather radar sensing |
WO2019085905A1 (en) * | 2017-10-31 | 2019-05-09 | 北京市商汤科技开发有限公司 | Image question answering method, device and system, and storage medium |
CN110084249A (en) * | 2019-04-24 | 2019-08-02 | 哈尔滨工业大学 | The image significance detection method paid attention to based on pyramid feature |
CN110506278A (en) * | 2017-04-19 | 2019-11-26 | 西门子医疗有限公司 | Target detection in latent space |
CN110533045A (en) * | 2019-07-31 | 2019-12-03 | 中国民航大学 | A kind of luggage X-ray contraband image, semantic dividing method of combination attention mechanism |
CN111222474A (en) * | 2020-01-09 | 2020-06-02 | 电子科技大学 | Method for detecting small target of high-resolution image with any scale |
Worldwide Applications (1)
- 2020-08-26: CN CN202010871235.3A, granted as CN112084901B, Active
Non-Patent Citations (3)
Title |
---|
LIFU CHEN et al.: "A New Framework for Automatic Airports Extraction from SAR Images Using Multi-Level Dual Attention Mechanism", Remote Sensing, vol. 12, no. 3, p. 560 * |
SIYU TAN et al.: "Geospatial Contextual Attention Mechanism for Automatic and Fast Airport Detection in SAR Imagery", IEEE Access, vol. 8, pp. 173627-173640, XP011811750, DOI: 10.1109/ACCESS.2020.3024546 * |
TAN SIYU: "Research on Automatic Extraction Algorithms for Airport Runway Areas Based on High-Resolution SAR Images", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 01, pp. 031-968 * |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112528896B (en) * | 2020-12-17 | 2024-05-31 | 长沙理工大学 | SAR image-oriented automatic aircraft target detection method and system |
CN112528896A (en) * | 2020-12-17 | 2021-03-19 | 长沙理工大学 | SAR image-oriented automatic airplane target detection method and system |
CN112598003A (en) * | 2020-12-18 | 2021-04-02 | 燕山大学 | Real-time semantic segmentation method based on data expansion and full-supervision preprocessing |
CN112598003B (en) * | 2020-12-18 | 2022-11-25 | 燕山大学 | Real-time semantic segmentation method based on data expansion and full-supervision preprocessing |
CN112950477B (en) * | 2021-03-15 | 2023-08-22 | 河南大学 | Dual-path processing-based high-resolution salient target detection method |
CN112950477A (en) * | 2021-03-15 | 2021-06-11 | 河南大学 | High-resolution saliency target detection method based on dual-path processing |
CN113240040A (en) * | 2021-05-27 | 2021-08-10 | 西安理工大学 | Polarized SAR image classification method based on channel attention depth network |
CN113567984A (en) * | 2021-07-30 | 2021-10-29 | 长沙理工大学 | Method and system for detecting artificial small target in SAR image |
CN113567984B (en) * | 2021-07-30 | 2023-08-22 | 长沙理工大学 | Method and system for detecting artificial small target in SAR image |
CN113673417A (en) * | 2021-08-19 | 2021-11-19 | 中国商用飞机有限责任公司 | Method and system for assisting airplane ground taxiing based on image comparison |
CN113496221A (en) * | 2021-09-08 | 2021-10-12 | 湖南大学 | Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering |
CN113496221B (en) * | 2021-09-08 | 2022-02-01 | 湖南大学 | Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering |
CN113887373A (en) * | 2021-09-27 | 2022-01-04 | 中关村科学城城市大脑股份有限公司 | Attitude identification method and system based on urban intelligent sports parallel fusion network |
CN113780241A (en) * | 2021-09-29 | 2021-12-10 | 北京航空航天大学 | Acceleration method and device for detecting salient object |
CN113780241B (en) * | 2021-09-29 | 2024-02-06 | 北京航空航天大学 | Acceleration method and device for detecting remarkable object |
CN114066822A (en) * | 2021-10-27 | 2022-02-18 | 的卢技术有限公司 | Zebra crossing detection method based on deep learning |
CN114022751A (en) * | 2021-11-04 | 2022-02-08 | 中国人民解放军国防科技大学 | SAR target detection method, device and equipment based on feature refinement deformable network |
CN114022751B (en) * | 2021-11-04 | 2024-03-05 | 中国人民解放军国防科技大学 | SAR target detection method, device and equipment based on feature refinement deformable network |
CN114387439B (en) * | 2022-01-13 | 2023-09-12 | 中国电子科技集团公司第五十四研究所 | Semantic segmentation network based on optical and PolSAR feature fusion |
CN114387439A (en) * | 2022-01-13 | 2022-04-22 | 中国电子科技集团公司第五十四研究所 | Semantic segmentation network based on fusion of optical and PolSAR (polar synthetic Aperture Radar) features |
CN114202733A (en) * | 2022-02-18 | 2022-03-18 | 青岛海信网络科技股份有限公司 | Video-based traffic fault detection method and device |
CN114820652B (en) * | 2022-04-07 | 2023-05-23 | 北京医准智能科技有限公司 | Method, device and medium for segmenting partial quality abnormal region of mammary gland X-ray image |
CN114820652A (en) * | 2022-04-07 | 2022-07-29 | 北京医准智能科技有限公司 | Method, device and medium for segmenting local quality abnormal region of mammary X-ray image |
CN114842206A (en) * | 2022-07-04 | 2022-08-02 | 江西师范大学 | Remote sensing image semantic segmentation model and method based on double-layer global convolution |
CN115131682A (en) * | 2022-07-19 | 2022-09-30 | 云南电网有限责任公司电力科学研究院 | Power grid distribution condition drawing method and system based on remote sensing image |
CN116343113A (en) * | 2023-03-09 | 2023-06-27 | 中国石油大学(华东) | Method and system for detecting oil spill based on polarized SAR characteristics and coding and decoding network |
Also Published As
Publication number | Publication date |
---|---|
CN112084901B (en) | 2024-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112084901A (en) | GCAM-based high-resolution SAR image airport runway area automatic detection method and system | |
CN108921875B (en) | Real-time traffic flow detection and tracking method based on aerial photography data | |
CN111598030B (en) | Method and system for detecting and segmenting vehicle in aerial image | |
Chen et al. | Vehicle detection in high-resolution aerial images via sparse representation and superpixels | |
Zhang et al. | An empirical study of multi-scale object detection in high resolution UAV images | |
Chen et al. | Vehicle detection in high-resolution aerial images based on fast sparse representation classification and multiorder feature | |
Alidoost et al. | A CNN-based approach for automatic building detection and recognition of roof types using a single aerial image | |
Zhang et al. | A longitudinal scanline based vehicle trajectory reconstruction method for high-angle traffic video | |
Wan et al. | A novel neural network model for traffic sign detection and recognition under extreme conditions | |
CN111160205A (en) | Embedded multi-class target end-to-end unified detection method for traffic scene | |
CN113723377A (en) | Traffic sign detection method based on LD-SSD network | |
Li et al. | An aerial image segmentation approach based on enhanced multi-scale convolutional neural network | |
Hu | Intelligent road sign inventory (IRSI) with image recognition and attribute computation from video log | |
Tan et al. | Geospatial contextual attention mechanism for automatic and fast airport detection in SAR imagery | |
CN115527133A (en) | High-resolution image background optimization method based on target density information | |
Zhang et al. | Vehicle detection in UAV aerial images based on improved YOLOv3 | |
CN112785610B (en) | Lane line semantic segmentation method integrating low-level features | |
Sarlin et al. | Snap: Self-supervised neural maps for visual positioning and semantic understanding | |
CN114463205A (en) | Vehicle target segmentation method based on double-branch Unet noise suppression | |
Cao et al. | UAV small target detection algorithm based on an improved YOLOv5s model | |
CN111881914B (en) | License plate character segmentation method and system based on self-learning threshold | |
Ng et al. | Scalable Feature Extraction with Aerial and Satellite Imagery. | |
CN113361528A (en) | Multi-scale target detection method and system | |
CN113177956A (en) | Semantic segmentation method for unmanned aerial vehicle remote sensing image | |
Kamenetsky et al. | Aerial car detection and urban understanding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||