CN113378763B - SAR image-oriented target automatic detection method and system - Google Patents

SAR image-oriented target automatic detection method and system

Info

Publication number
CN113378763B
CN113378763B · CN202110712235.3A
Authority
CN
China
Prior art keywords
module
features
feature
shuffling
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202110712235.3A
Other languages
Chinese (zh)
Other versions
CN113378763A (en)
Inventor
肖红光
王杰岚
陈立福
邢进
韩松涛
罗汝
蔡兴敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha University of Science and Technology filed Critical Changsha University of Science and Technology
Priority to CN202110712235.3A priority Critical patent/CN113378763B/en
Publication of CN113378763A publication Critical patent/CN113378763A/en
Application granted granted Critical
Publication of CN113378763B publication Critical patent/CN113378763B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a SAR image-oriented target automatic detection method and system. The method comprises inputting an input image into a feature fusion and anti-interference network FFAN that has been trained in advance to obtain target frames for target detection, wherein the feature fusion and anti-interference network FFAN comprises: a backbone network, a pooling and channel shuffling module, a depth fusion module, a spatial attention mechanism RSAM, a spatial feature fusion module ASFF, and a classification regression network. The invention can effectively extract and fuse features and improve regression accuracy, so that the feature fusion and anti-interference network FFAN significantly improves the detection effect while leaving the testing speed almost unaffected. The method also has good universality: it can be used not only for high-precision and fast detection of aircraft, but also for detecting and identifying other dense targets in SAR images.

Description

SAR image-oriented target automatic detection method and system
Technical Field
The invention relates to a target detection technology based on an SAR image, in particular to an SAR image-oriented target automatic detection method and system.
Background
Target detection is a hot problem in the field of high-resolution synthetic aperture radar (SAR). Aircraft are typical man-made targets of important military value; efficiently and accurately acquiring aircraft target information at airports, in airspace and elsewhere is of great significance for acquiring battlefield intelligence in real time and making operational plans. In the civil field, aircraft detection and recognition enables more effective airport management. Because synthetic aperture radar can work all day and in all weather conditions, target detection and recognition in SAR images has received wide attention and research.
Traditional SAR image target detection methods such as the constant false alarm rate (CFAR) algorithm and the CA-CFAR algorithm suffer from poor robustness and a low degree of automation, and these limitations are difficult to overcome.
In recent years, synthetic aperture radar imaging and deep learning technologies have developed rapidly, more high-resolution SAR images have become available, and many scholars have begun to study SAR image target detection with deep learning algorithms. Compared with traditional SAR image target detection methods and machine learning algorithms, deep learning achieves higher detection accuracy and shorter test time, has made great research progress, and in particular enables end-to-end detection. Aircraft targets are small, of various types and densely distributed, and are therefore more difficult to detect than targets such as runways and bridges; aircraft detection is thus a difficult point in SAR image target detection. In 2019, Li et al. combined an improved line segment detector (LSD) with Faster R-CNN to design an aircraft detection method for SAR images. To better achieve target detection, more and more scholars have introduced attention mechanisms. Zhao et al. designed a pyramid attention extension network (PADN) that enhances the learning of fine-grained backscattering features of aircraft. In 2020, Guo et al. proposed a method that combines scattering information with deep learning to detect aircraft targets. Chen et al. proposed a multi-level densely connected dual-attention network to automatically detect airport runway areas, which achieves good extraction results, but its training and testing speeds are relatively slow. In 2020, Chen et al. proposed a new end-to-end framework based on deep learning that can automatically classify water and shadow regions in SAR images. Chen et al. also proposed a new scene classification framework with multi-scale spatial features, called the feature recalibration network (FRN-MSF), which enables high-precision classification in SAR scenes. Tan et al. proposed a geospatial context attention mechanism (GCAM) for automatic detection of airport areas, which achieves high precision with a short test time. In 2021, Wang et al. combined the GCAM algorithm with an aircraft detection algorithm to provide an efficient SAR image aircraft detection framework that realizes automatic and rapid detection of aircraft targets.
EWFAN is an efficient SAR image target detection network with considerable advantages in both accuracy and speed. However, owing to the complex background of SAR images and the variety of aircraft types, detecting aircraft with the EWFAN algorithm still has problems: neither the detection rate nor the false alarm rate reaches a satisfactory level.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the above problems of the prior art, the invention provides a SAR image-oriented target automatic detection method and system.
In order to solve the technical problems, the invention adopts the technical scheme that:
A SAR image-oriented target automatic detection method comprises inputting an input image into a feature fusion and anti-interference network FFAN trained in advance to obtain target frames for target detection, wherein the feature fusion and anti-interference network FFAN comprises:
a backbone network, which is used for down-sampling the input image into features of different scales, including low-level features P3, middle-level features P4-P6 and high-level features P7;
the pooling and channel shuffling module is used for extracting and fusing the intermediate-level features P4-P6 respectively to obtain corresponding shuffling features P4 '-P6';
the depth fusion module is used for performing depth fusion on the low-level features P3, the shuffle features P4 'to P6' and the high-level features P7 to respectively obtain the corresponding low-level features P3', the shuffle features P4' to P6 'and the high-level features P7' after depth fusion;
the spatial attention mechanism RSAM is respectively used for further extracting and fusing the deeply fused shuffle feature P6 'and the high-level feature P7';
the spatial feature fusion module ASFF is used for further spatial feature extraction and fusion of the deeply fused low-level features P3' and the shuffled features P4' to P5' respectively;
and the classification regression network is used for performing classification regression on all spatial features output by the spatial attention mechanism RSAM and the spatial feature fusion module ASFF through extraction and fusion to generate a target frame for target detection.
Optionally, the backbone network is an Efficientnet network; the Efficientnet network successively down-samples the 512 × 512 input image to different sizes to obtain corresponding features P1 to P7, and the features obtained by the 3rd to 7th down-sampling are used as the low-level feature P3, the middle-level features P4 to P6 and the high-level feature P7, respectively.
Optionally, the pooling and channel shuffling module includes three shuffling modules, namely a shuffling module PSM-l, a shuffling module PSM-m, and a shuffling module PSM-s, where the shuffling module PSM-l is configured to extract and fuse the intermediate-level feature P4 to obtain corresponding shuffling features P4', the shuffling module PSM-m is configured to extract and fuse the intermediate-level feature P5 to obtain corresponding shuffling features P5', and the shuffling module PSM-s is configured to extract and fuse the intermediate-level feature P6 to obtain corresponding shuffling features P6', and the shuffling module PSM-l, the shuffling module PSM-m, and the shuffling module PSM-s each include:
the channel separation module is used for dividing the channel of the input characteristics into two parts;
the maximum pooling operation module is used for performing maximum pooling operation on a part of channels output by the channel separation module;
the average pooling operation module is used for carrying out average pooling operation on the other part of channels output by the channel separation module;
the splicing module is used for splicing the output of the maximum pooling operation module, the output of the average pooling operation module and the original input characteristics of the input channel separation module;
the shuffling module is used for carrying out shuffling operation on the output of the splicing module so as to enrich the information of each channel;
the convolution module is used for carrying out convolution operation on the output of the shuffling module to enable the number of output characteristic channels to be the same as that of input characteristics;
the pooling kernel sizes used by the shuffling module PSM-l, the shuffling module PSM-m and the shuffling module PSM-s for the maximum pooling and average pooling operations are {9, 15, 25}, {5, 11} and {3}, respectively.
Optionally, the deep fusion module is an improved structure of a BiFPN, and the improved structure adds four jump connections on the basis of the BiFPN, where the first jump connects the middle-level feature P6 and the shuffle feature P5', the second jump connects the middle-level feature P5 and the shuffle feature P4', the third jump connects the middle-level feature P5 and the middle-level feature P7, and the fourth jump connects the middle-level feature P4 and the middle-level feature P6.
Optionally, the classification regression network includes a classification network and a regression network; the classification network uses three 64-channel convolutions followed by one n_b × n_c convolution to predict the category of each target frame, and the regression network uses three 64-channel convolutions followed by one n_b × 4 convolution to predict the regression of each prior frame, where n_b is the number of prior frames owned by the feature layer and n_c is the number of target detection categories of the network.
Optionally, the method further includes a step of training the feature fusion and anti-interference network FFAN in advance, before the input image is input into the trained feature fusion and anti-interference network FFAN to obtain the target frames for target detection. The loss function L_AIF adopted when training the feature fusion and anti-interference network FFAN is:
L_AIF = L_FL + L_AIoU,
where L_FL is the Focal Loss classification loss and L_AIoU is the AIoU regression loss, with:
L_AIoU = L_CIoU + p·w.
[The expressions defining p and w are given only as images in the original publication and are not reproduced here.]
In the above formulas, L_CIoU is the CIoU regression loss, a and a_t respectively denote the areas of the prediction box and the real box, p is a weight function of the areas, w is the similarity of the areas, and IoU is the intersection-over-union.
In addition, the invention also provides an automatic target detection method facing the SAR image, which comprises the following steps:
S1) generating a window image from an input SAR image in a sliding window mode with an overlapping area;
S2) taking each window image as an input image, and adopting the steps of the SAR image-oriented target automatic detection method to perform target detection to obtain a target frame;
S3) converting the target frame coordinates of each window image into the original SAR image;
and S4) filtering the target frames overlapped in the original SAR image to obtain the target frame serving as the final target detection result.
Optionally, the method adopted in the step S4) of filtering the target frames overlapped in the original SAR image is a non-maximum suppression method.
In addition, the invention also provides an automatic target detection system for the SAR image, which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the automatic target detection method for the SAR image.
Furthermore, the present invention also provides a computer-readable storage medium having stored therein a computer program programmed or configured to execute the SAR image-oriented target automatic detection method.
Compared with the prior art, the invention has the following advantages:
1. The feature fusion and anti-interference network FFAN adopted by the method comprises: a backbone network for down-sampling the input image into features of different scales, including a low-level feature P3, middle-level features P4-P6 and a high-level feature P7; a pooling and channel shuffling module for extracting and fusing the middle-level features P4-P6 respectively to obtain corresponding shuffled features P4'-P6'; a depth fusion module for deeply fusing the low-level feature P3, the shuffled features P4'-P6' and the high-level feature P7 to obtain the corresponding deeply fused low-level feature P3', shuffled features P4'-P6' and high-level feature P7'; a spatial attention mechanism RSAM for further spatial feature extraction and fusion of the deeply fused shuffled feature P6' and high-level feature P7'; a spatial feature fusion module ASFF for further spatial feature extraction and fusion of the deeply fused low-level feature P3' and shuffled features P4'-P5'; and a classification regression network for performing classification and regression on all spatial features output by the spatial attention mechanism RSAM and the spatial feature fusion module ASFF to generate the target frames for target detection. With this structure, features can be effectively extracted and fused and the regression accuracy is improved, so that the feature fusion and anti-interference network FFAN significantly improves the detection effect while leaving the testing speed almost unaffected.
2. The method has good universality: it can be used not only for high-precision and fast detection of aircraft, but also for detecting and identifying other dense targets in SAR images.
Drawings
Fig. 1 is a schematic structural diagram of the feature fusion and anti-interference network FFAN in an embodiment of the present invention.
Fig. 2 is a schematic diagram of the application flow of the feature fusion and anti-interference network FFAN in an embodiment of the present invention.
FIG. 3 is a block diagram of a pooling and channel shuffling module in an embodiment of the present invention.
Fig. 4 is a schematic diagram illustrating a structural improvement of the depth fusion module according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a classification regression network according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a schematic analysis of a problem existing in CIoU Loss according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1, the SAR image-oriented target automatic detection method in this embodiment includes inputting an input image into a Feature Fusion and Anti-interference Network (FFAN) that has been trained in advance to obtain target frames for target detection, wherein the Feature Fusion and Anti-interference Network (FFAN) comprises:
a backbone network, which is used for down-sampling the input image into features of different scales, including low-level features P3, middle-level features P4-P6 and high-level features P7;
the pooling and channel shuffling module is used for extracting and fusing the intermediate-level features P4-P6 respectively to obtain corresponding shuffling features P4 '-P6';
the depth fusion module is used for performing depth fusion on the low-level features P3, the shuffled features P4 '-P6' and the high-level features P7 to respectively obtain corresponding depth-fused low-level features P3', shuffled features P4' -P6 'and high-level features P7';
the spatial attention mechanism RSAM is respectively used for further spatial feature extraction and fusion of the deeply fused shuffle feature P6 'and the high-level feature P7';
the spatial feature fusion module ASFF is used for further spatial feature extraction and fusion of the deeply fused low-level features P3' and the shuffled features P4' to P5' respectively;
and the classification regression network is used for performing classification regression on all spatial features output by the spatial attention mechanism RSAM and the spatial feature fusion module ASFF through extraction and fusion to generate a target frame for target detection.
Referring to fig. 1, in the present embodiment the backbone network is an Efficientnet network; the Efficientnet network successively down-samples the 512 × 512 input image to different sizes to obtain corresponding features P1 to P7, and the features obtained by the 3rd to 7th down-sampling are used as the low-level feature P3, the middle-level features P4 to P6 and the high-level feature P7, respectively. The Efficientnet network was proposed by Google; it takes the MBConv block of MobileNet V2 as the main component of the model and introduces the Squeeze-and-Excitation (SE) method of SENet to adjust the network structure. The Efficientnet network has clear advantages in both accuracy and speed and is therefore suitable as the backbone of an efficient target detection algorithm.
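The patent does not disclose source code, so the following is only an illustrative sketch of the multi-scale feature extraction described above: a stand-in backbone (not the actual Efficientnet network) that down-samples a 512 × 512 input seven times and passes on the 3rd to 7th stage outputs as P3 to P7. The channel widths and layer composition are assumptions made for the example.

import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    # Stand-in for the backbone: seven stride-2 stages produce P1..P7; only
    # P3 (low-level), P4-P6 (middle-level) and P7 (high-level) are returned.
    def __init__(self, widths=(16, 24, 40, 80, 112, 192, 320)):
        super().__init__()
        blocks, in_ch = [], 3
        for out_ch in widths:
            blocks.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.SiLU(inplace=True)))
            in_ch = out_ch
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)   # P1 (256x256) ... P7 (4x4) for a 512x512 input
        return feats[2:]      # keep the 3rd to 7th down-sampling results: P3..P7

p3, p4, p5, p6, p7 = ToyBackbone()(torch.randn(1, 3, 512, 512))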
The pooling and channel shuffling module is used to further extract effective features. As shown in fig. 3, the pooling and channel shuffling module in this embodiment includes three shuffling modules, namely a shuffling module PSM-l, a shuffling module PSM-m and a shuffling module PSM-s, where the shuffling module PSM-l is used to extract and fuse the middle-level feature P4 to obtain the corresponding shuffled feature P4', the shuffling module PSM-m is used to extract and fuse the middle-level feature P5 to obtain the corresponding shuffled feature P5', and the shuffling module PSM-s is used to extract and fuse the middle-level feature P6 to obtain the corresponding shuffled feature P6'. Each of the shuffling module PSM-l, the shuffling module PSM-m and the shuffling module PSM-s includes:
the channel separation module is used for dividing the channel of the input characteristics into two parts;
the maximum pooling operation module is used for performing maximum pooling operation on a part of channels output by the channel separation module;
the average pooling operation module is used for carrying out average pooling operation on the other part of channels output by the channel separation module;
the splicing module is used for splicing the output of the maximum pooling operation module, the output of the average pooling operation module and the original input characteristics of the input channel separation module;
a shuffling module for shuffling the output of the splicing module to enrich the information of each channel;
the convolution module is used for carrying out convolution operation on the output of the shuffling module to enable the number of output characteristic channels to be the same as that of input characteristics;
the pooling kernel sizes used by the shuffling module PSM-l, the shuffling module PSM-m and the shuffling module PSM-s for the maximum pooling and average pooling operations are {9, 15, 25}, {5, 11} and {3}, respectively.
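The PSM is described above only at block-diagram level. Purely as one possible reading, and not as the implementation of the patent, the sketch below splits the channels in half, applies stride-1 maximum and average pooling with the listed kernel sizes (with "same" padding), splices the results with the original input, shuffles the channels ShuffleNet-style, and restores the channel count with a 1 × 1 convolution. Every detail beyond the text (padding, shuffle groups, the example channel counts) is an assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_shuffle(x, groups=2):
    # ShuffleNet-style shuffle: interleave channels across groups to mix information.
    b, c, h, w = x.size()
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

class PSM(nn.Module):
    # Pooling and channel Shuffling Module (sketch).
    # kernels: (9, 15, 25) for PSM-l, (5, 11) for PSM-m, (3,) for PSM-s.
    def __init__(self, channels, kernels):
        super().__init__()
        self.kernels = kernels
        half = channels // 2
        cat_channels = channels + 2 * half * len(kernels)   # pooled halves + original input
        self.fuse = nn.Conv2d(cat_channels, channels, kernel_size=1, bias=False)

    def forward(self, x):
        half = x.size(1) // 2
        x_max, x_avg = x[:, :half], x[:, half:]             # channel separation
        branches = [x]                                       # keep the original features
        for k in self.kernels:
            pad = k // 2                                     # keeps the spatial size unchanged
            branches.append(F.max_pool2d(x_max, k, stride=1, padding=pad))
            branches.append(F.avg_pool2d(x_avg, k, stride=1, padding=pad))
        out = torch.cat(branches, dim=1)                     # splicing
        out = channel_shuffle(out)                           # enrich the information of each channel
        return self.fuse(out)                                # same channel count as the input

# Example instantiation for P4-P6 (channel counts are illustrative assumptions):
psm_l, psm_m, psm_s = PSM(64, (9, 15, 25)), PSM(128, (5, 11)), PSM(256, (3,))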
The depth fusion module deeply fuses the features of the five feature layers and promotes information exchange between feature layers and channels so as to improve the detection effect. As shown in fig. 4, the depth fusion module is an improved BiFPN structure that adds four skip connections on the basis of BiFPN (shown by the dashed arrows in sub-figure (c) of fig. 4): the first connects the middle-level feature P6 and the shuffled feature P5', the second connects the middle-level feature P5 and the shuffled feature P4', the third connects the middle-level feature P5 and the middle-level feature P7, and the fourth connects the middle-level feature P4 and the middle-level feature P6. Referring to fig. 4, the depth fusion module in this embodiment is denoted M-BiFPN; M-BiFPN is an improvement over BiFPN, and BiFPN is in turn an improvement over FPN. BiFPN (fig. 4 (b)) performs not only top-down fusion but also bottom-up fusion, and removes nodes that have only one input, so its precision and efficiency are improved to some extent compared with FPN (fig. 4 (a)). In this embodiment, four skip connections are added on the basis of BiFPN and the original feature map of each layer is added to the output, so that more features can be fused to improve the network performance.
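As an illustration of the four added skip connections (again a sketch, not the disclosed implementation), each connection can be realised by resizing the source feature map to the target's resolution and adding it, assuming all pyramid levels share the same channel count as in a standard BiFPN; a 1 × 1 convolution would otherwise be needed to match channels.

import torch.nn.functional as F

def add_skip(target, source):
    # Resize `source` to `target`'s spatial size and add it element-wise.
    source = F.interpolate(source, size=target.shape[-2:], mode="nearest")
    return target + source

# Assumed placement of the four skip connections after the BiFPN pass
# (p4_in..p7_in are the BiFPN inputs, p4_out..p7_out its outputs):
# p5_out = add_skip(p5_out, p6_in)   # 1st: P6 -> P5'
# p4_out = add_skip(p4_out, p5_in)   # 2nd: P5 -> P4'
# p7_out = add_skip(p7_out, p5_in)   # 3rd: P5 -> P7
# p6_out = add_skip(p6_out, p4_in)   # 4th: P4 -> P6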
On the basis that the depth fusion module completes the preliminary feature fusion, the spatial attention mechanism RSAM and the spatial feature fusion module ASFF are adopted in the embodiment for further spatial feature extraction and fusion. The spatial attention mechanism RSAM carries out further spatial feature extraction and fusion on the deeply fused shuffle feature P6 'and the advanced feature P7'; the spatial feature fusion module ASFF further performs spatial feature extraction and fusion on the deeply fused low-level features P3' and the shuffled features P4' to P5 '. The spatial attention mechanism RSAM and the spatial feature fusion module ASFF are adopted for further spatial feature extraction and fusion, the target significance can be enhanced, the interference of negative samples can be effectively restrained, and the influence on the network testing speed is small.
Finally, after the feature maps undergo further spatial feature extraction and fusion by the spatial attention mechanism RSAM and the spatial feature fusion module ASFF, five effective feature layers are obtained, and the feature fusion and anti-interference network FFAN uses the classification and regression network to predict the result. As shown in fig. 5, the classification regression network includes a classification network and a regression network; the classification network uses three 64-channel convolutions followed by one n_b × n_c convolution to predict the category of each target frame, and the regression network uses three 64-channel convolutions followed by one n_b × 4 convolution to predict the regression of each prior frame, where n_b is the number of prior frames owned by the feature layer and n_c is the number of target detection categories of the network.
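The classification and regression heads can be sketched directly from the channel counts stated above; whether the final convolution is 1 × 1 or 3 × 3, the activation functions, and the 64-channel input width are assumptions not fixed by the text.

import torch.nn as nn

def make_head(out_channels, mid_channels=64, num_convs=3):
    # Three 64-channel 3x3 convolutions followed by one prediction convolution.
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(mid_channels, mid_channels, 3, padding=1), nn.SiLU(inplace=True)]
    layers += [nn.Conv2d(mid_channels, out_channels, 1)]
    return nn.Sequential(*layers)

n_b, n_c = 9, 1                    # priors per location and number of classes (illustrative values)
cls_head = make_head(n_b * n_c)    # predicts the category of each target frame
reg_head = make_head(n_b * 4)      # predicts the regression of each prior frame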
Before the input image is input into the trained feature fusion and anti-interference network FFAN to obtain the target frames for target detection, the method further includes a step of training the feature fusion and anti-interference network FFAN in advance. In this embodiment, the CIoU Loss is first improved to obtain the AIoU Loss, which is then combined with the Focal Loss to form the AIF Loss; the loss function L_AIF adopted when training the feature fusion and anti-interference network FFAN is:
L_AIF = L_FL + L_AIoU,
where L_FL is the Focal Loss classification loss and L_AIoU is the AIoU regression loss, with:
L_AIoU = L_CIoU + p·w.
[The expressions defining p and w are given only as images in the original publication and are not reproduced here.]
In the above formulas, L_CIoU is the CIoU regression loss, a and a_t respectively denote the areas of the prediction box and the real box, p is a weight function of the areas, w is the similarity of the areas, and IoU is the intersection-over-union. The CIoU Loss adds an aspect-ratio penalty term on the basis of the DIoU Loss, which improves the network regression accuracy to a certain extent. However, when the prediction box and the real box share the same centre point and the same aspect ratio, the CIoU Loss still degrades to the IoU Loss, as shown in fig. 6. Therefore, the loss function L_AIF used in this embodiment to train the feature fusion and anti-interference network FFAN adds the penalty term p·w, which takes the difference between the areas of the prediction box and the real box into account so that the area of the prediction box is driven closer to the area of the real box. As shown in fig. 6, the area of prediction box 2 is closer to the area of the real box than that of prediction box 1; the L_CIoU of prediction box 2 equals the L_CIoU of prediction box 1, whereas the L_AIoU of prediction box 1 is greater than the L_AIoU of prediction box 2.
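Since the expressions for p and w are not reproduced in the text, the sketch below implements only the published composition L_AIoU = L_CIoU + p·w, using the standard CIoU loss and leaving p and w as explicitly hypothetical callables supplied by the user; it is not the patented loss itself.

import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    # Standard CIoU loss for (x1, y1, x2, y2) boxes of shape (N, 4); also returns areas and IoU.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # centre-distance penalty normalised by the diagonal of the smallest enclosing box
    c_lt = torch.min(pred[:, :2], target[:, :2])
    c_rb = torch.max(pred[:, 2:], target[:, 2:])
    c_diag = ((c_rb - c_lt) ** 2).sum(dim=1) + eps
    centre_dist = (((pred[:, :2] + pred[:, 2:]) / 2 - (target[:, :2] + target[:, 2:]) / 2) ** 2).sum(dim=1)
    # aspect-ratio penalty
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + centre_dist / c_diag + alpha * v, area_p, area_t, iou

def aiou_loss(pred, target, p_fn, w_fn):
    # L_AIoU = L_CIoU + p * w; p_fn and w_fn stand in for the area weight and
    # area similarity terms whose exact formulas are not reproduced in the text.
    l_ciou, area_p, area_t, iou = ciou_loss(pred, target)
    return l_ciou + p_fn(area_p, area_t, iou) * w_fn(area_p, area_t, iou)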
Referring to fig. 2, the present embodiment further provides an automatic target detection method for an SAR image, including:
S1) generating a window image from an input SAR image in a sliding window mode with an overlapping area;
S2) taking each window image as an input image, and carrying out target detection by adopting the steps of the SAR image-oriented target automatic detection method to obtain a target frame;
S3) converting the target frame coordinates of each window image into the original SAR image;
and S4) filtering the target frames overlapped in the original SAR image to obtain the target frame serving as a final target detection result.
In this embodiment, when the window images are generated in step S1) in a sliding window manner with an overlapping area, adjacent windows may overlap by 20%. Because the sliding-window detection has a certain overlap rate, overlapping boxes may be generated, so step S4) needs to filter the overlapping target frames in the original SAR image to obtain the target frames serving as the final detection result. Referring to fig. 2, the method adopted in step S4) of this embodiment for filtering the overlapping target frames in the original SAR image is non-maximum suppression (the NMS algorithm); of course, other known filtering methods or algorithms may also be used as needed. During detection, the feature fusion and anti-interference network FFAN generates a plurality of predicted target boxes with different confidences for a single target; in this embodiment, the NMS algorithm removes the redundant candidate boxes and keeps the box that best matches the target, which effectively reduces the number of false alarms.
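Steps S1) to S4) can be summarised in code as below. The 512-pixel window, the 20% overlap and the 0.5 NMS threshold come from this description; detect_window is a placeholder for the FFAN inference call, and the NMS is delegated to torchvision. Handling of windows flush with the right and bottom image borders is omitted for brevity.

import torch
from torchvision.ops import nms

def detect_large_image(image, detect_window, win=512, overlap=0.2, iou_thr=0.5):
    # S1) slide a window with overlap; S2) detect in each window;
    # S3) shift boxes back to full-image coordinates; S4) remove duplicates with NMS.
    h, w = image.shape[-2:]
    stride = int(win * (1 - overlap))
    boxes, scores = [], []
    for y in range(0, max(h - win, 0) + 1, stride):
        for x in range(0, max(w - win, 0) + 1, stride):
            patch = image[..., y:y + win, x:x + win]
            b, s = detect_window(patch)          # (N, 4) boxes and (N,) scores in window coordinates
            if len(b):
                boxes.append(b + torch.tensor([x, y, x, y], dtype=b.dtype))   # S3)
                scores.append(s)
    if not boxes:
        return torch.empty(0, 4), torch.empty(0)
    boxes, scores = torch.cat(boxes), torch.cat(scores)
    keep = nms(boxes, scores, iou_thr)           # S4) non-maximum suppression
    return boxes[keep], scores[keep]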
A verification experiment was performed on the feature fusion and anti-interference network FFAN of the method in this embodiment. A dozen scenes of 1 m resolution airport imagery from the Gaofen-3 system were used in the experiment. The data set contains 4396 aircraft samples of 500 × 500 size and the corresponding label files; the ratio of the training set to the validation set is 4:1. In addition, an independent aircraft detection test was performed in this embodiment using SAR images not included in the data set. All experiments in this embodiment were performed with the PyTorch framework on an Ubuntu 18.04 system. All networks were trained with the same data set. Each model was trained for 100 epochs on two RTX 2080Ti GPUs with an initial learning rate of 0.0005 and a batch size of 16. The input image size is 512 × 512 pixels. For fairness, the four models EfficientDet, YOLOv4, EWFAN and FFAN did not use pre-trained models or automatic data augmentation, and no advanced testing techniques such as Softer-NMS or data augmentation were used during testing. The NMS thresholds of these four models were set to 0.5 during the test phase.
In this embodiment, two evaluation indexes, namely the detection rate (DR) and the false alarm rate (FAR), are used to measure the effect of the network; the specific calculation formulas are:
DR = C / L × 100%, FAR = F / S × 100%, where F denotes the number of false alarms.
the detection rate represents the ratio of the number (C) of the airplane targets correctly detected by the network to the number (L) of the airplane targets in the label, and the false alarm rate is the ratio of the number of false alarms to the number (S) of the prediction frames output by the network.
In order to verify the performance of the proposed algorithm, Airport I (Hongqiao Airport, 12000 × 14400) and Airport II (Capital Airport, 14400 × 16800) from the Gaofen-3 system were independently tested in this embodiment and the aircraft detection performance was evaluated and analysed. The aircraft detection results are shown in Table 1.
Table 1: and comparing the airplane detection results of different network SAR images.
[Table 1 is given as an image in the original publication; the averaged detection results are summarised in the following paragraph.]
Airport I and Airport II are large civil airports. The aircraft targets there are of many types, small in size and densely distributed, and the image background is complex, which increases the detection difficulty. Airport I and Airport II contain 127 and 143 aircraft, respectively. Analysis of the detection results of the two airports shows that EfficientDet has the worst detection effect: its average false alarm rate reaches 40.58% and its average detection rate is also low. The average false alarm rates of YOLOv4 and EWFAN, 31.95% and 18.20% respectively, are lower than that of EfficientDet but still high. FFAN (this embodiment) has the best detection effect, with an average detection rate of 86.26% and an average false alarm rate of 8.42%, which shows that the network extracts features well and improves the ability to resist background interference. In terms of test time, the average test time of FFAN (this embodiment) is 23.87 s, only 0.83 s more than that of EfficientDet, so FFAN greatly improves the overall detection effect while the test speed is essentially unaffected, which gives it great engineering application value.
In summary, target detection is a hot issue in the field of high-resolution synthetic aperture radar (SAR). To address the complex background and diverse characteristics of aircraft targets in SAR images, this embodiment provides an efficient automatic aircraft target detection method. Based on the feature fusion and anti-interference network FFAN, the method extracts and fuses aircraft features more effectively and quickly obtains satisfactory detection results; its detection rate and false alarm rate are superior to existing aircraft detection methods such as EfficientDet, EWFAN and YOLOv4. The method can be used for SAR image target detection based on the feature fusion and anti-interference network FFAN, achieves a satisfactory detection effect within a short test time, can also be used to detect and identify other dense targets in SAR images, and therefore has practical application value.
In addition, the present embodiment also provides an automatic target detection system for SAR images, which includes a microprocessor and a memory connected to each other, where the microprocessor is programmed or configured to execute the steps of the aforementioned automatic target detection method for SAR images. Furthermore, the present embodiment also provides a computer-readable storage medium in which a computer program programmed or configured to execute the foregoing SAR image-oriented target automatic detection method is stored.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (9)

1. A target automatic detection method oriented to SAR images, characterized by comprising inputting an input image into a feature fusion and anti-interference network FFAN which has been trained in advance to obtain target frames for target detection, wherein the feature fusion and anti-interference network FFAN comprises:
a backbone network, which is used for down-sampling the input image into features of different scales, including low-level features P3, middle-level features P4-P6 and high-level features P7;
the pooling and channel shuffling module is used for extracting and fusing the intermediate-level features P4-P6 respectively to obtain corresponding shuffling features P4 '-P6';
the depth fusion module is used for performing depth fusion on the low-level features P3, the shuffle features P4 'to P6' and the high-level features P7 to respectively obtain the corresponding low-level features P3', the shuffle features P4' to P6 'and the high-level features P7' after depth fusion;
the spatial attention mechanism RSAM is respectively used for further spatial feature extraction and fusion of the deeply fused shuffle feature P6 'and the high-level feature P7';
the spatial feature fusion module ASFF is used for further spatial feature extraction and fusion of the deeply fused low-level features P3' and the shuffled features P4' to P5' respectively;
the classification regression network is used for performing classification regression on all spatial features output by the spatial attention mechanism RSAM and the spatial feature fusion module ASFF through extraction and fusion to generate a target frame for target detection;
the pooling and channel shuffling module comprises three shuffling modules, namely a shuffling module PSM-l, a shuffling module PSM-m and a shuffling module PSM-s, wherein the shuffling module PSM-l is used for extracting and fusing the middle-level feature P4 to obtain the corresponding shuffled feature P4', the shuffling module PSM-m is used for extracting and fusing the middle-level feature P5 to obtain the corresponding shuffled feature P5', and the shuffling module PSM-s is used for extracting and fusing the middle-level feature P6 to obtain the corresponding shuffled feature P6', and each of the shuffling module PSM-l, the shuffling module PSM-m and the shuffling module PSM-s comprises:
the channel separation module is used for dividing the channel of the input characteristics into two parts;
the maximum pooling operation module is used for performing maximum pooling operation on a part of channels output by the channel separation module;
the average pooling operation module is used for carrying out average pooling operation on the other part of channels output by the channel separation module;
the splicing module is used for splicing the output of the maximum pooling operation module, the output of the average pooling operation module and the original input characteristics of the input channel separation module;
a shuffling module for shuffling the output of the splicing module to enrich the information of each channel;
the convolution module is used for carrying out convolution operation on the output of the shuffling module to enable the number of output characteristic channels to be the same as that of input characteristics;
the pooling kernel sizes used by the shuffling module PSM-l, the shuffling module PSM-m and the shuffling module PSM-s for the maximum pooling and average pooling operations are {9, 15, 25}, {5, 11} and {3}, respectively.
2. The SAR image-oriented target automatic detection method according to claim 1, wherein the backbone network is an Efficientnet network; the Efficientnet network successively down-samples the 512 × 512 input image to different sizes to obtain corresponding features P1 to P7, and the features obtained by the 3rd to 7th down-sampling are used as the low-level feature P3, the middle-level features P4 to P6 and the high-level feature P7, respectively.
3. The SAR image-oriented target automatic detection method according to claim 1, characterized in that the depth fusion module is an improved structure of BiFPN, and the improved structure adds four jump connections on the basis of BiFPN, wherein the first jump connects the middle-level feature P6 and the shuffle feature P5', the second jump connects the middle-level feature P5 and the shuffle feature P4', the third jump connects the middle-level feature P5 and the middle-level feature P7, and the fourth jump connects the middle-level feature P4 and the middle-level feature P6.
4. The SAR image-oriented target automatic detection method according to claim 1, characterized in that the classification regression network comprises a classification network and a regression network; the classification network uses three 64-channel convolutions followed by one n_b × n_c convolution to predict the category of each target frame, and the regression network uses three 64-channel convolutions followed by one n_b × 4 convolution to predict the regression of each prior frame, where n_b is the number of prior frames owned by the feature layer and n_c is the number of target detection categories of the network.
5. The SAR image-oriented target automatic detection method according to claim 1, wherein a step of training the feature fusion and anti-interference network FFAN in advance is further included before the input image is input into the trained feature fusion and anti-interference network FFAN to obtain the target frames for target detection, and the loss function L_AIF adopted when training the feature fusion and anti-interference network FFAN is:
L_AIF = L_FL + L_AIoU,
where L_FL is the Focal Loss classification loss and L_AIoU is the AIoU regression loss, with:
L_AIoU = L_CIoU + p·w,
[the expressions defining p and w are given only as an image in the original claim and are not reproduced here,]
where L_CIoU is the CIoU regression loss, a and a_t respectively denote the areas of the prediction box and the real box, p is a weight function of the areas, w is the similarity of the areas, and IoU is the intersection-over-union.
6. An automatic target detection method for SAR images is characterized by comprising the following steps:
S1) generating a window image from an input SAR image in a sliding window mode with an overlapping area;
S2) taking each window image as an input image, and carrying out target detection by adopting the steps of the SAR image-oriented target automatic detection method in any one of claims 1-5 to obtain a target frame;
S3) converting the target frame coordinates of each window image into the original SAR image;
and S4) filtering the target frames overlapped in the original SAR image to obtain the target frame serving as the final target detection result.
7. The SAR image-oriented target automatic detection method as claimed in claim 6, characterized in that the method adopted in step S4) for filtering the target frames overlapped in the original SAR image is a non-maximum suppression method.
8. An automatic target detection system for SAR images, comprising a microprocessor and a memory connected to each other, characterized in that the microprocessor is programmed or configured to execute the steps of the automatic target detection method for SAR images according to any one of claims 1 to 7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program programmed or configured to execute the SAR image-oriented object automatic detection method according to any one of claims 1 to 7.
CN202110712235.3A 2021-06-25 2021-06-25 SAR image-oriented target automatic detection method and system Expired - Fee Related CN113378763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110712235.3A CN113378763B (en) 2021-06-25 2021-06-25 SAR image-oriented target automatic detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110712235.3A CN113378763B (en) 2021-06-25 2021-06-25 SAR image-oriented target automatic detection method and system

Publications (2)

Publication Number Publication Date
CN113378763A CN113378763A (en) 2021-09-10
CN113378763B true CN113378763B (en) 2022-11-22

Family

ID=77579205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110712235.3A Expired - Fee Related CN113378763B (en) 2021-06-25 2021-06-25 SAR image-oriented target automatic detection method and system

Country Status (1)

Country Link
CN (1) CN113378763B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114022752B (en) * 2021-11-04 2024-03-15 中国人民解放军国防科技大学 SAR target detection method based on attention feature refinement and alignment
CN114067126A (en) * 2021-11-19 2022-02-18 长春理工大学 Infrared image target detection method
CN114399504A (en) * 2022-03-25 2022-04-26 济南大学 Intelligent rim detection and tracking method and system based on industrial Internet
CN115409817B (en) * 2022-08-30 2024-01-26 中南大学 YOLO-based light-weight and efficient detection network
CN115481694B (en) * 2022-09-26 2023-09-05 南京星环智能科技有限公司 Data enhancement method, device and equipment for training sample set and storage medium
CN116958825B (en) * 2023-08-28 2024-03-22 中国公路工程咨询集团有限公司 Mobile remote sensing image acquisition method and highway maintenance monitoring method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11216988B2 (en) * 2017-10-24 2022-01-04 L'oreal System and method for image processing using deep neural networks
CN112149591B (en) * 2020-09-28 2022-09-09 长沙理工大学 SSD-AEFF automatic bridge detection method and system for SAR image
CN112528896B (en) * 2020-12-17 2024-05-31 长沙理工大学 SAR image-oriented automatic aircraft target detection method and system

Also Published As

Publication number Publication date
CN113378763A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN113378763B (en) SAR image-oriented target automatic detection method and system
Zhang et al. Concrete crack detection using context‐aware deep semantic segmentation network
Song et al. Automatic pavement crack detection and classification using multiscale feature attention network
Liu et al. Deep learning–based enhancement of motion blurred UAV concrete crack images
Deng et al. Vision based pixel-level bridge structural damage detection using a link ASPP network
Li et al. Cross-layer attention network for small object detection in remote sensing imagery
Li et al. Sewer pipe defect detection via deep learning with local and global feature fusion
CN104301630B (en) A kind of video image joining method and device
Li et al. Automatic bridge crack identification from concrete surface using ResNeXt with postprocessing
CN109492596B (en) Pedestrian detection method and system based on K-means clustering and regional recommendation network
CN114519819B (en) Remote sensing image target detection method based on global context awareness
Dai et al. Real-time safety helmet detection system based on improved SSD
Wang et al. Multi-size object detection in large scene remote sensing images under dual attention mechanism
CN116861262B (en) Perception model training method and device, electronic equipment and storage medium
Xiao et al. Region of interest (ROI) extraction and crack detection for UAV-based bridge inspection using point cloud segmentation and 3D-to-2D projection
Qu et al. A time sequence location method of long video violence based on improved C3D network
Li et al. CrackCLF: Automatic pavement crack detection based on closed-loop feedback
CN114494893B (en) Remote sensing image feature extraction method based on semantic reuse context feature pyramid
Zhang et al. TB-Net: a three-stream boundary-aware network for fine-grained pavement disease segmentation
Marine et al. Pothole Detection on Urban Roads Using YOLOv8
Wang et al. A deep learning-based method for vehicle licenseplate recognition in natural scene
Liu et al. Efficient dense attention fusion network with channel correlation loss for road damage detection
Wang Automatic detection of concrete cracks from images using Adam-SqueezeNet deep learning model
Yazid et al. Automated system form concrete damage classification identification using pretrained deep learning model
Nazarkevych et al. Method of Dataset Filling and Recognition of Moving Objects in Video Sequences based on YOLO.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20221122