CN112101153A - Remote sensing target detection method based on receptive field module and multiple characteristic pyramid - Google Patents

Remote sensing target detection method based on receptive field module and multiple characteristic pyramid

Info

Publication number
CN112101153A
Authority
CN
China
Prior art keywords
feature
layer
convolution
receptive field
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010906088.9A
Other languages
Chinese (zh)
Other versions
CN112101153B (en)
Inventor
赵丹培
刘子铭
姜志国
史振威
张浩鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202010906088.9A priority Critical patent/CN112101153B/en
Publication of CN112101153A publication Critical patent/CN112101153A/en
Application granted granted Critical
Publication of CN112101153B publication Critical patent/CN112101153B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing target detection method based on a receptive field module and a multi-feature pyramid, comprising the following steps: extracting features from a visible-light remote sensing image with a VGG network to obtain feature maps of different sizes; cascade-fusing the feature maps of different sizes with a fusion module to obtain a multi-scale cascade feature map; convolving the multi-scale cascade feature map sequentially with a strided-convolution feature pyramid to obtain optimized feature maps; feeding the first optimized feature layer into a receptive field module and extracting information to obtain a receptive field layer; feeding the receptive field layer together with the second, third, fourth and fifth optimized feature layers into an anchor optimization module for binary anchor classification; and feeding the optimized anchors into a target detection module for target classification to obtain the multi-scale targets. The invention can detect targets of different sizes simultaneously and improves detection accuracy and speed.

Description

Remote sensing target detection method based on receptive field module and multi-feature pyramid
Technical Field
The invention relates to the technical field of digital image processing, and in particular to a remote sensing target detection method based on a receptive field module and a multi-feature pyramid.
Background
Target detection in optical remote sensing images faces many challenges, such as the large number of object instances per image, the large image extent, and complex background textures; at the same time, optical remote sensing imagery has high but still under-exploited value in both civil and military applications. In recent years, as the resolution of satellite optical imagery has increased, finer objects can be identified in remote sensing images. Although high-resolution remote sensing images provide more detailed information about targets, the image background also becomes more complex, which makes target detection more difficult. Detecting targets of different kinds at multiple scales becomes especially hard; moreover, remote sensing targets are mostly scattered, with no fixed positions or spacing, which has become a major problem in the field of target detection.
Patent CN111126359A proposes a small-target detection method for high-definition images based on an autoencoder and the YOLO algorithm, mainly addressing the fact that prior methods cannot achieve both accuracy and speed for small-target detection in high-definition images. Although the detection speed and accuracy of the network are improved, its training must be carried out in two separate stages, which is cumbersome, and it is not suited to detecting targets of mixed sizes.
Patent CN110378242A proposes a remote sensing target detection method with a dual attention mechanism. Because it is a two-stage detection algorithm, its detection speed cannot be greatly improved, and its structure is not designed for targets at multiple scales.
Therefore, how to provide an efficient multi-scale target detection method is a problem that needs to be solved urgently by those skilled in the art.
Disclosure of Invention
In view of this, the invention provides a remote sensing target detection method based on a receptive field module and a multi-feature pyramid, which can detect targets of different sizes simultaneously and improves detection accuracy and speed.
In order to achieve the purpose, the invention adopts the following technical scheme:
the remote sensing target detection method based on the receptive field module and the multi-feature pyramid comprises the following steps:
step 1: extracting features from the input visible-light remote sensing image with the multi-feature pyramid, which specifically comprises:
step 11: extracting features from the visible-light remote sensing image with a VGG network to obtain feature maps of different sizes;
step 12: cascade-fusing the feature maps of different sizes with a fusion module to obtain a multi-scale cascade feature map;
step 13: sequentially convolving the multi-scale cascade feature map with a strided-convolution feature pyramid to obtain optimized feature maps;
step 2: performing target identification and classification on the optimized feature maps through a detection network to obtain multi-scale targets, which specifically comprises:
step 21: feeding the first optimized feature layer into a receptive field module and extracting information to obtain a receptive field layer;
step 22: feeding the receptive field layer together with the second, third, fourth and fifth optimized feature layers into an anchor optimization module for binary anchor classification;
step 23: feeding the optimized anchors into a target detection module and classifying the targets to obtain the multi-scale targets.
Further, the feature maps output by step 11 include Conv3_3, Conv4_3, Conv5_3 and Conv7.
Further, step 12 specifically comprises:
upsampling Conv4_3, Conv5_3 and Conv7 to the size of Conv3_3, i.e. a spatial size of 80 × 80;
cascade-fusing Conv3_3 with the upsampled Conv4_3, Conv5_3 and Conv7 to obtain the 1024-channel multi-scale cascade feature map.
Further, step 13 specifically comprises: sequentially applying to the multi-scale cascade feature map one 3 × 3 × 512 convolution with stride 1 and four with stride 2 to obtain the optimized feature maps.
Further, step 21 specifically comprises:
the first branch passes through a 1 × 1 convolution and then a 3 × 3 convolution with dilation rate 1;
the second branch passes through a 1 × 1 convolution, a 1 × 3 convolution and then a 3 × 3 convolution with dilation rate 3;
the third branch passes through a 1 × 1 convolution, a 3 × 1 convolution and then a 3 × 3 convolution with dilation rate 3;
the fourth branch passes through a 1 × 1 convolution, a 3 × 3 convolution and then a 3 × 3 convolution with dilation rate 5;
the fifth branch is a shortcut connection;
the outputs of the first four branches are concatenated, passed through a 1 × 1 convolution, and summed with the shortcut output to obtain the receptive field layer.
Further, the loss function of the detection network is:

$$\mathcal{L}=\frac{1}{N_{a}}\Big(\sum_{i}L_{b}\big(p_{i},[l_{i}^{*}\ge 1]\big)+\sum_{i}[l_{i}^{*}\ge 1]\,L_{r}\big(x_{i},g_{i}^{*}\big)\Big)+\frac{1}{N_{o}}\Big(\sum_{i}L_{m}\big(c_{i},l_{i}^{*}\big)+\sum_{i}[l_{i}^{*}\ge 1]\,L_{r}\big(t_{i},g_{i}^{*}\big)\Big)$$

where $l_{i}^{*}$ is the ground-truth class label of anchor $i$ and $g_{i}^{*}$ is the position and size of the ground-truth box of anchor $i$; $p_{i}$ and $x_{i}$ are, respectively, the predicted confidence that anchor $i$ is an object and the refined coordinates of anchor $i$ in the anchor optimization module; $c_{i}$ and $t_{i}$ are the predicted object class and the final bounding-box coordinates in the target detection module; $N_{a}$ and $N_{o}$ are the numbers of anchors detected as positive in the anchor optimization module and the target detection module, respectively; $L_{r}$ is the regression loss, $L_{b}$ the binary classification loss and $L_{m}$ the multi-class classification loss; the indicator $[l_{i}^{*}\ge 1]$ outputs 1 when the target is true and 0 when it is false.
Compared with the prior art, the invention discloses a remote sensing target detection method based on a receptive field module and a multi-feature pyramid, i.e. a new network model built on multi-scale feature fusion and a receptive-field mechanism. Mining the receptive-field information on the feature map yields richer information, and multi-scale detection outputs are produced from the feature maps by the anchor optimization module and the target detection module, which improves detection speed and efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in their description are briefly introduced below. It is obvious that the drawings described below are only embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a remote sensing target detection method based on a receptive field module and a multi-feature pyramid according to the present invention.
Fig. 2 is a schematic diagram of the multi-feature pyramid structure.
Fig. 3 is a schematic diagram of the receptive field module.
Fig. 4 shows experimental detection results of the algorithm of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a remote sensing target detection method based on a receptive field module and a multi-feature pyramid, which comprises the following steps:
Step 1: extract features from the input visible-light remote sensing image with a VGG network; the specific implementation is as follows:
The visible-light remote sensing image is fed into the VGG feature extraction network, where successive local convolution and pooling operations produce deep features of the input image; a feature map is obtained after each convolution or pooling. Four layers of the VGG backbone, Conv3_3, Conv4_3, Conv5_3 and Conv7, are selected as the output feature layers; each pyramid feature map contains different semantic and spatial information, which completes the bottom-up process. The sizes of Conv3_3, Conv4_3, Conv5_3 and Conv7 are 80 × 80, 40 × 40, 20 × 20 and 10 × 10, respectively.
To avoid the slow convergence caused by training the network from scratch and to speed up training, the feature extraction network is pre-trained on the ILSVRC CLS-LOC dataset.
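For concreteness, the bottom-up (pooling) feature extraction can be sketched in PyTorch roughly as follows. This is a minimal sketch under stated assumptions, not the exact network of the patent: the torchvision layer indices, the construction of Conv7 as an extra pooled 3 × 3 convolution, and the 320 × 320 input size (chosen so that the 80/40/20/10 feature-map sizes quoted above come out) are all assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class VGGBackbone(nn.Module):
    """Taps the VGG-16 feature maps after conv3_3, conv4_3 and conv5_3 and adds
    an extra stride-2 block standing in for Conv7.  In practice the VGG weights
    would be the ILSVRC CLS-LOC pre-trained ones mentioned above."""
    def __init__(self):
        super().__init__()
        feats = vgg16().features            # plain VGG-16; load pre-trained weights as needed
        self.stage3 = feats[:16]            # ... relu(conv3_3)  -> 80 x 80, 256 channels
        self.stage4 = feats[16:23]          # ... relu(conv4_3)  -> 40 x 40, 512 channels
        self.stage5 = feats[23:30]          # ... relu(conv5_3)  -> 20 x 20, 512 channels
        self.conv7 = nn.Sequential(         # hypothetical extra block -> 10 x 10, 1024 channels
            nn.MaxPool2d(2),
            nn.Conv2d(512, 1024, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        c3 = self.stage3(x)
        c4 = self.stage4(c3)
        c5 = self.stage5(c4)
        return c3, c4, c5, self.conv7(c5)

# a 320 x 320 input reproduces the 80/40/20/10 feature-map sizes quoted above
maps = VGGBackbone()(torch.randn(1, 3, 320, 320))
print([m.shape[-1] for m in maps])          # [80, 40, 20, 10]
```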
Step 2: the fusion module combines the four feature maps extracted from the VGG backbone network; the specific implementation is as follows:
The feature maps are cascade-fused by the fusion module: the base feature-map size is set to that of Conv3_3, and the three layers Conv4_3, Conv5_3 and Conv7 are upsampled to 80 × 80 so that the spatial dimensions of all feature maps are consistent. The original channel counts are preserved and the maps are concatenated into one multi-scale cascade feature map containing all semantic features from low level to high level. This completes the fusion of the multi-layer features at the bottom layer.
This differs from the traditional feature pyramid network, which fuses the layers by directly adding the features of each layer one by one, a time-consuming and laborious scheme.
In addition, based on statistics of the dataset, the present invention selects Conv3_3 instead of the Conv4_3 used by common networks. Remote sensing targets are generally small, and for cars and ships in particular some targets are smaller than 20 × 20 pixels; with Conv4_3, the receptive field of each element in the feature map is large, so semantic information is prominent but detail features are weak.
Step 3: second-stage feature extraction is performed on the multi-scale cascade feature map with the strided-convolution feature pyramid; the specific implementation is as follows:
The multi-scale cascade feature map is convolved sequentially with one 3 × 3 × 512 convolution of stride 1 and four of stride 2, completing the second bottom-up feature extraction.
In this second bottom-up stage a new pyramid network is reconstructed. As shown in Fig. 2, applying batch normalization to the input of every convolutional layer speeds up model training and reduces uncertainty-induced variation. Moreover, because the pyramid levels are produced by convolution, with the downscaling of the feature map determined by the convolution stride, the detection precision of small targets is improved, regional information is enriched, and the features become more diverse.
The multi-feature pyramid structure thus balances the respective strengths and weaknesses of pooling-based and convolution-based feature extraction. The feature pyramid consists of three parts: bottom-up feature extraction, fusion of the multi-layer features at the bottom layer, and a second bottom-up feature extraction.
Step 4: receptive field information is extracted from the combined feature map; the specific implementation is as follows:
The first feature layer of the optimized feature maps is fed into the receptive field module and processed by five branches in total:
the first branch passes through a 1 × 1 convolution and then a 3 × 3 convolution with dilation rate 1;
the second branch passes through a 1 × 1 convolution, a 1 × 3 convolution and then a 3 × 3 convolution with dilation rate 3;
the third branch passes through a 1 × 1 convolution, a 3 × 1 convolution and then a 3 × 3 convolution with dilation rate 3;
the fourth branch passes through a 1 × 1 convolution, a 3 × 3 convolution and then a 3 × 3 convolution with dilation rate 5;
the fifth branch is a shortcut connection;
the outputs of the first four branches are concatenated, passed through a 1 × 1 convolution, and summed with the shortcut output to obtain the receptive field layer.
The invention places the receptive field module on the cascade-fused feature map so that each element of the feature map and the features of its surrounding receptive field can be extracted, building the target together with the context around it and thereby improving the detection precision of small targets.
Step 5: binary anchor classification is performed with the anchor optimization module; the specific implementation is as follows:
In the anchor optimization stage, the anchors and bounding boxes are refined once: bounding boxes and class predictions are generated from the receptive field layer and the last four layers of the strided-convolution feature pyramid feed-forward network. The receptive field layer and these four layers have sizes {80 × 80, 40 × 40, 20 × 20, 10 × 10, 5 × 5}.
An anchor is assigned to every feature point of every feature layer, and a bounding box is placed around each anchor. The features covered by each bounding box are compared with the ground-truth boxes; whether a box contains a target is judged from its intersection-over-union (IoU) with the target, and each box is assigned a label according to this IoU value. A bounding box assigned the value 0 contains no target and is an easy negative.
This stage removes the easily detected poor boxes while keeping the boxes that contain targets and the boxes that are hard to judge. It prevents the ratio of negative to positive samples from becoming so large that it harms classification accuracy, and at the same time lets the network learn more readily from hard samples, increasing its capacity to handle them. It is therefore a two-branch path: one branch regresses the bounding boxes to learn and correct the offsets between the boxes and the target boxes, and the other performs binary classification to separate correct boxes from wrong ones. Multi-layer features are used here so that targets are obtained at multiple scales.
The anchor optimization module thus identifies and discards wrong anchor boxes, reducing the amount of computation required by the subsequent classifier.
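A rough sketch of this filtering step is given below, assuming one binary score pair and one box-offset vector per anchor; the 0.99 negative-confidence threshold and the standard (dx, dy, dw, dh) box encoding are assumptions for illustration, not values taken from the patent.

```python
import torch

def filter_refined_anchors(arm_cls_logits, arm_box_deltas, anchors, neg_thresh=0.99):
    """Refine anchors and drop those the binary branch marks as confident background.

    arm_cls_logits: (N, 2) binary scores per anchor (background, object)
    arm_box_deltas: (N, 4) predicted offsets (dx, dy, dw, dh) per anchor
    anchors:        (N, 4) anchor boxes as (cx, cy, w, h)
    """
    probs = torch.softmax(arm_cls_logits, dim=1)
    keep = probs[:, 0] < neg_thresh               # drop anchors that are confidently background

    # decode the refined anchors from the regression branch
    dx, dy, dw, dh = arm_box_deltas.unbind(dim=1)
    cx = anchors[:, 0] + dx * anchors[:, 2]
    cy = anchors[:, 1] + dy * anchors[:, 3]
    w = anchors[:, 2] * torch.exp(dw)
    h = anchors[:, 3] * torch.exp(dh)
    refined = torch.stack([cx, cy, w, h], dim=1)
    return refined, keep                           # only anchors with keep=True reach the detection module

# usage with random tensors for 100 anchors
refined, keep = filter_refined_anchors(torch.randn(100, 2), torch.randn(100, 4) * 0.1,
                                        torch.rand(100, 4) * 80)
print(refined.shape, int(keep.sum()))
```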
Step 6: the anchors are classified into target categories by the target detection module; the specific implementation is as follows:
After the anchor optimization module removes the bounding boxes that contain no target from the last four layers of the strided-convolution feature pyramid and from the receptive field layer, the layers are fed in turn through connecting blocks into the target detection module for target classification. Specifically:
The topmost feature map of the strided-convolution feature pyramid, after removal of the boxes containing no target, is sent into a connecting block. The connecting block has two outputs: the first is the feature map, containing the bounding boxes, required by the target detection module; the second is the input feature map after upsampling, which is additively fused with the feature map fed into the connecting block of the previous (shallower) layer. The purpose is to add higher-level features to the previous layer's feature map so that it inherits larger-scale context information.
The penultimate feature map of the strided-convolution feature pyramid, after removal of the boxes containing no target, is fed into its connecting block and additively fused with the upsampled map from the layer above; the fused map again has two outputs, one sent to the target detection module and the other upsampled and additively fused with the map entering the connecting block of the layer below. This is repeated until the receptive field layer, after removal of the boxes containing no target, is sent into its connecting block and directly fused additively with the upsampled map from the layer above, yielding the multi-scale targets.
In this way the relationship between the anchor optimization module and the target detection module is constructed. In the target detection module, a second regression is performed on the coordinates of the refined bounding boxes passed on by the anchor optimization module, and the target category is judged with a softmax multi-class loss function, so that more accurate bounding boxes and category identification results are obtained.
The two detection modules are linked by the conversion connecting blocks, which convert the regression feature maps containing bounding-box information into the forms required by the top-down detection module while providing context features through upsampling; this improves the detection of large targets and further increases detection accuracy.
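A minimal sketch of such a connecting block and of the top-down pass is given below; the 3 × 3 convolution layout, the 256-channel width and the bilinear upsampling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConnectingBlock(nn.Module):
    """Converts one pyramid level into the form used by the target detection
    module and, when a higher-level fused map is available, upsamples it and
    adds it in so that larger-scale context is inherited."""
    def __init__(self, in_channels=512, out_channels=256):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.smooth = nn.Conv2d(out_channels, out_channels, 3, padding=1)

    def forward(self, level_feature, higher_fused=None):
        out = self.reduce(level_feature)
        if higher_fused is not None:
            out = out + F.interpolate(higher_fused, size=out.shape[-2:],
                                      mode='bilinear', align_corners=False)
        return self.smooth(out)   # sent to the target detection module and to the next lower level

# top-down pass over pyramid levels ordered from shallow (80x80) to deep (5x5)
levels = [torch.randn(1, 512, s, s) for s in (80, 40, 20, 10, 5)]
blocks = nn.ModuleList(ConnectingBlock() for _ in levels)
fused = None
detection_inputs = []
for feat, block in zip(reversed(levels), reversed(list(blocks))):
    fused = block(feat, fused)
    detection_inputs.append(fused)
print([f.shape[-1] for f in detection_inputs])   # [5, 10, 20, 40, 80]
```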
The loss function of the network comprises the loss of the bottom-up feature pyramid structure (the anchor optimization module) and the loss of the top-down structure (the target detection module). For the anchor optimization module, the refined anchors whose negative confidence is smaller than a threshold are passed to the target detection module for further prediction of the object class and more accurate regression coordinates. The loss function of the network is defined as:

$$\mathcal{L} = \mathcal{L}_{anchor} + \mathcal{L}_{det}$$

where the loss function of the anchor optimization module is

$$\mathcal{L}_{anchor} = \frac{1}{N_{a}}\Big(\sum_{i} L_{b}\big(p_{i}, [l_{i}^{*}\ge 1]\big) + \sum_{i} [l_{i}^{*}\ge 1]\, L_{r}\big(x_{i}, g_{i}^{*}\big)\Big)$$

and the loss function of the target detection module is

$$\mathcal{L}_{det} = \frac{1}{N_{o}}\Big(\sum_{i} L_{m}\big(c_{i}, l_{i}^{*}\big) + \sum_{i} [l_{i}^{*}\ge 1]\, L_{r}\big(t_{i}, g_{i}^{*}\big)\Big)$$

Here $i$ indexes all anchors in each batch; $l_{i}^{*}$ is the ground-truth class label of anchor $i$ and $g_{i}^{*}$ the position and size of its ground-truth box; $p_{i}$ and $x_{i}$ are, respectively, the predicted confidence that anchor $i$ is an object and the refined coordinates of anchor $i$ in the anchor optimization module; $c_{i}$ and $t_{i}$ are the predicted object class and the final bounding-box coordinates in the target detection module; $N_{a}$ and $N_{o}$ are the numbers of anchors detected as positive by the anchor optimization module and the target detection module, respectively. $L_{r}$ is the regression loss, the binary classification loss $L_{b}$ is the two-class (target versus non-target) cross-entropy log loss, and the multi-class loss $L_{m}$ is the softmax loss over the multi-class confidences. The indicator $[l_{i}^{*}\ge 1]$ outputs 1 when the target is true and 0 when it is false. If a denominator $N_{a}$ or $N_{o}$ is 0, the corresponding numerator is set to 0.
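The following sketch shows how such a two-part loss could be computed for one batch of matched anchors; using cross-entropy for the classification terms, smooth-L1 for the regression term, and the same positive set for both normalizers are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def detection_loss(arm_cls, arm_loc, odm_cls, odm_loc, gt_labels, gt_boxes):
    """Anchor-optimization loss (binary + regression) plus target-detection loss
    (softmax multi-class + regression).

    arm_cls: (N, 2)   binary object/background logits per anchor
    arm_loc: (N, 4)   refined anchor coordinates x_i
    odm_cls: (N, C)   multi-class logits c_i
    odm_loc: (N, 4)   final bounding-box coordinates t_i
    gt_labels: (N,)   ground-truth class label l_i* per anchor (0 = background), int64
    gt_boxes:  (N, 4) ground-truth box g_i* matched to each anchor
    """
    positive = gt_labels > 0                          # the indicator [l_i* >= 1]
    n_a = n_o = positive.sum().clamp(min=1).float()   # guard against a zero denominator

    # anchor optimization module: binary classification + box regression on positives
    l_b = F.cross_entropy(arm_cls, positive.long(), reduction='sum')
    l_r_arm = F.smooth_l1_loss(arm_loc[positive], gt_boxes[positive], reduction='sum')

    # target detection module: multi-class classification + box regression on positives
    l_m = F.cross_entropy(odm_cls, gt_labels, reduction='sum')
    l_r_odm = F.smooth_l1_loss(odm_loc[positive], gt_boxes[positive], reduction='sum')

    return (l_b + l_r_arm) / n_a + (l_m + l_r_odm) / n_o
```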
In the top-down target detection module, the size and position of the target boxes are further refined and the category of each detection box is predicted. Multi-class target identification and regression are performed on different feature levels, so multi-scale targets can be detected accurately at the same time. Because the bottom-up pyramid feature network has already refined the target boxes once, the top-down stage starts from optimized regression boxes. Moreover, the shallowest feature map of the top-down network also fuses the information contained in the deep feature maps, further strengthening the target features; being able to repeatedly emphasize the features of small targets is very beneficial for small-target detection.
Fig. 4 shows the detection results of the method under several complex conditions. The first is the dense presence of many small targets, such as tanks and ships: even when targets are scattered into the corners of the image they are found, because the anchor boxes are distributed over the whole image, and the smaller targets are captured more accurately because the feature pyramid is built on a larger-scale feature map. The second is the simultaneous presence of large and small targets that occlude each other, such as ships and ports or vehicles and bridges: the method distinguishes and detects targets of different sizes well, because pooling and convolution features are combined so that both large and small targets are taken into account. The third is targets with complex texture features, such as golf courses and airports: thanks to the receptive field module, the information of the target itself and of its surrounding background can be extracted, which greatly improves detection accuracy.
The invention has the following advantages:
1. The multi-feature pyramid comprises a pooling feature pyramid (the VGG network) and a strided-convolution feature pyramid, which respectively highlight the global and local information of the remote sensing image and of the target.
2. The cascade feature layer combines the local and global features of the visible-light remote sensing image, giving better results on dispersed targets and small targets.
3. The model breaks through the precision limitations of traditional one-stage detection networks and achieves better results than two-stage remote sensing target detection networks.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A remote sensing target detection method based on a receptive field module and a multi-feature pyramid, characterized by comprising the following steps:
step 1: extracting features from the input visible-light remote sensing image with the multi-feature pyramid, which specifically comprises:
step 11: extracting features from the visible-light remote sensing image with a VGG network to obtain feature maps of different sizes;
step 12: cascade-fusing the feature maps of different sizes with a fusion module to obtain a multi-scale cascade feature map;
step 13: sequentially convolving the multi-scale cascade feature map with a strided-convolution feature pyramid to obtain optimized feature maps;
step 2: performing target identification and classification on the optimized feature maps through a detection network to obtain multi-scale targets, which specifically comprises:
step 21: feeding the first optimized feature layer into a receptive field module and extracting information to obtain a receptive field layer;
step 22: feeding the receptive field layer together with the second, third, fourth and fifth optimized feature layers into an anchor optimization module for binary anchor classification;
step 23: feeding the optimized anchors into a target detection module and classifying the targets to obtain the multi-scale targets.
2. The remote sensing target detection method based on the receptive field module and the multi-feature pyramid as claimed in claim 1, wherein the feature maps output in step 11 include Conv3_3, Conv4_3, Conv5_3 and Conv7.
3. The remote sensing target detection method based on the receptive field module and the multi-feature pyramid as claimed in claim 2, wherein step 12 specifically comprises:
upsampling Conv4_3, Conv5_3 and Conv7 to the size of Conv3_3, i.e. a spatial size of 80 × 80;
cascade-fusing Conv3_3 with the upsampled Conv4_3, Conv5_3 and Conv7 to obtain the 1024-channel multi-scale cascade feature map.
4. The remote sensing target detection method based on the receptive field module and the multi-feature pyramid as claimed in claim 3, wherein step 13 specifically comprises: sequentially applying to the multi-scale cascade feature map one 3 × 3 × 512 convolution with stride 1 and four with stride 2 to obtain the optimized feature maps.
5. The remote sensing target detection method based on the receptive field module and the multi-feature pyramid as claimed in claim 4, wherein step 21 specifically comprises:
the first branch passes through a 1 × 1 convolution and then a 3 × 3 convolution with dilation rate 1;
the second branch passes through a 1 × 1 convolution, a 1 × 3 convolution and then a 3 × 3 convolution with dilation rate 3;
the third branch passes through a 1 × 1 convolution, a 3 × 1 convolution and then a 3 × 3 convolution with dilation rate 3;
the fourth branch passes through a 1 × 1 convolution, a 3 × 3 convolution and then a 3 × 3 convolution with dilation rate 5;
the fifth branch is a shortcut connection;
the outputs of the first four branches are concatenated, passed through a 1 × 1 convolution, and summed with the shortcut output to obtain the receptive field layer.
6. The remote sensing target detection method based on the receptive field module and the multi-feature pyramid as claimed in claim 5, wherein the loss function of the detection network is:

$$\mathcal{L}=\frac{1}{N_{a}}\Big(\sum_{i}L_{b}\big(p_{i},[l_{i}^{*}\ge 1]\big)+\sum_{i}[l_{i}^{*}\ge 1]\,L_{r}\big(x_{i},g_{i}^{*}\big)\Big)+\frac{1}{N_{o}}\Big(\sum_{i}L_{m}\big(c_{i},l_{i}^{*}\big)+\sum_{i}[l_{i}^{*}\ge 1]\,L_{r}\big(t_{i},g_{i}^{*}\big)\Big)$$

where $l_{i}^{*}$ is the ground-truth class label of anchor $i$ and $g_{i}^{*}$ is the position and size of the ground-truth box of anchor $i$; $p_{i}$ and $x_{i}$ are, respectively, the predicted confidence that anchor $i$ is an object and the refined coordinates of anchor $i$ in the anchor optimization module; $c_{i}$ and $t_{i}$ are the predicted object class and the final bounding-box coordinates in the target detection module; $N_{a}$ and $N_{o}$ are the numbers of anchors detected as positive in the anchor optimization module and the target detection module, respectively; $L_{r}$ is the regression loss, $L_{b}$ the binary classification loss and $L_{m}$ the multi-class classification loss; the indicator $[l_{i}^{*}\ge 1]$ outputs 1 when the target is true and 0 when it is false.
CN202010906088.9A 2020-09-01 2020-09-01 Remote sensing target detection method based on receptive field module and multiple characteristic pyramids Active CN112101153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010906088.9A CN112101153B (en) 2020-09-01 2020-09-01 Remote sensing target detection method based on receptive field module and multiple characteristic pyramids

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010906088.9A CN112101153B (en) 2020-09-01 2020-09-01 Remote sensing target detection method based on receptive field module and multiple characteristic pyramids

Publications (2)

Publication Number Publication Date
CN112101153A true CN112101153A (en) 2020-12-18
CN112101153B CN112101153B (en) 2022-08-26

Family

ID=73757412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010906088.9A Active CN112101153B (en) 2020-09-01 2020-09-01 Remote sensing target detection method based on receptive field module and multiple characteristic pyramids

Country Status (1)

Country Link
CN (1) CN112101153B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580585A (en) * 2020-12-28 2021-03-30 深圳职业技术学院 Excavator target detection method and device based on stacked dense network
CN113111718A (en) * 2021-03-16 2021-07-13 苏州海宸威视智能科技有限公司 Fine-grained weak-feature target emergence detection method based on multi-mode remote sensing image
CN113436148A (en) * 2021-06-02 2021-09-24 范加利 Method and system for detecting critical points of ship-borne airplane wheel contour based on deep learning
CN113837080A (en) * 2021-09-24 2021-12-24 江西理工大学 Small target detection method based on information enhancement and receptive field enhancement
CN115527123A (en) * 2022-10-21 2022-12-27 河北省科学院地理科学研究所 Land cover remote sensing monitoring method based on multi-source feature fusion

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711295A (en) * 2018-12-14 2019-05-03 北京航空航天大学 A kind of remote sensing image offshore Ship Detection
CN110796037A (en) * 2019-10-15 2020-02-14 武汉大学 Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid
CN111079604A (en) * 2019-12-06 2020-04-28 重庆市地理信息和遥感应用中心(重庆市测绘产品质量检验测试中心) Method for quickly detecting tiny target facing large-scale remote sensing image
US20200175352A1 (en) * 2017-03-14 2020-06-04 University Of Manitoba Structure defect detection using machine learning algorithms

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200175352A1 (en) * 2017-03-14 2020-06-04 University Of Manitoba Structure defect detection using machine learning algorithms
CN109711295A (en) * 2018-12-14 2019-05-03 北京航空航天大学 A kind of remote sensing image offshore Ship Detection
CN110796037A (en) * 2019-10-15 2020-02-14 武汉大学 Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid
CN111079604A (en) * 2019-12-06 2020-04-28 重庆市地理信息和遥感应用中心(重庆市测绘产品质量检验测试中心) Method for quickly detecting tiny target facing large-scale remote sensing image

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YUAN YAO et al.: "On-Board Ship Detection in Micro-Nano Satellite Based on Deep Learning and COTS Component", Remote Sensing *
周慧 et al.: "Multi-target ship detection in SAR images using an improved feature pyramid model", Telecommunication Engineering *
张云飞 et al.: "A highly robust real-time single-target ship tracking method based on a Siamese network", Ship Science and Technology *
王言鹏 et al.: "Single shot multibox detector algorithm for inland river ship target detection", Journal of Harbin Engineering University *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580585A (en) * 2020-12-28 2021-03-30 深圳职业技术学院 Excavator target detection method and device based on stacked dense network
CN113111718A (en) * 2021-03-16 2021-07-13 苏州海宸威视智能科技有限公司 Fine-grained weak-feature target emergence detection method based on multi-mode remote sensing image
CN113436148A (en) * 2021-06-02 2021-09-24 范加利 Method and system for detecting critical points of ship-borne airplane wheel contour based on deep learning
CN113436148B (en) * 2021-06-02 2022-07-12 中国人民解放军海军航空大学青岛校区 Method and system for detecting critical points of ship-borne airplane wheel contour based on deep learning
CN113837080A (en) * 2021-09-24 2021-12-24 江西理工大学 Small target detection method based on information enhancement and receptive field enhancement
CN113837080B (en) * 2021-09-24 2023-07-25 江西理工大学 Small target detection method based on information enhancement and receptive field enhancement
CN115527123A (en) * 2022-10-21 2022-12-27 河北省科学院地理科学研究所 Land cover remote sensing monitoring method based on multi-source feature fusion

Also Published As

Publication number Publication date
CN112101153B (en) 2022-08-26

Similar Documents

Publication Publication Date Title
CN112101153B (en) Remote sensing target detection method based on receptive field module and multiple characteristic pyramids
CN108764063B (en) Remote sensing image time-sensitive target identification system and method based on characteristic pyramid
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN108537824B (en) Feature map enhanced network structure optimization method based on alternating deconvolution and convolution
CN112766188B (en) Small target pedestrian detection method based on improved YOLO algorithm
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN112070713A (en) Multi-scale target detection method introducing attention mechanism
CN113076842A (en) Method for improving identification precision of traffic sign in extreme weather and environment
CN111126359A (en) High-definition image small target detection method based on self-encoder and YOLO algorithm
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3
CN113642390A (en) Street view image semantic segmentation method based on local attention network
CN114140683A (en) Aerial image target detection method, equipment and medium
CN113313706B (en) Power equipment defect image detection method based on detection reference point offset analysis
CN113591617B (en) Deep learning-based water surface small target detection and classification method
CN107038442A (en) A kind of car plate detection and global recognition method based on deep learning
CN113468978A (en) Fine-grained vehicle body color classification method, device and equipment based on deep learning
CN112766409A (en) Feature fusion method for remote sensing image target detection
CN115631427A (en) Multi-scene ship detection and segmentation method based on mixed attention
CN115861756A (en) Earth background small target identification method based on cascade combination network
Du et al. Improved detection method for traffic signs in real scenes applied in intelligent and connected vehicles
CN115527096A (en) Small target detection method based on improved YOLOv5
CN115330703A (en) Remote sensing image cloud and cloud shadow detection method based on context information fusion
Guo et al. D3-Net: Integrated multi-task convolutional neural network for water surface deblurring, dehazing and object detection
Saravanarajan et al. Improving semantic segmentation under hazy weather for autonomous vehicles using explainable artificial intelligence and adaptive dehazing approach
Li et al. Road extraction from high spatial resolution remote sensing image based on multi-task key point constraints

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant