CN112270279B - Multi-dimensional-based remote sensing image micro-target detection method

Info

Publication number
CN112270279B
CN112270279B
Authority
CN
China
Prior art keywords
network
feature
resolution
scale
layer
Legal status
Active
Application number
CN202011204146.XA
Other languages
Chinese (zh)
Other versions
CN112270279A (en)
Inventor
李嫄源
张源川
朱智勤
李鹏华
冒睿睿
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202011204146.XA
Publication of CN112270279A
Application granted
Publication of CN112270279B

Classifications

    • G06V 20/13 Satellite images (Scenes; Scene-specific elements; Terrestrial scenes)
    • G06F 18/253 Fusion techniques of extracted features (Pattern recognition; Analysing)
    • G06N 3/045 Combinations of networks (Neural networks; Architecture)
    • G06T 7/40 Analysis of texture (Image analysis)
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06V 2201/07 Target detection


Abstract

The invention relates to a multi-dimensional remote sensing image micro-target detection method, belonging to the field of target detection. A multi-depth image pyramid network performs multi-depth, multi-resolution feature extraction: a shallow stream extracts high-resolution image features and a deep stream extracts low-resolution image features. To relieve the information imbalance among multi-scale and multi-depth features, a multi-scale feature pyramid network is also provided, which propagates the semantic information and feature texture information of tiny targets from high layers to low layers and from deep streams to shallow streams, improving the recognition of tiny targets. Two simple fully connected layers are then added for the classification and regression tasks; the whole model is called the multi-depth, multi-scale, multi-resolution micro-target detection network M3DNet. This multi-depth, multi-scale, multi-resolution method can effectively detect tiny targets in remote sensing images.

Description

Multi-dimensional-based remote sensing image micro-target detection method
Technical Field
The invention belongs to the field of target detection, and relates to a multi-dimensional remote sensing image micro target detection method.
Background
Object detection has long been an important research direction in computer vision and pattern recognition, and tiny object detection is especially challenging because small objects carry little detailed information and may even vanish in a deep network. Many methods exist for detecting and locating targets in images captured by satellites or drones, but detection remains poor on noisy, low-resolution images, especially for small targets; even on high-resolution images, small objects are detected less reliably than large ones. In general, small objects in an image refer to objects smaller than 32×32 pixels, while tiny objects refer to objects smaller than 15×15 pixels.

Feeding a high-resolution remote sensing image into a network can alleviate this problem and improve the recognition of tiny targets to some extent. However, simply increasing the resolution creates new problems: it exacerbates large variations in scale and incurs prohibitive computational cost and memory consumption. Directly feeding a low-resolution remote sensing image into the network is also problematic: information about tiny targets is severely lacking, and their detail is progressively lost during the network's down-sampling. Remote sensing images additionally suffer from complex backgrounds and multiple resolutions, so handling feature information across different scales is itself a challenging problem.

To alleviate the above problems without introducing new ones, the present invention proposes a Multi-Depth, Multi-Scale, Multi-Resolution micro-target Detection Network (M3DNet), which mainly comprises two modules: a multi-depth image pyramid network and a multi-scale feature pyramid network. The multi-depth image pyramid network is mainly composed of backbone networks of different depths: a deep convolutional neural network processes the low-resolution remote sensing image to extract more semantic information about tiny targets, while a shallow convolutional neural network processes the high-resolution remote sensing image, retaining more feature texture information while reducing computation. The multi-scale feature pyramid network is composed of several conventional feature pyramid networks; it aligns and fuses the multi-scale feature maps generated by the multi-depth image pyramid network, reduces the information imbalance among multi-scale and multi-depth features, and fuses the different features extracted by the shallow and deep backbones to improve the recognition of tiny targets while maintaining performance on medium and large objects. Finally, the fused output of the multi-scale feature pyramid passes through a simple classification and regression network to obtain the detection results.
The invention discloses a remote sensing image micro-target detection algorithm combining multiple depths, multiple scales and multiple resolutions. It balances the instability of tiny targets caused by image information at different depths and feature information at different scales, preserves the feature texture information and semantic information of tiny targets well, and detects tiny targets in remote sensing images with good accuracy and robustness.
Disclosure of Invention
In view of the above, the present invention provides a method for detecting a micro-target in a remote sensing image based on multiple dimensions.
In order to achieve the purpose, the invention provides the following technical scheme:
a multi-dimensional based remote sensing image micro-target detection method comprises the following steps:
1) the M3DNet network extracts the feature information of tiny targets using multiple streams: first, N streams of different depths are selected to construct a multi-depth image pyramid network, and remote sensing images of different resolutions are fed into the different streams of the network;
2) the shallow stream takes the high-resolution image as input and focuses on the texture information of tiny targets; the deep stream takes the low-resolution image as input and extracts deep semantic information of tiny targets; images of different resolutions are produced using a scaling factor β;
3) a multi-scale feature pyramid is constructed according to the number of streams, and the features of the corresponding N streams are fused across scales to relieve the imbalance among multi-scale and multi-level features; channel dimensions are matched between feature layers with a [1×1, 1] convolution network, and spatial sizes are matched between features of different sizes with a bilinear interpolation algorithm;
4) the multi-scale feature pyramid network propagates the semantic information and feature texture information of tiny targets from high layers to low layers and from deep streams to shallow streams; the matched and fused multi-scale features pass through a [1×1, 1] convolution to obtain the output F′i of the multi-scale feature pyramid, where the value range of i is determined by the number of selected feature layers;
5) the value of k is computed with formula (5) and used to select the feature layer; classification and regression tasks are completed by two fully connected layers; the whole network is trained end-to-end, with the weights updated by gradient descent until the network converges.
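For concreteness, steps 1) to 5) can be read as the forward pass sketched below; this is a minimal PyTorch-style outline rather than the patented implementation, and the class name, the global pooling before the fully connected heads, and the clamping of k are illustrative assumptions (the stream and fusion modules are supplied externally):

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class M3DNetOutline(nn.Module):
    # Illustrative skeleton of steps 1)-5); components are passed in.
    def __init__(self, streams, fusion, cls_head, reg_head, beta=0.5):
        super().__init__()
        self.streams = nn.ModuleList(streams)  # step 1): N backbones of different depths
        self.fusion = fusion                   # steps 3)-4): multi-scale feature pyramid
        self.cls_head = cls_head               # step 5): fully connected classification head
        self.reg_head = reg_head               # step 5): fully connected regression head
        self.beta = beta                       # step 2): resolution scaling factor

    def forward(self, image):
        # Steps 1)-2): build the image pyramid and feed each level to its stream.
        feats = []
        for i, stream in enumerate(self.streams):
            x = image if i == 0 else F.interpolate(
                image, scale_factor=self.beta ** i,
                mode='bilinear', align_corners=False)
            feats.append(stream(x))            # O_i = C_i(I_i): multi-layer features
        # Steps 3)-4): align and fuse the multi-scale, multi-depth features.
        pyramid = self.fusion(feats)           # O' = {F'_0, ..., F'_{M-1}}
        # Step 5): pick one layer by formula (5), then run the two FC heads.
        M = len(pyramid)
        h, w = image.shape[-2], image.shape[-1]
        k = math.floor(M - 2 + math.log2(math.sqrt(w * h) / 224))
        k = max(0, min(k, M - 1))              # clamp k into {0, ..., M-1}
        pooled = pyramid[k].mean(dim=(2, 3))   # global average pooling (assumption)
        return self.cls_head(pooled), self.reg_head(pooled)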
Optionally, in 1), the specific steps of the multi-depth image pyramid network are as follows:
11) a remote sensing image is given, denoted I, with resolution R;
12) the high-resolution image I0, of resolution R, is fed into a shallow convolutional neural network denoted C0;
13) the medium-resolution image I1, of resolution βR, is fed into a middle-depth convolutional neural network denoted C1;
14) the low-resolution image I2, of resolution β²R, is fed into a deep convolutional neural network denoted C2; where β is a hyper-parameter for adjusting resolution, β ∈ (0, 1), with β = 0.5 by default. The multi-depth image pyramid network thus constructs N streams Ci; this network improves the feature extraction capability for tiny targets, so that the extracted features contain both strong semantic information and strong feature texture information of tiny targets, improving the detection of tiny targets.
Optionally, in 2), for an M3DNet network with N = 3 and ResNet backbones, C0, C1 and C2 are respectively ResNet18, ResNet34 and ResNet50, or combinations such as ResNet18, ResNet34 and ResNet101, or ResNet34, ResNet50 and ResNet101.
Optionally, in 3), let {Ii | i = 0, 1, ..., N−1} denote the input multi-resolution remote sensing images and {Oi | i = 0, 1, ..., N−1} denote the output features of the N corresponding streams of the multi-depth image pyramid network; the output feature Oi of each stream comprises multi-layer features {Fi,j}, where i is the multi-scale index (the index over streams), i ∈ {0, 1, ..., N−1}, and j is the multi-layer index of the backbone network, j ∈ {0, 1, ..., M−1}, indexing the feature layers at different down-sampling rates in the backbone; if the down-sampling rate of a ResNet backbone is 32, there are 5 feature layers at different down-sampling rates, namely those corresponding to rates 2, 4, 8, 16 and 32, and if four of these feature layers are fed into the multi-scale feature pyramid network, then M = 4;
with N = 3 and M = 4, Oi is expressed as the following equation:
Oi = Ci(Ii) = {Fi,0, Fi,1, Fi,2, Fi,3} (1)
where i ∈ {0, 1, 2}.
Optionally, in 4), a [1×1, 1] convolution network performs channel-dimension matching, up-sampling uses a bilinear interpolation algorithm, and feature maps of the same size are combined by element-wise addition; the highest-resolution feature F0,0 retains the strong texture features of tiny targets while combining the strong semantic features of the multi-scale streams, as expressed by formula (2):
F′0,0 = F0,0 + Σ(i,j)≠(0,0) Up^(i+j)(Conv(Fi,j)) (2)
where Fi,j denotes the j-th layer feature of the i-th stream, Up(·) denotes bilinear-interpolation up-sampling at rate 2, with Up^(i+j) applying it i+j times, and Conv(·) denotes the [1×1, 1] convolution operation; the output feature set of the multi-scale feature pyramid network is O′, defined by the formula:
O′={F′0,F′1,F′2,...,F′i,...} (3)
where F′i is defined as follows:
F′i=Conv(F0,i) (4)
where F0,i is the i-th output feature layer of O0, i.e., the output feature layer of the highest-resolution stream.
Optionally, in step 5), the final output O′ of the multi-scale feature pyramid network comprises M feature layers of different resolutions, and the most suitable feature layer is selected through formula (5):
k = ⌊k0 + log2(√(wh)/224)⌋ (5)
where w and h respectively denote the width and height of the remote sensing image I; k0 is a hyper-parameter with default value M−2; and k is the index of the selected feature layer, k ∈ {0, 1, ..., M−1};
the selected feature layer then passes through a simple classification and regression network, i.e., one fully connected layer for the classification task and one for the regression task; the whole network is trained end-to-end, with the network weights updated continuously until the model converges.
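As a hedged sketch of this step: formula (5) mirrors the standard FPN level-assignment rule (the constant 224, the ImageNet pretraining size, is taken from that convention and is an assumption here), and the two fully connected layers can be as simple as the following, with head sizes chosen purely for illustration:

import math
import torch.nn as nn

def select_layer(w, h, M, k0=None):
    # Formula (5): choose the feature layer index from the image size.
    k0 = M - 2 if k0 is None else k0                  # default value from the text
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224))
    return max(0, min(k, M - 1))                      # k in {0, 1, ..., M-1}

num_classes, channels = 20, 256                        # illustrative sizes
cls_head = nn.Linear(channels, num_classes)            # classification task
reg_head = nn.Linear(channels, 4)                      # regression task (box offsets)

print(select_layer(w=1024, h=1024, M=4))               # -> 3 for a 1024x1024 image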
The invention has the following beneficial effects: it provides a method combining multiple depths, multiple scales and multiple resolutions, in which a shallow convolutional neural network extracts high-resolution image features and a deep convolutional neural network extracts low-resolution image features; this reduces computation and memory consumption while improving the extraction of tiny targets in remote sensing images without increasing the parameter count. Meanwhile, a multi-scale feature pyramid structure fuses the features, so that feature information propagates from high layers to low layers (multi-scale) and from deep streams to shallow streams (multi-depth and multi-resolution), giving tiny targets strong feature texture information while retaining strong semantic information and improving their detection results.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a diagram of a multi-depth image pyramid network structure;
FIG. 2 is a diagram of a multi-scale feature pyramid network structure;
fig. 3 is an overall structure diagram of a network model.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for illustrative purposes only and are not intended to limit the invention; to better illustrate the embodiments, some parts of the drawings may be omitted, enlarged or reduced and do not represent the size of an actual product; and certain well-known structures and their descriptions may be omitted from the drawings, as will be understood by those skilled in the art.
The same or similar reference numerals in the drawings of the embodiments correspond to the same or similar components. In the description of the invention, terms indicating orientation or position such as "upper", "lower", "left", "right", "front" and "rear" are based on the orientations shown in the drawings; they are used only for convenience and simplicity of description, do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and are therefore not to be construed as limiting the invention; the specific meaning of these terms can be understood by those skilled in the art according to the circumstances.
Referring to fig. 1 to 3, the present invention includes the following steps:
1) the multi-depth image pyramid network is mainly composed of N backbone networks of different depths and processes an image pyramid, i.e., a multi-resolution remote sensing image; each backbone network is called a "stream". An M3DNet network can use multiple streams (backbone networks) to build the multi-depth image pyramid network. To better explain the feasibility of the method of the invention, an M3DNet network using 3 streams is taken as an example. The specific steps of the multi-depth image pyramid network are as follows:
a) a remote sensing image is given, denoted I, with resolution R;
b) the high-resolution image I0 (resolution R) is fed into a shallow convolutional neural network (denoted C0);
c) the medium-resolution image I1 (resolution βR) is fed into a middle-depth convolutional neural network (denoted C1);
d) the low-resolution image I2 (resolution β²R) is fed into a deep convolutional neural network (denoted C2), where β is a hyper-parameter used to adjust the resolution, β ∈ (0, 1), and β = 0.5 by default. Overall, the multi-depth image pyramid network constructs N streams Ci, i = 0, 1, ..., N−1. This network greatly improves the feature extraction capability for tiny targets, so that the extracted features contain not only strong semantic information of tiny targets but also strong feature texture information, improving the detection of tiny targets.
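A minimal sketch of steps a) to d), assuming bilinear resizing and the default β = 0.5 (the function and variable names are illustrative):

import torch
import torch.nn.functional as F

def build_image_pyramid(image, beta=0.5, n_streams=3):
    # Returns [I_0, I_1, I_2] at resolutions R, beta*R and beta^2*R.
    assert 0.0 < beta < 1.0, "beta is a resolution hyper-parameter in (0, 1)"
    pyramid = [image]  # I_0 at full resolution R, for the shallow stream C_0
    for i in range(1, n_streams):
        pyramid.append(F.interpolate(image, scale_factor=beta ** i,
                                     mode='bilinear', align_corners=False))
    return pyramid

# Example: a 1024x1024 remote sensing image yields 1024/512/256 inputs.
levels = build_image_pyramid(torch.randn(1, 3, 1024, 1024))
print([tuple(t.shape[-2:]) for t in levels])  # [(1024, 1024), (512, 512), (256, 256)]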
2) With the development of deep learning, many mature and efficient backbone networks are available, such as DarkNet53 used by the one-stage YOLO series and ResNet and ResNeXt used by Faster R-CNN and Mask R-CNN. Taking the ResNet backbone as an example, the family includes networks of different depths such as ResNet18, ResNet34, ResNet50 and ResNet101, all of which fit the micro-target detection algorithm of the invention well. As described above, for an M3DNet network with N = 3 and ResNet backbones, C0, C1 and C2 can be ResNet18, ResNet34 and ResNet50, or combinations such as ResNet18, ResNet34 and ResNet101, or ResNet34, ResNet50 and ResNet101; backbone networks such as ResNeXt, DenseNet and DarkNet53 can be used in the same way.
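Assuming torchvision's stock ResNets stand in for the streams (the patent does not prescribe a library, and both create_feature_extractor and the choice of layer1 to layer4 as the four retained feature layers are assumptions), the N = 3 combination C0 = ResNet18, C1 = ResNet34, C2 = ResNet50 can be assembled as follows:

import torch
from torchvision.models import resnet18, resnet34, resnet50
from torchvision.models.feature_extraction import create_feature_extractor

# Four feature layers per stream at down-sampling rates 4, 8, 16 and 32
# (torchvision's layer1..layer4), i.e. M = 4 as in the text.
RETURN_NODES = {'layer1': 'F0', 'layer2': 'F1', 'layer3': 'F2', 'layer4': 'F3'}

def make_streams():
    # C_0 shallow, C_1 middle, C_2 deep: one valid depth combination.
    backbones = [resnet18(weights=None), resnet34(weights=None), resnet50(weights=None)]
    return [create_feature_extractor(b, return_nodes=RETURN_NODES) for b in backbones]

streams = make_streams()
out = streams[0](torch.randn(1, 3, 1024, 1024))
print({name: tuple(t.shape) for name, t in out.items()})
# F0..F3 at strides 4, 8, 16, 32 -> spatial sizes 256, 128, 64, 32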
3) Let {Ii | i = 0, 1, ..., N−1} denote the input multi-resolution remote sensing images and {Oi | i = 0, 1, ..., N−1} denote the output features of the N corresponding streams of the multi-depth image pyramid network. The output feature Oi of each stream comprises multi-layer features {Fi,j}, where i is the multi-scale index (the index over streams), i ∈ {0, 1, ..., N−1}, and j is the multi-layer index of the backbone network, j ∈ {0, 1, ..., M−1}, indexing the feature layers at different down-sampling rates. If the down-sampling rate of the ResNet backbone is 32, there are 5 feature layers at different down-sampling rates (corresponding to rates 2, 4, 8, 16 and 32); if four of these feature layers are fed into the multi-scale feature pyramid network, then M = 4, and other cases are analogous. Taking N = 3 and M = 4 as an example, Oi can be expressed as the following equation:
Oi = Ci(Ii) = {Fi,0, Fi,1, Fi,2, Fi,3} (1)
where i ∈ {0, 1, 2}.
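Reusing the assumed build_image_pyramid and make_streams helpers from the earlier sketches, computing Oi = Ci(Ii) as in formula (1) also exposes the spatial alignment the later fusion relies on: with β = 0.5, Fi,j has the same spatial size as F0,i+j:

import torch

streams = make_streams()                                     # C_0, C_1, C_2
images = build_image_pyramid(torch.randn(1, 3, 1024, 1024))  # I_0, I_1, I_2
O = [stream(img) for stream, img in zip(streams, images)]    # O_i = C_i(I_i)

for i, feats in enumerate(O):
    print('stream', i, {name: tuple(t.shape[-2:]) for name, t in feats.items()})
# stream 0: F0 (256, 256)  F1 (128, 128)  F2 (64, 64)  F3 (32, 32)
# stream 1: F0 (128, 128)  F1 (64, 64)    F2 (32, 32)  F3 (16, 16)
# stream 2: F0 (64, 64)    F1 (32, 32)    F2 (16, 16)  F3 (8, 8)
# Equal-sized maps (e.g. F_{1,0} and F_{0,1}) can be added element-wise once a
# 1x1 convolution matches their channel dimensions.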
4) In different streams and different feature layers, the semantic information and feature texture information of tiny targets vary in strength; to make the feature information of tiny targets richer and more comprehensive, a feature pyramid structure is generally used. Feature Pyramid Networks (FPN) are a key component of most target detection algorithms, combining low-resolution, semantically strong feature information with high-resolution, texture-rich feature information via a top-down path and lateral connections. In the M3DNet network, multi-scale (different-resolution) and multi-level (different-layer) features are generated by the multi-depth image pyramid network; to relieve the imbalance among these features, the invention further provides the multi-scale feature pyramid network. Unlike a conventional FPN, in the multi-scale feature pyramid network the semantic information and feature texture information propagate not only from higher layers to lower layers, but also from deep streams (low resolution, deep convolutional neural networks) to shallow streams (high resolution, shallow convolutional neural networks).
Because the pixels of a tiny target are so few, its detail information is gradually lost as the down-sampling rate increases: a deep network often obtains strong semantic information at the cost of texture information, while a shallow network obtains strong texture information but weaker semantic information. The multi-scale feature pyramid network fuses multi-scale and multi-level features well, so that the fused features contain both strong semantic information and strong texture information, improving the recognition of tiny targets. As in a conventional FPN, a [1×1, 1] convolution network performs channel-dimension matching, up-sampling uses a bilinear interpolation algorithm, and feature maps of the same size are combined by element-wise addition. After this processing, the highest-resolution feature F0,0 not only retains the strong texture features of tiny targets but also combines the strong semantic features of the multi-scale streams; this process can be expressed by formula (2):
F′0,0 = F0,0 + Σ(i,j)≠(0,0) Up^(i+j)(Conv(Fi,j)) (2)
where Fi,j denotes the j-th layer feature of the i-th stream, Up(·) denotes bilinear-interpolation up-sampling at rate 2, with Up^(i+j) applying it i+j times so that every term matches the size of F0,0, and Conv(·) denotes the [1×1, 1] convolution operation. Finally, the output feature set of the multi-scale feature pyramid network is O′, which is defined as follows:
O′={F′0,F′1,F′2,...,F′i,...} (3)
where F′i is defined as follows:
F′i=Conv(F0,i) (4)
where F0,i is the i-th output feature layer of O0, i.e., the output feature layer of the highest-resolution stream.
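A hedged sketch of one way to realize formulas (2) to (4): every Fi,j is channel-matched by a [1×1, 1] convolution, deep-stream maps are added into the equal-sized layer of the shallowest stream, and a top-down pass with rate-2 bilinear up-sampling propagates semantics to higher resolutions. The channel width, the exact fusion order and the final 1×1 output convolutions are assumptions consistent with, but not dictated by, the text:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFeaturePyramid(nn.Module):
    # Fuses multi-depth features {F_{i,j}} into O' = {F'_0, ..., F'_{M-1}}.
    def __init__(self, in_channels, out_channels=256):
        # in_channels[i][j] is the channel width of F_{i,j} (stream i, layer j).
        super().__init__()
        self.lateral = nn.ModuleList([
            nn.ModuleList([nn.Conv2d(c, out_channels, kernel_size=1) for c in stream])
            for stream in in_channels])
        self.out_conv = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=1)
             for _ in in_channels[0]])

    def forward(self, O):
        # O[i] is the list [F_{i,0}, ..., F_{i,M-1}] from stream i.
        M = len(O[0])
        lat = [[self.lateral[i][j](f) for j, f in enumerate(stream)]
               for i, stream in enumerate(O)]
        fused = list(lat[0])  # start from the highest-resolution stream
        # Deep stream -> shallow stream: with beta = 0.5, F_{i,j} (i > 0) has
        # the same spatial size as F_{0,i+j}, so it is added element-wise there.
        for i in range(1, len(lat)):
            for j in range(M):
                if i + j < M:
                    fused[i + j] = fused[i + j] + lat[i][j]
        # High layer -> low layer: classic top-down path with rate-2 up-sampling.
        for j in range(M - 2, -1, -1):
            fused[j] = fused[j] + F.interpolate(
                fused[j + 1], scale_factor=2, mode='bilinear', align_corners=False)
        # Formula (4): F'_j = Conv(fused layer j of the highest-resolution stream).
        return [conv(f) for conv, f in zip(self.out_conv, fused)]

# Channel widths matching the ResNet18/34/50 combination sketched earlier:
fpn = MultiScaleFeaturePyramid([[64, 128, 256, 512],
                                [64, 128, 256, 512],
                                [256, 512, 1024, 2048]])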
5) Through the above steps, the final output O′ of the multi-scale feature pyramid network is obtained; O′ comprises M feature layers of different resolutions, and the most suitable feature layer is selected through formula (5):
k = ⌊k0 + log2(√(wh)/224)⌋ (5)
where w and h respectively denote the width and height of the remote sensing image I; k0 is a hyper-parameter with default value M−2; and k is the index of the selected feature layer, k ∈ {0, 1, ..., M−1}.
The selected feature layer passes through a simple classification and regression network, i.e., one fully connected layer for the classification task and one for the regression task. The whole network is trained end-to-end, with the network weights updated continuously until the model converges.
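The end-to-end training described here amounts to an ordinary gradient-descent loop; the sketch below assumes cross-entropy for classification, smooth-L1 for box regression and SGD, none of which are specified by the patent:

import torch
import torch.nn as nn

def train(model, loader, epochs=12, lr=1e-3):
    # model returns (class_logits, box_preds) for a batch of remote sensing images.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    cls_loss = nn.CrossEntropyLoss()
    reg_loss = nn.SmoothL1Loss()
    for _ in range(epochs):
        for images, labels, boxes in loader:
            logits, preds = model(images)
            loss = cls_loss(logits, labels) + reg_loss(preds, boxes)
            opt.zero_grad()
            loss.backward()   # end-to-end: gradients reach every stream
            opt.step()        # weights updated by gradient descent
    return model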
The specific implementation details of each part of the invention are as follows:
1. the multi-depth streams extract high-level semantic and low-level texture features of the remote sensing image; after the multi-depth image pyramid network, strong semantic information and strong texture information of tiny targets are extracted;
2. the multi-scale feature pyramid structure fuses features of different scales and different levels, propagating semantic information from high layers to low layers and from deep streams to shallow streams; this further enriches the feature information of tiny targets, relieves the imbalance among features, and improves the recognition of tiny targets;
3. after the feature set O′ is obtained, a feature layer is selected using formula (5), and the detection result is obtained through a simple classification and regression structure; meanwhile, the whole network is trained end-to-end, with the weight parameters updated continuously until the network converges.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (4)

1. A multi-dimensional based remote sensing image micro-target detection method, characterized in that the method comprises the following steps:
1) the M3DNet network extracts the feature information of tiny targets using multiple streams: first, N streams of different depths are selected to construct a multi-depth image pyramid network, and remote sensing images of different resolutions are fed into the different streams of the network;
2) the shallow stream takes the high-resolution image as input and focuses on the texture information of tiny targets; the deep stream takes the low-resolution image as input and extracts deep semantic information of tiny targets; images of different resolutions are produced using a scaling factor β;
3) a multi-scale feature pyramid is constructed according to the number of streams, and the features of the corresponding N streams are fused across scales to relieve the imbalance among multi-scale and multi-level features; channel dimensions are matched between feature layers with a [1×1, 1] convolution network, and spatial sizes are matched between features of different sizes with a bilinear interpolation algorithm;
4) the multi-scale feature pyramid network propagates the semantic information and feature texture information of tiny targets from high layers to low layers and from deep streams to shallow streams; the matched and fused multi-scale features pass through a [1×1, 1] convolution to obtain the output F′i of the multi-scale feature pyramid, where the value range of i is determined by the number of selected feature layers;
5) the value of k is computed with a formula and used to select the feature layer; classification and regression tasks are completed by two fully connected layers; the whole network is trained end-to-end, with the weights updated by gradient descent until the network converges;
in the step 1), the specific steps of the multi-depth image pyramid network are as follows:
11) a remote sensing image is given, denoted I, with resolution R;
12) the high-resolution image I0, of resolution R, is fed into a shallow convolutional neural network denoted C0;
13) the medium-resolution image I1, of resolution βR, is fed into a middle-depth convolutional neural network denoted C1;
14) the low-resolution image I2, of resolution β²R, is fed into a deep convolutional neural network denoted C2; where β is a hyper-parameter used to adjust the resolution, β ∈ (0, 1); the multi-depth image pyramid network constructs N streams Ci; this network improves the feature extraction capability for tiny targets, so that the extracted features contain both strong semantic information and strong feature texture information of tiny targets, improving the detection of tiny targets;
in said 4), a [1×1, 1] convolution network performs channel-dimension matching, up-sampling uses a bilinear interpolation algorithm, and feature maps of the same size are combined by element-wise addition; the highest-resolution feature F0,0 retains the strong texture features of tiny targets while combining the strong semantic features of the multi-scale streams, as expressed by formula (2):
F′0,0 = F0,0 + Σ(i,j)≠(0,0) Up^(i+j)(Conv(Fi,j)) (2)
where i is the multi-scale index, i.e., the index over streams, j is the multi-layer index of the backbone network, Fi,j denotes the j-th layer feature of the i-th stream, Up(·) denotes bilinear-interpolation up-sampling at rate 2, with Up^(i+j) applying it i+j times, and Conv(·) denotes the [1×1, 1] convolution operation; the output feature set of the multi-scale feature pyramid network is O′, defined as follows:
O′ = {F′0, F′1, F′2, ..., F′i, ...} (3)
where F′i is defined as follows:
F′i = Conv(F0,i) (4)
where F0,i is the i-th output feature layer of O0, i.e., the output feature layer of the highest-resolution stream.
2. The multi-dimensional based remote sensing image micro-target detection method according to claim 1, characterized in that: for an M3DNet network with N = 3 and ResNet backbones in said 2), C0, C1 and C2 are respectively ResNet18, ResNet34 and ResNet50, or a combination such as ResNet18, ResNet34 and ResNet101, or ResNet34, ResNet50 and ResNet101.
3. The multi-dimensional based remote sensing image micro-target detection method according to claim 2, characterized in that: in said 3), let {Ii | i = 0, 1, ..., N−1} denote the input multi-resolution remote sensing images and {Oi | i = 0, 1, ..., N−1} denote the output features of the N corresponding streams of the multi-depth image pyramid network; the output feature Oi of each stream comprises multi-layer features {Fi,j}, where i is the multi-scale index, i.e., the index over streams, i ∈ {0, 1, ..., N−1}, j is the multi-layer index of the backbone network, j ∈ {0, 1, ..., M−1}, and M is the number of feature layers of different resolutions, j indexing the feature layers at different down-sampling rates in the backbone; the down-sampling rate of the ResNet backbone is 32, so there are 5 feature layers at different down-sampling rates, namely those corresponding to rates 2, 4, 8, 16 and 32; four of these feature layers are fed into the multi-scale feature pyramid network, so M = 4;
with N = 3 and M = 4, Oi is expressed as the following equation:
Oi = Ci(Ii) = {Fi,0, Fi,1, Fi,2, Fi,3} (1)
where i ∈ {0, 1, 2}.
4. The multi-dimensional based remote sensing image micro-target detection method according to claim 1, characterized in that: in said 5), the final output O′ of the multi-scale feature pyramid network comprises M feature layers of different resolutions, and the feature layer is selected according to formula (5):
k = ⌊k0 + log2(√(wh)/224)⌋ (5)
where w and h respectively denote the width and height of the remote sensing image I; k0 is a hyper-parameter equal to M−2; and k is the index of the selected feature layer, k ∈ {0, 1, ..., M−1};
the selected feature layer passes through a simple classification and regression network, i.e., one fully connected layer for the classification task and one for the regression task; the whole network is trained end-to-end, with the network weights updated continuously until the model converges.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011204146.XA 2020-11-02 2020-11-02 Multi-dimensional-based remote sensing image micro-target detection method

Publications (2)

Publication Number Publication Date
CN112270279A CN112270279A (en) 2021-01-26
CN112270279B (en) 2022-04-12

Family

ID=74345857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011204146.XA Multi-dimensional-based remote sensing image micro-target detection method 2020-11-02 2020-11-02

Country Status (1)

Country Link
CN (1) CN112270279B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10679351B2 (en) * 2017-08-18 2020-06-09 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599832A (en) * 2016-12-09 2017-04-26 重庆邮电大学 Method for detecting and recognizing various types of obstacles based on convolution neural network
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN110084292A (en) * 2019-04-18 2019-08-02 江南大学 Object detection method based on DenseNet and multi-scale feature fusion
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111460980A (en) * 2020-03-30 2020-07-28 西安工程大学 Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Multi-Scale Wavelet 3D-CNN for Hyperspectral Image Super-Resolution; Jingxiang Yang et al.; Remote Sensing; 2019-06-30; vol. 11, no. 13; pp. 1-22 *
Infrared small target detection with complex background based on image layer and confidence analysis; Li, H. et al.; AOPC 2015: Image Processing and Analysis; 2015-12-31; pp. 1-6 *
Small object detection algorithm based on multi-scale fusion SSD (基于多尺度融合SSD的小目标检测算法); Zhao Yanan et al.; Computer Engineering (计算机工程); 2020-01-31; vol. 46, no. 1; pp. 247-254 *
Research and implementation of a deep-learning-based object detection algorithm for remote sensing images (基于深度学习的遥感图像目标检测算法的研究与实现); Wu Jiaxiang; China Master's Theses Full-text Database, Engineering Science and Technology; 2020-01-15; pp. C028-253 *

Also Published As

Publication number Publication date
CN112270279A (en) 2021-01-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant