CN111339950A - Remote sensing image target detection method - Google Patents

Remote sensing image target detection method

Info

Publication number
CN111339950A
Authority
CN
China
Prior art keywords
prediction
branch
network
scale
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010122264.XA
Other languages
Chinese (zh)
Other versions
CN111339950B (en)
Inventor
滕竹
段雅妮
张宝鹏
李芮
李浥东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202010122264.XA priority Critical patent/CN111339950B/en
Publication of CN111339950A publication Critical patent/CN111339950A/en
Application granted granted Critical
Publication of CN111339950B publication Critical patent/CN111339950B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a remote sensing image target detection method which designs anchor frames based on image semantic features, so that the strong expressive power of the features makes the anchor frame generation stage efficient and accurate. The position and size of each anchor frame are predicted through a central prediction branch and a shape prediction branch, and the bottleneck structure adopted in these branches enlarges the receptive field of the features relative to the original image, which strengthens small-target detection. The method therefore not only handles the multi-scale nature of remote sensing targets but also detects small targets well, and the model improves adaptability to multi-scale targets across different data sets and enhances generalization.

Description

Remote sensing image target detection method
Technical Field
The invention relates to the technical field of remote sensing image detection, in particular to a remote sensing image target detection method.
Background
In recent years, the traditional image target detection field has made major breakthroughs owing to the introduction of deep convolutional neural networks. The existing methods mainly fall into candidate-region-based methods, represented by the R-CNN series, R-FCN and Mask R-CNN, and end-to-end methods, represented by SSD, the YOLO series and RetinaNet. With the continuous development of remote sensing technology, remote sensing images can be acquired, analysed and processed more and more easily, and target detection is one of the basic problems in remote sensing image processing and analysis. At present, most remote sensing image target detection methods are directly migrated from traditional image target detection methods. Candidate-region-based remote sensing detectors divide detection into two stages: a series of candidate regions is first generated from the original picture, the feature map and the candidate regions are sent into a region-of-interest pooling layer (ROI Pooling), and classification prediction and regression prediction are then performed on the candidate regions to produce the final prediction result. This greatly increases the computation cost and limits the detection speed on remote sensing images. End-to-end remote sensing detectors treat detection directly as a regression problem over the whole image: they need no region recommendation stage and directly generate the category prediction probability and the position offset prediction of the target object, but they have difficulty handling the multi-scale targets in remote sensing images and their detection accuracy is low. Given the large scale variation of target objects, the small and densely clustered targets, and the diversity of data sets in remote sensing images, how to build a detector that combines speed and accuracy and adapts well to target scale changes across different remote sensing data sets is therefore the key problem.
At present, the prior art achieves good results by following the traditional image target detection approach. As shown in fig. 4, the prior art first extracts features of the input image through a feature extraction network, then manually sets several groups of anchor frames with fixed sizes and aspect ratios on feature maps of several scales, and combines the multi-scale features with these anchor frames to generate the final prediction, thereby improving the detection of target objects. However, in the prior art the anchor frames are randomly generated or preset by hand, which is particularly unsuitable for remote sensing images, where target scales vary greatly and data sets are diverse; this often leads to low efficiency, an excessive proportion of negative anchor frames, and difficulty in adapting to the target scales of multiple data sets.
Disclosure of Invention
The embodiment of the invention provides a remote sensing image target detection method, which is used for solving the following technical problems in the prior art:
for the diverse data sets of remote sensing images, anchor frames are preset by hand and a separate anchor frame strategy has to be designed for each data set, which is time-consuming and inefficient;
because the scale of target objects in remote sensing images varies over a wide range, many groups of anchor frames have to be designed by hand, yet only a small fraction of them are actually useful in practice, so negative anchor frames dominate and the error rate is high;
for multi-scale target detection in remote sensing images, multi-scale feature fusion methods can only detect large- and small-scale targets within a single data set, and it is difficult to detect large- and small-scale targets across multiple data sets.
In order to achieve the purpose, the invention adopts the following technical scheme.
A remote sensing image target detection method is implemented based on a scale generation network model and specifically comprises the following steps:
obtaining multi-scale information through a feature extraction network based on an original remote sensing image;
obtaining an anchor frame through the central prediction branch and the shape prediction branch of the scale generation network based on the multi-scale information;
obtaining region recommendations through a first region classification branch and a first region regression branch of a region generation network based on the anchor frame and the multi-scale information;
and obtaining a target image through a category label prediction operation and a regression prediction operation of a prediction network followed by a non-maximum suppression operation, based on the region recommendations and the multi-scale information.
Preferably, obtaining an anchor frame through the central prediction branch and the shape prediction branch of the scale generation network based on the multi-scale information comprises:
based on the multi-scale information, obtaining a central prediction value through a central prediction operation of a central prediction branch, and obtaining a shape prediction offset through a shape prediction operation;
screening the central predicted value through the central predicted branch to obtain a central predicted position larger than a preset threshold value;
and performing offset regression on the screened central prediction positions through the shape prediction branch to obtain the anchor frame with continuous scale.
Preferably, obtaining the region recommendation through a first region classification branch and a first region regression branch of the region generation network based on the anchor box and the multi-scale information comprises:
carrying out foreground score screening and anchor frame offset regression on the multi-scale information through the first region classification branch to obtain an anchor frame foreground background prediction score and an anchor frame offset prediction value;
and carrying out non-maximum suppression on the prediction score of the foreground and the background of the anchor frame and the predicted value of the offset of the anchor frame through the first regional regression branch to obtain a plurality of regional recommendations.
Preferably, obtaining the target image through a category label prediction operation and a regression amount prediction operation of the prediction network and a non-maximum suppression operation based on the region recommendation and the multi-scale information includes:
mapping the region recommendation to multi-scale information, and obtaining equal-size feature blocks through region-of-interest pooling operation;
based on the equal-size feature block, performing category label prediction operation through a second region classification branch, and performing regression quantity prediction operation through a second region regression branch to obtain the score of each category label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame;
and obtaining a target image through non-maximum suppression based on the score of each type of label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame.
Preferably, the training of the scale-generating network model further includes:
respectively generating loss values through a scale generation network, a region generation network and a prediction network to construct a loss function
L = Σ_{θ=1}^{Θ} ( α·L_loc(θ) + β·L_shape(θ) + L_cls^rpn(θ) + L_reg^rpn(θ) + L_cls(θ) + L_reg(θ) )      (1)
wherein α represents the weight of the position (central) prediction loss function, β represents the weight of the shape prediction loss function, θ denotes the picture index (θ = 1, 2, …, Θ), and Θ represents the total number of pictures; L_loc and L_shape are the losses of the two branches of the scale generation network, L_cls^rpn and L_reg^rpn are the losses of the two branches of the region generation network, and L_cls and L_reg are the losses of the two branches of the prediction network;
and obtaining a multitask loss result through the loss function (1), and performing gradient back propagation on the scale generation network model through the multitask loss result to update the network parameters.
Preferably, obtaining multi-scale information through a feature extraction network based on the original remote sensing image comprises:
extracting a feature pyramid of the input image from the original remote sensing image through the ResNeXt branch of the feature extraction network;
and performing a feature fusion operation on this pyramid through the FPN branch of the feature extraction network to obtain the multi-scale information.
Preferably, the method further comprises the step of preprocessing the original remote sensing image, and specifically comprises the following steps:
fixing the size of the original remote sensing image by a resize method;
and carrying out mean value removing processing on the original remote sensing image with the fixed size to obtain a preprocessed original remote sensing image.
According to the technical scheme provided by the embodiment of the invention, the remote sensing image target detection method designs the anchor frame based on image semantic features, so that the strong expressive power of the features makes the anchor frame generation stage efficient and accurate. The position and size of the anchor frame are predicted through the central prediction branch and the shape prediction branch, and the bottleneck structure adopted in these branches enlarges the receptive field of the features relative to the original image, which enhances small-target detection. The method therefore not only handles the multi-scale nature of remote sensing targets but also detects small targets well, and the model improves adaptability to multi-scale targets across different data sets and enhances generalization.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a processing flow chart of a method for detecting a target in a remote sensing image according to the present invention;
FIG. 2 is a process flow diagram of a preferred embodiment of a method for detecting a target in a remote sensing image according to the present invention;
FIG. 3 is a frame diagram of a scale-generating network model of a remote sensing image target detection method provided by the invention;
fig. 4 is a network model framework diagram provided in the prior art.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
The first embodiment:
referring to fig. 1 and 2, the method for detecting a remote sensing image target provided by the invention is implemented based on a scale generation network model, and specifically comprises the following steps:
obtaining multi-scale information through a feature extraction network based on an original remote sensing image;
based on the multi-scale information, generating a central prediction branch and a shape prediction branch of the network through scales to obtain an anchor frame;
based on the anchor frame and the multi-scale information, obtaining regional recommendations through a first regional classification branch and a first regional regression branch of a regional generation network;
and obtaining a target image through a category label prediction operation and a regression prediction operation of the prediction network and a non-maximum suppression operation based on the region recommendation and the multi-scale information.
In the embodiment provided by the invention, the multi-scale information means that, after the input remote sensing picture goes through a series of convolution and down-sampling operations, the feature maps become progressively smaller and form a pyramid-like stack of feature layers, with each feature map representing information at one scale.
Further, in the embodiment of the present invention, a scale generation network that generates continuous-scale anchor frames from semantic features is used to avoid the tedious manual design of anchor frames and the high error rate caused by excessive negative samples; the specific process is as follows:
based on the multi-scale information, obtaining a central prediction value through a central prediction operation of a central prediction branch, and obtaining a shape prediction offset through a shape prediction operation;
screening the central predicted value through the central predicted branch to obtain a central predicted position larger than a preset threshold value;
performing offset regression on the screened central prediction positions through the shape prediction branch to obtain an anchor frame with continuous scale;
In this embodiment, a bottleneck structure is introduced into each branch to enlarge the receptive field of the feature map relative to the original image, which helps to improve the detection of small target objects.
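As a minimal illustrative sketch only (not the patented implementation), the central prediction branch and the shape prediction branch with a shared bottleneck block could be organized in PyTorch roughly as follows; the channel widths, dilation, stride and score threshold are assumptions chosen for illustration.

import torch
import torch.nn as nn

class ScaleGenerationHead(nn.Module):
    """Illustrative central/shape prediction branches over one feature-pyramid level."""
    def __init__(self, in_ch=256, mid_ch=64):
        super().__init__()
        # Bottleneck: 1x1 reduce -> dilated 3x3 (wider receptive field) -> 1x1 restore
        self.bottleneck = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, in_ch, 1), nn.ReLU(inplace=True))
        self.center_pred = nn.Conv2d(in_ch, 1, 1)  # probability that a location is an object centre
        self.shape_pred = nn.Conv2d(in_ch, 2, 1)   # shape prediction offsets (dw, dh)

    def forward(self, feat, stride=8, score_thresh=0.5):
        x = self.bottleneck(feat)
        center = torch.sigmoid(self.center_pred(x))   # central prediction values, (N,1,H,W)
        shape = self.shape_pred(x)                     # shape prediction offsets, (N,2,H,W)
        keep = center > score_thresh                   # screen positions above the preset threshold
        wh = stride * torch.exp(shape)                 # continuous-scale anchor width/height per position
        return center, wh, keep

A corresponding decoding of the kept positions and widths/heights into corner-format anchor frames is sketched later, after the summary of the overall framework.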
Further, the obtaining of the region recommendation through the first region classification branch and the first region regression branch of the region generation network based on the anchor frame and the multi-scale information includes:
performing one-round foreground score screening and anchor frame offset regression on the multi-scale information through the first region classification branch to obtain an anchor frame foreground background prediction score and an anchor frame offset prediction value;
performing non-maximum suppression on the prediction score of the foreground and the background of the anchor frame and the predicted value of the offset of the anchor frame through the first regional regression branch to obtain a plurality of regional recommendations;
In this embodiment, only one round of screening is performed to produce the input of the final prediction, similar to end-to-end target detection methods (also called one-stage methods), the effect being a fast prediction speed.
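Again only as a hedged sketch under the same assumptions, the first region classification and regression branches plus the screening step could look roughly like this in PyTorch; torchvision's nms is used as a stand-in for the non-maximum suppression operation, and the decoding of anchor frames with the regressed offsets is assumed to happen upstream.

import torch
import torch.nn as nn
from torchvision.ops import nms

class RegionHead(nn.Module):
    """Illustrative first region classification / regression branches of the region generation network."""
    def __init__(self, in_ch=256, num_anchors=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, in_ch, 3, padding=1)
        self.cls = nn.Conv2d(in_ch, num_anchors, 1)      # anchor frame foreground/background score
        self.reg = nn.Conv2d(in_ch, num_anchors * 4, 1)  # anchor frame offset prediction (dx, dy, dw, dh)

    def forward(self, feat):
        x = torch.relu(self.conv(feat))
        return torch.sigmoid(self.cls(x)), self.reg(x)

def screen_proposals(boxes, scores, iou_thresh=0.7, top_n=1000):
    """boxes: (K, 4) anchor frames already shifted by the regressed offsets; keep the top proposals after NMS."""
    keep = nms(boxes, scores, iou_thresh)[:top_n]
    return boxes[keep], scores[keep]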
Further, the obtaining the target image through the category label prediction operation and the regression amount prediction operation of the prediction network and the non-maximum suppression operation based on the region recommendation and the multi-scale information includes:
mapping the region recommendation to multi-scale information, and obtaining equal-size feature blocks through region-of-interest pooling operation;
based on the equal-size feature block, performing category label prediction operation through a second region classification branch, and performing regression quantity prediction operation through a second region regression branch to obtain the score of each category label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame;
and obtaining a target image through non-maximum suppression based on the score of each type of label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame.
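A compact sketch of the final prediction stage follows, assuming torchvision's roi_pool for the region-of-interest pooling; the layer sizes, pooled resolution and class count are assumptions, since the patent describes this stage only at the level above.

import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class PredictionHead(nn.Module):
    """Illustrative second classification / regression branches applied to pooled feature blocks."""
    def __init__(self, in_ch=256, pool_size=7, num_classes=15):
        super().__init__()
        self.pool_size = pool_size
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(in_ch * pool_size * pool_size, 1024),
                                nn.ReLU(inplace=True))
        self.cls = nn.Linear(1024, num_classes + 1)   # score for each category label (plus background)
        self.reg = nn.Linear(1024, num_classes * 4)   # offsets relative to the original target frame

    def forward(self, feat, proposals, spatial_scale=1.0 / 8):
        # proposals: list with one (K_i, 4) tensor of region recommendations per image, in input coordinates
        blocks = roi_pool(feat, proposals, output_size=self.pool_size, spatial_scale=spatial_scale)
        x = self.fc(blocks)                            # equal-size feature blocks -> shared fc layers
        return self.cls(x), self.reg(x)                # a final NMS on the decoded boxes would follow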
Further, the method provided by the present invention further includes training the scale generation network model, specifically including:
generating loss values through the six branch networks contained in the scale generation network, the region generation network and the prediction network respectively, and constructing the loss function
L = Σ_{θ=1}^{Θ} ( α·L_loc(θ) + β·L_shape(θ) + L_cls^rpn(θ) + L_reg^rpn(θ) + L_cls(θ) + L_reg(θ) )      (1)
wherein α represents the weight of the position (central) prediction loss function, β represents the weight of the shape prediction loss function, θ denotes the picture index (θ = 1, 2, …, Θ), and Θ represents the total number of pictures; L_loc and L_shape are the losses of the two branches of the scale generation network, L_cls^rpn and L_reg^rpn are the losses of the two branches of the region generation network, and L_cls and L_reg are the losses of the two branches of the prediction network;
and obtaining a multitask loss result through the loss function (1), and performing gradient back propagation on the scale generation network model through the multitask loss result to update the network parameters.
The effect of this loss function is that the model is trained end-to-end through the summation of the six branch losses; by continuously adjusting the network parameters and observing the accuracy of each network structure, the accuracy of the whole network model is improved.
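A minimal sketch of how the six branch losses could be summed according to formula (1); the individual loss terms and the concrete values of α and β are assumptions, since the patent specifies them only symbolically.

def multitask_loss(branch_losses, alpha=1.0, beta=0.1):
    """Sum of the six branch losses per formula (1); alpha weights the central (position) prediction
    loss and beta the shape prediction loss. All concrete weights here are illustrative only."""
    return (alpha * branch_losses["center"] + beta * branch_losses["shape"]  # scale generation network
            + branch_losses["rpn_cls"] + branch_losses["rpn_reg"]            # region generation network
            + branch_losses["cls"] + branch_losses["reg"])                   # prediction network

# Assumed usage: total = multitask_loss(losses); total.backward()  # gradient back-propagation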
Furthermore, in a preferred embodiment provided by the present invention, the feature extraction network adds a feature pyramid network to fuse high-level and low-level semantic feature information: deep feature maps of the convolutional neural network correspond to large receptive fields on the original image and are suitable for extracting features of larger target objects, while shallow feature maps correspond to small receptive fields and are suitable for extracting features of smaller target objects, so the feature fusion module is added to enhance the expressive power of the multi-scale information; the specific process is as follows:
extracting a feature pyramid of the input image from the original remote sensing image through the ResNeXt branch of the feature extraction network;
and performing a feature fusion operation on this pyramid through the FPN branch of the feature extraction network to obtain the multi-scale information.
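Purely as an illustrative sketch (the patent names ResNeXt101 and FPN but gives no code), the FPN-style top-down fusion over four backbone stages could be written in PyTorch as below; the channel widths assume standard ResNeXt-101 outputs C2–C5.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Illustrative top-down fusion of high-level and low-level semantic features (C2..C5 -> P2..P5)."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_ch=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, padding=1) for _ in in_channels)

    def forward(self, feats):                          # feats = [C2, C3, C4, C5], coarsest last
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 1, 0, -1):      # top-down pathway: upsample and add
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]   # the multi-scale information P2..P5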
Further, in a preferred embodiment provided by the present invention, the method further includes a step of preprocessing the original remote sensing image, specifically including:
fixing the size of the original remote sensing image by a resize method;
and carrying out mean value removing processing on the original remote sensing image with the fixed size to obtain a preprocessed original remote sensing image.
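For completeness, a tiny sketch of this preprocessing step; the 800×800 target size and the per-channel mean values are assumptions, as the patent does not specify them.

import torch
import torch.nn.functional as F

def preprocess(image, size=(800, 800), mean=(123.675, 116.28, 103.53)):
    """Fix the image size (resize) and subtract the per-channel mean; `image` is a (3, H, W) tensor."""
    x = image.float().unsqueeze(0)                                           # (1, 3, H, W)
    x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)    # resize to a fixed size
    return x - torch.tensor(mean).view(1, 3, 1, 1)                           # mean value removal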
The second embodiment:
the invention provides a scale generation network model for implementing the method, as shown in fig. 3, comprising:
a feature extraction module;
The image is input into the model, and multi-scale feature information of the image is first extracted through the feature extraction module, which is mainly realized by the ResNeXt101 and FPN structures. Multi-level feature maps are obtained after the FPN network, and each feature map is fed into the scale generation module.
A scale generation module;
The input of the scale generation module is the multi-layer feature maps obtained from the FPN. Each feature map is passed through the central prediction branch and the shape prediction branch of the scale generation module to obtain central prediction values and shape prediction offsets; positions whose central prediction values exceed a preset threshold are screened out, offset regression is performed on these positions, and the anchor frames are finally obtained.
A region generation module;
The input of the region generation module is the anchor frames generated by the scale generation module and the multi-layer feature maps obtained from the FPN. Each feature map is sent to the region classification branch and the region regression branch to obtain the anchor frame foreground/background prediction scores and the anchor frame offset prediction values, and a number of proposals are obtained after the NMS operation.
A final prediction module;
The input of the final prediction module is the region recommendations generated by the region generation module and the multi-layer feature maps obtained from the FPN. The region recommendations are mapped onto the feature maps and sampled into feature blocks of equal size through the ROI Pooling layer. The feature blocks are then sent to the prediction classification branch and the prediction regression branch to obtain category prediction scores and regression offset prediction values, and the final prediction result is obtained after the NMS operation.
In summary, the remote sensing image target detection method provided by the invention follows the region recommendation (two-stage) approach to target detection: the multi-layer feature information of the input picture is first extracted through the feature extraction network and sent into the scale generation network to generate anchor frames with continuous scales, and the final result is then predicted through the region generation network and the prediction network in sequence.
The feature extraction network is composed of the ResNeXt101 and FPN networks and is mainly used to extract the multi-scale feature information of the remote sensing image. The scale generation network aims to improve the anchor frame generation stage: it generates anchor frames with continuous scales by introducing an anchor frame central prediction branch and an anchor frame shape prediction branch, and each branch introduces a bottleneck structure to enlarge the receptive field of the feature map relative to the original image, which helps to improve the detection of small target objects. The region generation network generates foreground prediction scores and offset prediction values through an anchor frame foreground/background classification network and an anchor frame position offset prediction network respectively; the foreground scores are then screened, a regression operation is performed, and a non-maximum suppression (NMS) operation yields the screened anchor frames, i.e. the proposals. The prediction network generates, through the target category prediction network and the target position prediction network respectively, a score for each established category of a proposal and the position offset of the proposal relative to the actual calibration frame, and the final prediction result is obtained after screening, regression and NMS.
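To make the flow between the modules concrete, the following hedged sketch decodes the central prediction values and predicted widths/heights from the scale-generation sketch above into corner-format anchor frames and screens them with NMS; batch size 1 and all thresholds are assumptions, not values given in the patent.

import torch
from torchvision.ops import nms

def decode_and_screen(center, wh, stride=8, score_thresh=0.5, iou_thresh=0.7):
    """center: (1,1,H,W) central prediction values; wh: (1,2,H,W) predicted widths/heights."""
    _, _, h, w = center.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cx = (xs.float() + 0.5) * stride                  # map feature-grid positions back to image coordinates
    cy = (ys.float() + 0.5) * stride
    bw, bh = wh[0, 0], wh[0, 1]
    boxes = torch.stack([cx - bw / 2, cy - bh / 2,
                         cx + bw / 2, cy + bh / 2], dim=-1).reshape(-1, 4)
    scores = center[0, 0].reshape(-1)
    keep = scores > score_thresh                      # screen by the central prediction score
    boxes, scores = boxes[keep], scores[keep]
    keep = nms(boxes, scores, iou_thresh)             # non-maximum suppression
    return boxes[keep], scores[keep]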
The method provided by the invention designs the anchor frame based on image semantic features, so that the strong expressive power of the features makes the anchor frame generation stage efficient and accurate. The position and size of the anchor frame are predicted through the central prediction branch and the shape prediction branch, and the adopted bottleneck structure enlarges the receptive field of the features relative to the original image, enhancing small-target detection. The method not only handles the multi-scale nature of remote sensing targets but also detects small targets well; the model improves adaptability to multi-scale targets across different data sets and enhances generalization.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments, being substantially similar to the method embodiments, are described relatively simply, and for relevant details reference may be made to the corresponding parts of the method embodiments. The above-described embodiments of the apparatus and system are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A remote sensing image target detection method, characterized in that it is implemented based on a scale generation network model and specifically comprises the following steps:
obtaining multi-scale information through a feature extraction network based on an original remote sensing image;
obtaining an anchor frame through the central prediction branch and the shape prediction branch of the scale generation network based on the multi-scale information;
obtaining region recommendations through a first region classification branch and a first region regression branch of a region generation network based on the anchor frame and the multi-scale information;
and obtaining a target image through a category label prediction operation and a regression prediction operation of a prediction network followed by a non-maximum suppression operation, based on the region recommendations and the multi-scale information.
2. The method of claim 1, wherein obtaining an anchor frame through the central prediction branch and the shape prediction branch of the scale generation network based on the multi-scale information comprises:
based on the multi-scale information, obtaining a central prediction value through a central prediction operation of a central prediction branch, and obtaining a shape prediction offset through a shape prediction operation;
screening the central predicted value through the central predicted branch to obtain a central predicted position larger than a preset threshold value;
and performing offset regression on the screened central prediction positions through the shape prediction branch to obtain the anchor frame with continuous scale.
3. The method of claim 2, wherein obtaining the regional recommendation via the first regional classification branch and the first regional regression branch of the regional generation network based on the anchor box and the multi-scale information comprises:
carrying out foreground score screening and anchor frame offset regression on the multi-scale information through the first region classification branch to obtain an anchor frame foreground background prediction score and an anchor frame offset prediction value;
and carrying out non-maximum suppression on the prediction score of the foreground and the background of the anchor frame and the predicted value of the offset of the anchor frame through the first regional regression branch to obtain a plurality of regional recommendations.
4. The method of claim 3, wherein obtaining the target image based on the region recommendation and the multi-scale information through a class label prediction operation and a regression prediction operation of the prediction network and a non-maximum suppression operation comprises:
mapping the region recommendation to multi-scale information, and obtaining equal-size feature blocks through region-of-interest pooling operation;
based on the equal-size feature block, performing category label prediction operation through a second region classification branch, and performing regression quantity prediction operation through a second region regression branch to obtain the score of each category label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame;
and obtaining a target image through non-maximum suppression based on the score of each type of label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame.
5. The method according to claim 1, further comprising training the scale-generating network model, specifically comprising:
respectively generating loss values through a scale generation network, a region generation network and a prediction network to construct a loss function
L = Σ_{θ=1}^{Θ} ( α·L_loc(θ) + β·L_shape(θ) + L_cls^rpn(θ) + L_reg^rpn(θ) + L_cls(θ) + L_reg(θ) )      (1)
wherein α represents the weight of the position (central) prediction loss function, β represents the weight of the shape prediction loss function, θ denotes the picture index (θ = 1, 2, …, Θ), and Θ represents the total number of pictures; L_loc and L_shape are the losses of the two branches of the scale generation network, L_cls^rpn and L_reg^rpn are the losses of the two branches of the region generation network, and L_cls and L_reg are the losses of the two branches of the prediction network;
and obtaining a multitask loss result through the loss function (1), and performing gradient back propagation on the scale generation network model through the multitask loss result to update the network parameters.
6. The method according to any one of claims 1 to 5, wherein the obtaining of the multi-scale information through the feature extraction network based on the original remote sensing image comprises:
extracting a feature pyramid of the input image from the original remote sensing image through the ResNeXt branch of the feature extraction network;
and performing a feature fusion operation on this pyramid through the FPN branch of the feature extraction network to obtain the multi-scale information.
7. The method according to any one of claims 1 to 5, characterized by the further step of preprocessing the raw remote sensing image, in particular comprising:
fixing the size of the original remote sensing image by a resize method;
and carrying out mean value removing processing on the original remote sensing image with the fixed size to obtain a preprocessed original remote sensing image.
CN202010122264.XA 2020-02-27 2020-02-27 Remote sensing image target detection method Active CN111339950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010122264.XA CN111339950B (en) 2020-02-27 2020-02-27 Remote sensing image target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010122264.XA CN111339950B (en) 2020-02-27 2020-02-27 Remote sensing image target detection method

Publications (2)

Publication Number Publication Date
CN111339950A (en) 2020-06-26
CN111339950B CN111339950B (en) 2024-01-23

Family

ID=71185587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010122264.XA Active CN111339950B (en) 2020-02-27 2020-02-27 Remote sensing image target detection method

Country Status (1)

Country Link
CN (1) CN111339950B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814889A (en) * 2020-07-14 2020-10-23 大连理工大学人工智能大连研究院 Single-stage target detection method using anchor-frame-free module and enhanced classifier
CN111860287A (en) * 2020-07-16 2020-10-30 Oppo广东移动通信有限公司 Target detection method and device and storage medium
CN112069910A (en) * 2020-08-11 2020-12-11 上海海事大学 Method for detecting multi-direction ship target by remote sensing image

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013102797A1 (en) * 2012-01-06 2013-07-11 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi System and method for detecting targets in maritime surveillance applications
WO2016054778A1 (en) * 2014-10-09 2016-04-14 Microsoft Technology Licensing, Llc Generic object detection in images
CN109086668A (en) * 2018-07-02 2018-12-25 电子科技大学 Based on the multiple dimensioned unmanned aerial vehicle remote sensing images road information extracting method for generating confrontation network
CN110189255A (en) * 2019-05-29 2019-08-30 电子科技大学 Method for detecting human face based on hierarchical detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013102797A1 (en) * 2012-01-06 2013-07-11 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi System and method for detecting targets in maritime surveillance applications
WO2016054778A1 (en) * 2014-10-09 2016-04-14 Microsoft Technology Licensing, Llc Generic object detection in images
CN109086668A (en) * 2018-07-02 2018-12-25 电子科技大学 Based on the multiple dimensioned unmanned aerial vehicle remote sensing images road information extracting method for generating confrontation network
CN110189255A (en) * 2019-05-29 2019-08-30 电子科技大学 Method for detecting human face based on hierarchical detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NAN MO et al.: "Class-Specific Anchor Based and Context-Guided Multi-Class Object Detection in High Resolution Remote Sensing Imagery with a Convolutional Neural Network", Remote Sensing *
王佳琪: "Research on multi-scale target detection methods based on convolutional neural networks", China Master's Theses Full-text Database (Electronic Journal), Engineering Science and Technology II *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814889A (en) * 2020-07-14 2020-10-23 大连理工大学人工智能大连研究院 Single-stage target detection method using anchor-frame-free module and enhanced classifier
CN111860287A (en) * 2020-07-16 2020-10-30 Oppo广东移动通信有限公司 Target detection method and device and storage medium
CN112069910A (en) * 2020-08-11 2020-12-11 上海海事大学 Method for detecting multi-direction ship target by remote sensing image
CN112069910B (en) * 2020-08-11 2024-03-01 上海海事大学 Multi-directional ship target detection method for remote sensing image

Also Published As

Publication number Publication date
CN111339950B (en) 2024-01-23

Similar Documents

Publication Publication Date Title
CN109859190B (en) Target area detection method based on deep learning
US11373305B2 (en) Image processing method and device, computer apparatus, and storage medium
CN114202672A (en) Small target detection method based on attention mechanism
CN112396002A (en) Lightweight remote sensing target detection method based on SE-YOLOv3
Wang et al. A review of image super-resolution approaches based on deep learning and applications in remote sensing
CN112380921A (en) Road detection method based on Internet of vehicles
CN110533041B (en) Regression-based multi-scale scene text detection method
CN111339950A (en) Remote sensing image target detection method
CN113591795A (en) Lightweight face detection method and system based on mixed attention feature pyramid structure
CN115731533B (en) Vehicle-mounted target detection method based on improved YOLOv5
CN112927279A (en) Image depth information generation method, device and storage medium
CN113361645B (en) Target detection model construction method and system based on meta learning and knowledge memory
CN113505792A (en) Multi-scale semantic segmentation method and model for unbalanced remote sensing image
CN115035295B (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN112784756A (en) Human body identification tracking method
CN111008979A (en) Robust night image semantic segmentation method
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN113052039A (en) Method, system and server for detecting pedestrian density of traffic network
CN117853955A (en) Unmanned aerial vehicle small target detection method based on improved YOLOv5
Lin et al. Small object detection in aerial view based on improved YoloV3 neural network
CN112084897A (en) Rapid traffic large-scene vehicle target detection method of GS-SSD
CN113901924A (en) Document table detection method and device
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
Zhao et al. Recognition and Classification of Concrete Cracks under Strong Interference Based on Convolutional Neural Network.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant