CN111339950A - Remote sensing image target detection method - Google Patents
- Publication number
- CN111339950A (application number CN202010122264.XA)
- Authority
- CN
- China
- Prior art keywords
- prediction
- branch
- network
- scale
- remote sensing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Astronomy & Astrophysics (AREA)
- Remote Sensing (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a remote sensing image target detection method that designs anchor frames based on image semantic features, using the strong expressive power of those features to make the anchor frame generation stage efficient and accurate. The position and size of each anchor frame are predicted by a center prediction branch and a shape prediction branch; the bottleneck structure adopted in these branches enlarges the receptive field of the features relative to the original image and enhances small-target detection capability. The method addresses the multi-scale problem of remote sensing image targets while also detecting small targets better, and the proposed model improves adaptability to multi-scale targets across various data sets and enhances generalization.
Description
Technical Field
The invention relates to the technical field of remote sensing image detection, in particular to a remote sensing image target detection method.
Background
In recent years, the introduction of deep convolutional neural networks has brought major breakthroughs to conventional image target detection. Two main families of methods have emerged: candidate-region-based methods, represented by the R-CNN family, R-FCN and Mask R-CNN, and end-to-end methods, represented by SSD, the YOLO family and RetinaNet. With the continuous development of remote sensing technology, remote sensing images have become increasingly easy to acquire, analyze and process, and target detection is one of the basic problems in remote sensing image processing and analysis. At present, most remote sensing image target detection methods are migrated directly from conventional image target detection. Candidate-region-based methods divide detection into two stages: they first generate a series of candidate regions from the original picture, feed the feature map and the candidate regions into a region-of-interest pooling (ROI Pooling) layer, and then perform a second round of classification prediction and regression prediction on the candidate regions to produce the final prediction. This greatly increases the computational cost and limits detection speed on remote sensing images. End-to-end methods treat detection directly as a regression problem and emphasize processing the image as a whole. They need no region recommendation stage and directly output the class prediction probability and the position offset prediction of each target object. However, such methods struggle with the multi-scale targets found in remote sensing images, and their detection precision is low.
When the scale of target objects in remote sensing images varies widely, target objects are small and clustered, and data sets are diverse, the key problem is how to construct a detector that balances speed and precision and adapts well to scale changes of target objects across different remote sensing data sets.
At present, the prior art achieves good results by following conventional image target detection methods. As shown in FIG. 4, the prior art first extracts features from the input image through a feature extraction network, then manually sets several groups of anchor frames with fixed sizes and aspect ratios on feature maps of several scales, and combines the multi-scale features with the anchor frames to generate the final prediction, thereby improving the detection of target objects. However, in the prior art the anchor frames are usually randomly generated or manually preset. This is particularly ill-suited to remote sensing images, where target scale varies widely and data sets are diverse, and it often leads to low efficiency, a large proportion of negative anchor frames, and difficulty in adapting to target scale changes across multiple data sets.
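The preset-anchor scheme described in this paragraph can be sketched as follows. This is only an illustration of the general prior-art idea: the function name, the stride, the scales and the aspect ratios are assumed values, not taken from the patent.

```python
import math


def preset_anchors(feat_h, feat_w, stride, scales, ratios):
    """Tile fixed-size anchors (cx, cy, w, h) over every cell of a feature map.

    All parameter values used below are illustrative assumptions.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Centre of the feature-map cell, expressed in image pixels.
            cx = (x + 0.5) * stride
            cy = (y + 0.5) * stride
            for s in scales:
                for r in ratios:  # r is the height / width ratio
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return anchors


# A 4 x 4 feature map with 2 scales and 3 ratios yields 6 anchors per cell.
anchors = preset_anchors(4, 4, 16, scales=(32, 64), ratios=(0.5, 1.0, 2.0))
print(len(anchors))
```

Every cell receives the same set of shapes regardless of image content, which is why most of these anchors end up as negatives when targets are sparse or vary widely in scale.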
Disclosure of Invention
The embodiment of the invention provides a remote sensing image target detection method that addresses the following technical problems in the prior art:
because remote sensing data sets are diverse and anchor frames are manually set in advance, a separate anchor frame strategy must be designed for each data set, which is time-consuming and inefficient;
because the scale of target objects in remote sensing images varies widely, many groups of anchor frames must be designed manually, yet only a small fraction of them are useful in practice, so negative anchor frames are too numerous and the error rate is high;
for multi-scale target detection in remote sensing images, multi-scale feature fusion can only detect large and small targets within the same data set, and it is difficult to detect large and small targets across various data sets.
To achieve the above purpose, the invention adopts the following technical scheme.
A remote sensing image target detection method, implemented based on a scale generation network model, comprises the following steps:
obtaining multi-scale information from an original remote sensing image through a feature extraction network;
based on the multi-scale information, obtaining anchor frames through a center prediction branch and a shape prediction branch of a scale generation network;
based on the anchor frames and the multi-scale information, obtaining region recommendations through a first region classification branch and a first region regression branch of a region generation network;
and, based on the region recommendations and the multi-scale information, obtaining a target image through a category label prediction operation and a regression amount prediction operation of a prediction network, followed by a non-maximum suppression operation.
Preferably, obtaining the anchor frames through the center prediction branch and the shape prediction branch of the scale generation network based on the multi-scale information comprises:
based on the multi-scale information, obtaining center prediction values through a center prediction operation of the center prediction branch, and obtaining shape prediction offsets through a shape prediction operation;
screening the center prediction values through the center prediction branch to obtain center prediction positions whose values exceed a preset threshold;
and performing offset regression on the center prediction positions through the shape prediction branch to obtain anchor frames of continuous scale.
Preferably, obtaining the region recommendations through the first region classification branch and the first region regression branch of the region generation network based on the anchor frames and the multi-scale information comprises:
performing foreground score screening and anchor frame offset regression on the multi-scale information through the first region classification branch to obtain anchor frame foreground/background prediction scores and anchor frame offset prediction values;
and performing non-maximum suppression on the anchor frame foreground/background prediction scores and the anchor frame offset prediction values through the first region regression branch to obtain a plurality of region recommendations.
Preferably, obtaining the target image through the category label prediction operation and the regression amount prediction operation of the prediction network and the non-maximum suppression operation, based on the region recommendations and the multi-scale information, comprises:
mapping the region recommendations onto the multi-scale information, and obtaining equal-size feature blocks through a region-of-interest pooling operation;
based on the equal-size feature blocks, performing the category label prediction operation through a second region classification branch and the regression amount prediction operation through a second region regression branch, to obtain the score of each region recommendation for each category label and the offset of each region recommendation relative to the original target frame;
and obtaining the target image through non-maximum suppression based on those scores and offsets.
Preferably, the method further comprises training the scale generation network model, which includes:
generating loss values through the scale generation network, the region generation network and the prediction network respectively, to construct a loss function (1),
wherein α represents the weight of the position prediction loss function, β represents the weight of the shape prediction loss function, θ denotes the index of a picture (θ = 1, 2, …, Θ), and Θ represents the total number of pictures;
and obtaining a multitask loss result through the loss function (1), and performing gradient back-propagation on the scale generation network model with the multitask loss result to update the network parameters.
Preferably, obtaining the multi-scale information from the original remote sensing image through the feature extraction network comprises:
extracting an input image pyramid from the original remote sensing image through a ResNeXt branch of the feature extraction network;
and performing a feature fusion operation on the input image pyramid through an FPN branch of the feature extraction network to obtain the multi-scale information.
Preferably, the method further comprises the step of preprocessing the original remote sensing image, and specifically comprises the following steps:
fixing the size of the original remote sensing image by a resize method;
and carrying out mean value removing processing on the original remote sensing image with the fixed size to obtain a preprocessed original remote sensing image.
According to the technical scheme provided by the embodiments of the invention, the remote sensing image target detection method designs anchor frames based on image semantic features, and the strong expressive power of those features makes the anchor frame generation stage efficient and accurate. The position and size of each anchor frame are predicted by the center prediction branch and the shape prediction branch; the bottleneck structure adopted enlarges the receptive field of the features relative to the original image and enhances small-target detection capability. The method addresses the multi-scale problem of remote sensing image targets while also detecting small targets better, and the proposed model improves adaptability to multi-scale targets across various data sets and enhances generalization.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a processing flow chart of a method for detecting a target in a remote sensing image according to the present invention;
FIG. 2 is a process flow diagram of a preferred embodiment of a method for detecting a target in a remote sensing image according to the present invention;
FIG. 3 is a framework diagram of the scale generation network model of the remote sensing image target detection method provided by the invention;
FIG. 4 is a framework diagram of a network model provided in the prior art.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
To facilitate understanding of the embodiments of the present invention, several specific embodiments are further explained below with reference to the drawings; these embodiments are not to be construed as limiting the embodiments of the present invention.
Example one:
Referring to FIG. 1 and FIG. 2, the remote sensing image target detection method provided by the invention is implemented based on a scale generation network model and comprises the following steps:
obtaining multi-scale information from an original remote sensing image through a feature extraction network;
based on the multi-scale information, obtaining anchor frames through a center prediction branch and a shape prediction branch of a scale generation network;
based on the anchor frames and the multi-scale information, obtaining region recommendations through a first region classification branch and a first region regression branch of a region generation network;
and, based on the region recommendations and the multi-scale information, obtaining a target image through a category label prediction operation and a regression amount prediction operation of a prediction network, followed by a non-maximum suppression operation.
In the embodiments provided by the invention, the multi-scale information arises as follows: as the input original remote sensing picture passes through successive convolution and down-sampling operations, the feature maps become progressively smaller and form a pyramid-like stack of feature layers, with each feature map representing information at one scale.
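As a small worked example of this shrinking pyramid, the sketch below computes the spatial size of each feature map for a given input size. The strides (4, 8, 16, 32, 64) and the 800 x 800 input are assumptions chosen for illustration; the patent does not state them.

```python
import math


def pyramid_shapes(img_h, img_w, strides=(4, 8, 16, 32, 64)):
    """Spatial size (height, width) of the feature map at each assumed stride."""
    return [(math.ceil(img_h / s), math.ceil(img_w / s)) for s in strides]


# Each level halves the resolution of the previous one.
shapes = pyramid_shapes(800, 800)
for stride, shape in zip((4, 8, 16, 32, 64), shapes):
    print(stride, shape)
```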
Further, in the embodiment of the present invention, a scale generation network that generates continuous-scale anchor frames based on semantic features is used to avoid the tedious manual design of anchor frames and the high error rate caused by negative samples. The specific process is as follows:
based on the multi-scale information, obtaining center prediction values through a center prediction operation of the center prediction branch, and obtaining shape prediction offsets through a shape prediction operation;
screening the center prediction values through the center prediction branch to obtain center prediction positions whose values exceed a preset threshold;
performing offset regression on the center prediction positions through the shape prediction branch to obtain anchor frames of continuous scale.
In this embodiment, each branch incorporates a bottleneck structure, which enlarges the receptive field of the feature map relative to the original image and helps improve the detection of small target objects.
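The screening and offset regression steps above can be sketched as follows. The patent does not give the decoding formula, so the exponential mapping from shape offsets to width and height, the `base_scale` factor, the threshold of 0.5 and all names are assumptions made for illustration.

```python
import math


def generate_anchors(center_prob, shape_delta, stride, threshold=0.5, base_scale=8.0):
    """Turn per-cell centre scores and shape offsets into anchors.

    center_prob: 2D list of centre scores in [0, 1], one per feature-map cell.
    shape_delta: 2D list of (dw, dh) offsets, one pair per cell.
    The exponential decoding below is an assumed convention, not the patent's.
    """
    anchors = []
    for y, row in enumerate(center_prob):
        for x, p in enumerate(row):
            if p <= threshold:
                continue  # screening: keep only confident centre positions
            dw, dh = shape_delta[y][x]
            # Continuous-valued width and height regressed from the offsets.
            w = base_scale * stride * math.exp(dw)
            h = base_scale * stride * math.exp(dh)
            cx = (x + 0.5) * stride
            cy = (y + 0.5) * stride
            anchors.append((cx, cy, w, h, p))
    return anchors


center_prob = [[0.1, 0.9],
               [0.7, 0.2]]
shape_delta = [[(0.0, 0.0), (0.0, math.log(2.0))],
               [(math.log(0.5), 0.0), (0.0, 0.0)]]
anchors = generate_anchors(center_prob, shape_delta, stride=16)
print(anchors)
```

Only two of the four cells produce an anchor here, and each anchor has its own width and height, in contrast to a fixed grid of preset shapes.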
Further, obtaining the region recommendations through the first region classification branch and the first region regression branch of the region generation network based on the anchor frames and the multi-scale information includes:
performing one round of foreground score screening and anchor frame offset regression on the multi-scale information through the first region classification branch to obtain anchor frame foreground/background prediction scores and anchor frame offset prediction values;
performing non-maximum suppression on the anchor frame foreground/background prediction scores and the anchor frame offset prediction values through the first region regression branch to obtain a plurality of region recommendations.
In this embodiment, only one round of screening is performed before the results are passed on as input to the final prediction, in the manner of end-to-end (one-stage) target detection methods, which keeps prediction fast.
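The non-maximum suppression step used here can be illustrated with a standard greedy implementation. The box format (x1, y1, x2, y2) and the IoU threshold of 0.5 are assumptions; the patent does not specify them.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # indices of the surviving boxes
```

The second box overlaps the first with an IoU of 81/119, so it is suppressed, while the distant third box survives.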
Further, obtaining the target image through the category label prediction operation and the regression amount prediction operation of the prediction network and the non-maximum suppression operation, based on the region recommendations and the multi-scale information, includes:
mapping the region recommendations onto the multi-scale information, and obtaining equal-size feature blocks through a region-of-interest pooling operation;
based on the equal-size feature blocks, performing the category label prediction operation through a second region classification branch and the regression amount prediction operation through a second region regression branch, to obtain the score of each region recommendation for each category label and the offset of each region recommendation relative to the original target frame;
and obtaining the target image through non-maximum suppression based on those scores and offsets.
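How region-of-interest pooling turns regions of different sizes into equal-size feature blocks can be sketched as below. This is a simplified single-channel max-pooling version with integer bin boundaries; the output size of 2 x 2 and the function name are assumptions for illustration.

```python
def roi_max_pool(feature, roi, out_size=2):
    """Max-pool a region of a 2D feature map into an out_size x out_size block.

    feature: 2D list of numbers (a single channel).
    roi: (x1, y1, x2, y2) in feature-map cells, with x2 and y2 exclusive.
    The region is assumed to lie inside the feature map.
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        ys = y1 + (i * h) // out_size
        ye = max(ys + 1, y1 + ((i + 1) * h) // out_size)  # at least one row per bin
        row = []
        for j in range(out_size):
            xs = x1 + (j * w) // out_size
            xe = max(xs + 1, x1 + ((j + 1) * w) // out_size)
            row.append(max(feature[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


feature = [[r * 6 + c for c in range(6)] for r in range(6)]
small = roi_max_pool(feature, (0, 0, 2, 2))  # a 2 x 2 region
large = roi_max_pool(feature, (1, 1, 5, 5))  # a 4 x 4 region
print(small, large)
```

Both regions come out as 2 x 2 blocks, so the classification and regression branches that follow can take a fixed-size input.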
Further, the method provided by the present invention also includes training the scale generation network model, specifically including:
generating loss values through the 6 branch networks in total of the scale generation network, the region generation network and the prediction network, to construct a loss function (1),
wherein α represents the weight of the position prediction loss function, β represents the weight of the shape prediction loss function, θ denotes the index of a picture (θ = 1, 2, …, Θ), and Θ represents the total number of pictures;
and obtaining a multitask loss result through the loss function (1), and performing gradient back-propagation on the scale generation network model with the multitask loss result to update the network parameters.
The effect of the loss function in this embodiment is that the model is trained end to end on the sum of the 6 loss functions; by continuously adjusting the parameters of the network model and observing the accuracy of each network structure, the accuracy of the whole network model is improved.
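A minimal sketch of how six branch losses might be combined with the weights α and β is given below. The exact form of loss function (1) is not stated in this text, so the unit weights on the other four terms, the averaging over the Θ pictures, the example values of α and β, and all dictionary keys are assumptions for illustration only.

```python
def multitask_loss(per_image_losses, alpha=1.0, beta=0.5):
    """Combine six branch losses per picture and average over all pictures.

    per_image_losses: one dict per picture with the (hypothetical) keys
    'center', 'shape', 'rpn_cls', 'rpn_reg', 'det_cls' and 'det_reg'.
    alpha weights the position (centre) loss, beta weights the shape loss.
    """
    total = 0.0
    for losses in per_image_losses:
        total += (alpha * losses["center"] + beta * losses["shape"]
                  + losses["rpn_cls"] + losses["rpn_reg"]
                  + losses["det_cls"] + losses["det_reg"])
    return total / len(per_image_losses)  # averaging is an assumption


batch = [
    {"center": 0.5, "shape": 1.0, "rpn_cls": 0.25, "rpn_reg": 0.25,
     "det_cls": 0.5, "det_reg": 0.5},
    {"center": 1.0, "shape": 2.0, "rpn_cls": 0.5, "rpn_reg": 0.5,
     "det_cls": 0.5, "det_reg": 0.5},
]
loss = multitask_loss(batch)
print(loss)
```

In an actual training loop the resulting scalar would be back-propagated through all three networks at once, which is what makes the training end to end.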
Furthermore, in a preferred embodiment provided by the present invention, the feature extraction network adds a feature pyramid network (FPN) to fuse high-level and low-level semantic feature information. The deep feature maps of a convolutional neural network correspond to larger receptive fields in the original image and are suited to extracting features of larger target objects, while the shallow feature maps correspond to smaller receptive fields and are suited to extracting features of smaller target objects. Adding the feature fusion module enhances the expressive power of the multi-scale information. The specific process is as follows:
extracting an input image pyramid from the original remote sensing image through the ResNeXt branch of the feature extraction network;
and performing a feature fusion operation on the input image pyramid through the FPN branch of the feature extraction network to obtain the multi-scale information.
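The top-down fusion performed by an FPN can be illustrated with a stripped-down sketch: each coarser map is upsampled by a factor of 2 and added to the next finer map. The 1 x 1 lateral and 3 x 3 smoothing convolutions of a real FPN are omitted, the maps are single-channel, and the level names are hypothetical.

```python
def upsample2x(fm):
    """Nearest-neighbour upsampling of a 2D map by a factor of 2."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def top_down_fuse(levels):
    """Fuse maps ordered fine -> coarse, each half the size of the previous."""
    fused = [levels[-1]]  # the coarsest level is passed through unchanged
    for lateral in reversed(levels[:-1]):
        up = upsample2x(fused[0])
        merged = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, up)]
        fused.insert(0, merged)
    return fused


c3 = [[1.0] * 4 for _ in range(4)]   # shallow, high resolution
c4 = [[10.0] * 2 for _ in range(2)]
c5 = [[100.0]]                        # deep, low resolution
p3, p4, p5 = top_down_fuse([c3, c4, c5])
print(p3[0], p4[0], p5[0])
```

After fusion the finest map carries contributions from every deeper level, which is the sense in which high-level semantics are injected into the high-resolution features.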
Further, in a preferred embodiment provided by the present invention, the method further includes a step of preprocessing the original remote sensing image, specifically including:
fixing the size of the original remote sensing image by a resize method;
and carrying out mean value removing processing on the original remote sensing image with the fixed size to obtain a preprocessed original remote sensing image.
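The two preprocessing steps can be sketched as follows. Nearest-neighbour interpolation, the single-channel image, the 4 x 4 target size and the mean value of 75.0 are assumptions for illustration; the patent only states that the image is resized to a fixed size and that the mean is removed.

```python
def resize_nearest(img, out_h, out_w):
    """Resize a 2D image (list of rows) to a fixed size by nearest neighbour."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def subtract_mean(img, mean):
    """Remove a precomputed mean value from every pixel."""
    return [[px - mean for px in row] for row in img]


raw = [[0, 50],
       [100, 150]]
resized = resize_nearest(raw, 4, 4)
# 75.0 stands in for a data set mean; here it happens to equal the image mean.
pre = subtract_mean(resized, 75.0)
print(pre)
```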
Example two:
The invention provides a scale generation network model for implementing the method, as shown in FIG. 3, comprising:
a feature extraction module;
The image is input into the model, and its multi-scale feature information is first extracted by the feature extraction module, which is mainly implemented with ResNeXt101 and FPN structures. Multi-level feature maps are obtained after the FPN network, and each feature map is fed into the scale generation module.
A scale generation module;
The input of the scale generation module is the multi-layer feature maps obtained by the FPN. Each feature map passes through the center prediction branch and the shape prediction branch of the scale generation module to obtain center prediction values and shape prediction offsets. Positions whose center prediction value exceeds a preset threshold are selected, offset regression is performed at those positions, and the anchor frames are finally obtained.
A region generation module;
The input of the region generation module is the anchor frames generated by the scale generation module and the multi-layer feature maps obtained by the FPN. Each feature map is sent to the region classification branch and the region regression branch to obtain anchor frame foreground/background prediction scores and anchor frame offset prediction values, and a number of proposals are obtained after the NMS operation.
A final prediction module;
The input of the final prediction module is the region recommendations generated by the region generation module and the multi-layer feature maps obtained by the FPN. The region recommendations are mapped onto the feature maps and sampled into feature blocks of the same size by the ROI Pooling layer. The feature blocks are sent to the prediction classification branch and the prediction regression branch to obtain category prediction scores and regression offset prediction values, and the final prediction result is obtained after the NMS operation.
In summary, the remote sensing image target detection method provided by the invention builds on the region-recommendation (two-stage) family of target detection methods: it first extracts multi-layer feature information from the input picture through the feature extraction network, sends it to the scale generation network to generate anchor frames of continuous scale, and then predicts the final result through the region generation network and the prediction network in sequence.
The feature extraction network is composed of ResNeXt101 and FPN networks and is mainly used to extract multi-scale feature information from the remote sensing image. The scale generation network aims to improve the anchor frame generation stage: it generates anchor frames of continuous scale by introducing an anchor frame center prediction branch and an anchor frame shape prediction branch. Each branch introduces a bottleneck structure, which enlarges the receptive field of the feature map relative to the original image and helps improve the detection of small target objects. The region generation network generates foreground prediction scores and offset prediction values through an anchor frame foreground/background classification network and an anchor frame position offset prediction network, respectively. The foreground scores are then screened, regression is performed, and non-maximum suppression (NMS) yields the screened anchor frames, namely the proposals. The prediction network generates, through a target category prediction network and a target position prediction network respectively, the prediction score of each proposal for each predefined category and the position offset of each proposal relative to the actual calibration frame; the final prediction result is obtained after screening, regression and NMS.
The method provided by the invention designs anchor frames based on image semantic features and relies on the strong expressive power of those features to make the anchor frame generation stage efficient and accurate. The position and size of each anchor frame are predicted by a central prediction branch and a shape prediction branch, and the bottleneck structure adopted enlarges the receptive field of the features relative to the original image and strengthens small-target detection. The method not only addresses the multi-scale problem of remote sensing image targets but also achieves better detection performance on small targets; the model provided by the invention improves adaptability to multi-scale targets across various data sets and enhances generalization.
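How a central prediction branch and a shape prediction branch might jointly yield anchor frames with continuous scale can be illustrated with a small sketch. The patent text does not give the decoding formula, so the exponential mapping `w = stride * base_scale * exp(dw)` is an assumption borrowed from common guided-anchoring practice; `generate_anchors`, `stride`, `base_scale` and `threshold` are likewise hypothetical names and values.

```python
import math


def generate_anchors(center_prob, shape_pred, stride=8, base_scale=8.0, threshold=0.5):
    """Build one anchor per feature-map cell whose center score exceeds threshold.

    center_prob: H x W nested list of center scores in [0, 1].
    shape_pred:  H x W nested list of (dw, dh) shape offsets.
    The exp() decoding below is an assumed mapping, not taken from the patent.
    """
    anchors = []
    for y, row in enumerate(center_prob):
        for x, p in enumerate(row):
            if p <= threshold:
                continue  # screened out by the center threshold
            dw, dh = shape_pred[y][x]
            w = stride * base_scale * math.exp(dw)  # continuous width
            h = stride * base_scale * math.exp(dh)  # continuous height
            cx = (x + 0.5) * stride  # cell center in image coordinates
            cy = (y + 0.5) * stride
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


center_prob = [[0.1, 0.9],
               [0.2, 0.3]]
shape_pred = [[(0.0, 0.0), (0.0, 0.0)],
              [(0.0, 0.0), (0.0, 0.0)]]
anchors = generate_anchors(center_prob, shape_pred, stride=8, base_scale=2.0)
```

Since the width and height come from real-valued predictions rather than a fixed list of sizes and aspect ratios, the anchor scale varies continuously with the image content.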
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the embodiments or in some parts of the embodiments.
The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, apparatus or system embodiments are described relatively briefly because they are substantially similar to the method embodiments, and the relevant points can be found in the corresponding parts of the description of the method embodiments. The above-described embodiments of the apparatus and system are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (7)
1. A remote sensing image target detection method, characterized in that it is implemented based on a scale generation network model, and specifically comprises the following steps:
obtaining multi-scale information through a feature extraction network based on an original remote sensing image;
based on the multi-scale information, obtaining an anchor frame through a central prediction branch and a shape prediction branch of a scale generation network;
based on the anchor frame and the multi-scale information, obtaining region recommendations through a first region classification branch and a first region regression branch of a region generation network;
and obtaining a target image through a category label prediction operation and a regression prediction operation of the prediction network and a non-maximum suppression operation based on the region recommendation and the multi-scale information.
2. The method of claim 1, wherein obtaining the anchor frame through the central prediction branch and the shape prediction branch of the scale generation network based on the multi-scale information comprises:
based on the multi-scale information, obtaining a central prediction value through a central prediction operation of a central prediction branch, and obtaining a shape prediction offset through a shape prediction operation;
screening the central prediction value through the central prediction branch to obtain central prediction positions whose values are larger than a preset threshold;
and performing offset regression on the central prediction positions through the shape prediction branch to obtain the anchor frame with continuous scale.
3. The method of claim 2, wherein obtaining the region recommendations through the first region classification branch and the first region regression branch of the region generation network based on the anchor frame and the multi-scale information comprises:
carrying out foreground score screening and anchor frame offset regression on the multi-scale information through the first region classification branch to obtain an anchor frame foreground background prediction score and an anchor frame offset prediction value;
and carrying out non-maximum suppression on the prediction score of the foreground and the background of the anchor frame and the predicted value of the offset of the anchor frame through the first regional regression branch to obtain a plurality of regional recommendations.
4. The method of claim 3, wherein obtaining the target image based on the region recommendation and the multi-scale information through a class label prediction operation and a regression prediction operation of the prediction network and a non-maximum suppression operation comprises:
mapping the region recommendation to multi-scale information, and obtaining equal-size feature blocks through region-of-interest pooling operation;
based on the equal-size feature block, performing category label prediction operation through a second region classification branch, and performing regression quantity prediction operation through a second region regression branch to obtain the score of each category label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame;
and obtaining a target image through non-maximum suppression based on the score of each type of label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame.
5. The method according to claim 1, further comprising training the scale-generating network model, specifically comprising:
respectively generating loss values through the scale generation network, the region generation network and the prediction network to construct a loss function (1),
wherein α represents the weight of the position prediction loss function, β represents the weight of the shape prediction loss function, and θ denotes the picture index (θ = 1, 2, …, Θ), with Θ representing the total number of pictures;
and obtaining a multitask loss result through the loss function (1), and performing gradient back propagation on the scale generation network model through the multitask loss result to update the attention network parameters.
6. The method according to any one of claims 1 to 5, wherein the obtaining of the multi-scale information through the feature extraction network based on the original remote sensing image comprises:
extracting an input image pyramid from the original remote sensing image through the ResNeXt branch of the feature extraction network;
and performing feature fusion operation on the input image pyramid through the FPN branch of the feature extraction network to obtain multi-scale information.
7. The method according to any one of claims 1 to 5, characterized by the further step of preprocessing the raw remote sensing image, in particular comprising:
fixing the size of the original remote sensing image by a resize method;
and carrying out mean value removing processing on the original remote sensing image with the fixed size to obtain a preprocessed original remote sensing image.
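The two preprocessing steps of claim 7 can be sketched as follows. The claim names only a resize method and mean removal, so the details here are assumptions: a single-channel image as a nested list, nearest-neighbour resampling, and subtraction of the image's own mean (a fixed data-set mean would work the same way). The function names are illustrative.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2D image to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[(r * in_h) // out_h][(c * in_w) // out_w]
             for c in range(out_w)]
            for r in range(out_h)]


def subtract_mean(image):
    """Remove the mean pixel value so the image is zero-centred."""
    count = sum(len(row) for row in image)
    mean = sum(sum(row) for row in image) / count
    return [[v - mean for v in row] for row in image]


raw = [[0, 2],
       [4, 6]]
fixed = resize_nearest(raw, 4, 4)   # fixed-size image
preprocessed = subtract_mean(fixed)  # mean-removed image
```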
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010122264.XA CN111339950B (en) | 2020-02-27 | 2020-02-27 | Remote sensing image target detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111339950A (en) | 2020-06-26 |
CN111339950B (en) | 2024-01-23 |
Family
ID=71185587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010122264.XA Active CN111339950B (en) | 2020-02-27 | 2020-02-27 | Remote sensing image target detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111339950B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111814889A (en) * | 2020-07-14 | 2020-10-23 | 大连理工大学人工智能大连研究院 | Single-stage target detection method using anchor-frame-free module and enhanced classifier |
CN111860287A (en) * | 2020-07-16 | 2020-10-30 | Oppo广东移动通信有限公司 | Target detection method and device and storage medium |
CN112069910A (en) * | 2020-08-11 | 2020-12-11 | 上海海事大学 | Method for detecting multi-direction ship target by remote sensing image |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013102797A1 (en) * | 2012-01-06 | 2013-07-11 | Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi | System and method for detecting targets in maritime surveillance applications |
WO2016054778A1 (en) * | 2014-10-09 | 2016-04-14 | Microsoft Technology Licensing, Llc | Generic object detection in images |
CN109086668A (en) * | 2018-07-02 | 2018-12-25 | 电子科技大学 | Based on the multiple dimensioned unmanned aerial vehicle remote sensing images road information extracting method for generating confrontation network |
CN110189255A (en) * | 2019-05-29 | 2019-08-30 | 电子科技大学 | Method for detecting human face based on hierarchical detection |
Non-Patent Citations (2)
Title |
---|
NAN MO et al.: "Class-Specific Anchor Based and Context-Guided Multi-Class Object Detection in High Resolution Remote Sensing Imagery with a Convolutional Neural Network", 《REMOTE SENSING》 * |
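Wang Jiaqi (王佳琪): "Research on Multi-Scale Object Detection Methods Based on Convolutional Neural Networks" (基于卷积神经网络的多尺度目标检测方法研究), 《China Excellent Master's Theses (Electronic Journal), Engineering Science and Technology II》 * |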
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111814889A (en) * | 2020-07-14 | 2020-10-23 | 大连理工大学人工智能大连研究院 | Single-stage target detection method using anchor-frame-free module and enhanced classifier |
CN111860287A (en) * | 2020-07-16 | 2020-10-30 | Oppo广东移动通信有限公司 | Target detection method and device and storage medium |
CN112069910A (en) * | 2020-08-11 | 2020-12-11 | 上海海事大学 | Method for detecting multi-direction ship target by remote sensing image |
CN112069910B (en) * | 2020-08-11 | 2024-03-01 | 上海海事大学 | Multi-directional ship target detection method for remote sensing image |
Also Published As
Publication number | Publication date |
---|---|
CN111339950B (en) | 2024-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109859190B (en) | Target area detection method based on deep learning | |
US11373305B2 (en) | Image processing method and device, computer apparatus, and storage medium | |
CN114202672A (en) | Small target detection method based on attention mechanism | |
CN112396002A (en) | Lightweight remote sensing target detection method based on SE-YOLOv3 | |
Wang et al. | A review of image super-resolution approaches based on deep learning and applications in remote sensing | |
CN112380921A (en) | Road detection method based on Internet of vehicles | |
CN110533041B (en) | Regression-based multi-scale scene text detection method | |
CN111339950A (en) | Remote sensing image target detection method | |
CN113591795A (en) | Lightweight face detection method and system based on mixed attention feature pyramid structure | |
CN115731533B (en) | Vehicle-mounted target detection method based on improved YOLOv5 | |
CN112927279A (en) | Image depth information generation method, device and storage medium | |
CN113361645B (en) | Target detection model construction method and system based on meta learning and knowledge memory | |
CN113505792A (en) | Multi-scale semantic segmentation method and model for unbalanced remote sensing image | |
CN115035295B (en) | Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function | |
CN112784756A (en) | Human body identification tracking method | |
CN111008979A (en) | Robust night image semantic segmentation method | |
CN116091946A (en) | Yolov 5-based unmanned aerial vehicle aerial image target detection method | |
CN113052039A (en) | Method, system and server for detecting pedestrian density of traffic network | |
CN117853955A (en) | Unmanned aerial vehicle small target detection method based on improved YOLOv5 | |
Lin et al. | Small object detection in aerial view based on improved YoloV3 neural network | |
CN112084897A (en) | Rapid traffic large-scene vehicle target detection method of GS-SSD | |
CN113901924A (en) | Document table detection method and device | |
CN117710841A (en) | Small target detection method and device for aerial image of unmanned aerial vehicle | |
CN112132207A (en) | Target detection neural network construction method based on multi-branch feature mapping | |
Zhao et al. | Recognition and Classification of Concrete Cracks under Strong Interference Based on Convolutional Neural Network. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||