CN111339950A - Remote sensing image target detection method - Google Patents

Remote sensing image target detection method

Info

Publication number
CN111339950A
Authority
CN
China
Prior art keywords
prediction
branch
network
scale
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010122264.XA
Other languages
Chinese (zh)
Other versions
CN111339950B (en)
Inventor
滕竹
段雅妮
张宝鹏
李芮
李浥东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202010122264.XA priority Critical patent/CN111339950B/en
Publication of CN111339950A publication Critical patent/CN111339950A/en
Application granted granted Critical
Publication of CN111339950B publication Critical patent/CN111339950B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a remote sensing image target detection method which designs anchor frames based on image semantic features, so that the strong expressive power of the features makes the anchor frame generation stage efficient and accurate. The position and size of each anchor frame are predicted through a central prediction branch and a shape prediction branch, and the bottleneck structure adopted in these branches enlarges the receptive field of the features relative to the original image, which strengthens small-target detection. The method therefore not only handles the multi-scale nature of remote sensing targets but also detects small targets well, and the model improves adaptability to multi-scale targets across different data sets and enhances generalization.

Description

Remote sensing image target detection method
Technical Field
The invention relates to the technical field of remote sensing image detection, in particular to a remote sensing image target detection method.
Background
In recent years, the traditional image target detection field has made major breakthroughs owing to the introduction of deep convolutional neural networks. The existing methods mainly fall into candidate-region-based methods, represented by the R-CNN series, R-FCN and Mask R-CNN, and end-to-end methods, represented by SSD, the YOLO series and RetinaNet. With the continuous development of remote sensing technology, remote sensing images can be acquired, analysed and processed more and more easily, and target detection is one of the basic problems in remote sensing image processing and analysis. At present, most remote sensing image target detection methods are directly migrated from traditional image target detection methods. Candidate-region-based remote sensing detectors divide detection into two stages: a series of candidate regions is first generated from the original picture, the feature map and the candidate regions are sent into a region-of-interest pooling layer (ROI Pooling), and classification prediction and regression prediction are then performed on the candidate regions to produce the final prediction result. This greatly increases the computation cost and limits the detection speed on remote sensing images. End-to-end remote sensing detectors treat detection directly as a regression problem over the whole image: they need no region recommendation stage and directly generate the category prediction probability and the position offset prediction of the target object, but they have difficulty handling the multi-scale targets in remote sensing images and their detection accuracy is low. Given the large scale variation of target objects, the small and densely clustered targets, and the diversity of data sets in remote sensing images, how to build a detector that combines speed and accuracy and adapts well to target scale changes across different remote sensing data sets is therefore the key problem.
At present, the prior art achieves good results by following the traditional image target detection approach. As shown in fig. 4, the prior art first extracts features of the input image through a feature extraction network, then manually sets several groups of anchor frames with fixed sizes and aspect ratios on feature maps of several scales, and combines the multi-scale features with these anchor frames to generate the final prediction, thereby improving the detection of target objects. However, in the prior art the anchor frames are randomly generated or preset by hand, which is particularly unsuitable for remote sensing images, where target scales vary greatly and data sets are diverse; this often leads to low efficiency, an excessive proportion of negative anchor frames, and difficulty in adapting to the target scales of multiple data sets.
Disclosure of Invention
The embodiment of the invention provides a remote sensing image target detection method, which is used for solving the following technical problems in the prior art:
for the diverse data sets of remote sensing images, anchor frames are preset by hand and a separate anchor frame strategy has to be designed for each data set, which is time-consuming and inefficient;
because the scale of target objects in remote sensing images varies over a wide range, many groups of anchor frames have to be designed by hand, yet only a small fraction of them are actually useful in practice, so negative anchor frames dominate and the error rate is high;
for multi-scale target detection in remote sensing images, multi-scale feature fusion methods can only detect large- and small-scale targets within a single data set, and it is difficult to detect large- and small-scale targets across multiple data sets.
In order to achieve the purpose, the invention adopts the following technical scheme.
A remote sensing image target detection method is implemented based on a scale generation network model and specifically comprises the following steps:
obtaining multi-scale information through a feature extraction network based on an original remote sensing image;
obtaining an anchor frame through the central prediction branch and the shape prediction branch of the scale generation network based on the multi-scale information;
obtaining region recommendations through a first region classification branch and a first region regression branch of a region generation network based on the anchor frame and the multi-scale information;
and obtaining a target image through a category label prediction operation and a regression prediction operation of a prediction network followed by a non-maximum suppression operation, based on the region recommendations and the multi-scale information.
Preferably, obtaining an anchor frame through the central prediction branch and the shape prediction branch of the scale generation network based on the multi-scale information comprises:
based on the multi-scale information, obtaining a central prediction value through a central prediction operation of a central prediction branch, and obtaining a shape prediction offset through a shape prediction operation;
screening the central predicted value through the central predicted branch to obtain a central predicted position larger than a preset threshold value;
and performing offset regression on the screened central prediction positions through the shape prediction branch to obtain the anchor frame with continuous scale.
Preferably, obtaining the region recommendation through a first region classification branch and a first region regression branch of the region generation network based on the anchor box and the multi-scale information comprises:
carrying out foreground score screening and anchor frame offset regression on the multi-scale information through the first region classification branch to obtain an anchor frame foreground background prediction score and an anchor frame offset prediction value;
and carrying out non-maximum suppression on the prediction score of the foreground and the background of the anchor frame and the predicted value of the offset of the anchor frame through the first regional regression branch to obtain a plurality of regional recommendations.
Preferably, obtaining the target image through a category label prediction operation and a regression amount prediction operation of the prediction network and a non-maximum suppression operation based on the region recommendation and the multi-scale information includes:
mapping the region recommendation to multi-scale information, and obtaining equal-size feature blocks through region-of-interest pooling operation;
based on the equal-size feature block, performing category label prediction operation through a second region classification branch, and performing regression quantity prediction operation through a second region regression branch to obtain the score of each category label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame;
and obtaining a target image through non-maximum suppression based on the score of each type of label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame.
Preferably, the training of the scale-generating network model further includes:
respectively generating loss values through a scale generation network, a region generation network and a prediction network to construct a loss function
L = Σ_{θ=1}^{Θ} ( α·L_loc(θ) + β·L_shape(θ) + L_cls^rpn(θ) + L_reg^rpn(θ) + L_cls(θ) + L_reg(θ) )      (1)
wherein α represents the weight of the position (central) prediction loss function, β represents the weight of the shape prediction loss function, θ denotes the picture index (θ = 1, 2, …, Θ), and Θ represents the total number of pictures; L_loc and L_shape are the losses of the two branches of the scale generation network, L_cls^rpn and L_reg^rpn are the losses of the two branches of the region generation network, and L_cls and L_reg are the losses of the two branches of the prediction network;
and obtaining a multitask loss result through the loss function (1), and performing gradient back propagation on the scale generation network model through the multitask loss result to update the network parameters.
Preferably, obtaining multi-scale information through a feature extraction network based on the original remote sensing image comprises:
extracting a feature pyramid of the input image from the original remote sensing image through the ResNeXt branch of the feature extraction network;
and performing a feature fusion operation on this pyramid through the FPN branch of the feature extraction network to obtain the multi-scale information.
Preferably, the method further comprises the step of preprocessing the original remote sensing image, and specifically comprises the following steps:
fixing the size of the original remote sensing image by a resize method;
and carrying out mean value removing processing on the original remote sensing image with the fixed size to obtain a preprocessed original remote sensing image.
According to the technical scheme provided by the embodiment of the invention, the remote sensing image target detection method designs the anchor frame based on image semantic features, so that the strong expressive power of the features makes the anchor frame generation stage efficient and accurate. The position and size of the anchor frame are predicted through the central prediction branch and the shape prediction branch, and the bottleneck structure adopted in these branches enlarges the receptive field of the features relative to the original image, which enhances small-target detection. The method therefore not only handles the multi-scale nature of remote sensing targets but also detects small targets well, and the model improves adaptability to multi-scale targets across different data sets and enhances generalization.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a processing flow chart of a method for detecting a target in a remote sensing image according to the present invention;
FIG. 2 is a process flow diagram of a preferred embodiment of a method for detecting a target in a remote sensing image according to the present invention;
FIG. 3 is a frame diagram of a scale-generating network model of a remote sensing image target detection method provided by the invention;
fig. 4 is a network model framework diagram provided in the prior art.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
The first embodiment:
referring to fig. 1 and 2, the method for detecting a remote sensing image target provided by the invention is implemented based on a scale generation network model, and specifically comprises the following steps:
obtaining multi-scale information through a feature extraction network based on an original remote sensing image;
based on the multi-scale information, generating a central prediction branch and a shape prediction branch of the network through scales to obtain an anchor frame;
based on the anchor frame and the multi-scale information, obtaining regional recommendations through a first regional classification branch and a first regional regression branch of a regional generation network;
and obtaining a target image through a category label prediction operation and a regression prediction operation of the prediction network and a non-maximum suppression operation based on the region recommendation and the multi-scale information.
In the embodiment provided by the invention, the multi-scale information means that, after the input remote sensing picture goes through a series of convolution and down-sampling operations, the feature maps become progressively smaller and form a pyramid-like stack of feature layers, with each feature map representing information at one scale.
Further, in the embodiment of the present invention, a scale generation network that generates continuous-scale anchor frames from semantic features is used to avoid the tedious manual design of anchor frames and the high error rate caused by excessive negative samples; the specific process is as follows:
based on the multi-scale information, obtaining a central prediction value through a central prediction operation of a central prediction branch, and obtaining a shape prediction offset through a shape prediction operation;
screening the central predicted value through the central predicted branch to obtain a central predicted position larger than a preset threshold value;
performing offset regression on the screened central prediction positions through the shape prediction branch to obtain an anchor frame with continuous scale;
In this embodiment, a bottleneck structure is introduced into each branch to enlarge the receptive field of the feature map relative to the original image, which helps to improve the detection of small target objects.
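As a minimal illustrative sketch only (not the patented implementation), the central prediction branch and the shape prediction branch with a shared bottleneck block could be organized in PyTorch roughly as follows; the channel widths, dilation, stride and score threshold are assumptions chosen for illustration.

import torch
import torch.nn as nn

class ScaleGenerationHead(nn.Module):
    """Illustrative central/shape prediction branches over one feature-pyramid level."""
    def __init__(self, in_ch=256, mid_ch=64):
        super().__init__()
        # Bottleneck: 1x1 reduce -> dilated 3x3 (wider receptive field) -> 1x1 restore
        self.bottleneck = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, in_ch, 1), nn.ReLU(inplace=True))
        self.center_pred = nn.Conv2d(in_ch, 1, 1)  # probability that a location is an object centre
        self.shape_pred = nn.Conv2d(in_ch, 2, 1)   # shape prediction offsets (dw, dh)

    def forward(self, feat, stride=8, score_thresh=0.5):
        x = self.bottleneck(feat)
        center = torch.sigmoid(self.center_pred(x))   # central prediction values, (N,1,H,W)
        shape = self.shape_pred(x)                     # shape prediction offsets, (N,2,H,W)
        keep = center > score_thresh                   # screen positions above the preset threshold
        wh = stride * torch.exp(shape)                 # continuous-scale anchor width/height per position
        return center, wh, keep

A corresponding decoding of the kept positions and widths/heights into corner-format anchor frames is sketched later, after the summary of the overall framework.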
Further, the obtaining of the region recommendation through the first region classification branch and the first region regression branch of the region generation network based on the anchor frame and the multi-scale information includes:
performing one-round foreground score screening and anchor frame offset regression on the multi-scale information through the first region classification branch to obtain an anchor frame foreground background prediction score and an anchor frame offset prediction value;
performing non-maximum suppression on the prediction score of the foreground and the background of the anchor frame and the predicted value of the offset of the anchor frame through the first regional regression branch to obtain a plurality of regional recommendations;
In this embodiment, only one round of screening is performed to produce the input of the final prediction, similar to end-to-end target detection methods (also called one-stage methods), the effect being a fast prediction speed.
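Again only as a hedged sketch under the same assumptions, the first region classification and regression branches plus the screening step could look roughly like this in PyTorch; torchvision's nms is used as a stand-in for the non-maximum suppression operation, and the decoding of anchor frames with the regressed offsets is assumed to happen upstream.

import torch
import torch.nn as nn
from torchvision.ops import nms

class RegionHead(nn.Module):
    """Illustrative first region classification / regression branches of the region generation network."""
    def __init__(self, in_ch=256, num_anchors=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, in_ch, 3, padding=1)
        self.cls = nn.Conv2d(in_ch, num_anchors, 1)      # anchor frame foreground/background score
        self.reg = nn.Conv2d(in_ch, num_anchors * 4, 1)  # anchor frame offset prediction (dx, dy, dw, dh)

    def forward(self, feat):
        x = torch.relu(self.conv(feat))
        return torch.sigmoid(self.cls(x)), self.reg(x)

def screen_proposals(boxes, scores, iou_thresh=0.7, top_n=1000):
    """boxes: (K, 4) anchor frames already shifted by the regressed offsets; keep the top proposals after NMS."""
    keep = nms(boxes, scores, iou_thresh)[:top_n]
    return boxes[keep], scores[keep]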
Further, the obtaining the target image through the category label prediction operation and the regression amount prediction operation of the prediction network and the non-maximum suppression operation based on the region recommendation and the multi-scale information includes:
mapping the region recommendation to multi-scale information, and obtaining equal-size feature blocks through region-of-interest pooling operation;
based on the equal-size feature block, performing category label prediction operation through a second region classification branch, and performing regression quantity prediction operation through a second region regression branch to obtain the score of each category label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame;
and obtaining a target image through non-maximum suppression based on the score of each type of label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame.
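A compact sketch of the final prediction stage follows, assuming torchvision's roi_pool for the region-of-interest pooling; the layer sizes, pooled resolution and class count are assumptions, since the patent describes this stage only at the level above.

import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class PredictionHead(nn.Module):
    """Illustrative second classification / regression branches applied to pooled feature blocks."""
    def __init__(self, in_ch=256, pool_size=7, num_classes=15):
        super().__init__()
        self.pool_size = pool_size
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(in_ch * pool_size * pool_size, 1024),
                                nn.ReLU(inplace=True))
        self.cls = nn.Linear(1024, num_classes + 1)   # score for each category label (plus background)
        self.reg = nn.Linear(1024, num_classes * 4)   # offsets relative to the original target frame

    def forward(self, feat, proposals, spatial_scale=1.0 / 8):
        # proposals: list with one (K_i, 4) tensor of region recommendations per image, in input coordinates
        blocks = roi_pool(feat, proposals, output_size=self.pool_size, spatial_scale=spatial_scale)
        x = self.fc(blocks)                            # equal-size feature blocks -> shared fc layers
        return self.cls(x), self.reg(x)                # a final NMS on the decoded boxes would follow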
Further, the method provided by the present invention further includes training the scale generation network model, specifically including:
generating loss values through the six branch networks contained in the scale generation network, the region generation network and the prediction network respectively, and constructing the loss function
L = Σ_{θ=1}^{Θ} ( α·L_loc(θ) + β·L_shape(θ) + L_cls^rpn(θ) + L_reg^rpn(θ) + L_cls(θ) + L_reg(θ) )      (1)
wherein α represents the weight of the position (central) prediction loss function, β represents the weight of the shape prediction loss function, θ denotes the picture index (θ = 1, 2, …, Θ), and Θ represents the total number of pictures; L_loc and L_shape are the losses of the two branches of the scale generation network, L_cls^rpn and L_reg^rpn are the losses of the two branches of the region generation network, and L_cls and L_reg are the losses of the two branches of the prediction network;
and obtaining a multitask loss result through the loss function (1), and performing gradient back propagation on the scale generation network model through the multitask loss result to update the network parameters.
The effect of this loss function is that the model is trained end-to-end through the summation of the six branch losses; by continuously adjusting the network parameters and observing the accuracy of each network structure, the accuracy of the whole network model is improved.
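A minimal sketch of how the six branch losses could be summed according to formula (1); the individual loss terms and the concrete values of α and β are assumptions, since the patent specifies them only symbolically.

def multitask_loss(branch_losses, alpha=1.0, beta=0.1):
    """Sum of the six branch losses per formula (1); alpha weights the central (position) prediction
    loss and beta the shape prediction loss. All concrete weights here are illustrative only."""
    return (alpha * branch_losses["center"] + beta * branch_losses["shape"]  # scale generation network
            + branch_losses["rpn_cls"] + branch_losses["rpn_reg"]            # region generation network
            + branch_losses["cls"] + branch_losses["reg"])                   # prediction network

# Assumed usage: total = multitask_loss(losses); total.backward()  # gradient back-propagation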
Furthermore, in a preferred embodiment provided by the present invention, the feature extraction network adds a feature pyramid network to fuse high-level and low-level semantic feature information: deep feature maps of the convolutional neural network correspond to large receptive fields on the original image and are suitable for extracting features of larger target objects, while shallow feature maps correspond to small receptive fields and are suitable for extracting features of smaller target objects, so the feature fusion module is added to enhance the expressive power of the multi-scale information; the specific process is as follows:
extracting a feature pyramid of the input image from the original remote sensing image through the ResNeXt branch of the feature extraction network;
and performing a feature fusion operation on this pyramid through the FPN branch of the feature extraction network to obtain the multi-scale information.
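Purely as an illustrative sketch (the patent names ResNeXt101 and FPN but gives no code), the FPN-style top-down fusion over four backbone stages could be written in PyTorch as below; the channel widths assume standard ResNeXt-101 outputs C2–C5.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Illustrative top-down fusion of high-level and low-level semantic features (C2..C5 -> P2..P5)."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_ch=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, padding=1) for _ in in_channels)

    def forward(self, feats):                          # feats = [C2, C3, C4, C5], coarsest last
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 1, 0, -1):      # top-down pathway: upsample and add
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]   # the multi-scale information P2..P5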
Further, in a preferred embodiment provided by the present invention, the method further includes a step of preprocessing the original remote sensing image, specifically including:
fixing the size of the original remote sensing image by a resize method;
and carrying out mean value removing processing on the original remote sensing image with the fixed size to obtain a preprocessed original remote sensing image.
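For completeness, a tiny sketch of this preprocessing step; the 800×800 target size and the per-channel mean values are assumptions, as the patent does not specify them.

import torch
import torch.nn.functional as F

def preprocess(image, size=(800, 800), mean=(123.675, 116.28, 103.53)):
    """Fix the image size (resize) and subtract the per-channel mean; `image` is a (3, H, W) tensor."""
    x = image.float().unsqueeze(0)                                           # (1, 3, H, W)
    x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)    # resize to a fixed size
    return x - torch.tensor(mean).view(1, 3, 1, 1)                           # mean value removal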
The second embodiment:
the invention provides a scale generation network model for implementing the method, as shown in fig. 3, comprising:
a feature extraction module;
The image is input into the model, and multi-scale feature information of the image is first extracted through the feature extraction module, which is mainly realized by the ResNeXt101 and FPN structures. Multi-level feature maps are obtained after the FPN network, and each feature map is fed into the scale generation module.
A scale generation module;
The input of the scale generation module is the multi-layer feature maps obtained from the FPN. Each feature map is passed through the central prediction branch and the shape prediction branch of the scale generation module to obtain central prediction values and shape prediction offsets; positions whose central prediction values exceed a preset threshold are screened out, offset regression is performed on these positions, and the anchor frames are finally obtained.
A region generation module;
The input of the region generation module is the anchor frames generated by the scale generation module and the multi-layer feature maps obtained from the FPN. Each feature map is sent to the region classification branch and the region regression branch to obtain the anchor frame foreground/background prediction scores and the anchor frame offset prediction values, and a number of proposals are obtained after the NMS operation.
A final prediction module;
The input of the final prediction module is the region recommendations generated by the region generation module and the multi-layer feature maps obtained from the FPN. The region recommendations are mapped onto the feature maps and sampled into feature blocks of equal size through the ROI Pooling layer. The feature blocks are then sent to the prediction classification branch and the prediction regression branch to obtain category prediction scores and regression offset prediction values, and the final prediction result is obtained after the NMS operation.
In summary, the remote sensing image target detection method provided by the invention follows the region recommendation (two-stage) approach to target detection: the multi-layer feature information of the input picture is first extracted through the feature extraction network and sent into the scale generation network to generate anchor frames with continuous scales, and the final result is then predicted through the region generation network and the prediction network in sequence.
The feature extraction network is composed of the ResNeXt101 and FPN networks and is mainly used to extract the multi-scale feature information of the remote sensing image. The scale generation network aims to improve the anchor frame generation stage: it generates anchor frames with continuous scales by introducing an anchor frame central prediction branch and an anchor frame shape prediction branch, and each branch introduces a bottleneck structure to enlarge the receptive field of the feature map relative to the original image, which helps to improve the detection of small target objects. The region generation network generates foreground prediction scores and offset prediction values through an anchor frame foreground/background classification network and an anchor frame position offset prediction network respectively; the foreground scores are then screened, a regression operation is performed, and a non-maximum suppression (NMS) operation yields the screened anchor frames, i.e. the proposals. The prediction network generates, through the target category prediction network and the target position prediction network respectively, a score for each established category of a proposal and the position offset of the proposal relative to the actual calibration frame, and the final prediction result is obtained after screening, regression and NMS.
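To make the flow between the modules concrete, the following hedged sketch decodes the central prediction values and predicted widths/heights from the scale-generation sketch above into corner-format anchor frames and screens them with NMS; batch size 1 and all thresholds are assumptions, not values given in the patent.

import torch
from torchvision.ops import nms

def decode_and_screen(center, wh, stride=8, score_thresh=0.5, iou_thresh=0.7):
    """center: (1,1,H,W) central prediction values; wh: (1,2,H,W) predicted widths/heights."""
    _, _, h, w = center.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cx = (xs.float() + 0.5) * stride                  # map feature-grid positions back to image coordinates
    cy = (ys.float() + 0.5) * stride
    bw, bh = wh[0, 0], wh[0, 1]
    boxes = torch.stack([cx - bw / 2, cy - bh / 2,
                         cx + bw / 2, cy + bh / 2], dim=-1).reshape(-1, 4)
    scores = center[0, 0].reshape(-1)
    keep = scores > score_thresh                      # screen by the central prediction score
    boxes, scores = boxes[keep], scores[keep]
    keep = nms(boxes, scores, iou_thresh)             # non-maximum suppression
    return boxes[keep], scores[keep]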
The method provided by the invention designs the anchor frame based on image semantic features, so that the strong expressive power of the features makes the anchor frame generation stage efficient and accurate. The position and size of the anchor frame are predicted through the central prediction branch and the shape prediction branch, and the adopted bottleneck structure enlarges the receptive field of the features relative to the original image, enhancing small-target detection. The method not only handles the multi-scale nature of remote sensing targets but also detects small targets well; the model improves adaptability to multi-scale targets across different data sets and enhances generalization.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments, being substantially similar to the method embodiments, are described relatively simply, and for relevant details reference may be made to the corresponding parts of the method embodiments. The above-described embodiments of the apparatus and system are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A remote sensing image target detection method, characterized in that it is implemented based on a scale generation network model and specifically comprises the following steps:
obtaining multi-scale information through a feature extraction network based on an original remote sensing image;
obtaining an anchor frame through the central prediction branch and the shape prediction branch of the scale generation network based on the multi-scale information;
obtaining region recommendations through a first region classification branch and a first region regression branch of a region generation network based on the anchor frame and the multi-scale information;
and obtaining a target image through a category label prediction operation and a regression prediction operation of a prediction network followed by a non-maximum suppression operation, based on the region recommendations and the multi-scale information.
2. The method of claim 1, wherein obtaining an anchor frame through the central prediction branch and the shape prediction branch of the scale generation network based on the multi-scale information comprises:
based on the multi-scale information, obtaining a central prediction value through a central prediction operation of a central prediction branch, and obtaining a shape prediction offset through a shape prediction operation;
screening the central predicted value through the central predicted branch to obtain a central predicted position larger than a preset threshold value;
and performing offset regression on the screened central prediction positions through the shape prediction branch to obtain the anchor frame with continuous scale.
3. The method of claim 2, wherein obtaining the regional recommendation via the first regional classification branch and the first regional regression branch of the regional generation network based on the anchor box and the multi-scale information comprises:
carrying out foreground score screening and anchor frame offset regression on the multi-scale information through the first region classification branch to obtain an anchor frame foreground background prediction score and an anchor frame offset prediction value;
and carrying out non-maximum suppression on the prediction score of the foreground and the background of the anchor frame and the predicted value of the offset of the anchor frame through the first regional regression branch to obtain a plurality of regional recommendations.
4. The method of claim 3, wherein obtaining the target image based on the region recommendation and the multi-scale information through a class label prediction operation and a regression prediction operation of the prediction network and a non-maximum suppression operation comprises:
mapping the region recommendation to multi-scale information, and obtaining equal-size feature blocks through region-of-interest pooling operation;
based on the equal-size feature block, performing category label prediction operation through a second region classification branch, and performing regression quantity prediction operation through a second region regression branch to obtain the score of each category label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame;
and obtaining a target image through non-maximum suppression based on the score of each type of label recommended by the region and the offset of the region recommendation and the multi-scale information relative to the original target frame.
5. The method according to claim 1, further comprising training the scale-generating network model, specifically comprising:
respectively generating loss values through a scale generation network, a region generation network and a prediction network to construct a loss function
L = Σ_{θ=1}^{Θ} ( α·L_loc(θ) + β·L_shape(θ) + L_cls^rpn(θ) + L_reg^rpn(θ) + L_cls(θ) + L_reg(θ) )      (1)
wherein α represents the weight of the position (central) prediction loss function, β represents the weight of the shape prediction loss function, θ denotes the picture index (θ = 1, 2, …, Θ), and Θ represents the total number of pictures; L_loc and L_shape are the losses of the two branches of the scale generation network, L_cls^rpn and L_reg^rpn are the losses of the two branches of the region generation network, and L_cls and L_reg are the losses of the two branches of the prediction network;
and obtaining a multitask loss result through the loss function (1), and performing gradient back propagation on the scale generation network model through the multitask loss result to update the network parameters.
6. The method according to any one of claims 1 to 5, wherein the obtaining of the multi-scale information through the feature extraction network based on the original remote sensing image comprises:
extracting a feature pyramid of the input image from the original remote sensing image through the ResNeXt branch of the feature extraction network;
and performing a feature fusion operation on this pyramid through the FPN branch of the feature extraction network to obtain the multi-scale information.
7. The method according to any one of claims 1 to 5, characterized by the further step of preprocessing the raw remote sensing image, in particular comprising:
fixing the size of the original remote sensing image by a resize method;
and carrying out mean value removing processing on the original remote sensing image with the fixed size to obtain a preprocessed original remote sensing image.
CN202010122264.XA 2020-02-27 2020-02-27 Remote sensing image target detection method Active CN111339950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010122264.XA CN111339950B (en) 2020-02-27 2020-02-27 Remote sensing image target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010122264.XA CN111339950B (en) 2020-02-27 2020-02-27 Remote sensing image target detection method

Publications (2)

Publication Number Publication Date
CN111339950A (en) 2020-06-26
CN111339950B CN111339950B (en) 2024-01-23

Family

ID=71185587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010122264.XA Active CN111339950B (en) 2020-02-27 2020-02-27 Remote sensing image target detection method

Country Status (1)

Country Link
CN (1) CN111339950B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814889A (en) * 2020-07-14 2020-10-23 大连理工大学人工智能大连研究院 Single-stage target detection method using anchor-frame-free module and enhanced classifier
CN111860287A (en) * 2020-07-16 2020-10-30 Oppo广东移动通信有限公司 Target detection method and device and storage medium
CN112069910A (en) * 2020-08-11 2020-12-11 上海海事大学 Method for detecting multi-direction ship target by remote sensing image

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013102797A1 (en) * 2012-01-06 2013-07-11 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi System and method for detecting targets in maritime surveillance applications
WO2016054778A1 (en) * 2014-10-09 2016-04-14 Microsoft Technology Licensing, Llc Generic object detection in images
CN109086668A (en) * 2018-07-02 2018-12-25 电子科技大学 Based on the multiple dimensioned unmanned aerial vehicle remote sensing images road information extracting method for generating confrontation network
CN110189255A (en) * 2019-05-29 2019-08-30 电子科技大学 Method for detecting human face based on hierarchical detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013102797A1 (en) * 2012-01-06 2013-07-11 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi System and method for detecting targets in maritime surveillance applications
WO2016054778A1 (en) * 2014-10-09 2016-04-14 Microsoft Technology Licensing, Llc Generic object detection in images
CN109086668A (en) * 2018-07-02 2018-12-25 电子科技大学 Based on the multiple dimensioned unmanned aerial vehicle remote sensing images road information extracting method for generating confrontation network
CN110189255A (en) * 2019-05-29 2019-08-30 电子科技大学 Method for detecting human face based on hierarchical detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NAN MO et al.: "Class-Specific Anchor Based and Context-Guided Multi-Class Object Detection in High Resolution Remote Sensing Imagery with a Convolutional Neural Network", Remote Sensing *
王佳琪: "Research on multi-scale target detection methods based on convolutional neural networks", China Master's Theses Full-text Database (Electronic Journal), Engineering Science and Technology II *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814889A (en) * 2020-07-14 2020-10-23 大连理工大学人工智能大连研究院 Single-stage target detection method using anchor-frame-free module and enhanced classifier
CN111860287A (en) * 2020-07-16 2020-10-30 Oppo广东移动通信有限公司 Target detection method and device and storage medium
CN112069910A (en) * 2020-08-11 2020-12-11 上海海事大学 Method for detecting multi-direction ship target by remote sensing image
CN112069910B (en) * 2020-08-11 2024-03-01 上海海事大学 Multi-directional ship target detection method for remote sensing image

Also Published As

Publication number Publication date
CN111339950B (en) 2024-01-23

Similar Documents

Publication Publication Date Title
CN109859190B (en) Target area detection method based on deep learning
US11373305B2 (en) Image processing method and device, computer apparatus, and storage medium
CN114202672A (en) Small target detection method based on attention mechanism
CN112396002A (en) Lightweight remote sensing target detection method based on SE-YOLOv3
Wang et al. A review of image super-resolution approaches based on deep learning and applications in remote sensing
CN112380921A (en) Road detection method based on Internet of vehicles
CN110533041B (en) Regression-based multi-scale scene text detection method
CN111339950A (en) Remote sensing image target detection method
CN113591795A (en) Lightweight face detection method and system based on mixed attention feature pyramid structure
CN115731533B (en) Vehicle-mounted target detection method based on improved YOLOv5
CN112927279A (en) Image depth information generation method, device and storage medium
CN113361645B (en) Target detection model construction method and system based on meta learning and knowledge memory
CN113505792A (en) Multi-scale semantic segmentation method and model for unbalanced remote sensing image
CN115035295B (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN112784756A (en) Human body identification tracking method
CN111008979A (en) Robust night image semantic segmentation method
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN113052039A (en) Method, system and server for detecting pedestrian density of traffic network
CN117853955A (en) Unmanned aerial vehicle small target detection method based on improved YOLOv5
Lin et al. Small object detection in aerial view based on improved YoloV3 neural network
CN112084897A (en) Rapid traffic large-scene vehicle target detection method of GS-SSD
CN113901924A (en) Document table detection method and device
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
Zhao et al. Recognition and Classification of Concrete Cracks under Strong Interference Based on Convolutional Neural Network.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant