CN115272846A - Improved Oriented RCNN-based rotating target detection method - Google Patents

Improved Oriented RCNN-based rotating target detection method

Info

Publication number
CN115272846A
CN115272846A
Authority
CN
China
Prior art keywords
improved
inputting
module
training
oriented rcnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210827268.7A
Other languages
Chinese (zh)
Inventor
王友伟
郭颖
邵香迎
鲍正位
王季宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202210827268.7A priority Critical patent/CN115272846A/en
Publication of CN115272846A publication Critical patent/CN115272846A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/32Normalisation of the pattern dimensions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of rotating-image target detection, and in particular to a rotating target detection method based on an improved Oriented RCNN, comprising the following steps: inputting an image; image preprocessing: adjusting each picture to a fixed size, normalizing the fixed-size pictures, and dividing them into a training set, a validation set, and a test set; inputting a network model for training: inputting the training set into the improved Oriented RCNN model for training; and inputting a test set and outputting detection results. The method defines the rotated anchor box with a six-parameter scheme different from the prior art, uses different polarization functions to extract the distinct features required by the classification and localization tasks, and introduces an SPP module to fuse local and global features. It can thus overcome the detection interference caused by the mismatch between the features required for classification and regression, effectively extract the different features required by different tasks, and classify and localize remote sensing image targets more accurately.

Description

Improved Oriented RCNN-based rotating target detection method
Technical Field
The invention relates to the technical field of rotating-image target detection, and in particular to a rotating target detection method based on an improved Oriented RCNN.
Background
Remote sensing image target detection is a fundamental task in remote sensing image processing; its goal is to automatically locate regions of interest in a given remote sensing image data set and assign each target a specific category. However, remote sensing targets appear in arbitrary orientations and at small scales, and conventional detectors use horizontal anchor boxes that cannot fit such targets tightly.
To predict the target orientation accurately, many researchers have proposed rotated target detection, which introduces orientation parameters into the RPN module and generates oriented anchor boxes for regression and classification. For example, the Rotation RPN places 54 anchor boxes of different angles, scales, and aspect ratios at each anchor point, which can improve accuracy when oriented objects are sparsely distributed. Xie X. et al proposed Oriented RCNN to reduce the computational cost and further improve accuracy.
To obtain accurate rotated-target information, researchers have introduced rotated anchor boxes into existing detection models for more precise localization. For example, the patent application published as CN112800955A, entitled "Method and system for detecting remote sensing image rotating targets based on a weighted bidirectional feature pyramid", discloses such a method and introduces a BiFPN to enhance the model's cross-scale feature fusion. The patent application published as CN110378242A, entitled "A remote sensing target detection method with a dual attention mechanism", discloses a method that re-weights the feature map using a dual attention mechanism.
However, because classification and regression require different features, existing rotated-image target detection methods struggle to accurately extract the distinct features each task needs.
Disclosure of Invention
The present invention aims to provide a rotating target detection method based on an improved Oriented RCNN, so as to solve the problems described in the background art.
The technical scheme of the invention is as follows: a rotating target detection method based on an improved Oriented RCNN comprises the following steps:
step 1, inputting an image: selecting a remote sensing image data set whose annotation files contain orientation information as the input images, and randomly flipping and padding the input images;
step 2, image preprocessing: adjusting each picture to a fixed 1024 × 1024 size, normalizing the fixed-size pictures, and dividing them into a training set, a validation set, and a test set;
step 3, inputting a network model for training: inputting the training set of step 2 into the improved Oriented RCNN model for training;
step 4, inputting a test set and outputting detection results: detecting remote sensing images with the trained improved Oriented RCNN model to obtain result images in which targets are framed by rotated boxes;
wherein training the improved Oriented RCNN model in step 3 comprises: inputting the training set into the backbone network ResNet50 for feature extraction to obtain features C2-C5 of different sizes; and inputting the extracted features into the SPP-FPN module for feature fusion to obtain a feature map, wherein the SPP-FPN module passes C5, the deepest output of the backbone network, through the SPP module to obtain M5.
Preferably, the specific operation of step 3 includes:
step 3.1, inputting the training set into the backbone network ResNet50 for feature extraction to obtain features C2-C5 of different sizes;
step 3.2, inputting the extracted features into the SPP-FPN module for feature fusion to obtain a feature map;
step 3.3, inputting the feature map into the rotated-proposal generation module Oriented RPN, which outputs proposal regions (proposals) after encoding and decoding;
and step 3.4, inputting the feature map obtained in step 3.2 and the proposals obtained in step 3.3 into the improved detection head module PAM-head, performing the final classification and localization operations, and outputting the remote sensing target recognition and localization result.
Preferably, in step 3.2, the specific workflow of the SPP-FPN module includes: passing C5, the deepest output of the backbone network, through the SPP module to obtain M5; summing element-wise the upsampled M5 and the lateral connection of C4 to obtain M4; summing element-wise the upsampled M4 and the lateral connection of C3 to obtain M3; repeating this process to obtain M2-M5; and applying a 3 × 3 convolution to each of M2-M5 to obtain the improved FPN outputs P2-P5.
Preferably, the SPP module fuses local and global features: it processes the feature map with pooling windows of different sizes and finally concatenates the results to obtain the output.
Preferably, in step 3.3, the specific operation of the rotated-proposal generation module Oriented RPN includes: convolving the feature map output in step 3.2 so that its channel number becomes 6A, where A is the number of anchor boxes generated at each anchor point and 6 indicates that six parameters are needed to define a rotated anchor box. The six parameters are (x, y, w, h, Δα, Δβ), where x and y are the center coordinates of the generated horizontal anchor box, w and h are its width and height, and Δα and Δβ are the offsets between two adjacent vertices of the rotated anchor box and the midpoints of two adjacent edges of the horizontal anchor box.
Preferably, the specific operation of the improved detection head PAM-head module in step 3.4 includes: the input feature map is processed by the polarized attention module PAM, which generates different feature pyramids for the classification and localization tasks; this avoids feature interference between tasks and effectively extracts the distinct key features each task requires. The resulting features are sent to fully connected layers for classification and regression, and the final classification and localization results are output.
Preferably, the polarized attention module PAM has a dual-branch structure: the input feature map passes through an attention module (a channel attention module in parallel with a spatial attention module) and then through different feature representation functions. The classification branch uses an excitation function to obtain high-response global features, while the localization branch uses a suppression function to focus only on boundary features and suppress irrelevant highly activated regions.
Preferably, the experimental configuration for training the model of the rotating target detection method includes: the experimental environment is Python 3.8, PyTorch 1.7.0, and Torchvision 0.7.0 with a batch size of 2; the initial learning rate is set to 0.001; the maximum number of training epochs is 12; and the learning rate is reduced to 1 × 10⁻⁴ and 1 × 10⁻⁵ after the 9th and 11th epochs, respectively.
Preferably, the experimental hardware for training the model of the rotating target detection method is an Intel® Core™ i9-10900X CPU with an NVIDIA RTX 3080Ti graphics card.
Preferably, the input image size is adjusted to 1024 × 1024 pixels, and the per-class accuracy AP and the mean accuracy mAP over all target classes in the data set are used as the evaluation metrics of the experiment.
The invention provides a rotating target detection method based on an improved Oriented RCNN which, compared with the prior art, offers the following improvements and advantages:
the method uses a six-parameter method different from the prior art to define the rotating anchor frame, uses different polarization functions to respectively extract different characteristics required by a classification task and a positioning task, and introduces the SPP module to realize the fusion between local characteristics and global characteristics, so that the detection interference caused by the inconsistency of the required characteristics between classification and regression can be overcome, different characteristics required by different tasks are effectively extracted, and the remote sensing image target can be classified and positioned more accurately.
Drawings
The invention is further explained below with reference to the figures and examples:
FIG. 1 is a flow chart of the overall network framework of the present invention.
FIG. 2 is a diagram showing the structure of SPP-FPN in the present invention.
FIG. 3 is a block diagram of the SPP module in the present invention.
FIG. 4 is a diagram of the 6 parameters representing anchor boxes in the Oriented RPN of the present invention.
FIG. 5 is a structural diagram of PAM-head in the present invention.
FIG. 6 is a schematic diagram of a detection result of a remote sensing image obtained by the present invention.
Detailed Description
The present invention is described in detail below, and the technical solutions in its embodiments are described clearly and completely. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The invention provides a rotating target detection method based on an improved Oriented RCNN, whose technical scheme is as follows:
As shown in FIG. 1, a rotating target detection method based on an improved Oriented RCNN comprises the following steps:
step 1, inputting an image: selecting a remote sensing image data set whose annotation files contain orientation information as the input images, and randomly flipping and padding the input images;
step 2, image preprocessing: adjusting each picture to a fixed 1024 × 1024 size, normalizing the fixed-size pictures, and dividing them into a training set, a validation set, and a test set;
step 3, inputting a network model for training: inputting the training set of step 2 into the improved Oriented RCNN model for training;
step 4, inputting a test set and outputting detection results: detecting remote sensing images with the trained improved Oriented RCNN model to obtain result images in which targets are framed by rotated boxes, as shown in FIG. 6;
wherein training the improved Oriented RCNN model in step 3 comprises: inputting the training set into the backbone network ResNet50 for feature extraction to obtain features C2-C5 of different sizes; and inputting the extracted features into the SPP-FPN module for feature fusion to obtain a feature map, wherein the SPP-FPN module passes C5, the deepest output of the backbone network, through the SPP module to obtain M5.
The specific operation of step 3 includes:
step 3.1, inputting the training set into the backbone network ResNet50 for feature extraction to obtain features C2-C5 of different sizes;
step 3.2, inputting the extracted features into the SPP-FPN module for feature fusion to obtain a feature map;
step 3.3, inputting the feature map into the rotated-proposal generation module Oriented RPN, which outputs proposal regions (proposals) after encoding and decoding;
and step 3.4, inputting the feature map obtained in step 3.2 and the proposals obtained in step 3.3 into the improved detection head module PAM-head, performing the final classification and localization operations, and outputting the remote sensing target recognition and localization result.
In step 3.2, the specific workflow of the SPP-FPN module includes: passing C5, the deepest output of the backbone network, through the SPP module to obtain M5; summing element-wise the upsampled M5 and the lateral connection of C4 to obtain M4; summing element-wise the upsampled M4 and the lateral connection of C3 to obtain M3; and so on to obtain M2-M5; a 3 × 3 convolution is then applied to each of M2-M5 to obtain the improved FPN outputs P2-P5, as shown in FIG. 2.
Further, the SPP module fuses local and global features: it processes the feature map with pooling windows of different sizes and finally concatenates the results to obtain the output, as shown in FIG. 3.
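To make the data flow concrete, the following is a minimal PyTorch sketch of the SPP-FPN described above. The pooling window sizes, channel widths, and module names are illustrative assumptions; the patent fixes only the overall structure (M5 = SPP(C5), top-down element-wise summation with lateral connections, and 3 × 3 output convolutions). The default in_channels assume ResNet50's C2-C5 channel widths of 256, 512, 1024, and 2048.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPP(nn.Module):
    """Spatial pyramid pooling: pool the input at several window sizes and
    concatenate the results with the input, fusing local and global context."""
    def __init__(self, in_channels, out_channels, pool_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in pool_sizes
        )
        # 1x1 convolution restores the channel count after concatenation
        self.fuse = nn.Conv2d(in_channels * (len(pool_sizes) + 1), out_channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([x] + [pool(x) for pool in self.pools], dim=1))


class SPPFPN(nn.Module):
    """Top-down FPN whose deepest level comes from SPP: M5 = SPP(C5),
    M4 = lateral(C4) + up(M5), M3 = lateral(C3) + up(M4), M2 = lateral(C2) + up(M3);
    a 3x3 convolution on each M level then gives P2-P5."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.spp = SPP(in_channels[-1], out_channels)
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, 1) for c in in_channels[:-1]
        )
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels
        )

    def forward(self, c2, c3, c4, c5):
        m5 = self.spp(c5)
        m4 = self.laterals[2](c4) + F.interpolate(m5, scale_factor=2)
        m3 = self.laterals[1](c3) + F.interpolate(m4, scale_factor=2)
        m2 = self.laterals[0](c2) + F.interpolate(m3, scale_factor=2)
        return [conv(m) for conv, m in zip(self.smooth, (m2, m3, m4, m5))]
```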
In step 3.3, the specific operation of the rotated-proposal generation module Oriented RPN includes: the feature map output in step 3.2 is convolved so that its channel number becomes 6A, where A is the number of anchor boxes generated at each anchor point and 6 indicates that six parameters are needed to define a rotated anchor box. The six parameters are (x, y, w, h, Δα, Δβ), where x and y are the center coordinates of the generated horizontal anchor box, w and h are its width and height, and Δα and Δβ are the offsets between two adjacent vertices of the rotated anchor box and the midpoints of two adjacent edges of the horizontal anchor box, as shown in FIG. 4. The formula for regressing the anchor box from these six parameters is:
$$v_1 = \left(x + \Delta\alpha,\; y - \frac{h}{2}\right),\quad v_2 = \left(x + \frac{w}{2},\; y + \Delta\beta\right),\quad v_3 = \left(x - \Delta\alpha,\; y + \frac{h}{2}\right),\quad v_4 = \left(x - \frac{w}{2},\; y - \Delta\beta\right) \tag{1}$$

where v1, v2, v3, and v4 are the four vertices of the rotated anchor box.
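As an illustration, the decoding of Equation 1 can be written as a small PyTorch helper (a hypothetical function, not code from the patent):

```python
import torch

def decode_midpoint_offset(boxes):
    """Decode (x, y, w, h, da, db) -> the four vertices v1..v4 of Equation 1.
    v1/v3 are offset by +/-da from the midpoints of the top/bottom edges of the
    horizontal box; v2/v4 by +/-db from the midpoints of the right/left edges."""
    x, y, w, h, da, db = boxes.unbind(dim=-1)
    v1 = torch.stack((x + da, y - h / 2), dim=-1)  # top-edge midpoint + (da, 0)
    v2 = torch.stack((x + w / 2, y + db), dim=-1)  # right-edge midpoint + (0, db)
    v3 = torch.stack((x - da, y + h / 2), dim=-1)  # bottom-edge midpoint - (da, 0)
    v4 = torch.stack((x - w / 2, y - db), dim=-1)  # left-edge midpoint - (0, db)
    return torch.stack((v1, v2, v3, v4), dim=-2)   # shape (..., 4, 2)
```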
Further, since the Oriented RPN generates a large number of anchor boxes, the N highest-scoring anchor boxes must be selected as inputs to the subsequent stage. The present invention uses the DIoU score as the positive-sample assignment strategy; the DIoU expression is given in Equation 2:
$$\mathrm{DIoU} = \mathrm{IoU} - \frac{d^2}{c^2} \tag{2}$$

where d² is the squared distance between the center points of the predicted box and the ground-truth box, and c² is the squared length of the diagonal of the minimum enclosing rectangle of the predicted box and the ground-truth box.
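A sketch of this score for axis-aligned (x1, y1, x2, y2) boxes follows; the rotated case additionally requires a polygon IoU, which is omitted here for brevity:

```python
import torch

def diou(pred, target):
    """DIoU = IoU - d^2 / c^2 (Equation 2) for (x1, y1, x2, y2) boxes."""
    # intersection and union
    lt = torch.max(pred[..., :2], target[..., :2])
    rb = torch.min(pred[..., 2:], target[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    # d^2: squared distance between the two box centers
    d2 = ((pred[..., :2] + pred[..., 2:]) / 2
          - (target[..., :2] + target[..., 2:]) / 2).pow(2).sum(-1)
    # c^2: squared diagonal of the minimum enclosing rectangle
    c_lt = torch.min(pred[..., :2], target[..., :2])
    c_rb = torch.max(pred[..., 2:], target[..., 2:])
    c2 = (c_rb - c_lt).pow(2).sum(-1)
    return iou - d2 / (c2 + 1e-7)
```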
The specific operation of the improved detection head PAM-head module in step 3.4 includes: the input feature map is processed by the polarized attention module PAM, which generates different feature pyramids for the classification and localization tasks, avoiding feature interference between tasks and effectively extracting the distinct key features each task requires; the resulting features are sent to fully connected layers for classification and regression, and the final classification and localization results are output. The PAM-head structure is shown in FIG. 5. The total loss function of the model is given in Equation 3:
$$L = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*) \tag{3}$$

where L_cls uses the cross-entropy loss and L_reg uses the Smooth L1 loss. L_cls is given in Equation 4 and L_reg in Equation 5:
$$L_{cls}(p_i, p_i^*) = -\left[p_i^* \log p_i + (1 - p_i^*)\log(1 - p_i)\right] \tag{4}$$

$$L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*) \tag{5}$$

where p_i is the output of the RPN classification branch, i.e. the probability that proposal i is foreground, p_i* is the label of the i-th ground truth, t_i is the offset value regressed by the localization branch, and t_i* is the offset value to the ground-truth box. The smooth_L1 function is defined in Equation 6:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \tag{6}$$
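Equations 3-6 correspond to standard detection-loss building blocks; a minimal PyTorch sketch follows (the tensor shapes and the balancing weight λ are assumptions):

```python
import torch
import torch.nn.functional as F

def rpn_loss(p, p_star, t, t_star, lam=1.0):
    """Total loss of Eq. 3: cross-entropy (Eq. 4) plus Smooth L1 (Eq. 5-6).

    p:      (N,) predicted foreground probabilities
    p_star: (N,) ground-truth labels in {0, 1}
    t, t_star: (N, 6) predicted / target box offsets
    """
    l_cls = F.binary_cross_entropy(p, p_star.float())  # Eq. 4
    pos = p_star > 0                                   # regress positives only
    l_reg = F.smooth_l1_loss(t[pos], t_star[pos]) if pos.any() else p.sum() * 0
    return l_cls + lam * l_reg
```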
Furthermore, the polarized attention module PAM has a dual-branch structure: the input feature map passes through an attention module (a channel attention module in parallel with a spatial attention module) and then through different feature representation functions. The classification branch uses an excitation function to obtain high-response global features, while the localization branch uses a suppression function to focus only on boundary features and suppress irrelevant highly activated regions. The excitation function is expressed as follows:
[Equation 7: excitation function — reproduced only as an image in the source document]
where η is the excitation coefficient. The suppression function expression is as follows:
[Equation 8: suppression function — reproduced only as an image in the source document]
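Because the exact excitation and suppression expressions survive only as images, the sketch below shows just the dual-branch structure described above, with deliberately simple placeholder functions (attention raised to the power η for excitation, 1 − attention for suppression). Every module layout detail and both placeholder functions are assumptions:

```python
import torch
import torch.nn as nn

class PAM(nn.Module):
    """Dual-branch polarized attention (structural sketch, assumed details).
    Channel and spatial attention run in parallel; the classification branch
    excites high-response global features, while the localization branch
    suppresses irrelevant highly activated regions."""
    def __init__(self, channels, eta=2.0):
        super().__init__()
        self.eta = eta  # excitation coefficient (Equation 7)
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid()
        )

    def forward(self, x):
        att = self.channel_att(x) * self.spatial_att(x)  # parallel attention
        cls_feat = x * att.pow(self.eta)  # placeholder excitation function
        loc_feat = x * (1.0 - att)        # placeholder suppression function
        return cls_feat, loc_feat
```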
In the invention, the experimental configuration for training the model of the rotating target detection method is based on the MMDetection V2 framework; the experimental environment is Python 3.8, PyTorch 1.7.0, and Torchvision 0.7.0 with a batch size of 2; the initial learning rate is set to 0.001; the maximum number of training epochs is 12; and the learning rate is reduced to 1 × 10⁻⁴ and 1 × 10⁻⁵ after the 9th and 11th epochs, respectively.
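This stepped schedule matches a standard multi-step decay; a minimal PyTorch sketch (the stand-in model and the bare training loop are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)  # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# decay by 0.1 after epochs 9 and 11: 1e-3 -> 1e-4 -> 1e-5
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[9, 11], gamma=0.1)

for epoch in range(12):  # maximum of 12 training epochs
    # ... one training epoch over 1024x1024 inputs with batch size 2 ...
    optimizer.step()
    scheduler.step()
```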
The experimental hardware for training the model of the rotating target detection method is an Intel® Core™ i9-10900X CPU with an NVIDIA RTX 3080Ti graphics card.
Considering the large size of remote sensing images, the input image size is adjusted to 1024 × 1024 pixels, and the per-class accuracy AP and the mean accuracy mAP over all target classes in the data set are used as the evaluation metrics of the experiment.
The method of the invention defines the rotated anchor box with a six-parameter scheme different from the prior art, uses different polarization functions to extract the distinct features required by the classification and localization tasks, and introduces an SPP module to fuse local and global features. It can overcome the detection interference caused by the mismatch between the features required for classification and regression, enhances the feature representation of small remote sensing targets, and yields good network performance and strong model generalization.
The previous description is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A rotating target detection method based on an improved Oriented RCNN, characterized in that the method comprises the following steps:
step 1, inputting an image: selecting a remote sensing image data set whose annotation files contain orientation information as the input images, and randomly flipping and padding the input images;
step 2, image preprocessing: adjusting each picture to a fixed 1024 × 1024 size, normalizing the fixed-size pictures, and dividing them into a training set, a validation set, and a test set;
step 3, inputting a network model for training: inputting the training set of step 2 into the improved Oriented RCNN model for training;
step 4, inputting a test set and outputting detection results: detecting remote sensing images with the trained improved Oriented RCNN model to obtain result images in which targets are framed by rotated boxes;
wherein training the improved Oriented RCNN model in step 3 comprises: inputting the training set into the backbone network ResNet50 for feature extraction to obtain features C2-C5 of different sizes; and inputting the extracted features into the SPP-FPN module for feature fusion to obtain a feature map, wherein the SPP-FPN module passes C5, the deepest output of the backbone network, through the SPP module to obtain M5.
2. The improved Oriented RCNN-based rotating target detection method according to claim 1, wherein the specific operation of step 3 comprises:
step 3.1, inputting the training set into the backbone network ResNet50 for feature extraction to obtain features C2-C5 of different sizes;
step 3.2, inputting the extracted features into the SPP-FPN module for feature fusion to obtain a feature map;
step 3.3, inputting the feature map into the rotated-proposal generation module Oriented RPN, which outputs proposal regions (proposals) after encoding and decoding;
and step 3.4, inputting the feature map obtained in step 3.2 and the proposals obtained in step 3.3 into the improved detection head module PAM-head, performing the final classification and localization operations, and outputting the remote sensing target recognition and localization result.
3. The improved Oriented RCNN-based rotating target detection method according to claim 2, wherein in step 3.2 the specific workflow of the SPP-FPN module comprises: passing C5, the deepest output of the backbone network, through the SPP module to obtain M5; summing element-wise the upsampled M5 and the lateral connection of C4 to obtain M4; summing element-wise the upsampled M4 and the lateral connection of C3 to obtain M3; repeating this process to obtain M2-M5; and applying a 3 × 3 convolution to each of M2-M5 to obtain the improved FPN outputs P2-P5.
4. The improved Oriented RCNN-based rotating target detection method according to claim 3, wherein the SPP module fuses local and global features, processing the feature map with pooling windows of different sizes and finally concatenating the results to obtain the output.
5. The improved Oriented RCNN-based rotating target detection method according to claim 2, wherein in step 3.3 the specific operation of the rotated-proposal generation module Oriented RPN comprises: convolving the feature map output in step 3.2 so that its channel number becomes 6A, where A is the number of anchor boxes generated at each anchor point and 6 indicates that six parameters (x, y, w, h, Δα, Δβ) are needed to define a rotated anchor box, x and y being the center coordinates of the generated horizontal anchor box, w and h its width and height, and Δα and Δβ the offsets between two adjacent vertices of the rotated anchor box and the midpoints of two adjacent edges of the horizontal anchor box.
6. The improved Oriented RCNN-based rotating target detection method according to claim 2, wherein the specific operation of the improved detection head PAM-head module in step 3.4 comprises: processing the input feature map with the polarized attention module PAM to generate different feature pyramids for the classification and localization tasks, thereby avoiding feature interference between tasks and effectively extracting the distinct key features each task requires; sending the resulting features to fully connected layers for classification and regression; and outputting the final classification and localization results.
7. The improved Oriented RCNN-based rotating target detection method according to claim 6, wherein the polarized attention module PAM has a dual-branch structure: the input feature map passes through an attention module (a channel attention module in parallel with a spatial attention module) and then through different feature representation functions, the classification branch using an excitation function to obtain high-response global features and the localization branch using a suppression function to focus only on boundary features and suppress irrelevant highly activated regions.
8. The improved Oriented RCNN-based rotating target detection method according to any one of claims 1-7, wherein the experimental configuration for training the model comprises: based on the MMDetection V2 framework, the experimental environment is Python 3.8, PyTorch 1.7.0, and Torchvision 0.7.0 with a batch size of 2; the initial learning rate is set to 0.001; the maximum number of training epochs is 12; and the learning rate is reduced to 1 × 10⁻⁴ and 1 × 10⁻⁵ after the 9th and 11th epochs, respectively.
9. The improved Oriented RCNN-based rotating target detection method according to claim 8, wherein the experimental hardware for training the model is an Intel® Core™ i9-10900X CPU with an NVIDIA RTX 3080Ti graphics card.
10. The improved Oriented RCNN-based rotating target detection method according to claim 9, wherein the input image size is adjusted to 1024 × 1024 pixels, and the per-class accuracy AP and the mean accuracy mAP over all target classes in the data set are used as the evaluation metrics of the experiment.
CN202210827268.7A 2022-07-13 2022-07-13 Improved Oriented RCNN-based rotating target detection method Pending CN115272846A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210827268.7A CN115272846A (en) 2022-07-13 2022-07-13 Improved Oriented RCNN-based rotating target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210827268.7A CN115272846A (en) 2022-07-13 2022-07-13 Improved Oriented RCNN-based rotating target detection method

Publications (1)

Publication Number Publication Date
CN115272846A true CN115272846A (en) 2022-11-01

Family

ID=83765467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210827268.7A Pending CN115272846A (en) 2022-07-13 2022-07-13 Improved Orientdrcnn-based rotating target detection method

Country Status (1)

Country Link
CN (1) CN115272846A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908908A (en) * 2022-11-14 2023-04-04 北京卫星信息工程研究所 Remote sensing image gathering type target identification method and device based on graph attention network
CN115908908B (en) * 2022-11-14 2023-09-15 北京卫星信息工程研究所 Remote sensing image aggregation type target recognition method and device based on graph attention network

Similar Documents

Publication Publication Date Title
CN111639692B (en) Shadow detection method based on attention mechanism
Guo et al. Data‐driven flood emulation: Speeding up urban flood predictions by deep convolutional neural networks
CN111191654B (en) Road data generation method and device, electronic equipment and storage medium
CN115240121B (en) Joint modeling method and device for enhancing local features of pedestrians
CN103324753B (en) Based on the image search method of symbiotic sparse histogram
CN116363526A (en) MROCNet model construction and multi-source remote sensing image change detection method and system
CN109492610A (en) A kind of pedestrian recognition methods, device and readable storage medium storing program for executing again
Zheng et al. Feature enhancement for multi-scale object detection
CN115601562A (en) Fancy carp detection and identification method using multi-scale feature extraction
Guo et al. Fully convolutional DenseNet with adversarial training for semantic segmentation of high-resolution remote sensing images
CN115272846A (en) Improved Oriented RCNN-based rotating target detection method
CN115063833A (en) Machine room personnel detection method based on image layered vision
CN111368637A (en) Multi-mask convolution neural network-based object recognition method for transfer robot
Yuan et al. A cross-scale mixed attention network for smoke segmentation
Zhao et al. ST-YOLOA: a Swin-transformer-based YOLO model with an attention mechanism for SAR ship detection under complex background
Cyganek An analysis of the road signs classification based on the higher-order singular value decomposition of the deformable pattern tensors
CN109284752A (en) A kind of rapid detection method of vehicle
CN112365508A (en) SAR remote sensing image water area segmentation method based on visual attention and residual error network
CN110826478A (en) Aerial photography illegal building identification method based on countermeasure network
CN116704324A (en) Target detection method, system, equipment and storage medium based on underwater image
CN115578599A (en) Polarized SAR image classification method based on superpixel-hypergraph feature enhancement network
Thirumaladevi et al. Multilayer feature fusion using covariance for remote sensing scene classification
Wang et al. Extraction of main urban roads from high resolution satellite images by machine learning
Liang et al. Transformer-based multi-scale feature fusion network for remote sensing change detection
Jabshetti et al. Object detection using Regionlet transform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination