CN115690627A - Method and system for detecting aerial image rotating target - Google Patents

Info

Publication number
CN115690627A
CN115690627A
Authority
CN
China
Prior art keywords
aerial image
detection
target
detection model
image
Prior art date
Legal status
Pending
Application number
CN202211371137.9A
Other languages
Chinese (zh)
Inventor
唐俊
张登正
段章领
葛新科
傅祖涛
刘凯
Current Assignee
ANHUI BOWEI GUANGCHENG INFORMATION TECHNOLOGY CO LTD
Anhui University
Original Assignee
ANHUI BOWEI GUANGCHENG INFORMATION TECHNOLOGY CO LTD
Anhui University
Priority date
Filing date
Publication date
Application filed by ANHUI BOWEI GUANGCHENG INFORMATION TECHNOLOGY CO LTD, Anhui University filed Critical ANHUI BOWEI GUANGCHENG INFORMATION TECHNOLOGY CO LTD
Priority to CN202211371137.9A priority Critical patent/CN115690627A/en
Publication of CN115690627A publication Critical patent/CN115690627A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of deep learning and target detection, and addresses the shortcomings of most existing detectors: they are limited to predicting horizontal boxes, have a low recall rate for small targets and an excessive false-detection rate, suffer from feature misalignment, and face boundary problems in angle prediction. It provides a method for detecting rotating targets in aerial images, comprising the following steps: S1, acquiring high-altitude images shot by an unmanned aerial vehicle as training data and preprocessing the training data; S2, building an aerial image rotating target detection model by adopting a YOLOv5 network; S3, training the aerial image rotating target detection model with the preprocessed training data to obtain an optimal detection model; and S4, inputting the aerial image to be detected into the optimal detection model and outputting a detection result. The invention detects targets at arbitrary angles, eliminates the boundary problem, alleviates feature misalignment, and improves detection precision.

Description

Method and system for detecting aerial image rotating target
Technical Field
The invention relates to the technical field of deep learning and target detection, in particular to a method and a system for detecting a rotating target of an aerial image.
Background
At present, aerial image rotating target detection is divided into horizontal-box and arbitrary-angle-box approaches. In real scenes, when the objects to be detected have large aspect ratios, inclined angles, and dense arrangements, a detector limited to horizontal boxes suffers large-scale missed detections because the horizontal boxes overlap one another. A bounding box at an arbitrary angle is therefore often required to enclose an object tightly.
When detecting rotating targets in aerial images, the targets are small, occupy few pixels in the whole image, contain little information, lack important information, and have prominent edge information; moreover, the target angle is not fixed and the orientation varies. As a result, detection is difficult, the false-detection rate is too high, and detection precision is low.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method and a system for detecting rotating targets in aerial images. It addresses the limitations of most existing detectors (prediction restricted to horizontal boxes, low recall for small targets, excessive false-detection rate, misaligned features, and boundary problems in angle prediction), achieves detection of targets at arbitrary angles while eliminating the boundary problem, alleviates feature misalignment, and improves detection precision.
In order to solve the technical problems, the invention provides the following technical scheme: a method for detecting a rotating target of an aerial image comprises the following steps:
s1, acquiring a high-altitude image shot by an unmanned aerial vehicle as training data, and preprocessing the training data;
s2, building an aerial image rotating target detection model by adopting a YOLOv5 network;
s3, training the aerial image rotating target detection model by using the preprocessed training data to obtain an optimal detection model;
s4, inputting the aerial image to be detected into the optimal detection model and outputting a detection result, wherein the detection result comprises an object type, a coordinate position and a confidence coefficient;
and S5, performing inference on the aerial image to be detected to obtain an inference result, and then post-processing the inference result.
Further, in step S1, the preprocessing of the training data includes the following steps:
s11, labeling four vertexes of the targets in the aerial image by using label labeling software, wherein the four vertexes are arranged clockwise, and the category and the detection difficulty of each target are marked at the same time;
s12, converting the quadrangle determined by the four vertexes of the marked target into a minimum circumscribed rectangle to obtain coordinates of the four vertexes of the minimum circumscribed rectangle;
s13, cutting the aerial image into a fixed size through the set size and gap, and recording cutting information;
s14, adopting a long edge definition method to enable the coordinates of four vertexes of the minimum circumscribed rectangle to be in a vertex format { (x) 1 ,y 1 )、(x 2 ,y 2 )、(x 3 ,y 3 )、(x 4 ,y 4 ) Is converted into (x) c ,y c W, h, θ) ofA format;
s15, encoding the angle information theta into a CSL format;
and S16, performing data enhancement on the training data cut into the aerial images with fixed sizes.
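As an illustrative sketch (not the patent's own code), the S12/S14 conversion from the four labeled vertexes to the long-edge (xc, yc, w, h, θ) format could look like this; normalizing θ into [-90, 90) is an assumption, since the claims do not fix the exact angle range.

```python
import numpy as np

def longedge_from_vertices(pts):
    """Convert 4 clockwise rectangle vertexes into (xc, yc, w, h, theta).

    pts: (4, 2) corners of the minimum circumscribed rectangle.
    Returns theta in degrees, wrapped into [-90, 90) and measured between
    the long edge and the x-axis (a common long-edge convention; the exact
    range is an assumption of this sketch).
    """
    pts = np.asarray(pts, dtype=float)
    xc, yc = pts.mean(axis=0)                  # rectangle center
    e1 = pts[1] - pts[0]                       # first edge vector
    e2 = pts[2] - pts[1]                       # adjacent edge vector
    l1, l2 = np.linalg.norm(e1), np.linalg.norm(e2)
    long_edge = e1 if l1 >= l2 else e2
    w, h = max(l1, l2), min(l1, l2)            # w is always the longest side
    theta = np.degrees(np.arctan2(long_edge[1], long_edge[0]))
    theta = (theta + 90.0) % 180.0 - 90.0      # wrap into [-90, 90)
    return xc, yc, w, h, theta
```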
Further, in step S2, building an aerial image rotation target detection model by using a YOLOv5 network includes the following steps:
s21, constructing a TE-BottleneeckCSP network as a backhaul extraction feature;
s22, adding a CBAM attention mechanism module into the Bottom-up Path Augmentaion part in the Neck;
s23, adding a detection branch in the Neck, and adding a small-step-length characteristic diagram to predict a tiny target for the problem that the target in the unmanned aerial vehicle image shot at a high altitude depression angle is too small;
s24, adding an angle component in the Head part;
s25, decoupling the position prediction task and the category prediction task in the Head part, and respectively carrying out parallel processing by using two sub-networks;
and S26, adjusting the sampling position of the classification network convolution kernel by using the prediction result of the regression branch, realizing feature alignment, and finally obtaining the aerial image rotating target detection model.
Further, in step S3, training the aerial image rotation target detection model by using the preprocessed training data to obtain an optimal detection model includes the following steps:
s31, designing a group of anchors for each category through k-means and a genetic algorithm according to the number of the categories of the preprocessed aerial images;
s32, setting a corresponding anchor according to the category of each gt, and performing label assignment on each gt by using a positive and negative sample division rule of YOLOv 5;
s33, continuously implementing a dynamic matching strategy along with the training, so that the gt can obtain more high-quality compensation anchors;
s34, calculating loss after positive and negative samples are defined, and then carrying out gradient updating;
s35, adopting an implicit supervision method to set SkewIoU of the real frame and the forecast frame as a scalar, multiplying the scalar with regression loss as a coefficient, and integrating the scalar with the SkewIoU into a target positioning process;
and S36, adding an angle prediction loss function.
Further, in step S5, performing inference on the aerial image to be detected to obtain an inference result, and then performing post-processing on the inference result includes the following steps:
s51, setting a website and a port number, and transmitting the aerial image to be inferred to a browser through a client, wherein the aerial image to be detected is the aerial image to be inferred;
s52, capturing an aerial image to be inferred from the browser by the server, and then sending the aerial image to a trained optimal detection model for feature extraction, target category judgment and target position positioning to obtain a detection result, namely the detection result in the step S4;
s53, sending the detection result back to the browser through the JSON format, and obtaining a final detection result from the browser through the client;
s54, judging whether the inference image is cut and then sent to a network according to the final detection result;
if the cutting is not performed, directly visualizing the detection result;
if the image was cut, the position of each target in the original uncut image is restored by combining the target position information detected on the cut images with the cutting information recorded in the image name, and the restored detection result is then visualized.
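A minimal sketch of the coordinate restoration in the cut-image branch of step S54. The "stem__x0__y0" naming convention for storing the cutting information in the image name is an illustrative assumption; the patent only says that cutting information is recorded in the image name.

```python
def restore_to_original(dets, crop_name):
    """Map detections from a cropped sub-image back into the uncut image.

    Assumes (illustrative convention) that the crop's top-left offsets were
    recorded in the image name as '<stem>__<x0>__<y0>'.
    dets: list of (cls, xc, yc, w, h, theta, conf) in crop coordinates.
    """
    stem, x0, y0 = crop_name.rsplit("__", 2)
    x0, y0 = int(x0), int(y0)
    # only the center point shifts; size, angle, and score are unchanged
    return [(c, xc + x0, yc + y0, w, h, t, s)
            for (c, xc, yc, w, h, t, s) in dets]
```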
Further,
the mathematical expression for CSL is:
CSL(x) = g(x), if θ - r < x < θ + r; CSL(x) = 0, otherwise
in the above equation, g (x) is a window function, r is a window radius, θ is an angle of a bounding box, and x is an abscissa in a window range, where the window function needs to satisfy periodicity, symmetry, monotonicity, and a maximum value g (θ) =1.
Further,
the mathematical expression for the regression components is:
L_reg = (1/N) · Σ_(i=1..N) |SI_i| · (1 - CI_i)
in the above equation, N represents the number of positive samples, CI represents the complete IoU (CIoU) between the prediction box and the real box, SI represents the skew IoU between the prediction box and the real box, and |·| represents treating the quantity as a scalar.
The technical solution also provides a system for implementing the detection method, and the detection system includes:
the system comprises a preprocessing module, a data acquisition module and a data processing module, wherein the preprocessing module is used for acquiring high-altitude images shot by an unmanned aerial vehicle as training data and preprocessing the training data;
the detection model building module is used for building an aerial image rotating target detection model by adopting a YOLOv5 network;
the optimal detection model training module is used for training the aerial image rotating target detection model by using the preprocessed training data to obtain an optimal detection model;
the detection result output module is used for inputting the aerial image to be detected into the optimal detection model and outputting a detection result, and the detection result comprises an object type, a coordinate position and a confidence coefficient;
and the post-processing module is used for performing inference on the aerial image to be detected to obtain an inference result and then post-processing the inference result.
By means of the technical scheme, the invention provides a method and a system for detecting a rotating target of an aerial image, which at least have the following beneficial effects:
1. The invention preprocesses the aerial images in advance into a form suitable for current general-purpose target detection, enabling training and testing on limited computing resources without affecting the detection effect.
2. The invention converts the prediction of angle information from a regression problem into a classification problem through CSL, eliminating the Periodicity of Angular (PoA) and Exchangeability of Edges (EoE) problems that cause sudden loss surges in conventional angle regression. This solves the boundary problem, effectively measures the angular distance between the prediction result and the label information, reduces training difficulty, and improves the prediction precision of angle information.
3. The TE-BottleneckCSP network is designed as the Backbone to mine the global information of the image, and a low-stride down-sampling detection head is added in the Neck design to better adapt to the tiny targets in aerial images, effectively improving the recall rate. In addition, a CBAM attention mechanism is added to the Neck, improving the accuracy of target positioning in high-density scenes and helping the network efficiently search regions of interest in large-scale scenes. Finally, the classification and regression tasks are decoupled, eliminating the misalignment between classification and regression; the receptive field of the classification branch is calibrated using the regression result, realizing feature alignment.
4. The method optimizes the matching strategy of YOLOv5: a separate group of anchors is designed for the gt of each category to realize more accurate matching, and a dynamic matching strategy is applied during training so that each gt can obtain more high-quality compensation anchors, thereby effectively improving the recall rate.
5. The invention improves the positioning component of the loss function for arbitrary-angle bounding boxes. By integrating CIoU and SkewIoU, it effectively solves the problem that the SkewIoU term is non-differentiable, realizing a more accurate positioning effect.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a method for detecting a rotating target in an aerial image according to the present invention;
FIG. 2 is a HTTP service flow diagram of the method for detecting a rotating target of an aerial image according to the present invention;
FIG. 3 is a network architecture diagram of an aerial image rotating target detection model of the present invention;
FIG. 4 is a block diagram of a system for detecting a rotating target in an aerial image according to the present invention.
In the figure: 10. a preprocessing module; 20. a detection model building module; 30. an optimal detection model training module; 40. a detection result output module; 50. and a post-processing module.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below. Therefore, the realization process of how to apply technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by relevant hardware instructed by a program, and therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
Overview of the background
The current aerial image rotating target detection is divided into horizontal-box and arbitrary-angle-box approaches. In real scenes, when the objects to be detected have large aspect ratios, inclined angles, and dense arrangements, a detector limited to horizontal boxes suffers large-scale missed detections because the horizontal boxes overlap one another. A bounding box at an arbitrary angle is therefore often required to enclose an object tightly.
The current aerial image rotating target detection mainly faces the following challenges:
1. compared with the image shot on the ground, the size of the aerial image is too large, and the required computing resource is larger.
2. Due to high-altitude depression angle shooting, the aerial image scene has complex background, multiple interference factors and large target scale change.
3. The size of the target in the aerial image is small, the pixels occupied in the whole image are few, the target contains less information, important information is lost, and edge information is obvious.
4. The aerial image scene has large variation of target distribution density, and both sparsely arranged and compact scenes exist.
5. The target angle is not fixed, the direction is changeable, and the detection difficulty is large.
In light of the above problems, the present embodiment provides a method and a system for detecting a rotating target in an aerial image, which are used to solve the problems encountered in the prior art.
Referring to fig. 1 to 4, a specific implementation of the present embodiment is shown. Training data are acquired and preprocessed: the aerial images are cut and the labeling-information format is converted. On top of a basic target detection network, a feature extraction network is built by integrating a Transformer Encoder, an attention mechanism is integrated in the feature fusion stage, a small-stride detection branch is added, the detection and regression tasks are decomposed, and the regression-branch output is used to optimize the classification branch, yielding the aerial image rotating target detection model. During training, positive and negative samples are divided with per-category label assignment and adjusted dynamically across iterations, and the parameters are updated according to the optimized loss function. This realizes detection of targets at arbitrary angles free of the boundary problem, alleviates feature misalignment, and improves detection precision.
Referring to fig. 1, the present embodiment provides a method for detecting a rotation target of an aerial image, including the following steps:
s1, acquiring high-altitude images shot by an unmanned aerial vehicle as training data, preprocessing the training data, and preprocessing the aerial images into a form suitable for current general target detection, so that training and testing on limited computing resources are realized, and the detection effect is not influenced at all.
In step S1, the preprocessing of the training data includes the steps of:
s11, labeling four vertexes of the targets in the aerial image by using label labeling software, wherein the four vertexes are arranged clockwise, and the category and the detection difficulty of each target are marked at the same time;
s12, converting the quadrangle determined by the four vertexes of the marked target into a minimum circumscribed rectangle to obtain coordinates of the four vertexes of the minimum circumscribed rectangle;
s13, cutting the aerial image into 1024 × 1024 fixed sizes through the set size and gap, and recording cutting information;
s14, adopting a long edge definition method to enable the coordinates of four vertexes of the minimum circumscribed rectangle to be in a vertex format { (x) 1 ,y 1 )、(x 2 ,y 2 )、(x 3 ,y 3 )、(x 4 ,y 4 ) Is converted into (x) c ,y c W, h, θ) in the rectangular frame, w represents the longest side of the rectangular frame, and h, x represents the shortest side of the rectangular frame c ,y c The x coordinate and the y coordinate of the central point of the rectangular frame are respectively, and theta is the inclination angle of the rectangular frame;
s15, encoding the angle information theta into a CSL format;
the mathematical expression for CSL is as follows:
CSL(x) = g(x), if θ - r < x < θ + r; CSL(x) = 0, otherwise
in the above equation, g (x) is a window function, r is a window radius, θ is an angle of a bounding box, and x is an abscissa in a window range, and the window function needs to satisfy periodicity, symmetry, monotonicity, and a maximum value g (θ) =1.
And S16, performing data enhancement on the training data cut into the aerial images with fixed sizes.
In this embodiment, the aerial images shot by the unmanned aerial vehicle are first sorted. Then the targets to be detected in the aerial images are annotated with the labelme labeling software; the labeling format is the four vertexes of each target to be detected, marked clockwise, and the category and detection difficulty of each target are marked at the same time. In addition, the labeled quadrangle should envelop the target to be detected as tightly as possible, excluding unnecessary background information.
After the labeling information of all targets to be detected is obtained, the quadrangle determined by the four vertexes of each target is converted into its minimum circumscribed rectangle, giving the coordinates of the rectangle's four vertexes. Then, an appropriate size and gap are set according to the computing resources and performance of the computer, the images in the dataset are cut into a fixed size, and the cutting information is recorded for subsequent post-processing of the test results;
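The size-and-gap cutting described above can be sketched as follows. The 1024 sub-size matches step S13, while gap=200 is only an illustrative default; the patent says both are chosen according to the available computing resources.

```python
def crop_windows(img_w, img_h, subsize=1024, gap=200):
    """Top-left corners of fixed-size crops covering the image.

    Consecutive windows overlap by `gap` pixels (stride = subsize - gap),
    and the last window in each direction is shifted back so it stays
    inside the image (assuming the image is at least `subsize` in each
    dimension).  These corner offsets are the "cutting information" to
    record for later restoration of detections.
    """
    stride = subsize - gap
    xs = list(range(0, max(img_w - subsize, 0) + 1, stride))
    ys = list(range(0, max(img_h - subsize, 0) + 1, stride))
    if xs[-1] + subsize < img_w:
        xs.append(img_w - subsize)   # back-shifted final column
    if ys[-1] + subsize < img_h:
        ys.append(img_h - subsize)   # back-shifted final row
    return [(x, y) for y in ys for x in xs]
```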
after aerial images of uniform size are obtained, the labeling information of all targets to be detected is re-encoded into the (xc, yc, w, h, θ) format, and θ is further processed into the CSL format, converting the regression problem into a classification problem, thereby overcoming the boundary problem and better measuring the loss between the prediction information and the real label.
And then, a series of data enhancement operations are carried out on the image with the angle label information, so that the diversity of data is enhanced, and the generalization capability of the model is improved.
S2, building an aerial image rotating target detection model based on the YOLOv5 network. YOLOv5 is optimized as follows: the TE-BottleneckCSP network is built as the Backbone to extract features, a global attention mechanism module is added to the Neck part, a parallel feature-aligned detection head is built, and a prediction branch is added. The prediction of angle information is converted from a regression problem into a classification problem through CSL, which eliminates the Periodicity of Angular (PoA) problem and the Exchangeability of Edges (EoE) problem that cause loss surges in conventional angle regression, solves the boundary problem, effectively measures the angular distance between the prediction result and the label information, reduces training difficulty, and improves the prediction precision of angle information.
In step S2, the method for building an aerial image rotation target detection model by using the YOLOv5 network includes the following steps:
s21, constructing a TE-BottleneeckCSP network as a backbone extraction feature, and providing the TE-BottleneeckCSP network as the backbone extraction feature according to the advantages of a transducer Encoder and a BottleneeckCSP;
s22, adding a CBAM attention mechanism module into the Bottom-up Path Augmentaion part in the Neck;
s23, adding a detection branch in the Neck, and adding a small-step-length characteristic diagram to predict a tiny target for the problem that the target in the unmanned aerial vehicle image shot at a high altitude depression angle is too small;
s24, adding an angle component to the Head part;
s25, decoupling a position prediction task and a category prediction task in the Head part, and respectively carrying out parallel processing by using two sub-networks;
and S26, adjusting the sampling position of the classification network convolution kernel by using the prediction result of the regression branch, realizing feature alignment, and finally obtaining the aerial image rotating target detection model.
In this embodiment, the aerial image rotating target detection model is built by optimizing YOLOv5 to adapt to aerial images. The traditional YOLOv5 can only detect with horizontal boxes, so the invention adds an angle component to the Head part of YOLOv5, enabling it to predict bounding boxes at arbitrary angles.
This embodiment removes the last BottleneckCSP module from the Backbone of the traditional YOLOv5 and replaces it with a Transformer Encoder module, as shown in fig. 3. The Transformer Encoder module integrates multiple fully connected layers and Multi-head Self-Attention layers, and some unnecessary network structures are removed compared with the traditional Transformer Encoder module. Multi-head Self-Attention is a variant of Self-Attention: by setting a reasonable number of heads n_h, it performs n_h linear operations, mapping to several different sub-space representation spaces and more comprehensively mining the information of different position features across these sub-spaces.
The Neck part of YOLOv5 uses the PANet method for feature aggregation, which is sufficient for ground-level detection to accurately detect targets across various scale ranges. However, to adapt to the small target sizes typical of aerial images, this embodiment adds a detection layer with a smaller stride to the original Neck of YOLOv5, as shown in fig. 3, so that extremely small targets in aerial images can be detected and detection robustness is improved.
In addition, in order to filter invalid features, a CBAM module is added to the Bottom-up Path Augmentation part of the Neck, as shown in fig. 3, so that the detector can locate targets more accurately and efficiently, improving its adaptability to various complex scenes;
CBAM is an attention mechanism module that combines the spatial and channel dimensions, considering both the features of different channels at the same location and the features of different locations within the same channel. It comprises a Channel Attention Module and a Spatial Attention Module; by jointly considering the two dimensions, the detector can effectively search the region of interest.
The Channel Attention can be expressed as:
F_latter = F_pre * sigmoid(MLP(AvgPool_c(F_pre)) + MLP(MaxPool_c(F_pre)))
The Spatial Attention can be expressed as:
F_latter = F_pre * sigmoid(Conv(Concat(AvgPool_s(F_pre), MaxPool_s(F_pre))))
in the formula, F_pre and F_latter respectively represent the feature map before and after processing, MLP and Conv respectively represent the fully connected layer and the convolutional layer, AvgPool_c and AvgPool_s represent the average pooling operations over the channel and spatial dimensions respectively, and MaxPool_c and MaxPool_s represent the maximum pooling operations over the channel and spatial dimensions respectively.
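A NumPy sketch of the two attention formulas above, assuming a (C, H, W) feature map. The 1x1 stand-in for the conv over the concatenated pooled maps is a simplification of this sketch (CBAM itself uses a larger kernel), and the weight shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam(F_pre, W1, W2, w_conv):
    """Sequential channel + spatial attention following the two formulas.

    F_pre: (C, H, W) feature map.  W1/W2: shared MLP weights with shapes
    (C, C//r) and (C//r, C).  w_conv: two scalars standing in for the conv
    over the concatenated pooled maps (a 1x1 conv, an assumption here).
    """
    C, H, W = F_pre.shape
    # Channel Attention: pool each channel, pass both stats through a shared MLP
    avg_c = F_pre.reshape(C, -1).mean(axis=1)          # (C,)
    max_c = F_pre.reshape(C, -1).max(axis=1)           # (C,)
    mlp = lambda v: np.maximum(v @ W1, 0) @ W2         # 2-layer MLP with ReLU
    Mc = sigmoid(mlp(avg_c) + mlp(max_c))              # channel weights in (0, 1)
    F_mid = F_pre * Mc[:, None, None]
    # Spatial Attention: pool across channels, then the 1x1 "conv" + sigmoid
    avg_s = F_mid.mean(axis=0)                         # (H, W)
    max_s = F_mid.max(axis=0)                          # (H, W)
    Ms = sigmoid(w_conv[0] * avg_s + w_conv[1] * max_s)
    return F_mid * Ms[None, :, :]
```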
In order to solve the problem of feature conflict between the classification task and the regression task in the target detection, the embodiment adopts a method of decomposing the Head part into two sub-networks to perform classification and regression, as shown in fig. 3, so that the accuracy of both classification and regression is effectively improved.
Due to the misalignment problem caused by the independence between the classification task and the regression task, the invention provides a feature alignment module to learn object-aware features for classification. The alignment module transforms the fixed sampling locations of the convolution kernel to align with the predicted bounding box, as shown in the OA.Conv module of fig. 3. Specifically, each location (d_x, d_y) in the classification map has a corresponding bounding box M = (m_x, m_y, m_w, m_h) predicted by the regression network. The regular sampling region of a standard 2D convolution with kernel size k is G = {(-[k/2], -[k/2]), ..., ([k/2], [k/2])}, and the sampling positions are converted from this fixed region into the prediction region M by a spatial transformation matrix T. The mathematical expression of the spatial transformation matrix is as follows:
T = {(m_x, m_y) - B} - {(d_x, d_y) - G}
In the formula, B = {(-[m_w/2], -[m_h/2]), ..., ([m_w/2], [m_h/2])} denotes the offsets of the new sampling positions relative to the center point; m_x, m_y, m_w and m_h respectively represent the abscissa, ordinate, width and height of the predicted bounding box M; d_x and d_y are respectively the abscissa and ordinate of each position in the classification map; and G denotes the regular sampling region.
Through the new sampling positions, the feature alignment module extracts object-aware features. Its mathematical expression is as follows:
F(u) = Σ_(g∈G, t∈T) w(g) · x(u + g + t)
in the formula, x represents input feature mapping, w represents convolution weight, u represents a position on a feature map, F represents an output object perception feature map, G represents a conventional sampling region of standard 2D convolution, G represents a conventional sampling position, T represents a spatial transformation matrix, and T represents a distance vector from an original sampling point to a new sampling point.
The transformation of the sampling positions adapts to changes in the predicted bounding box, so the extracted object-aware features are robust to changes in target scale. Furthermore, the object-aware features provide a global description of candidate objects, making the distinction between objects and background more reliable.
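To make the sampling-position transformation concrete, here is a hedged sketch that computes the per-tap offsets for one classification-map position. Spreading the k x k grid across the predicted box with spacing m_w/(k-1) and m_h/(k-1) is an assumption of this sketch; the patent does not spell out the grid spacing.

```python
import numpy as np

def alignment_offsets(box, pos, k=3):
    """Offsets moving a k x k conv's sampling grid from a classification-map
    position onto the regression branch's predicted box, in the spirit of
    T = {(m_x, m_y) - B} - {(d_x, d_y) - G}.

    box: (m_x, m_y, m_w, m_h) predicted bounding box.
    pos: (d_x, d_y) position in the classification map.
    """
    m_x, m_y, m_w, m_h = box
    d_x, d_y = pos
    r = k // 2
    # regular sampling grid G of a standard k x k convolution
    G = np.array([(i, j) for j in range(-r, r + 1) for i in range(-r, r + 1)], float)
    # grid B spanning the predicted box, relative to its center (assumed spacing)
    B = G * np.array([m_w / (k - 1), m_h / (k - 1)])
    new_pts = np.array([m_x, m_y]) + B   # sampling points laid over the box
    old_pts = np.array([d_x, d_y]) + G   # original fixed sampling points
    return new_pts - old_pts             # per-tap offset vectors t
```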
S3, training the aerial image rotating target detection model with the preprocessed training data to obtain an optimal detection model: anchors are designed separately for each category; sample labels are divided with a dynamic matching strategy during training; an angle component is added to the loss function; and for the regression component of the loss function, the SkewIoU and CIoU of the prediction box and the labeled box are integrated to construct a guided approximate skew loss.
In this embodiment, the TE-BottleneckCSP network is designed as the Backbone to mine the global information of the image, and a low-stride down-sampling detection head is added in the Neck design to better adapt to the tiny targets in aerial images, effectively improving the recall rate.
In step S3, training the aerial image rotation target detection model using the preprocessed training data to obtain an optimal detection model includes the following steps:
s31, designing a group of anchors for each category through k-means and a genetic algorithm according to the number of the categories of the preprocessed aerial images;
s32, setting a corresponding anchor according to the category of each gt, and performing label assignment on each gt by using a positive and negative sample division rule of YOLOv 5;
S33, continuously applying the dynamic matching strategy as training proceeds, so that each gt can obtain more high-quality matched anchors;
s34, calculating loss after positive and negative samples are defined, and then carrying out gradient updating;
S35, adopting an implicit supervision method: the SkewIoU between the real box and the predicted box is treated as a constant scalar and multiplied into the regression loss as a coefficient, integrating it into the target localization process and improving the localization ability of the network. The mathematical expression of the regression component is as follows:
L_loc = (1/N) Σ_{i=1}^{N} |SI_i| · (1 − CI_i)
in the formula, N represents the number of positive samples, CI_i the complete IoU between the i-th predicted box and its real box, SI_i their skew IoU (oblique intersection-over-union), and |·| denotes treating the value as a constant scalar that is excluded from gradient computation.
And S36, adding an angle prediction loss function.
In this embodiment, after the image outputs a plurality of feature maps of different sizes through the network, label assignment is performed on each feature map.
A group of anchors is designed for each target category using k-means and a genetic algorithm.
After the anchor set for each category is determined, each ground truth (gt) is mapped onto every feature map and positive and negative samples are matched for it following the YOLOv5 matching strategy; as each epoch proceeds, the anchors matched to each gt are dynamically adjusted, so that difficult samples can still be matched with high-quality anchors, reducing the difficulty of training.
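As a sketch of the per-category anchor design above, the snippet below runs k-means on the (w, h) sizes of each category's boxes, using the IoU of co-centered boxes as the similarity measure — the usual YOLO anchor recipe — while omitting the genetic-algorithm refinement. The function names and the co-centered IoU distance are illustrative assumptions, not the patent's exact procedure.

```python
import numpy as np

def kmeans_anchors(wh, k=3, iters=50, seed=0):
    """k-means on (w, h) pairs using IoU of co-centered boxes as similarity
    (higher IoU = closer), as commonly done for YOLO anchor design."""
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(iters):
        inter = (np.minimum(wh[:, None, 0], centers[None, :, 0])
                 * np.minimum(wh[:, None, 1], centers[None, :, 1]))
        union = (wh[:, None, 0] * wh[:, None, 1]
                 + centers[None, :, 0] * centers[None, :, 1] - inter)
        assign = np.argmax(inter / union, axis=1)  # nearest = highest IoU
        for j in range(k):
            if np.any(assign == j):
                centers[j] = wh[assign == j].mean(axis=0)
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]  # small to large

def anchors_per_class(boxes_by_class, k=3):
    """One anchor set per category, mirroring 'a group of anchors for each
    category' in step S31."""
    return {c: kmeans_anchors(wh, k) for c, wh in boxes_by_class.items()}
```

Given, say, a "car" category whose boxes fall into a small and a large size cluster, the two returned anchors land near the two cluster means, ordered by area.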
After matching is completed, the localization loss of the positive samples, the angle loss of the positive samples, the classification loss of the positive and negative samples, and the confidence loss are calculated for back-propagation and gradient updating. The mathematical expression of the loss function is as follows:
L_total = L_cls + L_loc + L_conf + L_angle
wherein the classification loss L_cls, the confidence loss L_conf and the angle loss L_angle use BCELoss, while the localization loss L_loc integrates the CIoU loss and the SkewIoU; the non-differentiability of the SkewIoU is resolved by the implicit supervision method, and since many objects with large aspect ratios are highly sensitive to the SkewIoU, the localization ability is effectively improved.
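A minimal numeric sketch of this composite loss, under the assumption (taken from the description above) that the SkewIoU is detached into a constant coefficient multiplying the CIoU-based regression term (1 − CI); the exact weighting in the patent may differ, and the function names are illustrative.

```python
import numpy as np

def bce(p, t, eps=1e-7):
    """Binary cross-entropy, standing in for the BCELoss used by the
    classification, confidence and angle terms."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    t = np.asarray(t, dtype=float)
    return float(-(t * np.log(p) + (1 - t) * np.log(1 - p)).mean())

def loc_loss(ciou, skew_iou):
    """L_loc: the CIoU regression term (1 - CI) weighted by the SkewIoU
    treated as a constant scalar (the '|.|' in the formula); in a real
    framework gradients would flow through the CIoU term only."""
    si = np.asarray(skew_iou, dtype=float)  # would be .detach()-ed in PyTorch
    ci = np.asarray(ciou, dtype=float)
    return float(np.mean(si * (1.0 - ci)))

def total_loss(cls_p, cls_t, conf_p, conf_t, ang_p, ang_t, ciou, skew_iou):
    """L_total = L_cls + L_loc + L_conf + L_angle."""
    return (bce(cls_p, cls_t) + loc_loss(ciou, skew_iou)
            + bce(conf_p, conf_t) + bce(ang_p, ang_t))
```

Perfectly localized boxes (CI = SI = 1) contribute zero localization loss, and the three BCE terms dominate.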
And S4, inputting the aerial image to be detected into the optimal detection model and outputting a detection result, wherein the detection result includes object category, coordinate position and confidence. The YOLOv5 matching strategy is optimized: a different group of anchors is designed for each category to realize more accurate matching, and the dynamic matching strategy applied during training lets each gt obtain more high-quality matched anchors, effectively improving the recall rate.
S5, running inference on the aerial image to be detected to obtain an inference result, and then post-processing the inference result. The localization component of the loss function is improved according to the arbitrary-angle characteristic of the bounding box: integrating the CIoU and the SkewIoU effectively solves the non-differentiability of the SkewIoU term, achieving a more accurate localization effect.
Referring to fig. 2, in step S5, the process of reasoning the aerial image to be detected to obtain a reasoning result, and then performing post-processing on the reasoning result includes the following steps:
s51, setting a website and a port number, and transmitting the aerial image to be inferred to a browser through a client, wherein the aerial image to be detected is the aerial image to be inferred;
s52, capturing an aerial image to be inferred from the browser by the server, and then sending the aerial image to a trained optimal detection model for feature extraction, target category judgment and target position positioning to obtain a detection result, namely the detection result in the step S4;
s53, sending the detection result back to the browser through a JSON format, and obtaining a final detection result from the browser through the client;
s54, judging whether the inference image is cut and then sent to a network according to the final detection result;
if the cutting is not performed, directly visualizing the detection result;
if the image was cropped, the position of each target in the original uncropped image is restored using the target positions detected on the cropped sub-images together with the cropping information in the image name, and the restored result is then output.
In this embodiment, after training is completed, the model with the highest accuracy on the validation set is saved for subsequent inference;
a Web framework is built with Flask; the client encodes the image to be inferred and sends it through the configured web address and port number, and the server forwards the received image to the model.
Forward inference is then performed, the inference result is converted to JSON format, and the server returns it through the corresponding web address and port number; the client then obtains the detection information, including object category, coordinate position and confidence.
After forward inference on the test image is finished and the output result is obtained, it is judged whether the inference image was cropped before being fed to the network; if not, the detection result is visualized directly; if it was cropped, the target positions detected on the cropped sub-images, together with the cropping information in the image name, are used to restore the positions of the targets in the original uncropped image.
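The restoration in the cropped case might look like the sketch below. The file-name convention '<stem>__<x0>__<y0>.<ext>' is a hypothetical stand-in — the patent only states that the cropping information is recorded in the image name — and boxes are taken as (x_c, y_c, w, h, θ), so only the center needs shifting by the crop offset.

```python
import re

def restore_boxes(image_name, boxes):
    """Map detections from a cropped tile back to the full image. The crop
    offset is assumed (hypothetically) to be encoded in the file name as
    '<stem>__<x0>__<y0>.<ext>'; each box is (xc, yc, w, h, theta)."""
    m = re.match(r".+__(\d+)__(\d+)\.\w+$", image_name)
    if m is None:
        return list(boxes)          # not cropped: visualize directly
    x0, y0 = int(m.group(1)), int(m.group(2))
    # translation only moves the box center; size and angle are unchanged
    return [(xc + x0, yc + y0, w, h, th) for (xc, yc, w, h, th) in boxes]
```

For example, under this assumed convention a detection centered at (10, 20) on tile "scene__512__256.png" maps back to (522, 276) in the full image.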
Corresponding to the detection method provided in the foregoing embodiment, this embodiment further provides a system of the detection method, and since the detection system provided in this embodiment corresponds to the detection method provided in the foregoing embodiment, the implementation of the foregoing detection method is also applicable to the detection system provided in this embodiment, and will not be described in detail in this embodiment.
Referring to fig. 4, a block diagram of a detection system provided in this embodiment is shown, where the detection system includes:
the preprocessing module 10, wherein the preprocessing module 10 is used for acquiring high-altitude images shot by the unmanned aerial vehicle as training data and preprocessing the training data;
the detection model building module 20 is used for building an aerial image rotating target detection model by adopting a YOLOv5 network;
the optimal detection model training module 30, the optimal detection model training module 30 is used for training the aerial image rotating target detection model by using the preprocessed training data to obtain an optimal detection model;
the detection result output module 40 is used for inputting the aerial image to be detected into the optimal detection model and outputting a detection result, wherein the detection result comprises an object type, a coordinate position and a confidence coefficient;
and the post-processing module 50 is used for reasoning the aerial image to be detected to obtain a reasoning result, and then performing post-processing on the reasoning result.
It should be noted that, in the system provided in the foregoing embodiment, when the functions of the system are implemented, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the system and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
The present invention has been described in detail with reference to the foregoing embodiments, and the principles and embodiments of the present invention have been described herein with reference to specific examples, which are provided only to assist understanding of the methods and core concepts of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (8)

1. A method for detecting a rotating target of an aerial image is characterized by comprising the following steps:
s1, acquiring a high-altitude image shot by an unmanned aerial vehicle as training data, and preprocessing the training data;
s2, building an aerial image rotating target detection model by adopting a YOLOv5 network;
s3, training the aerial image rotating target detection model by using the preprocessed training data to obtain an optimal detection model;
s4, inputting the aerial image to be detected into the optimal detection model and outputting a detection result, wherein the detection result comprises an object type, a coordinate position and a confidence coefficient;
and S5, reasoning the aerial image to be detected to obtain a reasoning result, and then carrying out post-processing on the reasoning result.
2. The detection method according to claim 1, characterized in that: in step S1, the preprocessing of the training data includes the steps of:
s11, labeling four vertexes of the targets in the aerial image by using labelme labeling software, wherein the four vertexes are arranged clockwise, and the category and the detection difficulty of each target are simultaneously marked;
s12, converting the quadrangle determined by the four vertexes of the marked target into a minimum circumscribed rectangle to obtain coordinates of the four vertexes of the minimum circumscribed rectangle;
s13, cutting the aerial image into a fixed size through the set size and gap, and recording cutting information;
S14, adopting the long-edge definition method to convert the coordinates of the four vertices of the minimum circumscribed rectangle from the vertex format {(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)} into (x_c, y_c, w, h, θ);
s15, encoding the angle information theta into a CSL format;
and S16, performing data enhancement on the training data cut into the aerial images with fixed sizes.
3. The detection method according to claim 1, characterized in that: in step S2, the method for building an aerial image rotation target detection model by using the YOLOv5 network includes the following steps:
S21, constructing a TE-BottleneckCSP network as the Backbone to extract features;
S22, adding a CBAM attention module to the Bottom-up Path Augmentation part of the Neck;
S23, adding a detection branch in the Neck, where a small-stride feature map is added to predict tiny targets, addressing the problem that targets in unmanned aerial vehicle images shot at high-altitude depression angles are too small;
s24, adding an angle component in the Head part;
s25, decoupling a position prediction task and a category prediction task in the Head part, and respectively carrying out parallel processing by using two sub-networks;
and S26, adjusting the sampling position of the classification network convolution kernel by using the prediction result of the regression branch, realizing feature alignment, and finally obtaining the aerial image rotating target detection model.
4. The detection method according to claim 1, characterized in that: in step S3, the training of the aerial image rotation target detection model by using the preprocessed training data to obtain an optimal detection model includes the following steps:
s31, designing a group of anchors for each category of the preprocessed aerial images through k-means and a genetic algorithm according to the number of the categories;
s32, setting a corresponding anchor according to the category of each gt, and performing label assignment on each gt by using a positive and negative sample division rule of YOLOv 5;
s33, continuously implementing a dynamic matching strategy along with the training, so that more high-quality compensation anchors can be obtained;
s34, after positive and negative samples are defined, loss begins to be calculated, and then gradient updating is carried out;
s35, adopting an implicit supervision method to set the SkewIoU of the real frame and the prediction frame as a scalar, multiplying the scalar with regression loss as a coefficient, and integrating the scalar with the regression loss into the positioning process of the target;
and S36, adding an angle prediction loss function.
5. The detection method according to claim 1, characterized in that: in step S5, the process of reasoning the aerial image to be detected to obtain a reasoning result, and then performing post-processing on the reasoning result includes the following steps:
s51, setting a website and a port number, and transmitting the aerial image to be inferred to a browser through a client, wherein the aerial image to be detected is the aerial image to be inferred;
s52, capturing an aerial image to be inferred from the browser by the server, and then sending the aerial image to a trained optimal detection model for feature extraction, target category judgment and target position positioning to obtain a detection result, namely the detection result in the step S4;
s53, sending the detection result back to the browser through a JSON format, and obtaining a final detection result from the browser through the client;
s54, judging whether the inference image is cut and then sent to a network according to the final detection result;
if the cutting is not performed, directly visualizing the detection result;
if the image is cut, the position of the target in the original non-divided image is restored by using the target position information detected by the divided image and the cutting information in the image name, and then the target is sent to the network.
6. The detection method according to claim 2, characterized in that:
the mathematical expression for CSL is:
CSL(x) = { g(x),  θ − r < x < θ + r
         { 0,     otherwise
in the above equation, g(x) is the window function, r is the window radius, θ is the angle of the bounding box, and x is an abscissa within the window range; the window function must satisfy periodicity, symmetry, monotonicity, and the maximum value g(θ) = 1.
7. The detection method according to claim 4, characterized in that:
the mathematical expression for the regression components is:
L_loc = (1/N) Σ_{i=1}^{N} |SI_i| · (1 − CI_i)
in the above equation, N represents the number of positive samples, CI_i the complete IoU between the i-th predicted box and its real box, SI_i their skew IoU, and |·| denotes treating the value as a constant scalar.
8. A system for implementing the detection method according to any one of the preceding claims 1 to 7, characterized in that the detection system comprises:
the preprocessing module (10), wherein the preprocessing module (10) is used for acquiring high-altitude images shot by the unmanned aerial vehicle as training data and preprocessing the training data;
the detection model building module (20), the detection model building module (20) is used for building an aerial image rotating target detection model by adopting a YOLOv5 network;
the optimal detection model training module (30), the optimal detection model training module (30) is used for training the aerial image rotating target detection model by using the preprocessed training data to obtain an optimal detection model;
the detection result output module (40) is used for inputting the aerial image to be detected into the optimal detection model and outputting a detection result, and the detection result comprises an object type, a coordinate position and a confidence coefficient;
the post-processing module (50) is used for reasoning the aerial images to be detected to obtain a reasoning result, and then post-processing the reasoning result.
CN202211371137.9A 2022-11-03 2022-11-03 Method and system for detecting aerial image rotating target Pending CN115690627A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211371137.9A CN115690627A (en) 2022-11-03 2022-11-03 Method and system for detecting aerial image rotating target

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211371137.9A CN115690627A (en) 2022-11-03 2022-11-03 Method and system for detecting aerial image rotating target

Publications (1)

Publication Number Publication Date
CN115690627A true CN115690627A (en) 2023-02-03

Family

ID=85047317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211371137.9A Pending CN115690627A (en) 2022-11-03 2022-11-03 Method and system for detecting aerial image rotating target

Country Status (1)

Country Link
CN (1) CN115690627A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861904A (en) * 2023-02-23 2023-03-28 青岛创新奇智科技集团股份有限公司 Method and system for generating slag car roof fall detection model
CN116206094A (en) * 2023-04-28 2023-06-02 尚特杰电力科技有限公司 Fan blade angle measuring method, device and system and electronic equipment
CN116681983A (en) * 2023-06-02 2023-09-01 中国矿业大学 Long and narrow target detection method based on deep learning
CN116681983B (en) * 2023-06-02 2024-06-11 中国矿业大学 Long and narrow target detection method based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination