CN113688830B - Deep learning target detection method based on center point regression - Google Patents


Info

Publication number
CN113688830B
Authority
CN
China
Prior art keywords
detection
centernet
training
target detection
module
Prior art date
Legal status
Active
Application number
CN202110930245.4A
Other languages
Chinese (zh)
Other versions
CN113688830A (en)
Inventor
李婕
周顺
王恩果
李毅
巩朋成
张正文
朱鑫潮
Current Assignee
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date
Filing date
Publication date
Application filed by Hubei University of Technology filed Critical Hubei University of Technology
Priority to CN202110930245.4A priority Critical patent/CN113688830B/en
Publication of CN113688830A publication Critical patent/CN113688830A/en
Application granted granted Critical
Publication of CN113688830B publication Critical patent/CN113688830B/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a deep learning target detection method based on center point regression, which comprises the following steps: a horizontal connection module is introduced into the original CenterNet network structure to correlate features of different layers, fusing deep and shallow features to improve small-target detection performance; a channel attention module is introduced into the horizontal connection module to adaptively recalibrate the feature responses of different channels, improving the feature extraction capability of the network; finally, comparison experiments are carried out on the UCAS-AOD and RSOD public remote sensing datasets. The method achieves higher detection accuracy in aircraft target detection in remote sensing images while maintaining the speed advantage of a single-stage detection model, and has practical value.

Description

Deep learning target detection method based on center point regression
Technical Field
The invention relates to the technical field of target detection, in particular to a deep learning target detection method based on center point regression.
Background
Remote sensing images are captured by satellites and are characterized by spatial resolution, temporal resolution, spectral resolution, and the like. Target detection in remote sensing images has important civil and military significance and application value; in particular, aircraft target detection in remote sensing images can provide valuable information for more efficient management of civil aviation and military operations. Unlike conventional planar images, aircraft target detection in remote sensing images faces difficulties such as multiple scales, complex backgrounds, and large picture memory footprints.
With the rapid development of deep learning, target detection methods based on convolutional neural networks (CNNs) have become the mainstream approach for processing and recognizing remote sensing images. Currently, deep learning target detection algorithms can be divided into two categories: anchor-based methods and anchor-free methods. Most anchor-based methods can be further classified as single-stage or two-stage. Single-stage methods such as SSD and YOLO complete object classification and regression in one stage and have the advantage of very high detection speed. Two-stage methods, such as R-CNN and Fast R-CNN, can often achieve more accurate detection results by introducing a region proposal network. In recent years much effort has gone into improving the accuracy and efficiency of anchor-based methods, and they are gradually maturing. However, anchor-based methods depend heavily on the numbers of positive and negative samples and on anchor hyperparameters such as size, aspect ratio, and number of anchor points; there is no effective method to adjust these hyperparameters automatically, so they must be calibrated manually one by one, which limits their applicability. In addition, anchor-based detection methods introduce a non-maximum suppression (NMS) algorithm to eliminate duplicate target boxes, which increases algorithm complexity and computation, making the final detection speed correspondingly slower and real-time performance poorer.
To increase the flexibility of detectors, anchor-free methods have been developed and have received great attention. Anchor-free methods no longer depend on preset anchors; they perform feature extraction and detection directly on key points or dense regions of the input image and can adapt to various objects through regression. For example, CornerNet uses a single convolutional neural network to predict top-left and bottom-right corner heatmaps for all instances of each object class, together with an embedding vector for each detected corner; the associative embedding method proposed by Newell et al. is then used to match and group pairs of corner points belonging to the same object. ExtremeNet predicts four multi-peak heatmaps per object category to detect an object's four extreme points and center point, groups the extreme points into objects geometrically, and thus obtains the final result. CenterNet improves on ideas from CornerNet: it performs object detection by directly predicting the target center point and obtains other object attributes such as size, 3D position, orientation, and even pose through regression; neither NMS nor an RPN is needed in training and testing, making the detector simpler, faster, and more accurate than bounding-box-based detectors and truly end-to-end. Liu et al. first attempted to solve remote sensing image target detection with the anchor-free CenterNet method and evaluated the performance of CenterNet with each backbone network on the NWPU VHR-10 remote sensing dataset. Zhang et al. proposed a feature-enhanced center point network that introduces horizontal connections between different layers, combining deep and shallow features to improve small-target detection accuracy in remote sensing images, though its detection speed needs improvement.
The literature also provides a single-stage anchor-free target detection model that integrates multiple prediction branches into one branch using an hourglass backbone extraction network; it has semantic feature representation capability similar to a model combining ResNet and FPN structures, saves memory, and maintains small-target detection accuracy, but its feature extraction efficiency is low. These methods offer useful ideas for applying classical anchor-free detectors directly to remote sensing image target detection, but the balance between detection speed and accuracy, strained by complex backgrounds, small targets, and varied object shapes in remote sensing images, remains to be improved.
Disclosure of Invention
The invention aims to provide a deep learning target detection method based on center point regression that solves the problems of high false detection rates and difficulty in detecting small targets when the original CenterNet algorithm is applied to remote sensing images.
The technical scheme of the invention is as follows:
A deep learning target detection method based on center point regression comprises the following steps:
A horizontal connection module is introduced into the original CenterNet network structure to correlate features of different layers, fusing deep and shallow features to improve small-target detection performance;
A channel attention module is introduced into the horizontal connection module to adaptively recalibrate the feature responses of different channels, improving the feature extraction capability of the network;
finally, comparison experiments are carried out on the UCAS-AOD and RSOD public remote sensing datasets.
The horizontal connection module is a "Feature Fusion" module with two variants, C-CenterNet and T-CenterNet: C-CenterNet aligns the feature layers before fusion through a standard 1×1 convolution, while T-CenterNet replaces the standard convolution in C-CenterNet with a dilated convolution for testing. Because the feature values in different layers have different scales, the convolution is followed by batch normalization and ReLU activation.
The channel attention module is the squeeze-and-excitation attention module SE-Net. It first performs a Squeeze operation on the H×W feature maps with C input channels to obtain a 1×1×C descriptor, then performs an Excitation operation on the result to obtain the weight of each channel, and finally multiplies the original feature map by the corresponding channel weights through a Scale operation to obtain a new feature map, enhancing channels that contain effective information and suppressing channels that contain useless information.
Comparison experiments are performed on the UCAS-AOD and RSOD public remote sensing datasets. During the experiments, pictures are randomly selected from the dataset samples as the training set, keeping the training-to-testing ratio at 9:1. The downsampling rate R is taken as 4, an Adam optimizer is used for iterative training, and input images are uniformly scaled to a resolution of 512×512. The initial learning rate of 1e-3 is reduced by a factor of 10 after 50 epochs; the batch_size is set to 4 and a further 50 epochs are trained. In addition, to accelerate convergence, the pre-training weights obtained on the ImageNet classification task are used for the ResNet-50-based backbone during training.
To verify the detection performance of the deep learning target detection method based on center point regression, different detection networks, including single-stage and two-stage target detection algorithms, are trained and compared on the same experimental platform and training dataset to obtain the test results.
Compared with the prior art, the invention has the following beneficial effects: the method solves the problems of high false detection rates and difficulty in detecting small targets when the original CenterNet algorithm is applied to remote sensing images; it achieves higher detection accuracy in aircraft target detection in remote sensing images while maintaining the speed advantage of a single-stage detection model, and has practical value.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the improved CenterNet network architecture of the present invention.
Fig. 3 is a schematic diagram of a network structure of a horizontal connection module according to the present invention.
Fig. 4 is a schematic view of the squeeze-and-excitation attention module of the present invention.
Fig. 5 is a schematic diagram of a network structure of horizontal connection modules after the channel attention module of the present invention is introduced.
Fig. 6 is a comparison of detection results before and after the attention mechanism is introduced.
FIG. 7 is a graph of the total Loss variation trained on UCAS-AOD datasets.
Fig. 8 is a graph comparing PR curves before and after improvement on the UCAS-AOD dataset.
FIG. 9 is a graph of the total Loss variation for training on the RSOD data set.
Fig. 10 is a graph comparing PR curves before and after improvement for the RSOD dataset.
FIG. 11 is a visual comparison of the results before and after improvement.
Fig. 12 is a comparison with the other algorithms.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
CenterNet is a new anchor-free, end-to-end target detection algorithm. The center of an object's bounding box is represented by a point, and other attributes, such as the object's size, orientation, and pose, are regressed directly from the image features around the center position, turning target detection into a standard keypoint estimation problem. The specific steps are as follows. An image I ∈ R^(W×H×3) is input, with width W and height H. A convolutional neural network generates a heatmap Ŷ ∈ [0,1]^(W/R × H/R × C), where R is the output stride (R = 4 by default) and C is the number of keypoint categories. Ŷ_xyc = 1 represents a detected keypoint, and Ŷ_xyc = 0 represents background. During the training phase, the ground-truth keypoints are splatted onto the heatmap Y through the Gaussian kernel of equation (1):

Y_xyc = exp(−((x − p̃_x)² + (y − p̃_y)²) / (2σ_p²))   (1)

where p̃ = (p̃_x, p̃_y) is the downsampled ground-truth center point, x and y are heatmap coordinates, and σ_p is an object-size-adaptive standard deviation.
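As a minimal sketch of how equation (1) can be realised, the snippet below splats two ground-truth points onto a single-class heatmap; the heatmap size (128×128, i.e. a 512 input with R = 4) and the σ values are illustrative assumptions, and overlapping Gaussians keep the element-wise maximum as the text describes:

```python
import numpy as np

def draw_gaussian(heatmap, center, sigma):
    """Splat one ground-truth keypoint onto the heatmap with the Gaussian
    kernel of equation (1); overlapping peaks keep the element-wise max."""
    h, w = heatmap.shape
    cx, cy = center                      # downsampled ground-truth point (p~_x, p~_y)
    ys, xs = np.mgrid[0:h, 0:w]          # every heatmap coordinate (y, x)
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)  # max, not sum, when peaks overlap
    return heatmap

# 128x128 heatmap for one class, two objects at assumed positions
hm = np.zeros((128, 128), dtype=np.float32)
draw_gaussian(hm, center=(30, 40), sigma=2.0)
draw_gaussian(hm, center=(90, 70), sigma=3.0)
```

The kernel evaluates to exactly 1 at each center, so the ground truth Y_xyc = 1 only at the keypoints themselves and decays smoothly around them.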
If two Gaussian distributions of the same class overlap, the element-wise maximum is taken. Zhou et al. first predict the heatmap Ŷ using Hourglass, ResNet, DLA, or similar backbone networks. The feature map generated by the backbone is then fed into a detection module consisting of three branches: heatmap, width-height, and offset prediction. Each branch comprises a 3×3 and a 1×1 convolution layer. The number of output channels of the heatmap branch equals the number of categories in the dataset (for example, 20 for the VOC dataset, so 20 heatmap output channels), while the width-height and offset branches each output 2 channels, representing the width and height of the target at the predicted center point and the offsets of the center point's horizontal and vertical coordinates. Finally, the relevant information is extracted from the heatmap output by the network to obtain the detection result for the input image.
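The three-branch detection module just described can be sketched in PyTorch as follows; the backbone channel width (64) and the ReLU between the two convolutions are assumptions, while the output channel counts (num_classes, 2, 2) follow the text:

```python
import torch
import torch.nn as nn

class CenterNetHead(nn.Module):
    """Three prediction branches, each a 3x3 conv followed by a 1x1 conv."""
    def __init__(self, in_ch=64, num_classes=20):
        super().__init__()
        def branch(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, out_ch, 1))
        self.heatmap = branch(num_classes)  # one channel per category
        self.wh = branch(2)                 # width and height of the target box
        self.offset = branch(2)             # x/y offset of the center point

    def forward(self, x):
        return torch.sigmoid(self.heatmap(x)), self.wh(x), self.offset(x)

head = CenterNetHead(in_ch=64, num_classes=20)
hm, wh, off = head(torch.randn(1, 64, 128, 128))
```

With a 512×512 input and R = 4, all three branches predict on the 128×128 output grid.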
CenterNet has many advantages in remote sensing aircraft target detection. First, it needs no manually set threshold to separate foreground from background, reducing the requirements on positive and negative samples of the dataset. Second, since each target yields only one center point during detection, the computationally intensive and time-consuming non-maximum suppression (NMS) is not needed, improving detection speed. At the same time, the difficulty that an anchor mechanism brings to small-target and dense-target detection is avoided. However, because of the shooting height and angle of actual remote sensing images, the images contain large numbers of small objects only a few pixels in size, which poses a great challenge to the CenterNet algorithm: applied directly to remote sensing image detection, it is hard to achieve a breakthrough in accuracy. The CenterNet network structure therefore needs to be improved so that it better suits detection on remote sensing image datasets and achieves higher detection accuracy.
To make the CenterNet algorithm better suited to aircraft target detection in remote sensing images and to address the low small-target detection accuracy and high false detection rate in such images, the invention proposes a multi-scale channel attention detection method based on the CenterNet algorithm. The network structure is shown in Fig. 2; compared with the original CenterNet network structure, it introduces the expansion structure shown in the figure and is improved mainly in the following 2 aspects:
(1) To improve small-target detection performance, a horizontal connection module is introduced to correlate features of different layers, fusing deep and shallow features; this effectively combines the strong semantic information of deep features with the strong position and texture information of shallow features, which benefits small-target detection.
(2) To improve detection accuracy, a channel attention module is added in the horizontal connection module, adaptively recalibrating the feature responses of different channels and improving the feature extraction capability of the network.
The center point network adopts an encoder-decoder structure and learns high-level semantic information through successive convolution operations. However, targets in remote sensing images are small and dense, and a series of convolutions can weaken the features of small targets, leading to problems such as missed and false detections. To improve the feature representation of small objects, the invention introduces a horizontal connection module that fuses the features of a given feature layer with those of a higher-layer feature map. As shown in Fig. 2, in the network structure of the CenterNet algorithm the invention fuses the Conv1 and Conv7 layers, the Conv2 and Conv6 layers, and the Conv3 and Conv5 layers respectively. Because these feature layers have different spatial sizes, they are processed by a "Feature Fusion" module before fusion. For this module, the invention designed the two structures shown in Fig. 3 for experiments, named C-CenterNet and T-CenterNet. C-CenterNet aligns the feature layers before fusion through a standard 1×1 convolution, while T-CenterNet replaces the standard convolution in C-CenterNet with a dilated convolution for testing. Since the feature values in different layers have different scales, the convolution is followed by batch normalization and ReLU activation.
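A minimal PyTorch sketch of the two "Feature Fusion" variants; the element-wise addition, the nearest-neighbour upsampling of the deeper map, and the channel counts are assumptions not fixed by the text, which only specifies the 1×1 standard convolution (C-CenterNet) versus the dilated convolution (T-CenterNet) followed by batch normalization and ReLU:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Lateral connection: project the shallow feature with a 1x1 standard
    convolution (C-CenterNet) or a 3x3 dilated convolution (T-CenterNet),
    apply BatchNorm + ReLU, then fuse with the upsampled deep feature."""
    def __init__(self, shallow_ch, deep_ch, dilated=False):
        super().__init__()
        if dilated:   # T-CenterNet variant
            conv = nn.Conv2d(shallow_ch, deep_ch, 3, padding=2, dilation=2)
        else:         # C-CenterNet variant
            conv = nn.Conv2d(shallow_ch, deep_ch, 1)
        self.project = nn.Sequential(conv, nn.BatchNorm2d(deep_ch),
                                     nn.ReLU(inplace=True))

    def forward(self, shallow, deep):
        # bring the deeper, smaller map up to the shallow map's size
        deep = F.interpolate(deep, size=shallow.shape[2:], mode='nearest')
        return self.project(shallow) + deep

# e.g. fuse a Conv2-level map (256 ch, 128x128) with a Conv6-level map (64 ch, 64x64)
fuse = FeatureFusion(shallow_ch=256, deep_ch=64, dilated=True)
out = fuse(torch.randn(1, 256, 128, 128), torch.randn(1, 64, 64, 64))
```

The dilated 3×3 convolution (padding 2, dilation 2) preserves spatial size while enlarging the receptive field, which is the usual motivation for the T-CenterNet variant.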
The attention mechanism focuses on local information in the image, locating information of interest and suppressing useless information. To make the model focus more on channels carrying valid information, the invention introduces a squeeze-and-excitation attention module (SE-Net) into the horizontal fusion module. As shown in Fig. 4, the module first performs a Squeeze operation on the H×W feature maps with C channels to obtain a 1×1×C descriptor, corresponding to the global pooling operation (Global Pooling) in Fig. 4. An Excitation operation, corresponding to the two fully connected layers and the Sigmoid in Fig. 4, is then applied to the result to obtain the weight of each channel. Finally, a Scale operation multiplies the original feature map by the corresponding channel weights to obtain a new feature map, enhancing channels that contain effective information and suppressing channels that contain useless information.
The "Feature Fusion" structure after introducing the channel attention mechanism is shown in Fig. 5. Fig. 6 compares detection results before and after attention is introduced: as the arrow marks in the figure show, without the attention mechanism the network produces some false detections, and after the attention mechanism is introduced the false detection rate drops, improving detection accuracy.
The loss function of the algorithm consists of the center point prediction loss L_k, the offset loss L_off, and the width-height loss L_size.
When predicting heatmap center points, CenterNet generates many candidate center points, while each target's true center point is unique, resulting in an excessive ratio of negative to positive samples. The authors therefore use the pixel-level logistic-regression focal loss, modified for CenterNet, to address the imbalance between positive and negative samples, shown in equation (2):

L_k = −(1/N) Σ_xyc { (1 − Ŷ_xyc)^α · log(Ŷ_xyc),                     if Y_xyc = 1
                      (1 − Y_xyc)^β · (Ŷ_xyc)^α · log(1 − Ŷ_xyc),    otherwise }   (2)
where α and β are hyperparameters of the focal loss, taken as 2 and 4 respectively in the experiments, and N is the number of keypoints in the image, serving mainly to normalize all the focal-loss terms. Ŷ_xyc is the predicted value and Y_xyc the ground-truth label. When Y_xyc = 1: for easily distinguished samples, the prediction Ŷ_xyc is close to 1, so (1 − Ŷ_xyc)^α is close to 0 and the contribution to L_k is small; conversely, for hard samples the prediction Ŷ_xyc is close to 0, (1 − Ŷ_xyc)^α is large, and the contribution to L_k is large. When Y_xyc ≠ 1, the prediction Ŷ_xyc should theoretically be 0: if it is large, (Ŷ_xyc)^α grows and acts as a penalty; if it is close to 0, (Ŷ_xyc)^α is small, reducing the weight of that loss term. The factor (1 − Y_xyc)^β, applied when Y_xyc ≠ 1, down-weights the negative-sample loss around the center point.
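A NumPy sketch of equation (2) with α = 2 and β = 4; the two toy predictions below (one confident and correct, one that misses the center) are illustrative assumptions:

```python
import numpy as np

def center_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-12):
    """Pixel-wise focal loss of equation (2). pred is the sigmoid heatmap
    Y^, gt is the Gaussian-splatted ground truth Y."""
    pos = gt == 1                                  # exact center points
    n = max(pos.sum(), 1)                          # N keypoints, for normalisation
    pos_loss = ((1 - pred[pos]) ** alpha * np.log(pred[pos] + eps)).sum()
    neg = ~pos
    neg_loss = ((1 - gt[neg]) ** beta * pred[neg] ** alpha
                * np.log(1 - pred[neg] + eps)).sum()
    return -(pos_loss + neg_loss) / n

gt = np.zeros((128, 128)); gt[40, 30] = 1.0
good = np.full((128, 128), 1e-4); good[40, 30] = 0.99   # confident, correct
bad = np.full((128, 128), 1e-4); bad[40, 30] = 0.10     # misses the center
```

A prediction that is confident at the true center incurs a far smaller loss than one that is not, which is exactly the behaviour the paragraph above describes.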
Because the feature map produced by the backbone extraction network has one quarter the resolution of the input image, each pixel of the output feature map corresponds to a 4×4 region of the original image, which introduces a large quantization error. The authors therefore introduce a center-point offset value Ô and train it with an L1 loss, shown in equation (3):

L_off = (1/N) Σ_p | Ô_p̃ − (p/R − p̃) |   (3)
where N is the number of keypoints in the image, p is the center point of the target box, R is the downsampling factor (taken as 4 in the invention), and Ô_p̃ is the predicted offset value.
After predicting all center points, CenterNet regresses the size s_k of each object k. To reduce the computational burden, a single size prediction Ŝ ∈ R^(W/R × H/R × 2) is shared by all target classes, and the width-height loss L_size is trained with an L1 loss, as shown in equation (4):

L_size = (1/N) Σ_{k=1}^{N} | Ŝ_{p_k} − s_k |   (4)
The total loss L_det is obtained by a weighted sum of the above branch losses and satisfies the relationship shown in equation (5):

L_det = L_k + λ_off · L_off + λ_size · L_size   (5)
where the weights λ_off, λ_size, and λ_m are 1, 0.1, and 0.3, respectively.
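Equations (3)-(5) can be combined in a short NumPy sketch, using λ_off = 1 and λ_size = 0.1 from the text; passing per-object predictions in directly, and the single object below, are illustrative assumptions:

```python
import numpy as np

def l1(pred, target):
    return np.abs(pred - target).sum()

def total_loss(l_k, centers, off_pred, wh_pred, sizes, R=4,
               lam_off=1.0, lam_size=0.1):
    """Equations (3)-(5): L1 offset and width-height losses averaged over
    the N keypoints, combined with the weights given in the text."""
    n = len(centers)
    l_off = 0.0
    l_size = 0.0
    for k, p in enumerate(centers):
        p = np.asarray(p, dtype=np.float64)
        p_tilde = np.floor(p / R)                  # quantised low-resolution point
        l_off += l1(off_pred[k], p / R - p_tilde)  # eq. (3): sub-pixel offset target
        l_size += l1(wh_pred[k], sizes[k])         # eq. (4): box width and height
    return l_k + lam_off * l_off / n + lam_size * l_size / n

# one object: center (122, 163), box 40x24, with perfect offset/size predictions
centers = [(122.0, 163.0)]
off = [np.array([122 / 4 - 30, 163 / 4 - 40])]     # targets (0.5, 0.75)
wh = [np.array([40.0, 24.0])]
L = total_loss(l_k=0.2, centers=centers, off_pred=off,
               wh_pred=wh, sizes=[np.array([40.0, 24.0])])
```

With perfect offset and size predictions, L_off and L_size vanish and the total loss reduces to the heatmap term L_k.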
Experimental results and analysis
Datasets and experimental environment
To verify the feasibility of the algorithm structure, aircraft-class pictures were selected from the UCAS-AOD remote sensing image data and the RSOD dataset for network training and testing. The UCAS-AOD dataset, produced by the University of Chinese Academy of Sciences, contains 1,000 aircraft remote sensing pictures with 7,482 aircraft sample boxes, and the target orientations in the dataset are uniformly distributed. The RSOD dataset, produced by Wuhan University, contains 446 aircraft remote sensing pictures with 4,993 aircraft sample boxes; the pictures vary in brightness and contrast and contain interference such as occlusion, shadow, and distortion. During the experiments, pictures were randomly selected from the dataset samples as the training set, keeping the training-to-testing ratio at 9:1; the experimental environment configuration is shown in Table 1.
Table 1 Experimental environment configuration
Evaluation index
The invention adopts a set of standard target detection evaluation indexes, including accuracy (Accuracy), precision (Precision), recall (Recall), frames per second (FPS), and mean average precision (mAP). FPS reflects the model's processing speed, and mAP better measures model detection accuracy across recall levels. The related calculation formulas are as follows:

Precision = TP / (TP + FP)   (6)

Recall = TP / (TP + FN)   (7)

mAP = (1/C) Σ_{c=1}^{C} AP_c   (8)

where AP is the area under the precision-recall curve of one class and C is the number of classes.
where TP is the number of positive samples correctly predicted as positive, FN the number of positive samples predicted as negative, and FP the number of negative samples predicted as positive; TP + FP is the total number of samples predicted positive, and TP + FN the total number of actual positive samples.
The predicted results TP and FP are determined by the intersection over union IoU (Intersection over Union): when IoU exceeds a set threshold the prediction is counted as TP, and otherwise as FP. Varying the confidence threshold yields different numbers of detection boxes: a high threshold yields few detection boxes, and a low threshold yields many.
where IoU is obtained by equation (9):

IoU = area(B_p ∩ B_gt) / area(B_p ∪ B_gt)   (9)

with B_p the predicted box and B_gt the ground-truth box.
FPS is the number of pictures the model can process per second and represents detection speed, obtained by equation (10):

FPS = N / T   (10)
where N is the number of samples tested and T is the time required to test all samples.
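Equations (6), (7), (9), and (10) as a plain-Python sketch; the boxes and counts below are illustrative:

```python
def iou(box_a, box_b):
    """Equation (9): intersection area over union area, boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter)

def precision_recall(tp, fp, fn):
    """Equations (6) and (7)."""
    return tp / (tp + fp), tp / (tp + fn)

def fps(n_samples, total_time):
    """Equation (10): pictures processed per second."""
    return n_samples / total_time

# a detection shifted by half its width against the ground truth
v = iou((0, 0, 10, 10), (5, 0, 15, 10))
p, r = precision_recall(tp=90, fp=10, fn=30)
```

Sweeping the confidence threshold and recomputing (p, r) at each setting traces out the PR curves of Figs. 8 and 10.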
Training details
In the experiments, the downsampling rate R is taken as 4, an Adam optimizer is used for iterative training, and input images are uniformly scaled to a resolution of 512×512. The initial learning rate is 1e-3 with a batch_size of 4; after 50 epochs of training, the learning rate is reduced by a factor of 10, the batch_size remains 4, and a further 50 epochs are trained. In addition, to speed up convergence, pre-training weights obtained on the ImageNet classification task are used for the ResNet-50-based backbone during training.
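The schedule just described (Adam, initial learning rate 1e-3, divided by 10 after the first 50 of 100 epochs, batch size 4) can be sketched in PyTorch; the one-layer placeholder model is an assumption standing in for the improved network:

```python
import torch
import torch.nn as nn

# placeholder for the improved CenterNet; any nn.Module works for the schedule
model = nn.Conv2d(3, 64, 3, padding=1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # initial lr 1e-3
# divide the learning rate by 10 once, after the first 50 of 100 epochs
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50],
                                                 gamma=0.1)

batch_size = 4
for epoch in range(100):
    # ... one pass over the 512x512 training images would go here ...
    scheduler.step()

final_lr = optimizer.param_groups[0]['lr']
```

After the milestone the optimizer runs at 1e-4 for the remaining 50 epochs.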
Analysis of experimental results
UCAS-AOD remote sensing image dataset experimental result
After 100 epochs of training on the UCAS-AOD remote sensing image dataset, the loss essentially reaches a stable state. Fig. 7 compares the loss curves of the original algorithm and the proposed algorithm during training; the loss value is the weighted sum of three parts: the center point prediction loss, the offset loss, and the width-height loss. The figure shows that the loss stabilizes both before and after the improvement, but the four improved model designs converge faster than before the improvement, and their final convergence values are also better. In addition, SC-CenterNet and ST-CenterNet, which add the attention mechanism, improve on the convergence speed and convergence value of C-CenterNet and T-CenterNet respectively. ST-CenterNet improves most over the rest: its effect is best, its convergence is fastest, and its convergence value is lowest.
PR curves of the model output before and after improvement can be calculated from equations (6) and (7) above and compared for the best-performing ST-CenterNet, as shown in Fig. 8, where (a) is the PR curve of the original algorithm and (b) that of the proposed algorithm.
To verify the detection performance of the algorithm, different detection networks, including single-stage and two-stage target detection algorithms, were also trained and compared on the same experimental platform and training dataset, giving the test results shown in Table 2. Compared with the popular Faster R-CNN and SSD, the improved ST-CenterNet network structure raises mean average precision (mAP) by 6.22% and 7.23% respectively. Compared with Faster R-CNN, both detection accuracy and detection speed improve greatly. Under the same conditions, mAP is 16% higher than the network before improvement; the detection speed is slightly lower than the original network but still achieves real-time detection, demonstrating the feasibility of the algorithm.
Table 2 UCAS-AOD remote sensing aircraft dataset test results
RSOD data set experiment results
To further test the algorithm of the invention, the RSOD remote sensing dataset was selected for training and testing, with the training and testing sets again split randomly 9:1. This dataset has more complex picture samples than UCAS-AOD, with interference factors such as occlusion, shadow, and distortion. After 100 epochs, the total loss reaches a stable state both before and after improvement; the loss curves are shown in Fig. 9. The figure shows that the four improved model designs converge faster and to better final values than the original algorithm. Comparing the loss curves of C-CenterNet with SC-CenterNet and of T-CenterNet with ST-CenterNet shows that after the attention mechanism is introduced into the horizontal feature fusion module, the model's convergence speed improves greatly and its final convergence value is better than without the attention mechanism.
PR curves of the model outputs before and after the improvement were computed during testing using formulas 6 and 7, and the original algorithm is compared with the best-performing ST-CenterNet, as shown in FIG. 10, where FIG. 10(a) is the PR curve of the original algorithm and FIG. 10(b) is that of the proposed algorithm.
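Formulas 6 and 7 are not reproduced in this excerpt; PR curves of this kind are conventionally built from the standard precision (TP/(TP+FP)) and recall (TP/(TP+FN)) definitions by sweeping the confidence threshold. A minimal sketch under that assumption, not the patent's exact evaluation code:

```python
import numpy as np

def pr_curve(scores, matched, num_gt):
    """Precision/recall points obtained by sweeping the confidence threshold.

    scores:  detection confidences.
    matched: 1 if the detection matched a ground-truth box (TP), else 0 (FP).
    num_gt:  total number of ground-truth objects (TP + FN).
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    m = np.asarray(matched, dtype=float)[order]
    tp = np.cumsum(m)
    fp = np.cumsum(1.0 - m)
    precision = tp / (tp + fp)   # P = TP / (TP + FP)
    recall = tp / num_gt         # R = TP / (TP + FN)
    return precision, recall

def average_precision(precision, recall):
    """All-point interpolated AP: area under the PR curve."""
    p = np.concatenate(([0.0], precision, [0.0]))
    r = np.concatenate(([0.0], recall, [1.0]))
    for i in range(len(p) - 2, -1, -1):   # make precision non-increasing
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]    # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

The AP reported per class in the tables is the area under such a curve; mAP averages it over classes.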
A comparison experiment was also performed under the same experimental conditions between the two-stage algorithm Faster R-CNN, the single-stage algorithm SSD, and the improved algorithm; the comparison results are shown in Table 3.
TABLE 3 RSOD remote sensing aircraft dataset test results
As can be seen from Table 3, the ST-CenterNet algorithm with the best detection performance improves accuracy by 26.42% and 43.41% over Faster R-CNN and SSD, respectively, and detection speed by 72 frames/s and 35 frames/s, respectively. Compared with the original CenterNet algorithm, detection accuracy improves by 18.25%, the false-detection and missed-detection rates of the network are reduced, and the detection rate of small targets improves to a certain extent. In addition, the FPS is 0.06% lower than the original algorithm, which is within an acceptable range but still better than Faster R-CNN and SSD; the network structure thus shows good robustness in remote sensing image aircraft target detection.
3.2.3 Visual comparison analysis
To observe the detection effect before and after the improvement more intuitively, the ST-CenterNet network structure with the highest accuracy was selected, several groups of representative pictures were chosen from the experimental samples for testing, and the visualized detections of the algorithm before and after the improvement were compared; the detection results are shown in FIG. 11.
As the comparison in FIG. 11 shows, when the targets are small and scattered, both the original and improved algorithms detect them well, but the original algorithm suffers from false detections, and the confidence of the improved algorithm's detections is generally higher than before the improvement. When the targets are small and dense, the yellow marks reveal a large number of missed detections by the original algorithm, whereas the missed-detection rate of the improved algorithm is reduced; a few missed detections remain, such as the blue marks in the figure. When the target lies beside a building against a complex background, the original algorithm produces many false detections, while the false-detection rate of the improved algorithm is reduced, leaving only isolated false detections to be further optimized. The comparison shows that the improved algorithm improves small-target detection and reduces both the false-detection rate and the missed-detection rate.
The improved ST-CenterNet is also visually compared with the classical two-stage detection algorithm Faster R-CNN and the single-stage detection algorithm SSD, respectively; the comparison results are shown in FIG. 12.
As can be seen from FIG. 12, Faster R-CNN is superior to the proposed algorithm in the confidence of detected targets, but small targets are still missed, and experiments comparing single-picture detection speed show the proposed algorithm to be 8 times faster than Faster R-CNN. Compared with SSD, when the target is easily distinguished from the background, SSD performs well; but when the targets are small, SSD detects only two valid targets and has a high missed-detection rate, whereas the proposed algorithm can still detect small targets effectively. The comparison shows that the proposed algorithm has good robustness and good detection performance.
Ablation experiments
To further verify the effectiveness of each improved module, an ablation experiment was performed on the RSOD remote sensing aircraft dataset, whose data samples are more complex, using the ST-CenterNet model that achieved the highest accuracy in the experiments. The experimental results are shown in Table 4 below.
Table 4 ablation experiments on RSOD datasets
As can be seen from Table 4, adding the horizontal fusion module alone improves both precision and recall by 16.63% over the original algorithm. After the attention mechanism is introduced into the horizontal fusion module to optimize the weight of each channel, accuracy improves further: average precision increases by 1.62% over the model without the attention module and by 18.25% over the original algorithm, demonstrating the effectiveness of each module of the algorithm.
The invention applies the anchor-free CenterNet algorithm, which balances precision and speed, to remote sensing aircraft target detection, and proposes a multi-scale channel-attention detection method to address the high false-detection rate and the difficulty of detecting small targets exhibited by the original CenterNet on remote sensing images. First, feature extraction is performed using an "encode-decode" structure. Second, to address the low detection accuracy on small targets, a horizontal connection module is introduced to fuse deep features with shallow features and improve small-target detection performance. Then, to improve detection accuracy and reduce the false-detection rate, a channel attention module is introduced into the horizontal connection module to optimize the responses among channels, focusing on information of interest and suppressing useless information. Finally, comparison experiments with other mainstream algorithms were carried out on the public UCAS-AOD and RSOD remote sensing datasets: AP on the UCAS-AOD dataset reaches 96.78%, improvements of 16%, 6.22% and 7.23% over the original CenterNet, Faster R-CNN and SSD300, respectively, while a competitive detection speed is maintained. The experimental results show that the method achieves higher detection accuracy in remote sensing aircraft target detection while retaining the speed advantage of single-stage detection models, and has practical value.
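The squeeze-excitation channel gating used by the channel attention module can be illustrated in a few lines of NumPy. This is a minimal sketch of the generic SE mechanism, not the patented implementation; the fully connected weight matrices `w1` (reduction) and `w2` (restoration) are hypothetical placeholders that would normally be learned:

```python
import numpy as np

def squeeze_excite(feature_map, w1, w2):
    """SE channel attention on a (C, H, W) feature map.

    Squeeze: global average pooling gives one descriptor per channel.
    Excitation: two fully connected layers (w1: C -> C/r with ReLU,
    w2: C/r -> C with sigmoid) produce one weight per channel.
    Scale: each input channel is multiplied by its weight, enhancing
    channels with effective information and suppressing useless ones.
    """
    z = feature_map.mean(axis=(1, 2))           # squeeze -> shape (C,)
    s = np.maximum(w1 @ z, 0.0)                 # FC + ReLU -> shape (C/r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))      # FC + sigmoid -> shape (C,)
    return feature_map * gate[:, None, None]    # scale the original map
```

In the patent's design this gating is applied inside the horizontal connection module, before the 1×1 convolution of C-CenterNet or the dilated convolution of T-CenterNet.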
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted for clarity only, and the specification should be taken as a whole. The technical solutions of the various embodiments may also be suitably combined to form other implementations that can be understood by those skilled in the art.

Claims (3)

1. A deep learning target detection method based on center point regression, characterized by comprising the following specific steps:
introducing a horizontal connection module into the original CenterNet network structure to associate features of different layers, fusing deep features with shallow features to improve small-target detection performance;
introducing a channel attention module into the horizontal connection module to adaptively recalibrate the feature responses among different channels, thereby improving the feature extraction capability of the network;
finally, performing comparison experiments on the public UCAS-AOD and RSOD remote sensing datasets;
wherein the horizontal connection module is a feature fusion module comprising C-CenterNet and T-CenterNet, C-CenterNet using a standard 1×1 convolution to give the feature layers the same spatial size before fusion, and T-CenterNet replacing the standard convolution in C-CenterNet with dilated convolution for testing; because the feature values in different layers have different scales, the feature values are batch-normalized and ReLU-activated after convolution;
wherein the channel attention module is introduced into the horizontal connection module specifically before the standard 1×1 convolution of C-CenterNet and before the dilated convolution of T-CenterNet; and
wherein the channel attention module is a Squeeze-and-Excitation attention module SE-Net, the Squeeze-and-Excitation module first performing a Squeeze operation on the H×W feature maps with C input channels to obtain a 1×1×C descriptor, then performing an Excitation operation on the descriptor to obtain the weights among the channels, and finally multiplying the original feature maps by the corresponding channel weights through a Scale operation to obtain a new feature map, thereby completing the enhancement of channels containing effective information and the suppression of channels containing useless information.
2. The deep learning target detection method based on center point regression according to claim 1, wherein comparison experiments are performed on the public UCAS-AOD and RSOD remote sensing datasets; pictures are randomly selected as the training set, keeping the training-to-test set ratio at 9:1; the downsampling rate R is set to 4 in the experiments; an Adam optimizer is used for iterative training; input images are uniformly scaled to a resolution of 512×512; the initial learning rate during training is 1e-3 with a batch_size of 4; after 50 epochs of training, the learning rate is reduced by a factor of 10, the batch_size remains 4, and another 50 epochs are trained; in addition, to accelerate convergence, the ResNet-50-based backbone is initialized during training with pre-trained weights obtained from the ImageNet classification task.
3. The deep learning target detection method based on center point regression according to claim 2, wherein, to verify the detection performance of the method, different detection networks are trained and compared under the same experimental platform and training dataset, and single-stage and two-stage target detection algorithms are compared to obtain the test results.
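The training protocol of claim 2 (Adam, 512×512 inputs, batch size 4, learning rate 1e-3 dropped by a factor of 10 after 50 epochs, ResNet-50 backbone pre-trained on ImageNet) can be summarized as a small configuration object with a step learning-rate schedule. The names below are illustrative, not from the patent:

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    input_size: int = 512        # images uniformly scaled to 512x512
    down_ratio: int = 4          # downsampling rate R = 4
    batch_size: int = 4
    base_lr: float = 1e-3        # Adam initial learning rate
    lr_drop_epoch: int = 50      # divide the learning rate by 10 after 50 epochs
    total_epochs: int = 100      # 50 epochs at base_lr, then 50 more
    backbone: str = "ResNet-50"  # initialized from ImageNet pre-trained weights

def learning_rate(cfg: TrainConfig, epoch: int) -> float:
    """Step schedule: base_lr for the first 50 epochs, base_lr/10 afterwards."""
    return cfg.base_lr if epoch < cfg.lr_drop_epoch else cfg.base_lr / 10.0
```

Such a config would be passed to the training loop; the schedule matches the described two-stage 50+50 epoch regime.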
CN202110930245.4A 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression Active CN113688830B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110930245.4A CN113688830B (en) 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110930245.4A CN113688830B (en) 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression

Publications (2)

Publication Number Publication Date
CN113688830A CN113688830A (en) 2021-11-23
CN113688830B true CN113688830B (en) 2024-04-26

Family

ID=78579852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110930245.4A Active CN113688830B (en) 2021-08-13 2021-08-13 Deep learning target detection method based on center point regression

Country Status (1)

Country Link
CN (1) CN113688830B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989825A (en) * 2021-11-25 2022-01-28 航天信息股份有限公司 Bill image detection method and device and storage medium
CN113989498B (en) * 2021-12-27 2022-07-12 北京文安智能技术股份有限公司 Training method of target detection model for multi-class garbage scene recognition
CN114638878B (en) * 2022-03-18 2022-11-11 北京安德医智科技有限公司 Two-dimensional echocardiogram pipe diameter detection method and device based on deep learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN112395958A (en) * 2020-10-29 2021-02-23 中国地质大学(武汉) Remote sensing image small target detection method based on four-scale depth and shallow layer feature fusion
CN112580664A (en) * 2020-12-15 2021-03-30 哈尔滨理工大学 Small target detection method based on SSD (solid State disk) network
CN112686304A (en) * 2020-12-29 2021-04-20 山东大学 Target detection method and device based on attention mechanism and multi-scale feature fusion and storage medium
CN112966747A (en) * 2021-03-04 2021-06-15 北京联合大学 Improved vehicle detection method based on anchor-frame-free detection network
WO2021115159A1 (en) * 2019-12-09 2021-06-17 中兴通讯股份有限公司 Character recognition network model training method, character recognition method, apparatuses, terminal, and computer storage medium therefor
CN113191334A (en) * 2021-05-31 2021-07-30 广西师范大学 Plant canopy dense leaf counting method based on improved CenterNet


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A lightweight real-time multi-target detection model; Qiu Bo, Liu Xiang, Shi Yunyu, Shang Yanfeng; Journal of Beijing University of Aeronautics and Astronautics (No. 09); full text *
Center-point-based multi-directional ship target detection in remote sensing images; Zhang Xiaohan, Yao Libo, Lv Yafei, Han Peng, Li Jianwei; Acta Photonica Sinica (No. 04); full text *

Also Published As

Publication number Publication date
CN113688830A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN113688830B (en) Deep learning target detection method based on center point regression
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN112380952B (en) Power equipment infrared image real-time detection and identification method based on artificial intelligence
CN112396002B (en) SE-YOLOv 3-based lightweight remote sensing target detection method
US11250591B2 (en) Target detection method, system, and non-volatile storage medium
CN108921051B (en) Pedestrian attribute identification network and technology based on cyclic neural network attention model
CN106776842B (en) Multimedia data detection method and device
CN111222396A (en) All-weather multispectral pedestrian detection method
CN110751209B (en) Intelligent typhoon intensity determination method integrating depth image classification and retrieval
CN107230197B (en) Tropical cyclone objective strength determination method based on satellite cloud image and RVM
CN110263654A (en) A kind of flame detecting method, device and embedded device
CN111860587B (en) Detection method for small targets of pictures
CN114241511B (en) Weak supervision pedestrian detection method, system, medium, equipment and processing terminal
CN107578424B (en) Dynamic background difference detection method, system and device based on space-time classification
CN113326735B (en) YOLOv 5-based multi-mode small target detection method
Li et al. Image quality assessment using deep convolutional networks
CN113468968A (en) Remote sensing image rotating target detection method based on non-anchor frame
CN113609904B (en) Single-target tracking algorithm based on dynamic global information modeling and twin network
CN116310386A (en) Shallow adaptive enhanced context-based method for detecting small central Net target
CN116168240A (en) Arbitrary-direction dense ship target detection method based on attention enhancement
CN108257148B (en) Target suggestion window generation method of specific object and application of target suggestion window generation method in target tracking
Liang et al. Improved YOLOv5 infrared tank target detection method under ground background
CN107506400B (en) A kind of image search method based on cognitive characteristics and manifold ranking
Zhou et al. Flame detection with pruned and knowledge distilled YOLOv5
CN111640117B (en) Method for searching leakage source position of building

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant