WO2023273337A1

WO2023273337A1 - Representative feature-based method for detecting dense targets in remote sensing image

Info

Publication number: WO2023273337A1
Application number: PCT/CN2022/074542
Authority: WO
Inventors: 胡凡; 方效林; 吴文甲; 杨明; 罗军舟
Original assignee: 南京逸智网络空间技术创新研究院有限公司
Priority date: 2021-06-29
Filing date: 2022-01-28
Publication date: 2023-01-05
Also published as: CN113536986A; CN113536986B

Abstract

A representative feature-based method for detecting dense targets in a remote sensing image, comprising: constructing a feature extraction network, a feature pyramid network, a preliminary prediction network, and a final prediction network, and sequentially inputting a remote sensing image to be detected into the feature extraction network and the feature pyramid network to output a preliminary feature map; inputting the preliminary feature map into the preliminary prediction network, and selecting a representative feature of semantic information of each category from all categories in a data set and representative confidence of each category in the whole feature map; inputting a feature map outputted by the preliminary prediction network into the final prediction network to obtain a final feature map, and calculating a similarity between the representative feature of a same category and a feature vector at the same position of the final feature map; and by taking the similarity as a weight, adaptively improving classification confidence on the basis of classification confidence of a hard positive sample.

Description

A Dense Target Detection Method in Remote Sensing Images Based on Representative Features

Technical Field

The invention relates to target detection, and in particular to a dense target detection method in remote sensing images based on representative features.

Background Art

Remote sensing is a rapidly developing high technology. The information network it forms provides large amounts of scientific data and dynamic information. Remote sensing image detection is a benchmark problem in target detection and has great application value in fields such as agriculture, meteorological surveying and mapping, and environmental protection.

Deep learning algorithms have achieved great success in computer vision and are regarded as the preferred approach to remote sensing image processing. Because remote sensing images are captured from an overhead view with a larger spatial field of view, they contain more dense scenes and large numbers of densely arranged objects. In deep-learning-based target detection, a sample is a positive sample with respect to the target category that matches its ground-truth label, and a positive sample whose predicted classification confidence deviates greatly from the ground-truth label is a hard sample. Existing high-performing detection models can detect most objects in an image but often miss some of the hard positive samples. When the classification confidence predicted for a hard positive sample falls below the configured confidence threshold, the sample is filtered out in the post-processing stage, which degrades detection performance. If the confidence threshold is instead lowered manually in the post-processing stage, the detection model loses its ability to suppress low-confidence negative samples. Accurately detecting multiple densely arranged objects in remote sensing images is therefore especially challenging.

Summary of the Invention

Purpose of the invention: In view of the above problems, the purpose of the present invention is to provide a dense target detection method in remote sensing images based on representative features, which adaptively increases the classification confidence of hard positive samples and thereby accurately detects multiple densely arranged objects of the same category in remote sensing images.

Technical solution: The dense target detection method in remote sensing images based on representative features according to the present invention comprises the following steps:

(1) Construct four network modules, namely a feature extraction network, a feature pyramid network, a preliminary prediction network and a final prediction network; input the remote sensing image to be detected into the feature extraction network and the feature pyramid network in turn, and output a preliminary feature map;

(2) Input the preliminary feature map into the preliminary prediction network and, for every category in the data set, select a representative feature of that category's semantic information and the category's representative confidence over the entire feature map;

(3) Input the feature map output by the preliminary prediction network into the final prediction network to obtain a final feature map, and calculate the similarity between the representative feature of the same category and the feature vector at the same position of the final feature map;

(4) Using the similarity obtained in step 3 as a weight, adaptively raise the classification confidence of a hard positive sample on the basis of its original classification confidence, and take the result as the final classification confidence of the hard positive sample.

Further, the process of obtaining the highest classification confidence and the representative feature in step 2 is:

(201) In the classification branch of the preliminary prediction network, calculate the classification confidence of each category k at each of the H×W positions of the entire feature map, where H is the length of the feature map, W is its width, and k is a category of the data set;

(202) Among these classification confidences, find the highest one and take it as the representative confidence RepConfidence_k of category k, and find the position (h, w) at which it is attained, where h is the position along the length and w is the position along the width;

(203) Extract the feature information at row h, column w of the preliminary feature map FM_FAM to serve as the representative feature RepFeature_k of category k, where FM_FAM is the feature map of the preceding layer shared by the classification branch and the regression branch of the preliminary prediction network;

(204) Set a classification confidence threshold; the representative feature of category k is a valid representative feature only when the representative confidence of category k is greater than the classification confidence threshold.
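
Steps (201) to (204) amount to a per-category arg-max over the confidence map followed by a threshold test. The sketch below illustrates that reading in plain Python. The function name, the nested-list layout `conf[h][w][k]` and the toy numbers are illustrative assumptions, not part of the source.

```python
from typing import Dict, List, Tuple

Representative = Tuple[float, Tuple[int, int], List[float]]


def select_representatives(
    conf: List[List[List[float]]],      # conf[h][w][k]: class-k confidence at (h, w)
    fm_fam: List[List[List[float]]],    # fm_fam[h][w]: C-dimensional feature vector
    threshold: float,
) -> Dict[int, Representative]:
    """Return {k: (RepConfidence_k, (h, w), RepFeature_k)} for valid categories."""
    num_classes = len(conf[0][0])
    representatives: Dict[int, Representative] = {}
    for k in range(num_classes):
        # Step 202: highest confidence for category k and where it occurs.
        best_conf, best_pos = max(
            (cell[k], (h, w))
            for h, row in enumerate(conf)
            for w, cell in enumerate(row)
        )
        # Step 204: keep the representative only if it clears the threshold.
        if best_conf > threshold:
            h, w = best_pos
            # Step 203: the feature vector at the same position of FM_FAM.
            representatives[k] = (best_conf, best_pos, fm_fam[h][w])
    return representatives


# Toy 2x2 feature map with 2 categories and 2 channels (made-up numbers).
conf = [[[0.1, 0.2], [0.9, 0.3]],
        [[0.4, 0.5], [0.2, 0.1]]]
fm_fam = [[[1.0, 0.0], [0.0, 1.0]],
          [[1.0, 1.0], [0.5, 0.5]]]
reps = select_representatives(conf, fm_fam, threshold=0.6)
print(reps)
```

In this toy input only category 0 yields a valid representative, because the best confidence for category 1 (0.5) does not exceed the threshold.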

Further, the similarity in step 3 comprises a feature semantic similarity and a feature spatial similarity. The feature semantic similarity is calculated as follows:

An embedded Gaussian similarity measurement function is used to calculate the similarity of feature semantic information, and the measure is normalized. In the embedded Gaussian similarity measurement function, RF_k denotes the representative feature RepFeature_k of the k-th category, and F_hw denotes the feature vector at row h, column w of the feature map FM_ODM output by the final prediction network. Both feature vectors are 1×1×n-dimensional, and i indexes the i-th of the n dimensions.

The embeddings take the form of linear embedding spaces:

φ(RF_k) = W_φ RF_k

θ(F_hw) = W_θ F_hw

where W_φ and W_θ are learned weight matrices, and φ(RF_k)^i and θ(F_hw)^i denote the values of the two embedded feature vectors in the i-th dimension.

N(φ(RF)) is the normalization factor. It is the sum of the similarities between the feature vector F_hw at row h, column w of the final prediction network and each of the K valid representative features RF_k, where K is the number of categories in the data set. It normalizes the embedded Gaussian similarity into the range 0 to 1, avoiding the gradient explosion problem caused by excessively large similarity values.
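
The formula images for the similarity function and its normalization factor are not reproduced in this text. One reading that is consistent with the definitions above, assuming the usual embedded Gaussian form (an exponentiated dot product of the two embeddings) and a normalization that sums over the K valid representative features, would be the following. The summation index k' is introduced here only for illustration.

```latex
\mathrm{Sim}_{\mathrm{Embedded\_Gaussian}}(RF_k, F_{hw})
  = \frac{\exp\!\left(\sum_{i=1}^{n} \phi(RF_k)^{i}\,\theta(F_{hw})^{i}\right)}
         {N(\phi(RF))},
\qquad
N(\phi(RF))
  = \sum_{k'=1}^{K} \exp\!\left(\sum_{i=1}^{n} \phi(RF_{k'})^{i}\,\theta(F_{hw})^{i}\right)
```

Under this reading each similarity lies strictly between 0 and 1 and the K similarities for a given position sum to 1, which matches the stated normalization range.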

Further, the feature spatial similarity is calculated by the following steps:

(301) Calculate the spatial distance dis(RF_k, F_hw) between the feature vectors RepFeature_k and F_hw on the feature map, using the horizontal and vertical coordinates of RepFeature_k in the feature map and the horizontal and vertical coordinates of F_hw in the feature map;

(302) Multiply dis(RF_k, F_hw) by the stride stride_i of the corresponding feature map to obtain the spatial distance Corr_{Spatial_i}(RF_k, F_hw) of the two feature vectors on the original image, where Spatial_i indicates that RF_k and F_hw are taken from the i-th feature map of the feature pyramid network counted from the bottom up, and α is the scale parameter of the calculation.
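
The formulas for dis and Corr_{Spatial_i} are not reproduced in this text, so the sketch below fills them with two labeled assumptions: a Euclidean distance on feature-map coordinates, and an exponential decay in the α-scaled pixel distance. Neither is confirmed by the source. The value α = 1/64 is the one used in the embodiment, and the function names are hypothetical.

```python
import math
from typing import Tuple


def feature_map_distance(pos_rf: Tuple[float, float], pos_f: Tuple[float, float]) -> float:
    """dis(RF_k, F_hw): Euclidean distance in feature-map cells (assumed metric)."""
    return math.hypot(pos_rf[0] - pos_f[0], pos_rf[1] - pos_f[1])


def spatial_correlation(pos_rf, pos_f, stride: int, alpha: float) -> float:
    # Step 302: convert the feature-map distance to original-image pixels.
    pixel_distance = feature_map_distance(pos_rf, pos_f) * stride
    # Assumed decay: 1.0 at zero distance, smaller as the features move apart.
    return math.exp(-alpha * pixel_distance)


near = spatial_correlation((10, 10), (11, 10), stride=8, alpha=1 / 64)
far = spatial_correlation((10, 10), (13, 14), stride=8, alpha=1 / 64)
print(near, far)
```

Whatever the exact functional form, the property the text asks for is the one shown here: the nearer pair receives the larger correlation.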

The similarity in step 3 is therefore expressed as:

Similarity(RF_k, F_hw) = Sim_{Embedded_Gaussian}(RF_k, F_hw) + Corr_{Spatial_i}(RF_k, F_hw)

Further, in step 4 the final classification confidence of a hard positive sample is obtained by adding the representative confidence of category k, weighted by the similarity, to the confidence for category k at position (h, w) of the final prediction network's feature map.
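
The formula image for this step is missing, but the description (representative confidence added "by weight", with the similarity from step 3 as the weight) suggests a simple weighted sum. The sketch below implements that reading. The function name, the `None` convention for a category without a valid representative, and the numbers are assumptions made for illustration.

```python
from typing import Optional


def final_confidence(
    conf_hw_k: float,                   # ODM confidence for category k at (h, w)
    similarity: float,                  # Similarity(RF_k, F_hw) from step 3
    rep_confidence_k: Optional[float],  # None if category k has no valid representative
) -> float:
    # Assumption: without a valid representative the confidence is left unchanged.
    if rep_confidence_k is None:
        return conf_hw_k
    # Weighted boost: the more similar to the representative, the larger the gain.
    return conf_hw_k + similarity * rep_confidence_k


strong_match = final_confidence(0.25, similarity=0.5, rep_confidence_k=0.9)
weak_match = final_confidence(0.25, similarity=0.1, rep_confidence_k=0.9)
print(strong_match, weak_match)
```

The boost is adaptive in the sense that the same low starting confidence rises more when the position resembles the category's representative feature. The text does not say whether the result is clamped to 1, so the sketch leaves it unclamped.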

Further, the feature semantic similarity may be measured using any one of Euclidean similarity, cosine similarity or Gaussian similarity.

Further, the feature extraction network uses convolutional layers to reduce the size of the original image and inputs the extracted effective features into the feature pyramid network; the feature extraction network is a ResNet or HRNet convolutional neural network.

Further, the preliminary prediction network is the feature alignment module of the S²A-NET model, which preliminarily predicts the category information and position information of objects.

Further, the final prediction network is the rotation detection module of the S²A-NET model, which predicts the final category information and position information of objects.

Beneficial effects: Compared with the prior art, the present invention has the following notable advantages:

1. The present invention uses representative features and representative confidences to adaptively raise the classification confidence of hard positive samples, improving the classification of hard positive samples in dense scenes of remote sensing images;

2. The classification branch parameters of the two stages are used to ensure the consistency of the similarity calculation, reducing the complexity of the detection model and the number of network parameters.

Description of the Drawings

Fig. 1 is a schematic diagram of the representative feature acquisition process of the present invention;

Fig. 2 is a flow chart of the similarity calculation of the present invention;

Fig. 3 is a schematic diagram of raising the classification confidence of hard positive samples in the present invention.

Detailed Description

The dense target detection method in remote sensing images based on representative features described in this embodiment comprises the following steps:

(1) Construct four network modules, namely a feature extraction network, a feature pyramid network, a preliminary prediction network and a final prediction network. Input the remote sensing image to be detected into the feature extraction network, which uses convolutional layers to reduce the size of the original image and inputs the extracted effective features into the feature pyramid network, which then outputs the preliminary feature map FM_FAM.

The feature extraction network is a ResNet or HRNet convolutional neural network; the preliminary prediction network is the feature alignment module (FAM) of the S²A-NET model; the final prediction network is the rotation detection module (ODM) of the S²A-NET model.

(2) Input the preliminary feature map FM_FAM into the feature alignment module FAM and, for every category in the data set, select a representative feature of that category's semantic information and the category's representative confidence over the entire feature map, as shown in Fig. 1. The process is:

(201) In the classification branch of the preliminary prediction network, calculate the classification confidence of each category k at each of the H×W positions of the entire feature map, where H is the length of the feature map, W is its width, and k is a category of the data set. The data set of this embodiment contains 15 object categories, and the calculation is performed for the 15 categories in turn;

(202) Among these classification confidences, find the highest one and take it as the representative confidence RepConfidence_k of category k, and find the position (h, w) at which it is attained;

(203) Extract the feature information at row h, column w of the preliminary feature map FM_FAM to serve as the representative feature RepFeature_k of category k, where FM_FAM is the feature map of the preceding layer shared by the classification branch and the regression branch of the preliminary prediction network. FM_FAM contains features related to both object category and position information and is used for the subsequent similarity calculation between features. FM_FAM is H×W×C-dimensional, where H and W are the length and width of the feature map and C is its number of channels; in this embodiment C is 256;

(204) Set a classification confidence threshold that balances the reliability of a representative feature against the difficulty of qualifying as one. In this embodiment the threshold is set to 0.6: the representative feature of category k is a valid representative feature only when the representative confidence of category k is greater than 0.6. When RepConfidence_k is low, for example 0.3 or 0.4, the probability that RepFeature_k itself belongs to category k is also low, so it cannot serve as a valid representative feature.

(3) Input the feature map output by the preliminary prediction network into the final prediction network to obtain the final feature map, and calculate the similarity between the representative feature of the same category and the feature vector at the same position of the final feature map; the flow chart is shown in Fig. 2. The similarity comprises a feature semantic similarity and a feature spatial similarity.

The feature semantic similarity may be measured using any one of Euclidean similarity, cosine similarity or Gaussian similarity. This embodiment uses Gaussian similarity, as follows:

An embedded Gaussian similarity measurement function is used to calculate the similarity of feature semantic information, and the measure is normalized. In the embedded Gaussian similarity measurement function, RF_k denotes the representative feature RepFeature_k of the k-th category, and F_hw denotes the feature vector at row h, column w of the feature map FM_ODM output by the final prediction network. Both feature vectors are 1×1×n-dimensional; in this embodiment n is 256, and i indexes the i-th of the n dimensions.

The embeddings take the form of linear embedding spaces:

φ(RF_k) = W_φ RF_k

θ(F_hw) = W_θ F_hw

N(φ(RF)) is the normalization factor. It is the sum of the similarities between the feature vector F_hw at row h, column w of the final prediction network and each of the 15 valid representative features RF_k. It normalizes the embedded Gaussian similarity into the range 0 to 1, avoiding the gradient explosion problem caused by excessively large similarity values.
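
As a concrete illustration of the semantic similarity computation, the sketch below applies the two linear embeddings, takes the dot product over the n dimensions, and normalizes across the valid representative features. It assumes the standard embedded Gaussian form (an exponentiated dot product), since the formula image is not available here. The function names and the tiny 2-dimensional inputs are illustrative; the embodiment uses n = 256 and up to 15 representative features.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def embedded_gaussian_similarities(
    rep_features: List[Vector],  # valid RF_k, one per category
    f_hw: Vector,                # feature vector at (h, w) of FM_ODM
    w_phi: Matrix,               # learned embedding for RF_k
    w_theta: Matrix,             # learned embedding for F_hw
) -> List[float]:
    theta = matvec(w_theta, f_hw)
    # Dot product of the two embeddings, summed over the n dimensions.
    logits = [
        sum(p * t for p, t in zip(matvec(w_phi, rf), theta))
        for rf in rep_features
    ]
    # Subtracting the max leaves the ratios unchanged and avoids overflow.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)  # plays the role of N(phi(RF))
    return [e / norm for e in exps]


identity = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for learned weights
sims = embedded_gaussian_similarities(
    rep_features=[[1.0, 0.0], [0.0, 1.0]],
    f_hw=[2.0, 0.0],
    w_phi=identity,
    w_theta=identity,
)
print(sims)
```

Each returned value lies between 0 and 1 and the values sum to 1 for a given position, so the feature at (h, w) is scored most similar to the representative it aligns with best.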

The feature spatial similarity is calculated by the following steps:

(301) Calculate the spatial distance dis(RF_k, F_hw) between the feature vectors RepFeature_k and F_hw on the feature map, using the horizontal and vertical coordinates of RepFeature_k in the feature map and the horizontal and vertical coordinates of F_hw in the feature map. When training the model, the five feature maps of the feature pyramid network, counted from the bottom up, are used for prediction, and their strides stride_i are 8, 16, 32, 64 and 128 respectively;

(302) Multiply dis(RF_k, F_hw) by the stride stride_i of the corresponding feature map to obtain the spatial distance Corr_{Spatial_i}(RF_k, F_hw) of the two feature vectors on the original image, where Spatial_i indicates that RF_k and F_hw are taken from the i-th feature map of the feature pyramid network counted from the bottom up, and α is a scale parameter. In this embodiment α is set to 1/64 so that two features that are close to each other have a high spatial position correlation.
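
To give a feel for the numbers in this embodiment, the short calculation below converts a one-cell offset at each pyramid level into original-image pixels and scales it by α = 1/64. It assumes that α multiplies the pixel distance; how α enters the missing formula is not confirmed by the text, and the helper name is hypothetical.

```python
STRIDES = [8, 16, 32, 64, 128]   # stride_i for the five pyramid levels
ALPHA = 1 / 64                   # scale parameter of this embodiment


def scaled_distance(cells: float, stride: int, alpha: float = ALPHA) -> float:
    """alpha * (feature-map distance * stride): pixel distance after scaling."""
    return alpha * cells * stride


# A one-cell offset at each pyramid level.
table = [(stride, 1 * stride, scaled_distance(1, stride)) for stride in STRIDES]
for stride, pixels, scaled in table:
    print(f"stride {stride:>3}: {pixels:>3} px -> scaled {scaled}")
```

With this choice of α, a one-cell offset maps to a scaled distance of exactly 1 at the stride-64 level and to smaller values on the finer levels, which keeps the scaled distances of neighbouring features small.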

The similarity is expressed as:

Similarity(RF_k, F_hw) = Sim_{Embedded_Gaussian}(RF_k, F_hw) + Corr_{Spatial_i}(RF_k, F_hw)

(4) Using the similarity obtained in step 3 as a weight, adaptively raise the classification confidence of a hard positive sample on the basis of its original classification confidence, and take the result as the final classification confidence of the hard positive sample, as shown in Fig. 3.

The final classification confidence of a hard positive sample is obtained by adding the representative confidence of category k, weighted by the similarity, to the confidence for category k at position (h, w) of the final prediction network's feature map.

Claims

1. A dense target detection method in remote sensing images based on representative features, characterized in that it comprises the following steps:

(1) Construct four network modules, namely a feature extraction network, a feature pyramid network, a preliminary prediction network and a final prediction network; input the remote sensing image to be detected into the feature extraction network and the feature pyramid network in turn, and output a preliminary feature map;

(2) Input the preliminary feature map into the preliminary prediction network and, for every category in the data set, select a representative feature of that category's semantic information and the category's representative confidence over the entire feature map;

(3) Input the feature map output by the preliminary prediction network into the final prediction network to obtain a final feature map, and calculate the similarity between the representative feature of the same category and the feature vector at the same position of the final feature map;

(4) Using the similarity obtained in step 3 as a weight, adaptively raise the classification confidence of a hard positive sample on the basis of its original classification confidence, and take the result as the final classification confidence of the hard positive sample.

2. The dense target detection method according to claim 1, characterized in that the process of obtaining the highest classification confidence and the representative feature in step 2 is:

(201) In the classification branch of the preliminary prediction network, calculate the classification confidence of each category k at each of the H×W positions of the entire feature map, where H is the length of the feature map, W is its width, and k is a category of the data set;

(202) Among these classification confidences, find the highest one and take it as the representative confidence RepConfidence_k of category k, and find the position (h, w) at which it is attained, where h is the position along the length and w is the position along the width;

(203) Extract the feature information at row h, column w of the preliminary feature map FM_FAM to serve as the representative feature RepFeature_k of category k, where FM_FAM is the feature map of the preceding layer shared by the classification branch and the regression branch of the preliminary prediction network;

(204) Set a classification confidence threshold; the representative feature of category k is a valid representative feature only when the representative confidence of category k is greater than the classification confidence threshold.

3. The dense target detection method according to claim 2, characterized in that the similarity in step 3 comprises a feature semantic similarity and a feature spatial similarity, and the feature semantic similarity is calculated as follows:

An embedded Gaussian similarity measurement function is used to calculate the similarity of feature semantic information, and the measure is normalized. In the embedded Gaussian similarity measurement function, RF_k denotes the representative feature RepFeature_k of the k-th category, and F_hw denotes the feature vector at row h, column w of the feature map FM_ODM output by the final prediction network. Both feature vectors are 1×1×n-dimensional, and i indexes the i-th of the n dimensions.

The embeddings take the form of linear embedding spaces:

φ(RF_k) = W_φ RF_k

θ(F_hw) = W_θ F_hw

where W_φ and W_θ are learned weight matrices, and φ(RF_k)^i and θ(F_hw)^i denote the values of the two embedded feature vectors in the i-th dimension.

N(φ(RF)) is the normalization factor. It is the sum of the similarities between the feature vector F_hw at row h, column w of the final prediction network and each of the K valid representative features RF_k. It normalizes the embedded Gaussian similarity into the range 0 to 1, avoiding the gradient explosion problem caused by excessively large similarity values.

4. The dense target detection method according to claim 3, characterized in that the feature spatial similarity is calculated by the following steps:

(301) Calculate the spatial distance dis(RF_k, F_hw) between the feature vectors RepFeature_k and F_hw on the feature map, using the horizontal and vertical coordinates of RepFeature_k in the feature map and the horizontal and vertical coordinates of F_hw in the feature map;

(302) Multiply dis(RF_k, F_hw) by the stride stride_i of the corresponding feature map to obtain the spatial distance Corr_{Spatial_i}(RF_k, F_hw) of the two feature vectors on the original image, where Spatial_i indicates that RF_k and F_hw are taken from the i-th feature map of the feature pyramid network counted from the bottom up, and α is the scale parameter of the calculation;

The similarity in step 3 is therefore expressed as:

Similarity(RF_k, F_hw) = Sim_{Embedded_Gaussian}(RF_k, F_hw) + Corr_{Spatial_i}(RF_k, F_hw)

5. The dense target detection method according to claim 4, characterized in that in step 4 the final classification confidence of a hard positive sample is obtained by adding the representative confidence of category k, weighted by the similarity, to the confidence for category k at position (h, w) of the final prediction network's feature map.

6. The dense target detection method according to claim 3, characterized in that the feature semantic similarity is measured using any one of Euclidean similarity, cosine similarity or Gaussian similarity.

7. The dense target detection method according to claim 1, characterized in that the feature extraction network uses convolutional layers to reduce the size of the original image and inputs the extracted effective features into the feature pyramid network; the feature extraction network is a ResNet or HRNet convolutional neural network.

8. The dense target detection method according to claim 1, characterized in that the preliminary prediction network is the feature alignment module of the S²A-NET model.

9. The dense target detection method according to claim 1, characterized in that the final prediction network is the rotation detection module of the S²A-NET model.