CN115272153A - Image matching enhancement method based on feature sparse area detection

Image matching enhancement method based on feature sparse area detection

Info

Publication number
CN115272153A
CN115272153A (application CN202210971608.3A)
Authority
CN
China
Prior art keywords
feature
sparse
matching
image
detection
Prior art date
Legal status
Pending
Application number
CN202210971608.3A
Other languages
Chinese (zh)
Inventor
蓝朝桢
王龙号
施群山
周杨
张衡
李鹏程
吕亮
胡校飞
魏紫珺
高天
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN202210971608.3A
Publication of CN115272153A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/02 Affine transformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image matching enhancement method based on feature sparse region detection, and belongs to the technical field of multi-source image matching. The method comprises the following steps: 1) obtaining a heterogeneous image pair to be matched, and performing feature extraction and feature matching to obtain an initial matching result; 2) selecting either image of the heterogeneous image pair, performing feature sparse region detection on the selected image to obtain its feature sparse regions, and solving an affine transformation model of the image pair from the initial matching result; obtaining each group of corresponding feature sparse regions according to the affine transformation model; 3) extracting features from each group of feature sparse regions, screening the features of each group against its corresponding feature screening threshold to obtain local features, aggregating the local features, and performing feature matching on the aggregated features to obtain the matching enhancement result for the feature sparse regions. The invention can obtain robust matching results that are sufficient in number and uniformly distributed.

Description

Image matching enhancement method based on feature sparse area detection
Technical Field
The invention belongs to the technical field of multi-source image matching, and particularly relates to an image matching enhancement method based on feature sparse region detection.
Background
With the diversification of remote sensing imaging modes and the explosive growth of multi-sensor, multi-resolution and multi-temporal remote sensing image data, the demand for jointly processing heterogeneous remote sensing images and mining their complementary information keeps increasing. Joint processing of unmanned aerial vehicle images and satellite images has been a major research hotspot in recent years: satellite reference images with accurate geographic position information can provide reliable reference information for unmanned aerial vehicle image target recognition and positioning, three-dimensional modeling, disaster assessment, land resource change monitoring, and the like. Joint processing of unmanned aerial vehicle and satellite images first requires high-precision matching control point pairs, which must not only be accurately located but also be sufficient in number and uniformly distributed. However, differences in imaging mode, imaging time, nonlinear radiation distortion and the like make it difficult for unmanned aerial vehicle and satellite images to meet these requirements.
Remote sensing image matching algorithms are already widespread in the prior art. For example, building on the classical SIFT (Scale-Invariant Feature Transform) matching algorithm, the SURF (Speeded-Up Robust Features) algorithm uses the determinant of the Hessian matrix for feature point response detection and accelerates computation with integral images, exceeding SIFT in recognition ability and computation speed; however, the traditional hand-crafted features represented by the SIFT and SURF algorithms cannot adapt to the complex nonlinear radiation distortion between multi-source remote sensing images. Addressing this problem, the document "Heterologous images matching considering anisotropic weighted moment and absolute phase orientation" (Yao Yongxiang et al., Geomatics and Information Science of Wuhan University, Vol. 46, No. 11, 2021) proposes the HAPCG (absolute phase orientation gradient histogram) algorithm, which obtains absolute phase orientation feature descriptions through a histogram of absolute phase orientation gradients and finally obtains reliable matching results. With the rapid development of deep learning, convolutional neural network (CNN) features have shown ever stronger generalized expression ability and, compared with traditional hand-crafted features, are better suited to the nonlinear radiation distortion between multi-source remote sensing images. Based on CNN features, the documents "Large-scale image retrieval with attentive deep local features" (Noh et al., Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3456-3465), "Deep learning algorithm for feature matching of heterogeneous remote sensing images" (Lan Chaozhen et al., Acta Geodaetica et Cartographica Sinica, Vol. 50, No. 2, 2021) and "SuperPoint: Self-supervised interest point detection and description" (DeTone et al., Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 224-236) successively proposed deep local feature detection and description algorithms such as DELF and SuperPoint. The SuperPoint algorithm uses a self-supervised training method: a fully convolutional neural network is trained under the supervision of a keypoint detector instead of manual labels, finally yielding a multi-view feature detection and description network with strong geometric adaptability and good generalization. The SuperGlue and LoFTR algorithms are graph neural network (GNN) matching algorithms that have emerged in recent years; both focus on learning the affine transformation relationship within image pairs, i.e., the spatial position relationship between feature points, for matching. The SuperGlue algorithm encodes keypoint position information and feature descriptors through an attention mechanism, fuses visual context information between feature points through a cross-attention mechanism so that homonymous features are more strongly correlated, and finally uses an optimal matching layer to complete feature matching and mismatch rejection simultaneously. Compared with the traditional KNN matching algorithm, SuperGlue has great advantages in matching speed and match count, but it depends on a high-performance local feature extraction algorithm to supply enough local feature points with strong generalization ability.
The premise of high-quality matching between unmanned aerial vehicle images and satellite images is acquiring local features that are sufficient in number and uniformly distributed. The SuperGlue matching algorithm achieves good results on multi-source image matching using SuperPoint features; however, facing the ground-feature differences caused by the time-phase difference between unmanned aerial vehicle images and satellite images, and the feature sparsity caused by large texture-sparse areas, the SuperPoint algorithm cannot extract enough uniformly distributed local features over the whole image.
In summary, most current local feature extraction and matching algorithms match poorly on unmanned aerial vehicle images and satellite images with large seasonal differences or large texture-sparse areas.
Disclosure of Invention
The invention aims to provide an image matching enhancement method based on feature sparse region detection, so as to solve the poor matching capability of existing local feature extraction and matching algorithms on multi-source remote sensing images with large seasonal differences or many texture-sparse regions.
To solve the above technical problem, the invention provides an image matching enhancement method based on feature sparse region detection, comprising the following steps:
1) Acquiring a heterogeneous image pair to be matched, and performing feature extraction and feature matching on it to obtain an initial matching result;
2) Selecting either image of the heterogeneous image pair, performing feature sparse region detection on the selected image using the feature points obtained by the feature extraction in step 1) to obtain the feature sparse regions of the selected image, and solving an affine transformation model of the image pair from the initial matching result; applying the affine transformation model to each feature sparse region of the selected image to obtain each group of corresponding feature sparse regions;
3) Extracting features from each group of corresponding feature sparse regions, setting a corresponding feature screening threshold for each group, and screening the extracted features of each group against that threshold to obtain the local features of each group; aggregating the local features of all groups to obtain the global feature-sparse-region local features of the image pair; and performing feature matching on these global local features to obtain the matching enhancement result for the feature sparse regions of the image pair.
The beneficial effects of the invention are as follows: the method detects feature sparse regions, guides region-by-region feature extraction and adaptive score-threshold feature screening, obtains reliable local features and aggregates them, and finally obtains, through the matching algorithm, robust matching results that are sufficient in number and uniformly distributed.
To improve the accuracy of feature sparse region detection, the feature sparse region detection of the selected image in step 2) proceeds as follows:
a) Dividing the selected image into at least one detection region;
b) Traversing each detection region, storing the detection regions satisfying the first condition, and performing quadtree splitting on the detection regions satisfying the second condition to obtain new detection regions; the first condition is that the number of feature points in the detection region is not greater than a set threshold and the area of the detection region is not smaller than the minimum detection area; the second condition is that the number of feature points in the detection region is greater than the set threshold and the area of the detection region is not smaller than the minimum detection area;
c) Executing step b) for each newly split detection region until no new detection region satisfies the second condition or the number of splits reaches the set split count; the stored detection regions are then the feature sparse regions of the selected image.
Further, the set split count k is calculated as:
$$k = \left\lfloor \log_4 \frac{W \times H}{S} \right\rfloor$$
where W and H are respectively the length and width of the image, and S is the minimum detection area, whose value is determined experimentally.
Further, the feature screening threshold θ of each group of feature sparse regions is calculated as:
$$\theta = \frac{1}{N} \sum_{n=1}^{N} C_n$$
where $C_n$ is the score of the n-th local feature independently extracted from the feature sparse region, and N is the number of local features independently extracted from that group of feature sparse regions.
Further, the local features of each group of feature sparse regions in step 3) are obtained as follows: first, region-by-region feature extraction is performed on each group of feature sparse regions to obtain their initial local features; then the local feature keypoints and descriptors whose scores are greater than the feature screening threshold are selected from the initial local features, thereby obtaining the local features of each group of feature sparse regions.
Further, the local features of each group of feature sparse regions are aggregated into the global feature-sparse-region local features as:
$$\{P, C, E\} = \bigcup_{q=1}^{Q} \{p_q, c_q, e_q\}$$
where Q represents the number of feature sparse regions, and $p_q$, $c_q$ and $e_q$ represent respectively the positions, scores and descriptors of the screened local features in the q-th group of feature sparse regions, q = 1, 2, 3, …, Q.
Considering that a certain error may exist when computing the corresponding feature sparse regions with the affine transformation model solved from the initial matching result, the feature sparse region detection of the selected image in step 2) further comprises a step d): performing area diffusion outward on all stored detection regions by a correspondingly set distance, and taking each area-diffused detection region as a feature sparse region of the selected image. This adaptive area enlargement avoids the influence of the error and further improves the accuracy of feature sparse region detection.
Further, when the selected image is divided into detection regions, the area of each new detection region after each split is:
$$S_k = \frac{W \times H}{4^k}$$
where W and H are the length and width of the selected image, and k is the current split count of the selected image.
Further, in step 1), a SuperPoint feature extraction network model is used to perform feature extraction on the heterogeneous image pair to be matched, obtaining its initial local features; and a SuperGlue network model is used to perform feature matching on these initial local features, obtaining the initial matching result of the heterogeneous image pair to be matched.
Further, in step 3), the SuperPoint feature extraction network model is used to extract local features from each group of feature sparse regions, and the SuperGlue network model is used to perform feature matching on the global feature-sparse-region local features.
Drawings
FIG. 1 is a flow chart of a feature sparse region detection and match enhancement algorithm according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a SuperPoint feature extraction network model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a SuperGlue network model according to an embodiment of the present invention;
FIG. 4 (a) is a schematic diagram of the linked list L_D after initialization in the feature sparse region detection algorithm of the embodiment of the present invention;
FIG. 4 (b) is a schematic diagram of first splitting of a detection region in the feature sparse region detection algorithm according to the embodiment of the present invention;
fig. 4 (c) is a schematic diagram of the k-th splitting of the detection region in the feature sparse region detection algorithm according to the embodiment of the present invention;
FIG. 4 (d) is a schematic diagram of the stop splitting of the detection region in the feature sparse region detection algorithm according to the embodiment of the present invention;
FIG. 5 is a flow chart of an adaptive threshold feature screening algorithm according to an embodiment of the present invention;
fig. 6 (a) is a first set of unmanned aerial vehicle images and satellite remote sensing images according to an embodiment of the present invention;
fig. 6 (b) is a second set of drone images and satellite remote sensing images according to an embodiment of the present invention;
fig. 6 (c) is a third set of drone images and satellite remote sensing images in accordance with an embodiment of the present invention;
fig. 6 (d) is a fourth set of drone images and satellite remote sensing images according to the embodiment of the present invention;
fig. 7 is a schematic diagram of different minimum detection area matching enhancement quantities of four pairs of unmanned aerial vehicle images and satellite remote sensing images according to the embodiment of the present invention;
fig. 8 (a) is a schematic diagram of an initial matching result of a first set of drone images and satellite remote sensing images according to an embodiment of the present invention;
fig. 8 (b) is a schematic diagram of an initial matching result of the second set of unmanned aerial vehicle images and the satellite remote sensing images according to the embodiment of the present invention;
fig. 8 (c) is a schematic diagram of an initial matching result of a third set of unmanned aerial vehicle images and satellite remote sensing images according to the embodiment of the present invention;
fig. 8 (d) is a schematic diagram of an initial matching result of a fourth set of unmanned aerial vehicle images and satellite remote sensing images according to the embodiment of the present invention;
fig. 9 (a) is a schematic diagram of a detection result of a characteristic sparse region of a first set of unmanned aerial vehicle images and satellite remote sensing images according to the embodiment of the present invention;
fig. 9 (b) is a schematic diagram of a detection result of a feature sparse area of a second set of unmanned aerial vehicle images and satellite remote sensing images according to the embodiment of the present invention;
fig. 9 (c) is a schematic diagram of a detection result of a feature sparse region of a third set of unmanned aerial vehicle images and satellite remote sensing images according to the embodiment of the present invention;
fig. 9 (d) is a schematic diagram of a feature sparse area detection result of a fourth set of unmanned aerial vehicle images and satellite remote sensing images according to the embodiment of the present invention;
FIG. 10 (a) is a diagram illustrating the similarity measurement result of the sparse region of the first set of paired features according to the embodiment of the present invention;
FIG. 10 (b) is a diagram illustrating the similarity measurement result of the sparse region of the second set of paired features according to the embodiment of the present invention;
FIG. 10 (c) is a diagram illustrating the third set of paired feature sparse region similarity measurements according to an embodiment of the present invention;
FIG. 10 (d) is a diagram illustrating the similarity measurement result of the fourth set of paired feature sparse regions according to the embodiment of the present invention;
fig. 11 (a) is a schematic diagram of a matching result between a first set of unmanned aerial vehicle images and satellite images in the SD-ME algorithm according to the embodiment of the present invention;
fig. 11 (b) is a schematic diagram of a matching result between the second set of unmanned aerial vehicle images and the satellite images in the SD-ME algorithm according to the embodiment of the present invention;
fig. 11 (c) is a schematic diagram of a matching result between the third set of unmanned aerial vehicle images and the satellite images in the SD-ME algorithm according to the embodiment of the present invention;
fig. 11 (d) is a schematic diagram of a matching result between the images of the fourth group of unmanned aerial vehicles and the satellite images in the SD-ME algorithm according to the embodiment of the present invention;
FIG. 12 (a) is a schematic view of the horizontal partition regions of an embodiment of the present invention;
FIG. 12 (b) is a schematic view of the vertical partition regions of an embodiment of the present invention;
FIG. 12 (c) is a schematic view of a 45 ° directional region of an embodiment of the present invention;
FIG. 12 (d) is a schematic view of a 135 ° directional region of an embodiment of the present invention;
FIG. 12 (e) is a schematic diagram of the central and peripheral regions of an embodiment of the present invention.
Detailed Description
The following description will further describe embodiments of the present invention with reference to the accompanying drawings.
The technical conception of the image matching enhancement method based on feature sparse region detection of the invention is as follows. First, local features depend on image regions with obvious gray-level variation; in texture-sparse regions the gray-level variation is small, so under globally consistent parameters few local features are extracted there. The enhancement method therefore screens for and detects the local-feature-sparse regions and feeds them into the feature extraction network individually, solving the uneven local feature distribution caused by texture differences under globally consistent parameters. Second, since local features in texture-sparse regions are not distinctive, the feature point scores of such regions are low; an adaptive score-threshold feature screening algorithm is therefore used to retain, to the greatest extent, the local feature points with relatively high scores. The feature sparse region detection algorithm and the adaptive score-threshold feature screening algorithm are processed jointly, the obtained local features are aggregated and input into the feature matching network, and finally robust matching results that are sufficient in number and uniformly distributed are obtained.
Based on this technical conception, the concrete implementation steps of the invention are as follows:
1) Acquiring a heterogeneous image pair to be matched, and performing feature extraction and feature matching on it to obtain an initial matching result;
2) Selecting either image of the heterogeneous image pair, performing feature sparse region detection on the selected image using the feature points obtained by the feature extraction in step 1) to obtain the feature sparse regions of the selected image, and solving an affine transformation model of the image pair from the initial matching result; applying the affine transformation model to each feature sparse region of the selected image to obtain each group of corresponding feature sparse regions;
3) Extracting features from each group of corresponding feature sparse regions, setting a corresponding feature screening threshold for each group, and screening the extracted features of each group against that threshold to obtain the local features of each group; aggregating the local features of all groups to obtain the global feature-sparse-region local features of the image pair; and performing feature matching on these global local features to obtain the matching enhancement result for the feature sparse regions of the image pair.
Further, to improve the accuracy of feature sparse region detection, the invention provides a feature sparse region detection algorithm that processes and analyzes the initial matching result of the image pair to be matched so as to detect each group of feature sparse regions of the image pair. The detection algorithm comprises the following steps:
a) Dividing the selected image into at least one detection region;
b) Traversing each detection region, storing the detection regions satisfying the first condition, and performing quadtree splitting on the detection regions satisfying the second condition to obtain new detection regions; the first condition is that the number of feature points in the detection region is not greater than a set threshold and the area of the detection region is not smaller than the minimum detection area; the second condition is that the number of feature points in the detection region is greater than the set threshold and the area of the detection region is not smaller than the minimum detection area;
c) Executing step b) for each newly split detection region until no new detection region satisfies the second condition or the number of splits reaches the set split count; the stored detection regions are then the feature sparse regions of the selected image;
d) Solving an affine transformation model of the heterogeneous image pair to be matched from its initial matching result, and applying the affine transformation model to each feature sparse region of the selected image, thereby obtaining each group of corresponding feature sparse regions.
The set threshold on the feature point count in step b) is chosen according to the selected minimum detection area, e.g. 0 or 1.
Further, considering that a certain error may exist when the corresponding feature sparse regions are computed with the affine transformation model solved from the initial matching result, adaptive area expansion is performed on the obtained feature sparse regions: a corresponding diffusion distance is set for each feature sparse region, and the region is diffused outward by that distance to obtain a region of larger area. Adaptive area enlargement avoids the influence of this error and improves the accuracy of sparse region detection.
The following describes the implementation of each of the above steps.
1. Obtaining the initial matching result.
For the feature extraction and feature matching of the heterogeneous image pair to be matched, this embodiment preferably uses the SuperPoint feature extraction network model and the SuperGlue network model to obtain the initial matching result.
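This initial step can be sketched as follows, assuming the interface of the openly published SuperGlue demo code (a `Matching` wrapper bundling SuperPoint and SuperGlue); the module path, configuration keys and file names are assumptions drawn from that demo, not from the patent.

```python
# Sketch of step 1 (initial extraction and matching), under the assumed demo interface.
import cv2
import torch
from models.matching import Matching  # from the SuperGlue demo repository (assumed)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
config = {
    'superpoint': {'nms_radius': 4, 'keypoint_threshold': 0.005, 'max_keypoints': 2048},
    'superglue': {'weights': 'outdoor', 'sinkhorn_iterations': 20, 'match_threshold': 0.2},
}
matching = Matching(config).eval().to(device)

def to_tensor(path):
    # Grayscale image -> (1, 1, H, W) float tensor in [0, 1]
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return torch.from_numpy(img / 255.0).float()[None, None].to(device)

img0, img1 = to_tensor('uav.png'), to_tensor('satellite.png')  # hypothetical file names
with torch.no_grad():
    pred = matching({'image0': img0, 'image1': img1})
kpts0 = pred['keypoints0'][0].cpu().numpy()   # SuperPoint keypoints on image 0
matches = pred['matches0'][0].cpu().numpy()   # index into keypoints1, -1 = unmatched
```

The keypoints and matches returned here serve as the input to the feature sparse region detection described below.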
1) The SuperPoint feature extraction network model.
The SuperPoint algorithm performs feature extraction on the whole image through a fully convolutional neural network architecture; the local feature extraction and descriptor construction network of the algorithm is shown in FIG. 2.
As can be seen from FIG. 2, the SuperPoint algorithm consists of three modules: a feature encoder, a keypoint decoder and a descriptor decoder. The feature encoder is a lightweight fully convolutional neural network adapted from the VGG style, in which the fully connected layers at the tail of a traditional convolutional neural network are replaced by convolutional layers. The main function of the feature encoder is to extract features after reducing the dimensionality of the image, reducing the computation of the subsequent networks. Through the feature encoder, the input image $I \in \mathbb{R}^{H \times W}$ is encoded as an intermediate tensor $\mathcal{B} \in \mathbb{R}^{H/8 \times W/8 \times F}$.
The keypoint decoder consists of a convolutional layer, a Softmax activation function and a tensor reshaping function; its role is to compute, for each pixel of the image, the probability that the pixel is a keypoint. The convolutional layer converts the intermediate tensor output by the encoder into a feature map $\mathcal{X} \in \mathbb{R}^{H/8 \times W/8 \times 65}$, and the Softmax activation function then yields a feature map of the probability distribution of interest points. The tensor reshaping function adopts the sub-pixel convolution method in place of an upsampling model built from multiple deconvolution and unpooling layers, which reduces the computation of the model while recovering the resolution of the feature map: after the dustbin channel is removed, the $\mathbb{R}^{H/8 \times W/8 \times 64}$ feature map is flattened directly into the keypoint heatmap tensor $\mathcal{H} \in \mathbb{R}^{H \times W}$, each channel vector of the feature map corresponding to the heat values of an 8 × 8 region of the heatmap. Each value of the final output heatmap tensor represents the probability that the corresponding pixel is a feature point.
The descriptor decoder is used to generate a semi-dense descriptor feature map. Its convolutional layer first outputs semi-dense descriptors $\mathcal{D} \in \mathbb{R}^{H/8 \times W/8 \times 256}$ (i.e., one descriptor for every 8 pixels in each dimension), which effectively reduces memory and computation while maintaining operating efficiency. The decoder then applies bicubic interpolation to the descriptors to reach pixel-level precision, and finally obtains the dense feature map in $\mathbb{R}^{H \times W \times 256}$ through L2 normalization.
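The tensor flow of the two decoders can be sketched with PyTorch; this is a minimal sketch assuming the head shapes of the published SuperPoint paper (65 keypoint channels including a dustbin, 256-dimensional descriptors), since the patent's own figures are not reproduced here.

```python
import torch
import torch.nn.functional as F

# Dummy decoder outputs for an input image of size H x W (both multiples of 8).
H, W = 480, 640
semi = torch.randn(1, 65, H // 8, W // 8)   # keypoint decoder: 64 cells + 1 dustbin
desc = torch.randn(1, 256, H // 8, W // 8)  # descriptor decoder: semi-dense, one per 8 px

# Keypoint heatmap: softmax over channels, drop the dustbin, then sub-pixel
# reshape (pixel shuffle) so each of the 64 channels fills an 8 x 8 image region.
prob = F.softmax(semi, dim=1)[:, :-1]             # (1, 64, H/8, W/8)
heatmap = F.pixel_shuffle(prob, 8)                # (1, 1, H, W) keypoint probabilities

# Dense descriptors: bicubic upsampling to pixel precision, then L2 normalization.
dense = F.interpolate(desc, size=(H, W), mode='bicubic', align_corners=False)
dense = F.normalize(dense, p=2, dim=1)            # (1, 256, H, W)
```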
2) SuperGlue network model.
After obtaining the initial local features of the heterogeneous image pair to be matched, the SuperGlue algorithm is used to match them. SuperGlue is a network for feature matching and outlier rejection; it enhances keypoint features with a GNN and converts the feature matching problem into a differentiable optimal transport problem. The structure of the SuperGlue graph neural network model is shown in FIG. 3. The algorithm is divided into two modules: a feature enhancement module based on an attention graph neural network, and an optimal matching module. The feature enhancement module encodes the position information of the feature points and the descriptor information and fuses them, then performs L rounds of interleaved self-attention and cross-attention processing to aggregate context information within and between the images, obtaining more distinctive feature matching vectors f for matching.
The optimal matching module computes the inner products between the feature matching vectors f by formula (1) to obtain a matching score matrix $S \in \mathbb{R}^{M \times N}$, where M and N represent the numbers of feature points in images A and B respectively. Because some feature points are affected by occlusion and similar problems and have no matching point, the algorithm provides a dustbin mechanism, as in formula (2): the dustbins are one row and one column newly added to the score matrix, used to identify whether a feature has a matching point. SuperGlue treats the final matching result as an assignment problem: it computes an assignment matrix $P \in \mathbb{R}^{M \times N}$ and constructs an optimization problem with the score matrix, solving for P by maximizing the overall score $\sum_{i,j} S_{i,j} P_{i,j}$. The optimal feature assignment matrix P is solved by fast iteration of the Sinkhorn algorithm on a graphics processing unit (GPU).
$$S_{i,j} = \langle f_i^A, f_j^B \rangle \quad (1)$$
where $\langle \cdot,\cdot \rangle$ denotes the inner product, and $f_i^A$ and $f_j^B$ are the feature matching vectors of images A and B output by the feature enhancement module.
$$\bar{S}_{i,N+1} = \bar{S}_{M+1,j} = \bar{S}_{M+1,N+1} = z \quad (2)$$
where the (M+1)-th row and the (N+1)-th column of the augmented score matrix $\bar{S}$ are the dustbins of A and B respectively, each dustbin being allowed as many matches as there are keypoints in the other image, and z is the score assigned to matches that fall into a dustbin.
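A minimal sketch of the dustbin augmentation of formula (2) and the Sinkhorn iteration follows, written in log space for numerical stability; the marginals encode the convention that each dustbin may absorb as many matches as there are keypoints in the other image. The iteration count and the log-space form are implementation assumptions, not the patent's exact procedure.

```python
import torch

def match_with_dustbin(scores: torch.Tensor, z: float, iters: int = 20) -> torch.Tensor:
    """scores: (M, N) similarity matrix from formula (1); z: dustbin score of formula (2).
    Returns the (M+1, N+1) assignment matrix P (last row/column are the dustbins)."""
    M, N = scores.shape
    S = torch.full((M + 1, N + 1), float(z))
    S[:M, :N] = scores                                # augmented score matrix

    # Target marginals: each real keypoint has unit mass; each dustbin can absorb
    # as many matches as there are keypoints in the other image.
    log_a = torch.cat([torch.zeros(M), torch.tensor([N]).float().log()])
    log_b = torch.cat([torch.zeros(N), torch.tensor([M]).float().log()])

    u = torch.zeros(M + 1)
    v = torch.zeros(N + 1)
    for _ in range(iters):                            # alternating normalizations
        u = log_a - torch.logsumexp(S + v[None, :], dim=1)
        v = log_b - torch.logsumexp(S + u[:, None], dim=0)
    return (S + u[:, None] + v[None, :]).exp()
```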
2. Acquiring the corresponding feature sparse regions in the heterogeneous image pair to be matched.
The more numerous and more uniformly distributed the feature points are in space, the more accurately the spatial geometric relationship of the two heterogeneous images can be expressed. However, the texture difference between the unmanned aerial vehicle image and the satellite image under globally consistent parameters makes the distribution of the initial matching result uneven: feature points are dense in texture-dense areas and scarce in texture-sparse areas. Aiming at this problem, the initial matching result is processed and analyzed by the feature sparse region detection algorithm so as to detect the feature sparse regions, which then guide adaptive expansion and feature re-extraction. The main idea of the feature sparse region detection algorithm is quadtree region segmentation, whose advantage is that sub-regions are split more evenly, so feature sparse regions are detected more uniformly and redundant feature extraction near feature-dense regions is avoided.
The principle and flow of the feature sparse region detection algorithm are shown in Algorithm 1 and FIG. 4 (a) to FIG. 4 (d).
[Algorithm 1, feature sparse region detection: reproduced only as an image in the original document.]
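Since the pseudocode of Algorithm 1 survives only as an image, the following Python sketch is reconstructed from the textual description above (steps a) to c) and the two conditions); the variable names and breadth-first traversal order are assumptions.

```python
import math
from collections import deque

def detect_sparse_regions(keypoints, W, H, S_min=256, t=1):
    """Quadtree feature-sparse-region detection (sketch of Algorithm 1).
    keypoints: list of (x, y) feature points from the initial extraction;
    S_min: minimum detection area; t: max keypoint count for a region to count as sparse."""
    K = int(math.floor(math.log(W * H / S_min, 4)))     # total split count, formula (3)
    queue = deque([(0.0, 0.0, float(W), float(H), 0)])  # (x0, y0, w, h, depth)
    sparse = []
    while queue:
        x0, y0, w, h, depth = queue.popleft()
        n = sum(1 for x, y in keypoints if x0 <= x < x0 + w and y0 <= y < y0 + h)
        if n <= t and w * h >= S_min:
            sparse.append((x0, y0, w, h))               # first condition: store as sparse
        elif n > t and w * h >= S_min and depth < K:    # second condition: split in four
            hw, hh = w / 2.0, h / 2.0
            for dx in (0.0, hw):
                for dy in (0.0, hh):
                    queue.append((x0 + dx, y0 + dy, hw, hh, depth + 1))
    return sparse
```

On the initial SuperPoint keypoints this returns the stored sparse rectangles, which step d) then maps into the other image through the affine model.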
The total split count K is calculated as:
$$K = \left\lfloor \log_4 \frac{W \times H}{S} \right\rfloor \quad (3)$$
where W and H are respectively the length and width of the image, and S is the minimum detection area, whose value is determined experimentally.
After the feature sparse regions are obtained by Algorithm 1, adaptive area expansion is applied to them, i.e., each sparse region is diffused outward to obtain a region of larger area.
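The expansion can be sketched as clamped outward diffusion of each stored region; the `margin` parameter stands in for the "correspondingly set distance" of step d), whose exact rule this excerpt does not specify.

```python
def expand_region(region, W, H, margin):
    """Diffuse a detection region (x0, y0, w, h) outward by `margin` pixels on every
    side, clamped to the image bounds W x H."""
    x0, y0, w, h = region
    nx0, ny0 = max(0.0, x0 - margin), max(0.0, y0 - margin)
    nx1, ny1 = min(float(W), x0 + w + margin), min(float(H), y0 + h + margin)
    return (nx0, ny0, nx1 - nx0, ny1 - ny0)
```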
The feature sparse region detection algorithm obtains the areas of the remote sensing image where features are not distinctive, so that relatively reliable feature points can be detected in them independently; the minimum detection area directly affects the total number and representativeness of the detected feature sparse regions. If the minimum detection area is too large, the total number of detected feature sparse regions may be too small to cover most feature-sparse areas, giving low representativeness; if it is too small, the feature detector cannot effectively identify and extract local features. A suitable minimum detection area is therefore decisive for the performance of the feature sparse region detection algorithm. Since the algorithm splits and detects based on the quadtree principle, the detection area after k splits is given by formula (4):
$$S_k = \frac{W \times H}{4^k} \quad (4)$$
where W and H are the length and width of the image, and k is the current split count of the image.
3. Adaptive threshold feature screening.
Processing by the feature sparse region detection algorithm yields one-to-one corresponding feature sparse regions in the image pair; these corresponding regions mainly comprise texture-sparse areas, ground-feature-difference areas and the like. When local features are first extracted globally, feature points with relatively low scores in the feature sparse regions are screened out because the globally consistent screening threshold is high; if the globally consistent screening threshold were lowered during initial matching, the feature points would still cluster in texture-dense areas because of the texture difference, giving an unevenly distributed matching result. In the method of this embodiment, feature extraction is performed on each independent feature sparse region obtained by the detection algorithm; compared with taking the feature points of a feature sparse region from the global heatmap, extracting the heatmap tensor independently on the small surface element of the feature sparse region yields feature points that are relatively richer, more uniform and of higher confidence.
SuperPoint local feature extraction is performed independently on the paired feature sparse regions by the adaptive threshold feature screening algorithm; the algorithm flow is shown in FIG. 5.
As shown in FIG. 5, region-by-region feature extraction is first performed on the paired feature sparse regions to obtain relatively abundant local features on the texture-sparse small surface elements; the local features within each single region are then screened by score, with the adaptive score threshold calculated as in formula (5), and the local feature keypoints and descriptors whose scores are greater than the adaptive score threshold of that region are selected; finally, the screened features of all the independent regions are aggregated to form the local features of the global feature sparse regions, as in formula (6).
$$\theta = \frac{1}{N} \sum_{n=1}^{N} C_n \quad (5)$$
where $C_n$ is the score of the n-th local feature independently extracted from the feature sparse region, and N is the number of local features independently extracted from that group of feature sparse regions.
$$\{P, C, E\} = \bigcup_{q=1}^{Q} \{p_q, c_q, e_q\} \quad (6)$$
where Q represents the number of feature sparse regions, and $p_q$, $c_q$ and $e_q$ represent respectively the positions, scores and descriptors of the screened local features in the q-th group of feature sparse regions, q = 1, 2, 3, …, Q.
After the local features of the global feature sparse regions are obtained by the adaptive threshold feature screening algorithm, they are input into the SuperGlue feature matching network for matching, finally yielding the robust matching enhancement result for the feature sparse regions.
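A sketch of formulas (5) and (6) applied region by region follows, assuming each region's re-extraction yields parallel arrays of keypoint positions, scores and descriptors; the offset of region-local coordinates back to the global image frame is an assumed implementation detail.

```python
import numpy as np

def screen_and_aggregate(regions):
    """regions: list of dicts with keys 'origin' (x0, y0 of the region in the image),
    'pts' (n, 2), 'scores' (n,), 'desc' (n, 256) from per-region SuperPoint extraction.
    Keeps features scoring above the region's adaptive threshold (formula (5)) and
    aggregates the survivors into global arrays (formula (6))."""
    P, C, E = [], [], []
    for r in regions:
        scores = np.asarray(r['scores'])
        if scores.size == 0:
            continue
        theta = scores.mean()                    # adaptive score threshold, formula (5)
        keep = scores > theta
        pts = np.asarray(r['pts'])[keep] + np.asarray(r['origin'])  # back to global coords
        P.append(pts)
        C.append(scores[keep])
        E.append(np.asarray(r['desc'])[keep])
    return np.concatenate(P), np.concatenate(C), np.concatenate(E)
```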
On the basis of the SuperPoint and SuperGlue feature matching algorithms, the enhancement method of this embodiment provides a feature sparse region detection and matching enhancement algorithm to enhance the matching effect of SuperGlue, referred to as the SD-ME algorithm for short; the flow chart of the method is shown in FIG. 1. In the specific implementation of this embodiment, experiments can be run on the Ubuntu 18.04 operating system, with Python 3.6 as the programming language environment and PyCharm as the programming platform. The hardware platform is a notebook computer with an Intel i7 CPU, 31 GB of memory and a GeForce RTX 2060 graphics card (6 GB of video memory). The SD-ME algorithm was tested using the official pre-trained models of the SuperPoint and SuperGlue algorithms.
The following takes four sets of satellite images (left) and unmanned aerial vehicle images (right) shown in fig. 6 (a) to 6 (d) as an example to further explain the specific implementation process of the method of the present invention.
In the four sets of satellite images (left) and unmanned aerial vehicle images (right) shown in FIG. 6 (a) to FIG. 6 (d): the first set has dense buildings and texture-sparse areas such as farmland, and its unmanned aerial vehicle image exhibits a certain degree of nonlinear distortion due to lens distortion; in the second set, the unmanned aerial vehicle image is a spring 2020 image and the satellite image a summer 2018 image, so the large seasonal difference makes the ground features differ obviously, with many texture-sparse areas such as forest and wasteland; in the third set, the unmanned aerial vehicle image was shot during the 2021 Henan flood disaster and the satellite image is a summer 2018 image, so the local gray-level difference between the two images is large and the ground-feature difference obvious; in the fourth set, the unmanned aerial vehicle image is a winter image and the satellite image a summer 2016 image, and the two images differ greatly in size and resolution.
The 4 groups of images selected in FIG. 6 (a) to FIG. 6 (d) have texture-sparse areas, large time-phase differences, obvious ground-feature differences and a certain degree of nonlinear spectral radiation distortion, and are difficult to match; verifying the method of the invention on these 4 groups is therefore strongly representative.
First, a test is performed to determine the optimal value of the minimum detection area S.
In the SD-ME algorithm, choosing a suitable minimum detection area helps the feature sparse region detection algorithm obtain effective detection results. Since the widths of most images to be matched are inconsistent and not necessarily integer multiples of 4, the minimum detection areas of different images calculated by formula (4) may differ within a certain range, so the fluctuation range of the minimum detection area is expressed as an area interval. Deriving by reverse bin combination according to the quadtree region segmentation principle of the feature sparse region detection algorithm, the area interval is given by formula (7):
$$S = 4^l \sim 4^{l+1} \quad (7)$$
where l is the number of reverse combination steps.
To select the minimum detection area best suited to the feature sparse region detection algorithm, the 4 image pairs in FIG. 6 (a) to FIG. 6 (d) were tested, and the optimal value of the minimum detection area S was determined by comparing the total number of feature-sparse-region matching points obtained with different detection areas. In the experiment the minimum detection area was set to 64, 256, 1024 and 4096 respectively, with the results shown in FIG. 7.
As shown in FIG. 7, when the minimum detection area S is 256, the number of matching points acquired in the feature sparse regions of the four image pairs is the largest; therefore the optimal value of the minimum detection area for the feature sparse region detection algorithm here is set to 256, ensuring that the algorithm assists in acquiring matching point pairs of the best quantity and quality.
Secondly, feature sparse region detection and similarity measurement are carried out.
The SD-ME algorithm acquires the feature sparse regions between the unmanned aerial vehicle image and the satellite image by means of the feature sparse region detection algorithm, then performs feature re-extraction and matching through the adaptive threshold feature screening algorithm, and finally completes the matching enhancement. To verify the reliability of the feature sparse region detection in the SD-ME algorithm, feature sparse region detection tests were performed on the 4 image pairs in FIG. 6 (a) to FIG. 6 (d); the initial matching results of SuperGlue combined with SuperPoint features on the test images are shown in FIG. 8 (a) to FIG. 8 (d).
As can be seen from FIG. 8 (a) to FIG. 8 (d), SuperGlue with SuperPoint features achieves an ideal matching effect on the unmanned aerial vehicle images and satellite remote sensing images, but the number of matching point pairs in texture-sparse areas (vegetation, water, farmland, etc.) is small, which cannot provide globally uniform control in applications such as unmanned aerial vehicle image correction and registration. The SD-ME algorithm extracts feature sparse regions by means of the initial matching result, setting the optimal minimum detection area to 256; feature sparse region extraction tests were performed on the 4 unmanned aerial vehicle images in FIG. 6 (a) to FIG. 6 (d). The feature sparse region detection algorithm stores each extracted feature sparse region independently in a linked-list structure; to display the detection results more intuitively, the feature sparse regions are displayed jointly, with the results shown in FIG. 9 (a) to FIG. 9 (d).
As shown in FIG. 9 (a) to FIG. 9 (d), the feature sparse region detection algorithm accurately obtains relatively reliable feature sparse regions; however, the key to extracting local features from the paired feature sparse regions and matching them successfully is that the feature sparse regions correspond strictly one to one. The feature similarity between unpaired feature sparse regions is extremely low, so the effective correspondence of the regions detected by the algorithm is verified by measuring the feature similarity of the obtained paired regions.
For the similarity measurement of region features, the local features extracted within a region are aggregated, using the Vector of Locally Aggregated Descriptors (VLAD) to aggregate the local feature descriptors of a feature sparse region. The VLAD algorithm is a classic algorithm in the image retrieval field: VLAD trains, by clustering, a codebook of several visual words, computes for each image feature the nearest codebook cluster center, then accumulates the differences between all features and their cluster centers to obtain d-dimensional vectors, one per visual word of the codebook, which are concatenated to form the VLAD vector of the image.
Because of ground-feature differences, nonlinear radiation distortion and the like between unmanned aerial vehicle images and satellite images, the feature similarity between paired images is low, but it still has global consistency: the VLAD vector similarity between paired feature sparse regions is consistent with the VLAD vector similarity between the whole images, while the VLAD vector similarity between unpaired feature sparse regions is far lower than that between the whole images. The experiment uses the Euclidean distance between VLAD feature vectors as the similarity measure: the larger the Euclidean distance, the lower the similarity of the feature vectors; the smaller the distance, the higher the similarity. The Euclidean distance between feature vectors $V_1 = (x_1, x_2, \ldots, x_d)$ and $V_2 = (y_1, y_2, \ldots, y_d)$ is calculated by formula (8):
$$Ed = \sqrt{\sum_{u=1}^{d} (x_u - y_u)^2} \quad (8)$$
where Ed is the Euclidean distance, $x_u$ is the u-th element of feature vector $V_1$, and $y_u$ the u-th element of $V_2$, u = 1, 2, …, d.
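A minimal sketch of VLAD aggregation and the formula (8) distance follows, assuming a codebook already trained by clustering (e.g. k-means); the codebook size and the final L2 normalization are assumptions.

```python
import numpy as np

def vlad(descriptors, centers):
    """descriptors: (n, d) local descriptors of a region or image;
    centers: (k, d) visual-word codebook from clustering.
    Sums residuals to the nearest center per word and concatenates (k * d vector)."""
    k, d = centers.shape
    nearest = np.argmin(((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    v = np.zeros((k, d))
    for i, c in enumerate(nearest):
        v[c] += descriptors[i] - centers[c]
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)       # L2 normalization

def euclidean(v1, v2):
    """Formula (8): Euclidean distance between two VLAD vectors."""
    return float(np.sqrt(((v1 - v2) ** 2).sum()))
```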
Taking the Euclidean-distance similarity of the VLAD feature vectors between the paired whole images as the standard, the test measures the VLAD vector similarity of the paired feature sparse regions obtained by the feature sparse region detection algorithm on each image pair, further demonstrating the one-to-one correspondence and local feature similarity of the detection results. FIG. 10 (a) to FIG. 10 (d) show the feature similarity measurement results for the feature sparse regions.
In FIG. 10 (a) to FIG. 10 (d), the horizontal axis represents 36 randomly selected pairs of feature sparse regions and the vertical axis the Euclidean distance between the VLAD vectors of a region pair; Ed_a denotes the distance between the whole-image VLAD feature vectors of the first image pair, and Ed_a_random that of feature sparse regions randomly selected from the first image pair. As can be seen from FIG. 10 (a) to FIG. 10 (d), the Euclidean distances between the VLAD vectors of the feature sparse regions extracted by the detection algorithm from the four sets of unmanned aerial vehicle images and satellite reference images are mostly smaller than the Euclidean distance between the whole-image VLAD vectors, which proves that the feature sparse regions extracted by the algorithm correspond strictly one to one.
Finally, the matching performance of the SD-ME algorithm is compared with and analyzed against the traditional SIFT algorithm, the ContextDesc algorithm and the SuperGlue algorithm.
To verify the superiority of the SD-ME algorithm in feature sparse region detection and matching enhancement, matching performance tests were performed on the four image pairs in FIG. 6 (a) to FIG. 6 (d), comparing with the traditional SIFT, ContextDesc and SuperGlue algorithms on different matching performance evaluation indices. SIFT (Scale-Invariant Feature Transform) is a local feature descriptor with a certain affine invariance and interference resistance; the ContextDesc algorithm is a deep learning algorithm suited to multi-modal image matching, which enhances original feature descriptors such as DELF through high-order image visual information and the geometric information of keypoint distribution.
The tests compare the performance of the above matching algorithms by the number of correct matching points (P) and the matching time (t). A correct matching point is one whose position on the image to be matched differs from its true position on the reference image by less than a threshold, as verified by formula (9). The number of correct matching points (P) is the number of matching points meeting this condition; this index reflects the basic performance of a feature matching algorithm.
$$\left\| H\!\begin{pmatrix} x_u' \\ y_u' \end{pmatrix} - \begin{pmatrix} x_u \\ y_u \end{pmatrix} \right\|_2 < \varepsilon \qquad (9)$$
In the formula, H is an affine transformation model fitted from manually selected points, which substitutes for the true affine transformation between the two multi-source remote sensing images; if the distance between the feature point $(x_u', y_u')$ after affine transformation and its corresponding point $(x_u, y_u)$ is smaller than the threshold $\varepsilon$ (set to 3 in this embodiment), the pair is judged to be a correct matching point. Manual point selection mainly consists of judging, one by one and with the assistance of various matching methods, whether point pairs are homonymous point pairs; for regions that none of the matching algorithms can identify, points are selected by visual interpretation of the magnified images. In these two ways, 36 uniformly distributed homonymous point pairs are selected for fitting the affine transformation model.
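The verification of formula (9) can be sketched as follows; the function and parameter names are assumptions, and OpenCV's estimateAffine2D is used here merely as one way to fit the affine model H from the 36 manually selected point pairs.

```python
import numpy as np
import cv2

def count_correct_matches(pts_src, pts_ref, manual_src, manual_ref, eps=3.0):
    """Count correct matching points per formula (9).

    pts_src / pts_ref       : Nx2 arrays of matched feature points
                              (image to be matched / reference image).
    manual_src / manual_ref : the manually selected homonymous point pairs
                              used to fit the affine transformation model H.
    eps                     : distance threshold (3 pixels in this embodiment).
    """
    # Fit the affine model H from the manually selected point pairs.
    H, _ = cv2.estimateAffine2D(np.float32(manual_src), np.float32(manual_ref),
                                method=cv2.LMEDS)
    # Apply H to the points on the image to be matched: (x', y') -> H(x', y').
    projected = cv2.transform(np.float32(pts_src).reshape(-1, 1, 2), H).reshape(-1, 2)
    # A match is correct if ||H(x', y') - (x, y)|| < eps.
    dists = np.linalg.norm(projected - np.float32(pts_ref), axis=1)
    return int(np.sum(dists < eps))
```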
The comparison results of the above three algorithms with the SD-ME algorithm on the number of correct matching points (P) and the matching time (t) are shown in Table 1.
TABLE 1 comparison of matching test results
[Table 1: number of correct matching points (P) and matching time (t) of the SIFT, ContextDesc, SuperGlue and SD-ME algorithms on the four image pairs; table image not reproduced]
As can be seen from Table 1, the classical SIFT algorithm has a poor matching effect and is not fully applicable to unmanned aerial vehicle images and satellite images with nonlinear radiation distortion and large ground object differences. The ContextDesc algorithm obtains a certain number of matching point pairs on the 4 groups of images, but its overall matching accuracy is low and its time consumption is far higher than that of the other algorithms. The SuperGlue algorithm achieves a good effect on the heterogeneous images, obtaining a large number of matching point pairs while maintaining excellent matching accuracy. The SD-ME algorithm applies matching enhancement on top of the SuperGlue algorithm; its time consumption is slightly higher than that of SuperGlue, but it obtains a larger number of correct matching point pairs with a more uniform distribution. The matching effect is shown in figs. 11(a) to 11(d).
Figs. 11(a) to 11(d) visually show that the SD-ME algorithm adapts well to unmanned aerial vehicle images and satellite reference images. The SD-ME algorithm performs matching enhancement on the initial matching result of the SuperGlue algorithm and obtains a large number of correct matching point pairs. It compensates for the poor matching performance of the SuperGlue algorithm in feature sparse regions, obtaining sufficient and reliable feature matching point pairs there, so that the matching results are distributed more uniformly. To further verify this uniformity advantage, the distribution uniformity of the matching points of the SD-ME algorithm and the SuperGlue algorithm is compared.
Following the method in "Evaluation method of distribution uniformity of image feature points" (Zhu Yifeng et al., 2021, vol. 30, no. 3), the distribution uniformity of the matching points is calculated from the distribution of the matching results in five directions; the five directions divide the image into ten regions in total, as shown in figs. 12(a) to 12(e).
According to statistical principles, the sample variance is used to represent the differences in the numbers of matching points among the image blocks in the five directions: if the matching point pairs are distributed relatively uniformly in the five directions, the sample variance of the numbers of matching point pairs in the five directions is relatively small; otherwise it is relatively large. The distribution uniformity of the matching points is given by formula (10); the larger the distribution uniformity, the more uniform the distribution of the matching points, and vice versa.
[Formula (10): the matching point distribution uniformity, computed by a logarithmic operation over the sample variances of the matching point counts of the ten regions in the five directions; equation image not reproduced]
In the formula, $V_g$ is the statistical distribution vector for the $g$-th directional partition, formed by combining the numbers of matching points in the corresponding regions of the ten-region division.
The matching point pair distribution uniformity calculation step is shown in algorithm 2.
[Algorithm 2: calculation of matching point pair distribution uniformity; algorithm listing not reproduced]
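Since the listing of Algorithm 2 is not reproduced here, the following Python sketch illustrates one plausible reading of the uniformity computation; the choice of the five directions and the logarithmic form are assumptions consistent with the description above, not the patent's exact formula.

```python
import numpy as np

def distribution_uniformity(points, width, height):
    """Hedged sketch of the matching-point distribution uniformity (formula (10)).

    Each of five directions splits the image into two regions (ten regions in
    total). The five directions used here -- left/right, top/bottom, the two
    diagonals, and inside/outside a central box -- are an assumption; the patent
    only states that five directions yield ten regions. Uniformity is taken as
    the negative logarithm of the mean sample variance of the per-direction
    counts, so that a larger value indicates a more uniform distribution.
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    cx, cy = width / 2.0, height / 2.0
    in_center = (np.abs(x - cx) < width / 4) & (np.abs(y - cy) < height / 4)
    directions = [
        x < cx,                              # left / right halves
        y < cy,                              # top / bottom halves
        y * width < x * height,              # two sides of the main diagonal
        y * width < (width - x) * height,    # two sides of the anti-diagonal
        in_center,                           # central box / border
    ]
    variances = []
    for mask in directions:
        counts = np.array([mask.sum(), (~mask).sum()], dtype=float)
        variances.append(counts.var(ddof=1))     # sample variance of the two counts
    return -np.log(np.mean(variances) + 1e-12)   # +eps guards against log(0)
```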
The distribution uniformity of the matching point pairs of the SD-ME algorithm and the SuperGlue algorithm is calculated according to Algorithm 2; the results are shown in Table 2.
TABLE 2 comparison of uniformity of distribution of matching points
[Table 2: distribution uniformity of the matching points of the SuperGlue and SD-ME algorithms on the four image pairs; table image not reproduced]
As can be seen from Table 2, the distribution uniformity of the matching point pairs of the SD-ME algorithm on the 4 groups of images is greater than that of the SuperGlue algorithm. Because the uniformity measure applies a logarithmic operation to the variance of the five-direction distribution, the matching point uniformity of the SD-ME algorithm is markedly superior to that of the SuperGlue algorithm. The experiment thus demonstrates the effectiveness of the SD-ME algorithm in detecting feature sparse regions and enhancing matching, which greatly improves the distribution uniformity of the matching results.
In summary, aiming at the difficulty of matching texture sparse regions between unmanned aerial vehicle images and satellite images, the matching effect of the SuperGlue algorithm is enhanced through feature sparse region detection and adaptive-threshold feature extraction and screening, yielding the SD-ME algorithm, which is better suited to matching unmanned aerial vehicle images with satellite images. The SD-ME algorithm detects feature sparse regions, guides feature extraction and adaptive score-threshold feature screening within them to obtain local features that are sufficient in number and uniform in distribution, and finally performs matching again with the SuperGlue algorithm to obtain reliable and robust matching results. To verify its effectiveness, trial tests were performed on 4 representative groups of high-difficulty images. Whereas the traditional classical SIFT algorithm and the deep-learning ContextDesc algorithm match poorly on the 4 groups of images, the SD-ME algorithm achieves excellent results on all of them: compared with the SuperGlue algorithm, the number of correct matching points increases markedly, with the gains concentrated in the feature sparse regions, overcoming the weakness of the SuperGlue algorithm in matching texture sparse regions between unmanned aerial vehicle images and satellite images.
In addition, beyond enhancing the matching effect of the SuperGlue algorithm, the feature sparse region detection algorithm and the adaptive score-threshold feature screening algorithm are applicable to most CNN feature matching algorithms and to other matching scenarios, so the SD-ME algorithm has high application value.

Claims (10)

1. An image matching enhancement method based on feature sparse area detection is characterized by comprising the following steps of:
1) Obtaining a heterogeneous image pair to be matched, and performing feature extraction and feature matching on the heterogeneous image pair to be matched to obtain an initial matching result of the heterogeneous image pair to be matched;
2) Selecting any one image from the heterogeneous image pair to be matched, detecting the characteristic sparse area of the selected image according to the characteristic points of the selected image obtained by characteristic extraction in the step 1) to obtain each characteristic sparse area of the selected image, and calculating an affine transformation model of the heterogeneous image pair to be matched according to the initial matching result of the heterogeneous image pair to be matched; carrying out affine transformation on each characteristic sparse region in the selected image according to an affine transformation model of the heterogeneous image pair to be matched, so as to obtain each group of corresponding characteristic sparse regions;
3) Extracting the features of the corresponding feature sparse regions of each group, setting corresponding feature screening thresholds for the corresponding feature sparse regions of each group, and screening the extracted features of the corresponding feature sparse regions of each group according to the feature screening thresholds to obtain the local features of the corresponding feature sparse regions of each group; aggregating the local features of the corresponding feature sparse regions of each group to obtain the global feature sparse region local features of the heterogeneous image pair to be matched; and carrying out feature matching on local features of the global feature sparse region of the heterogeneous image pair to be matched to obtain a matching enhancement result of the feature sparse region of the heterogeneous image pair to be matched.
2. The image matching enhancement method based on feature sparse region detection as claimed in claim 1, wherein the method for performing feature sparse region detection on the selected image in step 2) comprises:
a) Dividing the selected image into at least one detection area;
b) Traversing each detection area, storing the detection areas meeting a first condition, and performing quadtree splitting on the detection areas meeting a second condition to obtain new detection areas; the first condition is that the number of feature points in the detection area is not greater than a set threshold and the area of the detection area is not smaller than the minimum detection area; the second condition is that the number of feature points in the detection area is greater than the set threshold and the area of the detection area is not smaller than the minimum detection area;
c) Executing step b) each time new detection areas are split, until no new detection area meets the second condition or the number of splits producing the new detection areas reaches the set splitting number; the stored detection areas are then the feature sparse areas of the selected image.
3. The image matching enhancement method based on feature sparse area detection according to claim 2, wherein the calculation formula of the set splitting number k is:
$$k = \left\lfloor \log_4 \frac{W \times H}{S} \right\rfloor$$
wherein W and H are respectively the length and width of the image, S is the minimum detection area, and the value of S is determined experimentally.
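For illustration only (not forming part of the claims), the following Python sketch implements the quadtree detection procedure of claims 2 and 3 under stated assumptions; the parameter names and the traversal order are choices of this sketch.

```python
import math

def detect_sparse_regions(keypoints, width, height, s_min, count_thr):
    """Illustrative sketch of the quadtree feature-sparse-region detection of
    claims 2 and 3.  `keypoints` is a list of (x, y) feature points from the
    initial extraction; `s_min` is the minimum detection area S; `count_thr`
    is the set threshold on the number of feature points.
    """
    k_max = int(math.log(width * height / s_min, 4))    # set splitting number k (claim 3)
    sparse, queue = [], [(0.0, 0.0, width, height, 0)]  # regions as (x, y, w, h, depth)
    while queue:
        x0, y0, w, h, depth = queue.pop()
        if w * h < s_min:
            continue                          # smaller than the minimum detection area
        n = sum(1 for (px, py) in keypoints
                if x0 <= px < x0 + w and y0 <= py < y0 + h)
        if n <= count_thr:
            sparse.append((x0, y0, w, h))     # first condition: store as a sparse region
        elif depth < k_max:
            hw, hh = w / 2, h / 2             # second condition: quadtree split into four
            queue += [(x0, y0, hw, hh, depth + 1), (x0 + hw, y0, hw, hh, depth + 1),
                      (x0, y0 + hh, hw, hh, depth + 1), (x0 + hw, y0 + hh, hw, hh, depth + 1)]
    return sparse
```

Note that after k splits each sub-region has area W×H/4^k, consistent with the area formula of claim 8.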
4. The image matching enhancement method based on feature sparse region detection as claimed in claim 1, wherein the calculation formula of the feature screening threshold θ of each group of feature sparse regions is as follows:
$$\theta = \frac{1}{N} \sum_{n=1}^{N} C_n$$
wherein $C_n$ is the score of the $n$-th local feature and $N$ is the number of local features independently extracted in each group of feature sparse regions.
5. The image matching enhancement method based on feature sparse region detection as claimed in claim 1, wherein the process of obtaining the local features of each group of feature sparse regions in step 3) is: firstly, carrying out region-by-region feature extraction on each group of feature sparse regions to obtain initial local features of each group of feature sparse regions, and then retaining, from the initial local features of each group, the local feature keypoints and descriptors whose scores are greater than the feature screening threshold, thereby obtaining the local features of each group of feature sparse regions.
6. The image matching enhancement method based on feature sparse region detection as claimed in claim 1, wherein the manner of aggregating the local features of each group of feature sparse regions to obtain the local features of the global feature sparse region is as follows:
$$\{P, C, E\} = \left\{\ \bigcup_{q=1}^{Q} p_q,\ \bigcup_{q=1}^{Q} c_q,\ \bigcup_{q=1}^{Q} e_q\ \right\}$$
where Q represents the number of feature sparse regions, and $p_q$, $c_q$ and $e_q$ respectively represent the positions, scores and descriptors of the screened local features in the $q$-th group of feature sparse regions, $q = 1, 2, 3, \ldots, Q$.
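Again for illustration only, the following Python sketch combines the adaptive score-threshold screening of claims 4 and 5 with the aggregation of claim 6; the dictionary layout of the per-region features and the mean-score form of θ are assumptions of this sketch.

```python
import numpy as np

def screen_and_aggregate(regions):
    """Sketch of claims 4-6: per-region adaptive score-threshold screening and
    aggregation of the screened local features.  `regions` is a list of dicts
    with 'pos' (Nx2), 'score' (N,) and 'desc' (NxD) arrays per feature sparse
    region; this layout is chosen for illustration.
    """
    P, C, E = [], [], []
    for r in regions:
        theta = r["score"].mean()    # adaptive screening threshold θ (claim 4, as reconstructed)
        keep = r["score"] > theta    # retain keypoints/descriptors scoring above θ (claim 5)
        P.append(r["pos"][keep])
        C.append(r["score"][keep])
        E.append(r["desc"][keep])
    # Aggregate the local features of all Q regions into the global
    # feature-sparse-region local features {P, C, E} (claim 6).
    return np.concatenate(P), np.concatenate(C), np.concatenate(E)
```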
7. The feature sparse region detection-based image matching enhancement method according to claim 2, wherein the feature sparse region detection method for the selected image in step 2) further comprises a step d) of performing area diffusion on the stored detection regions according to the corresponding set distance, and taking the detection regions after area diffusion as the feature sparse regions of the selected image.
8. The image matching enhancement method based on feature sparse region detection according to claim 2, wherein, when the selected image is divided into one detection area in step a), the area of each new detection region obtained after each split is calculated as:
$$S' = \frac{W \times H}{4^{k}}$$
wherein S' is the area of each new detection region, W and H are respectively the length and width of the selected image, and k is the number of splits performed on the selected image so far.
9. The image matching enhancement method based on the feature sparse area detection according to any one of claims 1 to 8, wherein in the step 1), a SuperPoint feature extraction network model is adopted to perform feature extraction on a heterogeneous image pair to be matched, so as to obtain initial local features of the heterogeneous image pair to be matched; and performing feature matching on the initial local features of the heterogeneous image pair to be matched by adopting a SuperGlue network model to obtain an initial matching result of the heterogeneous image pair to be matched.
10. The image matching enhancement method based on feature sparse area detection according to claim 9, wherein in step 3), local feature extraction is performed on each group of feature sparse areas by using a SuperPoint feature extraction network model, and feature matching is performed on local features of global feature sparse areas by using a SuperGlue network model.
CN202210971608.3A 2022-08-12 2022-08-12 Image matching enhancement method based on feature sparse area detection Pending CN115272153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210971608.3A CN115272153A (en) 2022-08-12 2022-08-12 Image matching enhancement method based on feature sparse area detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210971608.3A CN115272153A (en) 2022-08-12 2022-08-12 Image matching enhancement method based on feature sparse area detection

Publications (1)

Publication Number Publication Date
CN115272153A true CN115272153A (en) 2022-11-01

Family

ID=83750160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210971608.3A Pending CN115272153A (en) 2022-08-12 2022-08-12 Image matching enhancement method based on feature sparse area detection

Country Status (1)

Country Link
CN (1) CN115272153A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116797636A (en) * 2023-06-05 2023-09-22 北京数慧时空信息技术有限公司 Remote sensing image registration method based on sparse region extraction strategy


Similar Documents

Publication Publication Date Title
CN108052966B (en) Remote sensing image scene automatic extraction and classification method based on convolutional neural network
CN109871902B (en) SAR small sample identification method based on super-resolution countermeasure generation cascade network
CN104574347B (en) Satellite in orbit image geometry positioning accuracy evaluation method based on multi- source Remote Sensing Data data
CN103034863B (en) The remote sensing image road acquisition methods of a kind of syncaryon Fisher and multiple dimensioned extraction
CN110728658A (en) High-resolution remote sensing image weak target detection method based on deep learning
CN107358260B (en) Multispectral image classification method based on surface wave CNN
CN108428220B (en) Automatic geometric correction method for ocean island reef area of remote sensing image of geostationary orbit satellite sequence
CN106096658B (en) Aerial Images classification method based on unsupervised deep space feature coding
CN112766184B (en) Remote sensing target detection method based on multi-level feature selection convolutional neural network
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
Li et al. Place recognition based on deep feature and adaptive weighting of similarity matrix
WO2023273337A1 (en) Representative feature-based method for detecting dense targets in remote sensing image
CN112883850A (en) Multi-view aerospace remote sensing image matching method based on convolutional neural network
CN111199245A (en) Rape pest identification method
CN114359702A (en) Method and system for identifying building violation of remote sensing image of homestead based on Transformer
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN110246165B (en) Method and system for improving registration speed of visible light image and SAR image
CN113850761B (en) Remote sensing image target detection method based on multi-angle detection frame
CN115272153A (en) Image matching enhancement method based on feature sparse area detection
CN112734818B (en) Multi-source high-resolution remote sensing image automatic registration method based on residual network and SIFT
CN107358625B (en) SAR image change detection method based on SPP Net and region-of-interest detection
CN113160291A (en) Change detection method based on image registration
CN116863349A (en) Remote sensing image change area determining method and device based on triangular network dense matching
CN106951873A (en) A kind of Remote Sensing Target recognition methods
CN110363101A (en) A kind of flowers recognition methods based on CNN Fusion Features frame

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination