CN114511012A - SAR image and optical image matching method based on feature matching and position matching - Google Patents

SAR image and optical image matching method based on feature matching and position matching Download PDF

Info

Publication number
CN114511012A
CN114511012A (application CN202210067322.2A)
Authority
CN
China
Prior art keywords
matching
image
optical image
feature
sar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210067322.2A
Other languages
Chinese (zh)
Inventor
廖赟
邸一得
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan Lanyi Network Technology Co ltd
Original Assignee
Yunnan Lanyi Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan Lanyi Network Technology Co ltd filed Critical Yunnan Lanyi Network Technology Co ltd
Priority to CN202210067322.2A priority Critical patent/CN114511012A/en
Publication of CN114511012A publication Critical patent/CN114511012A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for matching an SAR image with an optical image based on feature matching and position matching. The method performs preliminary keypoint detection on the optical image and the SAR image with a difference-of-Gaussians algorithm; extracts the region surrounding each detected keypoint of the optical image and the SAR image and reconstructs it as an image block; designs a deep convolutional neural network comprising dense blocks and transition layers together with a composite loss function, and generates deep feature descriptors by training and running the network; performs feature matching between the optical image and the SAR image using the L2 distance and the deep feature descriptors, and evaluates the distance error of the matching points; and finally realizes position matching between the SAR image and the optical image through a two-dimensional Gaussian voting algorithm. The method solves the problem of feature matching between SAR and optical images, offers better matching capability and accuracy, and further enables position matching between the SAR image and the optical image.

Description

SAR image and optical image matching method based on feature matching and position matching
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an SAR image and optical image matching method based on feature matching and position matching.
Background
In Earth observation, optical and synthetic aperture radar (SAR) images can be compared and analyzed, and their complementarity yields more valuable information. Feature matching between SAR and optical images is therefore very important in fields such as image registration, image fusion, and change detection. However, because the imaging mechanisms of optical and SAR images differ greatly, matching their features is difficult. Speckle noise is pervasive in SAR images and degrades image features, making them hard to identify. Furthermore, the distance dependence along the range axis and the nature of the radar signal wavelength cause geometric distortions in SAR images.
Image matching methods can be divided into three categories: region-based descriptor matching methods, manual feature descriptor matching methods, and learning-based feature descriptor matching methods.
Region-based methods can directly match images at the pixel level through an appropriate patch similarity measure. However, visual changes, lighting changes, and image distortions can mislead both the similarity metric and the match search, so these methods are generally applicable only to cases with limited scaling, local deformation, and small rotations.
Handcrafted feature descriptors are typically derived and designed by experts from prior knowledge. Under nonlinear intensity variation, SIFT feature points are unreliable in estimating the dominant orientation because of the diversity of gradient statistics around the feature points; this produces more false matches and leads to erroneous registration or registration failure. Many handcrafted descriptor matching methods have emerged over the past decades, but nonlinear radiometric differences make it difficult to extract a sufficient number of high-quality features from optical and SAR images.
Compared with handcrafted descriptors, learning-based feature descriptors can uncover more valuable information hidden in the data and offer better performance and description capability. On many types of images, deep-learning-based feature descriptors achieve better feature matching results than traditional descriptors. However, learning-based descriptors also face difficulties; for example, deep learning methods typically require extracting a large number of features from an image, and these features often contain noise and outliers.
Disclosure of Invention
The embodiment of the invention aims to provide a method for matching an SAR image with an optical image based on feature matching and position matching, so as to better solve the problem of feature matching of the SAR image and the optical image, have better matching capability and matching accuracy, and further realize the position matching of the SAR image and the optical image.
In order to solve this technical problem, the invention adopts a SAR image and optical image matching method based on feature matching and position matching, comprising the following steps:
S1: performing preliminary keypoint detection on the optical image and the SAR image using a difference-of-Gaussians algorithm;
S2: extracting the region surrounding each detected keypoint of the optical image and the SAR image, and reconstructing it into a 64 × 64-pixel image block;
S3: designing a deep convolutional neural network comprising dense blocks and transition layers, designing a composite loss function, and generating deep feature descriptors by training and running the deep convolutional neural network;
S4: performing feature matching between the optical image and the SAR image using the L2 distance algorithm and the deep feature descriptors, and evaluating the distance error of the matching points;
S5: realizing position matching between the SAR image and the optical image through a two-dimensional Gaussian voting algorithm.
Further, the difference-of-Gaussians function in S1 is:

DoG(x, y) = G_σ1(x, y) - G_σ2(x, y), where G_σ(x, y) = (1/(2πσ²)) · e^(-(x² + y²)/(2σ²))

where G_σ1(x, y) and G_σ2(x, y) denote the Gaussian filtering of the two images; x and y are the horizontal and vertical coordinates of the predicted point; σ1 and σ2 are the variances of the predicted point; and e is the natural constant.
Further, the preliminary keypoint detection in S1 is performed as follows: the DoG responses of all pixels in the image are examined, and a pixel is regarded as a keypoint if its DoG value is the maximum or the minimum among all of its neighbouring pixels.
Further, the feature descriptors in S3 are generated as follows: the deep convolutional neural network is trained with the designed network and loss function, and after training the images are fed into the trained network to generate 256-dimensional feature descriptors;
the image blocks reconstructed in S2 serve as the training data of the deep convolutional neural network in S3.
Further, the deep convolutional neural network in S3 consists of three dense blocks and two transition layers, where the dense block is defined by:

Xi = Hi([X0, X1, …, Xi-1])

and the transition layer by:

Xk = Wk * [X0″, X1, …, Xk-1]
XT = WT * [X0″, X1, …, Xk]
XU = WU * [X0′, XT]

where Xi is the output of the current layer; Hi(·) is a composite function of batch normalization, ReLU, pooling and convolution operations; * is the convolution operator; Xk is the output of the k-th dense layer; XT is the output of the first transition layer; X0 is split by the dense block into two parts, denoted X0 = [X0′, X0″], where X0′ is the part that does not enter the dense layers; X0 is the output of the layer-0 features and X1 the output of the layer-1 features of the neural network; XU is the final output; W denotes trainable weights; Xi-1 is the output of the previous layer; and Xk-1 is the output of the layer preceding the k-th layer.
Further, the composite loss function in S3 is composed of a Hardl2 loss function and an ArcPatch loss function, where the Hardl2 loss is:

L_Hardl2 = (1/(n·M)) · Σ_{i=1..n} Σ_{m=1..M} max(0, 1 + d(o_i, s_i) - min(d(o_i, s_jmin(m)), d(o_kmin(m), s_i)))

where o_i denotes an optical descriptor and s_j a SAR descriptor; s_jmin denotes the first M non-matching SAR descriptors closest to (o_i, s_i) in Euclidean distance; o_kmin denotes the first M non-matching optical descriptors closest to (o_i, s_i) in Euclidean distance; d(o_i, s_j) is the distance between o_i and s_j; M is the number of nearest descriptors taken in the experiments; i = 1 … n, j = 1 … n; i, j, k are descriptor subscripts with j ≠ i and k ≠ i.

The ArcPatch loss is:

L_ArcPatch = -(1/b) · Σ_{i=1..b} log( e^(s·cos(θ_ii + m)) / ( e^(s·cos(θ_ii + m)) + Σ_{j≠i} e^(s·cos θ_ij) + Σ_{j≠i} e^(s·cos θ_ji) ) )

where b is the training batch size; s is a constant magnification factor; cos θ_ii is the distance between positive samples; cos θ_ij and cos θ_ji are the distances between negative samples, with ij and ji denoting the negative-sample cases in which the sample unit vectors of the optical image and the radar image differ; and m is the angular margin.

The composite loss is:

Loss = λ1·L_Hardl2 + λ2·L_ArcPatch

where λ1 = 1 and λ2 is a weight that increases with the number of iterations p.
Further, the specific method of feature matching in S4 is to calculate the L2 distances of the optical image and the SAR image through the feature descriptors generated in S3, and if the distance error of the corresponding matching points in the optical image and the SAR image is less than 2 pixels, regard them as a pair of correct matching points;
The L2 distance is computed as:

d(o_i, s_j) = ||o_i - s_j||_2 = sqrt( Σ_k (o_i(k) - s_j(k))² )

where o_i and s_j denote an optical descriptor and a SAR descriptor, respectively.
Further, the evaluation method in S4 is:
xrmse represents the error of the horizontal distance of the matching points, yrmse the error of the vertical distance, and xyrmse the error of the pixel distance; all errors are measured in pixels, and lower values of the three errors indicate higher matching accuracy; xrmse, yrmse and xyrmse are calculated as follows:

xrmse = sqrt( (1/N) · Σ_{i=1..N} (x_i^SAR - x_i^opt)² )
yrmse = sqrt( (1/N) · Σ_{i=1..N} (y_i^SAR - y_i^opt)² )
xyrmse = sqrt( (1/N) · Σ_{i=1..N} [ (x_i^SAR - x_i^opt)² + (y_i^SAR - y_i^opt)² ] )

where (x_i^SAR, y_i^SAR) are the coordinates of the matching points in the SAR image, (x_i^opt, y_i^opt) are the coordinates of the matching points in the optical image, and N is the total number of matching points.
Further, the two-dimensional Gaussian voting algorithm in S5 is specifically as follows:
first, from the variances σ1, σ2 of the predicted point and the mathematical expectations μ1, μ2 of the predicted point, the weight Wij of each candidate position can be expressed as:

Wij = f(x, y),  f(x, y) ~ N(μ1 = 3.5, μ2 = 3.5, σ1 = 7, σ2 = 7)

secondly, each candidate position assigns a weight to the pixels of the optical image through the weight template, and the weights are accumulated over all candidates to obtain the final voting value Vij, expressed as:

Vij = Σ wij

finally, the position coordinate with the maximum Vij is selected as the final result of the position matching between the SAR image and the optical image.
Further, the two-dimensional Gaussian function is expressed as:

f(x, y) = (1/(2π·σ1·σ2)) · e^(-[ (x-μ1)²/(2σ1²) + (y-μ2)²/(2σ2²) ])

where x and y are the horizontal and vertical coordinates of the predicted point, σ1 and σ2 are the variances of the predicted point, and μ1 and μ2 are the mathematical expectations of the predicted point.
The invention has the following advantages:
the method solves the problem of feature matching between the SAR image and the optical image, has better matching capability and matching accuracy, can further realize position matching between the SAR image and the optical image, and has high practical value.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for matching an SAR image with an optical image based on feature matching and position matching according to an embodiment of the present invention.
Fig. 2 is an exemplary diagram of an SAR image and an optical image taken at unequal sizes in accordance with an embodiment of the present invention.
Fig. 3 is an overall architecture diagram of a method for matching SAR images with optical images based on feature matching and position matching according to an embodiment of the present invention.
Fig. 4 is a diagram of a deep neural network architecture according to an embodiment of the present invention.
Fig. 5 is a sampling schematic diagram of the Hardl2 algorithm according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of an ArcPatch loss function of an embodiment of the present invention.
FIG. 7 is a graph of positive and negative samples for different loss functions according to an embodiment of the present invention; wherein (a) is a positive and negative sample histogram for the ArcPatch loss function, (b) is a positive and negative sample histogram for the Hardl2 loss function, and (c) is a positive and negative sample histogram for the Cpatch + Hardl2 loss function.
FIG. 8 is a graph of the match rate for an embodiment of the present invention; where (a) is the false match rate-correct match rate curve generated for the data set without noise added, and (b) is the 1-precision-correct match rate curve generated for the data set without noise added.
FIG. 9 is a graph of the match rate after adding noise to a data set according to an embodiment of the present invention; (a) is a mismatch-correct match rate curve generated from the data set with gaussian noise and salt and pepper noise added, (b) is a 1-precision-correct match rate curve generated from the data set with gaussian noise and salt and pepper noise added.
Fig. 10 is a schematic diagram of candidate coordinates for position matching according to an embodiment of the present invention.
FIG. 11 is a Gaussian weight template and three-dimensional distribution map of an embodiment of the invention; wherein (a) is a Gaussian template map of the embodiment of the present invention, and (b) is a three-dimensional distribution map of (a).
FIG. 12 is a diagram illustrating the principle of a two-dimensional Gaussian function voting algorithm and a function curve according to an embodiment of the present invention; wherein, (a) is a two-dimensional Gaussian function voting algorithm schematic diagram, and (b) is a voting result function curve diagram.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1 to fig. 3, this embodiment discloses a method (MatchosNet) for matching SAR images with optical images and realizes feature matching, comprising the following steps:
s1: and performing preliminary key point detection on the optical image and the SAR image by using a Gaussian difference (DoG) algorithm.
The difference-of-Gaussians (DoG) function is:

DoG(x, y) = G_σ1(x, y) - G_σ2(x, y), where G_σ(x, y) = (1/(2πσ²)) · e^(-(x² + y²)/(2σ²))

where G_σ1(x, y) and G_σ2(x, y) denote the Gaussian filtering of the two images; x and y are the horizontal and vertical coordinates of the predicted point; σ1 and σ2 are the variances of the predicted point; and e is the natural constant. In the preliminary keypoint detection, the DoG responses of all pixels in the image are examined; if the DoG value of a pixel is the maximum or minimum among all of its neighbouring pixels, the pixel is regarded as a keypoint.
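A minimal sketch of this DoG keypoint step is given below; the sigma values, the 3 × 3 neighbourhood used for the extremum test, the border margin and the function names are illustrative choices rather than values fixed by the patent.

```python
# Sketch: difference-of-Gaussians keypoint detection (illustrative parameters).
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_keypoints(image, sigma1=1.6, sigma2=3.2, border=32):
    """Return (row, col) coordinates of DoG extrema in a grayscale image."""
    img = image.astype(np.float32)
    dog = gaussian_filter(img, sigma1) - gaussian_filter(img, sigma2)

    # A pixel is kept if its DoG value is the max or min of its 3x3 neighbourhood.
    is_extremum = (dog == maximum_filter(dog, size=3)) | (dog == minimum_filter(dog, size=3))

    rows, cols = np.nonzero(is_extremum)
    # Keep a margin so a full 64x64 patch can later be cut around each keypoint.
    keep = (rows >= border) & (rows < img.shape[0] - border) & \
           (cols >= border) & (cols < img.shape[1] - border)
    return list(zip(rows[keep], cols[keep]))
```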
S2: and extracting surrounding areas according to the detected key points of the optical image and the SAR image, and reconstructing the surrounding areas into image blocks of 64 x 64 pixels.
The present embodiment extracts a surrounding area from the detected key points of the optical image and the SAR image, and reconstructs it as an image block of 64 × 64 pixels. The reconstructed image blocks can be used as training data of a deep convolutional neural network to solve the problem of size difference of the optical image and the SAR image.
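The patch-reconstruction step can be sketched as follows; the helper name and the channel-axis convention are assumptions made for illustration.

```python
# Sketch: cut a 64x64 block centred on each keypoint so optical and SAR inputs share one size.
import numpy as np

def extract_patches(image, keypoints, size=64):
    half = size // 2
    patches = []
    for r, c in keypoints:
        patch = image[r - half:r + half, c - half:c + half].astype(np.float32)
        if patch.shape == (size, size):          # skip points too close to the border
            patches.append(patch[None, ...])     # add channel axis -> 1 x 64 x 64
    return np.stack(patches) if patches else np.empty((0, 1, size, size), np.float32)
```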
S3: a deep convolutional neural network comprising dense blocks and a cross-stage partial network (transition layer) is designed, a composite loss function is designed, and a deep feature descriptor is generated through training and running of the deep convolutional neural network.
Specifically, S3 includes:
s31: referring to fig. 4 and table 1, the present invention designs a deep convolutional network comprising dense blocks and a cross-phase partial network.
Table 1: detailed configuration of the deep convolutional network (the table itself is reproduced as an image in the original filing).
A conventional convolutional neural network simply connects consecutive layers: taking the current layer as the i-th layer, its input is the output of the (i-1)-th layer (i.e., the preceding layer). This can be written as:

Xi = Hi(Xi-1)

where Xi denotes the output of the i-th layer and Hi(·) denotes a composite function of batch normalization, ReLU, pooling, convolution and similar operations.
ResNet (residual network) later added a skip connection that bypasses the non-linear transformation. The formula for ResNet can be expressed as:

Xi = Hi(Xi-1) + Xi-1

DenseNet (densely connected network) was subsequently proposed, introducing direct connections from every layer to all subsequent layers. The formula for DenseNet can be expressed as:

Xi = Hi([X0, X1, …, Xi-1])

DenseNet achieves better results than traditional convolutional networks and ResNet because it uses features more efficiently, enhances feature propagation, alleviates gradient vanishing and reduces the number of parameters; it serves here as the dense block.
More recently, CSPNet (cross-stage partial network) introduced a transition layer to remove computational bottlenecks and enhance the learning ability of convolutional networks. The transition layer of CSPNet can be expressed as:

Xk = Wk * [X0″, X1, …, Xk-1]
XT = WT * [X0″, X1, …, Xk]
XU = WU * [X0′, XT]

where * is the convolution operator; Xk is the output of the k-th dense layer; XT is the output of the first transition layer; X0 is split by the dense block into two parts, denoted X0 = [X0′, X0″], where X0′ is the part that does not enter the dense layers; X0 is the output of the layer-0 features and X1 the output of the layer-1 features of the neural network; XU is the final output; W denotes trainable weights; and Xk-1 is the output of the layer preceding the k-th layer.
CSPNet retains DenseNet's advantage of feature reuse while preventing excessive duplication of gradient information by truncating the gradient flow, and serves here as the transition layer.
Drawing on the connection patterns of DenseNet and CSPNet, a deep convolutional neural network structure suited to matching optical and SAR image features is designed; it consists of three dense blocks and two transition layers. As shown in Table 1, the network receives input of size 64 × 64 × 1 and outputs a 256-dimensional descriptor. Each dense block contains 9 layers, comprising 6 convolutional layers and 3 concatenation layers. Each transition layer comprises a convolutional layer and an average-pooling layer; it receives data of size h × w × c (height × width × number of channels) and outputs data with halved spatial resolution. The classification layer contains a special convolutional layer with an 8 × 8 kernel, which converts data of size 8 × 8 × 21 into the 256-dimensional output descriptor. Compared with other methods, this network transfers features more effectively and can generate strong deep convolutional descriptors.
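The following PyTorch sketch illustrates the dense-block / transition-layer pattern described above (DenseNet-style concatenation plus a CSPNet-style split); the channel counts, growth rate and layer depth are placeholder values, not the exact Table 1 configuration.

```python
# Sketch of a DenseNet block and a CSPNet-style stage (illustrative hyperparameters).
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth=16, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1, bias=False)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # X_i = H_i([X_0, ..., X_{i-1}])
        return torch.cat(feats, dim=1)

class CSPStage(nn.Module):
    """X0 is split into X0' (skipped) and X0'' (through the dense block), then fused
    and passed through a transition conv + average pooling."""
    def __init__(self, in_ch, growth=16, n_layers=6):
        super().__init__()
        self.skip_ch = in_ch // 2
        self.dense = DenseBlock(in_ch - self.skip_ch, growth, n_layers)
        fused = self.skip_ch + self.dense.out_channels
        self.transition = nn.Sequential(
            nn.Conv2d(fused, fused // 2, kernel_size=1, bias=False),
            nn.AvgPool2d(2))                  # halves the spatial resolution
        self.out_channels = fused // 2

    def forward(self, x):
        x_skip, x_dense = x[:, :self.skip_ch], x[:, self.skip_ch:]
        return self.transition(torch.cat([x_skip, self.dense(x_dense)], dim=1))
```

For example, CSPStage(in_ch=32) applied to an 8 × 32 × 64 × 64 tensor yields an 8 × 64 × 32 × 32 tensor; a full descriptor network in the spirit of the description would stack three such stages and finish with an 8 × 8 convolution producing the 256-dimensional output.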
S32, the invention designs a very effective composite loss function through mathematical derivation and practice.
Specifically, S32 includes:
s321, designing a Hardl2 loss function through mathematical derivation and practice.
As shown in fig. 5, the present invention designed the Hardl2 loss function for a total of n pairs of matching samples in a batch. For each positive sample, 2n-1 negative samples are generated, and the first M negative samples with the smallest distance from the positive sample are selected using the L2 distance formula to optimize the model and obtain a strong feature descriptor.
According to the L2 distance formula, a distance matrix of size n × n is computed:

D(i, j) = d(o_i, s_j) = ||o_i - s_j||_2

where d(o_i, s_j) is the distance between o_i and s_j, o_i denotes an optical descriptor, and s_j denotes a SAR descriptor.

Let s_jmin denote the first M non-matching SAR descriptors closest to (o_i, s_i) in Euclidean distance and o_kmin denote the first M non-matching optical descriptors closest to (o_i, s_i) in Euclidean distance, where M is the number of nearest descriptors taken in the experiments and i, j, k are descriptor subscripts with j ≠ i and k ≠ i. Triplets are then formed from each matching pair (o_i, s_i) and its hardest non-matching descriptors s_jmin and o_kmin.

The goal of the Hardl2 loss designed by the invention is to maximize the distance between the matching descriptors and the nearest M non-matching descriptors, which are fed into the margin loss:

L_Hardl2 = (1/(n·M)) · Σ_{i=1..n} Σ_{m=1..M} max(0, 1 + d(o_i, s_i) - min(d(o_i, s_jmin(m)), d(o_kmin(m), s_i)))

where L_Hardl2 denotes the Hardl2 loss function.
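A hedged PyTorch sketch of this hardest-in-batch margin loss follows; it assumes L2-normalised descriptors arranged so that row i of the optical batch matches row i of the SAR batch, and the margin of 1 together with the rank-wise pairing of the top-M negatives are illustrative reconstructions rather than values quoted from the filing.

```python
# Sketch: HardNet-style margin loss over the M hardest cross-modal negatives.
import torch

def hardl2_loss(opt_desc, sar_desc, M=2, margin=1.0):
    """opt_desc, sar_desc: (n, d) descriptors; row i of each forms a matching pair."""
    d_mat = torch.cdist(opt_desc, sar_desc, p=2)           # n x n pairwise L2 distances
    pos = d_mat.diag()                                      # d(o_i, s_i)

    # Mask each matching pair so it cannot be selected as a negative.
    n = d_mat.size(0)
    masked = d_mat + torch.eye(n, device=d_mat.device) * 1e6

    neg_sar, _ = masked.topk(M, dim=1, largest=False)       # hardest SAR negatives per o_i
    neg_opt, _ = masked.topk(M, dim=0, largest=False)       # hardest optical negatives per s_i
    hardest = torch.minimum(neg_sar, neg_opt.t())           # n x M, min of the two cases

    return torch.clamp(margin + pos.unsqueeze(1) - hardest, min=0).mean()
```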
And S322, designing an ArcPatch loss function through mathematical derivation and practice.
Most earlier classification work uses Softmax (cross-entropy loss) as the loss layer of the network. Experiments show that Softmax loss only considers whether samples are classified correctly, leaving considerable room for optimization in enlarging the inter-class distance between different samples and reducing the intra-class distance between similar samples.
The ArcFace method (a face recognition algorithm) proposes an additive angular margin loss. Compared with other losses, ArcFace converges more "compactly": it compresses samples of the same class into a tighter, denser space, so that the features learned by the network have a more pronounced angular distribution.
ArcFace maximizes the classification boundary in angular space and works well on classification problems. However, it is not directly applicable here, because face matching and keypoint feature matching differ considerably: ArcFace maximizes classification boundaries, while the feature matching problem addressed by the invention has no class labels.
Therefore, as shown in FIG. 6, the invention designs a new loss function, ArcPatch. Unlike ArcFace, ArcPatch has no centre-vector matrix and no fixed number of classes. Instead, a special classification scheme is built from the sample-matching situation of the feature matching problem: for each batch, 2·batch-1 classes are generated to compute the loss, namely one positive matching class and 2·batch-2 negative matching classes.
The ArcPatch algorithm maximizes the margin between the positive and negative samples of the feature points in angular space. Its overall formula is:

L_ArcPatch = -(1/b) · Σ_{i=1..b} log( e^(s·cos(θ_ii + m)) / ( e^(s·cos(θ_ii + m)) + Σ_{j≠i} e^(s·cos θ_ij) + Σ_{j≠i} e^(s·cos θ_ji) ) )

where L_ArcPatch denotes the ArcPatch loss function; b is the batch size; s is a constant magnification factor applied to each distance (30 in this embodiment); cos θ_ii is the distance between positive samples; cos θ_ij and cos θ_ji are the distances between negative samples, with ij and ji denoting the negative-sample cases in which the sample unit vectors of the optical image and the radar image differ. An additional angular margin m is added between the positive and negative samples to enhance the compactness between them.
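The sketch below shows an ArcFace-style angular loss over patch batches in the spirit of the ArcPatch description; the scale s = 30 follows the embodiment, while the margin value and the arrangement of the 2·batch-2 negative "classes" as logit columns are assumptions.

```python
# Sketch: additive angular margin loss adapted to cross-modal patch batches.
import torch
import torch.nn.functional as F

def arcpatch_loss(opt_desc, sar_desc, s=30.0, m=0.5):
    """opt_desc, sar_desc: (b, d) L2-normalised descriptors; row i of each is a matching pair."""
    cos = opt_desc @ sar_desc.t()                           # cos(theta_ij), b x b
    b = cos.size(0)
    eye = torch.eye(b, dtype=torch.bool, device=cos.device)

    theta_pos = torch.acos(cos.diag().clamp(-1 + 1e-7, 1 - 1e-7))
    pos_logit = s * torch.cos(theta_pos + m)                # margin-penalised positive, (b,)

    neg_ij = cos.masked_fill(eye, float('-inf'))            # o_i versus non-matching s_j
    neg_ji = cos.t().masked_fill(eye, float('-inf'))        # s_i versus non-matching o_j
    logits = torch.cat([pos_logit.unsqueeze(1), s * neg_ij, s * neg_ji], dim=1)

    # The positive "class" sits in column 0 for every anchor.
    target = torch.zeros(b, dtype=torch.long, device=cos.device)
    return F.cross_entropy(logits, target)
```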
S323: designing the composite loss function by configuring appropriate weights.
The invention assigns appropriate weights to these two loss functions and combines them into the composite loss of MatchosNet, which can be expressed as:

Loss = λ1·L_Hardl2 + λ2·L_ArcPatch

Extensive experiments show that first enlarging the distance margin and then enlarging the angular margin is most effective during training. The present embodiment therefore sets λ1 = 1 and lets λ2 increase with the number of iterations p. Furthermore, as shown in fig. 7, the composite loss function exhibits a more pronounced margin between positive and negative samples on the test set than either the ArcPatch loss or the Hardl2 loss alone.
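Assembling the two terms might look like the sketch below, reusing the two loss sketches above; the linear ramp-up schedule for λ2 and the warm-up length are assumptions, since the filing gives the exact expression only as an image.

```python
# Sketch: composite loss with fixed lambda1 and an assumed ramp-up for lambda2.
def composite_loss(opt_desc, sar_desc, p, warmup=1000):
    lambda1 = 1.0
    lambda2 = min(1.0, p / warmup)   # assumed schedule: angular term phased in with iteration p
    return (lambda1 * hardl2_loss(opt_desc, sar_desc)
            + lambda2 * arcpatch_loss(opt_desc, sar_desc))
```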
The neural network is trained on the training set with the model and loss function designed above. After training is completed, images are fed into the trained network to generate 256-dimensional feature descriptors. A feature descriptor should have two properties: invariance (even if the image is transformed, the descriptor should not change) and discriminative power (the descriptor of each image should be highly distinctive, so that different images have different descriptors).
S4: and performing feature matching on the optical image and the SAR image by using an L2 distance algorithm and a depth feature descriptor.
The present embodiment calculates the L2 distance of the optical image and the SAR image by the feature descriptors generated in S3, and if the distance error of the corresponding matching points in the optical image and the SAR image is less than 2 pixels, regards them as a pair of correct matching points;
The L2 distance is computed as:

d(o_i, s_j) = ||o_i - s_j||_2 = sqrt( Σ_k (o_i(k) - s_j(k))² )

where o_i and s_j are an optical descriptor and a SAR descriptor, respectively.
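A minimal matching sketch along these lines is shown below: nearest-neighbour search in descriptor space followed by the 2-pixel verification rule. It assumes the keypoint coordinates of both images have been expressed in a common reference frame, and the function and variable names are illustrative.

```python
# Sketch: nearest-neighbour matching with the 2-pixel correctness check.
import torch

def match_and_verify(opt_desc, sar_desc, opt_kpts, sar_kpts, max_pixel_error=2.0):
    d_mat = torch.cdist(opt_desc, sar_desc, p=2)     # L2 distances between all descriptors
    nn_idx = d_mat.argmin(dim=1)                     # best SAR candidate for each optical point

    correct = 0
    for i, j in enumerate(nn_idx.tolist()):
        dx = opt_kpts[i][0] - sar_kpts[j][0]
        dy = opt_kpts[i][1] - sar_kpts[j][1]
        if (dx * dx + dy * dy) ** 0.5 < max_pixel_error:   # correct if error < 2 pixels
            correct += 1
    return correct, len(opt_kpts)
```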
This example compares the effectiveness of the matching method of the present invention (MatchosNet) with three other excellent methods.
The SEN1-2 dataset (an Earth remote sensing image dataset) was introduced by Schmitt et al. in "The SEN1-2 dataset for deep learning in SAR-optical data fusion". SEN1-2 comprises 282,384 pairs of corresponding image patches collected from all regions of the globe and all meteorological seasons. In this example, 48,158 images and 60,104 images were used from the summer and winter portions of the SEN1-2 dataset, respectively.
The present embodiment also collects a large number of optical and SAR images of regions of China and collates the corresponding optical and SAR images together. Since SAR images are much harder to acquire in practice than optical images, the SAR images are randomly cropped to 512 × 512 patches and the optical images are cropped to 800 × 800 patches that contain the content of the corresponding SAR patch. The dataset falls into six major categories: ports, urban areas, water systems, airports, islands and plains. In total it contains 192,000 images, comprising 96,000 optical images and 96,000 SAR images, with 16,000 optical images and 16,000 SAR images per category.
Mishchuk et al., in "Working hard to know your neighbor's margins: Local descriptor learning loss", proposed HardNet, a method based on the distance between the nearest positive and nearest negative samples. The loss they propose outperforms complex regularization methods by maximizing the distance between the nearest positive and the nearest negative sample within a batch.
Balntas et al. implemented the triplet feature descriptor method (TFeat) in "Learning local feature descriptors with triplets and shallow convolutional neural networks", proposing the use of triplets of training samples together with in-triplet mining of hard negatives. Experiments show that the method performs well compared with other approaches, with a network of low structural complexity and no significant computational overhead.
MatchNet, a patch matching network, was proposed by Han et al. in "MatchNet: Unifying feature and metric learning for patch-based matching". As a new deep network structure for patch matching, it improves results significantly while using fewer descriptors than other methods, and experiments show that MatchNet is highly competitive with similar approaches. The authors report that MatchNet works best without a fully connected layer; therefore, this embodiment adopts MatchNet without the fully connected layer in the comparative experiments to keep the comparison objective and fair.
Specifically, S4 includes:
and S41, classifying the optical image and the SAR image.
In the test dataset, the optical and SAR images are completely shuffled; the classification experiment in this step demonstrates that MatchosNet has the ability to classify optical and SAR images into one-to-one correspondence.
TABLE 2 Classification Performance of the present invention (MatchosNet) against other methods on SEN1-2 SUMMER dataset
Model AUC FPR80
Hardnet 0.9881 0.00075
TFeat 0.9859 0.10
MatchNet 0.9625 0.05813
MatchosNet 0.9899 0.0001
TABLE 3 Classification performance of the invention (MatchosNet) in comparison with other methods on the SARptical dataset
Model AUC FPR80
Hardnet 0.9575 0.0445
TFeat 0.9004 0.1647
MatchNet 0.9010 0.1594
MatchosNet 0.9810 0.0168
The area under the curve (AUC) and FPR80 (the false positive rate at the point of 0.80 true-positive recall) are reported in Tables 2 and 3. The ideal AUC value is 1: the larger the AUC, the better the network performs. For FPR80, smaller is better, and the ideal value is 0. Table 2 shows the performance of the different methods on the SEN1-2 SUMMER dataset (an Earth remote sensing summer image dataset); Table 3 shows their performance on the SARptical dataset (a three-dimensional dataset combining dense urban SAR and optical images). As shown in Tables 2 and 3, MatchosNet achieves the highest AUC and the lowest FPR80 on both datasets, which demonstrates that its classification capability is very competitive.
And S42, realizing feature matching of the optical image and the SAR image, and evaluating the matching result.
In the feature matching test, the present invention evaluated the performance of different methods of training using different training datasets (the SEN1-2 SUMMER dataset, the SEN1-2 WINTER dataset (the Earth remote sensing WINTER image dataset) and the dataset we collected) to determine if the two patches correspond to each other. The four methods use the same data set and train in the same size batch on the same server to ensure the objectivity and fairness of the experiment. In experimental tests, if the distance error of the corresponding matching points in the optical image and the SAR image is less than 2 pixels, they are considered as a pair of correct matching points.
The method computes the final SAR-optical registration result on several datasets, including versions with Gaussian noise and salt-and-pepper noise added. In the experiments, a criterion is computed from the numbers of true and false matches obtained for each image pair when matching the optical image with the SAR image. Suppose keypoints A and B, with descriptors DA and DB, are detected in the reference image and the target image, respectively. A and B are a true match if the distance between DA and DB is less than a threshold T and A and B are confirmed as a correct correspondence by the ground-truth labels (the corresponding-region data). If A and B are not confirmed as a correct correspondence by the ground-truth labels but the distance between DA and DB is still less than T, then A and B are a false match, and vice versa.
The true match rate, the false match rate and the precision are calculated as follows:

true match rate = (number of true matches) / (total number of ground-truth correspondences)
false match rate = (number of false matches) / (total number of non-corresponding pairs)
precision = (number of true matches) / (number of true matches + number of false matches)
for any precision, the maximum recall of a descriptor is 1. That is, the closer the curve (mismatch rate-correct match rate) and the curve (1-precision-correct match rate) are to the top and left, the more efficient the algorithm is. As shown in fig. 8, (a) a (false match rate-correct match rate) curve generated for a data set without added noise; (b) (1-accuracy-correct match rate) curve generated for the data set without added noise. As can be seen from fig. 8, the MatchosNet method performs best on both measurement metrics.
As shown in fig. 9, (a) a (false match rate-correct match rate) curve generated for the data set with gaussian noise and salt and pepper noise added; (b) a (1-accuracy-correct match rate) curve is generated for the dataset with gaussian noise and salt and pepper noise added. It can be derived from fig. 9 that the MatchosNet method performs best on both of these measurement metrics, even on complex datasets after noise addition, further demonstrating the superiority and robustness of the MatchosNet method.
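The bookkeeping behind these curves can be sketched as follows; it assumes each putative match carries a descriptor distance and a ground-truth flag, and the denominators used for the two rates are assumptions (the filing reproduces the exact formulas only as images). Sweeping the threshold T produces the points of the two curves.

```python
# Sketch: thresholded match statistics for the rate/precision curves (assumed denominators).
def match_rates(distances, is_true_match, T, n_correspondences, n_non_correspondences):
    accepted = [d < T for d in distances]
    tp = sum(a and t for a, t in zip(accepted, is_true_match))        # true matches
    fp = sum(a and not t for a, t in zip(accepted, is_true_match))    # false matches

    correct_match_rate = tp / n_correspondences if n_correspondences else 0.0
    false_match_rate = fp / n_non_correspondences if n_non_correspondences else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return correct_match_rate, false_match_rate, precision
```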
And S43, matching the characteristic points of the optical image and the SAR image, and evaluating the distance error of the matched points.
In order to verify the accuracy of the matching points, the distance errors of the matching points are measured on the image: xrmse represents the error of the horizontal distance of the matching points, yrmse the error of the vertical distance, and xyrmse the error of the pixel distance. All error measurements are in pixels. xrmse, yrmse and xyrmse are calculated as follows:

xrmse = sqrt( (1/N) · Σ_{i=1..N} (x_i^SAR - x_i^opt)² )
yrmse = sqrt( (1/N) · Σ_{i=1..N} (y_i^SAR - y_i^opt)² )
xyrmse = sqrt( (1/N) · Σ_{i=1..N} [ (x_i^SAR - x_i^opt)² + (y_i^SAR - y_i^opt)² ] )

where (x_i^SAR, y_i^SAR) are the coordinates of the matching points in the SAR image, (x_i^opt, y_i^opt) are the coordinates of the matching points in the optical image, and N is the total number of matching points.
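These error measures translate directly into a few lines of NumPy; the array layout (one (x, y) row per matched point) is an assumption for illustration.

```python
# Sketch: xrmse / yrmse / xyrmse over matched point coordinates.
import numpy as np

def matching_errors(sar_pts, opt_pts):
    """sar_pts, opt_pts: (N, 2) arrays of (x, y) coordinates of matched points."""
    diff = np.asarray(sar_pts, dtype=np.float64) - np.asarray(opt_pts, dtype=np.float64)
    xrmse = np.sqrt(np.mean(diff[:, 0] ** 2))
    yrmse = np.sqrt(np.mean(diff[:, 1] ** 2))
    xyrmse = np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
    return xrmse, yrmse, xyrmse
```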
Table 4 Average errors xrmse, yrmse and xyrmse of correct matching points for the 4 methods on the SEN1-2_SUMMER dataset.
Model xrmse yrmse xyrmse
Hardnet 1.3044 0.6793 0.7870
TFeat 1.3383 0.7616 0.7537
MatchNet 1.3403 0.7452 0.7714
MatchosNet 1.3033 0.7432 0.7415
TABLE 5 SEN1-2_ WINTER data set average error xrmse, yrmse, xyrmse of correct match points for 4 methods.
(The table data are reproduced as an image in the original filing.)
Table 6 mean errors xrmse, yrmse, xyrmse of correct matching points for 4 methods on the data set collected by the present invention.
Model xrmse yrmse xyrmse
Hardnet 1.3152 0.7603 0.7368
TFeat 1.2838 0.7455 0.7489
MatchNet 1.3371 0.7719 0.7444
MatchosNet 1.3130 0.7656 0.7281
Table 4, Table 5 and Table 6 show the average errors xrmse, yrmse and xyrmse of the different methods on the three datasets. xyrmse represents the average pixel error between matching points and best reflects the accuracy and effect of matching. MatchosNet clearly has the lowest average error xyrmse on all three datasets, and its xrmse and yrmse are also excellent. This indicates that MatchosNet achieves high matching-point accuracy and strong feature matching capability.
Example 2
The embodiment discloses a method for matching an SAR image with an optical image (MatchosNet) and realizes position matching, which comprises the following steps:
A. Designing a two-dimensional Gaussian voting algorithm to realize position matching between the SAR image and the optical image.
Each pair of matched feature points yields a set of coordinates for the top-left pixel of the SAR image on the optical image. As shown in fig. 10, since each group of images has many different feature matching points, multiple candidate position coordinates are obtained.
A position-matching voting algorithm is designed based on the two-dimensional Gaussian distribution. With x and y the horizontal and vertical coordinates of the predicted point, the two-dimensional Gaussian function of the invention can be expressed as:

f(x, y) = (1/(2π·σ1·σ2)) · e^(-[ (x-μ1)²/(2σ1²) + (y-μ2)²/(2σ2²) ])

where σ1 and σ2 are the variances of the predicted point and μ1 and μ2 are its mathematical expectations.
As shown in fig. 11 (a), this embodiment designs a Gaussian weight template of size 7 × 7, whose three-dimensional distribution is shown in fig. 11 (b). Setting μ1 = 3.5, μ2 = 3.5, σ1 = 7 and σ2 = 7, the weight Wij of each position can be expressed as:

Wij = f(x, y),  f(x, y) ~ N(μ1 = 3.5, μ2 = 3.5, σ1 = 7, σ2 = 7)
as shown in fig. 12 (a), each candidate position may be weighted by the weight template to obtain a final vote value V after multiple rounds of accumulationij. The formula can be expressed as:
Vij=∑wij
the distribution of the function can be roughly represented in (b) of fig. 12. Finally, the invention selects VijThe position coordinate with the maximum value, which is the final result of the matching of the SAR image and the optical image position.
B. The effect of position matching is objectively proved through a large number of experiments.
Table 7 mean error xrmse, yrmse, xyrmse and number of correct position matches for the MatchosNet images on different datasets.
(The table data are reproduced as an image in the original filing.)
To the best of our knowledge, other feature matching methods do not achieve position matching between images of different sizes, so this embodiment can only use objective indicators to analyze the position matching performance of the proposed method. Table 7 shows the position matching errors of MatchosNet and the number of correctly matched images on batches from the different datasets. In the position matching of the SAR image and the optical image, an image is regarded as correctly position-matched when the distance error of the position match is less than 5.0 pixels.
As can be seen from Table 7, MatchosNet achieves small distance errors between the SAR image and the optical image in position matching, and a large number of images are matched correctly. The results show that MatchosNet realizes position matching between SAR and optical images well and has considerable practical value.
Example 3
In this embodiment, different deep convolutional network structures are respectively adopted to perform experiments on the feature matching and position matching methods in the first embodiment and the second embodiment, thereby verifying the performance of the network structure designed by the present invention.
Table 8 mean errors xrmse, yrmse, xyrmse and number of correctly position matched pictures for the method using different depth convolutional networks.
(The table data are reproduced as an image in the original filing.)
By comparing the results of the different models, the influence of the network structure on the feature detection model and the effectiveness of the MatchosNet structure designed by the invention are verified. Table 8 shows the position matching results of MatchosNet and the other two methods. MatchosNet has the lowest xrmse, yrmse and xyrmse and the highest number of correctly position-matched images in the batch. These experiments demonstrate that the network architecture designed for MatchosNet is highly effective in handling both feature matching and position matching.
Example 4
In this embodiment, different loss functions are respectively adopted to perform experiments on the feature matching and position matching methods in the first embodiment and the second embodiment, so as to verify the performance of the loss function designed by the present invention.
Table 9 mean errors xrmse, yrmse, xyrmse and number of correctly position matched pictures for the method using different loss functions.
(The table data are reproduced as an image in the original filing.)
By comparing the results of the different models, the influence of the loss function on the feature detection model is verified. Table 9 shows the position matching results of MatchosNet and the other two methods. MatchosNet has the lowest xrmse, yrmse and xyrmse and the highest number of correctly position-matched images in the batch. These experiments show that the loss function designed for MatchosNet is highly effective in handling both feature matching and position matching.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A SAR image and optical image matching method based on feature matching and position matching is characterized by comprising the following steps:
s1: carrying out primary key point detection on the optical image and the SAR image by using a Gaussian difference algorithm;
s2: extracting surrounding areas according to the detected key points of the optical image and the SAR image, and reconstructing the surrounding areas into image blocks of 64 x 64 pixels;
s3: designing a deep convolutional neural network comprising a dense block and a transition layer, designing a composite loss function, and generating a deep feature descriptor through training and running the deep convolutional neural network;
s4: performing feature matching on the optical image and the SAR image by using an L2 distance algorithm and a depth feature descriptor, and evaluating the distance error of a matching point;
s5: and realizing the position matching of the SAR image and the optical image by a two-dimensional Gaussian function voting algorithm.
2. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1, wherein the difference-of-Gaussians function in S1 is:

DoG(x, y) = G_σ1(x, y) - G_σ2(x, y), where G_σ(x, y) = (1/(2πσ²)) · e^(-(x² + y²)/(2σ²))

where G_σ1(x, y) and G_σ2(x, y) denote the Gaussian filtering of the two images; x and y are the horizontal and vertical coordinates of the predicted point; σ1 and σ2 are the variances of the predicted point; and e is the natural constant.
3. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1, wherein in S1 the preliminary keypoint detection is performed as follows:
the DoG responses of all pixels in the image are examined, and a pixel is regarded as a keypoint if its DoG value is the maximum or the minimum among all of its neighbouring pixels.
4. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1, wherein the feature descriptors in S3 are generated as follows: the deep convolutional neural network is trained with the designed network and loss function, and after training the images are fed into the trained network to generate 256-dimensional feature descriptors;
the reconstructed image blocks in S2 serve as training data of the deep convolutional neural network in S3.
5. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1 or 4, wherein the deep convolutional neural network in S3 consists of three dense blocks and two transition layers, the dense block being defined by:

Xi = Hi([X0, X1, …, Xi-1])

and the transition layer by:

Xk = Wk * [X0″, X1, …, Xk-1]
XT = WT * [X0″, X1, …, Xk]
XU = WU * [X0′, XT]

where Xi is the output of the current layer; Hi(·) is a composite function of batch normalization, ReLU, pooling and convolution operations; * is the convolution operator; Xk is the output of the k-th dense layer; XT is the output of the first transition layer; X0 is split by the dense block into two parts, denoted X0 = [X0′, X0″], where X0′ is the part that does not enter the dense layers; X0 is the output of the layer-0 features and X1 the output of the layer-1 features of the neural network; XU is the final output; W denotes trainable weights; Xi-1 is the output of the previous layer; and Xk-1 is the output of the layer preceding the k-th layer.
6. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1, wherein the composite loss function in S3 is composed of a Hardl2 loss function and an ArcPatch loss function, the Hardl2 loss being:

L_Hardl2 = (1/(n·M)) · Σ_{i=1..n} Σ_{m=1..M} max(0, 1 + d(o_i, s_i) - min(d(o_i, s_jmin(m)), d(o_kmin(m), s_i)))

where o_i denotes an optical descriptor and s_j a SAR descriptor; s_jmin denotes the first M non-matching SAR descriptors closest to (o_i, s_i) in Euclidean distance; o_kmin denotes the first M non-matching optical descriptors closest to (o_i, s_i) in Euclidean distance; d(o_i, s_j) is the distance between o_i and s_j; M is the number of nearest descriptors taken in the experiments; i = 1 … n, j = 1 … n; i, j, k are descriptor subscripts with j ≠ i and k ≠ i;

the ArcPatch loss being:

L_ArcPatch = -(1/b) · Σ_{i=1..b} log( e^(s·cos(θ_ii + m)) / ( e^(s·cos(θ_ii + m)) + Σ_{j≠i} e^(s·cos θ_ij) + Σ_{j≠i} e^(s·cos θ_ji) ) )

where b is the training batch size; s is a constant magnification factor; cos θ_ii is the distance between positive samples; cos θ_ij and cos θ_ji are the distances between negative samples, with ij and ji denoting the negative-sample cases in which the sample unit vectors of the optical image and the radar image differ; and m is the angular margin;

and the composite loss being:

Loss = λ1·L_Hardl2 + λ2·L_ArcPatch

where λ1 = 1 and λ2 is a weight that increases with the number of iterations p.
7. The method for matching the SAR image and the optical image based on feature matching and position matching according to claim 1, wherein the specific method of feature matching in S4 is to calculate the L2 distance of the optical image and the SAR image through the feature descriptors generated in S3, and if the distance error of the corresponding matching points in the optical image and the SAR image is less than 2 pixels, regard them as a pair of correct matching points;
the L2 distance being computed as:

d(o_i, s_j) = ||o_i - s_j||_2 = sqrt( Σ_k (o_i(k) - s_j(k))² )

where o_i and s_j are an optical descriptor and a SAR descriptor, respectively.
8. The SAR image and optical image matching method based on feature matching and position matching as claimed in claim 1, wherein the evaluation method in S4 is:
xrmse represents the error of the horizontal distance of the matching points, yrmse the error of the vertical distance, and xyrmse the error of the pixel distance; all errors are measured in pixels, and lower values of the three errors indicate higher matching accuracy; xrmse, yrmse and xyrmse are calculated as follows:

xrmse = sqrt( (1/N) · Σ_{i=1..N} (x_i^SAR - x_i^opt)² )
yrmse = sqrt( (1/N) · Σ_{i=1..N} (y_i^SAR - y_i^opt)² )
xyrmse = sqrt( (1/N) · Σ_{i=1..N} [ (x_i^SAR - x_i^opt)² + (y_i^SAR - y_i^opt)² ] )

where (x_i^SAR, y_i^SAR) are the coordinates of the matching points in the SAR image, (x_i^opt, y_i^opt) are the coordinates of the matching points in the optical image, and N is the total number of matching points.
9. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1, wherein the two-dimensional Gaussian voting algorithm in S5 is specifically:
first, from the variances σ1, σ2 of the predicted point and the mathematical expectations μ1, μ2 of the predicted point, the weight Wij of each candidate position can be expressed as:

Wij = f(x, y),  f(x, y) ~ N(μ1 = 3.5, μ2 = 3.5, σ1 = 7, σ2 = 7)

secondly, each candidate position assigns a weight to the pixels of the optical image through the weight template, and the weights are accumulated over all candidates to obtain the final voting value Vij, expressed as:

Vij = Σ wij

finally, the position coordinate with the maximum Vij is selected as the final result of the position matching between the SAR image and the optical image.
10. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 9, wherein the two-dimensional Gaussian function is expressed as:

f(x, y) = (1/(2π·σ1·σ2)) · e^(-[ (x-μ1)²/(2σ1²) + (y-μ2)²/(2σ2²) ])

where x and y are the horizontal and vertical coordinates of the predicted point, σ1 and σ2 are the variances of the predicted point, and μ1 and μ2 are the mathematical expectations of the predicted point.
CN202210067322.2A 2022-01-20 2022-01-20 SAR image and optical image matching method based on feature matching and position matching Pending CN114511012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210067322.2A CN114511012A (en) 2022-01-20 2022-01-20 SAR image and optical image matching method based on feature matching and position matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210067322.2A CN114511012A (en) 2022-01-20 2022-01-20 SAR image and optical image matching method based on feature matching and position matching

Publications (1)

Publication Number Publication Date
CN114511012A true CN114511012A (en) 2022-05-17

Family

ID=81549041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210067322.2A Pending CN114511012A (en) 2022-01-20 2022-01-20 SAR image and optical image matching method based on feature matching and position matching

Country Status (1)

Country Link
CN (1) CN114511012A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115601576A (en) * 2022-12-12 2023-01-13 云南览易网络科技有限责任公司(Cn) Image feature matching method, device, equipment and storage medium
CN117710711A (en) * 2024-02-06 2024-03-15 东华理工大学南昌校区 Optical and SAR image matching method based on lightweight depth convolution network
CN118135364A (en) * 2024-05-08 2024-06-04 北京数慧时空信息技术有限公司 Fusion method and system of multi-source remote sensing images based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510532A (en) * 2018-03-30 2018-09-07 西安电子科技大学 Optics and SAR image registration method based on depth convolution GAN
US20200372350A1 (en) * 2019-05-22 2020-11-26 Electronics And Telecommunications Research Institute Method of training image deep learning model and device thereof
CN113223068A (en) * 2021-05-31 2021-08-06 西安电子科技大学 Multi-modal image registration method and system based on depth global features

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510532A (en) * 2018-03-30 2018-09-07 西安电子科技大学 Optics and SAR image registration method based on depth convolution GAN
US20200372350A1 (en) * 2019-05-22 2020-11-26 Electronics And Telecommunications Research Institute Method of training image deep learning model and device thereof
CN113223068A (en) * 2021-05-31 2021-08-06 西安电子科技大学 Multi-modal image registration method and system based on depth global features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUN LIAO ET AL: "Feature Matching and Position Matching Between Optical and SAR With Local Deep Feature Descriptor", 《REMOTE SENSING》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115601576A (en) * 2022-12-12 2023-01-13 云南览易网络科技有限责任公司(Cn) Image feature matching method, device, equipment and storage medium
CN115601576B (en) * 2022-12-12 2023-04-07 云南览易网络科技有限责任公司 Image feature matching method, device, equipment and storage medium
CN117710711A (en) * 2024-02-06 2024-03-15 东华理工大学南昌校区 Optical and SAR image matching method based on lightweight depth convolution network
CN117710711B (en) * 2024-02-06 2024-05-10 东华理工大学南昌校区 Optical and SAR image matching method based on lightweight depth convolution network
CN118135364A (en) * 2024-05-08 2024-06-04 北京数慧时空信息技术有限公司 Fusion method and system of multi-source remote sensing images based on deep learning

Similar Documents

Publication Publication Date Title
Chen et al. Remote sensing scene classification via multi-branch local attention network
Li et al. Adaptive multiscale deep fusion residual network for remote sensing image classification
CN108573276B (en) Change detection method based on high-resolution remote sensing image
CN107066559B (en) Three-dimensional model retrieval method based on deep learning
Shi et al. Branch feature fusion convolution network for remote sensing scene classification
CN101980250B (en) Method for identifying target based on dimension reduction local feature descriptor and hidden conditional random field
CN114511012A (en) SAR image and optical image matching method based on feature matching and position matching
CN103034863B (en) The remote sensing image road acquisition methods of a kind of syncaryon Fisher and multiple dimensioned extraction
CN103927511B (en) image identification method based on difference feature description
CN104680173B (en) A kind of remote sensing images scene classification method
Cheng et al. Robust affine invariant feature extraction for image matching
CN112800876B (en) Super-spherical feature embedding method and system for re-identification
CN105574505A (en) Human body target re-identification method and system among multiple cameras
CN111723675A (en) Remote sensing image scene classification method based on multiple similarity measurement deep learning
CN108021890B (en) High-resolution remote sensing image port detection method based on PLSA and BOW
Mei et al. Remote sensing scene classification using sparse representation-based framework with deep feature fusion
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN107341813A (en) SAR image segmentation method based on structure learning and sketch characteristic inference network
CN110516533A (en) A kind of pedestrian based on depth measure discrimination method again
CN104751475A (en) Feature point optimization matching method for static image object recognition
Zhuang et al. Small sample set inshore ship detection from VHR optical remote sensing images based on structured sparse representation
Almaadeed et al. Partial shoeprint retrieval using multiple point-of-interest detectors and SIFT descriptors
CN113723492A (en) Hyperspectral image semi-supervised classification method and device for improving active deep learning
CN111695455A (en) Low-resolution face recognition method based on coupling discrimination manifold alignment
CN105809132B (en) A kind of improved compressed sensing face identification method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20220517