CN114511012A - SAR image and optical image matching method based on feature matching and position matching - Google Patents

SAR image and optical image matching method based on feature matching and position matching Download PDF

Info

Publication number
CN114511012A
CN114511012A (application CN202210067322.2A)
Authority
CN
China
Prior art keywords
matching
image
optical image
feature
sar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210067322.2A
Other languages
Chinese (zh)
Inventor
廖赟
邸一得
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan Lanyi Network Technology Co ltd
Original Assignee
Yunnan Lanyi Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan Lanyi Network Technology Co ltd filed Critical Yunnan Lanyi Network Technology Co ltd
Priority to CN202210067322.2A priority Critical patent/CN114511012A/en
Publication of CN114511012A publication Critical patent/CN114511012A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for matching an SAR image with an optical image based on feature matching and position matching. The method performs preliminary keypoint detection on the optical image and the SAR image with a difference-of-Gaussians algorithm; extracts the region surrounding each detected keypoint of the optical image and the SAR image and reconstructs it as an image block; designs a deep convolutional neural network comprising dense blocks and transition layers together with a composite loss function, and generates deep feature descriptors by training and running the network; performs feature matching between the optical image and the SAR image using the L2 distance and the deep feature descriptors, and evaluates the distance error of the matching points; and finally realizes position matching between the SAR image and the optical image through a two-dimensional Gaussian voting algorithm. The method solves the problem of feature matching between SAR and optical images, offers better matching capability and accuracy, and further enables position matching between the SAR image and the optical image.

Description

SAR image and optical image matching method based on feature matching and position matching
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an SAR image and optical image matching method based on feature matching and position matching.
Background
In Earth observation, optical and synthetic aperture radar (SAR) images can be compared and analyzed, and their complementarity yields more valuable information. Feature matching between SAR and optical images is therefore very important in fields such as image registration, image fusion, and change detection. However, because the imaging mechanisms of optical and SAR images differ greatly, matching their features is difficult. Speckle noise is pervasive in SAR images and degrades image features, making them hard to identify. Furthermore, the distance dependence along the range axis and the nature of the radar signal wavelength cause geometric distortions in SAR images.
Image matching methods can be divided into three categories: region-based descriptor matching methods, manual feature descriptor matching methods, and learning-based feature descriptor matching methods.
Region-based methods can directly match images at the pixel level through an appropriate patch similarity measure. However, visual changes, lighting changes, and image distortions can mislead both the similarity metric and the match search, so these methods are generally applicable only to cases with limited scaling, local deformation, and small rotations.
Handcrafted feature descriptors are typically derived and designed by experts from prior knowledge. Under nonlinear intensity variation, SIFT feature points are unreliable in estimating the dominant orientation because of the diversity of gradient statistics around the feature points; this produces more false matches and leads to erroneous registration or registration failure. Many handcrafted descriptor matching methods have emerged over the past decades, but nonlinear radiometric differences make it difficult to extract a sufficient number of high-quality features from optical and SAR images.
Compared with handcrafted descriptors, learning-based feature descriptors can uncover more valuable information hidden in the data and offer better performance and description capability. On many types of images, deep-learning-based feature descriptors achieve better feature matching results than traditional descriptors. However, learning-based descriptors also face difficulties; for example, deep learning methods typically require extracting a large number of features from an image, and these features often contain noise and outliers.
Disclosure of Invention
The embodiment of the invention aims to provide a method for matching an SAR image with an optical image based on feature matching and position matching, so as to better solve the problem of feature matching of the SAR image and the optical image, have better matching capability and matching accuracy, and further realize the position matching of the SAR image and the optical image.
In order to solve this technical problem, the invention adopts a SAR image and optical image matching method based on feature matching and position matching, comprising the following steps:
S1: performing preliminary keypoint detection on the optical image and the SAR image using a difference-of-Gaussians algorithm;
S2: extracting the region surrounding each detected keypoint of the optical image and the SAR image, and reconstructing it into a 64 × 64-pixel image block;
S3: designing a deep convolutional neural network comprising dense blocks and transition layers, designing a composite loss function, and generating deep feature descriptors by training and running the deep convolutional neural network;
S4: performing feature matching between the optical image and the SAR image using the L2 distance algorithm and the deep feature descriptors, and evaluating the distance error of the matching points;
S5: realizing position matching between the SAR image and the optical image through a two-dimensional Gaussian voting algorithm.
Further, the difference-of-Gaussians function in S1 is:

DoG(x, y) = G_σ1(x, y) - G_σ2(x, y), where G_σ(x, y) = (1/(2πσ²)) · e^(-(x² + y²)/(2σ²))

where G_σ1(x, y) and G_σ2(x, y) denote the Gaussian filtering of the two images; x and y are the horizontal and vertical coordinates of the predicted point; σ1 and σ2 are the variances of the predicted point; and e is the natural constant.
Further, the preliminary keypoint detection in S1 is performed as follows: the DoG responses of all pixels in the image are examined, and a pixel is regarded as a keypoint if its DoG value is the maximum or the minimum among all of its neighbouring pixels.
Further, the feature descriptors in S3 are generated as follows: the deep convolutional neural network is trained with the designed network and loss function, and after training the images are fed into the trained network to generate 256-dimensional feature descriptors;
the image blocks reconstructed in S2 serve as the training data of the deep convolutional neural network in S3.
Further, the deep convolutional neural network in S3 consists of three dense blocks and two transition layers, where the dense block is defined by:

Xi = Hi([X0, X1, …, Xi-1])

and the transition layer by:

Xk = Wk * [X0″, X1, …, Xk-1]
XT = WT * [X0″, X1, …, Xk]
XU = WU * [X0′, XT]

where Xi is the output of the current layer; Hi(·) is a composite function of batch normalization, ReLU, pooling and convolution operations; * is the convolution operator; Xk is the output of the k-th dense layer; XT is the output of the first transition layer; X0 is split by the dense block into two parts, denoted X0 = [X0′, X0″], where X0′ is the part that does not enter the dense layers; X0 is the output of the layer-0 features and X1 the output of the layer-1 features of the neural network; XU is the final output; W denotes trainable weights; Xi-1 is the output of the previous layer; and Xk-1 is the output of the layer preceding the k-th layer.
Further, the composite loss function in S3 is composed of a Hardl2 loss function and an ArcPatch loss function, where the Hardl2 loss is:

L_Hardl2 = (1/(n·M)) · Σ_{i=1..n} Σ_{m=1..M} max(0, 1 + d(o_i, s_i) - min(d(o_i, s_jmin(m)), d(o_kmin(m), s_i)))

where o_i denotes an optical descriptor and s_j a SAR descriptor; s_jmin denotes the first M non-matching SAR descriptors closest to (o_i, s_i) in Euclidean distance; o_kmin denotes the first M non-matching optical descriptors closest to (o_i, s_i) in Euclidean distance; d(o_i, s_j) is the distance between o_i and s_j; M is the number of nearest descriptors taken in the experiments; i = 1 … n, j = 1 … n; i, j, k are descriptor subscripts with j ≠ i and k ≠ i.

The ArcPatch loss is:

L_ArcPatch = -(1/b) · Σ_{i=1..b} log( e^(s·cos(θ_ii + m)) / ( e^(s·cos(θ_ii + m)) + Σ_{j≠i} e^(s·cos θ_ij) + Σ_{j≠i} e^(s·cos θ_ji) ) )

where b is the training batch size; s is a constant magnification factor; cos θ_ii is the distance between positive samples; cos θ_ij and cos θ_ji are the distances between negative samples, with ij and ji denoting the negative-sample cases in which the sample unit vectors of the optical image and the radar image differ; and m is the angular margin.

The composite loss is:

Loss = λ1·L_Hardl2 + λ2·L_ArcPatch

where λ1 = 1 and λ2 is a weight that increases with the number of iterations p.
Further, the specific method of feature matching in S4 is to calculate the L2 distances of the optical image and the SAR image through the feature descriptors generated in S3, and if the distance error of the corresponding matching points in the optical image and the SAR image is less than 2 pixels, regard them as a pair of correct matching points;
The L2 distance is computed as:

d(o_i, s_j) = ||o_i - s_j||_2 = sqrt( Σ_k (o_i(k) - s_j(k))² )

where o_i and s_j denote an optical descriptor and a SAR descriptor, respectively.
Further, the evaluation method in S4 is:
xrmse represents the error of the horizontal distance of the matching points, yrmse the error of the vertical distance, and xyrmse the error of the pixel distance; all errors are measured in pixels, and lower values of the three errors indicate higher matching accuracy; xrmse, yrmse and xyrmse are calculated as follows:

xrmse = sqrt( (1/N) · Σ_{i=1..N} (x_i^SAR - x_i^opt)² )
yrmse = sqrt( (1/N) · Σ_{i=1..N} (y_i^SAR - y_i^opt)² )
xyrmse = sqrt( (1/N) · Σ_{i=1..N} [ (x_i^SAR - x_i^opt)² + (y_i^SAR - y_i^opt)² ] )

where (x_i^SAR, y_i^SAR) are the coordinates of the matching points in the SAR image, (x_i^opt, y_i^opt) are the coordinates of the matching points in the optical image, and N is the total number of matching points.
Further, the two-dimensional Gaussian voting algorithm in S5 is specifically as follows:
first, from the variances σ1, σ2 of the predicted point and the mathematical expectations μ1, μ2 of the predicted point, the weight Wij of each candidate position can be expressed as:

Wij = f(x, y),  f(x, y) ~ N(μ1 = 3.5, μ2 = 3.5, σ1 = 7, σ2 = 7)

secondly, each candidate position assigns a weight to the pixels of the optical image through the weight template, and the weights are accumulated over all candidates to obtain the final voting value Vij, expressed as:

Vij = Σ wij

finally, the position coordinate with the maximum Vij is selected as the final result of the position matching between the SAR image and the optical image.
Further, the two-dimensional Gaussian function is expressed as:

f(x, y) = (1/(2π·σ1·σ2)) · e^(-[ (x-μ1)²/(2σ1²) + (y-μ2)²/(2σ2²) ])

where x and y are the horizontal and vertical coordinates of the predicted point, σ1 and σ2 are the variances of the predicted point, and μ1 and μ2 are the mathematical expectations of the predicted point.
The invention has the following advantages:
the method solves the problem of feature matching between the SAR image and the optical image, has better matching capability and matching accuracy, can further realize position matching between the SAR image and the optical image, and has high practical value.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for matching an SAR image with an optical image based on feature matching and position matching according to an embodiment of the present invention.
Fig. 2 is an exemplary diagram of an SAR image and an optical image taken at unequal sizes in accordance with an embodiment of the present invention.
Fig. 3 is an overall architecture diagram of a method for matching SAR images with optical images based on feature matching and position matching according to an embodiment of the present invention.
Fig. 4 is a diagram of a deep neural network architecture according to an embodiment of the present invention.
Fig. 5 is a sampling schematic diagram of the Hardl2 algorithm according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of an ArcPatch loss function of an embodiment of the present invention.
FIG. 7 is a graph of positive and negative samples for different loss functions according to an embodiment of the present invention; wherein (a) is a positive and negative sample histogram for the ArcPatch loss function, (b) is a positive and negative sample histogram for the Hardl2 loss function, and (c) is a positive and negative sample histogram for the Cpatch + Hardl2 loss function.
FIG. 8 is a graph of the match rate for an embodiment of the present invention; where (a) is the false match rate-correct match rate curve generated for the data set without noise added, and (b) is the 1-precision-correct match rate curve generated for the data set without noise added.
FIG. 9 is a graph of the match rate after adding noise to a data set according to an embodiment of the present invention; (a) is a mismatch-correct match rate curve generated from the data set with gaussian noise and salt and pepper noise added, (b) is a 1-precision-correct match rate curve generated from the data set with gaussian noise and salt and pepper noise added.
Fig. 10 is a schematic diagram of candidate coordinates for position matching according to an embodiment of the present invention.
FIG. 11 is a Gaussian weight template and three-dimensional distribution map of an embodiment of the invention; wherein (a) is a Gaussian template map of the embodiment of the present invention, and (b) is a three-dimensional distribution map of (a).
FIG. 12 is a diagram illustrating the principle of a two-dimensional Gaussian function voting algorithm and a function curve according to an embodiment of the present invention; wherein, (a) is a two-dimensional Gaussian function voting algorithm schematic diagram, and (b) is a voting result function curve diagram.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1 to fig. 3, this embodiment discloses a method (MatchosNet) for matching SAR images with optical images and realizes feature matching, comprising the following steps:
s1: and performing preliminary key point detection on the optical image and the SAR image by using a Gaussian difference (DoG) algorithm.
The difference-of-Gaussians (DoG) function is:

DoG(x, y) = G_σ1(x, y) - G_σ2(x, y), where G_σ(x, y) = (1/(2πσ²)) · e^(-(x² + y²)/(2σ²))

where G_σ1(x, y) and G_σ2(x, y) denote the Gaussian filtering of the two images; x and y are the horizontal and vertical coordinates of the predicted point; σ1 and σ2 are the variances of the predicted point; and e is the natural constant. In the preliminary keypoint detection, the DoG responses of all pixels in the image are examined; if the DoG value of a pixel is the maximum or minimum among all of its neighbouring pixels, the pixel is regarded as a keypoint.
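A minimal sketch of this DoG keypoint step is given below; the sigma values, the 3 × 3 neighbourhood used for the extremum test, the border margin and the function names are illustrative choices rather than values fixed by the patent.

```python
# Sketch: difference-of-Gaussians keypoint detection (illustrative parameters).
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_keypoints(image, sigma1=1.6, sigma2=3.2, border=32):
    """Return (row, col) coordinates of DoG extrema in a grayscale image."""
    img = image.astype(np.float32)
    dog = gaussian_filter(img, sigma1) - gaussian_filter(img, sigma2)

    # A pixel is kept if its DoG value is the max or min of its 3x3 neighbourhood.
    is_extremum = (dog == maximum_filter(dog, size=3)) | (dog == minimum_filter(dog, size=3))

    rows, cols = np.nonzero(is_extremum)
    # Keep a margin so a full 64x64 patch can later be cut around each keypoint.
    keep = (rows >= border) & (rows < img.shape[0] - border) & \
           (cols >= border) & (cols < img.shape[1] - border)
    return list(zip(rows[keep], cols[keep]))
```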
S2: and extracting surrounding areas according to the detected key points of the optical image and the SAR image, and reconstructing the surrounding areas into image blocks of 64 x 64 pixels.
The present embodiment extracts a surrounding area from the detected key points of the optical image and the SAR image, and reconstructs it as an image block of 64 × 64 pixels. The reconstructed image blocks can be used as training data of a deep convolutional neural network to solve the problem of size difference of the optical image and the SAR image.
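The patch-reconstruction step can be sketched as follows; the helper name and the channel-axis convention are assumptions made for illustration.

```python
# Sketch: cut a 64x64 block centred on each keypoint so optical and SAR inputs share one size.
import numpy as np

def extract_patches(image, keypoints, size=64):
    half = size // 2
    patches = []
    for r, c in keypoints:
        patch = image[r - half:r + half, c - half:c + half].astype(np.float32)
        if patch.shape == (size, size):          # skip points too close to the border
            patches.append(patch[None, ...])     # add channel axis -> 1 x 64 x 64
    return np.stack(patches) if patches else np.empty((0, 1, size, size), np.float32)
```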
S3: a deep convolutional neural network comprising dense blocks and a cross-stage partial network (transition layer) is designed, a composite loss function is designed, and a deep feature descriptor is generated through training and running of the deep convolutional neural network.
Specifically, S3 includes:
s31: referring to fig. 4 and table 1, the present invention designs a deep convolutional network comprising dense blocks and a cross-phase partial network.
Table 1: detailed configuration of the deep convolutional network (the table itself is reproduced as an image in the original filing).
A conventional convolutional neural network simply connects consecutive layers: taking the current layer as the i-th layer, its input is the output of the (i-1)-th layer (i.e., the preceding layer). This can be written as:

Xi = Hi(Xi-1)

where Xi denotes the output of the i-th layer and Hi(·) denotes a composite function of batch normalization, ReLU, pooling, convolution and similar operations.
ResNet (residual network) later added a skip connection that bypasses the non-linear transformation. The formula for ResNet can be expressed as:

Xi = Hi(Xi-1) + Xi-1

DenseNet (densely connected network) was subsequently proposed, introducing direct connections from every layer to all subsequent layers. The formula for DenseNet can be expressed as:

Xi = Hi([X0, X1, …, Xi-1])

DenseNet achieves better results than traditional convolutional networks and ResNet because it uses features more efficiently, enhances feature propagation, alleviates gradient vanishing and reduces the number of parameters; it serves here as the dense block.
More recently, CSPNet (cross-stage partial network) introduced a transition layer to remove computational bottlenecks and enhance the learning ability of convolutional networks. The transition layer of CSPNet can be expressed as:

Xk = Wk * [X0″, X1, …, Xk-1]
XT = WT * [X0″, X1, …, Xk]
XU = WU * [X0′, XT]

where * is the convolution operator; Xk is the output of the k-th dense layer; XT is the output of the first transition layer; X0 is split by the dense block into two parts, denoted X0 = [X0′, X0″], where X0′ is the part that does not enter the dense layers; X0 is the output of the layer-0 features and X1 the output of the layer-1 features of the neural network; XU is the final output; W denotes trainable weights; and Xk-1 is the output of the layer preceding the k-th layer.
CSPNet retains DenseNet's advantage of feature reuse while preventing excessive duplication of gradient information by truncating the gradient flow, and serves here as the transition layer.
Drawing on the connection patterns of DenseNet and CSPNet, a deep convolutional neural network structure suited to matching optical and SAR image features is designed; it consists of three dense blocks and two transition layers. As shown in Table 1, the network receives input of size 64 × 64 × 1 and outputs a 256-dimensional descriptor. Each dense block contains 9 layers, comprising 6 convolutional layers and 3 concatenation layers. Each transition layer comprises a convolutional layer and an average-pooling layer; it receives data of size h × w × c (height × width × number of channels) and outputs data with halved spatial resolution. The classification layer contains a special convolutional layer with an 8 × 8 kernel, which converts data of size 8 × 8 × 21 into the 256-dimensional output descriptor. Compared with other methods, this network transfers features more effectively and can generate strong deep convolutional descriptors.
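The following PyTorch sketch illustrates the dense-block / transition-layer pattern described above (DenseNet-style concatenation plus a CSPNet-style split); the channel counts, growth rate and layer depth are placeholder values, not the exact Table 1 configuration.

```python
# Sketch of a DenseNet block and a CSPNet-style stage (illustrative hyperparameters).
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth=16, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1, bias=False)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # X_i = H_i([X_0, ..., X_{i-1}])
        return torch.cat(feats, dim=1)

class CSPStage(nn.Module):
    """X0 is split into X0' (skipped) and X0'' (through the dense block), then fused
    and passed through a transition conv + average pooling."""
    def __init__(self, in_ch, growth=16, n_layers=6):
        super().__init__()
        self.skip_ch = in_ch // 2
        self.dense = DenseBlock(in_ch - self.skip_ch, growth, n_layers)
        fused = self.skip_ch + self.dense.out_channels
        self.transition = nn.Sequential(
            nn.Conv2d(fused, fused // 2, kernel_size=1, bias=False),
            nn.AvgPool2d(2))                  # halves the spatial resolution
        self.out_channels = fused // 2

    def forward(self, x):
        x_skip, x_dense = x[:, :self.skip_ch], x[:, self.skip_ch:]
        return self.transition(torch.cat([x_skip, self.dense(x_dense)], dim=1))
```

For example, CSPStage(in_ch=32) applied to an 8 × 32 × 64 × 64 tensor yields an 8 × 64 × 32 × 32 tensor; a full descriptor network in the spirit of the description would stack three such stages and finish with an 8 × 8 convolution producing the 256-dimensional output.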
S32, the invention designs a very effective composite loss function through mathematical derivation and practice.
Specifically, S32 includes:
s321, designing a Hardl2 loss function through mathematical derivation and practice.
As shown in fig. 5, the present invention designed the Hardl2 loss function for a total of n pairs of matching samples in a batch. For each positive sample, 2n-1 negative samples are generated, and the first M negative samples with the smallest distance from the positive sample are selected using the L2 distance formula to optimize the model and obtain a strong feature descriptor.
According to the L2 distance formula, a distance matrix of size n × n is computed:

D(i, j) = d(o_i, s_j) = ||o_i - s_j||_2

where d(o_i, s_j) is the distance between o_i and s_j, o_i denotes an optical descriptor, and s_j denotes a SAR descriptor.

Let s_jmin denote the first M non-matching SAR descriptors closest to (o_i, s_i) in Euclidean distance and o_kmin denote the first M non-matching optical descriptors closest to (o_i, s_i) in Euclidean distance, where M is the number of nearest descriptors taken in the experiments and i, j, k are descriptor subscripts with j ≠ i and k ≠ i. Triplets are then formed from each matching pair (o_i, s_i) and its hardest non-matching descriptors s_jmin and o_kmin.

The goal of the Hardl2 loss designed by the invention is to maximize the distance between the matching descriptors and the nearest M non-matching descriptors, which are fed into the margin loss:

L_Hardl2 = (1/(n·M)) · Σ_{i=1..n} Σ_{m=1..M} max(0, 1 + d(o_i, s_i) - min(d(o_i, s_jmin(m)), d(o_kmin(m), s_i)))

where L_Hardl2 denotes the Hardl2 loss function.
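A hedged PyTorch sketch of this hardest-in-batch margin loss follows; it assumes L2-normalised descriptors arranged so that row i of the optical batch matches row i of the SAR batch, and the margin of 1 together with the rank-wise pairing of the top-M negatives are illustrative reconstructions rather than values quoted from the filing.

```python
# Sketch: HardNet-style margin loss over the M hardest cross-modal negatives.
import torch

def hardl2_loss(opt_desc, sar_desc, M=2, margin=1.0):
    """opt_desc, sar_desc: (n, d) descriptors; row i of each forms a matching pair."""
    d_mat = torch.cdist(opt_desc, sar_desc, p=2)           # n x n pairwise L2 distances
    pos = d_mat.diag()                                      # d(o_i, s_i)

    # Mask each matching pair so it cannot be selected as a negative.
    n = d_mat.size(0)
    masked = d_mat + torch.eye(n, device=d_mat.device) * 1e6

    neg_sar, _ = masked.topk(M, dim=1, largest=False)       # hardest SAR negatives per o_i
    neg_opt, _ = masked.topk(M, dim=0, largest=False)       # hardest optical negatives per s_i
    hardest = torch.minimum(neg_sar, neg_opt.t())           # n x M, min of the two cases

    return torch.clamp(margin + pos.unsqueeze(1) - hardest, min=0).mean()
```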
And S322, designing an ArcPatch loss function through mathematical derivation and practice.
Most earlier classification work uses Softmax (cross-entropy loss) as the loss layer of the network. Experiments show that Softmax loss only considers whether samples are classified correctly, leaving considerable room for optimization in enlarging the inter-class distance between different samples and reducing the intra-class distance between similar samples.
The ArcFace method (a face recognition algorithm) proposes an additive angular margin loss. Compared with other losses, ArcFace converges more "compactly": it compresses samples of the same class into a tighter, denser space, so that the features learned by the network have a more pronounced angular distribution.
ArcFace maximizes the classification boundary in angular space and works well on classification problems. However, it is not directly applicable here, because face matching and keypoint feature matching differ considerably: ArcFace maximizes classification boundaries, while the feature matching problem addressed by the invention has no class labels.
Therefore, as shown in FIG. 6, the invention designs a new loss function, ArcPatch. Unlike ArcFace, ArcPatch has no centre-vector matrix and no fixed number of classes. Instead, a special classification scheme is built from the sample-matching situation of the feature matching problem: for each batch, 2·batch-1 classes are generated to compute the loss, namely one positive matching class and 2·batch-2 negative matching classes.
The ArcPatch algorithm maximizes the margin between the positive and negative samples of the feature points in angular space. Its overall formula is:

L_ArcPatch = -(1/b) · Σ_{i=1..b} log( e^(s·cos(θ_ii + m)) / ( e^(s·cos(θ_ii + m)) + Σ_{j≠i} e^(s·cos θ_ij) + Σ_{j≠i} e^(s·cos θ_ji) ) )

where L_ArcPatch denotes the ArcPatch loss function; b is the batch size; s is a constant magnification factor applied to each distance (30 in this embodiment); cos θ_ii is the distance between positive samples; cos θ_ij and cos θ_ji are the distances between negative samples, with ij and ji denoting the negative-sample cases in which the sample unit vectors of the optical image and the radar image differ. An additional angular margin m is added between the positive and negative samples to enhance the compactness between them.
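The sketch below shows an ArcFace-style angular loss over patch batches in the spirit of the ArcPatch description; the scale s = 30 follows the embodiment, while the margin value and the arrangement of the 2·batch-2 negative "classes" as logit columns are assumptions.

```python
# Sketch: additive angular margin loss adapted to cross-modal patch batches.
import torch
import torch.nn.functional as F

def arcpatch_loss(opt_desc, sar_desc, s=30.0, m=0.5):
    """opt_desc, sar_desc: (b, d) L2-normalised descriptors; row i of each is a matching pair."""
    cos = opt_desc @ sar_desc.t()                           # cos(theta_ij), b x b
    b = cos.size(0)
    eye = torch.eye(b, dtype=torch.bool, device=cos.device)

    theta_pos = torch.acos(cos.diag().clamp(-1 + 1e-7, 1 - 1e-7))
    pos_logit = s * torch.cos(theta_pos + m)                # margin-penalised positive, (b,)

    neg_ij = cos.masked_fill(eye, float('-inf'))            # o_i versus non-matching s_j
    neg_ji = cos.t().masked_fill(eye, float('-inf'))        # s_i versus non-matching o_j
    logits = torch.cat([pos_logit.unsqueeze(1), s * neg_ij, s * neg_ji], dim=1)

    # The positive "class" sits in column 0 for every anchor.
    target = torch.zeros(b, dtype=torch.long, device=cos.device)
    return F.cross_entropy(logits, target)
```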
S323: designing the composite loss function by configuring appropriate weights.
The invention assigns appropriate weights to these two loss functions and combines them into the composite loss of MatchosNet, which can be expressed as:

Loss = λ1·L_Hardl2 + λ2·L_ArcPatch

Extensive experiments show that first enlarging the distance margin and then enlarging the angular margin is most effective during training. The present embodiment therefore sets λ1 = 1 and lets λ2 increase with the number of iterations p. Furthermore, as shown in fig. 7, the composite loss function exhibits a more pronounced margin between positive and negative samples on the test set than either the ArcPatch loss or the Hardl2 loss alone.
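Assembling the two terms might look like the sketch below, reusing the two loss sketches above; the linear ramp-up schedule for λ2 and the warm-up length are assumptions, since the filing gives the exact expression only as an image.

```python
# Sketch: composite loss with fixed lambda1 and an assumed ramp-up for lambda2.
def composite_loss(opt_desc, sar_desc, p, warmup=1000):
    lambda1 = 1.0
    lambda2 = min(1.0, p / warmup)   # assumed schedule: angular term phased in with iteration p
    return (lambda1 * hardl2_loss(opt_desc, sar_desc)
            + lambda2 * arcpatch_loss(opt_desc, sar_desc))
```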
The neural network is trained on the training set with the model and loss function designed above. After training is completed, images are fed into the trained network to generate 256-dimensional feature descriptors. A feature descriptor should have two properties: invariance (even if the image is transformed, the descriptor should not change) and discriminative power (the descriptor of each image should be highly distinctive, so that different images have different descriptors).
S4: and performing feature matching on the optical image and the SAR image by using an L2 distance algorithm and a depth feature descriptor.
The present embodiment calculates the L2 distance of the optical image and the SAR image by the feature descriptors generated in S3, and if the distance error of the corresponding matching points in the optical image and the SAR image is less than 2 pixels, regards them as a pair of correct matching points;
The L2 distance is computed as:

d(o_i, s_j) = ||o_i - s_j||_2 = sqrt( Σ_k (o_i(k) - s_j(k))² )

where o_i and s_j are an optical descriptor and a SAR descriptor, respectively.
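A minimal matching sketch along these lines is shown below: nearest-neighbour search in descriptor space followed by the 2-pixel verification rule. It assumes the keypoint coordinates of both images have been expressed in a common reference frame, and the function and variable names are illustrative.

```python
# Sketch: nearest-neighbour matching with the 2-pixel correctness check.
import torch

def match_and_verify(opt_desc, sar_desc, opt_kpts, sar_kpts, max_pixel_error=2.0):
    d_mat = torch.cdist(opt_desc, sar_desc, p=2)     # L2 distances between all descriptors
    nn_idx = d_mat.argmin(dim=1)                     # best SAR candidate for each optical point

    correct = 0
    for i, j in enumerate(nn_idx.tolist()):
        dx = opt_kpts[i][0] - sar_kpts[j][0]
        dy = opt_kpts[i][1] - sar_kpts[j][1]
        if (dx * dx + dy * dy) ** 0.5 < max_pixel_error:   # correct if error < 2 pixels
            correct += 1
    return correct, len(opt_kpts)
```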
This example compares the effectiveness of the matching method of the present invention (MatchosNet) with three other excellent methods.
The SEN1-2 dataset (an Earth remote sensing image dataset) was introduced by Schmitt et al. in "The SEN1-2 dataset for deep learning in SAR-optical data fusion". SEN1-2 comprises 282,384 pairs of corresponding image patches collected from all regions of the globe and all meteorological seasons. In this example, 48,158 images and 60,104 images were used from the summer and winter portions of the SEN1-2 dataset, respectively.
The present embodiment also collects a large number of optical and SAR images of regions of China and collates the corresponding optical and SAR images together. Since SAR images are much harder to acquire in practice than optical images, the SAR images are randomly cropped to 512 × 512 patches and the optical images are cropped to 800 × 800 patches that contain the content of the corresponding SAR patch. The dataset falls into six major categories: ports, urban areas, water systems, airports, islands and plains. In total it contains 192,000 images, comprising 96,000 optical images and 96,000 SAR images, with 16,000 optical images and 16,000 SAR images per category.
Mishchuk et al., in "Working hard to know your neighbor's margins: Local descriptor learning loss", proposed HardNet, a method based on the distance between the nearest positive and nearest negative samples. The loss they propose outperforms complex regularization methods by maximizing the distance between the nearest positive and the nearest negative sample within a batch.
Balntas et al. implemented the triplet feature descriptor method (TFeat) in "Learning local feature descriptors with triplets and shallow convolutional neural networks", proposing the use of triplets of training samples together with in-triplet mining of hard negatives. Experiments show that the method performs well compared with other approaches, with a network of low structural complexity and no significant computational overhead.
MatchNet, a patch matching network, was proposed by Han et al. in "MatchNet: Unifying feature and metric learning for patch-based matching". As a new deep network structure for patch matching, it improves results significantly while using fewer descriptors than other methods, and experiments show that MatchNet is highly competitive with similar approaches. The authors report that MatchNet works best without a fully connected layer; therefore, this embodiment adopts MatchNet without the fully connected layer in the comparative experiments to keep the comparison objective and fair.
Specifically, S4 includes:
and S41, classifying the optical image and the SAR image.
In the test dataset, the optical and SAR images are completely shuffled; the classification experiment in this step demonstrates that MatchosNet has the ability to classify optical and SAR images into one-to-one correspondence.
TABLE 2 Classification Performance of the present invention (MatchosNet) against other methods on SEN1-2 SUMMER dataset
Model AUC FPR80
Hardnet 0.9881 0.00075
TFeat 0.9859 0.10
MatchNet 0.9625 0.05813
MatchosNet 0.9899 0.0001
TABLE 3 Classification performance of the invention (MatchosNet) in comparison with other methods on the SARptical dataset
Model AUC FPR80
Hardnet 0.9575 0.0445
TFeat 0.9004 0.1647
MatchNet 0.9010 0.1594
MatchosNet 0.9810 0.0168
The area under the curve (AUC) and FPR80 (the false positive rate at the point of 0.80 true-positive recall) are reported in Tables 2 and 3. The ideal AUC value is 1: the larger the AUC, the better the network performs. For FPR80, smaller is better, and the ideal value is 0. Table 2 shows the performance of the different methods on the SEN1-2 SUMMER dataset (an Earth remote sensing summer image dataset); Table 3 shows their performance on the SARptical dataset (a three-dimensional dataset combining dense urban SAR and optical images). As shown in Tables 2 and 3, MatchosNet achieves the highest AUC and the lowest FPR80 on both datasets, which demonstrates that its classification capability is very competitive.
And S42, realizing feature matching of the optical image and the SAR image, and evaluating the matching result.
In the feature matching test, the present invention evaluated the performance of different methods of training using different training datasets (the SEN1-2 SUMMER dataset, the SEN1-2 WINTER dataset (the Earth remote sensing WINTER image dataset) and the dataset we collected) to determine if the two patches correspond to each other. The four methods use the same data set and train in the same size batch on the same server to ensure the objectivity and fairness of the experiment. In experimental tests, if the distance error of the corresponding matching points in the optical image and the SAR image is less than 2 pixels, they are considered as a pair of correct matching points.
The method computes the final SAR-optical registration result on several datasets, including versions with Gaussian noise and salt-and-pepper noise added. In the experiments, a criterion is computed from the numbers of true and false matches obtained for each image pair when matching the optical image with the SAR image. Suppose keypoints A and B, with descriptors DA and DB, are detected in the reference image and the target image, respectively. A and B are a true match if the distance between DA and DB is less than a threshold T and A and B are confirmed as a correct correspondence by the ground-truth labels (the corresponding-region data). If A and B are not confirmed as a correct correspondence by the ground-truth labels but the distance between DA and DB is still less than T, then A and B are a false match, and vice versa.
The true match rate, the false match rate and the precision are calculated as follows:

true match rate = (number of true matches) / (total number of ground-truth correspondences)
false match rate = (number of false matches) / (total number of non-corresponding pairs)
precision = (number of true matches) / (number of true matches + number of false matches)
for any precision, the maximum recall of a descriptor is 1. That is, the closer the curve (mismatch rate-correct match rate) and the curve (1-precision-correct match rate) are to the top and left, the more efficient the algorithm is. As shown in fig. 8, (a) a (false match rate-correct match rate) curve generated for a data set without added noise; (b) (1-accuracy-correct match rate) curve generated for the data set without added noise. As can be seen from fig. 8, the MatchosNet method performs best on both measurement metrics.
As shown in fig. 9, (a) a (false match rate-correct match rate) curve generated for the data set with gaussian noise and salt and pepper noise added; (b) a (1-accuracy-correct match rate) curve is generated for the dataset with gaussian noise and salt and pepper noise added. It can be derived from fig. 9 that the MatchosNet method performs best on both of these measurement metrics, even on complex datasets after noise addition, further demonstrating the superiority and robustness of the MatchosNet method.
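The bookkeeping behind these curves can be sketched as follows; it assumes each putative match carries a descriptor distance and a ground-truth flag, and the denominators used for the two rates are assumptions (the filing reproduces the exact formulas only as images). Sweeping the threshold T produces the points of the two curves.

```python
# Sketch: thresholded match statistics for the rate/precision curves (assumed denominators).
def match_rates(distances, is_true_match, T, n_correspondences, n_non_correspondences):
    accepted = [d < T for d in distances]
    tp = sum(a and t for a, t in zip(accepted, is_true_match))        # true matches
    fp = sum(a and not t for a, t in zip(accepted, is_true_match))    # false matches

    correct_match_rate = tp / n_correspondences if n_correspondences else 0.0
    false_match_rate = fp / n_non_correspondences if n_non_correspondences else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return correct_match_rate, false_match_rate, precision
```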
And S43, matching the characteristic points of the optical image and the SAR image, and evaluating the distance error of the matched points.
In order to verify the accuracy of the matching points, the distance errors of the matching points are measured on the image: xrmse represents the error of the horizontal distance of the matching points, yrmse the error of the vertical distance, and xyrmse the error of the pixel distance. All error measurements are in pixels. xrmse, yrmse and xyrmse are calculated as follows:

xrmse = sqrt( (1/N) · Σ_{i=1..N} (x_i^SAR - x_i^opt)² )
yrmse = sqrt( (1/N) · Σ_{i=1..N} (y_i^SAR - y_i^opt)² )
xyrmse = sqrt( (1/N) · Σ_{i=1..N} [ (x_i^SAR - x_i^opt)² + (y_i^SAR - y_i^opt)² ] )

where (x_i^SAR, y_i^SAR) are the coordinates of the matching points in the SAR image, (x_i^opt, y_i^opt) are the coordinates of the matching points in the optical image, and N is the total number of matching points.
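These error measures translate directly into a few lines of NumPy; the array layout (one (x, y) row per matched point) is an assumption for illustration.

```python
# Sketch: xrmse / yrmse / xyrmse over matched point coordinates.
import numpy as np

def matching_errors(sar_pts, opt_pts):
    """sar_pts, opt_pts: (N, 2) arrays of (x, y) coordinates of matched points."""
    diff = np.asarray(sar_pts, dtype=np.float64) - np.asarray(opt_pts, dtype=np.float64)
    xrmse = np.sqrt(np.mean(diff[:, 0] ** 2))
    yrmse = np.sqrt(np.mean(diff[:, 1] ** 2))
    xyrmse = np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
    return xrmse, yrmse, xyrmse
```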
Table 4 Average errors xrmse, yrmse and xyrmse of correct matching points for the 4 methods on the SEN1-2_SUMMER dataset.
Model xrmse yrmse xyrmse
Hardnet 1.3044 0.6793 0.7870
TFeat 1.3383 0.7616 0.7537
MatchNet 1.3403 0.7452 0.7714
MatchosNet 1.3033 0.7432 0.7415
TABLE 5 SEN1-2_ WINTER data set average error xrmse, yrmse, xyrmse of correct match points for 4 methods.
(The table data are reproduced as an image in the original filing.)
Table 6 mean errors xrmse, yrmse, xyrmse of correct matching points for 4 methods on the data set collected by the present invention.
Model xrmse yrmse xyrmse
Hardnet 1.3152 0.7603 0.7368
TFeat 1.2838 0.7455 0.7489
MatchNet 1.3371 0.7719 0.7444
MatchosNet 1.3130 0.7656 0.7281
Table 4, Table 5 and Table 6 show the average errors xrmse, yrmse and xyrmse of the different methods on the three datasets. xyrmse represents the average pixel error between matching points and best reflects the accuracy and effect of matching. MatchosNet clearly has the lowest average error xyrmse on all three datasets, and its xrmse and yrmse are also excellent. This indicates that MatchosNet achieves high matching-point accuracy and strong feature matching capability.
Example 2
The embodiment discloses a method for matching an SAR image with an optical image (MatchosNet) and realizes position matching, which comprises the following steps:
A. Designing a two-dimensional Gaussian voting algorithm to realize position matching between the SAR image and the optical image.
Each pair of matched feature points yields a set of coordinates for the top-left pixel of the SAR image on the optical image. As shown in fig. 10, since each group of images has many different feature matching points, multiple candidate position coordinates are obtained.
A position-matching voting algorithm is designed based on the two-dimensional Gaussian distribution. With x and y the horizontal and vertical coordinates of the predicted point, the two-dimensional Gaussian function of the invention can be expressed as:

f(x, y) = (1/(2π·σ1·σ2)) · e^(-[ (x-μ1)²/(2σ1²) + (y-μ2)²/(2σ2²) ])

where σ1 and σ2 are the variances of the predicted point and μ1 and μ2 are its mathematical expectations.
As shown in fig. 11 (a), this embodiment designs a Gaussian weight template of size 7 × 7, whose three-dimensional distribution is shown in fig. 11 (b). Setting μ1 = 3.5, μ2 = 3.5, σ1 = 7 and σ2 = 7, the weight Wij of each position can be expressed as:

Wij = f(x, y),  f(x, y) ~ N(μ1 = 3.5, μ2 = 3.5, σ1 = 7, σ2 = 7)
as shown in fig. 12 (a), each candidate position may be weighted by the weight template to obtain a final vote value V after multiple rounds of accumulationij. The formula can be expressed as:
Vij=∑wij
the distribution of the function can be roughly represented in (b) of fig. 12. Finally, the invention selects VijThe position coordinate with the maximum value, which is the final result of the matching of the SAR image and the optical image position.
B. The effect of position matching is objectively proved through a large number of experiments.
Table 7 mean error xrmse, yrmse, xyrmse and number of correct position matches for the MatchosNet images on different datasets.
(The table data are reproduced as an image in the original filing.)
To the best of our knowledge, other feature matching methods do not achieve position matching between images of different sizes, so this embodiment can only use objective indicators to analyze the position matching performance of the proposed method. Table 7 shows the position matching errors of MatchosNet and the number of correctly matched images on batches from the different datasets. In the position matching of the SAR image and the optical image, an image is regarded as correctly position-matched when the distance error of the position match is less than 5.0 pixels.
As can be seen from Table 7, MatchosNet achieves small distance errors between the SAR image and the optical image in position matching, and a large number of images are matched correctly. The results show that MatchosNet realizes position matching between SAR and optical images well and has considerable practical value.
Example 3
In this embodiment, different deep convolutional network structures are respectively adopted to perform experiments on the feature matching and position matching methods in the first embodiment and the second embodiment, thereby verifying the performance of the network structure designed by the present invention.
Table 8 mean errors xrmse, yrmse, xyrmse and number of correctly position matched pictures for the method using different depth convolutional networks.
(The table data are reproduced as an image in the original filing.)
By comparing the results of the different models, the influence of the network structure on the feature detection model and the effectiveness of the MatchosNet structure designed by the invention are verified. Table 8 shows the position matching results of MatchosNet and the other two methods. MatchosNet has the lowest xrmse, yrmse and xyrmse and the highest number of correctly position-matched images in the batch. These experiments demonstrate that the network architecture designed for MatchosNet is highly effective in handling both feature matching and position matching.
Example 4
In this embodiment, different loss functions are respectively adopted to perform experiments on the feature matching and position matching methods in the first embodiment and the second embodiment, so as to verify the performance of the loss function designed by the present invention.
Table 9 mean errors xrmse, yrmse, xyrmse and number of correctly position matched pictures for the method using different loss functions.
(The table data are reproduced as an image in the original filing.)
By comparing the results of the different models, the influence of the loss function on the feature detection model is verified. Table 9 shows the position matching results of MatchosNet and the other two methods. MatchosNet has the lowest xrmse, yrmse and xyrmse and the highest number of correctly position-matched images in the batch. These experiments show that the loss function designed for MatchosNet is highly effective in handling both feature matching and position matching.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A SAR image and optical image matching method based on feature matching and position matching is characterized by comprising the following steps:
s1: carrying out primary key point detection on the optical image and the SAR image by using a Gaussian difference algorithm;
s2: extracting surrounding areas according to the detected key points of the optical image and the SAR image, and reconstructing the surrounding areas into image blocks of 64 x 64 pixels;
s3: designing a deep convolutional neural network comprising a dense block and a transition layer, designing a composite loss function, and generating a deep feature descriptor through training and running the deep convolutional neural network;
s4: performing feature matching on the optical image and the SAR image by using an L2 distance algorithm and a depth feature descriptor, and evaluating the distance error of a matching point;
s5: and realizing the position matching of the SAR image and the optical image by a two-dimensional Gaussian function voting algorithm.
2. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1, wherein the difference-of-Gaussians function in S1 is:

DoG(x, y) = G_σ1(x, y) - G_σ2(x, y), where G_σ(x, y) = (1/(2πσ²)) · e^(-(x² + y²)/(2σ²))

where G_σ1(x, y) and G_σ2(x, y) denote the Gaussian filtering of the two images; x and y are the horizontal and vertical coordinates of the predicted point; σ1 and σ2 are the variances of the predicted point; and e is the natural constant.
3. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1, wherein in S1 the preliminary keypoint detection is performed as follows:
the DoG responses of all pixels in the image are examined, and a pixel is regarded as a keypoint if its DoG value is the maximum or the minimum among all of its neighbouring pixels.
4. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1, wherein the feature descriptors in S3 are generated as follows: the deep convolutional neural network is trained with the designed network and loss function, and after training the images are fed into the trained network to generate 256-dimensional feature descriptors;
the reconstructed image blocks in S2 serve as training data of the deep convolutional neural network in S3.
5. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1 or 4, wherein the deep convolutional neural network in S3 consists of three dense blocks and two transition layers, the dense block being defined by:

Xi = Hi([X0, X1, …, Xi-1])

and the transition layer by:

Xk = Wk * [X0″, X1, …, Xk-1]
XT = WT * [X0″, X1, …, Xk]
XU = WU * [X0′, XT]

where Xi is the output of the current layer; Hi(·) is a composite function of batch normalization, ReLU, pooling and convolution operations; * is the convolution operator; Xk is the output of the k-th dense layer; XT is the output of the first transition layer; X0 is split by the dense block into two parts, denoted X0 = [X0′, X0″], where X0′ is the part that does not enter the dense layers; X0 is the output of the layer-0 features and X1 the output of the layer-1 features of the neural network; XU is the final output; W denotes trainable weights; Xi-1 is the output of the previous layer; and Xk-1 is the output of the layer preceding the k-th layer.
6. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1, wherein the composite loss function in S3 is composed of a Hardl2 loss function and an ArcPatch loss function, the Hardl2 loss being:

L_Hardl2 = (1/(n·M)) · Σ_{i=1..n} Σ_{m=1..M} max(0, 1 + d(o_i, s_i) - min(d(o_i, s_jmin(m)), d(o_kmin(m), s_i)))

where o_i denotes an optical descriptor and s_j a SAR descriptor; s_jmin denotes the first M non-matching SAR descriptors closest to (o_i, s_i) in Euclidean distance; o_kmin denotes the first M non-matching optical descriptors closest to (o_i, s_i) in Euclidean distance; d(o_i, s_j) is the distance between o_i and s_j; M is the number of nearest descriptors taken in the experiments; i = 1 … n, j = 1 … n; i, j, k are descriptor subscripts with j ≠ i and k ≠ i;

the ArcPatch loss being:

L_ArcPatch = -(1/b) · Σ_{i=1..b} log( e^(s·cos(θ_ii + m)) / ( e^(s·cos(θ_ii + m)) + Σ_{j≠i} e^(s·cos θ_ij) + Σ_{j≠i} e^(s·cos θ_ji) ) )

where b is the training batch size; s is a constant magnification factor; cos θ_ii is the distance between positive samples; cos θ_ij and cos θ_ji are the distances between negative samples, with ij and ji denoting the negative-sample cases in which the sample unit vectors of the optical image and the radar image differ; and m is the angular margin;

and the composite loss being:

Loss = λ1·L_Hardl2 + λ2·L_ArcPatch

where λ1 = 1 and λ2 is a weight that increases with the number of iterations p.
7. The method for matching the SAR image and the optical image based on feature matching and position matching according to claim 1, wherein the specific method of feature matching in S4 is to calculate the L2 distance of the optical image and the SAR image through the feature descriptors generated in S3, and if the distance error of the corresponding matching points in the optical image and the SAR image is less than 2 pixels, regard them as a pair of correct matching points;
the L2 distance being computed as:

d(o_i, s_j) = ||o_i - s_j||_2 = sqrt( Σ_k (o_i(k) - s_j(k))² )

where o_i and s_j are an optical descriptor and a SAR descriptor, respectively.
8. The SAR image and optical image matching method based on feature matching and position matching as claimed in claim 1, wherein the evaluation method in S4 is:
xrmse represents the error of the horizontal distance of the matching points, yrmse the error of the vertical distance, and xyrmse the error of the pixel distance; all errors are measured in pixels, and lower values of the three errors indicate higher matching accuracy; xrmse, yrmse and xyrmse are calculated as follows:

xrmse = sqrt( (1/N) · Σ_{i=1..N} (x_i^SAR - x_i^opt)² )
yrmse = sqrt( (1/N) · Σ_{i=1..N} (y_i^SAR - y_i^opt)² )
xyrmse = sqrt( (1/N) · Σ_{i=1..N} [ (x_i^SAR - x_i^opt)² + (y_i^SAR - y_i^opt)² ] )

where (x_i^SAR, y_i^SAR) are the coordinates of the matching points in the SAR image, (x_i^opt, y_i^opt) are the coordinates of the matching points in the optical image, and N is the total number of matching points.
9. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 1, wherein the two-dimensional Gaussian voting algorithm in S5 is specifically:
first, from the variances σ1, σ2 of the predicted point and the mathematical expectations μ1, μ2 of the predicted point, the weight Wij of each candidate position can be expressed as:

Wij = f(x, y),  f(x, y) ~ N(μ1 = 3.5, μ2 = 3.5, σ1 = 7, σ2 = 7)

secondly, each candidate position assigns a weight to the pixels of the optical image through the weight template, and the weights are accumulated over all candidates to obtain the final voting value Vij, expressed as:

Vij = Σ wij

finally, the position coordinate with the maximum Vij is selected as the final result of the position matching between the SAR image and the optical image.
10. The method for matching an SAR image and an optical image based on feature matching and position matching according to claim 9, wherein the two-dimensional Gaussian function is expressed as:

f(x, y) = (1/(2π·σ1·σ2)) · e^(-[ (x-μ1)²/(2σ1²) + (y-μ2)²/(2σ2²) ])

where x and y are the horizontal and vertical coordinates of the predicted point, σ1 and σ2 are the variances of the predicted point, and μ1 and μ2 are the mathematical expectations of the predicted point.
CN202210067322.2A 2022-01-20 2022-01-20 SAR image and optical image matching method based on feature matching and position matching Pending CN114511012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210067322.2A CN114511012A (en) 2022-01-20 2022-01-20 SAR image and optical image matching method based on feature matching and position matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210067322.2A CN114511012A (en) 2022-01-20 2022-01-20 SAR image and optical image matching method based on feature matching and position matching

Publications (1)

Publication Number Publication Date
CN114511012A true CN114511012A (en) 2022-05-17

Family

ID=81549041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210067322.2A Pending CN114511012A (en) 2022-01-20 2022-01-20 SAR image and optical image matching method based on feature matching and position matching

Country Status (1)

Country Link
CN (1) CN114511012A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115601576A (en) * 2022-12-12 2023-01-13 云南览易网络科技有限责任公司(Cn) Image feature matching method, device, equipment and storage medium
CN117710711A (en) * 2024-02-06 2024-03-15 东华理工大学南昌校区 Optical and SAR image matching method based on lightweight depth convolution network
CN118135364A (en) * 2024-05-08 2024-06-04 北京数慧时空信息技术有限公司 Fusion method and system of multi-source remote sensing images based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510532A (en) * 2018-03-30 2018-09-07 西安电子科技大学 Optics and SAR image registration method based on depth convolution GAN
US20200372350A1 (en) * 2019-05-22 2020-11-26 Electronics And Telecommunications Research Institute Method of training image deep learning model and device thereof
CN113223068A (en) * 2021-05-31 2021-08-06 西安电子科技大学 Multi-modal image registration method and system based on depth global features

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510532A (en) * 2018-03-30 2018-09-07 西安电子科技大学 Optics and SAR image registration method based on depth convolution GAN
US20200372350A1 (en) * 2019-05-22 2020-11-26 Electronics And Telecommunications Research Institute Method of training image deep learning model and device thereof
CN113223068A (en) * 2021-05-31 2021-08-06 西安电子科技大学 Multi-modal image registration method and system based on depth global features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUN LIAO ET AL: "Feature Matching and Position Matching Between Optical and SAR With Local Deep Feature Descriptor", 《REMOTE SENSING》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115601576A (en) * 2022-12-12 2023-01-13 云南览易网络科技有限责任公司(Cn) Image feature matching method, device, equipment and storage medium
CN115601576B (en) * 2022-12-12 2023-04-07 云南览易网络科技有限责任公司 Image feature matching method, device, equipment and storage medium
CN117710711A (en) * 2024-02-06 2024-03-15 东华理工大学南昌校区 Optical and SAR image matching method based on lightweight depth convolution network
CN117710711B (en) * 2024-02-06 2024-05-10 东华理工大学南昌校区 Optical and SAR image matching method based on lightweight depth convolution network
CN118135364A (en) * 2024-05-08 2024-06-04 北京数慧时空信息技术有限公司 Fusion method and system of multi-source remote sensing images based on deep learning

Similar Documents

Publication Publication Date Title
Chen et al. Remote sensing scene classification via multi-branch local attention network
Li et al. Adaptive multiscale deep fusion residual network for remote sensing image classification
CN108573276B (en) Change detection method based on high-resolution remote sensing image
CN107066559B (en) Three-dimensional model retrieval method based on deep learning
Shi et al. Branch feature fusion convolution network for remote sensing scene classification
CN101980250B (en) Method for identifying target based on dimension reduction local feature descriptor and hidden conditional random field
CN114511012A (en) SAR image and optical image matching method based on feature matching and position matching
CN103034863B (en) The remote sensing image road acquisition methods of a kind of syncaryon Fisher and multiple dimensioned extraction
CN103927511B (en) image identification method based on difference feature description
CN104680173B (en) A kind of remote sensing images scene classification method
Cheng et al. Robust affine invariant feature extraction for image matching
CN112800876B (en) Super-spherical feature embedding method and system for re-identification
CN105574505A (en) Human body target re-identification method and system among multiple cameras
CN111723675A (en) Remote sensing image scene classification method based on multiple similarity measurement deep learning
CN108021890B (en) High-resolution remote sensing image port detection method based on PLSA and BOW
Mei et al. Remote sensing scene classification using sparse representation-based framework with deep feature fusion
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN107341813A (en) SAR image segmentation method based on structure learning and sketch characteristic inference network
CN110516533A (en) A kind of pedestrian based on depth measure discrimination method again
CN104751475A (en) Feature point optimization matching method for static image object recognition
Zhuang et al. Small sample set inshore ship detection from VHR optical remote sensing images based on structured sparse representation
Almaadeed et al. Partial shoeprint retrieval using multiple point-of-interest detectors and SIFT descriptors
CN113723492A (en) Hyperspectral image semi-supervised classification method and device for improving active deep learning
CN111695455A (en) Low-resolution face recognition method based on coupling discrimination manifold alignment
CN105809132B (en) A kind of improved compressed sensing face identification method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20220517