CN110135474A - Oblique aerial image matching method and system based on deep learning - Google Patents
Oblique aerial image matching method and system based on deep learning
- Publication number
- CN110135474A CN110135474A CN201910344297.6A CN201910344297A CN110135474A CN 110135474 A CN110135474 A CN 110135474A CN 201910344297 A CN201910344297 A CN 201910344297A CN 110135474 A CN110135474 A CN 110135474A
- Authority
- CN
- China
- Prior art keywords
- data set
- deep learning
- point
- network
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 33
- 238000013135 deep learning Methods 0.000 title claims abstract description 26
- 238000012549 training Methods 0.000 claims abstract description 34
- 239000011159 matrix material Substances 0.000 claims abstract description 31
- 238000013480 data collection Methods 0.000 claims abstract description 25
- 238000011056 performance test Methods 0.000 claims abstract description 21
- 230000006870 function Effects 0.000 claims abstract description 16
- 238000010276 construction Methods 0.000 claims abstract description 8
- 230000006872 improvement Effects 0.000 claims description 8
- 238000012217 deletion Methods 0.000 abstract description 4
- 230000037430 deletion Effects 0.000 abstract description 4
- 238000007796 conventional method Methods 0.000 abstract description 2
- 238000004422 calculation algorithm Methods 0.000 description 19
- 238000005516 engineering process Methods 0.000 description 7
- 230000008569 process Effects 0.000 description 7
- 230000009466 transformation Effects 0.000 description 7
- 230000000007 visual effect Effects 0.000 description 7
- 230000008859 change Effects 0.000 description 6
- 238000001514 detection method Methods 0.000 description 6
- 238000005070 sampling Methods 0.000 description 5
- 230000008901 benefit Effects 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 4
- 238000003909 pattern recognition Methods 0.000 description 4
- 238000011156 evaluation Methods 0.000 description 3
- 230000000052 comparative effect Effects 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000003384 imaging method Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 239000013598 vector Substances 0.000 description 2
- 230000006978 adaptation Effects 0.000 description 1
- 238000007792 addition Methods 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000010835 comparative analysis Methods 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C11/00—Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/02—Affine transformations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The present invention provides an oblique aerial image matching method and system based on deep learning, comprising the following steps. Step 1: obtain a training sample data set and a performance test data set. Step 2: improve the Triplet network by selecting different affine transform model parameters and constructing a different loss function, and train the improved Triplet network with the training sample data set. Step 3: input the performance test data set into the trained Triplet network, output the matching results, and reject mismatched point pairs with the fundamental matrix F, i.e. each corresponding point pair must satisfy the condition x'Fx ≤ threshold T. Experimental results show that, compared with conventional methods, for aerial images the method of the present invention avoids mistakenly deleting a large number of spatial feature point pairs as mismatches.
Description
Technical field
The invention belongs to the photogrammetric technology field within the discipline of surveying science and technology, and more particularly relates to a matching method and system for oblique aerial images.
Background technique
With the rapid development of Earth observation technology, modern geographic information science is moving from two-dimensional paper maps toward three-dimensional space. Oblique photogrammetry systems, which have developed rapidly in recent years, can effectively solve problems such as occlusion by urban high-rise buildings and acquisition of façade texture information, and are increasingly becoming the main data source for 3D building modeling. Feature matching is the key technology for recovering three-dimensional information in systems based on two-dimensional images, and comprises two core steps: feature detection and feature description. Feature detectors such as DOG, Harris, Harris-Affine, Hessian, Hessian-Affine and MSERs, and feature descriptors such as SIFT, SURF and DAISY, belong to hand-engineered shallow learning models; the image matching process is shown in Fig. 1. With the dramatic increase in data volume and computing capability, the limitation of shallow learning, which is designed for finite samples and limited computing units, is its restricted applicability in complex three-dimensional scenes such as urban areas. The fundamental reason is that oblique images covering urban areas usually cannot be approximated as a two-dimensional plane, the affine transform parameters to be estimated differ from patch to patch, and the robustness of region features obtained by conventional methods is weak.
Summary of the invention
The present invention aims to solve the problem that, in oblique aerial image matching, the drastic depth variation of ground objects means the images cannot be approximated by a single affine transform model.
The technical scheme of the present invention is an oblique aerial image matching method based on deep learning, comprising the following steps:

Step 1: obtain a training sample data set and a performance test data set;

wherein the training sample data set is obtained by sampling the HPatches data set, implemented as follows:

let χ = {(A_i, P_i)}, i = 1, 2, …, n, be the sample matching point pairs generated from the data set, where n is the number of matching pairs, the corresponding network output values are (a_i, p_i), i = 1, 2, …, n, and the distance matrix is D = [d_ij]_{n×n};

define the non-matching point closest to a_i: j_min = argmin_j d(a_i, p_j), j = 1, 2, …, n, j ≠ i;

define the non-matching point closest to p_i: k_min = argmin_k d(a_k, p_i), k = 1, 2, …, n, k ≠ i;

construct the input values of the Triplet network structure: if d(a_i, p_{j_min}) < d(a_{k_min}, p_i), take the triplet (a_i, p_i, p_{j_min}); otherwise take (p_i, a_i, a_{k_min}); when a_i (or p_i) is judged to be an isolated point, the corresponding pair is rejected from the input sample pairs;

Step 2: improve the Triplet network by selecting different affine transform model parameters and constructing a different loss function, and train the improved Triplet network with the training sample data set;

Step 3: input the performance test data set into the trained Triplet network and output the matching results, then reject mismatched point pairs with the fundamental matrix F, i.e. each corresponding point pair must satisfy the condition x'Fx ≤ threshold T; where x and x' denote a corresponding point pair expressed in homogeneous coordinates in the two images, and F is a 3×3 matrix.
Further, the affine transform model parameters selected in step 2 are (λ, φ, θ, ψ), where λ is the scale, φ the longitude, θ the latitude, and ψ the camera spin angle.
Further, in the loss function constructed in step 2, a_i is the anchor, p_i the positive sample, q_i the negative sample, and n the number of corresponding point pairs.
Further, the performance test data set comprises 4 groups of natural image data sets of general two-dimensional planar scenes from the computer vision field, and 2 groups of urban-center three-dimensional-scene multi-view images provided by the benchmark project of Commission III of the International Society for Photogrammetry and Remote Sensing (ISPRS).
Further, the threshold T takes the value 3 or 5.
The present invention also provides an oblique aerial image matching system based on deep learning, comprising the following modules:

a data acquisition module, for obtaining the training sample data set and the performance test data set;

wherein the training sample data set is obtained by sampling the HPatches data set, implemented as follows:

let χ = {(A_i, P_i)}, i = 1, 2, …, n, be the sample matching point pairs generated from the data set, where n is the number of matching pairs, the corresponding network output values are (a_i, p_i), i = 1, 2, …, n, and the distance matrix is D = [d_ij]_{n×n};

define the non-matching point closest to a_i: j_min = argmin_j d(a_i, p_j), j = 1, 2, …, n, j ≠ i;

define the non-matching point closest to p_i: k_min = argmin_k d(a_k, p_i), k = 1, 2, …, n, k ≠ i;

construct the input values of the Triplet network structure: if d(a_i, p_{j_min}) < d(a_{k_min}, p_i), take the triplet (a_i, p_i, p_{j_min}); otherwise take (p_i, a_i, a_{k_min}); when a_i (or p_i) is judged to be an isolated point, the corresponding pair is rejected from the input sample pairs;

a network improvement and training module, for improving the Triplet network by selecting different affine transform model parameters and constructing a different loss function, and training the improved Triplet network with the training sample data set;

a mismatched point pair rejection module, for inputting the performance test data set into the trained Triplet network, outputting the matching results, and rejecting mismatched point pairs with the fundamental matrix F, i.e. each corresponding point pair must satisfy the condition x'Fx ≤ threshold T; where x and x' denote a corresponding point pair expressed in homogeneous coordinates in the two images, and F is a 3×3 matrix.
Further, the affine transform model parameters selected in the network improvement and training module are (λ, φ, θ, ψ), where λ is the scale, φ the longitude, θ the latitude, and ψ the camera spin angle.
Further, in the loss function constructed in the network improvement and training module, a_i is the anchor, p_i the positive sample, q_i the negative sample, and n the number of corresponding point pairs.
Further, the performance test data set comprises 4 groups of natural image data sets of general two-dimensional planar scenes from the computer vision field, and 2 groups of urban-center three-dimensional-scene multi-view images provided by the benchmark project of ISPRS Commission III.
Further, the threshold T takes the value 3 or 5.
Compared with the prior art, the advantages of the present invention are: the differences between oblique images are mainly viewpoint changes, and the translation and rotation parameter estimation common in previous deep learning literature is extended to affine transform parameters suited to viewpoint change, realizing dynamic estimation of the affine transform parameters; and by constructing a specific loss function, 128-dimensional low-dimensional feature vectors are obtained in which the Euclidean distance between matched samples is relatively minimal and the correlation between different dimensions is low.
Brief description of the drawings
Fig. 1 shows the feature matching process based on a shallow learning model (SIFT).
Fig. 2 shows the Siamese network in an embodiment of the present invention.
Fig. 3 shows the Triplet network in an embodiment of the present invention.
Fig. 4 shows the training data set sampling process in an embodiment of the present invention.
Fig. 5 shows algorithm performance test data sets in an embodiment of the present invention: (a)-(d) are 4 groups of general two-dimensional planar scenes from the computer vision field.
Fig. 6 shows an algorithm performance test data set in an embodiment of the present invention: a simple three-dimensional scene, where (a) is the forward view, (b) the backward view, (c) the left view, (d) the right view, and (e) the nadir view.
Fig. 7 shows an algorithm performance test data set in an embodiment of the present invention: a complex three-dimensional scene, where (a) is the forward view, (b) the backward view, (c) the left view, (d) the right view, and (e) the nadir view.
Fig. 8 shows the affine camera model in an embodiment of the present invention.
Fig. 9 shows a schematic of ASIFT affine parameter space sampling in an embodiment of the present invention.
Fig. 10 shows the planar-scene homography matrix H in an embodiment of the present invention.
Fig. 11 shows the fundamental matrix F in an embodiment of the present invention.
Fig. 12 shows the comparative experiment on mismatch rejection with the homography matrix H and the fundamental matrix F in an embodiment of the present invention, where (a) and (b) are SIFT matching results for the nadir-view/right-view aerial image pair with mismatches rejected by the fundamental matrix, and (c) and (d) are RANSAC mismatch-rejection results for the same image pair.
Detailed description of the embodiments
The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
In a general classification problem, the total number of classes is known and fixed, and every input value has a corresponding class. In image feature matching, the feature points extracted from the two images of a pair are numerous, but only a portion of them form corresponding point pairs; the rest are "isolated points". To improve matching accuracy, besides conventional mismatch-rejection strategies, two other points related to the point to be matched are usually also compared: the nearest point and the second-nearest point. A neural network model suited to image feature matching is therefore based on a "parallel" multi-input structure, of which there are mainly two forms:
I. the Siamese network with 2 input values (also known as the "twin network", Fig. 2);
II. the Triplet network with 3 input values (Fig. 3).
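The defining property of both "parallel" forms is weight sharing: the same embedding network is applied to every input. The following toy sketch illustrates this with a stand-in linear layer plus L2 normalization rather than the patent's L2Net-style CNN; all function names and the embedding itself are illustrative assumptions, not the patent's network.

```python
import math

def embed(patch, weights):
    """Toy shared embedding: one linear layer followed by L2 normalization.
    Stands in for the patent's CNN branch; purely illustrative."""
    v = [sum(w * x for w, x in zip(row, patch)) for row in weights]
    norm = math.sqrt(sum(t * t for t in v)) or 1.0
    return [t / norm for t in v]

def descriptor_distance(p1, p2, weights):
    """Siamese branch: BOTH patches pass through the SAME weights; the
    descriptor distance is the Euclidean distance of the embeddings.
    A Triplet network applies the same embed() to a third input as well."""
    e1, e2 = embed(p1, weights), embed(p2, weights)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))
```

The Triplet form simply evaluates `embed` on an anchor, a positive and a negative patch with the same `weights`, which is what makes the loss of step 2 well defined.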
For matching oblique images with viewpoint changes, the latest comparative experiments show that the "traditional matching strategy" (hand-designed feature detection + hand-designed feature description) outperforms matching strategies that use deep learning only locally (learned feature detection + hand-crafted feature description, or hand-crafted feature detection + learned feature description) (Fan et al., 2017; Schönberger et al., 2017; Lenc et al., 2018). The present invention adopts the "learned feature detection + learned feature description" matching strategy; the technical route comprises three main steps:
Step 1: obtain the training sample data set and the performance test data set.

I. Training sample data set sampling

The currently newest and most popular HPatches data set (Balntas et al., 2017) is used; its advantages are rich real scenes, large data volume, and suitability for multiple tasks. In the training data the number of non-matching points far exceeds the number of matching points, and traversing all possible combinations is unnecessary, so an effective sampling strategy is crucial (Tian et al., 2017).

Let χ = {(A_i, P_i)}, i = 1, 2, …, n, be the sample matching point pairs generated from the data set, where n is the number of matching pairs, the corresponding network output values are (a_i, p_i), i = 1, 2, …, n, and the distance matrix is D = [d_ij]_{n×n}.

Define the non-matching point closest to a_i: j_min = argmin_j d(a_i, p_j), j = 1, 2, …, n, j ≠ i.

Define the non-matching point closest to p_i: k_min = argmin_k d(a_k, p_i), k = 1, 2, …, n, k ≠ i.

Construct the input values of the Triplet network structure: if d(a_i, p_{j_min}) < d(a_{k_min}, p_i), take the triplet (a_i, p_i, p_{j_min}); otherwise take (p_i, a_i, a_{k_min}), as shown in Fig. 4.

Unlike Mishkin et al. (2018), we add an extra random-selection mechanism that judges whether a_i (or p_i) is an "isolated point"; when a_i (or p_i) is judged an isolated point, the corresponding pair is rejected from the input sample pairs. The reason is that among the feature points to be matched only a portion have corresponding matching points, while a large number of "isolated points" have no match at all. This problem is more prominent in aerial image matching, because aerial images are dominated by urban buildings (complex three-dimensional scenes), and the ground-object information contained in images from different viewpoints is not completely identical.
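The sampling rule above can be sketched in pure Python as follows. This is a minimal hardest-in-batch miner over a precomputed distance matrix, in the style of Mishkin et al. (2018); the function name, the `(pair index, negative index, side)` triplet encoding and the `isolated` flag are illustrative assumptions, not from the patent.

```python
def mine_triplets(D, isolated=()):
    """D[i][j] = d(a_i, p_j) for a batch of n matching pairs.

    For each pair i, find the closest non-matching p_j (j_min) and the
    closest non-matching a_k (k_min); whichever is nearer becomes the
    negative.  Pairs flagged as isolated points are dropped entirely.
    Returns tuples (i, negative_index, side) with side 'p' or 'a'."""
    n = len(D)
    triplets = []
    for i in range(n):
        if i in isolated:
            continue  # no true match exists for this pair: reject it
        j_min = min((j for j in range(n) if j != i), key=lambda j: D[i][j])
        k_min = min((k for k in range(n) if k != i), key=lambda k: D[k][i])
        if D[i][j_min] < D[k_min][i]:
            triplets.append((i, j_min, 'p'))  # triplet (a_i, p_i, p_jmin)
        else:
            triplets.append((i, k_min, 'a'))  # triplet (p_i, a_i, a_kmin)
    return triplets
```

In practice the distance matrix would come from the network's descriptors for one training batch; here it is simply a nested list.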
II. Performance test data set

The experiments select 4 groups of natural image data sets of general two-dimensional planar scenes from the computer vision field (Fig. 5), and 2 groups of urban-center three-dimensional-scene multi-view images provided by the benchmark project of ISPRS Commission III (partial cropped regions shown in Figs. 6 and 7). The area covered by the Fig. 6 images has simple building structures with small footprints and low heights, and can be regarded as a simple three-dimensional scene; it serves as experimental data set I. The area covered by the Fig. 7 images has complex building structures with large footprints, and can be regarded as a complex three-dimensional scene; it serves as experimental data set II. Both scene data sets include interior orientation elements, exterior orientation elements, pixel size, radiometric resolution and other information, and have undergone strict optical distortion correction. Since the main output of the algorithm is the pixel coordinates of the matched feature point pairs, algorithm performance is evaluated by the number of correct matching pairs (quantitative evaluation).

The experimental environment is as follows: Ubuntu 18.04.1 and Microsoft Windows 10 Professional 64-bit operating systems; development platforms Visual Studio 2015 and Matlab 2017a; deep learning toolkit PyTorch. The SIFT algorithm uses the open-source software VLFeat provided by Professor Andrew Zisserman's computer vision group at Oxford University, and the ASIFT algorithm uses the source program provided by the algorithm's original authors.
Table 1: number of correct matching feature point pairs obtained by different methods in the two-dimensional planar scenes.

The statistics in Table 1 show that in two-dimensional planar scenes the affine-space parameters are continuous, and the evenly spaced sparse-sampling strategy of the ASIFT algorithm covers this affine space well, so the matching results of the ASIFT algorithm are substantially better than those of the deep learning method. Deep learning methods are generally suited to situations where "a model cannot be established by physical laws or mathematical equations"; conversely, when a model can be established by physical laws or mathematical equations, deep learning methods often cannot exert their advantage.

Table 2: number of correct matching pairs obtained by different methods in the cropped simple three-dimensional scene area.

As shown in Table 2, in the simple three-dimensional scene with viewpoint change, the SIFT algorithm essentially fails, while the ASIFT algorithm still shows some adaptability; however, the number of true matching pairs drops sharply for the forward-backward and left-right view image pairs, and the deep learning method behaves similarly, because the camera observation directions of the two images are completely opposite and the parallax contrast is the most obvious. Unlike the two-dimensional planar scenes, the advantage of the deep learning method begins to show: except for the nadir-forward and nadir-right image pairs, the deep learning method obtains more correct matching pairs than the ASIFT algorithm.

Table 3: number of correct matching pairs obtained by different methods in the cropped complex three-dimensional scene area.

As shown in Table 3, the ASIFT algorithm, which still had some adaptability in the simple three-dimensional scene, almost completely fails in the complex three-dimensional scene; three image pairs (nadir-forward, forward-backward, left-right) fail to match at all. The deep learning method obtains varying numbers of correct matching pairs, but the quantity declines noticeably compared with the simple three-dimensional scene.
Step 2: deep network construction and training

Deep network construction comprises two parts: affine transform model parameter selection, and construction of the loss function of the feature description network; the constructed network is then trained with the sampled training samples.

I. Affine transform model parameter selection

Direction estimation is realized by Spatial Transformer Networks (Jaderberg et al., 2016). In direction estimation under viewpoint change, the covariant constraint needs to be extended to accommodate affine transformation, and the affine transform model parameters must be selected. The affine transformation matrix A has many decomposition forms, and in geometry estimation with convolutional neural network models the choice of decomposition parameters has a significant impact (Mishkin et al., 2018). The affine parameters adopted by the present invention are (λ, φ, θ, ψ).

The above affine parameters are selected mainly for the following two reasons:

1) The parameters have explicit physical meaning (Fig. 8; λ: scale, φ: longitude, θ: latitude, ψ: camera spin angle), and they can be dynamically estimated by the trained deep network, close to the true imaging process of aerial images.

2) They are convenient for comparison with the ASIFT algorithm. In planar scenes the matching effect of the ASIFT algorithm is very good, but in three-dimensional scenes it is unsatisfactory. The reason is that in three-dimensional scenes the affine-space parameters are not continuous, and the evenly spaced sampling strategy of ASIFT's affine parameters (Fig. 9) cannot cover the whole affine space (Chen Min, 2014). Previous comparative-experiment literature rarely compares deep learning methods with the ASIFT method; the present invention aims precisely at solving the discontinuous affine-space parameter estimation problem by means of deep learning.
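The patent's displayed decomposition of A is rendered only as an image. As a hedged stand-in, the ASIFT decomposition of the cited Yu and Morel (2009), which uses exactly the four parameters named here, is A = λ · R(ψ) · diag(1/cos θ, 1) · R(φ); the sketch below builds that 2×2 matrix in pure Python. Function names are illustrative.

```python
import math

def affine_from_params(lam, phi, theta, psi):
    """Build A = lam * R(psi) * diag(1/cos(theta), 1) * R(phi),
    the ASIFT-style decomposition (scale lam, longitude phi,
    latitude theta, camera spin psi; angles in radians)."""
    def rot(a):
        c, s = math.cos(a), math.sin(a)
        return [[c, -s], [s, c]]
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]
    tilt = [[1.0 / math.cos(theta), 0.0], [0.0, 1.0]]  # tilt t = 1/cos(theta) >= 1
    A = matmul(rot(psi), matmul(tilt, rot(phi)))
    return [[lam * A[i][j] for j in range(2)] for i in range(2)]
```

ASIFT samples (θ, φ) on a fixed grid; the network of step 2 instead regresses these parameters per patch, which is the "dynamic estimation" claimed above.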
II. Loss function construction

The basic building units of the neural network used for feature description are taken from L2Net (Tian et al., 2017), but we use a Triplet network structure different from the original model (as shown in Fig. 3), and the loss function construction is also different. Let the feature description vectors output by the network be (a_i, p_i, q_i), where a_i is the anchor, p_i the positive sample (the point matching a_i), and q_i the negative sample (a point not matching a_i).

Loss function: in the ideal case, d(a_i, p_i) < d(a_i, q_i), i.e. the distance between a matching pair is smaller than the distance between a non-matching pair; in the actual network training process, negative values will appear.
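The exact loss formula appears only as an image in the original document. A plausible sketch consistent with the description above is a hinged triplet loss over Euclidean distances, in the style of the cited Tian et al. (2017); the margin value of 1.0 is an assumption, not taken from the patent.

```python
import math

def triplet_loss(anchors, positives, negatives, margin=1.0):
    """Mean over the batch of max(0, margin + d(a_i,p_i) - d(a_i,q_i)).
    The raw difference d(a_i,p_i) - d(a_i,q_i) is negative whenever the
    matching pair is closer than the non-matching pair, as desired."""
    def d(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
    n = len(anchors)
    return sum(max(0.0, margin + d(a, p) - d(a, q))
               for a, p, q in zip(anchors, positives, negatives)) / n
```

With L2-normalized descriptors (as the network outputs), minimizing this drives matched pairs together and pushes the hardest negatives from the step 1 sampling away.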
Step 3: mismatched point pair rejection

A large number of mismatched point pairs exist in the network output, so appropriate measures must be taken to reject them. In the computer vision field, the natural images widely used in viewpoint-transformation research are mostly planar or near-planar scenes. In a planar scene, all corresponding point pairs satisfy the same homography matrix transformation model x' = Hx, as shown in Fig. 10: π is a spatial plane, x_π a point on that plane, c and c' projection rays, and x, x' the projections of the planar point x_π expressed in homogeneous coordinates in the image planes, i.e. (image) corresponding point pairs; H is a 3×3 matrix. In image matching, the planar-scene homography matrix H can be used both for the early sparse sampling of the affine-space parameters and for the later RANSAC-based removal of mismatched points.
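The planar model x' = Hx can be sketched in a few lines; the helper name is illustrative, and the point is only that one 3×3 matrix maps every corresponding pair in a planar scene.

```python
def apply_homography(H, x):
    """Map homogeneous point x (3-vector) through the 3x3 matrix H and
    normalize so the last coordinate is 1 (planar-scene model x' = Hx)."""
    xp = [sum(H[i][j] * x[j] for j in range(3)) for i in range(3)]
    return [v / xp[2] for v in xp]
```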
Oblique aerial images are based on complex urban three-dimensional scenes, where the depth of ground objects varies discontinuously, so the corresponding point pairs in the images do not satisfy a single homography transformation model; this is also the reason why many famous operators in the computer vision field fail to obtain ideal results. Since the exterior orientation elements of the images are known, rejecting mismatched points with the fundamental matrix F (Fig. 11) better fits reality: when x and x' denote a corresponding point pair expressed in homogeneous coordinates in the left and right images, the equation x'Fx = 0 holds, with F a 3×3 matrix. Errors inevitably exist in the imaging and matching processes, so a threshold of 3 to 5 pixels is normally set, and a corresponding point pair must satisfy the condition x'Fx ≤ 3.
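The rejection test above can be sketched as follows: keep a pair only when the algebraic residual |x'ᵀFx| is at or below the threshold T. Function names are illustrative; note that in practice the algebraic residual is often first normalized to a point-to-epipolar-line distance before comparing against a pixel threshold, a refinement this sketch omits.

```python
def epipolar_residual(xp, F, x):
    """Algebraic residual |x'^T F x| for homogeneous 3-vectors x, x'."""
    Fx = [sum(F[i][j] * x[j] for j in range(3)) for i in range(3)]
    return abs(sum(xp[i] * Fx[i] for i in range(3)))

def filter_matches(pairs, F, T=3.0):
    """Keep corresponding point pairs (x, x') satisfying x'Fx <= T."""
    return [(x, xp) for x, xp in pairs if epipolar_residual(xp, F, x) <= T]
```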
As shown in Fig. 12, (a) and (b) are SIFT matching results for the nadir-view/right-view aerial image pair with mismatches rejected by the fundamental matrix; (c) and (d) are RANSAC mismatch-rejection results for the same image pair. The results show that, for aerial images, approximating the relationship between corresponding point pairs with a homography matrix causes a large number of spatial feature point pairs (such as building corner points) to be mistakenly deleted as mismatches.
The embodiment of the present invention also provides an oblique aerial image matching system based on deep learning, comprising the following modules:

a data acquisition module, for obtaining the training sample data set and the performance test data set;

wherein the training sample data set is obtained by sampling the HPatches data set, implemented as follows:

let χ = {(A_i, P_i)}, i = 1, 2, …, n, be the sample matching point pairs generated from the data set, where n is the number of matching pairs, the corresponding network output values are (a_i, p_i), i = 1, 2, …, n, and the distance matrix is D = [d_ij]_{n×n};

define the non-matching point closest to a_i: j_min = argmin_j d(a_i, p_j), j = 1, 2, …, n, j ≠ i;

define the non-matching point closest to p_i: k_min = argmin_k d(a_k, p_i), k = 1, 2, …, n, k ≠ i;

construct the input values of the Triplet network structure: if d(a_i, p_{j_min}) < d(a_{k_min}, p_i), take the triplet (a_i, p_i, p_{j_min}); otherwise take (p_i, a_i, a_{k_min}); when a_i (or p_i) is judged to be an isolated point, the corresponding pair is rejected from the input sample pairs;

a network improvement and training module, for improving the Triplet network by selecting different affine transform model parameters and constructing a different loss function, and training the improved Triplet network with the training sample data set;

a mismatched point pair rejection module, for inputting the performance test data set into the trained Triplet network, outputting the matching results, and rejecting mismatched point pairs with the fundamental matrix F, i.e. each corresponding point pair must satisfy the condition x'Fx ≤ threshold T; where x and x' denote a corresponding point pair expressed in homogeneous coordinates in the two images, and F is a 3×3 matrix.
Therein, the affine transform model parameters selected in the network improvement and training module are (λ, φ, θ, ψ), where λ is the scale, φ the longitude, θ the latitude, and ψ the camera spin angle.

Therein, in the loss function constructed in the network improvement and training module, a_i is the anchor, p_i the positive sample, q_i the negative sample, and n the number of corresponding point pairs.
The specific implementation of each module corresponds to that of each step, and is not repeated here.
Relevant references are as follows:
[1] Chen Min. 2014. Research on feature matching techniques for multi-source remote sensing images [D]. Wuhan: Wuhan University.
[2] Balntas V, Lenc K, Vedaldi A, et al. 2017. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors [C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.
[3] Fan B, Kong Q, Wang X, et al. 2017. A performance evaluation of local features for image based 3D reconstruction [C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.
[4] Jaderberg M, Simonyan K, Zisserman A, et al. 2016. Spatial transformer networks [C]. Proceedings of Advances in Neural Information Processing Systems.
[5] Lenc K, Vedaldi A. 2017. Large scale evaluation of local image feature detectors on homography datasets [J].
[6] Mishkin D, Radenovic F, Matas J. 2018. Repeatability is not enough: Learning discriminative affine regions via discriminability [C]. Proceedings of Computing Research Repository.
[7] Schönberger J L, Hardmeier H, Sattler T, et al. 2017. Comparative evaluation of hand-crafted and learned local features [C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.
[8] Tian Y, Fan B, Wu F. 2017. L2-Net: Deep learning of discriminative patch descriptor in Euclidean space [C]. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.
[9] Yu G, Morel J M. 2009. A fully affine invariant image comparison method [C]. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing.
The specific embodiments described herein are merely illustrative of the spirit of the present invention. Those skilled in the art to which the present invention belongs can make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.
Claims (10)
1. An oblique aerial image matching method based on deep learning, characterized by comprising the following steps:
Step 1: obtain a training sample data set and a performance test data set;
wherein the training sample data set is obtained by sampling from the HPatches data set, implemented as follows:
let χ = {(A_i, P_i)}, i = 1, 2, ..., n, be the sample matching point pairs generated from the data set, where n is the number of matching point pairs; the corresponding network output values are (a_i, p_i), i = 1, 2, ..., n, with distance matrix D = [d_ij]_{n×n};
define the non-matching point nearest to a_i: j_min = argmin_j d(a_i, p_j), j = 1, 2, ..., n, j ≠ i;
define the non-matching point nearest to p_i: k_min = argmin_k d(a_k, p_i), k = 1, 2, ..., n, k ≠ i;
construct the input values of the Triplet network structure: if d(a_i, p_{j_min}) < d(a_{k_min}, p_i), take (a_i, p_i, p_{j_min}); otherwise take (a_i, p_i, a_{k_min}); when a_i (or p_i) is set as an isolated point, the corresponding (a_i, p_i) is rejected from the input sample pairs;
Step 2: improve the Triplet network by selecting different affine transformation model parameters and constructing a different loss function, and train the improved Triplet network using the training sample data set;
Step 3: input the performance test data set into the trained Triplet network, output the matching result, and reject mismatched point pairs using the fundamental matrix F, i.e. each corresponding point pair must satisfy the condition x'ᵀFx ≤ T, where T is a threshold, x and x' are the corresponding point pair expressed in homogeneous coordinates in the two images, and F is a 3 × 3 matrix.
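The Step 1 triplet construction (distance matrix over the n output pairs, nearest non-matching points j_min and k_min, hardest negative chosen between them, in the style of the HardNet loss cited in the non-patent references) can be sketched as follows; the function name, array layout, and tie-breaking are illustrative assumptions, not from the patent:

```python
import numpy as np

def build_triplets(a, p):
    """Construct Triplet network inputs from n matched descriptor pairs (a_i, p_i).

    For each pair, the negative is the hardest non-matching point: either
    p[j_min], the non-matching p closest to a_i, or a[k_min], the non-matching
    a closest to p_i, whichever is nearer.
    """
    n = len(a)
    # distance matrix D = [d_ij], d_ij = ||a_i - p_j||
    D = np.linalg.norm(a[:, None, :] - p[None, :, :], axis=2)
    # exclude the matching pairs (the diagonal) from the nearest-neighbor search
    masked = D + np.where(np.eye(n, dtype=bool), np.inf, 0.0)
    j_min = np.argmin(masked, axis=1)  # nearest non-matching p for each a_i
    k_min = np.argmin(masked, axis=0)  # nearest non-matching a for each p_i
    triplets = []
    for i in range(n):
        if masked[i, j_min[i]] < masked[k_min[i], i]:
            triplets.append((a[i], p[i], p[j_min[i]]))
        else:
            triplets.append((a[i], p[i], a[k_min[i]]))
    return triplets
```

Isolated-point rejection (dropping a pair whose anchor or positive is flagged as an outlier) would be a filtering pass before this function.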
2. The oblique aerial image matching method based on deep learning according to claim 1, characterized in that: the affine transformation model parameters selected in step 2 are the scale λ, the longitude angle φ, the latitude angle θ, and the swing angle ψ.
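The parameter formula itself is published only as an image; in the fully affine model of reference [9] (Yu and Morel), which uses these same four parameters, the affine matrix decomposes as A = λ · R(ψ) · T_t · R(φ) with tilt t = 1/cos θ. A sketch under that assumption (the decomposition is taken from [9], not recovered from the elided formula):

```python
import numpy as np

def affine_from_camera_params(lam, phi, theta, psi):
    """Affine model A = lam * R(psi) * T_t * R(phi), after Yu & Morel (2009).

    lam: scale; phi: longitude angle; theta: latitude angle, inducing the
    tilt t = 1/cos(theta); psi: camera swing angle. Angles in radians.
    """
    def rot(g):
        # 2x2 rotation matrix for angle g
        return np.array([[np.cos(g), -np.sin(g)], [np.sin(g), np.cos(g)]])
    t = 1.0 / np.cos(theta)   # tilt induced by the latitude angle
    T_t = np.diag([t, 1.0])   # anisotropic scaling along one axis
    return lam * rot(psi) @ T_t @ rot(phi)
```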
3. The oblique aerial image matching method based on deep learning according to claim 1, characterized in that: the loss function constructed in step 2 is as follows,
where a_i is the anchor (reference) sample, p_i the positive sample, q_i the negative sample, and n the number of corresponding point pairs.
4. The oblique aerial image matching method based on deep learning according to claim 1, characterized in that: the performance test data set comprises 4 groups of general two-dimensional planar-scene natural image data sets from the computer vision field and 2 groups of multi-view images of urban central area three-dimensional scenes provided by the test project of Commission III of the International Society for Photogrammetry and Remote Sensing (ISPRS).
5. The oblique aerial image matching method based on deep learning according to claim 1, characterized in that: the value of the threshold T is 3 or 5.
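The fundamental-matrix check of claims 1 and 5 can be sketched as below; taking the residual as the absolute value |x'ᵀFx| per pair is an assumption, since the published condition is written without it:

```python
import numpy as np

def reject_mismatches(F, x, x_prime, T=3.0):
    """Keep point pairs satisfying the epipolar condition |x'^T F x| <= T.

    F: 3x3 fundamental matrix; x, x_prime: (n, 3) homogeneous coordinates of
    corresponding points in the two images; T: threshold (3 or 5 per claim 5).
    """
    # residual_i = |x'_i^T F x_i|, computed for all pairs at once
    residual = np.abs(np.einsum('ni,ij,nj->n', x_prime, F, x))
    keep = residual <= T
    return x[keep], x_prime[keep], residual
```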
6. An oblique aerial image matching system based on deep learning, characterized by comprising the following modules:
a data acquisition module, for obtaining a training sample data set and a performance test data set;
wherein the training sample data set is obtained by sampling from the HPatches data set, implemented as follows:
let χ = {(A_i, P_i)}, i = 1, 2, ..., n, be the sample matching point pairs generated from the data set, where n is the number of matching point pairs; the corresponding network output values are (a_i, p_i), i = 1, 2, ..., n, with distance matrix D = [d_ij]_{n×n};
define the non-matching point nearest to a_i: j_min = argmin_j d(a_i, p_j), j = 1, 2, ..., n, j ≠ i;
define the non-matching point nearest to p_i: k_min = argmin_k d(a_k, p_i), k = 1, 2, ..., n, k ≠ i;
construct the input values of the Triplet network structure: if d(a_i, p_{j_min}) < d(a_{k_min}, p_i), take (a_i, p_i, p_{j_min}); otherwise take (a_i, p_i, a_{k_min}); when a_i (or p_i) is set as an isolated point, the corresponding (a_i, p_i) is rejected from the input sample pairs;
a network improvement and training module, for improving the Triplet network by selecting different affine transformation model parameters and constructing a different loss function, and training the improved Triplet network using the training sample data set;
a mismatched point pair rejection module, for inputting the performance test data set into the trained Triplet network, outputting the matching result, and rejecting mismatched point pairs using the fundamental matrix F, i.e. each corresponding point pair must satisfy the condition x'ᵀFx ≤ T, where T is a threshold, x and x' are the corresponding point pair expressed in homogeneous coordinates in the two images, and F is a 3 × 3 matrix.
7. The oblique aerial image matching system based on deep learning according to claim 6, characterized in that: the affine transformation model parameters selected in the network improvement and training module are the scale λ, the longitude angle φ, the latitude angle θ, and the swing angle ψ.
8. The oblique aerial image matching system based on deep learning according to claim 6, characterized in that: the loss function constructed in the network improvement and training module is as follows,
where a_i is the anchor (reference) sample, p_i the positive sample, q_i the negative sample, and n the number of corresponding point pairs.
9. The oblique aerial image matching system based on deep learning according to claim 6, characterized in that: the performance test data set comprises 4 groups of general two-dimensional planar-scene natural image data sets from the computer vision field and 2 groups of multi-view images of urban central area three-dimensional scenes provided by the test project of Commission III of the International Society for Photogrammetry and Remote Sensing (ISPRS).
10. The oblique aerial image matching system based on deep learning according to claim 6, characterized in that: the value of the threshold T is 3 or 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910344297.6A CN110135474A (en) | 2019-04-26 | 2019-04-26 | A kind of oblique aerial image matching method and system based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110135474A (en) | 2019-08-16 |
Family
ID=67575342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910344297.6A Pending CN110135474A (en) | 2019-04-26 | 2019-04-26 | A kind of oblique aerial image matching method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110135474A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104794490A (en) * | 2015-04-28 | 2015-07-22 | China TopRS (Beijing) Remote Sensing Technology Co., Ltd. | Method and device for acquiring corresponding points of aerial multi-view oblique images |
CN108446627A (en) * | 2018-03-19 | 2018-08-24 | Nanjing University of Information Science and Technology | An aerial image matching method based on local deep hashing |
US20190005670A1 (en) * | 2017-06-28 | 2019-01-03 | Magic Leap, Inc. | Method and system for performing simultaneous localization and mapping using convolutional image transformation |
CN109344845A (en) * | 2018-09-21 | 2019-02-15 | Harbin Institute of Technology | A feature matching method based on the Triplet deep neural network structure |
Non-Patent Citations (5)
Title |
---|
Anastasiya Mishchuk et al.: "Working hard to know your neighbor's margins: Local descriptor learning loss", arXiv:1705.10872v4 [cs.CV] *
Dmytro Mishkin et al.: "Learning Discriminative Affine Regions via Discriminability", arXiv:1711.06704v2 [cs.CV] *
Hani Altwaijry et al.: "Learning to Detect and Match Keypoints with Deep Architectures", British Machine Vision Conference *
Li Zhulin et al.: "Image Stereo Matching Technology and Its Development and Application", Xi'an: Shaanxi Science and Technology Press, 31 July 2007 *
"Deep Learning Every Day" (blog): "Deep learning notes (2): triplet loss", https://blog.csdn.net/lucifer_zzq/article/details/81271260 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113112529A (en) * | 2021-03-08 | 2021-07-13 | Wuhan Land Use and Urban Spatial Planning Research Center | Dense matching mismatched point processing method based on regional adjacent point search |
CN114937393A (en) * | 2022-03-30 | 2022-08-23 | China Petroleum & Chemical Corporation (Sinopec) | Augmented-reality-based high-altitude operation simulation training system for petrochemical enterprises |
CN114937393B (en) * | 2022-03-30 | 2023-10-13 | China Petroleum & Chemical Corporation (Sinopec) | Augmented-reality-based high-altitude operation simulation training system for petrochemical enterprises |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110135455B (en) | Image matching method, device and computer readable storage medium | |
Pham et al. | Lcd: Learned cross-domain descriptors for 2d-3d matching | |
JP7453470B2 (en) | 3D reconstruction and related interactions, measurement methods and related devices and equipment | |
US10043097B2 (en) | Image abstraction system | |
CN112270249A (en) | Target pose estimation method fusing RGB-D visual features | |
CN108369741A (en) | Method and system for registration data | |
CN103700099A (en) | Rotation and dimension unchanged wide baseline stereo matching method | |
JP2013186902A (en) | Vehicle detection method and apparatus | |
CN110222572A (en) | Tracking, device, electronic equipment and storage medium | |
Zhang et al. | Vehicle global 6-DoF pose estimation under traffic surveillance camera | |
CN113674400A (en) | Spectrum three-dimensional reconstruction method and system based on repositioning technology and storage medium | |
CN104463962B (en) | Three-dimensional scene reconstruction method based on GPS information video | |
WO2022247126A1 (en) | Visual localization method and apparatus, and device, medium and program | |
CN110135474A (en) | A kind of oblique aerial image matching method and system based on deep learning | |
Sun et al. | A fast underwater calibration method based on vanishing point optimization of two orthogonal parallel lines | |
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning | |
CN113902802A (en) | Visual positioning method and related device, electronic equipment and storage medium | |
Jiang et al. | Contrastive learning of features between images and lidar | |
Lee et al. | Robust uncertainty-aware multiview triangulation | |
CN117197333A (en) | Space target reconstruction and pose estimation method and system based on multi-view vision | |
CN114998630B (en) | Ground-to-air image registration method from coarse to fine | |
Budianti et al. | Background blurring and removal for 3d modelling of cultural heritage objects | |
CN114723973A (en) | Image feature matching method and device for large-scale change robustness | |
CN113570535A (en) | Visual positioning method and related device and equipment | |
CN111414802B (en) | Protein data characteristic extraction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |