CN114463397A - Multi-modal image registration method based on progressive filtering


Info

Publication number
CN114463397A
Authority
CN
China
Prior art keywords
image
matching
scale
formula
filtering
Prior art date
Legal status
Pending
Application number
CN202210021872.0A
Other languages
Chinese (zh)
Inventor
方圣辉
熊强
龚龑
彭漪
刘小娟
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202210021872.0A
Publication of CN114463397A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images using feature-based methods
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/02 Affine transformations
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G06T2207/20024 Filtering details

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-modal image registration method based on progressive filtering, which comprises the following steps: step 1, generating a PC map of a multi-modal image pair by using the 2D-LGF; step 2, generating a nonlinear scale space of the PC map by using the AOS, and detecting extrema in the nonlinear scale space to obtain feature points invariant to illumination and contrast for multi-modal image matching; step 3, establishing a structure descriptor from the 2D-LGF convolution sequence to describe the feature points obtained in the previous step; step 4, establishing initial feature correspondences by a bilateral matching method, recovering the true smooth motion field by convolution with progressive filtering, and gradually eliminating mismatches by an iterative method according to the consistency of the initially matched motion vectors; and step 5, transforming the image with an affine transformation model and outputting the image registration result.

Description

Multi-modal image registration method based on progressive filtering
Technical Field
The invention belongs to the field of image processing, and particularly relates to a multi-modal image registration method based on progressive filtering.
Background
Image matching extracts correct feature correspondences from two or more images with overlapping regions, which may be acquired by one or more sensors from the same or different viewpoints, at the same or different times. Image feature matching has long been a fundamental and key problem in photogrammetry and remote sensing, computer vision, robot vision, medical image analysis, and related fields. It has been a popular research topic and has made great progress over the past few decades. However, owing to the imaging characteristics of the sensors themselves and radiation transmission errors caused by the atmosphere, images may differ in scale, rotation, radiation, noise, acquisition time, and so on. Large differences in geometry and radiation cause nonlinear radiation differences between images, which pose a great challenge to feature matching algorithms, can sharply reduce image matching performance, and make it difficult to meet the requirements of ever-changing practical applications. It is therefore very important to research more effective, general and robust image matching algorithms. In general, images exhibiting nonlinear radiation differences are referred to as multi-modal images.
At present, automatic registration of multi-modal remote sensing images is widely studied and is one of the research hotspots in the field of image processing. General-purpose, automatic registration of multi-modal remote sensing images is a necessary step for image fusion, image mosaicking, target recognition and change detection. It can serve the geometric registration of optical and SAR images, the registration of high-resolution satellite images, the registration of hyperspectral images, and other emerging geospatial and engineering applications. Because different sensors have different imaging mechanisms, and the time, angle and environment of image acquisition also differ, multi-modal image registration still presents many challenges. High-accuracy automatic matching is especially difficult where images differ significantly, such as between optical and SAR images, depth maps and visible-light images, or visible-light and infrared images.
For multi-modal remote sensing images, the gray-scale relationship is no longer linear; it may not even be functional. The gray-scale relationship between the images then has neither statistical correlation nor geometric similarity, and registration methods based on gray scale become inaccurate. Although feature descriptors have great advantages in same-modality image registration, they are not equally robust on multi-modal remote sensing datasets; the challenges faced by these methods (such as SUSAN, SIFT, phase consistency features, and extended SURF) are repeatable feature detection and modality-invariant feature description.
Through studies on image matching, some researchers have found that the geometric (structural) information between multi-modal images is more stable than gradient or intensity information under the nonlinear gray-scale changes of multispectral images. Based on this observation, feature points can be detected from the structural consistency of the image, which can be evaluated by computing similarity measures on structural or shape descriptors. Phase consistency (PC) models have been shown to capture structural information consistent with human vision and to be robust to illumination and contrast differences. Therefore, some PC-based algorithms have been applied to multispectral or multi-modal image matching; however, the PC is susceptible to noise, and aliasing effects greatly affect the image quality and detection accuracy of PC algorithms. Some scholars have extended PC models using Log-Gabor filters, for example the histogram of oriented phase consistency (HOPC), phase consistency structure descriptors (PCSD), and the radiation-invariant feature transform (RIFT). HOPC is a template matching method with an improved similarity metric (HOPCnc) built on NCC; however, HOPC requires the geographic information of an image for geometric correction and thus may not be applicable to images without geographic information, and it uses a Harris detector for feature points, which is very sensitive to nonlinear radiation distortion (NRD) and hard to adapt universally to different types of multi-modal images. RIFT is a feature-based matching method that detects feature points with a FAST detector on the PC map and replaces the gradient with the MIM for feature description; it has good robustness to NRD and realizes rotation invariance by constructing multiple MIMs, but RIFT establishes no scale space for feature detection and description and is sensitive to large scale and viewpoint changes.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a multi-modal image registration method based on progressive filtering, aiming at the defects in the prior art.
The technical scheme adopted by the invention for solving the technical problem is that the method for multi-modal image registration based on progressive filtering comprises the following steps:
step 1, generating a phase consistency map (PC map) of the multi-modal image pair by two-dimensional Log-Gabor filtering (2D-LGF);
step 2, generating a nonlinear scale space of the PC map by the additive operator splitting algorithm (AOS), and detecting extrema in the nonlinear scale space to obtain feature points invariant to illumination and contrast;
step 3, computing the maximum average index map and the phase consistency orientation with the 2D-LGF, and generating a local feature descriptor to describe the feature points obtained in the previous step;
step 4, constructing initial feature correspondences by bilateral matching and recovering the true smooth motion field by convolution with progressive filtering: feature matching is converted into a denoising model amenable to convolution filtering, the distribution difference between correct matches and mismatches is analyzed to construct a convolution kernel, the true smooth motion field is gradually recovered during the convolution operations, and mismatches are eliminated through the consistency of the initially matched motion vectors to obtain the inlier set;
and step 5, transforming the image with an affine transformation model using the correct feature matching points in the inlier set obtained in the previous step, and outputting the image registration result.
Further, the specific method of step 1 of the present invention is:
In step 1, the number of scales, the number of orientations, and the neighborhood size of the log-Gabor filter are initialized, and the phase consistency map of the multi-modal image pair is generated by the two-dimensional 2D-LGF. The calculation formula is:

PC(x,y) = \frac{\sum_s \sum_o W_o(x,y) \lfloor A_{so}(x,y)\, \Delta\Phi_{so}(x,y) - t \rfloor}{\sum_s \sum_o A_{so}(x,y) + \delta}   (1)

In formula (1), PC(x, y) is the PC value at point (x, y); W_o(x, y) is a weight function; t is the noise threshold; A_{so}(x, y) is the amplitude component of the log-Gabor filter at point (x, y); δ is a small constant preventing the denominator from being zero; the symbol ⌊·⌋ denotes an operator that equals the enclosed quantity when its value is positive and zero otherwise; ΔΦ_{so}(x, y) is the phase deviation function; and s and o are the scale and orientation of the log-Gabor filter.
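To make formula (1) concrete, the following minimal sketch (Python/NumPy) builds a 2D Log-Gabor filter bank in the frequency domain and accumulates a simplified PC map. It is an illustration rather than the patented implementation: the weight function W_o and the exact phase deviation ΔΦ_so of formula (1) are collapsed into the per-orientation local energy, and the filter parameters (min_wl, mult, sigma_f, sigma_th) are hypothetical defaults.

```python
import numpy as np

def log_gabor_bank(rows, cols, n_scales=4, n_orients=6,
                   min_wl=3.0, mult=2.1, sigma_f=0.55, sigma_th=0.4):
    # Radial log-Gaussian times angular Gaussian, built in the frequency domain.
    y, x = np.mgrid[-(rows // 2):rows - rows // 2, -(cols // 2):cols - cols // 2]
    radius = np.hypot(x / cols, y / rows)
    radius[rows // 2, cols // 2] = 1.0            # dodge log(0) at the DC term
    theta = np.arctan2(-y, x)
    for s in range(n_scales):
        f0 = 1.0 / (min_wl * mult ** s)           # centre frequency of scale s
        radial = np.exp(-np.log(radius / f0) ** 2 / (2 * np.log(sigma_f) ** 2))
        radial[rows // 2, cols // 2] = 0.0
        for o in range(n_orients):
            ang = o * np.pi / n_orients
            d = np.arctan2(np.sin(theta - ang), np.cos(theta - ang))
            angular = np.exp(-d ** 2 / (2 * sigma_th ** 2))
            yield s, o, np.fft.ifftshift(radial * angular)

def pc_map(img, t=0.1, delta=1e-4, n_scales=4, n_orients=6):
    """Simplified phase-consistency map in the spirit of formula (1)."""
    F = np.fft.fft2(img.astype(np.float64))
    resp = {(s, o): np.fft.ifft2(F * flt)         # even + i*odd filter responses
            for s, o, flt in log_gabor_bank(*img.shape, n_scales=n_scales,
                                            n_orients=n_orients)}
    num = np.zeros(img.shape)
    den = np.full(img.shape, delta)               # delta keeps the denominator > 0
    for o in range(n_orients):
        energy = np.abs(sum(resp[s, o] for s in range(n_scales)))
        amplitude = sum(np.abs(resp[s, o]) for s in range(n_scales))
        num += np.maximum(energy - t, 0.0)        # soft noise threshold: max(. - t, 0)
        den += amplitude
    return num / den
```

For a multi-modal pair, pc_map would be applied to each image independently, and the two PC maps then replace the raw intensities in all subsequent steps.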
Further, the specific method of step 2 of the present invention is:
In step 2, the maximum octave, the number of scale sub-levels, and the control factor of the AOS are initialized. Nonlinear diffusion filtering is described by a nonlinear partial differential equation:

\frac{\partial L}{\partial t} = \mathrm{div}\big(c(x,y,t) \cdot \nabla L\big)   (2)

In formula (2), div and ∇ are the divergence and gradient operators, and L is the brightness of the image. c(x, y, t) = g(|∇L_σ(x, y, t)|) is the conductance function, where ∇L_σ is the gradient of the Gaussian-smoothed image L_σ and k is a contrast factor that controls the diffusion level. Since the nonlinear partial differential equation has no analytic solution, it is generally solved iteratively by numerical analysis:

L^{i+1} = \Big(I - \tau \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (3)

In formula (3), A_l is a matrix representing the image derivatives (diffusivities) in direction l, m is the number of directions l, τ = t_{i+1} − t_i is the time step, I denotes the identity matrix, and L^i is the solution of the i-th iteration of the nonlinear diffusion filtering.
When the nonlinear scale space is constructed, the scale levels increase logarithmically and are divided into O octaves, each with S sub-levels; every sub-level uses the same resolution as the original image. The scale hierarchy is:

\sigma_i(o, s) = \sigma_0\, 2^{o + s/S},\quad o \in [0, \dots, O-1],\ s \in [0, \dots, S-1],\ i \in [0, \dots, N]   (4)

In formula (4), o and s are the octave and sub-level indices respectively, σ_0 is the initial value of the scale parameter, and N = O × S is the total number of images contained in the entire nonlinear scale space. The nonlinear diffusion filter model works in time units, so the pixel-unit scale parameters σ_i must be converted to time units; the correspondence is t_i = σ_i²/2.

First, Gaussian filtering is applied to the input image; then the gradient histogram of the image is computed to obtain the contrast factor k. With τ = t_{i+1} − t_i, all images of the nonlinear scale space can be obtained from formula (3):

L^{i+1} = \Big(I - (t_{i+1} - t_i) \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (5)
After the nonlinear scale space is constructed, feature point detection is realized by finding local maxima of the scale-normalized Hessian response at different scales. The Hessian response is computed as:

L_H = \sigma^2 (L_{xx} L_{yy} - L_{xy}^2)   (6)

In formula (6), σ is the integer value of the scale parameter σ_i, L_H is the Hessian response of the image, and L_{xx}, L_{yy} and L_{xy} are the second-order horizontal, vertical and cross derivatives of the image, respectively.

When searching for extreme points, each pixel is compared with all of its neighbors, and it is an extreme point when it is larger than all neighbors in its image domain and scale domain. In general the comparison covers windows of size σ_i × σ_i on the current, previous and next scales. To accelerate the search, the window size is fixed at 3 × 3, so the search space is a cube with a side length of 3 pixels; the central detection point is compared with its 8 neighbors at the same scale and the 9 × 2 points at the adjacent upper and lower scales, ensuring that extreme points are detected in both scale space and two-dimensional image space.
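As an illustration of formulas (4)-(6), the sketch below enumerates the scale levels with their time-unit conversion and runs the 3 × 3 × 3 extremum test on a stack of scale-normalized Hessian responses. The stack itself (one L_H image per level) is assumed to be given, and sigma0 = 1.6 and the response threshold are hypothetical choices.

```python
import numpy as np
from scipy import ndimage

def scale_levels(sigma0=1.6, octaves=6, sublevels=4):
    # Formula (4): sigma_i(o, s) = sigma0 * 2**(o + s/S); time units t_i = sigma_i**2 / 2.
    sigmas = [sigma0 * 2 ** (o + s / sublevels)
              for o in range(octaves) for s in range(sublevels)]
    times = [sig * sig / 2.0 for sig in sigmas]
    return sigmas, times

def detect_extrema(hessian_stack, threshold=1e-3):
    """hessian_stack: (N, H, W) responses L_H = sigma^2 (Lxx*Lyy - Lxy^2).
    A point is kept when it beats its 26 neighbours in the 3x3x3
    scale/space cube: 8 same-scale neighbours plus 9 on each adjacent
    scale, exactly the comparison described above."""
    cube_max = ndimage.maximum_filter(hessian_stack, size=(3, 3, 3), mode="nearest")
    mask = (hessian_stack == cube_max) & (hessian_stack > threshold)
    return np.argwhere(mask)   # rows of (level, y, x)
```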
Further, the specific method of step 3 of the present invention is:
In step 3, the 2D-LGF is used to compute the maximum average index map and the phase consistency orientation, and a local feature descriptor is generated. First the maximum average index map is computed: the image is convolved with the 2D-LGF filters, and the amplitude A_{so}(x, y) at scale s and orientation o is obtained. The amplitudes of all scales are added and divided by the number of scales N_s, giving the scale-averaged amplitude from which the maximum average index map is built:

\bar{A}_o(x,y) = \frac{1}{N_s} \sum_{s=1}^{N_s} A_{so}(x,y)   (7)

The odd-symmetric filter of the Log-Gabor function is then used to construct the phase consistency orientation feature. Convolving the odd-symmetric filter with the original image yields o convolution results, one per orientation; they are projected onto the x and y axes, and the phase consistency direction is calculated as:

\Phi_{so} = \arctan\left( \frac{\sum_\theta PC_{so}(\theta) \sin\theta}{\sum_\theta PC_{so}(\theta) \cos\theta + \delta} \right)   (8)

In formula (8), Φ_{so} is the phase consistency orientation feature, PC_{so}(θ) is the odd-symmetric filter convolution result in direction θ, and δ is a small constant preventing the denominator from being zero.

For multi-modal images, a gradient inversion phenomenon may occur when the phase consistency direction exceeds 180 degrees. In view of this problem, Φ_{so} is transformed to limit the phase consistency direction to between 0 and 180 degrees:

\Phi_{so} \leftarrow \begin{cases} \Phi_{so}, & 0^\circ \le \Phi_{so} < 180^\circ \\ \Phi_{so} - 180^\circ, & \Phi_{so} \ge 180^\circ \end{cases}   (9)
Finally the local feature descriptor is constructed. For each feature point, a local region of P × P pixels centered on the point is selected. The local region is divided into 6 × 6 sub-regions, and a block distribution histogram is established by counting the index values of the maximum average index map in each sub-region. The feature descriptor is obtained by concatenating the distribution histograms of all blocks and normalizing to unit length. The main steps, sketched in code below, are:

(1) convolve the neighborhood of the feature point with the 2D-LGF, obtaining the odd-part and even-part convolution results at each scale s and orientation o;

(2) obtain the maximum average index map according to formula (7);

(3) calculate the phase consistency orientation map according to formulas (8)-(9);

(4) select the P × P neighborhood of the feature point and divide it into n × n sub-regions (n = 6 in this example);

(5) compute a histogram in each sub-region: divide 180 degrees into N_o partitions, count the elements of the maximum average index map falling into each partition, and obtain the feature vector of each sub-region;

(6) form the descriptor of each feature point by concatenating the feature vectors of the 36 sub-regions of step (5) in a fixed order; if the feature vector of sub-region A1 is V1, then for any feature point on the image, V = [V1, V2, …, V36].
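A compact sketch of steps (1)-(6), assuming the scale-averaged amplitude maps of formula (7) are already available as one H × W array per orientation. The patch size P = 72 is a hypothetical choice, and the feature point is assumed to lie far enough from the image border.

```python
import numpy as np

def mim_descriptor(avg_amp, x, y, patch=72, grid=6):
    """avg_amp: (N_o, H, W) scale-averaged Log-Gabor amplitudes (formula (7)).
    Returns the concatenated, L2-normalized block-histogram descriptor."""
    n_o = avg_amp.shape[0]
    mim = np.argmax(avg_amp, axis=0)          # maximum average index map
    r = patch // 2
    win = mim[y - r:y + r, x - r:x + r]       # P x P neighbourhood (interior point)
    desc = []
    for rows in np.array_split(np.arange(win.shape[0]), grid):
        for cols in np.array_split(np.arange(win.shape[1]), grid):
            sub = win[np.ix_(rows, cols)]     # one of the grid*grid sub-regions
            desc.append(np.bincount(sub.ravel(), minlength=n_o))  # N_o-bin histogram
    v = np.concatenate(desc).astype(np.float64)
    return v / (np.linalg.norm(v) + 1e-12)    # unit length, as in step (6)
```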
Further, the specific method of step 4 of the present invention is:
In step 4, the grid size, the kernel size and the iteration threshold of the motion vectors are initialized. Using a bilateral matching method, the feature point pairs with the smallest SAD distance are regarded as potential matches. The matching set is partitioned into n_c × n_c non-overlapping cells, and the number of motion vectors in the (j, k)-th cell is denoted A_{j,k}. The average motion vector \bar{m}_{j,k} in the (j, k)-th cell is:

\bar{m}_{j,k} = \frac{1}{A_{j,k}} \sum_{m_i \in \mathrm{cell}(j,k)} m_i   (10)
To link the initial matches in surrounding cells and improve robustness and filtering performance, the method computes the representative motion vector of the current cell by jointly considering the surrounding local cells according to the convolution theory:

\tilde{M} = \frac{(w \odot \bar{M}) * \kappa}{w * \kappa + \delta}   (11)

In formula (11), \tilde{M} is the n_c × n_c × 2 matrix of representative motion vectors generated by the Gaussian kernel convolution, \bar{M} collects the cell averages of formula (10), w is the count matrix with |w_{j,k}| = A_{j,k}, δ is a small positive number preventing the denominator from being 0, and κ is a 3 × 3 Gaussian kernel distance matrix defined by:

\kappa_{i,j} = \exp(-n_{i,j}^2 / 2)   (12)

In formula (12), c_{i,j} = (i, j)^T, n_{i,j} = \lVert c_{i,j} - \lceil n_k/2 \rceil (1,1)^T \rVert_2 is the L2 distance from the kernel center, ⌈·⌉ rounds an element to the nearest integer not less than itself, and n_k denotes the Gaussian kernel size.
After the kernel convolution operation, the representative motion vector of each cell is obtained, and the deviation between each initially matched motion vector m_i and the representative vector \tilde{m}_i of its cell is defined as:

d_i = \frac{\lVert m_i - \tilde{m}_i \rVert_2}{\lVert \tilde{m}_i \rVert_2 + \delta}   (13)

At this point, the inlier set I^* to be solved can be determined by comparing the deviation with a given threshold λ:

I^* = \{(I_i, I'_i) : d_i \le \lambda,\ i = 1, \dots, N\}   (14)

where I_i = (x_i, y_i)^T and I'_i = (x'_i, y'_i)^T are the coordinates of the i-th match in the reference image I and the image to be matched I', and N represents the number of matching points. The obtained inlier set I^* is the feature point set after the mismatches have been deleted.
Only a portion of the initial matches can be filtered out by a single threshold λ. To address this difficulty, the present invention uses an iterative strategy to remove outliers step by step. Following a coarse-to-fine principle, the method iteratively refines the representative motion vectors and optimizes the threshold λ; the inlier set is re-approximated with the result of each iteration until convergence. A larger threshold is set at the beginning of the iterations and is gradually reduced as the iterations proceed, realizing coarse-to-fine elimination of outliers.
Further, the specific method of step 5 of the present invention is:
and 5, transforming the multi-modal image by using the correct feature matching points obtained in the step 4 and an affine transformation model to obtain an image registration result.
The phase consistency map, the feature points, the initial feature matching relationship, the correct feature matching relationship and the registered images of the multi-modal image are obtained by the method.
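For step 5, a least-squares affine fit over the inlier set suffices; the sketch below (the array shapes are assumptions) solves the 2 × 3 affine matrix in closed form and warps point coordinates for evaluation.

```python
import numpy as np

def fit_affine(src, dst):
    """src, dst: (N, 2) inlier coordinates from step 4, with dst ~ A @ [x, y, 1]^T.
    Returns the 2x3 affine matrix A minimizing the squared residuals."""
    X = np.hstack([src, np.ones((len(src), 1))])  # (N, 3) homogeneous source points
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)   # (3, 2) least-squares solution
    return A.T                                    # (2, 3)

def apply_affine(A, pts):
    return pts @ A[:, :2].T + A[:, 2]             # warp (N, 2) points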
Compared with the prior art, the invention has the advantages and beneficial effects that:
aiming at a multi-modal image with nonlinear radiation difference, the multi-modal image registration method based on progressive filtering is difficult to achieve a good matching effect based on an image intensity or gradient feature matching method, a PC map with structural consistency of a multi-modal image pair is obtained by using 2D-LCF, feature points of the PC map are detected by using nonlinear filtering, and then an initial feature matching set is obtained by using a BBF method. The initial matching set is divided into non-overlapping cells and then a typical motion vector for each cell is calculated using a gaussian kernel convolution operation. Finally, by detecting outliers by checking that the deviation between the assumed motion vector and its corresponding representative motion vector is smaller than a given threshold, we use an iterative strategy to progressively filter out outliers, each iteration setting a different threshold (the threshold is progressively reduced) until convergence. The method of the present invention can converge in several iterations, and some sparse point-based tasks may be inspired by our method when they are implemented by deep learning techniques.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of multi-modality image registration according to an embodiment of the present invention;
FIG. 2 is a flow chart of a PFSC algorithm implementation of an embodiment of the present invention;
(a): a multi-modal image pair; (b): PC maps; (c): feature points; (d): initial matching; (e): feature matching result (top) and motion field vector distribution (bottom) of the first iteration; (f): feature matching result (top) and motion field vector distribution (bottom) of the second iteration; (g): feature matching result (top) and motion field vector distribution (bottom) of the third iteration; (h): feature matching result (top) and motion field vector distribution (bottom) of the fourth iteration;
FIG. 3 shows four sets of multi-modal images, (a) - (d) are four sets of multi-modal images according to an embodiment of the present invention;
FIG. 4 shows the precision, recall and F-score histograms of the five feature matching algorithms: (a) is the precision histogram, (b) is the recall histogram, and (c) is the F-score histogram of the five feature matching algorithms;
FIG. 5 shows the PFSC algorithm feature matching result (left) and the correctly matched motion field vector distribution (right) according to an embodiment of the present invention; (a) - (d) correspond to (a) - (d) in fig. 3, respectively;
fig. 6 shows four sets of multi-modality image registration results according to the embodiment of the invention, which correspond to (a) - (d) in fig. 3.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The method provided by the invention can be implemented as an automatic process using computer software. The method specifically comprises the following steps:
step 1, generating the phase consistency map (PC map) of the multi-modal image pair by two-dimensional Log-Gabor filtering;
step 2, generating the nonlinear scale space of the PC map by the additive operator splitting algorithm (AOS), and detecting extrema in the nonlinear scale space to obtain feature points invariant to illumination and contrast;
step 3, computing the maximum average index map and the phase consistency orientation with the 2D-LGF, and generating a local feature descriptor to describe the feature points obtained in the previous step;
step 4, constructing initial feature correspondences by bilateral matching and recovering the true smooth motion field by convolution with progressive filtering: feature matching is converted into a denoising model amenable to convolution filtering, the distribution difference between correct matches and mismatches is analyzed to construct a convolution kernel, the true smooth motion field is gradually recovered during the convolution operations, and mismatches are eliminated through the consistency of the initially matched motion vectors;
and step 5, transforming the image with an affine transformation model using the correct feature matching points obtained in the previous step, and outputting the image registration result.
Referring to fig. 1, the embodiment explains the process of the present invention by taking the registration of 4 sets of multi-modal images as an example, as follows:
In step 1, the number of scales, the number of orientations, and the neighborhood size of the log-Gabor filter are initialized: the number of scales is set to 4, the number of orientations to 6, and the neighborhood to 90 pixels.
The phase consistency map of the multi-modal image pair is generated by the two-dimensional 2D-LGF; the calculation formula is:

PC(x,y) = \frac{\sum_s \sum_o W_o(x,y) \lfloor A_{so}(x,y)\, \Delta\Phi_{so}(x,y) - t \rfloor}{\sum_s \sum_o A_{so}(x,y) + \delta}   (1)

In formula (1), PC(x, y) is the PC value at point (x, y), W_o(x, y) is a weight function, A_{so}(x, y) is the amplitude component of the log-Gabor filter at point (x, y), t is the noise threshold, δ is a small constant preventing the denominator from being zero, the symbol ⌊·⌋ equals the enclosed quantity when its value is positive and zero otherwise, ΔΦ_{so}(x, y) is the phase deviation function, and s and o are the scale and orientation of the log-Gabor filter.
In step 2, the maximum octave, the number of scale sub-levels, and the control factor of the AOS are initialized. To capture more salient feature points, the invention sets the maximum octave to 6, the number of sub-levels per scale level to 4, and the control factor to 0.0003.

Nonlinear diffusion filtering is described by a nonlinear partial differential equation:

\frac{\partial L}{\partial t} = \mathrm{div}\big(c(x,y,t) \cdot \nabla L\big)   (2)

In formula (2), div and ∇ are the divergence and gradient operators, L is the brightness of the image, c(x, y, t) = g(|∇L_σ(x, y, t)|) is the conductance function, ∇L_σ is the gradient of the Gaussian-smoothed image L_σ, and k is a contrast factor controlling the diffusion level. Since the nonlinear partial differential equation has no analytic solution, it is solved iteratively by numerical analysis:

L^{i+1} = \Big(I - \tau \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (3)

In formula (3), A_l is a matrix representing the image derivatives (diffusivities) in direction l, m is the number of directions l, τ = t_{i+1} − t_i is the time step, I denotes the identity matrix, and L^i is the solution of the i-th iteration of the nonlinear diffusion filtering.
When the nonlinear scale space is constructed, the scale levels increase logarithmically and are divided into O octaves, each with S sub-levels; every sub-level uses the same resolution as the original image. The scale hierarchy is:

\sigma_i(o, s) = \sigma_0\, 2^{o + s/S},\quad o \in [0, \dots, O-1],\ s \in [0, \dots, S-1],\ i \in [0, \dots, N]   (4)

In formula (4), o and s are the octave and sub-level indices respectively, σ_0 is the initial value of the scale parameter, and N = O × S is the total number of images contained in the entire nonlinear scale space. The nonlinear diffusion filter model works in time units, so the pixel-unit scale parameters σ_i must be converted to time units; the correspondence is t_i = σ_i²/2.

First, Gaussian filtering is applied to the input image; then the gradient histogram of the image is computed to obtain the contrast factor k. With τ = t_{i+1} − t_i, all images of the nonlinear scale space can be obtained from formula (3):

L^{i+1} = \Big(I - (t_{i+1} - t_i) \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (5)
After the nonlinear scale space is constructed, feature point detection is realized by finding local maxima of the scale-normalized Hessian response at different scales. The Hessian response is computed as:

L_H = \sigma^2 (L_{xx} L_{yy} - L_{xy}^2)   (6)

In formula (6), σ is the integer value of the scale parameter σ_i, L_H is the Hessian response of the image, and L_{xx}, L_{yy} and L_{xy} are the second-order horizontal, vertical and cross derivatives of the image, respectively.

When searching for extreme points, each pixel is compared with all of its neighbors, and it is an extreme point when it is larger than all neighbors in its image domain and scale domain. In general the comparison covers windows of size σ_i × σ_i on the current, previous and next scales. To accelerate the search, the window size is fixed at 3 × 3, so the search space is a cube with a side length of 3 pixels; the central detection point is compared with its 8 neighbors at the same scale and the 9 × 2 points at the adjacent upper and lower scales, ensuring that extreme points are detected in both scale space and two-dimensional image space.
In step 3, the 2D-LGF is used to compute the maximum average index map and the phase consistency orientation, and a local feature descriptor is generated. First the maximum average index map is computed: the image is convolved with the 2D-LGF filters, and the amplitude A_{so}(x, y) at scale s and orientation o is obtained. The amplitudes of all scales are added and divided by the number of scales N_s:

\bar{A}_o(x,y) = \frac{1}{N_s} \sum_{s=1}^{N_s} A_{so}(x,y)   (7)

The odd-symmetric filter of the Log-Gabor function is then used to construct the phase consistency orientation feature. Convolving the odd-symmetric filter with the original image yields o convolution results, one per orientation; they are projected onto the x and y axes, and the phase consistency direction is calculated as:

\Phi_{so} = \arctan\left( \frac{\sum_\theta PC_{so}(\theta) \sin\theta}{\sum_\theta PC_{so}(\theta) \cos\theta + \delta} \right)   (8)

In formula (8), Φ_{so} is the phase consistency orientation feature, PC_{so}(θ) is the odd-symmetric filter convolution result in direction θ, θ is the angle of orientation o, and δ is a small constant preventing the denominator from being zero.

For multi-modal images, a gradient inversion phenomenon may occur when the phase consistency direction exceeds 180 degrees. In view of this problem, Φ_{so} is transformed to limit the phase consistency direction to between 0 and 180 degrees:

\Phi_{so} \leftarrow \begin{cases} \Phi_{so}, & 0^\circ \le \Phi_{so} < 180^\circ \\ \Phi_{so} - 180^\circ, & \Phi_{so} \ge 180^\circ \end{cases}   (9)
Finally the local feature descriptor is constructed. For each feature point, a local region of P × P pixels centered on the point is selected. The local region is divided into 6 × 6 sub-regions, and a block distribution histogram is established by counting the index values of the maximum average index map in each sub-region. The feature descriptor is obtained by concatenating the distribution histograms of all blocks and normalizing to unit length. The main steps are:

(1) convolve the neighborhood of the feature point with the 2D-LGF, obtaining the odd-part and even-part convolution results at each scale s and orientation o;

(2) obtain the maximum average index map according to formula (7);

(3) calculate the phase consistency orientation map according to formulas (8)-(9);

(4) select the P × P neighborhood of the feature point and divide it into n × n sub-regions (n = 6 in this example);

(5) compute a histogram in each sub-region: divide 180 degrees into N_o partitions, count the elements of the maximum average index map falling into each partition, and obtain the feature vector of each sub-region;

(6) form the descriptor of each feature point by concatenating the feature vectors of the 36 sub-regions of step (5) in a fixed order; if the feature vector of sub-region A1 is V1, then for any feature point on the image, V = [V1, V2, …, V36].
In step 4, the grid size, the kernel size and the iteration threshold of the motion vectors are initialized. The invention limits the grid size to between 15 and 30 and sets the kernel size to an odd number no greater than one third of the grid size. A progressive iteration method, similar to a simulated annealing strategy, gradually separates the inlier and outlier sets: a larger threshold is set at the beginning of the iterations and is gradually reduced as the iterations proceed, eliminating outliers from coarse to fine while the representative motion vectors are iteratively refined and the threshold is optimized. The inlier set is approximated with the result of each iteration until convergence. In the experiments, 4 iterations were sufficient for reliable matching performance, with thresholds of 0.8, 0.2, 0.1 and 0.01 for the successive iterations.
Using the bilateral matching method, the feature point pairs with the smallest SAD distance are regarded as potential matches. The matching set is partitioned into n_c × n_c non-overlapping cells, and the number of motion vectors in the (j, k)-th cell is denoted A_{j,k}. The average motion vector \bar{m}_{j,k} in the (j, k)-th cell is:

\bar{m}_{j,k} = \frac{1}{A_{j,k}} \sum_{m_i \in \mathrm{cell}(j,k)} m_i   (10)
To link the initial matches in surrounding cells and improve robustness and filtering performance, the method computes the representative motion vector of the current cell by jointly considering the surrounding local cells according to the convolution theory:

\tilde{M} = \frac{(w \odot \bar{M}) * \kappa}{w * \kappa + \delta}   (11)

In formula (11), \tilde{M} is the n_c × n_c × 2 matrix of representative motion vectors generated by the Gaussian kernel convolution, \bar{M} collects the cell averages of formula (10), w is the count matrix with |w_{j,k}| = A_{j,k}, δ is a small positive number preventing the denominator from being 0, and κ is a 3 × 3 Gaussian kernel distance matrix defined by:

\kappa_{i,j} = \exp(-n_{i,j}^2 / 2)   (12)

In formula (12), c_{i,j} = (i, j)^T, n_{i,j} = \lVert c_{i,j} - \lceil n_k/2 \rceil (1,1)^T \rVert_2 is the L2 distance from the kernel center, ⌈·⌉ rounds an element to the nearest integer not less than itself, and n_k denotes the Gaussian kernel size.
After the kernel convolution operation, the representative motion vector of each cell is obtained, and the deviation between each initially matched motion vector m_i and the representative vector \tilde{m}_i of its cell is defined as:

d_i = \frac{\lVert m_i - \tilde{m}_i \rVert_2}{\lVert \tilde{m}_i \rVert_2 + \delta}   (13)

At this point, the inlier set I^* to be solved can be determined by comparing the deviation with a given threshold λ:

I^* = \{(I_i, I'_i) : d_i \le \lambda,\ i = 1, \dots, N\}   (14)

where I_i = (x_i, y_i)^T and I'_i = (x'_i, y'_i)^T are the coordinates of the i-th match in the reference image I and the image to be matched I', and N represents the number of matching points.
Only a portion of the initial matches can be filtered out by a single threshold λ. To solve this problem, the invention uses an iterative strategy to gradually remove abnormal values: a larger threshold is set at the beginning of the iterations and is gradually reduced as the iterations proceed, eliminating outliers from coarse to fine; the representative motion vectors are iteratively refined, the threshold λ is optimized, and the inlier set is approximated with the result of each iteration until convergence.
As shown in (a)-(d) of fig. 2, the original images are subjected to PC calculation, feature detection and feature matching to obtain the initial matching set. As shown in (e)-(h) of fig. 2, as the iterations proceed, mismatches are gradually filtered out until the correct matching set is obtained; at this point the motion vectors corresponding to the correct matches have structural consistency. Because the feature matching method is based on the idea of progressive filtering with structure consistency, the algorithm is named PFSC; the whole algorithm is summarized in Algorithm 1.
(Algorithm 1, the complete PFSC procedure, is reproduced as an image in the original document; a code sketch follows.)
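Because Algorithm 1 survives only as an image in the source, the sketch below reconstructs the PFSC filtering loop from formulas (10)-(14) and the parameter choices stated above (grid n_c, odd kernel size n_k, threshold schedule 0.8/0.2/0.1/0.01). The Gaussian width used for the kernel of formula (12), and all function and parameter names, are assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def pfsc_filter(pts1, pts2, img_shape, nc=20, nk=5,
                thresholds=(0.8, 0.2, 0.1, 0.01), delta=1e-6):
    """pts1, pts2: (N, 2) initially matched coordinates (bilateral SAD matching).
    Returns a boolean inlier mask refined over the threshold schedule."""
    h, w = img_shape
    motion = pts2 - pts1                                   # per-match motion vectors
    jk = np.stack([np.clip((pts1[:, 1] / h * nc).astype(int), 0, nc - 1),
                   np.clip((pts1[:, 0] / w * nc).astype(int), 0, nc - 1)], axis=1)
    c = np.arange(nk) - nk // 2                            # kernel coordinate offsets
    kappa = np.exp(-(c[:, None] ** 2 + c[None, :] ** 2) / 2.0)  # formula (12), width assumed
    inlier = np.ones(len(motion), bool)
    for lam in thresholds:                                 # coarse-to-fine schedule
        counts = np.zeros((nc, nc))                        # w in formula (11)
        sums = np.zeros((nc, nc, 2))                       # count-weighted cell averages
        for (j, k), m in zip(jk[inlier], motion[inlier]):
            counts[j, k] += 1
            sums[j, k] += m
        num = np.stack([convolve(sums[..., d], kappa, mode="nearest")
                        for d in (0, 1)], axis=-1)         # (w .* Mbar) * kappa
        rep = num / (convolve(counts, kappa, mode="nearest")[..., None] + delta)
        rep_i = rep[jk[:, 0], jk[:, 1]]                    # representative vector per match
        dev = (np.linalg.norm(motion - rep_i, axis=1)
               / (np.linalg.norm(rep_i, axis=1) + delta))  # formula (13)
        inlier = dev <= lam                                # formula (14)
    return inlier
```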
The performance of the algorithm was tested using 4 sets of multi-modal images; the data sets are shown in FIG. 3 and described in Table 1.
Table 1 Detailed description of the multi-modal image data

(table content reproduced as an image in the original document)
The method of the invention is compared with four other representative state-of-the-art feature matching methods (VFC, LLT, LMR and mTopKRP), and feature matching performance is quantified with Precision, Recall and F-score. Precision is defined as the percentage of the correct matches retained by the algorithm among all retained matches, Recall is the percentage of the correct matches retained by the algorithm among all correct matches in the image, and F-score is a balanced measure of precision and recall. They are defined as follows:

\mathrm{Precision} = \frac{TP}{TP + FP}   (15)

\mathrm{Recall} = \frac{TP}{TP + FN}   (16)

\mathrm{F\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}   (17)

In formulas (15) to (17), TP, FP and FN represent true positives (actually correct, and the algorithm judges correct), false positives (actually wrong, and the algorithm judges correct) and false negatives (actually correct, and the algorithm judges wrong), respectively.
As shown in fig. 4, the PFSC achieves the highest precision, recall and F-score relative to the other four methods. Fig. 5 shows the image matching results and motion vectors; the PFSC removes most of the outliers, with only a few misjudged.
In step 5, the image to be registered is transformed with the affine transformation model according to the correct feature correspondences to obtain the registered image. The root mean square error (RMSE), mean error (MEAN) and standard deviation (Std) are used to measure the image registration accuracy, defined as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} r_i^2}   (18)

\mathrm{MEAN} = \frac{1}{N} \sum_{i=1}^{N} r_i   (19)

\mathrm{Std} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (r_i - \mathrm{MEAN})^2}   (20)

In formulas (18) to (20), N represents the number of correct matches after mismatch rejection, and r_i = \lVert (x_i, y_i)^T - T(x'_i, y'_i)^T \rVert_2 is the distance between the coordinates of the i-th reference image feature point and the transformed coordinates of the corresponding feature point of the image to be registered.
The RMSE, MEAN and Std of the four sets of multi-modal images were calculated using formulas (18) to (20); the results are shown in Tables 2 to 4. The RMSE, MEAN and Std values obtained with the PFSC method are all lower than those of the other four methods.
Table 2 RMSE results of the five comparison methods on the four groups of multi-modal remote sensing images

(table content reproduced as an image in the original document)

Table 3 MEAN results of the five comparison methods on the four groups of multi-modal remote sensing images

(table content reproduced as an image in the original document)

Table 4 Std results of the five comparison methods on the four groups of multi-modal remote sensing images

(table content reproduced as an image in the original document)
Fig. 6 shows the image registration results displayed in checkerboard form; the registered images achieve a good alignment effect.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (8)

1. A multi-modal image registration method based on progressive filtering is characterized by comprising the following steps:
step 1, generating a phase consistency map (PC map) of the multi-modal image pair by two-dimensional Log-Gabor filtering;
step 2, generating a nonlinear scale space of the PC map by the additive operator splitting algorithm (AOS), and detecting extrema in the nonlinear scale space to obtain feature points invariant to illumination and contrast;
step 3, computing the maximum average index map and the phase consistency orientation by two-dimensional Log-Gabor filtering, and generating a local feature descriptor to describe the feature points obtained in step 2;
step 4, constructing initial feature correspondences by bilateral matching and recovering the true smooth motion field by convolution with progressive filtering: feature matching is converted into a denoising model amenable to convolution filtering, the distribution difference between correct matches and mismatches is analyzed to construct a convolution kernel, the true smooth motion field is gradually recovered during the convolution operations, and mismatches are eliminated through the consistency of the initially matched motion vectors to obtain the inlier set;
and step 5, transforming the image with an affine transformation model using the correct feature matching points in the inlier set obtained in the previous step, and outputting the image registration result.
2. The multi-modal image registration method based on progressive filtering according to claim 1, wherein: the specific implementation manner of the step 1 is as follows;
the number of scales, the number of orientations, and the neighborhood size of the log-Gabor filter are initialized, and the phase consistency map of the multi-modal image pair is generated by the two-dimensional 2D-LGF; the calculation formula is:

PC(x,y) = \frac{\sum_s \sum_o W_o(x,y) \lfloor A_{so}(x,y)\, \Delta\Phi_{so}(x,y) - t \rfloor}{\sum_s \sum_o A_{so}(x,y) + \delta}   (1)

In formula (1), PC(x, y) is the PC value at point (x, y), W_o(x, y) is a weight function, A_{so}(x, y) is the amplitude component of the log-Gabor filter at point (x, y), t is the noise threshold, δ is a small constant preventing the denominator from being zero, the symbol ⌊·⌋ equals the enclosed quantity when its value is positive and zero otherwise, ΔΦ_{so}(x, y) is the phase deviation function, and s and o are the scale and orientation of the log-Gabor filter.
3. The multi-modal image registration method based on progressive filtering according to claim 1, wherein: the specific implementation manner of the step 2 is as follows;
the maximum octave, the number of scale sub-levels, and the control factor of the AOS are initialized, and nonlinear diffusion filtering is described by a nonlinear partial differential equation:

\frac{\partial L}{\partial t} = \mathrm{div}\big(c(x,y,t) \cdot \nabla L\big)   (2)

In formula (2), div and ∇ are the divergence and gradient operators, L is the brightness of the image, c(x, y, t) = g(|∇L_σ(x, y, t)|) is the conductance function, ∇L_σ is the gradient of the Gaussian-smoothed image L_σ, and k is a contrast factor controlling the diffusion level; since the nonlinear partial differential equation has no analytic solution, it is solved iteratively by numerical analysis:

L^{i+1} = \Big(I - \tau \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (3)

In formula (3), A_l is a matrix representing the image derivatives (diffusivities) in direction l, m is the number of directions l, τ = t_{i+1} − t_i is the time step, I denotes the identity matrix, and L^i represents the solution of the i-th iteration of the nonlinear diffusion filtering;
when the nonlinear scale space is constructed, the scale levels increase logarithmically and are divided into O octaves, each with S sub-levels, every sub-level using the same resolution as the original image; the scale hierarchy is:

\sigma_i(o, s) = \sigma_0\, 2^{o + s/S},\quad o \in [0, \dots, O-1],\ s \in [0, \dots, S-1],\ i \in [0, \dots, N]   (4)

In formula (4), o and s are the octave and sub-level indices respectively, σ_0 is the initial value of the scale parameter, and N = O × S is the total number of images contained in the entire nonlinear scale space; the nonlinear diffusion filter model works in time units, so the pixel-unit scale parameters σ_i must be converted to time units through the correspondence t_i = σ_i²/2;

first, Gaussian filtering is applied to the input image; then the gradient histogram of the image is computed to obtain the contrast factor k; with τ = t_{i+1} − t_i, all images of the nonlinear scale space can be obtained from formula (3):

L^{i+1} = \Big(I - (t_{i+1} - t_i) \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (5)
after the nonlinear scale space is constructed, feature point detection is realized by finding local maxima of the scale-normalized Hessian response at different scales; the Hessian response is computed as:

L_H = \sigma^2 (L_{xx} L_{yy} - L_{xy}^2)   (6)

In formula (6), σ is the integer value of the scale parameter σ_i, L_H is the Hessian response of the image, and L_{xx}, L_{yy} and L_{xy} are the second-order horizontal, vertical and cross derivatives of the image, respectively;

when searching for extreme points, each pixel is compared with all of its neighbors, and it is an extreme point when it is larger than all neighbors in its image domain and scale domain.
4. The multi-modal image registration method based on progressive filtering according to claim 3, wherein: the comparison range of the extreme points covers windows of size σ_i × σ_i on the current, previous and next scales; to accelerate the search, the window size is fixed at 3 × 3, so the search space is a cube with a side length of 3 pixels, and the central detection point is compared with its 8 neighbors at the same scale and the 9 × 2 points at the adjacent upper and lower scales, ensuring that extreme points are detected in both scale space and two-dimensional image space.
5. The multi-modal image registration method based on progressive filtering according to claim 1, wherein: the specific implementation manner of the step 3 is as follows;
first the maximum average index map is computed: the image is convolved with the 2D-LGF filters, and the amplitude A_{so}(x, y) at scale s and orientation o is obtained; the amplitudes of all scales are added and divided by the number of scales N_s:

\bar{A}_o(x,y) = \frac{1}{N_s} \sum_{s=1}^{N_s} A_{so}(x,y)   (7)

the odd-symmetric filter of the Log-Gabor function is used to construct the phase consistency orientation feature: convolving the odd-symmetric filter with the original image yields o convolution results, one per orientation, which are projected onto the x and y axes, and the phase consistency direction is calculated as:

\Phi_{so} = \arctan\left( \frac{\sum_\theta PC_{so}(\theta) \sin\theta}{\sum_\theta PC_{so}(\theta) \cos\theta + \delta} \right)   (8)

In formula (8), Φ_{so} is the phase consistency orientation feature, PC_{so}(θ) is the odd-symmetric filter convolution result in direction θ, θ is the angle of orientation o, and δ is a small constant preventing the denominator from being zero;

for multi-modal images, the gradient inversion phenomenon occurs when the phase consistency direction exceeds 180 degrees; considering this problem, Φ_{so} is transformed to limit the phase consistency direction to between 0 and 180 degrees:

\Phi_{so} \leftarrow \begin{cases} \Phi_{so}, & 0^\circ \le \Phi_{so} < 180^\circ \\ \Phi_{so} - 180^\circ, & \Phi_{so} \ge 180^\circ \end{cases}   (9)

finally the local feature descriptor is constructed: for each feature point, a local region of P × P pixels centered on the point is selected and divided into n × n sub-regions, a block distribution histogram is established by counting the index values of the maximum average index map in each sub-region, and the feature descriptor is obtained by concatenating the distribution histograms of all blocks and normalizing to unit length.
6. The multi-modal image registration method based on progressive filtering according to claim 1, wherein: the specific implementation manner in the step 4 is as follows;
the grid size, the kernel size and the iteration threshold of the motion vectors are initialized, and the feature point pairs with the smallest SAD distance are regarded as potential matches by a bilateral matching method; the matching set is partitioned into n_c × n_c non-overlapping cells, and the number of motion vectors in the (j, k)-th cell is denoted A_{j,k}; the average motion vector \bar{m}_{j,k} in the (j, k)-th cell is:

\bar{m}_{j,k} = \frac{1}{A_{j,k}} \sum_{m_i \in \mathrm{cell}(j,k)} m_i   (10)

to link the initial matches in surrounding cells and improve robustness and filtering performance, the representative motion vector of the current cell is computed by jointly considering the surrounding local cells according to the convolution theory:

\tilde{M} = \frac{(w \odot \bar{M}) * \kappa}{w * \kappa + \delta}   (11)

In formula (11), \tilde{M} is the n_c × n_c × 2 matrix of representative motion vectors generated by the Gaussian kernel convolution, \bar{M} collects the cell averages of formula (10), w is the count matrix with |w_{j,k}| = A_{j,k}, δ is a small positive number preventing the denominator from being 0, and κ is a 3 × 3 Gaussian kernel distance matrix defined by:

\kappa_{i,j} = \exp(-n_{i,j}^2 / 2)   (12)

In formula (12), c_{i,j} = (i, j)^T, n_{i,j} = \lVert c_{i,j} - \lceil n_k/2 \rceil (1,1)^T \rVert_2 is the L2 distance from the kernel center, ⌈·⌉ rounds an element to the nearest integer not less than itself, and n_k denotes the Gaussian kernel size;

after the kernel convolution operation, the representative motion vector of each cell is obtained, and the deviation between each initially matched motion vector m_i and the representative vector \tilde{m}_i of its cell is defined as:

d_i = \frac{\lVert m_i - \tilde{m}_i \rVert_2}{\lVert \tilde{m}_i \rVert_2 + \delta}   (13)

at this point, the inlier set I^* to be solved can be determined by comparing the deviation with a given threshold λ:

I^* = \{(I_i, I'_i) : d_i \le \lambda,\ i = 1, \dots, N\}   (14)

where I_i = (x_i, y_i)^T and I'_i = (x'_i, y'_i)^T are the coordinates of the i-th match in the reference image I and the image to be matched I', N represents the number of matching points, and the obtained inlier set I^* is the feature point set after the mismatches have been deleted.
7. The multi-modal image registration method based on progressive filtering according to claim 6, wherein: an iterative strategy is adopted to further remove abnormal values; a larger threshold is set at the beginning of the iterations and is gradually reduced as the iterations proceed, eliminating outliers from coarse to fine; the representative motion vectors are iteratively refined, the threshold λ is optimized, and the inlier set is approximated with the result of each iteration until convergence.
8. The multi-modal image registration method based on progressive filtering according to claim 1, wherein: step 5 further comprises evaluating the image registration accuracy with the root mean square error RMSE, the mean error MEAN and the standard deviation Std, defined as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} r_i^2}   (18)

\mathrm{MEAN} = \frac{1}{N} \sum_{i=1}^{N} r_i   (19)

\mathrm{Std} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (r_i - \mathrm{MEAN})^2}   (20)

In formulas (18) to (20), N represents the number of correct matches after mismatch rejection, and r_i = \lVert (x_i, y_i)^T - T(x'_i, y'_i)^T \rVert_2 is the distance between the coordinates of the i-th reference image feature point and the transformed coordinates of the corresponding feature point of the image to be registered.
CN202210021872.0A 2022-01-10 2022-01-10 Multi-modal image registration method based on progressive filtering Pending CN114463397A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021872.0A CN114463397A (en) 2022-01-10 2022-01-10 Multi-modal image registration method based on progressive filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210021872.0A CN114463397A (en) 2022-01-10 2022-01-10 Multi-modal image registration method based on progressive filtering

Publications (1)

Publication Number Publication Date
CN114463397A true CN114463397A (en) 2022-05-10

Family

ID=81409617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021872.0A Pending CN114463397A (en) 2022-01-10 2022-01-10 Multi-modal image registration method based on progressive filtering

Country Status (1)

Country Link
CN (1) CN114463397A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385502A (en) * 2023-03-09 2023-07-04 武汉大学 Image registration method based on region search under geometric constraint
CN116385502B (en) * 2023-03-09 2024-04-19 武汉大学 Image registration method based on region search under geometric constraint
CN117237680A (en) * 2023-08-18 2023-12-15 暨南大学 Heterogeneous model fitting-based multi-source image matching method and system
CN117237680B (en) * 2023-08-18 2024-03-12 暨南大学 Heterogeneous model fitting-based multi-source image matching method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination