CN114463397A - Multi-modal image registration method based on progressive filtering


Info

Publication number
CN114463397A
Authority
CN
China
Prior art keywords
image
matching
scale
formula
filtering
Prior art date
Legal status
Pending
Application number
CN202210021872.0A
Other languages
Chinese (zh)
Inventor
方圣辉
熊强
龚龑
彭漪
刘小娟
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202210021872.0A
Publication of CN114463397A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images using feature-based methods
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/02 Affine transformations
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G06T2207/20024 Filtering details

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-modal image registration method based on progressive filtering, which comprises the following steps: step 1, generating a PC map of a multi-modal image pair by using the 2D-LGF; step 2, generating a nonlinear scale space of the PC map by using the AOS, and detecting extrema in the nonlinear scale space to obtain feature points invariant to illumination and contrast for multi-modal image matching; step 3, establishing a structure descriptor from the 2D-LGF convolution sequence to describe the feature points obtained in the previous step; step 4, establishing initial feature correspondences by a bilateral matching method, recovering the true smooth motion field by convolution with progressive filtering, and gradually eliminating mismatches by an iterative method according to the consistency of the initially matched motion vectors; and step 5, transforming the image with an affine transformation model and outputting the image registration result.

Description

Multi-modal image registration method based on progressive filtering
Technical Field
The invention belongs to the field of image processing, and particularly relates to a multi-modal image registration method based on progressive filtering.
Background
Image matching extracts correct feature correspondences from two or more images with overlapping regions, which may be acquired by one or more sensors from the same or different viewpoints, at the same or different times. Image feature matching has long been a fundamental and key problem in photogrammetry and remote sensing, computer vision, robot vision, medical image analysis, and related fields. It has been a popular research topic and has made great progress over the past few decades. However, owing to the imaging characteristics of the sensors themselves and radiation transmission errors caused by the atmosphere, images may differ in scale, rotation, radiation, noise, acquisition time, and so on. Large differences in geometry and radiation cause nonlinear radiation differences between images, which pose a great challenge to feature matching algorithms, can sharply reduce image matching performance, and make it difficult to meet the requirements of ever-changing practical applications. It is therefore very important to research more effective, general and robust image matching algorithms. In general, images exhibiting nonlinear radiation differences are referred to as multi-modal images.
At present, automatic registration of multi-modal remote sensing images is widely studied and is one of the research hotspots in the field of image processing. General-purpose, automatic registration of multi-modal remote sensing images is a necessary step for image fusion, image mosaicking, target recognition and change detection. It can serve the geometric registration of optical and SAR images, the registration of high-resolution satellite images, the registration of hyperspectral images, and other emerging geospatial and engineering applications. Because different sensors have different imaging mechanisms, and the time, angle and environment of image acquisition also differ, multi-modal image registration still presents many challenges. High-accuracy automatic matching is especially difficult where images differ significantly, such as between optical and SAR images, depth maps and visible-light images, or visible-light and infrared images.
For multi-modal remote sensing images, the gray-scale relationship is no longer linear; it may not even be functional. The gray-scale relationship between the images then has neither statistical correlation nor geometric similarity, and registration methods based on gray scale become inaccurate. Although feature descriptors have great advantages in same-modality image registration, they are not equally robust on multi-modal remote sensing datasets; the challenges faced by these methods (such as SUSAN, SIFT, phase consistency features, and extended SURF) are repeatable feature detection and modality-invariant feature description.
Through studies on image matching, some researchers have found that the geometric (structural) information between multi-modal images is more stable than gradient or intensity information under the nonlinear gray-scale changes of multispectral images. Based on this observation, feature points can be detected from the structural consistency of the image, which can be evaluated by computing similarity measures on structural or shape descriptors. Phase consistency (PC) models have been shown to capture structural information consistent with human vision and to be robust to illumination and contrast differences. Therefore, some PC-based algorithms have been applied to multispectral or multi-modal image matching; however, the PC is susceptible to noise, and aliasing effects greatly affect the image quality and detection accuracy of PC algorithms. Some scholars have extended PC models using Log-Gabor filters, for example the histogram of oriented phase consistency (HOPC), phase consistency structure descriptors (PCSD), and the radiation-invariant feature transform (RIFT). HOPC is a template matching method with an improved similarity metric (HOPCnc) built on NCC; however, HOPC requires the geographic information of an image for geometric correction and thus may not be applicable to images without geographic information, and it uses a Harris detector for feature points, which is very sensitive to nonlinear radiation distortion (NRD) and hard to adapt universally to different types of multi-modal images. RIFT is a feature-based matching method that detects feature points with a FAST detector on the PC map and replaces the gradient with the MIM for feature description; it has good robustness to NRD and realizes rotation invariance by constructing multiple MIMs, but RIFT establishes no scale space for feature detection and description and is sensitive to large scale and viewpoint changes.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a multi-modal image registration method based on progressive filtering, aiming at the defects in the prior art.
The technical scheme adopted by the invention for solving the technical problem is that the method for multi-modal image registration based on progressive filtering comprises the following steps:
step 1, generating a phase consistency map (PC map) of the multi-modal image pair by two-dimensional Log-Gabor filtering (2D-LGF);
step 2, generating a nonlinear scale space of the PC map by the additive operator splitting algorithm (AOS), and detecting extrema in the nonlinear scale space to obtain feature points invariant to illumination and contrast;
step 3, computing the maximum average index map and the phase consistency orientation with the 2D-LGF, and generating a local feature descriptor to describe the feature points obtained in the previous step;
step 4, constructing initial feature correspondences by bilateral matching and recovering the true smooth motion field by convolution with progressive filtering: feature matching is converted into a denoising model amenable to convolution filtering, the distribution difference between correct matches and mismatches is analyzed to construct a convolution kernel, the true smooth motion field is gradually recovered during the convolution operations, and mismatches are eliminated through the consistency of the initially matched motion vectors to obtain the inlier set;
and step 5, transforming the image with an affine transformation model using the correct feature matching points in the inlier set obtained in the previous step, and outputting the image registration result.
Further, the specific method of step 1 of the present invention is:
In step 1, the number of scales, the number of orientations, and the neighborhood size of the log-Gabor filter are initialized, and the phase consistency map of the multi-modal image pair is generated by the two-dimensional 2D-LGF. The calculation formula is:

PC(x,y) = \frac{\sum_s \sum_o W_o(x,y) \lfloor A_{so}(x,y)\, \Delta\Phi_{so}(x,y) - t \rfloor}{\sum_s \sum_o A_{so}(x,y) + \delta}   (1)

In formula (1), PC(x, y) is the PC value at point (x, y); W_o(x, y) is a weight function; t is the noise threshold; A_{so}(x, y) is the amplitude component of the log-Gabor filter at point (x, y); δ is a small constant preventing the denominator from being zero; the symbol ⌊·⌋ denotes an operator that equals the enclosed quantity when its value is positive and zero otherwise; ΔΦ_{so}(x, y) is the phase deviation function; and s and o are the scale and orientation of the log-Gabor filter.
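To make formula (1) concrete, the following minimal sketch (Python/NumPy) builds a 2D Log-Gabor filter bank in the frequency domain and accumulates a simplified PC map. It is an illustration rather than the patented implementation: the weight function W_o and the exact phase deviation ΔΦ_so of formula (1) are collapsed into the per-orientation local energy, and the filter parameters (min_wl, mult, sigma_f, sigma_th) are hypothetical defaults.

```python
import numpy as np

def log_gabor_bank(rows, cols, n_scales=4, n_orients=6,
                   min_wl=3.0, mult=2.1, sigma_f=0.55, sigma_th=0.4):
    # Radial log-Gaussian times angular Gaussian, built in the frequency domain.
    y, x = np.mgrid[-(rows // 2):rows - rows // 2, -(cols // 2):cols - cols // 2]
    radius = np.hypot(x / cols, y / rows)
    radius[rows // 2, cols // 2] = 1.0            # dodge log(0) at the DC term
    theta = np.arctan2(-y, x)
    for s in range(n_scales):
        f0 = 1.0 / (min_wl * mult ** s)           # centre frequency of scale s
        radial = np.exp(-np.log(radius / f0) ** 2 / (2 * np.log(sigma_f) ** 2))
        radial[rows // 2, cols // 2] = 0.0
        for o in range(n_orients):
            ang = o * np.pi / n_orients
            d = np.arctan2(np.sin(theta - ang), np.cos(theta - ang))
            angular = np.exp(-d ** 2 / (2 * sigma_th ** 2))
            yield s, o, np.fft.ifftshift(radial * angular)

def pc_map(img, t=0.1, delta=1e-4, n_scales=4, n_orients=6):
    """Simplified phase-consistency map in the spirit of formula (1)."""
    F = np.fft.fft2(img.astype(np.float64))
    resp = {(s, o): np.fft.ifft2(F * flt)         # even + i*odd filter responses
            for s, o, flt in log_gabor_bank(*img.shape, n_scales=n_scales,
                                            n_orients=n_orients)}
    num = np.zeros(img.shape)
    den = np.full(img.shape, delta)               # delta keeps the denominator > 0
    for o in range(n_orients):
        energy = np.abs(sum(resp[s, o] for s in range(n_scales)))
        amplitude = sum(np.abs(resp[s, o]) for s in range(n_scales))
        num += np.maximum(energy - t, 0.0)        # soft noise threshold: max(. - t, 0)
        den += amplitude
    return num / den
```

For a multi-modal pair, pc_map would be applied to each image independently, and the two PC maps then replace the raw intensities in all subsequent steps.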
Further, the specific method of step 2 of the present invention is:
In step 2, the maximum octave, the number of scale sub-levels, and the control factor of the AOS are initialized. Nonlinear diffusion filtering is described by a nonlinear partial differential equation:

\frac{\partial L}{\partial t} = \mathrm{div}\big(c(x,y,t) \cdot \nabla L\big)   (2)

In formula (2), div and ∇ are the divergence and gradient operators, and L is the brightness of the image. c(x, y, t) = g(|∇L_σ(x, y, t)|) is the conductance function, where ∇L_σ is the gradient of the Gaussian-smoothed image L_σ and k is a contrast factor that controls the diffusion level. Since the nonlinear partial differential equation has no analytic solution, it is generally solved iteratively by numerical analysis:

L^{i+1} = \Big(I - \tau \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (3)

In formula (3), A_l is a matrix representing the image derivatives (diffusivities) in direction l, m is the number of directions l, τ = t_{i+1} − t_i is the time step, I denotes the identity matrix, and L^i is the solution of the i-th iteration of the nonlinear diffusion filtering.
When the nonlinear scale space is constructed, the scale levels increase logarithmically and are divided into O octaves, each with S sub-levels; every sub-level uses the same resolution as the original image. The scale hierarchy is:

\sigma_i(o, s) = \sigma_0\, 2^{o + s/S},\quad o \in [0, \dots, O-1],\ s \in [0, \dots, S-1],\ i \in [0, \dots, N]   (4)

In formula (4), o and s are the octave and sub-level indices respectively, σ_0 is the initial value of the scale parameter, and N = O × S is the total number of images contained in the entire nonlinear scale space. The nonlinear diffusion filter model works in time units, so the pixel-unit scale parameters σ_i must be converted to time units; the correspondence is t_i = σ_i²/2.

First, Gaussian filtering is applied to the input image; then the gradient histogram of the image is computed to obtain the contrast factor k. With τ = t_{i+1} − t_i, all images of the nonlinear scale space can be obtained from formula (3):

L^{i+1} = \Big(I - (t_{i+1} - t_i) \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (5)
After the nonlinear scale space is constructed, feature point detection is realized by finding local maxima of the scale-normalized Hessian response at different scales. The Hessian response is computed as:

L_H = \sigma^2 (L_{xx} L_{yy} - L_{xy}^2)   (6)

In formula (6), σ is the integer value of the scale parameter σ_i, L_H is the Hessian response of the image, and L_{xx}, L_{yy} and L_{xy} are the second-order horizontal, vertical and cross derivatives of the image, respectively.

When searching for extreme points, each pixel is compared with all of its neighbors, and it is an extreme point when it is larger than all neighbors in its image domain and scale domain. In general the comparison covers windows of size σ_i × σ_i on the current, previous and next scales. To accelerate the search, the window size is fixed at 3 × 3, so the search space is a cube with a side length of 3 pixels; the central detection point is compared with its 8 neighbors at the same scale and the 9 × 2 points at the adjacent upper and lower scales, ensuring that extreme points are detected in both scale space and two-dimensional image space.
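As an illustration of formulas (4)-(6), the sketch below enumerates the scale levels with their time-unit conversion and runs the 3 × 3 × 3 extremum test on a stack of scale-normalized Hessian responses. The stack itself (one L_H image per level) is assumed to be given, and sigma0 = 1.6 and the response threshold are hypothetical choices.

```python
import numpy as np
from scipy import ndimage

def scale_levels(sigma0=1.6, octaves=6, sublevels=4):
    # Formula (4): sigma_i(o, s) = sigma0 * 2**(o + s/S); time units t_i = sigma_i**2 / 2.
    sigmas = [sigma0 * 2 ** (o + s / sublevels)
              for o in range(octaves) for s in range(sublevels)]
    times = [sig * sig / 2.0 for sig in sigmas]
    return sigmas, times

def detect_extrema(hessian_stack, threshold=1e-3):
    """hessian_stack: (N, H, W) responses L_H = sigma^2 (Lxx*Lyy - Lxy^2).
    A point is kept when it beats its 26 neighbours in the 3x3x3
    scale/space cube: 8 same-scale neighbours plus 9 on each adjacent
    scale, exactly the comparison described above."""
    cube_max = ndimage.maximum_filter(hessian_stack, size=(3, 3, 3), mode="nearest")
    mask = (hessian_stack == cube_max) & (hessian_stack > threshold)
    return np.argwhere(mask)   # rows of (level, y, x)
```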
Further, the specific method of step 3 of the present invention is:
In step 3, the 2D-LGF is used to compute the maximum average index map and the phase consistency orientation, and a local feature descriptor is generated. First the maximum average index map is computed: the image is convolved with the 2D-LGF filters, and the amplitude A_{so}(x, y) at scale s and orientation o is obtained. The amplitudes of all scales are added and divided by the number of scales N_s, giving the scale-averaged amplitude from which the maximum average index map is built:

\bar{A}_o(x,y) = \frac{1}{N_s} \sum_{s=1}^{N_s} A_{so}(x,y)   (7)

The odd-symmetric filter of the Log-Gabor function is then used to construct the phase consistency orientation feature. Convolving the odd-symmetric filter with the original image yields o convolution results, one per orientation; they are projected onto the x and y axes, and the phase consistency direction is calculated as:

\Phi_{so} = \arctan\left( \frac{\sum_\theta PC_{so}(\theta) \sin\theta}{\sum_\theta PC_{so}(\theta) \cos\theta + \delta} \right)   (8)

In formula (8), Φ_{so} is the phase consistency orientation feature, PC_{so}(θ) is the odd-symmetric filter convolution result in direction θ, and δ is a small constant preventing the denominator from being zero.

For multi-modal images, a gradient inversion phenomenon may occur when the phase consistency direction exceeds 180 degrees. In view of this problem, Φ_{so} is transformed to limit the phase consistency direction to between 0 and 180 degrees:

\Phi_{so} \leftarrow \begin{cases} \Phi_{so}, & 0^\circ \le \Phi_{so} < 180^\circ \\ \Phi_{so} - 180^\circ, & \Phi_{so} \ge 180^\circ \end{cases}   (9)
Finally the local feature descriptor is constructed. For each feature point, a local region of P × P pixels centered on the point is selected. The local region is divided into 6 × 6 sub-regions, and a block distribution histogram is established by counting the index values of the maximum average index map in each sub-region. The feature descriptor is obtained by concatenating the distribution histograms of all blocks and normalizing to unit length. The main steps, sketched in code below, are:

(1) convolve the neighborhood of the feature point with the 2D-LGF, obtaining the odd-part and even-part convolution results at each scale s and orientation o;

(2) obtain the maximum average index map according to formula (7);

(3) calculate the phase consistency orientation map according to formulas (8)-(9);

(4) select the P × P neighborhood of the feature point and divide it into n × n sub-regions (n = 6 in this example);

(5) compute a histogram in each sub-region: divide 180 degrees into N_o partitions, count the elements of the maximum average index map falling into each partition, and obtain the feature vector of each sub-region;

(6) form the descriptor of each feature point by concatenating the feature vectors of the 36 sub-regions of step (5) in a fixed order; if the feature vector of sub-region A1 is V1, then for any feature point on the image, V = [V1, V2, …, V36].
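A compact sketch of steps (1)-(6), assuming the scale-averaged amplitude maps of formula (7) are already available as one H × W array per orientation. The patch size P = 72 is a hypothetical choice, and the feature point is assumed to lie far enough from the image border.

```python
import numpy as np

def mim_descriptor(avg_amp, x, y, patch=72, grid=6):
    """avg_amp: (N_o, H, W) scale-averaged Log-Gabor amplitudes (formula (7)).
    Returns the concatenated, L2-normalized block-histogram descriptor."""
    n_o = avg_amp.shape[0]
    mim = np.argmax(avg_amp, axis=0)          # maximum average index map
    r = patch // 2
    win = mim[y - r:y + r, x - r:x + r]       # P x P neighbourhood (interior point)
    desc = []
    for rows in np.array_split(np.arange(win.shape[0]), grid):
        for cols in np.array_split(np.arange(win.shape[1]), grid):
            sub = win[np.ix_(rows, cols)]     # one of the grid*grid sub-regions
            desc.append(np.bincount(sub.ravel(), minlength=n_o))  # N_o-bin histogram
    v = np.concatenate(desc).astype(np.float64)
    return v / (np.linalg.norm(v) + 1e-12)    # unit length, as in step (6)
```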
Further, the specific method of step 4 of the present invention is:
In step 4, the grid size, the kernel size and the iteration threshold of the motion vectors are initialized. Using a bilateral matching method, the feature point pairs with the smallest SAD distance are regarded as potential matches. The matching set is partitioned into n_c × n_c non-overlapping cells, and the number of motion vectors in the (j, k)-th cell is denoted A_{j,k}. The average motion vector \bar{m}_{j,k} in the (j, k)-th cell is:

\bar{m}_{j,k} = \frac{1}{A_{j,k}} \sum_{m_i \in \mathrm{cell}(j,k)} m_i   (10)
To link the initial matches in surrounding cells and improve robustness and filtering performance, the method computes the representative motion vector of the current cell by jointly considering the surrounding local cells according to the convolution theory:

\tilde{M} = \frac{(w \odot \bar{M}) * \kappa}{w * \kappa + \delta}   (11)

In formula (11), \tilde{M} is the n_c × n_c × 2 matrix of representative motion vectors generated by the Gaussian kernel convolution, \bar{M} collects the cell averages of formula (10), w is the count matrix with |w_{j,k}| = A_{j,k}, δ is a small positive number preventing the denominator from being 0, and κ is a 3 × 3 Gaussian kernel distance matrix defined by:

\kappa_{i,j} = \exp(-n_{i,j}^2 / 2)   (12)

In formula (12), c_{i,j} = (i, j)^T, n_{i,j} = \lVert c_{i,j} - \lceil n_k/2 \rceil (1,1)^T \rVert_2 is the L2 distance from the kernel center, ⌈·⌉ rounds an element to the nearest integer not less than itself, and n_k denotes the Gaussian kernel size.
After the kernel convolution operation, the representative motion vector of each cell is obtained, and the deviation between each initially matched motion vector m_i and the representative vector \tilde{m}_i of its cell is defined as:

d_i = \frac{\lVert m_i - \tilde{m}_i \rVert_2}{\lVert \tilde{m}_i \rVert_2 + \delta}   (13)

At this point, the inlier set I^* to be solved can be determined by comparing the deviation with a given threshold λ:

I^* = \{(I_i, I'_i) : d_i \le \lambda,\ i = 1, \dots, N\}   (14)

where I_i = (x_i, y_i)^T and I'_i = (x'_i, y'_i)^T are the coordinates of the i-th match in the reference image I and the image to be matched I', and N represents the number of matching points. The obtained inlier set I^* is the feature point set after the mismatches have been deleted.
Only a portion of the initial matches can be filtered out by a single threshold λ. To address this difficulty, the present invention uses an iterative strategy to remove outliers step by step. Following a coarse-to-fine principle, the method iteratively refines the representative motion vectors and optimizes the threshold λ; the inlier set is re-approximated with the result of each iteration until convergence. A larger threshold is set at the beginning of the iterations and is gradually reduced as the iterations proceed, realizing coarse-to-fine elimination of outliers.
Further, the specific method of step 5 of the present invention is:
and 5, transforming the multi-modal image by using the correct feature matching points obtained in the step 4 and an affine transformation model to obtain an image registration result.
The phase consistency map, the feature points, the initial feature matching relationship, the correct feature matching relationship and the registered images of the multi-modal image are obtained by the method.
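For step 5, a least-squares affine fit over the inlier set suffices; the sketch below (the array shapes are assumptions) solves the 2 × 3 affine matrix in closed form and warps point coordinates for evaluation.

```python
import numpy as np

def fit_affine(src, dst):
    """src, dst: (N, 2) inlier coordinates from step 4, with dst ~ A @ [x, y, 1]^T.
    Returns the 2x3 affine matrix A minimizing the squared residuals."""
    X = np.hstack([src, np.ones((len(src), 1))])  # (N, 3) homogeneous source points
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)   # (3, 2) least-squares solution
    return A.T                                    # (2, 3)

def apply_affine(A, pts):
    return pts @ A[:, :2].T + A[:, 2]             # warp (N, 2) points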
Compared with the prior art, the invention has the advantages and beneficial effects that:
aiming at a multi-modal image with nonlinear radiation difference, the multi-modal image registration method based on progressive filtering is difficult to achieve a good matching effect based on an image intensity or gradient feature matching method, a PC map with structural consistency of a multi-modal image pair is obtained by using 2D-LCF, feature points of the PC map are detected by using nonlinear filtering, and then an initial feature matching set is obtained by using a BBF method. The initial matching set is divided into non-overlapping cells and then a typical motion vector for each cell is calculated using a gaussian kernel convolution operation. Finally, by detecting outliers by checking that the deviation between the assumed motion vector and its corresponding representative motion vector is smaller than a given threshold, we use an iterative strategy to progressively filter out outliers, each iteration setting a different threshold (the threshold is progressively reduced) until convergence. The method of the present invention can converge in several iterations, and some sparse point-based tasks may be inspired by our method when they are implemented by deep learning techniques.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of multi-modality image registration according to an embodiment of the present invention;
FIG. 2 is a flow chart of a PFSC algorithm implementation of an embodiment of the present invention;
(a): a multi-modal image pair; (b): PC maps; (c): feature points; (d): initial matching; (e): feature matching result (top) and motion field vector distribution (bottom) of the first iteration; (f): feature matching result (top) and motion field vector distribution (bottom) of the second iteration; (g): feature matching result (top) and motion field vector distribution (bottom) of the third iteration; (h): feature matching result (top) and motion field vector distribution (bottom) of the fourth iteration;
FIG. 3 shows four sets of multi-modal images, (a) - (d) are four sets of multi-modal images according to an embodiment of the present invention;
FIG. 4 shows the precision, recall and F-score histograms of the five feature matching algorithms: (a) is the precision histogram, (b) is the recall histogram, and (c) is the F-score histogram of the five feature matching algorithms;
FIG. 5 shows the PFSC algorithm feature matching result (left) and the correctly matched motion field vector distribution (right) according to an embodiment of the present invention; (a) - (d) correspond to (a) - (d) in fig. 3, respectively;
fig. 6 shows four sets of multi-modality image registration results according to the embodiment of the invention, which correspond to (a) - (d) in fig. 3.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The method provided by the invention can be implemented as an automatic process using computer software. The method specifically comprises the following steps:
step 1, generating the phase consistency map (PC map) of the multi-modal image pair by two-dimensional Log-Gabor filtering;
step 2, generating the nonlinear scale space of the PC map by the additive operator splitting algorithm (AOS), and detecting extrema in the nonlinear scale space to obtain feature points invariant to illumination and contrast;
step 3, computing the maximum average index map and the phase consistency orientation with the 2D-LGF, and generating a local feature descriptor to describe the feature points obtained in the previous step;
step 4, constructing initial feature correspondences by bilateral matching and recovering the true smooth motion field by convolution with progressive filtering: feature matching is converted into a denoising model amenable to convolution filtering, the distribution difference between correct matches and mismatches is analyzed to construct a convolution kernel, the true smooth motion field is gradually recovered during the convolution operations, and mismatches are eliminated through the consistency of the initially matched motion vectors;
and step 5, transforming the image with an affine transformation model using the correct feature matching points obtained in the previous step, and outputting the image registration result.
Referring to fig. 1, the embodiment explains the process of the present invention by taking the registration of 4 sets of multi-modal images as an example, as follows:
In step 1, the number of scales, the number of orientations, and the neighborhood size of the log-Gabor filter are initialized: the number of scales is set to 4, the number of orientations to 6, and the neighborhood to 90 pixels.
The phase consistency map of the multi-modal image pair is generated by the two-dimensional 2D-LGF; the calculation formula is:

PC(x,y) = \frac{\sum_s \sum_o W_o(x,y) \lfloor A_{so}(x,y)\, \Delta\Phi_{so}(x,y) - t \rfloor}{\sum_s \sum_o A_{so}(x,y) + \delta}   (1)

In formula (1), PC(x, y) is the PC value at point (x, y), W_o(x, y) is a weight function, A_{so}(x, y) is the amplitude component of the log-Gabor filter at point (x, y), t is the noise threshold, δ is a small constant preventing the denominator from being zero, the symbol ⌊·⌋ equals the enclosed quantity when its value is positive and zero otherwise, ΔΦ_{so}(x, y) is the phase deviation function, and s and o are the scale and orientation of the log-Gabor filter.
In step 2, the maximum octave, the number of scale sub-levels, and the control factor of the AOS are initialized. To capture more salient feature points, the invention sets the maximum octave to 6, the number of sub-levels per scale level to 4, and the control factor to 0.0003.

Nonlinear diffusion filtering is described by a nonlinear partial differential equation:

\frac{\partial L}{\partial t} = \mathrm{div}\big(c(x,y,t) \cdot \nabla L\big)   (2)

In formula (2), div and ∇ are the divergence and gradient operators, L is the brightness of the image, c(x, y, t) = g(|∇L_σ(x, y, t)|) is the conductance function, ∇L_σ is the gradient of the Gaussian-smoothed image L_σ, and k is a contrast factor controlling the diffusion level. Since the nonlinear partial differential equation has no analytic solution, it is solved iteratively by numerical analysis:

L^{i+1} = \Big(I - \tau \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (3)

In formula (3), A_l is a matrix representing the image derivatives (diffusivities) in direction l, m is the number of directions l, τ = t_{i+1} − t_i is the time step, I denotes the identity matrix, and L^i is the solution of the i-th iteration of the nonlinear diffusion filtering.
When the nonlinear scale space is constructed, the scale levels increase logarithmically and are divided into O octaves, each with S sub-levels; every sub-level uses the same resolution as the original image. The scale hierarchy is:

\sigma_i(o, s) = \sigma_0\, 2^{o + s/S},\quad o \in [0, \dots, O-1],\ s \in [0, \dots, S-1],\ i \in [0, \dots, N]   (4)

In formula (4), o and s are the octave and sub-level indices respectively, σ_0 is the initial value of the scale parameter, and N = O × S is the total number of images contained in the entire nonlinear scale space. The nonlinear diffusion filter model works in time units, so the pixel-unit scale parameters σ_i must be converted to time units; the correspondence is t_i = σ_i²/2.

First, Gaussian filtering is applied to the input image; then the gradient histogram of the image is computed to obtain the contrast factor k. With τ = t_{i+1} − t_i, all images of the nonlinear scale space can be obtained from formula (3):

L^{i+1} = \Big(I - (t_{i+1} - t_i) \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (5)
After the nonlinear scale space is constructed, feature point detection is realized by finding local maxima of the scale-normalized Hessian response at different scales. The Hessian response is computed as:

L_H = \sigma^2 (L_{xx} L_{yy} - L_{xy}^2)   (6)

In formula (6), σ is the integer value of the scale parameter σ_i, L_H is the Hessian response of the image, and L_{xx}, L_{yy} and L_{xy} are the second-order horizontal, vertical and cross derivatives of the image, respectively.

When searching for extreme points, each pixel is compared with all of its neighbors, and it is an extreme point when it is larger than all neighbors in its image domain and scale domain. In general the comparison covers windows of size σ_i × σ_i on the current, previous and next scales. To accelerate the search, the window size is fixed at 3 × 3, so the search space is a cube with a side length of 3 pixels; the central detection point is compared with its 8 neighbors at the same scale and the 9 × 2 points at the adjacent upper and lower scales, ensuring that extreme points are detected in both scale space and two-dimensional image space.
In step 3, the 2D-LGF is used to compute the maximum average index map and the phase consistency orientation, and a local feature descriptor is generated. First the maximum average index map is computed: the image is convolved with the 2D-LGF filters, and the amplitude A_{so}(x, y) at scale s and orientation o is obtained. The amplitudes of all scales are added and divided by the number of scales N_s:

\bar{A}_o(x,y) = \frac{1}{N_s} \sum_{s=1}^{N_s} A_{so}(x,y)   (7)

The odd-symmetric filter of the Log-Gabor function is then used to construct the phase consistency orientation feature. Convolving the odd-symmetric filter with the original image yields o convolution results, one per orientation; they are projected onto the x and y axes, and the phase consistency direction is calculated as:

\Phi_{so} = \arctan\left( \frac{\sum_\theta PC_{so}(\theta) \sin\theta}{\sum_\theta PC_{so}(\theta) \cos\theta + \delta} \right)   (8)

In formula (8), Φ_{so} is the phase consistency orientation feature, PC_{so}(θ) is the odd-symmetric filter convolution result in direction θ, θ is the angle of orientation o, and δ is a small constant preventing the denominator from being zero.

For multi-modal images, a gradient inversion phenomenon may occur when the phase consistency direction exceeds 180 degrees. In view of this problem, Φ_{so} is transformed to limit the phase consistency direction to between 0 and 180 degrees:

\Phi_{so} \leftarrow \begin{cases} \Phi_{so}, & 0^\circ \le \Phi_{so} < 180^\circ \\ \Phi_{so} - 180^\circ, & \Phi_{so} \ge 180^\circ \end{cases}   (9)
Finally the local feature descriptor is constructed. For each feature point, a local region of P × P pixels centered on the point is selected. The local region is divided into 6 × 6 sub-regions, and a block distribution histogram is established by counting the index values of the maximum average index map in each sub-region. The feature descriptor is obtained by concatenating the distribution histograms of all blocks and normalizing to unit length. The main steps are:

(1) convolve the neighborhood of the feature point with the 2D-LGF, obtaining the odd-part and even-part convolution results at each scale s and orientation o;

(2) obtain the maximum average index map according to formula (7);

(3) calculate the phase consistency orientation map according to formulas (8)-(9);

(4) select the P × P neighborhood of the feature point and divide it into n × n sub-regions (n = 6 in this example);

(5) compute a histogram in each sub-region: divide 180 degrees into N_o partitions, count the elements of the maximum average index map falling into each partition, and obtain the feature vector of each sub-region;

(6) form the descriptor of each feature point by concatenating the feature vectors of the 36 sub-regions of step (5) in a fixed order; if the feature vector of sub-region A1 is V1, then for any feature point on the image, V = [V1, V2, …, V36].
In step 4, the grid size, the kernel size and the iteration threshold of the motion vectors are initialized. The invention limits the grid size to between 15 and 30 and sets the kernel size to an odd number no greater than one third of the grid size. A progressive iteration method, similar to a simulated annealing strategy, gradually separates the inlier and outlier sets: a larger threshold is set at the beginning of the iterations and is gradually reduced as the iterations proceed, eliminating outliers from coarse to fine while the representative motion vectors are iteratively refined and the threshold is optimized. The inlier set is approximated with the result of each iteration until convergence. In the experiments, 4 iterations were sufficient for reliable matching performance, with thresholds of 0.8, 0.2, 0.1 and 0.01 for the successive iterations.
Using the bilateral matching method, the feature point pairs with the smallest SAD distance are regarded as potential matches. The matching set is partitioned into n_c × n_c non-overlapping cells, and the number of motion vectors in the (j, k)-th cell is denoted A_{j,k}. The average motion vector \bar{m}_{j,k} in the (j, k)-th cell is:

\bar{m}_{j,k} = \frac{1}{A_{j,k}} \sum_{m_i \in \mathrm{cell}(j,k)} m_i   (10)
To link the initial matches in surrounding cells and improve robustness and filtering performance, the method computes the representative motion vector of the current cell by jointly considering the surrounding local cells according to the convolution theory:

\tilde{M} = \frac{(w \odot \bar{M}) * \kappa}{w * \kappa + \delta}   (11)

In formula (11), \tilde{M} is the n_c × n_c × 2 matrix of representative motion vectors generated by the Gaussian kernel convolution, \bar{M} collects the cell averages of formula (10), w is the count matrix with |w_{j,k}| = A_{j,k}, δ is a small positive number preventing the denominator from being 0, and κ is a 3 × 3 Gaussian kernel distance matrix defined by:

\kappa_{i,j} = \exp(-n_{i,j}^2 / 2)   (12)

In formula (12), c_{i,j} = (i, j)^T, n_{i,j} = \lVert c_{i,j} - \lceil n_k/2 \rceil (1,1)^T \rVert_2 is the L2 distance from the kernel center, ⌈·⌉ rounds an element to the nearest integer not less than itself, and n_k denotes the Gaussian kernel size.
After the kernel convolution operation, the representative motion vector of each cell is obtained, and the deviation between each initially matched motion vector m_i and the representative vector \tilde{m}_i of its cell is defined as:

d_i = \frac{\lVert m_i - \tilde{m}_i \rVert_2}{\lVert \tilde{m}_i \rVert_2 + \delta}   (13)

At this point, the inlier set I^* to be solved can be determined by comparing the deviation with a given threshold λ:

I^* = \{(I_i, I'_i) : d_i \le \lambda,\ i = 1, \dots, N\}   (14)

where I_i = (x_i, y_i)^T and I'_i = (x'_i, y'_i)^T are the coordinates of the i-th match in the reference image I and the image to be matched I', and N represents the number of matching points.
Only a portion of the initial matches can be filtered out by a single threshold λ. To solve this problem, the invention uses an iterative strategy to gradually remove abnormal values: a larger threshold is set at the beginning of the iterations and is gradually reduced as the iterations proceed, eliminating outliers from coarse to fine; the representative motion vectors are iteratively refined, the threshold λ is optimized, and the inlier set is approximated with the result of each iteration until convergence.
As shown in (a)-(d) of fig. 2, the original images are subjected to PC calculation, feature detection and feature matching to obtain the initial matching set. As shown in (e)-(h) of fig. 2, as the iterations proceed, mismatches are gradually filtered out until the correct matching set is obtained; at this point the motion vectors corresponding to the correct matches have structural consistency. Because the feature matching method is based on the idea of progressive filtering with structure consistency, the algorithm is named PFSC; the whole algorithm is summarized in Algorithm 1.
(Algorithm 1, the complete PFSC procedure, is reproduced as an image in the original document; a code sketch follows.)
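Because Algorithm 1 survives only as an image in the source, the sketch below reconstructs the PFSC filtering loop from formulas (10)-(14) and the parameter choices stated above (grid n_c, odd kernel size n_k, threshold schedule 0.8/0.2/0.1/0.01). The Gaussian width used for the kernel of formula (12), and all function and parameter names, are assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def pfsc_filter(pts1, pts2, img_shape, nc=20, nk=5,
                thresholds=(0.8, 0.2, 0.1, 0.01), delta=1e-6):
    """pts1, pts2: (N, 2) initially matched coordinates (bilateral SAD matching).
    Returns a boolean inlier mask refined over the threshold schedule."""
    h, w = img_shape
    motion = pts2 - pts1                                   # per-match motion vectors
    jk = np.stack([np.clip((pts1[:, 1] / h * nc).astype(int), 0, nc - 1),
                   np.clip((pts1[:, 0] / w * nc).astype(int), 0, nc - 1)], axis=1)
    c = np.arange(nk) - nk // 2                            # kernel coordinate offsets
    kappa = np.exp(-(c[:, None] ** 2 + c[None, :] ** 2) / 2.0)  # formula (12), width assumed
    inlier = np.ones(len(motion), bool)
    for lam in thresholds:                                 # coarse-to-fine schedule
        counts = np.zeros((nc, nc))                        # w in formula (11)
        sums = np.zeros((nc, nc, 2))                       # count-weighted cell averages
        for (j, k), m in zip(jk[inlier], motion[inlier]):
            counts[j, k] += 1
            sums[j, k] += m
        num = np.stack([convolve(sums[..., d], kappa, mode="nearest")
                        for d in (0, 1)], axis=-1)         # (w .* Mbar) * kappa
        rep = num / (convolve(counts, kappa, mode="nearest")[..., None] + delta)
        rep_i = rep[jk[:, 0], jk[:, 1]]                    # representative vector per match
        dev = (np.linalg.norm(motion - rep_i, axis=1)
               / (np.linalg.norm(rep_i, axis=1) + delta))  # formula (13)
        inlier = dev <= lam                                # formula (14)
    return inlier
```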
The performance of the algorithm was tested using 4 sets of multi-modal images; the data sets are shown in FIG. 3 and described in Table 1.
Table 1 Detailed description of the multi-modal image data

(table content reproduced as an image in the original document)
The method of the invention is compared with four other representative state-of-the-art feature matching methods (VFC, LLT, LMR and mTopKRP), and feature matching performance is quantified with Precision, Recall and F-score. Precision is defined as the percentage of the correct matches retained by the algorithm among all retained matches, Recall is the percentage of the correct matches retained by the algorithm among all correct matches in the image, and F-score is a balanced measure of precision and recall. They are defined as follows:

\mathrm{Precision} = \frac{TP}{TP + FP}   (15)

\mathrm{Recall} = \frac{TP}{TP + FN}   (16)

\mathrm{F\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}   (17)

In formulas (15) to (17), TP, FP and FN represent true positives (actually correct, and the algorithm judges correct), false positives (actually wrong, and the algorithm judges correct) and false negatives (actually correct, and the algorithm judges wrong), respectively.
As shown in fig. 4, the PFSC achieves the highest precision, recall and F-score relative to the other four methods. Fig. 5 shows the image matching results and motion vectors; the PFSC removes most of the outliers, with only a few misjudged.
In step 5, the image to be registered is transformed with the affine transformation model according to the correct feature correspondences to obtain the registered image. The root mean square error (RMSE), mean error (MEAN) and standard deviation (Std) are used to measure the image registration accuracy, defined as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} r_i^2}   (18)

\mathrm{MEAN} = \frac{1}{N} \sum_{i=1}^{N} r_i   (19)

\mathrm{Std} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (r_i - \mathrm{MEAN})^2}   (20)

In formulas (18) to (20), N represents the number of correct matches after mismatch rejection, and r_i = \lVert (x_i, y_i)^T - T(x'_i, y'_i)^T \rVert_2 is the distance between the coordinates of the i-th reference image feature point and the transformed coordinates of the corresponding feature point of the image to be registered.
The RMSE, MEAN and Std of the four sets of multi-modal images were calculated using formulas (18) to (20); the results are shown in Tables 2 to 4. The RMSE, MEAN and Std values obtained with the PFSC method are all lower than those of the other four methods.
Table 2 RMSE results of the five comparison methods on the four groups of multi-modal remote sensing images

(table content reproduced as an image in the original document)

Table 3 MEAN results of the five comparison methods on the four groups of multi-modal remote sensing images

(table content reproduced as an image in the original document)

Table 4 Std results of the five comparison methods on the four groups of multi-modal remote sensing images

(table content reproduced as an image in the original document)
Fig. 6 shows the image registration results displayed in checkerboard form; the registered images achieve a good alignment effect.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (8)

1. A multi-modal image registration method based on progressive filtering is characterized by comprising the following steps:
step 1, generating a phase consistency map (PC map) of the multi-modal image pair by two-dimensional Log-Gabor filtering;
step 2, generating a nonlinear scale space of the PC map by the additive operator splitting algorithm (AOS), and detecting extrema in the nonlinear scale space to obtain feature points invariant to illumination and contrast;
step 3, computing the maximum average index map and the phase consistency orientation by two-dimensional Log-Gabor filtering, and generating a local feature descriptor to describe the feature points obtained in step 2;
step 4, constructing initial feature correspondences by bilateral matching and recovering the true smooth motion field by convolution with progressive filtering: feature matching is converted into a denoising model amenable to convolution filtering, the distribution difference between correct matches and mismatches is analyzed to construct a convolution kernel, the true smooth motion field is gradually recovered during the convolution operations, and mismatches are eliminated through the consistency of the initially matched motion vectors to obtain the inlier set;
and step 5, transforming the image with an affine transformation model using the correct feature matching points in the inlier set obtained in the previous step, and outputting the image registration result.
2. The multi-modal image registration method based on progressive filtering according to claim 1, wherein: the specific implementation manner of the step 1 is as follows;
the number of scales, the number of orientations, and the neighborhood size of the log-Gabor filter are initialized, and the phase consistency map of the multi-modal image pair is generated by the two-dimensional 2D-LGF; the calculation formula is:

PC(x,y) = \frac{\sum_s \sum_o W_o(x,y) \lfloor A_{so}(x,y)\, \Delta\Phi_{so}(x,y) - t \rfloor}{\sum_s \sum_o A_{so}(x,y) + \delta}   (1)

In formula (1), PC(x, y) is the PC value at point (x, y), W_o(x, y) is a weight function, A_{so}(x, y) is the amplitude component of the log-Gabor filter at point (x, y), t is the noise threshold, δ is a small constant preventing the denominator from being zero, the symbol ⌊·⌋ equals the enclosed quantity when its value is positive and zero otherwise, ΔΦ_{so}(x, y) is the phase deviation function, and s and o are the scale and orientation of the log-Gabor filter.
3. The multi-modal image registration method based on progressive filtering according to claim 1, wherein: the specific implementation manner of the step 2 is as follows;
the maximum octave, the number of scale sub-levels, and the control factor of the AOS are initialized, and nonlinear diffusion filtering is described by a nonlinear partial differential equation:

\frac{\partial L}{\partial t} = \mathrm{div}\big(c(x,y,t) \cdot \nabla L\big)   (2)

In formula (2), div and ∇ are the divergence and gradient operators, L is the brightness of the image, c(x, y, t) = g(|∇L_σ(x, y, t)|) is the conductance function, ∇L_σ is the gradient of the Gaussian-smoothed image L_σ, and k is a contrast factor controlling the diffusion level; since the nonlinear partial differential equation has no analytic solution, it is solved iteratively by numerical analysis:

L^{i+1} = \Big(I - \tau \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (3)

In formula (3), A_l is a matrix representing the image derivatives (diffusivities) in direction l, m is the number of directions l, τ = t_{i+1} − t_i is the time step, I denotes the identity matrix, and L^i represents the solution of the i-th iteration of the nonlinear diffusion filtering;
when the nonlinear scale space is constructed, the scale levels increase logarithmically and are divided into O octaves, each with S sub-levels, every sub-level using the same resolution as the original image; the scale hierarchy is:

\sigma_i(o, s) = \sigma_0\, 2^{o + s/S},\quad o \in [0, \dots, O-1],\ s \in [0, \dots, S-1],\ i \in [0, \dots, N]   (4)

In formula (4), o and s are the octave and sub-level indices respectively, σ_0 is the initial value of the scale parameter, and N = O × S is the total number of images contained in the entire nonlinear scale space; the nonlinear diffusion filter model works in time units, so the pixel-unit scale parameters σ_i must be converted to time units through the correspondence t_i = σ_i²/2;

first, Gaussian filtering is applied to the input image; then the gradient histogram of the image is computed to obtain the contrast factor k; with τ = t_{i+1} − t_i, all images of the nonlinear scale space can be obtained from formula (3):

L^{i+1} = \Big(I - (t_{i+1} - t_i) \sum_{l=1}^{m} A_l(L^i)\Big)^{-1} L^i   (5)
after the nonlinear scale space is constructed, feature point detection is realized by finding local maxima of the scale-normalized Hessian response at different scales; the Hessian response is computed as:

L_H = \sigma^2 (L_{xx} L_{yy} - L_{xy}^2)   (6)

In formula (6), σ is the integer value of the scale parameter σ_i, L_H is the Hessian response of the image, and L_{xx}, L_{yy} and L_{xy} are the second-order horizontal, vertical and cross derivatives of the image, respectively;

when searching for extreme points, each pixel is compared with all of its neighbors, and it is an extreme point when it is larger than all neighbors in its image domain and scale domain.
4. The multi-modal image registration method based on progressive filtering according to claim 3, wherein: the comparison range of the extreme points covers windows of size σ_i × σ_i on the current, previous and next scales; to accelerate the search, the window size is fixed at 3 × 3, so the search space is a cube with a side length of 3 pixels, and the central detection point is compared with its 8 neighbors at the same scale and the 9 × 2 points at the adjacent upper and lower scales, ensuring that extreme points are detected in both scale space and two-dimensional image space.
5. The multi-modal image registration method based on progressive filtering according to claim 1, wherein: the specific implementation manner of the step 3 is as follows;
first the maximum average index map is computed: the image is convolved with the 2D-LGF filters, and the amplitude A_{so}(x, y) at scale s and orientation o is obtained; the amplitudes of all scales are added and divided by the number of scales N_s:

\bar{A}_o(x,y) = \frac{1}{N_s} \sum_{s=1}^{N_s} A_{so}(x,y)   (7)

the odd-symmetric filter of the Log-Gabor function is used to construct the phase consistency orientation feature: convolving the odd-symmetric filter with the original image yields o convolution results, one per orientation, which are projected onto the x and y axes, and the phase consistency direction is calculated as:

\Phi_{so} = \arctan\left( \frac{\sum_\theta PC_{so}(\theta) \sin\theta}{\sum_\theta PC_{so}(\theta) \cos\theta + \delta} \right)   (8)

In formula (8), Φ_{so} is the phase consistency orientation feature, PC_{so}(θ) is the odd-symmetric filter convolution result in direction θ, θ is the angle of orientation o, and δ is a small constant preventing the denominator from being zero;

for multi-modal images, the gradient inversion phenomenon occurs when the phase consistency direction exceeds 180 degrees; considering this problem, Φ_{so} is transformed to limit the phase consistency direction to between 0 and 180 degrees:

\Phi_{so} \leftarrow \begin{cases} \Phi_{so}, & 0^\circ \le \Phi_{so} < 180^\circ \\ \Phi_{so} - 180^\circ, & \Phi_{so} \ge 180^\circ \end{cases}   (9)

finally the local feature descriptor is constructed: for each feature point, a local region of P × P pixels centered on the point is selected and divided into n × n sub-regions, a block distribution histogram is established by counting the index values of the maximum average index map in each sub-region, and the feature descriptor is obtained by concatenating the distribution histograms of all blocks and normalizing to unit length.
6. The multi-modal image registration method based on progressive filtering according to claim 1, wherein: the specific implementation manner in the step 4 is as follows;
the grid size, the kernel size and the iteration threshold of the motion vectors are initialized, and the feature point pairs with the smallest SAD distance are regarded as potential matches by a bilateral matching method; the matching set is partitioned into n_c × n_c non-overlapping cells, and the number of motion vectors in the (j, k)-th cell is denoted A_{j,k}; the average motion vector \bar{m}_{j,k} in the (j, k)-th cell is:

\bar{m}_{j,k} = \frac{1}{A_{j,k}} \sum_{m_i \in \mathrm{cell}(j,k)} m_i   (10)

to link the initial matches in surrounding cells and improve robustness and filtering performance, the representative motion vector of the current cell is computed by jointly considering the surrounding local cells according to the convolution theory:

\tilde{M} = \frac{(w \odot \bar{M}) * \kappa}{w * \kappa + \delta}   (11)

In formula (11), \tilde{M} is the n_c × n_c × 2 matrix of representative motion vectors generated by the Gaussian kernel convolution, \bar{M} collects the cell averages of formula (10), w is the count matrix with |w_{j,k}| = A_{j,k}, δ is a small positive number preventing the denominator from being 0, and κ is a 3 × 3 Gaussian kernel distance matrix defined by:

\kappa_{i,j} = \exp(-n_{i,j}^2 / 2)   (12)

In formula (12), c_{i,j} = (i, j)^T, n_{i,j} = \lVert c_{i,j} - \lceil n_k/2 \rceil (1,1)^T \rVert_2 is the L2 distance from the kernel center, ⌈·⌉ rounds an element to the nearest integer not less than itself, and n_k denotes the Gaussian kernel size;

after the kernel convolution operation, the representative motion vector of each cell is obtained, and the deviation between each initially matched motion vector m_i and the representative vector \tilde{m}_i of its cell is defined as:

d_i = \frac{\lVert m_i - \tilde{m}_i \rVert_2}{\lVert \tilde{m}_i \rVert_2 + \delta}   (13)

at this point, the inlier set I^* to be solved can be determined by comparing the deviation with a given threshold λ:

I^* = \{(I_i, I'_i) : d_i \le \lambda,\ i = 1, \dots, N\}   (14)

where I_i = (x_i, y_i)^T and I'_i = (x'_i, y'_i)^T are the coordinates of the i-th match in the reference image I and the image to be matched I', N represents the number of matching points, and the obtained inlier set I^* is the feature point set after the mismatches have been deleted.
7. The multi-modal image registration method based on progressive filtering according to claim 6, wherein: an iterative strategy is adopted to further remove abnormal values; a larger threshold is set at the beginning of the iterations and is gradually reduced as the iterations proceed, eliminating outliers from coarse to fine; the representative motion vectors are iteratively refined, the threshold λ is optimized, and the inlier set is approximated with the result of each iteration until convergence.
8. The multi-modal image registration method based on progressive filtering according to claim 1, wherein: step 5 further comprises evaluating the image registration accuracy with the root mean square error RMSE, the mean error MEAN and the standard deviation Std, defined as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} r_i^2}   (18)

\mathrm{MEAN} = \frac{1}{N} \sum_{i=1}^{N} r_i   (19)

\mathrm{Std} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (r_i - \mathrm{MEAN})^2}   (20)

In formulas (18) to (20), N represents the number of correct matches after mismatch rejection, and r_i = \lVert (x_i, y_i)^T - T(x'_i, y'_i)^T \rVert_2 is the distance between the coordinates of the i-th reference image feature point and the transformed coordinates of the corresponding feature point of the image to be registered.
CN202210021872.0A 2022-01-10 2022-01-10 Multi-modal image registration method based on progressive filtering Pending CN114463397A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021872.0A CN114463397A (en) 2022-01-10 2022-01-10 Multi-modal image registration method based on progressive filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210021872.0A CN114463397A (en) 2022-01-10 2022-01-10 Multi-modal image registration method based on progressive filtering

Publications (1)

Publication Number Publication Date
CN114463397A true CN114463397A (en) 2022-05-10

Family

ID=81409617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021872.0A Pending CN114463397A (en) 2022-01-10 2022-01-10 Multi-modal image registration method based on progressive filtering

Country Status (1)

Country Link
CN (1) CN114463397A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385502A (en) * 2023-03-09 2023-07-04 武汉大学 Image registration method based on region search under geometric constraint
CN116385502B (en) * 2023-03-09 2024-04-19 武汉大学 Image registration method based on region search under geometric constraint
CN117237680A (en) * 2023-08-18 2023-12-15 暨南大学 Heterogeneous model fitting-based multi-source image matching method and system
CN117237680B (en) * 2023-08-18 2024-03-12 暨南大学 Heterogeneous model fitting-based multi-source image matching method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination