CN113486928B - Multi-view image alignment method based on rational polynomial model differentiable tensor expression

Info

Publication number: CN113486928B
Application number: CN202110666281.4A
Authority: CN (China)
Prior art keywords: tensor, multiplied, image, coordinate, view
Legal status: Active (granted)
Other versions: CN113486928A (Chinese-language publication)
Inventors: 季顺平 (Ji Shunping), 高建 (Gao Jian), 刘瑾 (Liu Jin)
Current Assignee: Wuhan University WHU
Original Assignee: Wuhan University WHU
Application filed by Wuhan University WHU; priority to CN202110666281.4A (priority date 2021-06-16)

Classifications

    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06T7/10: Segmentation; Edge detection
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T2207/20021: Dividing image into blocks, subimages or windows


Abstract

The invention discloses a multi-view image alignment method based on a differentiable tensor expression of the rational polynomial model. Starting directly from the multi-view images and their imaging geometry, the method makes end-to-end dense matching of multi-view satellite images possible and requires no complex preprocessing steps such as epipolar resampling of satellite images. It is highly portable: it can be embedded as an independent module into existing deep learning multi-view dense matching networks without changing other structures or training strategies. Its theory is also stricter: compared with the idea of locally fitting the rational polynomial model with a perspective imaging model, the method has stricter theoretical support, and experiments show that it achieves higher accuracy.

Description

Multi-view image alignment method based on rational polynomial model differentiable tensor expression
Technical Field
The invention relates to a multi-view image alignment method based on a differentiable tensor expression of the rational polynomial model, which realizes fast coordinate mapping and feature alignment among multi-view images and is applied to the field of three-dimensional surface reconstruction from multi-view optical satellite images.
Background
Processing multi-view optical satellite stereo images is currently the most prominent way to produce large-scale digital surface models. The basic principle is to search for corresponding points between stereo image pairs by dense image matching and then extract elevation information according to the imaging geometric model.
In recent years, many deep learning based multi-view dense matching methods have been proposed and successfully applied to multi-view aerial images, achieving accuracy and efficiency superior to traditional dense matching algorithms in surface model reconstruction tasks. Deep learning methods with such potential on multi-view aerial imagery are very likely to bring a similar revolution to satellite image dense matching. However, the current mainstream deep learning multi-view dense matching methods, such as MVSNet, RED-Net and UCS-Net, are designed for the pinhole camera imaging model. These methods use differentiable homography warping to align multi-view image features: given a set of fronto-parallel planes of the reference camera at different depths, the image features of all views are transformed to the reference view through the differentiable homography, in preparation for subsequent multi-view feature fusion, cost volume construction, cost volume regularization and depth map regression.
The above alignment of multi-view image features based on differentiable homography relies on the perspective camera model (i.e. the pinhole camera model), which applies to close-range and aerial images but not to multi-view satellite images. Satellite images are formed by linear-array push-broom imaging, for which the Rational Polynomial Camera (RPC) model is widely adopted. This difference in imaging models prevents satellite image dense matching from benefiting from the most advanced deep learning algorithms. A large gap therefore exists between the current mainstream deep learning multi-view dense matching technology and the multi-view satellite image dense matching task.
Some researchers propose to fit the rational polynomial model with a perspective imaging model within a local range of the satellite image, so that the imaging model of the satellite image changes from one rational polynomial model into several perspective imaging models. With this idea, the mainstream multi-view dense matching frameworks gain the ability to process satellite images. However, such a fitting strategy clearly destroys the strict geometric correspondence of the imaging model and inevitably introduces fitting errors, while also requiring additional image preprocessing. Moreover, three-dimensional surface reconstruction from optical satellite images is at present still mainly realized by traditional geometric methods, and deep learning multi-view dense matching technology has not been effectively applied.
Therefore, in order to apply the current advanced deep learning multi-view dense matching technology to multi-view satellite images, cross the gap between the prior art and the multi-view satellite image dense matching task, and realize end-to-end large-scale intelligent reconstruction of the global surface, a multi-view image alignment method expressed in tensor operations and based on a differentiable tensor expression of the rational polynomial model needs to be studied, extending the deep learning methods that perform excellently on close-range and aerial data to satellite images.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-view image alignment method based on a differentiable tensor expression of the rational polynomial model; on this basis, a differentiable mapping module based on the rational polynomial model is constructed and applied to multi-view satellite image dense matching. The module takes accurate rational polynomial model parameters and satellite image features as input and, using the mathematical description of a quaternary cubic form, expresses the geometric transformation between the multi-view satellite images as a series of differentiable tensor operations, thereby realizing alignment between multi-view satellite image features.
The technical scheme adopted to achieve the aim of the invention is a multi-view image alignment method based on the differentiable tensor expression of the rational polynomial model, comprising the following steps:
step 1, partitioning the multi-view satellite images into corresponding blocks: taking the rational polynomial model as the geometric constraint, find all corresponding target image block areas and partition them, each reference image block and its corresponding target image blocks forming a group of multi-view units;
step 2, mapping the coordinates of all pixels in the reference image obtained in step 1 to object space through the differentiable mapping based on the inverse solution form of the rational polynomial model, obtaining the corresponding object-space coordinates;
step 3, mapping the object-space coordinates obtained in step 2 to the target images through the differentiable mapping based on the forward solution form of the rational polynomial model, obtaining the corresponding target image coordinates;
step 4, aligning the features of the multi-view satellite images: form a set of coordinate mappings from the coordinates of all pixels in the reference image obtained in step 1 and the target image coordinates from step 3, warp the multi-view satellite image features with this coordinate mapping, and transform the target image features to the reference image, realizing feature alignment of the multi-view satellite images.
Further, the specific implementation of step 1 includes the following sub-steps:
step 1.1, selecting the corresponding block areas of the multi-view satellite images, adopting either a blocking method based on object space or a blocking method based on the reference image space;
step 1.2, multi-view image segmentation: compute the minimum bounding rectangle of each block area on the images of all views, take the image area within this rectangle as the final image block, and form a group of multi-view units from the reference image block and all source image blocks.
Further, in the blocking method based on object space, the object space is evenly divided within the designated object-space range according to the specified overlap and block size, and the divided object-space block areas are then projected onto the image plane of each view through the rational polynomial model, giving the corresponding block areas of the multi-view satellite images; in the blocking method based on the reference image space, the designated object-space range is projected onto the reference image plane through the rational polynomial model, the resulting image-plane area is evenly divided according to the specified overlap and block size, and the divided reference image block areas are then projected onto the other views through the rational polynomial model, giving the corresponding image block areas on the source images; a concrete sketch follows.
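As an illustration of the object-space blocking of step 1.1 and the minimum bounding rectangle of step 1.2, the following Python sketch tiles a specified object-space range and projects each block into a view. The helper rpc_forward (projecting (lat, lon, hei) to image coordinates (line, samp) for one view's RPC) and all parameter names are assumptions introduced here for illustration, not names from the patent.

```python
def object_space_blocks(lat_range, lon_range, block_size, overlap):
    """Evenly tile the specified object-space range with the given overlap
    (overlap assumed in [0, 1))."""
    stride = block_size * (1.0 - overlap)
    lat0, lat1 = lat_range
    lon0, lon1 = lon_range
    blocks, lat = [], lat0
    while lat < lat1:
        lon = lon0
        while lon < lon1:
            blocks.append((lat, lon, min(lat + block_size, lat1),
                           min(lon + block_size, lon1)))
            lon += stride
        lat += stride
    return blocks

def image_block(rpc, block, hei_min, hei_max):
    """Minimum bounding rectangle of one object block on one view's image plane."""
    lat0, lon0, lat1, lon1 = block
    pts = [rpc_forward(rpc, la, lo, h)   # hypothetical RPC projection helper
           for la in (lat0, lat1) for lo in (lon0, lon1)
           for h in (hei_min, hei_max)]
    lines, samps = zip(*pts)
    return min(lines), min(samps), max(lines), max(samps)
```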
Further, the specific implementation of step 2 includes the following sub-steps:
step 2.1, constructing the hypothetical elevation planes: determine the elevation search range of the designated area and divide it into D parts to obtain D hypothetical elevation planes;
step 2.2, constructing the coordinate tensor: assume the height and width of the image block are H and W, respectively. First construct an all-ones tensor X_1 of dimension H × W × D. Obtain the normalization parameters from the rational polynomial model and normalize the row coordinates of all pixels in the reference image block obtained in step 1; form the normalized row coordinates into a tensor of dimension H × W, add one dimension, and copy it D times along that dimension to obtain a tensor X_line of dimension H × W × D. Apply the same process to the column coordinates of all pixels in the reference image block obtained in step 1 to obtain a tensor X_samp of dimension H × W × D. Combine the elevation values of all hypothetical elevation planes obtained in step 2.1 into a one-dimensional tensor of length D, normalize it, add two dimensions, and copy it H and W times along the two new dimensions to obtain a tensor X_hei of dimension H × W × D. Stack X_1, X_line, X_samp and X_hei to form the reference image coordinate tensor X of dimension H × W × D × 4;
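A minimal PyTorch sketch of steps 2.1-2.2 follows; the dictionary keys for the RPC normalization offsets and scales are assumptions, not names fixed by the patent.

```python
import torch

def build_ref_coord_tensor(rows, cols, heights, rpc_norm):
    """Stack [1, line_n, samp_n, hei_n] into an H x W x D x 4 coordinate tensor X.

    rows, cols : 1-D tensors of length H and W with pixel row/column coordinates.
    heights    : 1-D tensor of length D with the hypothetical elevations (step 2.1).
    rpc_norm   : dict of RPC normalization offsets/scales (assumed key names).
    """
    H, W, D = rows.numel(), cols.numel(), heights.numel()
    line_n = (rows - rpc_norm['line_off']) / rpc_norm['line_scale']    # (H,)
    samp_n = (cols - rpc_norm['samp_off']) / rpc_norm['samp_scale']    # (W,)
    hei_n  = (heights - rpc_norm['hei_off']) / rpc_norm['hei_scale']   # (D,)

    X1    = torch.ones(H, W, D)                        # all-ones channel
    Xline = line_n.view(H, 1, 1).expand(H, W, D)       # copy along W and D
    Xsamp = samp_n.view(1, W, 1).expand(H, W, D)       # copy along H and D
    Xhei  = hei_n.view(1, 1, D).expand(H, W, D)        # copy along H and W
    return torch.stack([X1, Xline, Xsamp, Xhei], dim=-1)   # (H, W, D, 4)

# Example: D = 64 hypothetical elevation planes over an assumed 0-500 m range.
heights = torch.linspace(0.0, 500.0, steps=64)
```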
step 2.3, constructing the coefficient tensors: construct the numerator coefficients and the denominator coefficients of the two equations in the inverse solution form of the rational polynomial model into coefficient tensors of dimension 4 × 4 × 4;
the known inverse solution form of the rational polynomial model is:

$$\mathrm{lon}_n=\frac{P_1^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)}{P_2^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)},\qquad \mathrm{lat}_n=\frac{P_3^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)}{P_4^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)}\tag{1}$$
where lon and lat denote the longitude and latitude coordinates, P denotes a ternary cubic polynomial, line, samp and hei denote the row coordinate, column coordinate and elevation variables, the subscript n denotes normalization, and the superscript inv denotes the inverse solution form; before describing the construction process, the following conventions are made: let T be a coefficient tensor of dimension 4 × 4 × 4, where T(i, j, k) denotes the element at position (i, j, k) in the tensor T and i, j, k are all taken from the integer set {0, 1, 2, 3}; form the variables in the numerator into a variable sequence V = [1, line_n, samp_n, hei_n], denoting the i-th element of the sequence by V_i, where the constant 1 is treated as a variable in its own right and line_n, samp_n and hei_n denote the row coordinate, column coordinate and elevation variables in the numerator; denote the coefficients of the polynomial P by a_ijk, where a_ijk is the coefficient of the term V_i V_j V_k in the polynomial P;
the coefficient tensor corresponding to each polynomial P in formula (1) is constructed in the same manner, as follows: when i, j, k are all equal, T(i, j, k) = a_ijk; when exactly two of i, j, k are equal, T(i, j, k) = a_ijk / 3; when i, j, k are all distinct, T(i, j, k) = a_ijk / 6;
step 2.4, computing the corresponding object-space coordinates: first compute the value of each polynomial P in formula (1), that is, combine the numerator coefficient tensor T obtained in step 2.3 with the coordinate tensor X as follows:

$$F(X)^{(h)(w)(d)}=T_{ijk}\,X_i^{(h)(w)(d)}\,X_j^{(h)(w)(d)}\,X_k^{(h)(w)(d)}\tag{2}$$

the tensor F(X) of dimension H × W × D formed by the values of the polynomial P is thereby obtained in a single batched computation; the operation in step 2.4 follows the Einstein summation convention, with the stipulation that a parenthesized superscript marks a dimension that is not summed: in the formula, the components of the first three dimensions of X, namely the H, W and D dimensions, are kept without summation, while the last dimension of X is multiplied against the three indices of the coefficient tensor T and summed; each polynomial adopts the same computation; then divide the tensor computed from each numerator polynomial element-wise by the tensor computed from the corresponding denominator polynomial and remove the normalization, obtaining longitude and latitude coordinate tensors of dimension H × W × D.
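Formula (2) maps directly onto a tensor-library einsum. A minimal PyTorch sketch of step 2.4 follows; the helper name and the normalization key names are assumptions.

```python
import torch

def eval_rpc_polynomial(T, X):
    """Batched evaluation of one ternary cubic polynomial, formula (2).
    The h, w, d axes of X are kept (not summed); the last axis of X is
    contracted three times against the i, j, k axes of T."""
    return torch.einsum('ijk,hwdi,hwdj,hwdk->hwd', T, X, X, X)   # (H, W, D)

def rpc_inverse_map(T_lon_num, T_lon_den, T_lat_num, T_lat_den, X, rpc_norm):
    """Step 2.4: element-wise numerator/denominator division, then de-normalize."""
    lon_n = eval_rpc_polynomial(T_lon_num, X) / eval_rpc_polynomial(T_lon_den, X)
    lat_n = eval_rpc_polynomial(T_lat_num, X) / eval_rpc_polynomial(T_lat_den, X)
    lon = lon_n * rpc_norm['lon_scale'] + rpc_norm['lon_off']
    lat = lat_n * rpc_norm['lat_scale'] + rpc_norm['lat_off']
    return lon, lat    # each (H, W, D), fully differentiable
```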
Further, the elevation search range in step 2.1 is calculated from the elevation normalization coefficients in the rational polynomial model parameters, or obtained by suitably extending a sparse reconstruction result or a public DEM in the height direction.
Further, the specific implementation of step 3 includes the following sub-steps:
step 3.1, constructing the coordinate tensor: first construct an all-ones tensor X_1 of dimension H × W × D; normalize each element of the longitude coordinate tensor obtained in step 2.4 to obtain a normalized longitude coordinate tensor X_lon of dimension H × W × D; apply the same process to the latitude coordinate tensor obtained in step 2.4 to obtain a normalized latitude coordinate tensor X_lat of dimension H × W × D; combine the elevation values of all hypothetical elevation planes obtained in step 2.1 into a one-dimensional tensor of length D, normalize it, add two dimensions, and copy it H and W times along the two new dimensions to obtain a tensor X_hei of dimension H × W × D; stack X_1, X_lon, X_lat and X_hei to form a new coordinate tensor X of dimension H × W × D × 4;
step 3.2, constructing the coefficient tensors: construct the numerator coefficients and the denominator coefficients of the two equations in the forward solution form of the rational polynomial model into coefficient tensors of dimension 4 × 4 × 4;
the known forward solution form of the rational polynomial model is:

$$\mathrm{line}_n=\frac{P_1^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)}{P_2^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)},\qquad \mathrm{samp}_n=\frac{P_3^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)}{P_4^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)}\tag{3}$$
where the superscript fwd denotes the forward solution form; unlike the definition in step 2.3, the variable sequence here becomes V = [1, lon_n, lat_n, hei_n], the other conventions remain unchanged, and the coefficient tensors are constructed exactly as in step 2.3;
step 3.3, computing the corresponding target image coordinates: proceed exactly as in step 2.4, finally obtaining the row coordinate tensor and the column coordinate tensor of the target image, each of dimension H × W × D.
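Chaining the inverse mapping of step 2 and the forward mapping of step 3 gives the complete reference-to-target coordinate mapping. The sketch below reuses the hypothetical helpers from the earlier sketches; the ref_rpc / tgt_rpc attribute names are likewise assumptions.

```python
import torch

def rpc_coordinate_mapping(ref_rpc, tgt_rpc, rows, cols, heights):
    # Step 2: reference pixels -> object space (inverse form, reference RPC).
    X = build_ref_coord_tensor(rows, cols, heights, ref_rpc.norm)    # (H, W, D, 4)
    lon, lat = rpc_inverse_map(ref_rpc.T_lon_num, ref_rpc.T_lon_den,
                               ref_rpc.T_lat_num, ref_rpc.T_lat_den,
                               X, ref_rpc.norm)
    # Step 3.1: re-normalize with the *target* image's RPC parameters and restack.
    lon_n = (lon - tgt_rpc.norm['lon_off']) / tgt_rpc.norm['lon_scale']
    lat_n = (lat - tgt_rpc.norm['lat_off']) / tgt_rpc.norm['lat_scale']
    hei_n = (heights - tgt_rpc.norm['hei_off']) / tgt_rpc.norm['hei_scale']
    hei_n = hei_n.view(1, 1, -1).expand_as(lon_n)
    X_fwd = torch.stack([torch.ones_like(lon_n), lon_n, lat_n, hei_n], dim=-1)
    # Steps 3.2-3.3: the same einsum evaluation with the forward-form coefficients.
    line_n = eval_rpc_polynomial(tgt_rpc.T_line_num, X_fwd) / \
             eval_rpc_polynomial(tgt_rpc.T_line_den, X_fwd)
    samp_n = eval_rpc_polynomial(tgt_rpc.T_samp_num, X_fwd) / \
             eval_rpc_polynomial(tgt_rpc.T_samp_den, X_fwd)
    line = line_n * tgt_rpc.norm['line_scale'] + tgt_rpc.norm['line_off']
    samp = samp_n * tgt_rpc.norm['samp_scale'] + tgt_rpc.norm['samp_off']
    return line, samp    # target row/column coordinate tensors, each (H, W, D)
```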
Further, step 4 is implemented as follows: form a set of coordinate mappings from the row and column coordinate tensors of the reference image obtained in step 2.2 and the row and column coordinate tensors of the target image computed in step 3.3; according to this coordinate mapping, transform the image or image features of the target image to the reference image through differentiable bilinear interpolation, realizing alignment of the multi-view satellite images or image features.
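The differentiable warp of step 4 matches what torch.nn.functional.grid_sample provides. A sketch, under the assumption that the coordinates from step 3.3 are in pixel units of a target feature map of size H_t × W_t:

```python
import torch
import torch.nn.functional as F

def warp_target_features(tgt_feat, line, samp):
    """tgt_feat: (C, H_t, W_t) target-image feature map.
    line, samp: (H, W, D) target row/column coordinates from step 3.3.
    Returns features aligned to the reference view, shape (C, D, H, W)."""
    C, H_t, W_t = tgt_feat.shape
    H, W, D = line.shape
    # grid_sample expects sampling locations normalized to [-1, 1].
    gx = 2.0 * samp / (W_t - 1) - 1.0          # column -> x
    gy = 2.0 * line / (H_t - 1) - 1.0          # row    -> y
    grid = torch.stack([gx, gy], dim=-1)       # (H, W, D, 2)
    grid = grid.permute(2, 0, 1, 3)            # one sampling grid per elevation plane
    feat = tgt_feat.unsqueeze(0).expand(D, C, H_t, W_t)
    warped = F.grid_sample(feat, grid, mode='bilinear', align_corners=True)
    return warped.permute(1, 0, 2, 3)          # (C, D, H, W)
```

Because bilinear sampling is differentiable, gradients flow from the cost volume back through the warp into the feature extraction network.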
The invention has the following advantages:
the method is used for feature alignment between multi-view satellite images and is a core key technology for establishing a multi-view satellite image dense matching framework.
Directly starting from the multi-view image and imaging geometric relation, the end-to-end multi-view satellite image dense matching becomes possible, and complex preprocessing steps such as satellite image epipolar resampling and the like are not needed.
The portability is strong, and the method can be embedded into the existing deep learning multi-view dense matching network as an independent module without changing other structures and training strategies.
The theory is tighter, and the method has tighter theoretical support compared with the idea of locally fitting a rational polynomial model into a perspective imaging model, and proves that the technology can achieve higher precision in experiments, see the attached figure 2.
Drawings
Fig. 1 is an overall schematic view of the present invention.
FIG. 2 is an error distribution plot of a rational polynomial camera model fitted to a perspective projection model.
FIG. 3 is a schematic diagram of a deep learning-based multi-view dense matching network model framework according to an embodiment of the present invention.
FIG. 4 shows the Digital Surface Model (DSM) visualization results of three methods in an embodiment of the invention.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the accompanying drawings.
The invention provides a multi-view image alignment method based on the differentiable tensor expression of the rational polynomial model, which comprises the following steps:
Step 1, partition the multi-view satellite images into corresponding blocks. Partition the satellite images with the rational polynomial model as the geometric constraint, finding all source image feature block areas corresponding to each reference image feature block area to form a group of multi-view units. If enough GPU memory is available to process the satellite image features at their original size directly, this step can be omitted.
Step 2, map the coordinates of all pixels in the reference image obtained in step 1 to object space through the differentiable mapping based on the inverse solution form of the rational polynomial model, obtaining the corresponding object-space coordinates.
Step 3, map the object-space coordinates obtained in step 2 to the source images through the differentiable mapping based on the forward solution form of the rational polynomial model, obtaining the corresponding source image coordinates.
Step 4, align the features of the multi-view satellite images. Form a set of coordinate mappings from the coordinates of all pixels in the reference image obtained in step 1 and the source image coordinates from step 3, warp the multi-view satellite image features with this coordinate mapping, and transform the features on the source images to the reference image, realizing feature alignment of the multi-view satellite images.
Further, the specific implementation of step 1 includes the following sub-steps:
Step 1.1, selecting the corresponding block areas of the multi-view satellite images. Either a blocking method based on object space or a blocking method based on the reference image space may be adopted. In the blocking method based on object space, the object space is evenly divided within the designated object-space range according to the specified overlap and block size, and the divided object-space block areas are then projected onto the image plane of each view through the rational polynomial model, giving the corresponding block areas of the multi-view satellite images. In the blocking method based on the reference image space, the designated object-space range is projected onto the reference image plane through the rational polynomial model, the resulting image-plane area is evenly divided according to the specified overlap and block size, and the divided reference image block areas are then projected onto the other views through the rational polynomial model, giving the corresponding image block areas on the source images.
Step 1.2, multi-view image segmentation. Compute the minimum bounding rectangle of each block area on the images of all views, take the image area within this rectangle as the final image block, and form a group of multi-view units from the reference image block and all source image blocks.
Further, the specific implementation of step 2 includes the following sub-steps:
step 2.1, constructing a hypothetical elevation plane: firstly, determining an elevation search range in a designated area. The range can be calculated by an elevation normalization coefficient in rational polynomial model parameters, or obtained by properly extending a sparse reconstruction result or a public DEM in the height direction. The high-range search space range is divided into D parts to obtain D imaginary high-range planes.
Step 2.2, constructing the coordinate tensor: assume the height and width of the image block are H and W, respectively. First construct an all-ones tensor X_1 of dimension H × W × D. Obtain the normalization parameters from the rational polynomial model and normalize the row coordinates of all pixels in the reference image block obtained in step 1; form the normalized row coordinates into a tensor of dimension H × W, add one dimension, and copy it D times along that dimension to obtain a tensor X_line of dimension H × W × D. Apply the same process to the column coordinates of all pixels in the reference image block obtained in step 1 to obtain a tensor X_samp of dimension H × W × D. Combine the elevation values of all hypothetical elevation planes obtained in step 2.1 into a one-dimensional tensor of length D, normalize it, add two dimensions, and copy it H and W times along the two new dimensions to obtain a tensor X_hei of dimension H × W × D. Stack X_1, X_line, X_samp and X_hei to form the reference image coordinate tensor X of dimension H × W × D × 4.
Step 2.3, constructing the coefficient tensors: construct the numerator coefficients and the denominator coefficients of the two equations in the inverse solution form of the rational polynomial model into coefficient tensors of dimension 4 × 4 × 4.
The known inverse solution form of the rational polynomial model is:

$$\mathrm{lon}_n=\frac{P_1^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)}{P_2^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)},\qquad \mathrm{lat}_n=\frac{P_3^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)}{P_4^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)}\tag{4}$$
where lon and lat denote the longitude and latitude coordinates, P denotes a ternary cubic polynomial, line, samp and hei denote the row coordinate, column coordinate and elevation variables, the subscript n denotes normalization, and the superscript inv denotes the inverse solution form. Before describing the construction process, the following conventions are made: let T be a coefficient tensor of dimension 4 × 4 × 4, where T(i, j, k) denotes the element at position (i, j, k) in the tensor T and i, j, k are all taken from the integer set {0, 1, 2, 3}; form the variables in the numerator into a variable sequence V = [1, line_n, samp_n, hei_n], denoting the i-th element of the sequence by V_i, where the constant 1 is treated as a variable in its own right and line_n, samp_n and hei_n denote the row coordinate, column coordinate and elevation variables in the numerator; denote the coefficients of the polynomial P by a_ijk, where a_ijk is the coefficient of the term V_i V_j V_k in the polynomial P.
The coefficient tensor corresponding to each polynomial P in formula (4) is constructed in the same manner, as follows: when i, j, k are all equal, T(i, j, k) = a_ijk; when exactly two of i, j, k are equal, T(i, j, k) = a_ijk / 3; when i, j, k are all distinct, T(i, j, k) = a_ijk / 6.
Step 2.4, computing the corresponding object-space coordinates: first compute the value of each polynomial P in formula (4), that is, combine the numerator coefficient tensor T obtained in step 2.3 with the coordinate tensor X as follows:

$$F(X)^{(h)(w)(d)}=T_{ijk}\,X_i^{(h)(w)(d)}\,X_j^{(h)(w)(d)}\,X_k^{(h)(w)(d)}\tag{5}$$

The tensor F(X) of dimension H × W × D formed by the values of the polynomial P can thus be obtained in a single batched computation. Note that the above operation follows the Einstein summation convention, with the stipulation that a parenthesized superscript marks a dimension that is not summed. In the formula, the components of the first three dimensions of X (i.e. the H, W and D dimensions) are kept without summation, while the last dimension of X is multiplied against the three indices of the coefficient tensor T and summed. Each polynomial uses the same computation as above. Then divide the tensor computed from each numerator polynomial element-wise by the tensor computed from the corresponding denominator polynomial and remove the normalization, obtaining longitude and latitude coordinate tensors of dimension H × W × D.
Further, the specific implementation of step 3 includes the following sub-steps:
Step 3.1, constructing the coordinate tensor: first construct an all-ones tensor X_1 of dimension H × W × D; normalize each element of the longitude coordinate tensor obtained in step 2.4 to obtain a normalized longitude coordinate tensor X_lon of dimension H × W × D; apply the same process to the latitude coordinate tensor obtained in step 2.4 to obtain a normalized latitude coordinate tensor X_lat of dimension H × W × D; combine the elevation values of all hypothetical elevation planes obtained in step 2.1 into a one-dimensional tensor of length D, normalize it, add two dimensions, and copy it H and W times along the two new dimensions to obtain a tensor X_hei of dimension H × W × D. Stack X_1, X_lon, X_lat and X_hei to form a new coordinate tensor X of dimension H × W × D × 4.
Step 3.2, constructing a coefficient tensor: the numerator coefficients and denominator coefficients of two equations in the positive solution form of the rational polynomial model are respectively constructed as 4-dimensional 4 × 4 × 4 coefficient tensors.
The known forward solution form of the rational polynomial model is:

$$\mathrm{line}_n=\frac{P_1^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)}{P_2^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)},\qquad \mathrm{samp}_n=\frac{P_3^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)}{P_4^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)}\tag{6}$$
where the superscript fwd denotes the forward solution form. In contrast to the definition in step 2.3, the variable sequence here becomes V = [1, lon_n, lat_n, hei_n]; the other conventions remain unchanged. The coefficient tensors are constructed as in step 2.3.
Step 3.3, computing the corresponding source image coordinates: the same as step 2.4 and not repeated here; finally the row coordinate tensor and the column coordinate tensor of the source image, each of dimension H × W × D, are obtained.
Further, step 4 is implemented as follows: form a set of coordinate mappings from the row and column coordinate tensors of the reference image obtained in step 2.2 and the row and column coordinate tensors of the source image computed in step 3.3. According to this coordinate mapping, the image features on the source images are transformed to the reference image through differentiable bilinear interpolation, realizing alignment of the multi-view satellite image features.
Embodiment:
Using the multi-view image alignment method based on the differentiable tensor expression of the rational polynomial model, end-to-end intelligent reconstruction of large-area surface models from multi-view satellite images is realized under a deep learning framework. Taking the three state-of-the-art network structures RED-Net, Cas-MVSNet and UCS-Net as base frameworks, the constructed differentiable mapping module based on the rational polynomial model is embedded into the three networks, replacing their differentiable homography modules, to build cost volumes across multi-view satellite image features. In this embodiment, we denote a multi-view dense matching method based on differentiable homography by "*(homo)" and one based on differentiable rational polynomial coordinate mapping by "*(rpc)".
The general structure of the three base network frameworks is shown in fig. 3. Each framework comprises a feature extraction module, a cost volume construction module, a cost volume regularization module, a loss computation module and a multi-scale prediction module. The feature extraction module consists of several weight-sharing 2D convolutional network branches. The cost volume construction module is implemented either with the differentiable homography technique (homo) or with the differentiable rational polynomial coordinate mapping technique (rpc) described above. The cost volume regularization module takes two forms: a recurrent encoder-decoder regularization module (RED-Net, consisting of 2D convolutional layers at 4 scales and convolutional gated recurrent units) or a 3D convolution regularization module (Cas-MVSNet and UCS-Net, consisting of 3D convolutional layers at 4 scales), which learns to regularize the cost volume in the depth and spatial directions. The loss computation module converts the regularized cost volume into a depth map and computes the loss value, whose back-propagation guides network training. The multi-scale prediction module comprises a three-scale pyramid structure: the low-resolution matching result of the previous scale constrains the matching at the next scale and adaptively determines the depth search interval used when the next scale builds its cost volume, realizing coarse-to-fine matching.
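How the cost volume construction module consumes the rational polynomial warp can be sketched as follows. The variance-based multi-view fusion is an assumption borrowed from MVSNet-style networks (the patent text does not fix the fusion operator), and the helper functions come from the earlier sketches.

```python
import torch

def build_cost_volume(ref_feat, src_feats, ref_rpc, src_rpcs, rows, cols, heights):
    """Warp every source view onto the reference view's elevation planes, then
    fuse the feature volumes with the per-view variance."""
    C, H, W = ref_feat.shape
    D = heights.numel()
    volumes = [ref_feat.unsqueeze(1).expand(C, D, H, W)]    # reference volume
    for feat, rpc in zip(src_feats, src_rpcs):
        line, samp = rpc_coordinate_mapping(ref_rpc, rpc, rows, cols, heights)
        volumes.append(warp_target_features(feat, line, samp))
    return torch.stack(volumes).var(dim=0)                  # (C, D, H, W) cost volume
```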
A multi-view dense matching network for multi-view satellite images is built according to this framework, and experiments and accuracy evaluation are then carried out on the open-source multi-view satellite image dense matching dataset TLC SatMVS. The TLC SatMVS imagery comes from the ZY3-02 satellite carrying a Three-Line Camera (TLC); the nadir view resolution is 2.1 m and the forward and backward view resolution is 2.5 m, suitable for large-scale terrain reconstruction. The dataset provides 173 groups of three-line-array satellite image data, of which 127 groups are training data and 46 groups are test data; each image is 5120 × 5120 pixels, and the overlap of the three views is around 95%. Limited by available memory, the training data are cropped into 5011 three-view sub-blocks of size 768 × 384.
For a network framework adopting the differentiable rational polynomial coordinate mapping (denoted "*(rpc)"), the satellite images and the corresponding rational polynomial coefficients are taken directly as input, and a height map corresponding to the reference view is output. For a network framework adopting the differentiable homography (denoted "*(homo)"), the satellite images based on the rational polynomial camera model are locally fitted to central projection images, the fitted block images and central projection camera parameters are taken as input, and a depth map corresponding to the nadir view is output.
The constructed deep learning dense matching networks are trained with the training set until the training loss no longer decreases and the models reach their optimum. All network models are implemented on the deep learning framework PyTorch and follow the same hardware, software and hyper-parameter settings: training on a single NVIDIA RTX 2080Ti GPU, training batch size 1, RMSprop optimizer, 35 iterations over all training data, and an initial learning rate of 0.001. A three-level pyramid structure is adopted to realize coarse-to-fine depth map (or height map) prediction. For the three-line-array images, the number of input views is fixed at N = 3; across the three levels, the numbers of hypothetical elevation planes are {64, 32, 8}, and except for the UCS-Net framework, which uses its own adaptive sampling-interval strategy, the sampling intervals of the other two frameworks are {(d_max - d_min)/64, 5 m, 2.5 m}, where d_max and d_min denote the maximum and minimum elevation (or depth) of the search range.
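The stated training configuration corresponds, in outline, to a standard PyTorch loop; SatMVSNet and the batch/loss interfaces below are placeholders for illustration, not names from the patent.

```python
import torch

model = SatMVSNet(num_views=3, depth_planes=(64, 32, 8)).cuda()  # placeholder class
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)    # initial lr 0.001
for epoch in range(35):                  # all training data iterated 35 times
    for batch in train_loader:           # batch size 1 on a single RTX 2080Ti
        optimizer.zero_grad()
        loss = model.compute_loss(batch) # placeholder loss interface
        loss.backward()
        optimizer.step()
```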
After model training is finished, the multi-view images of adjacent views in the test area and the corresponding rational polynomial coefficients (or fitted central projection camera parameters) are input into the network model to obtain the height map (or depth map) corresponding to the reference image. Outliers in the per-pixel height (or depth) values are filtered through a consistency-check post-processing step, the remaining reliable points are back-projected into three-dimensional object space using the imaging model parameters, and a Digital Surface Model (DSM) of the test area is generated by resampling.
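The final resampling step can be sketched as a simple gridded average; the mean-per-cell operator, the helper names and the north-up grid convention are assumptions (the patent does not fix the resampling operator).

```python
import numpy as np

def points_to_dsm(lats, lons, heis, cell, lat_max, lon_min, H, W):
    """Average the heights of all back-projected points falling into each cell."""
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    rows = ((lat_max - np.asarray(lats)) / cell).astype(int)   # north-up rows
    cols = ((np.asarray(lons) - lon_min) / cell).astype(int)
    for r, c, h in zip(rows, cols, heis):
        if 0 <= r < H and 0 <= c < W:
            acc[r, c] += h
            cnt[r, c] += 1
    dsm = np.full((H, W), np.nan)                              # empty cells stay NaN
    valid = cnt > 0
    dsm[valid] = acc[valid] / cnt[valid]
    return dsm
```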
Because no end-to-end deep learning multi-view dense matching method directly applicable to satellite images exists at present, the differentiable mapping module based on the rational polynomial model is embedded into several advanced multi-view dense matching deep learning frameworks and compared with network models based on the differentiable homography module in their base form (which fit the RPC model with a pinhole camera model when applied to satellite images). Meanwhile, the surface reconstruction results of the deep learning methods on multi-view satellite images are compared with the result of the traditional method Adapted-COLMAP.
Table 1 gives the quantitative evaluation results on the TLC SatMVS test set of the image alignment method based on differentiable homography, the image alignment method based on the rational polynomial differentiable tensor expression, and the traditional method Adapted-COLMAP. Four measures serve as indexes of DSM quality: the Mean Absolute Error (MAE), i.e. the mean L1 distance between the ground truth and the valid estimated height values over all grid cells of the DSM; the Root Mean Square Error (RMSE), i.e. the standard deviation between the ground-truth height and the estimated height; the percentages of grid cells with L1 error below 2.5 m and below 7.5 m (<2.5 m, <7.5 m); and the completeness of the DSM results (Comp.). The quantitative results show that, under the same network framework, the differentiable mapping module based on the strict rational polynomial model generally performs better than the homography module using the pinhole fitting model, and that all deep learning based multi-view dense matching methods show a clear advantage in performance over the traditional method Adapted-COLMAP.
TABLE 1 comparison of quantitative results of different multi-view dense matching methods (or network models) on TLC SatMVS test set
Fig. 4 shows, taking the RED-Net framework as an example, the DSM visualization results of the network models using the image alignment method based on differentiable homography and the image alignment method based on the rational polynomial differentiable tensor expression, together with Adapted-COLMAP. It can be seen that the DSM generated by the deep learning method with rational polynomial tensor-expression image alignment is more complete, with fewer hollow regions and clearer object edges.
The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined in the appended claims.

Claims (7)

1. A multi-view image alignment method based on rational polynomial model differentiable tensor expression is characterized in that: the method comprises the following steps:
step 1, partitioning the multi-view satellite images into corresponding blocks: taking the rational polynomial model as the geometric constraint, finding all corresponding target image block areas and partitioning them, each reference image block and its corresponding target image blocks forming a group of multi-view units;
step 2, mapping the coordinates of all pixels in the reference image obtained in step 1 to object space through the differentiable mapping based on the inverse solution form of the rational polynomial model, obtaining the corresponding object-space coordinates;
step 3, mapping the object-space coordinates obtained in step 2 to the target images through the differentiable mapping based on the forward solution form of the rational polynomial model, obtaining the corresponding target image coordinates;
step 4, aligning the features of the multi-view satellite images: forming a set of coordinate mappings from the coordinates of all pixels in the reference image obtained in step 1 and the target image coordinates from step 3, warping the multi-view satellite image features with this coordinate mapping, and transforming the target image features to the reference image to realize feature alignment of the multi-view satellite images.
2. The method for multi-view image alignment based on differentiable tensor expression of the rational polynomial model as claimed in claim 1, wherein: the specific implementation of step 1 comprises the following sub-steps:
step 1.1, selecting the corresponding block areas of the multi-view satellite images, adopting either a blocking method based on object space or a blocking method based on the reference image space;
step 1.2, multi-view image segmentation: computing the minimum bounding rectangle of each block area on the images of all views, taking the image area within this rectangle as the final image block, and forming a group of multi-view units from the reference image block and all source image blocks.
3. The method for multi-view image alignment based on differentiable tensor expression of the rational polynomial model as claimed in claim 2, wherein: in the blocking method based on object space, the object space is evenly divided within the designated object-space range according to the specified overlap and block size, and the divided object-space block areas are then projected onto the image plane of each view through the rational polynomial model, giving the corresponding block areas of the multi-view satellite images; in the blocking method based on the reference image space, the designated object-space range is projected onto the reference image plane through the rational polynomial model, the resulting image-plane area is evenly divided according to the specified overlap and block size, and the divided reference image block areas are then projected onto the other views through the rational polynomial model, giving the corresponding image block areas on the source images.
4. The method for multi-view image alignment based on differentiable tensor expression of the rational polynomial model as claimed in claim 1, wherein: the specific implementation of step 2 comprises the following sub-steps:
step 2.1, constructing the hypothetical elevation planes: determining the elevation search range of the designated area and dividing it into D parts to obtain D hypothetical elevation planes;
step 2.2, constructing the coordinate tensor: assuming the height and width of the image block are H and W, respectively; first construct an all-ones tensor X_1 of dimension H × W × D; obtain the normalization parameters from the rational polynomial model and normalize the row coordinates of all pixels in the reference image block obtained in step 1, form the normalized row coordinates into a tensor of dimension H × W, add one dimension, and copy it D times along that dimension to obtain a tensor X_line of dimension H × W × D; apply the same process to the column coordinates of all pixels in the reference image block obtained in step 1 to obtain a tensor X_samp of dimension H × W × D; combine the elevation values of all hypothetical elevation planes obtained in step 2.1 into a one-dimensional tensor of length D, normalize it, add two dimensions, and copy it H and W times along the two new dimensions to obtain a tensor X_hei of dimension H × W × D; stack X_1, X_line, X_samp and X_hei to form the reference image coordinate tensor X of dimension H × W × D × 4;
step 2.3, constructing the coefficient tensors: constructing the numerator coefficients and the denominator coefficients of the two equations in the inverse solution form of the rational polynomial model into coefficient tensors of dimension 4 × 4 × 4;
the known inverse solution form of the rational polynomial model is:

$$\mathrm{lon}_n=\frac{P_1^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)}{P_2^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)},\qquad \mathrm{lat}_n=\frac{P_3^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)}{P_4^{inv}(\mathrm{line}_n,\mathrm{samp}_n,\mathrm{hei}_n)}\tag{1}$$
where lon and lat denote the longitude and latitude coordinates, P denotes a ternary cubic polynomial, line, samp and hei denote the row coordinate, column coordinate and elevation variables, the subscript n denotes normalization, and the superscript inv denotes the inverse solution form; before describing the construction process, the following conventions are made: let T be a coefficient tensor of dimension 4 × 4 × 4, where T(i, j, k) denotes the element at position (i, j, k) in the tensor T and i, j, k are all taken from the integer set {0, 1, 2, 3}; form the variables in the numerator into a variable sequence V = [1, line_n, samp_n, hei_n], denoting the i-th element of the sequence by V_i, where the constant 1 is treated as a variable in its own right and line_n, samp_n and hei_n denote the row coordinate, column coordinate and elevation variables in the numerator; denote the coefficients of the polynomial P by a_ijk, where a_ijk is the coefficient of the term V_i V_j V_k in the polynomial P;
the coefficient tensor corresponding to each polynomial P in formula (1) is constructed in the same manner, as follows: when i, j, k are all equal, T(i, j, k) = a_ijk; when exactly two of i, j, k are equal, T(i, j, k) = a_ijk / 3; when i, j, k are all distinct, T(i, j, k) = a_ijk / 6;
step 2.4, computing the corresponding object-space coordinates: first computing the value of each polynomial P in formula (1), that is, combining the numerator coefficient tensor T obtained in step 2.3 with the coordinate tensor X as follows:

$$F(X)^{(h)(w)(d)}=T_{ijk}\,X_i^{(h)(w)(d)}\,X_j^{(h)(w)(d)}\,X_k^{(h)(w)(d)}\tag{2}$$

the tensor F(X) of dimension H × W × D formed by the values of the polynomial P being obtained in a single batched computation; the operation in step 2.4 follows the Einstein summation convention, with the stipulation that a parenthesized superscript marks a dimension that is not summed: in the formula, the components of the first three dimensions of X, namely the H, W and D dimensions, are kept without summation, while the last dimension of X is multiplied against the three indices of the coefficient tensor T and summed; each polynomial adopts the same computation; then the tensor computed from each numerator polynomial is divided element-wise by the tensor computed from the corresponding denominator polynomial and the normalization is removed, yielding longitude and latitude coordinate tensors of dimension H × W × D.
5. The method for multi-view image alignment based on differentiable tensor expression of the rational polynomial model as claimed in claim 4, wherein: the elevation search range in step 2.1 is calculated from the elevation normalization coefficients in the rational polynomial model parameters, or obtained by suitably extending a sparse reconstruction result or a public DEM in the height direction.
6. The method for multi-view image alignment based on differentiable tensor expression of the rational polynomial model as claimed in claim 4, wherein: the specific implementation of step 3 comprises the following sub-steps:
step 3.1, constructing the coordinate tensor: first construct an all-ones tensor X_1 of dimension H × W × D; normalize each element of the longitude coordinate tensor obtained in step 2.4 to obtain a normalized longitude coordinate tensor X_lon of dimension H × W × D; apply the same process to the latitude coordinate tensor obtained in step 2.4 to obtain a normalized latitude coordinate tensor X_lat of dimension H × W × D; combine the elevation values of all hypothetical elevation planes obtained in step 2.1 into a one-dimensional tensor of length D, normalize it, add two dimensions, and copy it H and W times along the two new dimensions to obtain a tensor X_hei of dimension H × W × D; stack X_1, X_lon, X_lat and X_hei to form a new coordinate tensor X of dimension H × W × D × 4;
step 3.2, constructing the coefficient tensors: constructing the numerator coefficients and the denominator coefficients of the two equations in the forward solution form of the rational polynomial model into coefficient tensors of dimension 4 × 4 × 4;
the known forward solution form of the rational polynomial model is:

$$\mathrm{line}_n=\frac{P_1^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)}{P_2^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)},\qquad \mathrm{samp}_n=\frac{P_3^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)}{P_4^{fwd}(\mathrm{lon}_n,\mathrm{lat}_n,\mathrm{hei}_n)}\tag{3}$$
where the superscript fwd denotes the forward solution form; unlike the definition in step 2.3, the variable sequence here becomes V = [1, lon_n, lat_n, hei_n], the other conventions remain unchanged, and the coefficient tensors are constructed as in step 2.3;
step 3.3, computing the corresponding target image coordinates: proceeding exactly as in step 2.4, finally obtaining the row coordinate tensor and the column coordinate tensor of the target image, each of dimension H × W × D.
7. The method for multi-view image alignment based on differentiable tensor expression of the rational polynomial model as claimed in claim 6, wherein: step 4 is implemented as follows: forming a set of coordinate mappings from the row and column coordinate tensors of the reference image obtained in step 2.2 and the row and column coordinate tensors of the target image computed in step 3.3, and, according to this coordinate mapping, transforming the image or image features of the target image to the reference image through differentiable bilinear interpolation to realize alignment of the multi-view satellite images or image features.
CN202110666281.4A 2021-06-16 2021-06-16 Multi-view image alignment method based on rational polynomial model differentiable tensor expression Active CN113486928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110666281.4A CN113486928B (en) 2021-06-16 2021-06-16 Multi-view image alignment method based on rational polynomial model differentiable tensor expression

Publications (2)

Publication Number Publication Date
CN113486928A CN113486928A (en) 2021-10-08
CN113486928B true CN113486928B (en) 2022-04-12

Family

ID=77934954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110666281.4A Active CN113486928B (en) 2021-06-16 2021-06-16 Multi-view image alignment method based on rational polynomial model differentiable tensor expression

Country Status (1)

Country Link
CN (1) CN113486928B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116109485A (en) * 2023-02-22 2023-05-12 中科星图数字地球合肥有限公司 Remote sensing image updating method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101231754A (en) * 2008-02-03 2008-07-30 四川虹微技术有限公司 Multi-visual angle video image depth detecting method and depth estimating method
CN101901502A (en) * 2010-08-17 2010-12-01 黑龙江科技学院 Global optimal registration method of multi-viewpoint cloud data during optical three-dimensional measurement
CN102855628A (en) * 2012-08-20 2013-01-02 武汉大学 Automatic matching method for multisource multi-temporal high-resolution satellite remote sensing image
CN104361590A (en) * 2014-11-12 2015-02-18 河海大学 High-resolution remote sensing image registration method with control points distributed in adaptive manner
CN108415871A (en) * 2017-02-10 2018-08-17 北京吉威时代软件股份有限公司 Based on the half matched intensive DSM generation methods of global multi-view images of object space
CN111127538A (en) * 2019-12-17 2020-05-08 武汉大学 Multi-view image three-dimensional reconstruction method based on convolution cyclic coding-decoding structure
CN113962858A (en) * 2021-10-22 2022-01-21 沈阳工业大学 Multi-view depth acquisition method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on image registration based on least squares support vector machine; Liu Ding et al.; Chinese Journal of Scientific Instrument; 2008-12-31 *
Highlight removal method for multi-view image feature matching; Wen Peizhi et al.; Computer Engineering and Applications; 2018-07-25, No. 23 *

Also Published As

Publication number Publication date
CN113486928A (en) 2021-10-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant