CN115423978A - Image laser data fusion method based on deep learning and used for building reconstruction - Google Patents

Image laser data fusion method based on deep learning and used for building reconstruction

Info

Publication number
CN115423978A
Authority
CN
China
Prior art keywords
depth map
visible light
point cloud
reconstruction
image
Prior art date
Legal status
Pending
Application number
CN202211059667.XA
Other languages
Chinese (zh)
Inventor
谢红梅
曾田子
徐梓雲
邱文
蒋晓悦
姚冠宇
冯晓毅
彭进业
文明
苗阿新
夏召强
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202211059667.XA priority Critical patent/CN115423978A/en
Publication of CN115423978A publication Critical patent/CN115423978A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based method for fusing visible light images with LiDAR (laser radar) data that can be used for three-dimensional reconstruction of outdoor buildings, comprising the following steps: first, visible light images and LiDAR data are acquired and preprocessed; second, sparse reconstruction and camera pose estimation are performed with the structure-from-motion (SfM) framework COLMAP; next, a depth map completion network combining an explicit three-dimensional representation with a spatial propagation network (SPN) is constructed, a data set consisting of visible light images, LiDAR depth maps, and LiDAR point clouds is used to train the network, and the sparse LiDAR depth map to be completed, the point cloud, and the visible light image are fed into the trained model to estimate a dense depth map; finally, dense reconstruction, mesh reconstruction, and texture mapping are carried out with the estimated depth maps using the open-source multi-view stereo (MVS) framework OpenMVS. The invention provides a depth map completion method (i.e., fusion of visible light images and LiDAR data) based on the combination of an explicit representation and an SPN, which makes full use of two-dimensional image information and three-dimensional spatial structure information, improves the accuracy of depth map estimation, and improves the precision of three-dimensional reconstruction.

Description

Image laser data fusion method based on deep learning for building reconstruction
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a deep-learning-based method for fusing visible light images with LiDAR (laser radar) data that can be used for three-dimensional reconstruction of outdoor buildings.
Background
Dense, large-scale, high-precision three-dimensional reconstruction is one of the classic problems in photogrammetry and computer vision, and it is important for applications such as autonomous driving, quality-control monitoring, virtual tourism, augmented reality, and cultural heritage protection.
Whether the goal is to render realistic objects or to further analyze the three-dimensional model, both accurate spatial geometric information and high-fidelity color and texture information are indispensable. Laser-scanning reconstruction can recover the true dimensions of a target object with millimeter-level accuracy, but the point clouds acquired by a LiDAR are inherently sparse and vary with the environment; they lack texture and are prone to defects and holes at the edges and corners of the object. Multi-view image-based reconstruction produces visually convincing models but lacks true depth information and is strongly affected by environmental factors such as illumination, so the resulting three-dimensional model is often of poor quality. Each method used alone therefore has limitations. A three-dimensional reconstruction strategy that fuses multiple data sources can repair holes and noise according to the different data sources, reduce the coordinate deviation of the three-dimensional point cloud, and provide more targeted three-dimensional data support and services for structure protection and information retention.
Research on three-dimensional reconstruction from visible light images and LiDAR data mainly involves depth estimation (depth map completion) and three-dimensional mesh reconstruction. Depth map completion, which fuses the visible light image with the sparse depth map derived from the LiDAR to predict a dense depth map, is the most critical step.
Over the past decades, scholars have put great effort into depth map completion for tasks such as model construction and 3D object detection. Deep-learning-based methods have shown remarkable performance on the depth completion task and have driven its development. Early work showed that a network with several convolutional layers, or a simple autoencoder, can fill in the missing depth. Depth completion can be further improved by using RGB information: a typical approach uses dual encoders to extract features from the sparse depth map and its corresponding RGB image separately and then fuses them in the decoder. To push depth map completion further, recent methods tend to use more complex network structures and learning strategies. Besides multi-branch architectures that extract features from multi-modal data (e.g., images and sparse depths), researchers have begun to integrate surface normals, affinity matrices, residual depth maps, and the like into their frameworks. In addition, to address the shortage of supervised pixels, some work introduces multi-view geometric constraints and adversarial regularization.
Depth map completion methods can be roughly divided into five categories: 1. early fusion models; 2. late fusion models; 3. explicit three-dimensional representation models; 4. residual depth models; 5. models based on a spatial propagation network (SPN).
1. Early fusion model: such methods typically concatenate the image and the sparse depth map directly as input, or fuse the multi-modal features at the first convolutional layer, as in Fig. 1 (a).
2. Late fusion model: as in Fig. 1 (b), this approach generally consists of a dual encoder or two sub-networks, one extracting RGB features and the other extracting depth features. Fusion is performed in intermediate layers, for example by fusing the features extracted by the encoders.
3. Explicit three-dimensional representation model: such methods typically apply three-dimensional convolutions, embed surface cues, or learn directly from three-dimensional point clouds to predict dense depth maps, as in Fig. 1 (c).
4. Residual depth model: as shown in Fig. 1 (d), this method generally learns a coarse depth map and a residual depth map and combines them to generate the final depth map.
5. SPN-based model: as shown in Fig. 1 (e), such methods typically first learn an affinity matrix and an initial coarse depth map through an encoder-decoder network, and then use an SPN for iterative affinity-based refinement of the depth map.
Disclosure of Invention
As can be seen from models published by scholars, explicit three-dimensional representation models, SPN-based models, and residual depth models exhibit the most advanced performance and are often superior to other methods. SPN-based models learn three-dimensional geometric relationships in an implicit way, while explicit three-dimensional representation models have greatly advanced depth map completion. The invention therefore provides a depth map completion method based on the combination of an explicit three-dimensional representation with an SPN; it fuses the visible light RGB image with the sparse depth information from the LiDAR to obtain a dense and accurate depth map, and has potential application value in scenarios such as the reconstruction of large buildings for cultural relic protection and street-view reconstruction for autonomous driving.
The technical scheme adopted by the invention is as follows: a deep-learning-based visible light image and LiDAR data fusion method applicable to three-dimensional reconstruction of outdoor buildings, comprising the following steps:
Step one: acquiring a visible light image and LiDAR data set and preprocessing them;
step 101: acquiring multi-angle visible light images and point clouds of the same scene with a visible light camera and a LiDAR/laser scanner, respectively;
step 102: projecting the point cloud acquired by the LiDAR onto the imaging plane of the visible light camera to obtain the corresponding sparse depth map;
step 103: acquiring the ground-truth depth map: superposing each sparse depth map with its 2n + 1 temporally adjacent sparse depth maps (n = 5 when processing KITTI) to increase the density of the generated depth map, cleaning the accumulated laser-scan projections with semi-global matching (SGM) to remove outliers caused by occlusion, dynamic motion, and measurement artifacts, and taking the final accumulated depth map as the ground truth;
step 104: after the above processing, each visible light RGB image corresponds one-to-one to a sparse depth map and a point cloud; forming a data set from the visible light RGB images and their corresponding sparse depth maps and point clouds, and dividing the data set into a training set and a test set;
Step two: performing sparse reconstruction and camera pose estimation with the open-source SfM framework COLMAP;
step 201: inputting the multi-view visible light images into COLMAP and obtaining the camera poses and a sparse point cloud through sparse reconstruction;
step 202: converting the binary reconstruction files generated in step 201 into text format, and changing camera models of other types in the cameras file to the type required by the subsequent processing.
Step three: constructing the image feature extraction module of the depth map completion network and extracting features from the image and the depth map; Fig. 3 shows the encoder-decoder framework of the network;
step 301: constructing an image encoder based on a residual network: the encoder for visible light image and sparse depth map features uses ResNet as its basic structure, processes the two inputs with separate additional convolutional layers, and concatenates the two feature maps from the different sources after the first convolutional layer as the input to ResNet;
step 302: passing the result of step 301 through five ResNet convolutional blocks to obtain an intermediate feature representation;
Step four: constructing the point cloud feature extraction module of the depth map completion network and extracting features from the point cloud;
step 401: selecting the classic point cloud network PointNet++ as the point cloud feature encoder;
step 402: grouping the input point cloud and extracting features for each group: changing the dimensions, applying convolution operations, and finally applying max pooling in the PointNet manner to obtain the group features;
step 403: repeating the sampling and grouping of step 402 several times and applying the PointNet operation to obtain the final global features;
Step five: constructing the decoder module of the depth map completion network and up-sampling the obtained features;
step 501: the decoder uses the multi-scale image features from the image encoder and the point features from the point cloud encoder; it consists of four transposed convolutional layers together with convolutional layers, and the transposed convolutions up-sample the features;
step 502: projecting the point features onto each transposed-convolution block at the same scale as the image features via feature projection;
step 503: sharing the decoder features to estimate the initial dense depth, the confidence, the non-local neighborhood, and the raw affinity: the output of the last transposed-convolution block is processed by convolutional layers to predict the initial dense depth map, the initial confidence, and the raw affinity;
Step six: constructing the SPN module of the depth map completion network and performing iterative optimization to obtain the final depth map;
step 601: an SPN propagates information from high-confidence regions to low-confidence regions according to the affinity of the data, but a fixed propagation neighborhood ignores the depth distribution within the local region, so a non-fixed local SPN is adopted;
step 602: the non-fixed local SPN estimates the neighborhood of each pixel beyond a fixed local region (i.e., non-fixed local) from the color and depth information; the non-fixed local neighborhood is defined as:
N_{m,n} = { (m + p, n + q) | (p, q) ∈ f_φ(I, D, m, n) }
where I and D are the visible light RGB image and the sparse depth map, respectively; f_φ(·) is the non-fixed local neighborhood prediction network, which estimates K neighbors for each pixel under the learnable parameters φ; and p and q are real numbers;
step 603: applying the non-fixed local SPN to the initial dense depth map obtained in step 503 by means of deformable convolution; the iterative optimization formula is:
D^{t+1}_{m,n} = w^{0}_{m,n} · D^{t}_{m,n} + Σ_{(i,j)∈N_{m,n}} w_{m,n}(i,j) · D^{t}_{i,j}
where (m, n) and (i, j) are the coordinates of the reference pixel and of a neighboring pixel, respectively, w^{0}_{m,n} denotes the affinity of the reference pixel, and w_{m,n}(i,j) denotes the affinity between the pixels at (m, n) and (i, j); the first term on the right-hand side represents the propagation of the reference pixel, and the second term represents the propagation of the neighborhood weighted by the corresponding affinities;
step 604: to ensure stable propagation, normalizing the affinities in combination with the confidence before propagation, using the Tanh-γ-Abs-Sum* procedure:
w_{m,n}(i,j) = c_{i,j} · tanh(ŵ_{m,n}(i,j)) / γ
where c_{i,j} ∈ [0, 1] denotes the confidence of the pixel at (i, j), ŵ_{m,n}(i,j) is the raw affinity, and γ is a normalization constant;
Constructing a loss function of the depth map completion network;
step 701: the reconstruction loss formula in the depth map completion is as follows:
Figure BDA0003826163090000062
wherein D gt Is a groudtuth depth map, D pred Is a depth map predicted by an algorithm, d υ V and | v | represent the depth value at the pixel index v, D, respectively gt P is 1 denotes l 1 Loss, 2 denotes l 2 Loss;
step 702: introducing the Chamfer Distance (CD) from 3D point cloud processing; the CD averages the distances between the mutually nearest points of two point sets and is computed as:
L_CD(S_1, S_2) = (1 / |S_1|) · Σ_{x∈S_1} min_{y∈S_2} ‖x − y‖_2 + (1 / |S_2|) · Σ_{y∈S_2} min_{x∈S_1} ‖x − y‖_2
where S_1 and S_2 are two 3D point sets; the dense depth map predicted by the depth map completion network is back-projected into 3D space to obtain pseudo laser points, the same operation is applied to the ground-truth dense depth map, and the CD loss between the two point clouds is computed;
step 703: combining the reconstruction loss of depth map completion with the CD loss from point cloud processing, the final loss function is:
L = μ · L_recon + (1 − μ) · L_CD
where μ is the weight coefficient of the depth map reconstruction loss;
Step eight: predicting on the test set data with the trained model to obtain the depth map completion results;
Step nine: performing dense reconstruction, mesh reconstruction, and texture mapping with the estimated depth maps using the open-source MVS framework OpenMVS;
step 901: replacing the depth map computation stage of OpenMVS: from the camera poses computed by COLMAP and the dense depth maps predicted by the depth map completion network, obtaining a dense point cloud by fusing multi-frame depth maps;
step 902: continuing with the mesh reconstruction in OpenMVS to obtain a three-dimensional mesh model from the dense point cloud of step 901, using the mesh refinement module to obtain a finer mesh model, and using the texture mapping module to obtain the final textured three-dimensional surface model.
Compared with the prior art, the invention has the following beneficial effects:
1. The method has simple steps and a reasonable design, and is convenient to implement, use, and operate.
2. Depth map completion is performed by combining an explicit three-dimensional representation with an implicit (SPN-based) model, capturing 3D geometric cues from the sparse and irregular depth distribution and improving the accuracy of depth map completion.
3. Two-dimensional image information and three-dimensional information are used simultaneously and their features are fused, improving the precision of three-dimensional reconstruction.
4. A loss function from 3D point cloud processing is introduced into depth map completion, feeding more three-dimensional structural information back to the model.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
Fig. 1 shows the classification of the main depth map completion methods.
FIG. 2 is a flow chart of the method of the present invention.
Fig. 3 is a block diagram of an encoder-decoder of a depth map completion network.
Detailed Description
The method of the present invention is further described in detail below with reference to the accompanying drawings and embodiments of the invention.
As shown in fig. 2, the present invention comprises the steps of:
Step one: acquiring a visible light image and LiDAR data set and preprocessing them;
step 101: acquiring multi-angle visible light images and point clouds of the same scene with a visible light camera and a LiDAR/laser scanner, respectively;
step 102: projecting the point cloud acquired by the LiDAR onto the imaging plane of the visible light camera to obtain the corresponding sparse depth map (a minimal projection sketch is given after step 104);
step 103: acquiring the ground-truth depth map: superposing each sparse depth map with its 2n + 1 temporally adjacent sparse depth maps (n = 5 when processing KITTI) to increase the density of the generated depth map, cleaning the accumulated laser-scan projections with semi-global matching (SGM) to remove outliers caused by occlusion, dynamic motion, and measurement artifacts, and taking the final accumulated depth map as the ground truth;
step 104: after the above processing, each visible light RGB image corresponds one-to-one to a sparse depth map and a point cloud; forming a data set from the visible light RGB images and their corresponding sparse depth maps and point clouds, and dividing the data set into a training set and a test set;
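By way of non-limiting illustration of step 102, the following Python/NumPy sketch projects a LiDAR point cloud into the visible light camera and rasterizes a sparse depth map. The intrinsic matrix K, the LiDAR-to-camera rotation R and translation t, and the image size are assumed to come from calibration; all names are illustrative and not part of the claimed method.

import numpy as np

def lidar_to_sparse_depth(points_xyz, K, R, t, height, width):
    """Project Nx3 LiDAR points into the camera and build a sparse depth map.

    points_xyz : (N, 3) LiDAR points in the LiDAR frame
    K          : (3, 3) camera intrinsic matrix
    R, t       : LiDAR-to-camera rotation (3, 3) and translation (3,)
    """
    # Transform points into the camera coordinate frame.
    pts_cam = points_xyz @ R.T + t          # (N, 3)
    z = pts_cam[:, 2]
    valid = z > 0                           # keep points in front of the camera
    pts_cam, z = pts_cam[valid], z[valid]

    # Perspective projection onto the image plane.
    uv = (K @ pts_cam.T).T                  # (M, 3) homogeneous image coordinates
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # Keep the nearest depth when several points fall on the same pixel.
    depth = np.zeros((height, width), dtype=np.float32)
    order = np.argsort(-z)                  # write far points first, near points last
    depth[v[order], u[order]] = z[order]
    return depth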
Step two: performing sparse reconstruction and camera pose estimation with the open-source SfM framework COLMAP;
step 201: inputting the multi-view visible light images into COLMAP and obtaining the camera poses and a sparse point cloud through sparse reconstruction;
step 202: converting the binary reconstruction files generated in step 201 into text format, and changing camera models of other types in the cameras file to the type required by the subsequent processing.
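Step two is typically driven from the COLMAP command line. The sketch below shows one plausible sequence (feature extraction, matching, mapping, and conversion of the binary model to text files) wrapped in Python; the paths are placeholders and the options should be checked against the installed COLMAP version.

import subprocess

def run(cmd):
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)

# Assumed layout: input images in ./images, outputs under ./colmap
run(["colmap", "feature_extractor", "--database_path", "colmap/db.db",
     "--image_path", "images"])
run(["colmap", "exhaustive_matcher", "--database_path", "colmap/db.db"])
run(["colmap", "mapper", "--database_path", "colmap/db.db",
     "--image_path", "images", "--output_path", "colmap/sparse"])
# Convert the binary model (cameras.bin, images.bin, points3D.bin) to text files.
run(["colmap", "model_converter", "--input_path", "colmap/sparse/0",
     "--output_path", "colmap/sparse_txt", "--output_type", "TXT"])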
Step three: constructing the image feature extraction module of the depth map completion network and extracting features from the image and the depth map; Fig. 3 shows the encoder-decoder framework of the network;
step 301: constructing an image encoder based on a residual network: the encoder for visible light image and sparse depth map features uses ResNet as its basic structure, processes the two inputs with separate additional convolutional layers, and concatenates the two feature maps from the different sources after the first convolutional layer as the input to ResNet;
step 302: passing the result of step 301 through five ResNet convolutional blocks to obtain an intermediate feature representation;
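A minimal PyTorch sketch of the image branch of steps 301-302 is given below: the RGB image and the sparse depth map each pass through their own stem convolution, the two feature maps are concatenated, and the result is fed through the ResNet stages. The channel sizes and the choice of ResNet-34 from torchvision (version 0.13 or later) are assumptions made for illustration only.

import torch
import torch.nn as nn
import torchvision

class ImageDepthEncoder(nn.Module):
    """Encodes an RGB image and a sparse depth map into multi-scale features."""

    def __init__(self):
        super().__init__()
        # Separate stem convolutions for the two modalities.
        self.rgb_conv = nn.Sequential(nn.Conv2d(3, 32, 3, 1, 1), nn.ReLU(inplace=True))
        self.dep_conv = nn.Sequential(nn.Conv2d(1, 32, 3, 1, 1), nn.ReLU(inplace=True))
        base = torchvision.models.resnet34(weights=None)
        # Replace the first ResNet conv so it accepts the 64 concatenated channels.
        self.conv1 = nn.Conv2d(64, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1, self.relu, self.maxpool = base.bn1, base.relu, base.maxpool
        self.layers = nn.ModuleList([base.layer1, base.layer2, base.layer3, base.layer4])

    def forward(self, rgb, sparse_depth):
        x = torch.cat([self.rgb_conv(rgb), self.dep_conv(sparse_depth)], dim=1)
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        feats = []
        for layer in self.layers:        # collect multi-scale features for the decoder
            x = layer(x)
            feats.append(x)
        return feats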
Step four: constructing the point cloud feature extraction module of the depth map completion network and extracting features from the point cloud;
step 401: selecting the classic point cloud network PointNet++ as the point cloud feature encoder;
step 402: grouping the input point cloud and extracting features for each group: changing the dimensions, applying convolution operations, and finally applying max pooling in the PointNet manner to obtain the group features;
step 403: repeating the sampling and grouping of step 402 several times and applying the PointNet operation to obtain the final global features;
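The following simplified sketch illustrates the grouping, shared-MLP, and max-pooling pattern of steps 401-403 as a single PointNet++-style set-abstraction level; it uses random sampling and k-nearest-neighbour grouping for brevity and is not the full PointNet++ implementation.

import torch
import torch.nn as nn

class SetAbstraction(nn.Module):
    """Simplified PointNet++ style set abstraction: sample, group, shared MLP, max-pool."""

    def __init__(self, in_dim, out_dim, num_groups=256, group_size=32):
        super().__init__()
        self.num_groups, self.group_size = num_groups, group_size
        self.mlp = nn.Sequential(
            nn.Conv2d(in_dim + 3, out_dim, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_dim, out_dim, 1), nn.ReLU(inplace=True))

    def forward(self, xyz, feats):
        # xyz: (B, N, 3) point coordinates, feats: (B, N, C) per-point features.
        B, N, _ = xyz.shape
        centers = xyz[:, torch.randperm(N, device=xyz.device)[:self.num_groups]]
        dists = torch.cdist(centers, xyz)                              # (B, G, N)
        idx = dists.topk(self.group_size, largest=False).indices       # nearest neighbours
        grouped_xyz = torch.gather(
            xyz.unsqueeze(1).expand(B, self.num_groups, N, 3), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, 3)) - centers.unsqueeze(2)
        grouped_feat = torch.gather(
            feats.unsqueeze(1).expand(B, self.num_groups, N, feats.shape[-1]), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, feats.shape[-1]))
        x = torch.cat([grouped_xyz, grouped_feat], dim=-1)             # (B, G, S, 3+C)
        x = self.mlp(x.permute(0, 3, 1, 2))                            # shared MLP as 1x1 conv
        return centers, x.max(dim=-1).values.permute(0, 2, 1)          # max-pool over each group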
Step five: constructing the decoder module of the depth map completion network and up-sampling the obtained features;
step 501: the decoder uses the multi-scale image features from the image encoder and the point features from the point cloud encoder; it consists of four transposed convolutional layers together with convolutional layers, and the transposed convolutions up-sample the features;
step 502: projecting the point features onto each transposed-convolution block at the same scale as the image features via feature projection;
step 503: sharing the decoder features to estimate the initial dense depth, the confidence, the non-local neighborhood, and the raw affinity: the output of the last transposed-convolution block is processed by convolutional layers to predict the initial dense depth map, the initial confidence, and the raw affinity;
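A hedged sketch of the decoder of steps 501-503: four transposed-convolution blocks progressively up-sample the fused features, projected point features are added at matching scales, and convolutional heads predict the initial dense depth, the confidence, and the per-pixel neighbour offsets with their raw affinities. All channel counts and the fusion-by-addition choice are illustrative assumptions.

import torch
import torch.nn as nn

class CompletionDecoder(nn.Module):
    """Upsamples fused features and predicts initial depth, confidence, and affinity."""

    def __init__(self, in_ch=512, num_neighbors=8):
        super().__init__()
        chans = [in_ch, 256, 128, 64, 32]
        self.blocks = nn.ModuleList()
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            self.blocks.append(nn.Sequential(
                nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),  # x2 upsampling
                nn.ReLU(inplace=True),
                nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True)))
        self.depth_head = nn.Conv2d(32, 1, 3, padding=1)
        self.conf_head = nn.Sequential(nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid())
        # 3 values per neighbour: 2-D offset (non-local neighbourhood) + raw affinity.
        self.aff_head = nn.Conv2d(32, 3 * num_neighbors, 3, padding=1)

    def forward(self, image_feat, point_feats=None):
        x = image_feat
        for i, block in enumerate(self.blocks):
            x = block(x)
            if point_feats is not None and i < len(point_feats):
                x = x + point_feats[i]     # projected point features at the matching scale
        return self.depth_head(x), self.conf_head(x), self.aff_head(x)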
Step six: constructing the SPN module of the depth map completion network and performing iterative optimization to obtain the final depth map;
step 601: an SPN propagates information from high-confidence regions to low-confidence regions according to the affinity of the data, but a fixed propagation neighborhood ignores the depth distribution within the local region, so a non-fixed local SPN is adopted;
step 602: the non-fixed local SPN estimates the neighborhood of each pixel beyond a fixed local region (i.e., non-fixed local) from the color and depth information; the non-fixed local neighborhood is defined as:
N_{m,n} = { (m + p, n + q) | (p, q) ∈ f_φ(I, D, m, n) }
where I and D are the visible light RGB image and the sparse depth map, respectively; f_φ(·) is the non-fixed local neighborhood prediction network, which estimates K neighbors for each pixel under the learnable parameters φ; and p and q are real numbers;
step 603: applying the non-fixed local SPN to the initial dense depth map obtained in step 503 by means of deformable convolution; the iterative optimization formula is:
D^{t+1}_{m,n} = w^{0}_{m,n} · D^{t}_{m,n} + Σ_{(i,j)∈N_{m,n}} w_{m,n}(i,j) · D^{t}_{i,j}
where (m, n) and (i, j) are the coordinates of the reference pixel and of a neighboring pixel, respectively, w^{0}_{m,n} denotes the affinity of the reference pixel, and w_{m,n}(i,j) denotes the affinity between the pixels at (m, n) and (i, j); the first term on the right-hand side represents the propagation of the reference pixel, and the second term represents the propagation of the neighborhood weighted by the corresponding affinities;
Step 604: in order to ensure the stability of the transmission, the affinity is normalized by combining confidence coefficient before the transmission, and Tanh-gamma-Abs-Sum is adopted * The process of (1) according to the formula:
Figure BDA0003826163090000105
in the formula, c i,j ∈[0,1]Representing the confidence of the pixel at (i, j).
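The following PyTorch sketch illustrates one way to realize the propagation of steps 601-604, using bilinear sampling (F.grid_sample) at learned real-valued offsets in place of a deformable convolution. The offsets, raw affinities, and confidence are assumed to come from the decoder heads of step 503, and γ is taken on the order of K so that the absolute weights sum to at most one; this is a sketch of the idea, not the patented implementation.

import torch
import torch.nn.functional as F

def nonlocal_propagate(depth, offsets, raw_affinity, confidence, gamma=None, steps=6):
    """Sketch of non-fixed local SPN refinement (steps 601-604) via bilinear sampling.

    depth        : (B, 1, H, W) initial dense depth from the decoder
    offsets      : (B, 2K, H, W) learned per-pixel neighbour offsets (x, y), in pixels
    raw_affinity : (B, K, H, W) raw affinities before normalisation
    confidence   : (B, 1, H, W) confidence map in [0, 1]
    """
    B, _, H, W = depth.shape
    K = raw_affinity.shape[1]
    gamma = float(K) if gamma is None else gamma          # keeps |weights| summing below 1

    ys, xs = torch.meshgrid(torch.arange(H, device=depth.device),
                            torch.arange(W, device=depth.device), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).float()          # (H, W, 2) pixel coordinates

    # Step 604: tanh/gamma normalisation combined with the neighbour confidence below.
    aff = torch.tanh(raw_affinity) / gamma                # (B, K, H, W)

    # Sampling grids for the K non-local (real-valued offset) neighbours.
    grids = []
    for k in range(K):
        loc = base.unsqueeze(0) + offsets[:, 2 * k:2 * k + 2].permute(0, 2, 3, 1)
        loc = torch.stack([2 * loc[..., 0] / (W - 1) - 1,  # to [-1, 1] for grid_sample
                           2 * loc[..., 1] / (H - 1) - 1], dim=-1)
        grids.append(loc)

    neigh_conf = torch.cat([F.grid_sample(confidence, g, align_corners=True) for g in grids], 1)
    w = aff * neigh_conf                                   # confidence-weighted affinities
    w0 = 1.0 - w.sum(dim=1, keepdim=True)                  # weight kept by the reference pixel

    # Step 603: iterative propagation D^{t+1} = w0 * D^t + sum_k w_k * D^t(neighbour_k).
    for _ in range(steps):
        neigh_depth = torch.cat([F.grid_sample(depth, g, align_corners=True) for g in grids], 1)
        depth = w0 * depth + (w * neigh_depth).sum(dim=1, keepdim=True)
    return depth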
Constructing a loss function of the depth map completion network;
step 701: the reconstruction loss formula in the depth map completion is as follows:
Figure BDA0003826163090000106
wherein D gt Is a groudtuth depth map, D pred Is a depth map predicted by an algorithm, d υ V and | v | represent depth values at the pixel index v, D, respectively gt P is 1 denotes l 1 Loss, 2 represents l 2 Loss;
step 702: introducing the Chamfer Distance (CD) from 3D point cloud processing; the CD averages the distances between the mutually nearest points of two point sets and is computed as:
L_CD(S_1, S_2) = (1 / |S_1|) · Σ_{x∈S_1} min_{y∈S_2} ‖x − y‖_2 + (1 / |S_2|) · Σ_{y∈S_2} min_{x∈S_1} ‖x − y‖_2
where S_1 and S_2 are two 3D point sets; the dense depth map predicted by the depth map completion network is back-projected into 3D space to obtain pseudo laser points, the same operation is applied to the ground-truth dense depth map, and the CD loss between the two point clouds is computed;
step 703: combining the reconstruction loss of depth map completion with the CD loss from point cloud processing, the final loss function is:
L = μ · L_recon + (1 − μ) · L_CD
where μ is the weight coefficient of the depth map reconstruction loss;
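A compact sketch of the loss of steps 701-703, assuming the predicted and ground-truth depth maps have already been back-projected into point sets (the pseudo laser points of step 702). The brute-force chamfer distance and the value of mu are illustrative choices.

import torch

def chamfer_distance(p1, p2):
    """Average nearest-neighbour distance between two point sets (B, N, 3) and (B, M, 3)."""
    d = torch.cdist(p1, p2)                              # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def completion_loss(pred_depth, gt_depth, pred_points, gt_points, mu=0.9, p=1):
    """L = mu * L_recon + (1 - mu) * L_CD  (steps 701-703)."""
    valid = gt_depth > 0                                 # only supervise valid GT pixels
    diff = (pred_depth - gt_depth)[valid].abs()
    l_recon = diff.mean() if p == 1 else (diff ** 2).mean()
    l_cd = chamfer_distance(pred_points, gt_points)      # pseudo LiDAR vs. GT back-projection
    return mu * l_recon + (1.0 - mu) * l_cd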
Step eight: predicting on the test set data with the trained model to obtain the depth map completion results;
Step nine: performing dense reconstruction, mesh reconstruction, and texture mapping with the estimated depth maps using the open-source MVS framework OpenMVS;
step 901: replacing the depth map computation stage of OpenMVS: from the camera poses computed by COLMAP and the dense depth maps predicted by the depth map completion network, obtaining a dense point cloud by fusing multi-frame depth maps;
step 902: continuing with the mesh reconstruction in OpenMVS to obtain a three-dimensional mesh model from the dense point cloud of step 901, using the mesh refinement module to obtain a finer mesh model, and using the texture mapping module to obtain the final textured three-dimensional surface model.
The above description is only one embodiment of the present invention and does not limit the invention in any way; any simple modifications, alterations, and equivalent structural changes made to the above embodiment according to the technical essence of the invention still fall within the protection scope of the technical solution of the invention.

Claims (1)

1. A deep-learning-based visible light image and LiDAR (laser radar) data fusion method applicable to three-dimensional reconstruction of outdoor buildings, comprising the following steps:
Step one: acquiring a visible light image and LiDAR data set and preprocessing them;
step 101: acquiring multi-angle visible light images and point clouds of the same scene with a visible light camera and a LiDAR/laser scanner, respectively;
step 102: projecting the point cloud acquired by the LiDAR onto the imaging plane of the visible light camera to obtain the corresponding sparse depth map;
step 103: acquiring the ground-truth depth map: superposing each sparse depth map with its 2n + 1 temporally adjacent sparse depth maps (n = 5 when processing KITTI) to increase the density of the generated depth map, cleaning the accumulated laser-scan projections with semi-global matching (SGM) to remove outliers caused by occlusion, dynamic motion, and measurement artifacts, and taking the final accumulated depth map as the ground truth;
step 104: after the above processing, each visible light RGB image corresponds one-to-one to a sparse depth map and a point cloud; forming a data set from the visible light RGB images and their corresponding sparse depth maps and point clouds, and dividing the data set into a training set and a test set;
Step two: performing sparse reconstruction and camera pose estimation with the open-source SfM framework COLMAP;
step 201: inputting the multi-view visible light images into COLMAP and obtaining the camera poses and a sparse point cloud through sparse reconstruction;
step 202: converting the binary reconstruction files generated in step 201 into text format, and changing camera models of other types in the cameras file to the type required by the subsequent processing.
Step three: constructing the image feature extraction module of the depth map completion network and extracting features from the image and the depth map; Fig. 3 shows the encoder-decoder framework of the network;
step 301: constructing an image encoder based on a residual network: the encoder for visible light image and sparse depth map features uses ResNet as its basic structure, processes the two inputs with separate additional convolutional layers, and concatenates the two feature maps from the different sources after the first convolutional layer as the input to ResNet;
step 302: passing the result of step 301 through five ResNet convolutional blocks to obtain an intermediate feature representation;
Step four: constructing the point cloud feature extraction module of the depth map completion network and extracting features from the point cloud;
step 401: selecting the classic point cloud network PointNet++ as the point cloud feature encoder;
step 402: grouping the input point cloud and extracting features for each group: changing the dimensions, applying convolution operations, and finally applying max pooling in the PointNet manner to obtain the group features;
step 403: repeating the sampling and grouping of step 402 several times and applying the PointNet operation to obtain the final global features;
Step five: constructing the decoder module of the depth map completion network and up-sampling the obtained features;
step 501: the decoder uses the multi-scale image features from the image encoder and the point features from the point cloud encoder; it consists of four transposed convolutional layers together with convolutional layers, and the transposed convolutions up-sample the features;
step 502: projecting the point features onto each transposed-convolution block at the same scale as the image features via feature projection;
step 503: sharing the decoder features to estimate the initial dense depth, the confidence, the non-local neighborhood, and the raw affinity: the output of the last transposed-convolution block is processed by convolutional layers to predict the initial dense depth map, the initial confidence, and the raw affinity;
Step six: constructing the SPN module of the depth map completion network and performing iterative optimization to obtain the final depth map;
step 601: an SPN propagates information from high-confidence regions to low-confidence regions according to the affinity of the data, but a fixed propagation neighborhood ignores the depth distribution within the local region, so a non-fixed local SPN is adopted;
step 602: the non-fixed local SPN estimates the neighborhood of each pixel beyond a fixed local region (i.e., non-fixed local) from the color and depth information; the non-fixed local neighborhood is defined as:
N_{m,n} = { (m + p, n + q) | (p, q) ∈ f_φ(I, D, m, n) }
where I and D are the visible light RGB image and the sparse depth map, respectively; f_φ(·) is the non-fixed local neighborhood prediction network, which estimates K neighbors for each pixel under the learnable parameters φ; and p and q are real numbers;
step 603: applying the non-fixed local SPN to the initial dense depth map obtained in step 503 by means of deformable convolution; the iterative optimization formula is:
D^{t+1}_{m,n} = w^{0}_{m,n} · D^{t}_{m,n} + Σ_{(i,j)∈N_{m,n}} w_{m,n}(i,j) · D^{t}_{i,j}
where (m, n) and (i, j) are the coordinates of the reference pixel and of a neighboring pixel, respectively, w^{0}_{m,n} denotes the affinity of the reference pixel, and w_{m,n}(i,j) denotes the affinity between the pixels at (m, n) and (i, j); the first term on the right-hand side represents the propagation of the reference pixel, and the second term represents the propagation of the neighborhood weighted by the corresponding affinities;
Step 604: in order to ensure the stability of the transmission, the affinity is normalized by combining confidence coefficient before the transmission, and Tanh-gamma-Abs-Sum is adopted * The process of (1), formula:
Figure FDA0003826163080000035
in the formula, c i,j ∈[0,1]Representing the confidence of the pixel at (i, j).
Constructing a loss function of the depth map completion network;
step 701: the reconstruction loss formula in the depth map completion is as follows:
Figure FDA0003826163080000036
wherein D gt Is a groudtuth depth map, D pred Is a depth map predicted by the algorithm, d υ V and | v | represent the depth value at the pixel index v, D, respectively gt P is 1 denotes l 1 Loss, 2 denotes l 2 Loss;
step 702: introducing the Chamfer Distance (CD) from 3D point cloud processing; the CD averages the distances between the mutually nearest points of two point sets and is computed as:
L_CD(S_1, S_2) = (1 / |S_1|) · Σ_{x∈S_1} min_{y∈S_2} ‖x − y‖_2 + (1 / |S_2|) · Σ_{y∈S_2} min_{x∈S_1} ‖x − y‖_2
where S_1 and S_2 are two 3D point sets; the dense depth map predicted by the depth map completion network is back-projected into 3D space to obtain pseudo laser points, the same operation is applied to the ground-truth dense depth map, and the CD loss between the two point clouds is computed;
step 703: combining the reconstruction loss of depth map completion with the CD loss from point cloud processing, the final loss function is:
L = μ · L_recon + (1 − μ) · L_CD
where μ is the weight coefficient of the depth map reconstruction loss;
Step eight: predicting on the test set data with the trained model to obtain the depth map completion results;
Step nine: performing dense reconstruction, mesh reconstruction, and texture mapping with the estimated depth maps using the open-source MVS framework OpenMVS;
step 901: replacing the depth map computation stage of OpenMVS: from the camera poses computed by COLMAP and the dense depth maps predicted by the depth map completion network, obtaining a dense point cloud by fusing multi-frame depth maps;
step 902: continuing with the mesh reconstruction in OpenMVS to obtain a three-dimensional mesh model from the dense point cloud of step 901, using the mesh refinement module to obtain a finer mesh model, and using the texture mapping module to obtain the final textured three-dimensional surface model.
CN202211059667.XA 2022-08-30 2022-08-30 Image laser data fusion method based on deep learning and used for building reconstruction Pending CN115423978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211059667.XA CN115423978A (en) 2022-08-30 2022-08-30 Image laser data fusion method based on deep learning and used for building reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211059667.XA CN115423978A (en) 2022-08-30 2022-08-30 Image laser data fusion method based on deep learning and used for building reconstruction

Publications (1)

Publication Number Publication Date
CN115423978A true CN115423978A (en) 2022-12-02

Family

ID=84199811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211059667.XA Pending CN115423978A (en) 2022-08-30 2022-08-30 Image laser data fusion method based on deep learning and used for building reconstruction

Country Status (1)

Country Link
CN (1) CN115423978A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468768A (en) * 2023-04-20 2023-07-21 南京航空航天大学 Scene depth completion method based on conditional variation self-encoder and geometric guidance
CN116468768B (en) * 2023-04-20 2023-10-17 南京航空航天大学 Scene depth completion method based on conditional variation self-encoder and geometric guidance
CN116255976A (en) * 2023-05-15 2023-06-13 长沙智能驾驶研究院有限公司 Map fusion method, device, equipment and medium
CN116255976B (en) * 2023-05-15 2023-10-31 长沙智能驾驶研究院有限公司 Map fusion method, device, equipment and medium
CN116740300A (en) * 2023-06-16 2023-09-12 广东工业大学 Multi-mode-based prime body and texture fusion furniture model reconstruction method
CN116740300B (en) * 2023-06-16 2024-05-03 广东工业大学 Multi-mode-based prime body and texture fusion furniture model reconstruction method
CN116579955A (en) * 2023-07-13 2023-08-11 厦门微图软件科技有限公司 New energy battery cell weld reflection point denoising and point cloud complement method and system
CN116579955B (en) * 2023-07-13 2023-10-20 厦门微图软件科技有限公司 New energy battery cell weld reflection point denoising and point cloud complement method and system
CN118172422A (en) * 2024-05-09 2024-06-11 武汉大学 Method and device for positioning and imaging interest target by combining vision, inertia and laser


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination