CN115564888A - Visible light multi-view image three-dimensional reconstruction method based on deep learning - Google Patents

Visible light multi-view image three-dimensional reconstruction method based on deep learning

Info

Publication number
CN115564888A
Authority
CN
China
Prior art keywords
depth
map
depth map
feature
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210845580.9A
Other languages
Chinese (zh)
Inventor
罗欣
冯倩
吴禹萱
韦祖棋
宋依芸
冷庚
许文波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze River Delta Research Institute of UESTC Huzhou
Original Assignee
Yangtze River Delta Research Institute of UESTC Huzhou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze River Delta Research Institute of UESTC Huzhou filed Critical Yangtze River Delta Research Institute of UESTC Huzhou
Priority to CN202210845580.9A priority Critical patent/CN115564888A/en
Publication of CN115564888A publication Critical patent/CN115564888A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Biophysics (AREA)
  • Geometry (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a visible light multi-view image three-dimensional reconstruction method based on deep learning, improved from the MVSNet network. The batch normalization layers and nonlinear activation function layers in the network are replaced by fused Inplace-ABN layers, reducing video memory occupation. A weighted mean measurement method based on grouped similarity is designed to reduce the feature dimension of the cost volume, yielding a more lightweight cost volume, compressing the network parameters, and cutting both computation and video memory consumption. To address the problem that the depth map resolution is lower than that of the input image because MVSNet works on a low-scale feature map, a feature pyramid module is used to extract multi-scale feature maps, and a staged multi-scale iterative refinement of the depth estimate is designed. While preserving accuracy, multiple rounds of depth iteration reduce the average number of depth planes in the cost volume, so the cost volume attains higher spatial resolution and depth map estimation becomes more accurate. Finally, the output depth maps are filtered and fused to complete the three-dimensional scene reconstruction task.

Description

Visible light multi-view image three-dimensional reconstruction method based on deep learning
Technical Field
The invention belongs to the field of computer image processing, and relates to a method for performing three-dimensional reconstruction of visible light multi-view images based on deep learning and outputting a three-dimensional point cloud.
Background
As a technology for finely restoring real-world scenes, three-dimensional reconstruction plays an important role in people's daily life and production work. In three-dimensional reconstruction, the depth of an imaging pixel refers to the projection distance between the spatial three-dimensional point corresponding to that pixel and the camera's focal point. A depth map is a data format that records the depth of every pixel of an image; according to the depth map corresponding to an image, the pixels of the image can be restored into three-dimensional space to obtain a small patch of point cloud, and with enough images and depth maps a sufficiently dense point cloud can be obtained. MVSNet is a classical deep-learning-based MVS method that follows the idea of the plane-sweep method. Its main advantages are that feature extraction is performed by a convolutional neural network, the high-dimensional cost volume it constructs retains high-dimensional spatial-structure semantic information, the cost volume is regularized by a 3D CNN, its running speed is much higher than that of traditional methods, and it handles low-texture regions better; however, it also has some obvious shortcomings. MVSNet abandons the pixel map and instead performs depth estimation on a feature map. It uses a VGG-style feature extraction network in which multi-layer convolutions progressively shrink the size while extracting image features of different levels; after two downsamplings the feature map resolution is reduced to 1/16 of that of the original image (1/4 in each of width and height), so the width and height of the constructed cost volume are only 1/4 of those of the original image. Since the width and height of the depth map equal those of the cost volume, the area of the finally predicted depth map is only 1/16 of that of the reference image, and the convolution operations also make the edges of the target object overly smooth. To alleviate the reduced resolution and smoothed edges of the depth map, MVSNet adopts an additional 2D CNN upsampling module that refines and upsamples the H/4 × W/4 initial depth map; this process interpolates with the edge features contained in the original image and finally produces an H × W full-size depth map. Because this refinement operates on the initial depth map at the two-dimensional level, the three-dimensional high-level semantic information contained in the cost volume is not effectively utilized.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a visible light multi-view image three-dimensional reconstruction method based on deep learning, improved from the MVSNet network. The batch normalization layers and nonlinear activation function layers in the network are replaced by fused Inplace-ABN layers, reducing video memory occupation. A weighted mean measurement method based on grouped similarity is designed to reduce the feature dimension of the cost volume, yielding a more lightweight cost volume, compressing the network parameters, and cutting both computation and video memory consumption. To address the problem that the depth map resolution is lower than that of the input image because MVSNet works on a low-scale feature map, a feature pyramid module is used to extract multi-scale feature maps, and a staged multi-scale iterative refinement of the depth estimate is designed. While preserving accuracy, multiple rounds of depth iteration reduce the average number of depth planes in the cost volume, so the cost volume attains higher spatial resolution and depth map estimation becomes more accurate. Finally, the output depth maps are filtered and fused to complete the scene three-dimensional reconstruction task.
The technical route adopted by the invention is as follows:
A multi-view image three-dimensional reconstruction method based on deep learning comprises the following steps:
Step 1: performing incremental SfM on the image group of the scene to be predicted, and calculating the camera parameters of each image and the sparse point cloud of the scene to be predicted;
Step 1.1: reading in the image group of the scene to be predicted with the COLMAP program, running the incremental structure-from-motion algorithm, and calculating the camera parameters of each image and the sparse point cloud of the scene to be predicted.
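For illustration only, the SfM of step 1.1 can be scripted around the COLMAP command-line tools roughly as follows; the workspace layout and the choice of the exhaustive matcher are assumptions, not specified by the patent, and the flags should be checked against the installed COLMAP version.

```python
import os
import subprocess

def run_incremental_sfm(image_dir: str, workspace: str) -> str:
    """Sketch of step 1.1: run COLMAP incremental SfM to obtain per-image
    camera parameters and a sparse point cloud. Paths are placeholders."""
    db = os.path.join(workspace, "database.db")
    sparse_dir = os.path.join(workspace, "sparse")
    os.makedirs(sparse_dir, exist_ok=True)
    # 1) detect and describe keypoints in every image
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", db, "--image_path", image_dir], check=True)
    # 2) match features between image pairs (exhaustive matching chosen for simplicity)
    subprocess.run(["colmap", "exhaustive_matcher", "--database_path", db], check=True)
    # 3) incremental mapping: registers images one by one and triangulates the
    #    sparse point cloud, yielding per-image intrinsics and extrinsics
    subprocess.run(["colmap", "mapper", "--database_path", db,
                    "--image_path", image_dir, "--output_path", sparse_dir], check=True)
    return sparse_dir
```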
Step 2: designing an improved depth estimation network based on MVSNet, feeding the images of the scene to be predicted into the network for computation, and obtaining the depth map and probability map corresponding to each image;
Step 2.1: for an original image of size H × W, adopting the same extraction process as the MVSNet feature extractor; after the 32-channel high-dimensional feature map is obtained, applying multi-layer convolution and two rounds of 2× interpolation upsampling, aggregating the result after each interpolation upsampling with the feature map of the same resolution from the previous stage, and finally obtaining feature maps of sizes H/4 × W/4 × 32, H/2 × W/2 × 16 and H × W × 8.
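A minimal PyTorch sketch of the three-scale feature pyramid of step 2.1 is given below. The backbone layers are a simplified stand-in for the MVSNet-style extractor, and the aggregation here uses concatenation followed by a convolution, an assumption since the patent does not spell out the exact fusion operator; only the output sizes (H/4 × W/4 × 32, H/2 × W/2 × 16, H × W × 8) follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    """Sketch of step 2.1: three-scale FPN yielding H/4 x W/4 x 32,
    H/2 x W/2 x 16 and H x W x 8 feature maps."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 8, 3, 1, 1), nn.ReLU(True),
                                    nn.Conv2d(8, 8, 3, 1, 1), nn.ReLU(True))    # H   x W
        self.stage2 = nn.Sequential(nn.Conv2d(8, 16, 5, 2, 2), nn.ReLU(True),
                                    nn.Conv2d(16, 16, 3, 1, 1), nn.ReLU(True))  # H/2 x W/2
        self.stage3 = nn.Sequential(nn.Conv2d(16, 32, 5, 2, 2), nn.ReLU(True),
                                    nn.Conv2d(32, 32, 3, 1, 1), nn.ReLU(True))  # H/4 x W/4
        self.out_low = nn.Conv2d(32, 32, 1)               # low-scale output, 32 channels
        self.merge_mid = nn.Conv2d(32 + 16, 16, 3, 1, 1)  # fuse upsampled low with stage2
        self.merge_high = nn.Conv2d(16 + 8, 8, 3, 1, 1)   # fuse upsampled mid with stage1

    def forward(self, img):
        c1 = self.stage1(img)                              # H   x W   x 8
        c2 = self.stage2(c1)                               # H/2 x W/2 x 16
        c3 = self.stage3(c2)                               # H/4 x W/4 x 32
        f_low = self.out_low(c3)
        up = F.interpolate(c3, scale_factor=2, mode="bilinear", align_corners=False)
        f_mid = self.merge_mid(torch.cat([up, c2], dim=1))
        up = F.interpolate(f_mid, scale_factor=2, mode="bilinear", align_corners=False)
        f_high = self.merge_high(torch.cat([up, c1], dim=1))
        return f_low, f_mid, f_high                        # used by stages 2.4 / 2.5 / 2.6
```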
Step 2.2: for each adjacent view, extracting from the sparse point cloud, according to the camera parameters and sparse point cloud of the scene obtained in step 1, the point cloud set of the area co-visible with the reference view. For each point in this set, the baseline angle formed at the point with respect to the optical centers and principal optical axes of the two cameras is computed, a score is assigned to the point with a piecewise Gaussian function, and the scores of all points are summed into a total score representing the matching degree between the two images.
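The view matching score of step 2.2 can be sketched as follows; the piecewise Gaussian parameters theta0, sigma1 and sigma2 are illustrative values (the patent does not give them), and the score form follows MVSNet-style view selection.

```python
import numpy as np

def pairwise_view_score(points, center_ref, center_src,
                        theta0=5.0, sigma1=1.0, sigma2=10.0):
    """Sketch of step 2.2: score one (reference, adjacent) view pair from the
    sparse points they both observe. theta0/sigma1/sigma2 are illustrative.
    points: (N, 3) co-visible sparse points; center_*: (3,) camera optical centers."""
    score = 0.0
    for p in points:
        v1, v2 = center_ref - p, center_src - p
        cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
        theta = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))  # baseline angle at p
        sigma = sigma1 if theta <= theta0 else sigma2              # piecewise Gaussian
        score += np.exp(-((theta - theta0) ** 2) / (2.0 * sigma ** 2))
    return score
```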
Step 2.3: the 32-channel feature volume obtained by applying the differentiable homography warping to the feature map extracted from each adjacent view is divided into G channel groups, and the similarity between each group and the corresponding channel group of the reference-view feature volume is computed as an inner product, giving a G-channel similarity volume for each adjacent view. The similarity volumes of all adjacent views are then aggregated by a normalized weighted mean, with the matching degree scores as weighting coefficients, finally yielding the G-channel cost volume of the grouped mean measurement.
Step 2.4: using the lowest-scale feature map extracted by the feature pyramid module, 64 depth planes are set uniformly over the depth range of the whole scene, and an H/4 × W/4 × 64 × G cost volume is constructed with the grouped-similarity mean measurement method of step 2.3, where G is the number of groups. The cost volume is then regularized with a 3D CNN to obtain a probability volume, and an H/4 × W/4 coarse depth map is estimated. The batch normalization layer and nonlinear activation function layer after each convolution layer in the 3D CNN are replaced by an Inplace-ABN layer.
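The coarse stage of step 2.4 can be summarized by the sketch below: 64 uniformly spaced depth hypotheses over the scene range, a small 3D CNN regularizer whose Conv3d layers are each followed by an Inplace-ABN layer, and a soft-argmin expectation over the probability volume (the MVSNet convention for depth regression). The 3D CNN is a toy stand-in for the 3D U-Net, and InPlaceABN is assumed to come from the third-party inplace_abn package and to accept the 5-D cost-volume tensors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from inplace_abn import InPlaceABN  # fused batch norm + activation (third-party package)

def uniform_depth_planes(d_min, d_max, num_planes=64):
    """Step 2.4: 64 depth hypotheses spread uniformly over the scene depth range."""
    return torch.linspace(d_min, d_max, num_planes)

class CostRegularization(nn.Module):
    """Toy stand-in for the 3D U-Net regularizer: every Conv3d is followed by an
    InPlaceABN layer instead of separate BatchNorm3d + activation layers."""
    def __init__(self, groups=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(groups, 8, 3, padding=1), InPlaceABN(8),
            nn.Conv3d(8, 8, 3, padding=1), InPlaceABN(8),
            nn.Conv3d(8, 1, 3, padding=1))                 # -> (B, 1, D, H, W)

    def forward(self, cost):                               # cost: (B, G, D, H, W)
        return F.softmax(self.net(cost).squeeze(1), dim=1)  # probability volume over D

def regress_depth(prob, depth_values):
    """Soft-argmin: expected depth under the probability volume.
    prob: (B, D, H, W); depth_values: (D,)"""
    return torch.sum(prob * depth_values.view(1, -1, 1, 1), dim=1)
```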
Step 2.5: the coarse depth map estimated in step 2.4 is upsampled by a factor of 2 with the help of the middle-scale feature map extracted by the feature pyramid module, giving an H/2 × W/2 upsampled depth map. This depth map is taken as a prior depth surface, and 32 equidistant relative depth surfaces are placed in front of and behind it, with 1/128 of the scene depth range as the interval. After the relative depth surfaces are set up, an H/2 × W/2 × 32 × G cost volume is constructed with the grouped-similarity mean measurement method. The cost volume is regularized with the 3D CNN module of step 2.4 to obtain a probability volume, an H/2 × W/2 relative depth map is estimated, and it is superimposed on the bilinearly upsampled prior depth map to obtain the H/2 × W/2 intermediate-level depth map.
Step 2.6: similarly to step 2.5, the intermediate-level depth map output by step 2.5 is upsampled by a factor of 2 with the high-scale feature map extracted by the feature pyramid module to obtain an H × W upsampled depth map. Taking this depth map as the prior depth surface, 8 equidistant relative depth planes are placed in front of and behind it with a plane interval of 1/256 of the scene depth; an H × W × 8 × G cost volume is constructed with the grouped-similarity mean measurement method and regularized with the 3D CNN module of step 2.5 to obtain a probability volume, a relative depth map of size H × W is estimated, and it is superimposed on the bilinearly upsampled intermediate-level depth map to obtain the final depth map.
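Steps 2.5 and 2.6 share the same pattern, sketched below: upsample the previous depth map, place a small number of depth planes at fixed offsets around it, build a new grouped cost volume on the current-scale features, and add the regressed relative depth back onto the upsampled prior. The offsets centered on the prior surface and the callables build_cost_volume and regularize are illustrative placeholders, not names from the patent.

```python
import torch
import torch.nn.functional as F

def refinement_stage(prior_depth, depth_range, num_planes, interval_frac,
                     build_cost_volume, regularize):
    """Sketch of one refinement stage (steps 2.5 / 2.6).
    prior_depth:   (B, h, w) depth map from the previous, coarser stage.
    depth_range:   d_max - d_min for the whole scene.
    num_planes:    32 for the middle stage, 8 for the final stage.
    interval_frac: 1/128 for the middle stage, 1/256 for the final stage.
    build_cost_volume / regularize are placeholders that close over the current-scale
    feature maps (grouped-similarity cost construction and 3D CNN regularization)."""
    # 1) 2x bilinear upsampling of the prior depth map
    up = F.interpolate(prior_depth.unsqueeze(1), scale_factor=2,
                       mode="bilinear", align_corners=False).squeeze(1)    # (B, 2h, 2w)
    # 2) equidistant relative depth surfaces in front of and behind the prior surface
    step = interval_frac * depth_range
    offsets = (torch.arange(num_planes) - num_planes // 2).float() * step  # (D,)
    hypotheses = up.unsqueeze(1) + offsets.view(1, -1, 1, 1)               # (B, D, 2h, 2w)
    # 3) grouped-similarity cost volume -> probability volume -> relative depth
    prob = regularize(build_cost_volume(hypotheses))                       # (B, D, 2h, 2w)
    relative = torch.sum(prob * offsets.view(1, -1, 1, 1), dim=1)
    # 4) depth of this stage = upsampled prior + estimated relative depth
    return up + relative
```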
Step 3: filtering and fusing the depth maps of all images according to geometric consistency to generate the three-dimensional point cloud data of the scene to be predicted.
Step 4: generating the scene's three-dimensional point cloud data;
Step 4.1: after the depth map and probability map of each image are obtained, the depth map is screened with a threshold on the probability map; the pixel depths that pass the threshold are further filtered by two-view geometric consistency, and the filtered depth pixels are fused to obtain the point cloud data.
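A sketch of the filtering in steps 3 and 4 follows; the confidence threshold of 0.5 and the 1-pixel / 1% reprojection-consistency thresholds are illustrative assumptions rather than values from the patent, and reproject is a placeholder for the two-view consistency check.

```python
import numpy as np

def filter_depth(depth_ref, prob_ref, depth_src, reproject,
                 prob_thresh=0.5, pix_thresh=1.0, depth_thresh=0.01):
    """Sketch of steps 3-4: probability thresholding plus two-view geometric
    consistency. reproject(depth_ref, depth_src) is a placeholder expected to
    return, for every reference pixel, the reprojection error in pixels and the
    relative depth difference against the source view."""
    conf_mask = prob_ref > prob_thresh                        # keep confident pixels only
    pix_err, rel_depth_err = reproject(depth_ref, depth_src)  # forward-backward check
    geo_mask = (pix_err < pix_thresh) & (rel_depth_err < depth_thresh)
    mask = conf_mask & geo_mask
    return np.where(mask, depth_ref, 0.0), mask               # filtered depth + validity mask
```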
The method is suitable for three-dimensional reconstruction engineering with visible light multi-view images, such as building model reconstruction and unmanned aerial vehicle photogrammetry.
Compared with the prior art, the invention has the following advantages:
(1) When the MVSNet network reconstructs images, its video memory consumption is excessive, which greatly limits application to high-resolution scenes. In the improved deep-learning MVSNet method of the invention, the batch normalization layers and nonlinear activation function layers in the network are replaced by fused Inplace-ABN layers, reducing video memory occupation.
(2) The designed weighted mean measurement method based on grouped similarity reduces the feature dimension of the cost volume, yielding a more lightweight cost volume, compressing the network parameters, and reducing computation and video memory consumption.
(3) To address the problem that the depth map resolution is lower than that of the input image because MVSNet works on a low-scale feature map, a feature pyramid module is used to extract multi-scale feature maps, and a staged multi-scale iterative refinement of the depth estimate is designed. While preserving accuracy, multiple rounds of depth iteration reduce the average number of depth planes in the cost volume, so the cost volume attains higher spatial resolution and depth map estimation becomes more accurate.
Drawings
Fig. 1 is a diagram of a network architecture of the present invention.
Fig. 2 is a diagram of the feature pyramid network structure of the present invention.
Fig. 3 is a flow chart of the construction of the grouped mean cost volume according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The network structure of this patent is shown in fig. 1. First, the feature extraction structure of MVSNet is improved: pyramid feature extraction is performed on the input reference-view image and adjacent-view images with a Feature Pyramid Network (FPN), yielding a series of feature maps at different scales; these feature maps are then built, from low scale to high scale, into cost volumes of different scales by the mean measurement method based on grouped similarity. After the low-scale cost volume is regularized in 3D and the estimated depth map of that scale is output, this depth map serves as prior depth information for iteratively correcting the depth in the higher-scale cost volume, and after multi-stage multi-scale iteration the depth map with the same resolution as the reference-view image is finally obtained.
Based on the FPN idea, the feature extraction network of MVSNet is improved to extract several feature maps of different scales. As shown in fig. 2, the same extraction process as the original MVSNet feature extractor is first adopted; after the 32-channel high-dimensional feature map is obtained, multi-layer convolution and two rounds of interpolation upsampling are applied, and after each interpolation upsampling the result is aggregated with the feature map of the same resolution from the previous stage, finally producing three feature maps of different scales. For an original image of size H × W, the FPN output feature maps have sizes H/4 × W/4 × 32, H/2 × W/2 × 16 and H × W × 8; they aggregate high-level semantic features and are used to construct the cost volumes of the different stages.
Drawing on the grouped mean measurement mechanism used in binocular matching tasks, the multi-view image depth estimation network is improved by replacing the original variance-based measurement with a mean measurement method based on grouped similarity for constructing the cost volume; the specific flow is shown in fig. 3.
Denote the feature map of the reference view by F_0 and the feature map of the i-th adjacent view by F_i, and write the result of warping F_i by the differentiable homography onto the j-th depth hypothesis plane d_j as F_{i,j}. F_0 and F_{i,j} are each divided into G groups along the channel dimension, and the similarity between corresponding channel groups is computed; the similarity of the g-th group is denoted S^g_{i,j}, with g ∈ {0, 1, ..., G-1}, and is computed as

S^g_{i,j} = (G / 32) · < F_0^g , F_{i,j}^g >

where F_0^g is the g-th channel group of the reference-view feature map F_0, F_{i,j}^g is the g-th channel group of F_{i,j}, <·,·> denotes the inner product, and the factor G/32 normalizes by the 32/G channels contained in each group. Once the similarities of all G groups are computed, they are stacked into the G-channel feature similarity map S_{i,j}. Let the total number of depth hypothesis planes be D, so that j ∈ {0, 1, ..., D-1}; the D feature similarity maps S_{i,j} between the reference image and the i-th adjacent image can then be combined into a W × H × D × G similarity volume V_i. Unlike the feature volume of MVSNet, V_i records how similar the feature map of the adjacent view is to that of the reference view. Accordingly, instead of the variance-based aggregation that MVSNet applies to the feature volumes of the different adjacent views, the similarity volumes V_i are aggregated by a mean-based scheme to obtain the lightweight matching cost volume C. In a variance-based cost volume, a smaller variance at depth plane d indicates a higher probability that the depth value is d; in the mean-based cost volume, a larger mean at depth plane d indicates that the views agree more strongly at d, and hence that the depth value d is more probable. With N adjacent views and the matching degree score w_i of the i-th adjacent view as the weighting coefficient, the aggregation formula is

C = ( Σ_{i=1}^{N} w_i · V_i ) / ( Σ_{i=1}^{N} w_i )

which reduces to the simple mean (1/N) Σ_i V_i when all weights are equal.
the size of the cost body C is W multiplied by H multiplied by D multiplied by G, the size of the cost body can be reduced to the original G/F based on the average value measurement of the grouping similarity, G =8 is set, and compared with the original 32-channel cost body, the operation consumption of a 3D U-Net regularization link is reduced.

Claims (4)

1. A visible light multi-view image three-dimensional reconstruction method based on deep learning, characterized by comprising the following steps:
Step 1: performing incremental SfM on the image group of the scene to be predicted, and calculating the camera parameters of each image and the sparse point cloud of the scene to be predicted;
Step 1.1: reading in the image group of the scene to be predicted with the COLMAP program, running the incremental structure-from-motion algorithm, and calculating the camera parameters of each image and the sparse point cloud of the scene to be predicted.
Step 2: designing an improved depth estimation network based on MVSNet, feeding the images of the scene to be predicted into the network for computation, and obtaining the depth map and probability map corresponding to each image;
Step 2.1: for an original image of size H × W, adopting the same extraction process as the MVSNet feature extractor; after the 32-channel high-dimensional feature map is obtained, applying multi-layer convolution and two rounds of 2× interpolation upsampling, aggregating the result after each interpolation upsampling with the feature map of the same resolution from the previous stage, and finally obtaining feature maps of sizes H/4 × W/4 × 32, H/2 × W/2 × 16 and H × W × 8.
Step 2.2: for each adjacent view, extracting from the sparse point cloud, according to the camera parameters and sparse point cloud of the scene obtained in step 1, the point cloud set of the area co-visible with the reference view; computing, for each point in this set, the baseline angle formed at the point with respect to the optical centers and principal optical axes of the two cameras, assigning the point a score with a piecewise Gaussian function, and summing the scores of all points into a total score representing the matching degree between the two images.
Step 2.3: dividing the 32-channel feature volume obtained by applying the differentiable homography warping to the feature map extracted from each adjacent view into G channel groups, computing the similarity between each group and the corresponding channel group of the reference-view feature volume as an inner product to give a G-channel similarity volume for each adjacent view, and aggregating the similarity volumes of all adjacent views by a normalized weighted mean with the matching degree scores as weighting coefficients, finally obtaining the G-channel cost volume of the grouped mean measurement.
Step 2.4: using the lowest-scale feature map extracted by the feature pyramid module, setting 64 depth planes uniformly over the depth range of the whole scene, and constructing an H/4 × W/4 × 64 × G cost volume with the grouped-similarity mean measurement method of step 2.3, where G is the number of groups; regularizing the cost volume with a 3D CNN to obtain a probability volume and estimating an H/4 × W/4 coarse depth map, the batch normalization layer and nonlinear activation function layer after each convolution layer in the 3D CNN being replaced by an Inplace-ABN layer.
Step 2.5: upsampling the coarse depth map estimated in step 2.4 by a factor of 2 with the middle-scale feature map extracted by the feature pyramid module to obtain an H/2 × W/2 upsampled depth map; taking this depth map as a prior depth surface and placing 32 equidistant relative depth surfaces in front of and behind it, with 1/128 of the scene depth range as the interval; constructing an H/2 × W/2 × 32 × G cost volume with the grouped-similarity mean measurement method; regularizing the cost volume with the 3D CNN module of step 2.4 to obtain a probability volume, estimating an H/2 × W/2 relative depth map, and superimposing it on the bilinearly upsampled prior depth map to obtain the H/2 × W/2 intermediate-level depth map.
Step 2.6: similarly to step 2.5, upsampling the intermediate-level depth map output by step 2.5 by a factor of 2 with the high-scale feature map extracted by the feature pyramid module to obtain an H × W upsampled depth map; taking this depth map as the prior depth surface and placing 8 equidistant relative depth planes in front of and behind it with a plane interval of 1/256 of the scene depth; constructing an H × W × 8 × G cost volume with the grouped-similarity mean measurement method, regularizing it with the 3D CNN module of step 2.5 to obtain a probability volume, estimating a relative depth map of size H × W, and superimposing it on the bilinearly upsampled intermediate-level depth map to obtain the final depth map.
Step 3: filtering and fusing the depth maps of all images according to geometric consistency to generate the three-dimensional point cloud data of the scene to be predicted.
Step 4: generating the scene's three-dimensional point cloud data;
Step 4.1: after the depth map and probability map of each image are obtained, screening the depth map with a threshold on the probability map, filtering the pixel depths that pass the threshold by two-view geometric consistency, and fusing the filtered depth pixels to obtain the point cloud data.
2. The method as claimed in claim 1, wherein step 2.1 uses a feature pyramid network structure to improve the feature extraction network of the MVSNet, extracts the multi-scale image features, and replaces the batch normalization layer and the activation function layer with the Inplace-ABN layer, thereby reducing the consumption of video memory.
3. The method according to claim 1, wherein the matching-weighted mean measurement method designed in step 2.2 converts the feature maps obtained after the differentiable homography warping into G-channel similarity volumes based on grouped similarity, designs the view matching degree algorithm, and aggregates the similarity volumes of the adjacent views into a lightweight cost volume by a matching-degree weighted mean.
4. The method of claim 1, wherein step 2.3 performs multi-stage iteration through multi-scale mean cost volumes aggregated from the multi-scale feature maps, refines the depth map by increasing the spatial resolution, and finally outputs a depth map and a probability map of the same size as the original image.
CN202210845580.9A 2022-07-18 2022-07-18 Visible light multi-view image three-dimensional reconstruction method based on deep learning Pending CN115564888A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210845580.9A CN115564888A (en) 2022-07-18 2022-07-18 Visible light multi-view image three-dimensional reconstruction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210845580.9A CN115564888A (en) 2022-07-18 2022-07-18 Visible light multi-view image three-dimensional reconstruction method based on deep learning

Publications (1)

Publication Number Publication Date
CN115564888A true CN115564888A (en) 2023-01-03

Family

ID=84738586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210845580.9A Pending CN115564888A (en) 2022-07-18 2022-07-18 Visible light multi-view image three-dimensional reconstruction method based on deep learning

Country Status (1)

Country Link
CN (1) CN115564888A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091712A (en) * 2023-04-12 2023-05-09 安徽大学 Multi-view three-dimensional reconstruction method and system for computing resource limited equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination