CN115511759A - Point cloud image depth completion method based on cascade feature interaction

Point cloud image depth completion method based on cascade feature interaction

Info

Publication number
CN115511759A
Authority
CN
China
Prior art keywords
scene
point cloud
image
depth
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211167454.9A
Other languages
Chinese (zh)
Inventor
梁韵基
陈能真
刘磊
於志文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202211167454.9A priority Critical patent/CN115511759A/en
Publication of CN115511759A publication Critical patent/CN115511759A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G06T7/55 - Depth or shape recovery from multiple images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10004 - Still image; Photographic image
    • G06T2207/10012 - Stereo images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10032 - Satellite or aerial image; Remote sensing
    • G06T2207/10044 - Radar image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a point cloud image depth completion method based on cascade feature interaction, belonging to the field of automatic driving. The method comprises the following steps: acquiring a three-dimensional point cloud and a two-dimensional RGB image of an automatic driving scene; constructing encoders from a plurality of residual modules and decoders from a plurality of up-sampling modules, so as to build separate neural networks for the point cloud and the image; constructing a plurality of cascade feature interaction modules between the point cloud and image neural networks to obtain a feature-interaction point cloud and image double-branch neural network model; inputting the scene three-dimensional point cloud and the scene two-dimensional RGB image into the model, each branch outputting a dense scene depth map; and fusing the depth maps output by the two branches by confidence map weighting to obtain a depth map with higher reliability. Compared with other models based on image and point cloud fusion, the method achieves better depth perception performance when the image and a low-beam lidar point cloud are taken as input.

Description

Point cloud image depth completion method based on cascade feature interaction
Technical Field
The invention relates to the technical field of automatic driving, in particular to a point cloud image depth completion method based on cascade feature interaction.
Background
Depth perception is a fundamental and important perception technology in automatic driving systems. Its purpose is to acquire accurate and dense depth information of the surrounding scene; based on such dense depth information, many high-level perception tasks of automatic driving, such as semantic segmentation, target detection and three-dimensional scene reconstruction, can obtain large performance improvements. Automatic driving at the present stage mainly relies on two sensors, the camera and the lidar, for depth perception, and the two have complementary advantages and disadvantages: image data collected by the camera provides rich texture and color information of the scene but is strongly affected by illumination conditions, while point cloud data collected by the lidar provides accurate depth information of the scene and is not affected by illumination, but is very sparse and cannot provide sufficient effective information.
In the prior art, there are depth perception schemes based on pure images and depth perception schemes based on images and lidar point clouds.
Among the depth perception schemes based on pure images is monocular depth estimation, which, as the name implies, estimates the distance of each pixel in an image from the capturing camera using an RGB image from a single viewing angle. Monocular depth estimation based on supervised learning directly takes a two-dimensional image as input, a depth map as output and a ground truth depth map as supervision to train the depth model. In addition, because depth label data are difficult to acquire, many current algorithms are based on unsupervised models, i.e., they are trained jointly using only binocular image data collected by two cameras. The two binocular views can be predicted from each other to obtain the corresponding disparity data, from which the depth is derived using the relation between disparity and depth; alternatively, the correspondence of each pixel between the binocular images is treated as a stereo matching problem for training.
In depth perception schemes based on images and lidar point clouds, considering that the camera and the lidar have complementary advantages and disadvantages, existing automatic driving perception systems are usually based on multi-sensor fusion, complementing the strengths of the two kinds of sensor data by fusing image and point cloud data so as to improve depth perception capability. According to the stage at which fusion takes place, existing heterogeneous multi-sensor fusion schemes can be divided into three modes: early fusion, middle fusion and late fusion. Early fusion, also called data-layer fusion, fuses the two kinds of sensing data at the raw data level; it is simple to implement but has obvious drawbacks: information interaction and complementarity between the two modalities are not fully realized, the improvement from fusion is limited, and the fused result is sometimes even worse than that of a single perception modality. Feature-layer fusion, also called middle fusion, first extracts features from each kind of sensing data separately and then fuses the extracted features; its advantage is that a network can be designed for each sensing modality to fully extract its features, but it still cannot effectively realize full interaction between the two kinds of sensing data.
Current depth estimation methods based on pure images can be divided into traditional methods, machine learning-based methods and deep learning-based methods. Traditional methods are based on binocular or multi-view images: using stereo matching, the disparity between two images is converted into depth by triangulation, so that scene depth is estimated from the images. In monocular depth estimation based on machine learning, a probabilistic graphical model such as a Markov Random Field (MRF) is built over the depth relations, and depth estimation is realized by minimizing an energy function. Deep learning-based methods are currently the most widely used: a model is trained on input RGB images to learn the mapping from an image to a depth map. The disadvantage of this approach is that model performance depends heavily on data quality, so performance may degrade severely under poor lighting conditions, such as at night or in tunnels.
Schemes based on the fusion of images and lidar point clouds are the mainstream of automatic driving depth perception at the present stage and overcome the shortcomings of pure image schemes. In current point cloud and image fusion depth perception techniques, schemes based on early fusion can retain the original information of the data to the greatest extent, but existing techniques find it difficult to achieve fine-grained spatial alignment and fusion of heterogeneous sensing data, which often leads to poor fusion results. Schemes based on late fusion combine the perception results of the two sensors at the decision level; they are simple to implement, but each sensor remains limited on its own, the two modalities lack interaction and their advantages cannot complement each other, so the fusion effect is poor and is sometimes even worse because the perception results of the two sensors contradict each other. The more widely used scheme at present is multi-modal fusion at the feature level, whose advantage is that spatial alignment of the data does not need to be considered; however, the fusion granularity of the various existing feature-level fusion techniques is still not fine enough, and one modality is often used merely as auxiliary supplementary information for the other or fused only by simple addition, so the interaction between the two modalities is insufficient and the fusion is inadequate.
Disclosure of Invention
In order to solve the problems of insufficient depth perception accuracy and poor fusion of heterogeneous perception data in the above schemes, and to achieve fine-grained fusion and sufficient interaction of point cloud and image sensor data, a double-branch heterogeneous perception data cascade interaction network is provided. Corresponding features of the two modalities are fused at multiple scales, and the fused features are fed back into the branch networks of the respective modalities, which improves the information richness and the depth perception capability of both branch networks. In addition, the idea of an auxiliary task is introduced: an image reconstruction task guides the model to learn the scene structure information in the image, so that the structure information of the output depth map is more complete. Finally, the high-confidence depth values in the output depth maps of the two branch networks are selected through confidence maps and output as the final fusion perception result of the model.
The embodiment of the invention provides a point cloud image depth completion method based on cascade feature interaction, which comprises the following steps:
acquiring an automatic driving scene three-dimensional point cloud and a scene two-dimensional RGB image;
constructing two encoders for extracting characteristics of scene three-dimensional point cloud and scene two-dimensional RGB images according to a plurality of Resnet34 residual modules;
constructing two decoders for performing feature restoration on the scene three-dimensional point cloud and the scene two-dimensional RGB image according to the plurality of up-sampling modules;
connecting the encoder and the decoder of the scene three-dimensional point cloud extraction and restoration branch to construct a scene three-dimensional point cloud branch neural network;
connecting the encoder and the decoder of the scene two-dimensional RGB image extraction and restoration branch to construct a scene two-dimensional RGB image branch neural network;
setting the levels of the residual modules of the two encoders in the scene three-dimensional point cloud branch neural network and the scene two-dimensional RGB image branch neural network so that they correspond to each other;
constructing a plurality of cascade feature interaction modules, wherein the input of each cascade feature interaction module is connected with the output of the corresponding level of the residual modules of the two encoders and the output of each cascade feature interaction module is connected with the next corresponding level of the two encoders, so as to construct a feature-interaction point cloud and image double-branch neural network model;
inputting the scene three-dimensional point cloud and the scene two-dimensional RGB image into the feature-interaction point cloud and image double-branch neural network model, and outputting two scene depth maps (one from each branch);
and fusing the two scene depth maps by confidence map weighting to obtain a new scene depth map.
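For illustration, the following is a minimal PyTorch sketch of the double-branch layout described above. It is a simplification under stated assumptions, not the patented implementation: the class and parameter names (EncoderStage, DualBranchDepthNet, the channel widths) are illustrative, the point cloud branch uses standard rather than sparse convolutions, the level-wise fusion is reduced to a 1x1 convolution placeholder (the cascade feature interaction module itself is sketched further below), and each upsampling module uses a ReLU activation in place of the pooling layer.

```python
import torch
import torch.nn as nn

class EncoderStage(nn.Module):
    """One residual encoder stage (3x3 convolutions, stride-2 downsampling)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=2, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.skip(x))

class DualBranchDepthNet(nn.Module):
    """Point cloud and image branches coupled level by level by fusion modules."""
    def __init__(self, channels=(32, 64, 128, 256, 512)):
        super().__init__()
        pc_chs, im_chs = (1,) + channels, (3,) + channels   # sparse depth: 1 ch, RGB: 3 ch
        self.pc_enc = nn.ModuleList([EncoderStage(pc_chs[i], pc_chs[i + 1]) for i in range(5)])
        self.im_enc = nn.ModuleList([EncoderStage(im_chs[i], im_chs[i + 1]) for i in range(5)])
        # placeholder fusion; the cascade feature interaction module is sketched after its description below
        self.fuse = nn.ModuleList([nn.Conv2d(2 * c, c, 1) for c in channels])
        self.pc_dec, self.im_dec = self._decoder(channels), self._decoder(channels)

    @staticmethod
    def _decoder(channels):
        chs = list(channels[::-1]) + [32]
        layers = []
        for i in range(5):  # five upsampling modules: transposed conv + batch norm + activation
            layers += [nn.ConvTranspose2d(chs[i], chs[i + 1], 4, stride=2, padding=1),
                       nn.BatchNorm2d(chs[i + 1]), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(32, 2, 3, padding=1))  # per-branch depth map + confidence map
        return nn.Sequential(*layers)

    def forward(self, sparse_depth, rgb):
        f_pc, f_im = sparse_depth, rgb
        for pc_stage, im_stage, fuse in zip(self.pc_enc, self.im_enc, self.fuse):
            f_pc, f_im = pc_stage(f_pc), im_stage(f_im)
            fused = fuse(torch.cat([f_pc, f_im], dim=1))  # fused features re-enter both branches
            f_pc, f_im = f_pc + fused, f_im + fused
        return self.pc_dec(f_pc), self.im_dec(f_im)
```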
Preferably, the two encoders for extracting the features of the scene three-dimensional point cloud and the scene two-dimensional RGB image each comprise five cascaded residual modules, and the two decoders for restoring the features of the scene three-dimensional point cloud and the scene two-dimensional RGB image each comprise five cascaded upsampling modules. In the encoder for extracting the features of the scene three-dimensional point cloud, the convolutional neural network of the residual modules adopts a sparse convolutional neural network with a 3x3 convolution kernel; in the encoder for extracting the features of the scene two-dimensional RGB image, the convolutional neural network of the residual modules adopts a standard convolutional neural network with a 3x3 convolution kernel.
Preferably, the three-dimensional point cloud branched neural network and the scene two-dimensional RGB image branched neural network each comprise a plurality of different convolutional layers, pooling layers, activation layers, transpose convolutional layers and cross-scale feature connection layers.
Preferably, each decoder comprises five cascaded upsampling modules, each comprising one transposed convolution, one batch normalization layer and one pooling layer.
Preferably, the number of the cascade feature interaction modules is five, each cascade feature interaction module comprises a 1x1 convolution, three dilated convolutions with dilation rates of 1, 2 and 4 respectively, and a 1x1 convolution, and the output of the last cascade feature interaction module is used as the input of the first up-sampling layer of the feature-interaction point cloud and image double-branch neural network model.
Preferably, the method further comprises the following steps:
taking the reconstructed image output by the last up-sampling module as an auxiliary task, calculating the difference between the reconstructed image and the input scene two-dimensional RGB image with the L2 loss function, and training the model to learn the structural information of the image.
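A possible form of this auxiliary reconstruction head, sketched under assumptions: the channel widths and the use of bilinear upsampling before each convolution are illustrative choices, since the text only states that each of the five upsampling modules consists of a convolution layer, a normalization layer and an activation layer and that the head outputs the reconstructed RGB image.

```python
import torch.nn as nn

def reconstruction_head(in_channels=512, base=256):
    """Auxiliary decoder that reconstructs the input RGB image from the output
    of the last cascade feature interaction module (channel widths are assumptions)."""
    chs = [in_channels, base, base // 2, base // 4, base // 8, base // 16]
    layers = []
    for i in range(5):  # five upsampling modules: upsample + conv + batch norm + activation
        layers += [nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                   nn.Conv2d(chs[i], chs[i + 1], 3, padding=1),
                   nn.BatchNorm2d(chs[i + 1]),
                   nn.ReLU(inplace=True)]
    layers.append(nn.Conv2d(chs[5], 3, 3, padding=1))  # 3-channel reconstructed image
    return nn.Sequential(*layers)
```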
Preferably, the L2 loss function comprises:
L2 = Σ_i (D_i - D_i^gt)^2

wherein D_i represents the depth value at the i-th position of the predicted depth map, and D_i^gt represents the depth value at the i-th position of the ground truth depth map.
Preferably, the method further includes training the feature-interaction point cloud and image double-branch neural network model, which includes:
taking point cloud and image data pairs as the training data set;
performing enhancement processing on the images in the data set, the enhancement processing comprising flipping, cropping, brightness adjustment, normalization and conversion to tensors;
initializing the model parameters with a random Gaussian distribution;
and setting a loss function for model training and a loss function for the reconstructed image, adding the two loss functions with respective coefficients, taking the minimized loss as the optimization target, and training the model through a gradient update strategy to obtain the optimal model parameters.
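A hedged sketch of one training update with the combined loss follows. The loss weights W_DEPTH and W_RECON, the masking of invalid (zero) ground-truth pixels, and the assumed model interface (two branch depth maps plus the reconstructed image) are illustrative assumptions, not values given in the text.

```python
import torch
import torch.nn.functional as F

# hypothetical loss weights; the text says each loss gets its own coefficient
# but does not give values
W_DEPTH, W_RECON = 1.0, 0.1

def masked_l2(pred, gt):
    """Squared error on pixels with valid (non-zero) ground-truth depth."""
    valid = gt > 0
    return ((pred[valid] - gt[valid]) ** 2).mean()

def training_step(model, optimizer, sparse_depth, rgb, gt_depth):
    """One gradient update on the combined depth + reconstruction loss.
    Assumes a model interface returning the two branch depth maps and the
    reconstructed RGB image; this interface is illustrative."""
    optimizer.zero_grad()
    depth_pc, depth_im, recon_rgb = model(sparse_depth, rgb)
    loss = (W_DEPTH * (masked_l2(depth_pc, gt_depth) + masked_l2(depth_im, gt_depth))
            + W_RECON * F.mse_loss(recon_rgb, rgb))   # auxiliary reconstruction term
    loss.backward()
    optimizer.step()
    return loss.item()
```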
Preferably, fusing the two scene depth maps by confidence map weighting to obtain a new scene depth map includes:
respectively acquiring the two estimated depth values at corresponding positions of the two scene depth maps output by the point cloud and image double-branch neural network;
calculating the confidences of the two estimated depth values at the corresponding positions of the two scene depth maps;
multiplying each confidence by the depth value at the corresponding position of its scene depth map;
and adding the two products at each position to obtain the fused depth value, thereby obtaining a new scene depth map.
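A minimal sketch of this confidence-map weighted fusion, assuming each branch outputs a depth map and a confidence map of the same resolution; normalizing the two confidences with a pixel-wise softmax is an assumption, since the text only states that the confidences weight the depth values before they are added.

```python
import torch

def fuse_depth_maps(depth_pc, conf_pc, depth_im, conf_im):
    """Confidence-weighted fusion of the two branch outputs at every pixel."""
    weights = torch.softmax(torch.stack([conf_pc, conf_im], dim=0), dim=0)
    return weights[0] * depth_pc + weights[1] * depth_im
```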
The embodiment of the invention provides a point cloud image depth completion method based on cascade feature interaction, which has the following beneficial effects compared with the prior art:
In the point cloud image depth completion method based on cascade feature interaction, the fine-grained fusion of multi-scale point cloud and image features greatly improves the degree of interaction between the two heterogeneous sensing data and realizes their complementary advantages. The fused features are fed back into the corresponding branch networks, which enriches the information of the two branches and improves their perception capability. Finally, the outputs of the two branch networks are fused, and at each position the depth value with the higher confidence in the two output depth maps is taken as the depth value of the final output depth map. In addition, the outputs of the two modalities are independent of each other, with neither branch serving merely as an auxiliary to the other, which improves the robustness of the model to noise. Finally, compared with other models based on image and point cloud fusion, the model performs better when the image and a low-beam lidar point cloud are taken as input, which also proves that the model can be applied on resource-limited devices equipped with only a camera and a low-beam, low-cost lidar.
Drawings
FIG. 1 is a model structure diagram of a point cloud image depth completion method based on cascade feature interaction according to an embodiment of the present invention;
FIG. 2 is a structural diagram of the cascade feature interaction module according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example:
referring to fig. 1-2, the embodiment provides a point cloud image depth complementing method based on cascade feature interaction, and a multi-scale fine-grained fusion model is constructed by using the characteristics that an image has color and texture information, but is influenced by illumination, and point cloud is not influenced by illumination and information is sparse, so that full fusion and advantage complementation of image and point cloud data are realized, and the sensing capability of small objects is greatly improved. In the embodiment, the image reconstruction auxiliary task is introduced to guide the model to learn the structural information in the image, so that the object contour of the output depth map is more complete. And fusing the output depth maps of the two branches by using a confidence map weighting mode to obtain a depth map with higher credibility.
Step1: building a multi-scale double-branch neural network model using the residual modules of Resnet34;
Step101: the encoder part of each branch network consists of five residual blocks, and the decoder part consists of five upsampling modules;
Step102: the two branch networks comprise a number of different convolution layers, pooling layers, activation layers, transposed convolution layers and cross-scale feature connections; in the encoder part of the point cloud branch the five residual blocks use sparse convolutions, in the encoder part of the image branch the five residual blocks use standard convolutions, and all convolution kernels are of size 3x3;
Step103: each upsampling module of the decoder consists of a transposed convolution, a batch normalization layer and a pooling layer;
Step104: five cascade feature interaction modules are arranged between the two branch networks; from top to bottom, each module consists of a 1x1 convolution, three dilated convolutions with dilation rates of 1, 2 and 4, and a 1x1 convolution, and its input is the feature maps of the corresponding level of the two branch networks;
Step105: five upsampling modules are connected after the last cascade feature interaction module, each consisting of a convolution layer, a normalization layer and an activation layer; the output of the last upsampling module is the reconstructed input image, which serves as an auxiliary task: the difference between the reconstructed image and the input RGB image is computed with the L2 loss function so that the model is trained to learn the structural information of the image;
the working process is as follows: each network of the double-branch network has an output depth map, and the depth values of the corresponding positions of the final output depth map are obtained by calculating the confidence degrees of two estimated depth values of the corresponding positions of the two depth maps, multiplying the confidence degrees by the depth values of the corresponding depth maps respectively and adding the two estimated depth values.
Step2: enhancing the images in the dataset, where the enhancement operations include flipping, cropping and brightness adjustment, followed by normalization and conversion to tensors, yielding a training dataset convenient for processing by the deep learning convolutional neural network;
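A possible augmentation pipeline using torchvision, under assumptions: the crop size, flip probability and brightness jitter strength are illustrative, and in practice the geometric transforms (flip, crop) must be applied jointly to the RGB image, the sparse depth input and the ground truth so that they stay aligned; the sketch below shows the RGB side only.

```python
from torchvision import transforms

# illustrative augmentation for the RGB input; exact parameters are not given in the text
rgb_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop((352, 1216)),
    transforms.ColorJitter(brightness=0.2),
    transforms.ToTensor(),                      # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```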
step3: the real scene autopilot dataset used in this example is a KITTI2015 depth estimation and depth completion dataset that includes left and right images of a binocular camera, a lidar point cloud, and a ground truth depth map. The image input into the model is cropped to HxW to 325x1216 resolution. In order to accelerate the training speed of the model, the input image is normalized by zero mean value in the present example. The model parameters are initialized with random Gaussian distribution before training is started, and the model performance can be enhanced by enough randomness. The specific parameter settings for this example during training are as follows:
Parameter name                    Parameter value
Batch size                        16
Input image resolution (H x W)    352x1216
Number of training epochs         30
Learning rate                     1e-4
Effective depth value range (m)   0-80
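The same settings expressed as an illustrative configuration dictionary (the key names are assumptions, the values are taken from the table above):

```python
TRAIN_CONFIG = {
    "batch_size": 16,
    "input_resolution_hw": (352, 1216),
    "epochs": 30,
    "learning_rate": 1e-4,
    "valid_depth_range_m": (0.0, 80.0),
}
```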
Step4: according to the established double-branch network model, set the loss function of model training and the loss function of image reconstruction, add the two loss functions with their respective coefficients, take the minimized total loss as the optimization target, and train the model with a gradient update strategy to obtain the optimal model parameters.
The loss function used in this example is the L2 loss function:
L2 = Σ_i (D_i - D_i^gt)^2

where D_i represents the depth value at the i-th position of the predicted depth map and D_i^gt represents the depth value at the i-th position of the ground truth depth map. In this example, an Adam optimizer is used to optimize the model parameters so as to minimize the loss function. The Adam optimization process can be summarized as follows: at each iteration, the first-moment (mean) and second-moment (uncentered variance) estimates of the gradients are used to dynamically adjust the learning rate of each parameter, so that parameter updates are more stable during training and the model loss decreases steadily.
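The corresponding optimizer setup as a sketch; betas and epsilon are left at the PyTorch defaults since the text does not specify them:

```python
import torch

model = DualBranchDepthNet()  # the double-branch sketch defined earlier in this description
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```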
Step5: input the point cloud and image data pairs into the double-branch network model to obtain the final output depth map.
In the point cloud image fusion depth completion method based on cascade feature interaction of this embodiment, networks specifically designed for the two kinds of sensing data are built on convolutional neural networks to extract the features of the two modalities separately, and the cascade feature interaction modules realize fine-grained fusion of multi-scale point cloud and image features. This fully improves the degree of interaction between the two modalities, enhances the depth perception capability of the two branch networks, and improves the robustness of the model to noise. After the image reconstruction task is introduced, it guides the model to learn the scene structure information in the image, and the output depth map of the model preserves object contours more completely. In comparisons on the KITTI2015 depth completion and depth estimation tasks, the model achieves the best performance on the depth estimation task and competitive performance on the depth completion task, and it also achieves strongly competitive results in a robustness experiment with added Gaussian noise, which demonstrates the practicality of the method.
Although the present invention has been described in detail with reference to the specific embodiments, it should be understood that various changes and modifications can be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims (9)

1. A point cloud image depth completion method based on cascade feature interaction is characterized by comprising the following steps:
acquiring a three-dimensional point cloud of an automatic driving scene and a two-dimensional RGB image of the scene;
according to a plurality of cascaded residual modules of Resnet34, two encoders for extracting characteristics of scene three-dimensional point cloud and scene two-dimensional RGB images are constructed;
according to a plurality of cascaded up-sampling modules, two decoders for performing feature restoration on the scene three-dimensional point cloud and the scene two-dimensional RGB image are constructed;
connecting the output of the encoder of the scene three-dimensional point cloud extraction and restoration branch with the input of the corresponding decoder to construct a scene three-dimensional point cloud branch neural network;
connecting the output of the encoder of the scene two-dimensional RGB image extraction and restoration branch with the input of the corresponding decoder to construct a scene two-dimensional RGB image branch neural network;
setting levels of residual modules of two encoders in a scene three-dimensional point cloud branched neural network and a scene two-dimensional RGB image branched neural network in a one-to-one correspondence manner;
building each feature interaction module from, in sequence, a 1x1 convolution, three dilated convolutions with dilation rates of 1, 2 and 4 respectively, and a 1x1 convolution, and cascading them to obtain a plurality of cascade feature interaction modules, wherein the input of each cascade feature interaction module is connected with the output of the corresponding level of the residual modules of the two encoders and the output of each cascade feature interaction module is connected with the next corresponding level of the two encoders, so as to construct a feature-interaction point cloud and image double-branch neural network model;
inputting the scene three-dimensional point cloud and the scene two-dimensional RGB image into the feature-interaction point cloud and image double-branch neural network model, and outputting two scene depth maps;
and fusing the two scene depth maps by confidence map weighting to obtain a new scene depth map.
2. The point cloud image depth completion method based on cascade feature interaction as claimed in claim 1, wherein the two encoders for extracting the features of the scene three-dimensional point cloud and the scene two-dimensional RGB image each comprise five cascaded residual modules, and the two decoders for restoring the features of the scene three-dimensional point cloud and the scene two-dimensional RGB image each comprise five cascaded upsampling modules; in the encoder for extracting the features of the scene three-dimensional point cloud, the convolutional neural network of the residual modules adopts a sparse convolutional neural network with a 3x3 convolution kernel; in the encoder for extracting the features of the scene two-dimensional RGB image, the convolutional neural network of the residual modules adopts a standard convolutional neural network with a 3x3 convolution kernel.
3. The method of claim 1, wherein the three-dimensional point cloud branched neural network and the scene two-dimensional RGB image branched neural network each comprise a plurality of different convolutional layers, pooling layers, activation layers, transpose convolutional layers, and cross-scale feature connection layers.
4. The method of claim 1, wherein each decoder comprises five cascaded upsampling modules, each upsampling module comprising a transposed convolution, a batch normalization layer, and a pooling layer.
5. The method of claim 4, wherein the number of the cascade feature interaction modules is five, and the output of the last cascade feature interaction module is used as the input of the first up-sampling layer of the feature-interaction point cloud and image double-branch neural network model.
6. The method of claim 4, wherein the method of point cloud image depth completion based on cascade feature interaction further comprises:
and taking the reconstructed image output by the last up-sampling module as an auxiliary task, calculating the difference between the reconstructed image and the input scene two-dimensional RGB image according to the L2 loss function, and training a model to learn the structural information of the image.
7. The method of claim 6, wherein the L2 loss function comprises:
L2 = Σ_i (D_i - D_i^gt)^2

wherein D_i represents the depth value at the i-th position of the predicted depth map, and D_i^gt represents the depth value at the i-th position of the ground truth depth map.
8. The method of claim 7, further comprising training a point cloud and image dual-branch neural network model of feature interaction, which comprises:
taking point cloud and image data pairs as the training data set;
performing enhancement processing on the images in the data set, the enhancement processing comprising flipping, cropping, brightness adjustment, normalization and conversion to tensors;
initializing the model parameters with a random Gaussian distribution;
and setting a loss function for model training and a loss function for the reconstructed image, adding the two loss functions with respective coefficients, taking the minimized loss as the optimization target, and training the model through a gradient update strategy to obtain the optimal model parameters.
9. The method of claim 1, wherein fusing the two scene depth maps by confidence map weighting to obtain a new scene depth map comprises:
respectively acquiring the two estimated depth values at corresponding positions of the two scene depth maps output by the point cloud and image double-branch neural network;
calculating the confidences of the two estimated depth values at the corresponding positions of the two scene depth maps;
multiplying each confidence by the depth value at the corresponding position of its scene depth map;
and adding the two products at each position to obtain the fused depth value, thereby obtaining a new scene depth map.
CN202211167454.9A 2022-09-23 2022-09-23 Point cloud image depth completion method based on cascade feature interaction Pending CN115511759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211167454.9A CN115511759A (en) 2022-09-23 2022-09-23 Point cloud image depth completion method based on cascade feature interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211167454.9A CN115511759A (en) 2022-09-23 2022-09-23 Point cloud image depth completion method based on cascade feature interaction

Publications (1)

Publication Number Publication Date
CN115511759A true CN115511759A (en) 2022-12-23

Family

ID=84506860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211167454.9A Pending CN115511759A (en) 2022-09-23 2022-09-23 Point cloud image depth completion method based on cascade feature interaction

Country Status (1)

Country Link
CN (1) CN115511759A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861401A (en) * 2023-02-27 2023-03-28 之江实验室 Binocular and point cloud fusion depth recovery method, device and medium
CN116503418A (en) * 2023-06-30 2023-07-28 贵州大学 Crop three-dimensional target detection method under complex scene
CN116503418B (en) * 2023-06-30 2023-09-01 贵州大学 Crop three-dimensional target detection method under complex scene


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination