CN114998683A - Attention mechanism-based ToF multipath interference removing method - Google Patents


Info

Publication number
CN114998683A
Authority
CN
China
Prior art keywords
attention
network
loss
multipath interference
map
Prior art date
Legal status
Granted
Application number
CN202210622444.3A
Other languages
Chinese (zh)
Other versions
CN114998683B (en)
Inventor
周文彪
王鑫
贾云飞
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN202210622444.3A
Publication of CN114998683A
Application granted
Publication of CN114998683B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an attention mechanism-based method for removing ToF multipath interference. It belongs to the technical fields of deep-learning computer vision and image enhancement. The method comprises four steps. First, an initial sample data set is acquired and the depth images are read and preprocessed. Second, a network model for removing multipath interference is built; it introduces an attention mechanism that captures the spatial distribution of multipath interference in the scene and generates an attention map. Third, the loss function and learning scheme are optimized, the network model is trained, and its parameters are updated. Fourth, the model is tested on synthetic depth maps with ground truth and on depth maps of real scenes to obtain images free of multipath interference, which demonstrates the robustness of the network. The invention achieves high-performance removal of ToF multipath interference errors. It also offers advantages in applicability and memory consumption and generalizes well.

Description

Attention mechanism-based ToF multipath interference removing method
Technical Field
The invention belongs to the technical field of deep learning computer vision and image enhancement, and particularly relates to a ToF multipath interference removing method based on an attention mechanism.
Background
In recent years, 3D imaging and sensor technologies have developed rapidly, and related research results have been widely applied in fields such as face recognition, three-dimensional reconstruction and medical assistance. A ToF (Time-of-Flight) camera actively acquires scene depth information and accurately represents the three-dimensional structure of a scene and the surfaces of objects in it. This has made it a popular depth-sensing technique. Unlike traditional two-dimensional images, images with added depth information enable a wide range of 3D vision applications.
A ToF camera consists of an image sensor, an image-processing chip and a modulated light source. The ToF sensor first sends a modulation signal to the light-source driver chip. This signal drives the laser to emit modulated near-infrared light, and the receiver computes depth from the phase difference or time difference between the emitted and received light. Most practical systems use continuous-wave modulated iToF (indirect ToF): the phase difference is recovered by sampling, the time of flight is computed from it, and the distance to the target object follows.

In the ideal iToF imaging case, the emitted signal is reflected only once in the scene. The signal received by the sensor then contains only this primary reflection plus ambient light. In actual imaging, however, the varied geometry and materials of objects in the scene cause light to be reflected and refracted many times. The received signal may therefore contain several (even countless) component signals. This attenuates the signal and shifts its phase, which degrades the accuracy of depth measurement and the quality of three-dimensional reconstruction. The phenomenon is called Multi-Path Interference (MPI). It is the biggest obstacle to the widespread use of ToF, so the MPI error must be removed.
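The Python sketch below makes the phase-to-depth relation and the MPI bias concrete. It assumes the standard continuous-wave iToF relation d = c·Δφ/(4πf) and models the received signal as a sum of phasors, one per optical path. The 20 MHz modulation frequency, the path amplitudes and the path lengths are hypothetical values chosen for illustration. They are not parameters from this patent.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s

def depth_to_phase(depth_m, f_mod_hz):
    # round trip 2*d at modulation frequency f gives phase 4*pi*f*d/c
    return 4.0 * math.pi * f_mod_hz * depth_m / C

def phase_to_depth(phase_rad, f_mod_hz):
    # ideal continuous-wave iToF: d = c * phi / (4 * pi * f)
    return C * phase_rad / (4.0 * math.pi * f_mod_hz)

def measured_depth(paths, f_mod_hz):
    """paths: list of (amplitude, path_depth_m); the sensor only sees the phasor sum."""
    total = sum(a * cmath.exp(1j * depth_to_phase(d, f_mod_hz)) for a, d in paths)
    phase = cmath.phase(total) % (2.0 * math.pi)  # wrap into [0, 2*pi)
    return phase_to_depth(phase, f_mod_hz)

f = 20e6               # hypothetical 20 MHz modulation frequency
direct = (1.0, 2.0)    # direct return from a surface 2.0 m away
indirect = (0.3, 2.6)  # weaker multipath return with an effective depth of 2.6 m
print(measured_depth([direct], f))            # single path: true depth
print(measured_depth([direct, indirect], f))  # MPI pulls the estimate upward
```

With a single path, the recovered depth equals the true 2.0 m. Adding the weaker indirect path shifts the phase of the summed signal, so the estimate comes out at roughly 2.14 m. This overestimation is the kind of error the method aims to remove.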
Research on removing ToF multipath interference has continued ever since ToF imaging was introduced. Early MPI correction relied mainly on traditional physics-based algorithms. These exploit the sparsity of optical reflections and use measurements at multiple modulation frequencies to solve for the depth and amplitude of each signal component. Other approaches modify the hardware and recover the direct-path component from multiple measurements under single-frequency coded illumination. These physical methods are restricted in the types of reflection and the number of optical paths they can handle, and they are slow to compute, so more general methods are needed.

In recent years, machine-learning systems have matured and neural networks have succeeded in many vision tasks. As a result, a growing number of researchers have built Convolutional Neural Networks (CNNs) to correct MPI. Such methods need no hardware modification, have relatively simple algorithm designs, and remove errors more effectively than traditional methods. However, most current network models for removing MPI from ToF images do not fully exploit the spatial distribution of MPI features in the scene. It is therefore valuable to focus on the regions most severely affected by multipath interference and, on top of globally removing MPI from the depth map, to strengthen the recovery of local depth information. This is one of the goals of the present method.
Disclosure of Invention
The invention provides an attention mechanism-based method for removing ToF multipath interference. It addresses the large depth-measurement errors that multipath interference causes during ToF imaging and further improves the effectiveness of multipath interference removal.
In order to achieve the purpose, the invention adopts the following technical scheme:
The ToF multipath interference removing method comprises the following steps:
S1, acquiring an initial sample data set and dividing it into a training set and a test set;
the training set comprises pairs of a depth map subjected to multipath interference and a reference depth map;
the test set consists of depth maps subjected to multipath interference;
S2, reading each pair of an MPI-affected depth map and its reference depth map from the training set, and computing their difference to obtain a multipath interference error map;
S3, thresholding the error map to generate an attention mask map;
wherein the attention mask map is a binary map whose pixel values are 0 or 1;
S4, constructing an attention mechanism-based network model for removing multipath interference, comprising an attention network, a generation network and a discrimination network, through the following substeps:
S41, constructing the attention network: the sub-attention map output by each residual attention module is concatenated with the MPI-affected depth map and fed into the next residual attention module, and an MPI (Multi-Path Interference) attention map is obtained after several such modules;
wherein the MPI attention map is non-binary, and all its pixel values lie between 0 and 1;
S42, constructing the generation network: the MPI attention map produced by the attention network is concatenated with the MPI-affected depth map and fed into the generation network to obtain a predicted depth map;
the generation network adopts an encoder-decoder structure comprising a feature extraction module and an up-sampling module;
symmetric feature layers of the same size in the feature extraction module and the up-sampling module are called mirror layers, and mirror layers are added together through skip connections;
S43, constructing the discrimination network, which proceeds as follows:
the predicted depth map output by the generation network and the reference depth map are input into the discrimination network;
Conv + BN + LReLU layers extract spatial features to produce a feature layer;
a Conv with a single output channel applied to this feature layer yields the discrimination network's MPI attention guide map, which is multiplied with the feature layer;
further Conv + BN + LReLU layers then produce a low-dimensional feature representation;
finally, FC and Sigmoid layers output the discrimination result;
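The attention-guided step of S43 can be sketched in NumPy as follows. This is a minimal illustration, not the patented discriminator. It rests on four assumptions:

- the first Conv + BN + LReLU stage has already produced a feature tensor;
- the single-channel Conv is implemented as a 1x1 convolution (a channel-weighted sum);
- the guide map has no activation, because the text does not specify one;
- global average pooling after a LeakyReLU stands in for the second Conv + BN + LReLU stage.

The function name `attention_guided_score` and all weights are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_guided_score(features, w_guide, w_fc, b_fc):
    """features: (C, H, W) feature layer from the first Conv + BN + LReLU stage.
    w_guide: (C,) weights of a 1x1 conv with one output channel.
    w_fc: (C,) weights and b_fc: scalar bias of the final FC layer.
    Returns the (H, W) guide map and the probability that the input is a reference map.
    """
    guide = np.einsum("c,chw->hw", w_guide, features)  # single-channel 1x1 conv
    gated = features * guide[None, :, :]               # multiply guide map into every channel
    pooled = leaky_relu(gated).mean(axis=(1, 2))       # stand-in for Conv + BN + LReLU -> (C,)
    score = sigmoid(pooled @ w_fc + b_fc)              # FC + Sigmoid
    return guide, float(score)

rng = np.random.default_rng(0)
feats = leaky_relu(rng.normal(size=(16, 8, 8)))
guide, score = attention_guided_score(feats, rng.normal(size=16), rng.normal(size=16), 0.0)
print(guide.shape, score)
```

Multiplying the guide map into every channel lets spatial positions that the guide map weights strongly dominate the features that reach the classifier.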
S5, setting the hyper-parameters and training the attention mechanism-based network for removing multipath interference to obtain a trained network model;
the hyper-parameters comprise the learning rate, the batch size and the number of training epochs, and the training specifically comprises the following steps:
S51, constructing the attention-aware loss from the attention mask map and each sub-attention map;
the attention-aware loss is a weighted sum of the L2 losses between the attention mask map and each sub-attention map;
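A minimal NumPy sketch of the attention-aware loss follows. The source states only that it is a weighted sum of per-module L2 losses. Two details are assumptions made for illustration: the geometric weighting `theta ** (N - t)` with `theta = 0.8`, which gives later modules more weight, and taking the L2 loss as the mean squared error.

```python
import numpy as np

def attention_loss(sub_maps, mask, theta=0.8):
    """Weighted sum of mean-squared errors between each sub-attention map and the mask.

    sub_maps: list of (H, W) arrays, one per residual attention module, in order.
    mask: (H, W) attention mask map (0/1 for synthetic data).
    theta: hypothetical decay factor; module t of N gets weight theta ** (N - t).
    """
    n = len(sub_maps)
    total = 0.0
    for t, a in enumerate(sub_maps, start=1):
        weight = theta ** (n - t)          # last module has weight 1
        total += weight * np.mean((a - mask) ** 2)
    return total

mask = np.array([[0.0, 1.0], [1.0, 0.0]])
sub_maps = [np.full((2, 2), 0.5), mask.copy()]
print(attention_loss(sub_maps, mask))  # 0.8 * 0.25 + 1.0 * 0.0
```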
S52, constructing the codec loss from the reference depth map and the predicted depth map;
the codec loss comprises the L1 loss and the gradient loss between the reference depth map and the predicted depth map;
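The codec loss of S52 might look like this in NumPy. The source names the two terms but not their exact form or weighting. Computing the gradient loss as the L1 difference between finite-difference gradients of the two maps, and weighting it by `lambda_grad = 1.0`, are therefore assumptions.

```python
import numpy as np

def gradient_loss(pred, ref):
    """L1 distance between vertical and horizontal finite-difference gradients."""
    dy = np.abs(np.diff(pred, axis=0) - np.diff(ref, axis=0)).mean()
    dx = np.abs(np.diff(pred, axis=1) - np.diff(ref, axis=1)).mean()
    return dx + dy

def codec_loss(pred, ref, lambda_grad=1.0):
    """L1 depth loss plus a weighted gradient loss (lambda_grad is hypothetical)."""
    l1 = np.abs(pred - ref).mean()
    return l1 + lambda_grad * gradient_loss(pred, ref)

ref = np.arange(12, dtype=float).reshape(3, 4)
print(codec_loss(ref + 0.5, ref))  # constant offset: L1 = 0.5, gradient term = 0
```

The gradient term ignores a constant depth offset but penalizes blurred or shifted edges. It therefore complements the L1 term, which penalizes offsets everywhere.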
S53, constructing the adversarial loss from the output of the discrimination network;
the adversarial loss is the cross-entropy loss of the GAN;
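The cross-entropy adversarial loss of S53 can be sketched with NumPy. The source says only that the GAN cross-entropy loss is used. The generator term below uses the common non-saturating form, in which the generator is rewarded when the discriminator outputs 1 on its predictions. That form, the clipping constant and the function names are assumptions.

```python
import numpy as np

EPS = 1e-12  # clip probabilities to keep log() finite

def bce(p, target):
    """Mean binary cross-entropy between probabilities p and 0/1 targets."""
    p = np.clip(p, EPS, 1.0 - EPS)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def discriminator_loss(d_real, d_fake):
    # D should output 1 on reference depth maps and 0 on predicted depth maps
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_adv_loss(d_fake):
    # non-saturating form: G wants D to output 1 on its predicted depth maps
    return bce(d_fake, np.ones_like(d_fake))

p = np.array([0.5, 0.5])  # an undecided discriminator
print(discriminator_loss(p, p), generator_adv_loss(p))
```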
S54, training the generation network: constructing and minimizing the generation network's loss function from the attention-aware loss, the codec loss and the adversarial loss, and iteratively updating the generation network's weights;
the loss function of the generation network is a weighted sum of the attention-aware loss, the codec loss and the adversarial loss;
S55, training the discrimination network: minimizing the discrimination network's loss function and iteratively updating its weights;
the loss function of the discrimination network is the adversarial loss;
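Once the individual losses are computed, the weighted generator objective of S54 reduces to simple arithmetic. The source does not publish the weights, so the values below are hypothetical placeholders. The small adversarial weight is shown only as one plausible setting.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights; the source only says the generator loss is a weighted sum.
    attention: float = 1.0
    codec: float = 1.0
    adversarial: float = 0.01

def generator_objective(l_att, l_codec, l_adv, w=LossWeights()):
    """S54: weighted sum minimized when updating the generation network."""
    return w.attention * l_att + w.codec * l_codec + w.adversarial * l_adv

# S55 needs no extra weighting: the discrimination network minimizes the adversarial
# loss alone. Each iteration first updates the generator, then the discriminator.
print(generator_objective(0.2, 0.5, 0.7))
print(generator_objective(1.0, 1.0, 1.0, LossWeights(2.0, 3.0, 0.5)))
```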
S56, saving the iteratively updated weights to obtain the trained network model;
S6, feeding the test set into the trained network model and outputting depth images with MPI removed.
Advantageous effects
Compared with the existing multipath interference removing method, the attention mechanism-based ToF multipath interference removing method has the following beneficial effects:
1. The method applies an attention mechanism to ToF MPI error correction. It focuses on the distribution of MPI and emphasizes severely corrupted regions, so local depth information is recovered better on top of global MPI removal. Compared with other existing neural network methods, it performs better, both qualitatively and quantitatively, at removing multipath interference from single-frequency depth images;
2. The method applies to a wide range of ToF depth images. It effectively reduces errors on both simulated ToF depth images and depth images of real scenes, which demonstrates its robustness;
3. Compared with other neural network methods, the method has fewer weights to learn. A smaller network model therefore achieves higher performance and can conveniently run on a GPU with less memory.
Drawings
FIG. 1 is a flow chart of the attention mechanism-based ToF multipath interference removing method of the present invention;
FIG. 2 is a schematic diagram of the structural composition and connection relationships of the attention mechanism-based ToF multipath interference removing method of the present invention;
FIG. 3 shows the effect of the present invention in removing MPI from a synthetic ToF depth map;
FIG. 4 shows the effect of the present invention in removing MPI from a real ToF depth map;
FIG. 5 is a diagram of the network model of the present invention as implemented on synthetic ToF depth maps;
FIG. 6 compares the present method with several network models for MPI removal.
Detailed Description
The attention-based ToF multi-path interference removing method of the present invention is further illustrated and described in detail below with reference to the accompanying drawings and embodiments.
Example 1
This example illustrates the MPI removal process on data set 1 using the method of the present invention. Data set 1 comprises three kinds of data: pairs of synthetic MPI-affected ToF depth maps and corresponding reference depth maps; real MPI-affected ToF depth maps with unknown reference depth; and pairs of real MPI-affected ToF depth maps and corresponding reference depth maps. The synthetic ToF depth maps in data set 1 are 240x320 and the real ToF depth maps are 239x320. FIG. 1 shows the flow chart of the invention, and FIG. 2 shows its structural composition and connection relationships. FIGS. 3 and 4 show the effect of the present invention in Example 1.
S1, obtaining initial sample data set 1 and dividing it into a training set and a test set;
the training set consists of pairs of synthetic MPI-affected ToF depth maps and corresponding reference depth maps, together with real MPI-affected ToF depth maps of unknown reference depth; the test set consists of real MPI-affected ToF depth maps;
S2, reading each pair of an MPI-affected depth map and its reference depth map from the training set, and computing their difference to obtain a multipath interference error map;
S3, thresholding the error map, and processing the real MPI-affected ToF depth maps of unknown reference depth, to obtain attention mask maps;
S31, thresholding the error map of the synthetic ToF data to obtain an attention mask map;
the thresholding is as follows: the error threshold for the synthetic ToF error map is set to a. At every position whose depth error is smaller than a, the mask is set to 0, meaning the point is considered free of multipath interference. Otherwise the mask is set to 1, meaning the depth value at that point is affected by multipath interference. The masks of all pixels together form the attention mask map;
wherein the attention mask map is binary, taking only the values 0 and 1;
S32, processing the real MPI-affected ToF depth maps of unknown reference depth to obtain an attention mask map;
the mask of every pixel is set to 0.5;
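Steps S2, S31 and S32 can be sketched in NumPy as follows. Taking the absolute difference as the error map is an assumption, since the source says only "difference". The threshold `a` is left as a parameter because the patent does not give its value; the 0.05 m used in the usage lines is hypothetical.

```python
import numpy as np

def error_map(mpi_depth, ref_depth):
    """S2: per-pixel depth error (absolute difference assumed)."""
    return np.abs(mpi_depth - ref_depth)

def attention_mask(mpi_depth, ref_depth, a):
    """S31: binary mask for synthetic pairs; S32: constant 0.5 map when no reference exists."""
    if ref_depth is None:
        return np.full(mpi_depth.shape, 0.5)      # S32: real data, reference depth unknown
    err = error_map(mpi_depth, ref_depth)
    return (err >= a).astype(np.float64)          # S31: 1 = MPI present, 0 = unaffected

ref = np.full((2, 2), 2.0)
mpi = ref + np.array([[0.0, 0.10], [0.02, 0.30]])
print(attention_mask(mpi, ref, a=0.05))   # mask [[0, 1], [0, 1]]
print(attention_mask(mpi, None, a=0.05))  # every pixel 0.5
```

The constant 0.5 map for real data encodes "no information" about where MPI occurs. It sits halfway between the two labels used for synthetic data.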
S4, constructing an attention mechanism-based network model for removing multipath interference, comprising an attention network, a generation network and a discrimination network;
the model uses a GAN as its baseline and is constructed through the following substeps:
S41, constructing the attention network: the sub-attention map output by each residual attention module is concatenated with the MPI-affected depth map and fed into the next residual attention module, and an MPI (Multi-Path Interference) attention map is obtained after several such modules;
the attention network consists of 2 residual attention modules that model spatial attention; each residual attention module consists of 6 convolutional layers with residual structure and a Sigmoid activation layer, with 3x3 convolution kernels, 8 output channels and a stride of 1;
wherein the MPI attention map is non-binary, with all pixel values between 0 and 1; a pixel's value represents the strength of MPI: the closer it is to 1, the more severe the interference, and 0 means the pixel is hardly affected by MPI;
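The residual attention module is described only at the level of layer counts. The NumPy forward pass below therefore fixes several details by assumption:

- a stem conv, two residual blocks of two convs each, and a final 1-channel conv (6 convs in total);
- ReLU activations and random weights;
- a constant 0.5 map as the attention input of the first module.

It shows how each sub-attention map is concatenated with the depth map and passed on. It is not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3(x, w, b):
    """'Same' 3x3 convolution, stride 1. x: (Cin, H, W), w: (Cout, Cin, 3, 3), b: (Cout,)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_conv(cin, cout):
    return rng.normal(0.0, 0.1, (cout, cin, 3, 3)), np.zeros(cout)

def make_module(cin=2, ch=8):
    # assumed layout: stem, two residual blocks (two convs each), 1-channel head = 6 convs
    return [init_conv(cin, ch)] + [init_conv(ch, ch) for _ in range(4)] + [init_conv(ch, 1)]

def run_module(x, params):
    h = relu(conv3x3(x, *params[0]))
    for k in (1, 3):  # two residual blocks
        r = relu(conv3x3(h, *params[k]))
        h = relu(h + conv3x3(r, *params[k + 1]))
    return sigmoid(conv3x3(h, *params[5]))[0]  # (H, W) sub-attention map in (0, 1)

def attention_network(depth, modules):
    att = np.full_like(depth, 0.5)  # assumed attention input for the first module
    sub_maps = []
    for params in modules:
        x = np.stack([depth, att])  # concatenate depth map and previous attention map
        att = run_module(x, params)
        sub_maps.append(att)
    return sub_maps  # sub_maps[-1] is the MPI attention map

depth = rng.uniform(0.5, 4.0, (16, 16))
maps = attention_network(depth, [make_module() for _ in range(2)])  # 2 modules as in Example 1
print(len(maps), maps[-1].shape, float(maps[-1].min()), float(maps[-1].max()))
```

All sub-attention maps are returned rather than only the last one, because the attention-aware loss compares every one of them with the attention mask map.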
S42, constructing the generation network: the MPI attention map produced by the attention network is concatenated with the MPI-affected depth map and fed into the generation network to obtain a predicted depth map;
the generation network adopts an encoder-decoder structure comprising a feature extraction module and an up-sampling module;
the feature extraction module consists of 4 layers of Conv + LReLU and Conv + AvgPool + LReLU, with strides of 2 and 1 respectively and 3x3 convolution kernels; the up-sampling module consists of 4 layers of Deconv + LReLU and 1 layer of Conv + LReLU, with 4x4 convolution kernels;
symmetric feature layers of the same size in the feature extraction module and the up-sampling module are called mirror layers, and mirror layers are added together through skip connections;
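Adding mirror layers through skip connections requires their spatial sizes to match. The pure-Python shape check below assumes that each encoder stage halves the resolution once, via a 3x3, stride-2, padding-1 convolution. It also assumes that each 4x4 deconvolution uses stride 2 and padding 1 and so doubles the resolution. The padding values are assumptions, as is the helper name `mirror_shapes`.

```python
def conv_out(n, k=3, s=2, p=1):
    """Output length of a strided convolution (standard formula)."""
    return (n + 2 * p - k) // s + 1

def deconv_out(n, k=4, s=2, p=1):
    """Output length of a transposed convolution without output padding."""
    return (n - 1) * s - 2 * p + k

def mirror_shapes(h, w, stages):
    """Return encoder and decoder (H, W) sizes; enc[i] mirrors dec[stages - i]."""
    enc = [(h, w)]
    for _ in range(stages):
        h, w = conv_out(h), conv_out(w)
        enc.append((h, w))
    dec = [(h, w)]
    for _ in range(stages):
        h, w = deconv_out(h), deconv_out(w)
        dec.append((h, w))
    return enc, dec

enc, dec = mirror_shapes(239, 320, stages=4)  # real ToF maps in Example 1
for i in range(1, 4):
    print(enc[i], dec[4 - i])                 # mirror layers joined by skip connections
print("input", enc[0], "output", dec[-1])
```

Under these assumptions, the inner mirror layers line up for both the 240x320 synthetic maps and the 256x256 maps of Example 2 (6 stages). The 239-row real maps, however, come back as 240 rows, so an implementation would need to crop or pad at the outermost level.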
S43, constructing the discrimination network, which proceeds as follows:
the predicted depth map output by the generation network and the reference depth map are input into the discrimination network;
Conv + BN + LReLU layers extract spatial features to produce a feature layer;
a Conv with a single output channel applied to this feature layer yields the discrimination network's MPI attention guide map, which is multiplied with the feature layer;
further Conv + BN + LReLU layers then produce a low-dimensional feature representation;
finally, FC and Sigmoid layers output the discrimination result;
the predicted depth maps include predicted depth maps for both the synthetic and the real ToF data;
the reference depth maps are those of the synthetic ToF data;
S5, setting the hyper-parameters and training the attention mechanism-based network for removing multipath interference to obtain a trained network model;
the hyper-parameters comprise the learning rate, the batch size and the number of training epochs, and the training specifically comprises the following steps:
S51, setting the initial learning rate to 0.0001 with exponential decay, and the number of training epochs to 120;
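An exponentially decaying learning rate starting from 0.0001 can be written as below. The source does not state the decay factor or whether decay is applied per epoch or per step. The per-epoch `gamma = 0.96` is a hypothetical choice, and the batch size, which is also unspecified, does not enter this calculation.

```python
def exp_decay_lr(epoch, base_lr=1e-4, gamma=0.96):
    """Learning rate for a 0-based epoch under exponential decay (gamma is hypothetical)."""
    return base_lr * gamma ** epoch

EPOCHS = 120  # Example 1; Example 2 uses 150
schedule = [exp_decay_lr(e) for e in range(EPOCHS)]
print(schedule[0], schedule[-1])
```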
S52, constructing the attention-aware loss from the attention mask map and each sub-attention map of the synthetic ToF data; the attention-aware loss is a weighted sum of the L2 losses between the attention mask map and each sub-attention map;
S53, constructing the codec loss from the reference depth map and the predicted depth map;
the codec loss comprises the L1 loss and the gradient loss between the reference depth map and the predicted depth map of the synthetic ToF data;
S54, constructing the adversarial loss from the output of the discrimination network;
the adversarial loss is the cross-entropy loss of the GAN;
S55, training the generation network: constructing and minimizing the generation network's loss function from the attention-aware loss, the codec loss and the adversarial loss, and iteratively updating the generation network's weights;
the loss function of the generation network is a weighted sum of the attention-aware loss, the codec loss and the adversarial loss;
S56, training the discrimination network: minimizing the discrimination network's loss function and iteratively updating its weights;
the loss function of the discrimination network is the adversarial loss;
S57, saving the iteratively updated weights to obtain the trained network model;
S6, feeding the test set into the trained network model and outputting depth images with MPI removed.
This completes steps S1 to S6 of the attention mechanism-based ToF multipath interference removing method.
Example 2
This example illustrates the MPI removal process on data set 2 using the method of the present invention. Data set 2 consists of synthetic ToF depth maps covering six different reflectivities, with an image size of 256x256. FIG. 5 shows the specific network model of the method in Example 2.
S1, acquiring the initial sample data set and dividing it into a training set and a test set;
the training set comprises 6696 pairs of synthetic MPI-affected ToF depth maps and corresponding reference depth maps, and the test set comprises 1704 synthetic MPI-affected ToF depth maps;
S2, reading each pair of an MPI-affected depth map and its reference depth map from the training set, and computing their difference to obtain a multipath interference error map;
S3, thresholding the error map, and processing real ToF depth maps of unknown reference depth, to obtain attention mask maps;
the thresholding proceeds as follows: the error threshold for the synthetic ToF error map is a. At every position whose depth error is smaller than a, the mask is set to 0, meaning the point is considered free of multipath interference. Otherwise the mask is set to 1, meaning the depth value at that point is affected by multipath interference. The masks of all pixels together form the attention mask map;
wherein the attention mask map is binary, taking only the values 0 and 1;
S4, constructing a network model for removing multipath interference, comprising an attention network, a generation network and a discrimination network;
the model uses a GAN as its baseline and is constructed through the following substeps:
S41, constructing the attention network: the sub-attention map output by each residual attention module is concatenated with the MPI-affected depth map and fed into the next residual attention module, and an MPI (Multi-Path Interference) attention map is obtained after several such modules;
the attention network consists of 4 residual attention modules that model spatial attention; each residual attention module consists of 6 convolutional layers with residual structure and a Sigmoid activation layer, with 3x3 convolution kernels, 8 output channels and a stride of 1;
wherein the MPI attention map is non-binary, with all pixel values between 0 and 1; a pixel's value represents the strength of MPI: the closer it is to 1, the more severe the interference, and 0 means the pixel is hardly affected by MPI;
S42, constructing the generation network: the MPI attention map produced by the attention network is concatenated with the MPI-affected depth map and fed into the generation network to obtain a predicted depth map;
the generation network adopts an encoder-decoder structure comprising a feature extraction module and an up-sampling module;
the feature extraction module consists of 6 layers of Conv + LReLU and Conv + AvgPool + LReLU, with strides of 2 and 1 respectively and 3x3 convolution kernels; the up-sampling module consists of 6 layers of Deconv + LReLU and 1 layer of Conv + LReLU, with 4x4 convolution kernels;
symmetric feature layers of the same size in the feature extraction module and the up-sampling module are called mirror layers, and mirror layers are added together through skip connections;
S43, constructing the discrimination network, which proceeds as follows:
the predicted depth map output by the generation network and the reference depth map are input into the discrimination network;
Conv + BN + LReLU layers extract spatial features to produce a feature layer;
a Conv with a single output channel applied to this feature layer yields the discrimination network's MPI attention guide map, which is multiplied with the feature layer;
further Conv + BN + LReLU layers then produce a low-dimensional feature representation;
finally, FC and Sigmoid layers output the discrimination result;
S5, setting the hyper-parameters and training the attention mechanism-based network for removing multipath interference to obtain a trained network model;
the hyper-parameters comprise the learning rate, the batch size and the number of training epochs, and the training specifically comprises the following steps:
S51, setting the initial learning rate to 0.0001 with exponential decay, and the number of training epochs to 150;
S52, constructing the attention-aware loss from the attention mask map and each sub-attention map;
the attention-aware loss is a weighted sum of the L2 losses between the attention mask map and each sub-attention map;
S53, constructing the codec loss from the reference depth map and the predicted depth map;
the codec loss comprises the L1 loss and the gradient loss between the reference depth map and the predicted depth map;
S54, constructing the adversarial loss from the output of the discrimination network;
the adversarial loss is the cross-entropy loss of the GAN;
S55, training the generation network: constructing and minimizing the generation network's loss function from the attention-aware loss, the codec loss and the adversarial loss, and iteratively updating the generation network's weights;
the loss function of the generation network is a weighted sum of the attention-aware loss, the codec loss and the adversarial loss;
S56, training the discrimination network: minimizing the discrimination network's loss function and iteratively updating its weights;
the loss function of the discrimination network is the adversarial loss;
S57, saving the iteratively updated weights to obtain the trained network model;
S6, feeding the test set into the trained network model and outputting depth images with MPI removed.
This completes steps S1 to S6 of the attention mechanism-based ToF multipath interference removing method.
MPI correction was performed on the two data sets with the present method and with several common network models. The resulting images are shown in FIG. 6, and the corresponding mean absolute error (MAE) and relative error are listed in Table 1. The compared network models are:
DeepToF, a two-stage convolutional network;
ToF-KPN, a kernel prediction network;
Coarse-to-fine, a convolutional network composed of a coarse sub-network and a fine sub-network;
Sharp-Net, a spatial hierarchy-aware residual pyramid network.
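Table 1 reports the mean absolute error and the relative error. The source does not give the formulas, so the NumPy sketch of both metrics below rests on two assumptions. First, the relative error is the mean of per-pixel |prediction - reference| / reference. Second, pixels with no valid reference (depth <= 0) are ignored.

```python
import numpy as np

def mae(pred, ref):
    """Mean absolute depth error over pixels with a valid (positive) reference depth."""
    valid = ref > 0
    return float(np.abs(pred[valid] - ref[valid]).mean())

def relative_error(pred, ref):
    """Mean per-pixel absolute error divided by the reference depth (assumed definition)."""
    valid = ref > 0
    return float((np.abs(pred[valid] - ref[valid]) / ref[valid]).mean())

ref = np.array([[2.0, 4.0], [0.0, 1.0]])   # 0.0 marks a pixel without ground truth
pred = np.array([[2.2, 3.6], [5.0, 1.0]])
print(mae(pred, ref), relative_error(pred, ref))
```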
TABLE 1 quantitative comparison of the method with several other network models
[Table 1 is provided as an image in the original publication.]
As the comparison in FIG. 6 and Table 1 shows, for the same input depth images the method of the present invention removes MPI errors more effectively than the other methods.
While the foregoing is directed to the preferred embodiment of the present invention, it is not intended that the invention be limited to the embodiment and the drawings disclosed herein. Equivalents and modifications may be made without departing from the spirit of the disclosure, which is to be considered as within the scope of the invention.

Claims (10)

1. A method for removing ToF multipath interference based on an attention mechanism, which relies on an attention-mechanism-based network model for removing multipath interference, the network model comprising an attention network, a generation network and a discrimination network; the method is characterized by comprising the following steps:
S1, acquiring an initial sample dataset and dividing it into a training set and a test set;
the training set comprises pairs, each consisting of a depth map corrupted by multipath interference and a reference depth map;
the test set consists of depth maps corrupted by multipath interference;
S2, reading each pair of multipath-interference-corrupted depth map and reference depth map from the training set, and computing their difference to obtain a multipath interference error map;
S3, thresholding the error map to generate an attention mask map;
S4, constructing the attention-mechanism-based network model for removing multipath interference, which comprises an attention network, a generation network and a discrimination network;
this step specifically comprises the following substeps:
S41, constructing the attention network and obtaining an MPI attention map, specifically: concatenating the sub-attention map output by each residual attention module with the multipath-interference-corrupted depth map, feeding the concatenation into the next residual attention module, and obtaining the MPI (multipath interference) attention map after a plurality of residual attention modules;
the attention network is used to obtain the MPI attention map and comprises a plurality of residual attention modules; each residual attention module is used to obtain a sub-attention map and comprises a plurality of convolutional layers with a residual structure and an activation function layer; the MPI attention map is the sub-attention map output by the last residual attention module;
S42, constructing the generation network and obtaining a predicted depth map, specifically: concatenating the MPI attention map obtained by the attention network with the multipath-interference-corrupted depth image, and feeding the result into the generation network to obtain the predicted depth map;
the generation network is used to obtain the MPI-free predicted depth map and comprises a feature extraction module and an upsampling module; the feature extraction module comprises a plurality of Conv + LReLU and Conv + AvgPool + LReLU blocks; the upsampling module comprises a plurality of Deconv + LReLU and Conv + LReLU blocks;
wherein Conv denotes a convolutional layer, LReLU denotes a Leaky ReLU activation function layer, AvgPool denotes an average pooling layer, and Deconv denotes a deconvolution (transposed convolution) layer;
S43, constructing the discrimination network and outputting a discrimination result, specifically:
feeding the predicted depth map output by the generation network and the reference depth map into the discrimination network; extracting spatial features through Conv + BN + LReLU blocks to obtain a feature layer; applying a Conv with one output channel to the feature layer to obtain the MPI attention guide map of the discrimination network; multiplying the MPI attention guide map with the feature layer and passing the product through further Conv + BN + LReLU blocks to obtain a low-dimensional feature representation; and finally outputting the discrimination result through FC and Sigmoid;
the discrimination network is used to judge whether the generated image is real and comprises a plurality of Conv + BN + LReLU blocks, a Conv, and FC + Sigmoid; wherein BN denotes a batch normalization layer, FC denotes a fully connected layer, and Sigmoid denotes a Sigmoid activation function layer;
S5, setting hyperparameters and training the attention-mechanism-based network for removing multipath interference to obtain a trained network model;
the hyperparameters comprise the learning rate, the batch size and the number of training epochs;
S6, importing the test set into the trained network model and outputting the depth image with MPI removed.
2. The ToF multipath interference removing method of claim 1, wherein in S3 the attention mask map is a binary map whose pixel values are 0 or 1.
3. The ToF multipath interference removing method of claim 1, wherein in S41 the MPI attention map is a non-binary map whose pixel values all lie between 0 and 1.
4. The ToF multipath interference removing method of claim 1, wherein the generation network of S42 adopts an encoder-decoder structure comprising a feature extraction module and an upsampling module;
symmetric feature layers of the same size in the feature extraction module and the upsampling module are called mirror layers, and mirror layers are added together through skip connections.
5. The ToF multipath interference removing method of claim 1, wherein in S5 the network model for removing multipath interference takes a generative adversarial network as its baseline model.
6. The ToF multipath interference removing method according to claim 1, wherein S5 specifically comprises:
S51, constructing an attention-aware loss from the attention mask map and each sub-attention map;
S52, constructing a codec loss from the reference depth map and the predicted depth map;
S53, constructing the adversarial loss from the discrimination result of the discrimination network;
S54, training the generation network: constructing and minimizing the loss function of the generation network from the attention-aware loss, the codec loss and the adversarial loss, and iteratively updating the weight parameters of the generation network;
S55, training the discrimination network: minimizing the loss function of the discrimination network and iteratively updating its weight parameters;
S56, saving the iteratively updated weight parameters to obtain the trained network model.
7. The ToF multipath interference removing method of claim 1 or 5, wherein the attention-aware loss of S51 is a weighted sum of the L2 losses between the attention mask map and each sub-attention map.
8. The ToF multipath interference removing method of claim 1 or 5, wherein the codec loss of S52 comprises an L1 loss and a gradient loss between the reference depth map and the predicted depth map.
9. The ToF multipath interference removing method of claim 1 or 5, wherein the adversarial loss of S53 is the cross-entropy loss of the GAN.
10. The ToF multipath interference removing method according to claim 1 or 5, wherein the loss function of the generation network in S54 is a weighted sum of the attention-aware loss, the codec loss and the adversarial loss;
and the loss function of the discrimination network in S55 is the adversarial loss.
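The codec loss of claim 8 can be sketched as an L1 term on depth values plus an L1 term on image gradients. Several choices are assumptions: forward-difference gradients, the weighting through the hypothetical parameter `w_grad`, and the function names. The claim states only that the loss comprises an L1 loss and a gradient loss between the reference and predicted depth maps. Maps are plain 2-D lists of at least 2x2 pixels.

```python
def l1(pred, ref):
    """Mean absolute (L1) difference between two equally sized 2-D maps."""
    vals = [abs(p - r) for rp, rr in zip(pred, ref) for p, r in zip(rp, rr)]
    return sum(vals) / len(vals)


def gradients(depth):
    """Forward differences along x (within rows) and along y (between rows)."""
    gx = [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in depth]
    gy = [[depth[i + 1][j] - depth[i][j] for j in range(len(depth[0]))]
          for i in range(len(depth) - 1)]
    return gx, gy


def gradient_loss(pred, ref):
    """L1 distance between the x- and y-gradients of prediction and reference."""
    pgx, pgy = gradients(pred)
    rgx, rgy = gradients(ref)
    return l1(pgx, rgx) + l1(pgy, rgy)


def codec_loss(pred, ref, w_grad=1.0):
    """Hypothetical codec loss: L1 on depth plus weighted gradient loss."""
    return l1(pred, ref) + w_grad * gradient_loss(pred, ref)
```

The gradient term penalizes smeared or shifted depth edges that a pure L1 term can under-weight. This matters for MPI correction, where errors often concentrate near corners and object boundaries.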

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210622444.3A CN114998683B (en) 2022-06-01 2022-06-01 Attention mechanism-based ToF multipath interference removal method

Publications (2)

Publication Number Publication Date
CN114998683A true CN114998683A (en) 2022-09-02
CN114998683B CN114998683B (en) 2024-05-31

Family

ID=83031753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210622444.3A Active CN114998683B (en) 2022-06-01 2022-06-01 Attention mechanism-based ToF multipath interference removal method

Country Status (1)

Country Link
CN (1) CN114998683B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110853051A (en) * 2019-10-24 2020-02-28 北京航空航天大学 Cerebrovascular image segmentation method based on multi-attention dense connection generation countermeasure network
CN111127346A (en) * 2019-12-08 2020-05-08 复旦大学 Multi-level image restoration method based on partial-to-integral attention mechanism
CN111402137A (en) * 2020-03-20 2020-07-10 南京信息工程大学 Depth attention coding and decoding single image super-resolution algorithm based on perception loss guidance
CN111739082A (en) * 2020-06-15 2020-10-02 大连理工大学 Stereo vision unsupervised depth estimation method based on convolutional neural network
CN111738124A (en) * 2020-06-15 2020-10-02 西安电子科技大学 Remote sensing image cloud detection method based on Gabor transformation and attention
CN112184731A (en) * 2020-09-28 2021-01-05 北京工业大学 Multi-view stereo depth estimation method based on antagonism training
CN112233026A (en) * 2020-09-29 2021-01-15 南京理工大学 SAR image denoising method based on multi-scale residual attention network
AU2020103715A4 (en) * 2020-11-27 2021-02-11 Beijing University Of Posts And Telecommunications Method of monocular depth estimation based on joint self-attention mechanism
CN112508800A (en) * 2020-10-20 2021-03-16 杭州电子科技大学 Attention mechanism-based highlight removing method for surface of metal part with single gray image
CN113205468A (en) * 2021-06-01 2021-08-03 桂林电子科技大学 Underwater image real-time restoration model based on self-attention mechanism and GAN
US20210390723A1 (en) * 2020-06-15 2021-12-16 Dalian University Of Technology Monocular unsupervised depth estimation method based on contextual attention mechanism
CN114092808A (en) * 2021-11-17 2022-02-25 南京工程学院 Crop disease and insect pest detection and prevention device and method based on image and deep learning
CN114170079A (en) * 2021-11-19 2022-03-11 天津大学 Depth map super-resolution method based on attention guide mechanism
CN114187203A (en) * 2021-12-09 2022-03-15 南京林业大学 Attention-optimized deep codec defogging generation countermeasure network

Also Published As

Publication number Publication date
CN114998683B (en) 2024-05-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant