CN114998683A - Attention mechanism-based ToF multipath interference removing method

Info

Publication number: CN114998683A
Application number: CN202210622444.3A
Authority: CN
Inventors: 周文彪; 王鑫; 贾云飞
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2022-06-01
Filing date: 2022-06-01
Publication date: 2022-09-02
Anticipated expiration: 2042-06-01
Also published as: CN114998683B

Abstract

The invention relates to a ToF multipath interference removing method based on an attention mechanism, and belongs to the technical field of deep learning computer vision and image enhancement. The method comprises the following steps: acquiring an initial sample data set, and reading and preprocessing the depth images; building a network model for removing multipath interference, in which an attention mechanism captures the distribution of multipath interference over the spatial scene and generates an attention map; optimizing the loss function and the learning strategy, training the network model and updating its parameters; and testing on different synthetic depth maps with ground truth and on depth maps of real scenes to obtain images free of multipath interference and to demonstrate the robustness of the network. The invention achieves high-performance removal of ToF multipath interference errors, offers advantages in applicability and memory consumption, and has strong generalization capability.

Description

Attention mechanism-based ToF multipath interference removing method

Technical Field

The invention belongs to the technical field of deep learning computer vision and image enhancement, and particularly relates to a ToF multipath interference removing method based on an attention mechanism.

Background

In recent years, 3D imaging and sensor technologies have been developed, and related research results have been widely applied to the fields of face recognition, three-dimensional reconstruction, medical assistance, and the like. A ToF (Time-of-Flight) camera can actively acquire scene depth information and accurately represent three-dimensional information of a scene and a surface of an object, and has become a popular depth estimation technique. Unlike traditional two-dimensional images, the addition of depth information opens up a wide range of 3D visual applications.

A ToF camera is composed of an image sensor, an image processing chip and a modulated light source. The ToF sensor first sends a modulation signal to the light source driver chip; the modulation signal controls the laser to emit modulated near-infrared light, and the receiving end calculates the depth from the phase difference or time difference between the emitted and received light. In practice, continuous-wave modulated iToF (indirect ToF) is most commonly adopted: the phase difference is resolved through sampling, the time of flight is then calculated, and the distance to the target object is obtained. In the ideal iToF imaging situation, the transmitted signal undergoes only one reflection in the scene, and the signal received by the sensor contains only the primary reflected light and the ambient light. In actual imaging, however, because of the diversity of geometric shapes and material properties of the objects in the scene, light may be reflected and refracted many times, so the received signal may contain several (even countless) sub-signals. This attenuates and phase-shifts the signal and degrades both the accuracy of the depth measurement and the quality of the three-dimensional reconstruction. This phenomenon is called Multi-Path Interference (MPI); it is the biggest obstacle to the widespread use of ToF, and the MPI error therefore needs to be removed.
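To make the effect of MPI concrete, the following minimal sketch models a continuous-wave iToF measurement as a sum of phasors, one per optical path, and converts the resulting phase into a distance with the standard relation d = c·φ / (4π·f). The 20 MHz modulation frequency, the path distances and the amplitudes are illustrative assumptions and are not taken from this disclosure.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def phase_to_depth(phase, freq_hz):
    """Convert a measured phase shift (radians) into a distance (metres)."""
    return C * phase / (4.0 * math.pi * freq_hz)


def measured_depth(paths, freq_hz):
    """Depth recovered from the sum of several returns.

    paths: list of (amplitude, distance_m) pairs, where distance_m is half
    the total optical path length of that return.
    """
    total = sum(
        amp * cmath.exp(1j * 4.0 * math.pi * freq_hz * dist / C)
        for amp, dist in paths
    )
    phase = cmath.phase(total) % (2.0 * math.pi)
    return phase_to_depth(phase, freq_hz)


FREQ = 20e6  # assumed modulation frequency (hypothetical value)
direct_only = measured_depth([(1.0, 2.0)], FREQ)
with_mpi = measured_depth([(1.0, 2.0), (0.4, 3.0)], FREQ)
print(f"direct only: {direct_only:.3f} m, with MPI: {with_mpi:.3f} m")
```

With a single direct return the true distance is recovered, while adding a weaker, longer indirect return pulls the measured distance away from the true value. This is the kind of depth error the method aims to remove.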

Since ToF imaging was introduced, research on removing ToF multipath interference has advanced steadily. Early MPI correction mainly adopted traditional physical algorithms that rely on the sparsity of optical reflections and use multi-frequency modulated measurements to solve for the depth and amplitude of each signal component, or that modify the hardware module and recover the direct-path component from multiple measurements under single-frequency coded illumination. These physical methods are limited in the types of reflection and the number of optical paths they can handle and are slow to compute, so more general methods are required. In recent years, with the maturing of machine learning systems and the success of neural networks in various vision tasks, more and more researchers have built Convolutional Neural Networks (CNNs) to correct MPI. This approach requires no hardware modification, has a relatively simple algorithm design, and removes errors better than the traditional methods. However, most current network models for MPI removal from ToF images do not fully exploit the spatial distribution of MPI in the scene. Focusing on regions severely affected by multipath interference, and strengthening the recovery of local depth information on top of global MPI removal from the depth map, is therefore of great significance, and is one of the goals of the present method.

Disclosure of Invention

The invention aims to provide an attention-mechanism-based method for removing ToF multipath interference, which addresses the large depth measurement errors caused by multipath interference during ToF camera imaging and further improves the effect of multipath interference removal.

In order to achieve the purpose, the invention adopts the following technical scheme:

The ToF multipath interference removing method comprises the following steps:

S1, acquiring an initial sample data set, and dividing it into a training set and a test set;

the training set comprises paired depth maps affected by multipath interference and reference depth maps;

the test set consists of depth maps affected by multipath interference;

S2, reading the paired multipath-interfered depth maps and reference depth maps in the training set, and computing the difference between each pair to obtain an error map of the multipath interference;

S3, thresholding the error map to generate an attention mask map;

wherein the attention mask map is a binary map whose pixel values are 0 or 1;

S4, constructing an attention-mechanism-based network model for removing multipath interference, which comprises an attention network, a generation network and a discrimination network, specifically comprising the following substeps:

S41, constructing the attention network: the sub-attention map output by each residual attention module is concatenated with the multipath-interfered depth map and fed into the next residual attention module, and an MPI (Multi-Path Interference) attention map is obtained after passing through a plurality of residual attention modules;

wherein the MPI attention map is a non-binary map whose pixel values all lie between 0 and 1;

S42, constructing the generation network: the MPI attention map obtained by the attention network is concatenated with the multipath-interfered depth map and fed into the generation network to obtain a predicted depth map;

the generation network adopts an encoder-decoder structure and comprises a feature extraction module and an up-sampling module;

symmetric feature layers of the same size in the feature extraction module and the up-sampling module are called mirror layers, and the mirror layers are added together through skip connections;

S43, constructing the discrimination network: the predicted depth map output by the generation network and the reference depth map are fed into the discrimination network; spatial features are extracted by Conv + BN + LReLU to obtain a feature layer; a Conv with 1 output channel is applied to the feature layer to obtain the MPI attention guide map of the discrimination network; the MPI attention guide map is multiplied with the feature layer and followed by Conv + BN + LReLU to obtain a low-dimensional feature representation; and the discrimination result is finally output through FC and Sigmoid;
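The attention-guided step of the discrimination network in S43 (a single-channel convolution on the feature layer, followed by multiplication with that feature layer) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the kernel size is not stated in the text, so a 1x1 convolution is assumed, and the names `attention_guided_features`, `guide_weights` and `guide_bias` are hypothetical.

```python
import numpy as np


def attention_guided_features(features, guide_weights, guide_bias=0.0):
    """features: (C, H, W); guide_weights: (C,) for an assumed 1x1 convolution."""
    # Single-channel convolution over the feature layer -> guide map (1, H, W).
    guide = np.tensordot(guide_weights, features, axes=([0], [0]))[None] + guide_bias
    # Broadcasting multiplies every feature channel by the same guide map.
    return guide, features * guide


features = np.arange(12, dtype=float).reshape(2, 2, 3)
guide, weighted = attention_guided_features(features, np.array([1.0, 0.5]))
print(guide.shape, weighted.shape)
```

Because the guide map has a single channel, the same spatial weighting is applied to all feature channels before the following Conv + BN + LReLU layers.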

S5, setting the hyper-parameters and training the attention-mechanism-based network for removing multipath interference to obtain a trained network model;

the hyper-parameters comprise the learning rate, the batch size and the number of training epochs; training the network and obtaining the trained network model specifically comprises the following substeps:

S51, constructing the attention-aware loss from the attention mask map and each sub-attention map;

the attention-aware loss is a weighted sum of the L2 losses between the attention mask map and each sub-attention map;
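The loss described in S51 can be sketched as below. The text only specifies a weighted sum of L2 losses, so the weighting scheme here is an assumption: the t-th of N sub-attention maps receives the weight `decay ** (N - t)`, which makes later maps count more. The L2 loss is taken as the mean squared error, and the function name and the value 0.8 are hypothetical.

```python
import numpy as np


def attention_loss(sub_attention_maps, mask, decay=0.8):
    """Weighted sum of L2 (mean squared error) losses against the mask.

    decay is an assumed weighting scheme: the t-th of N maps gets weight
    decay ** (N - t). The source text does not give the weights.
    """
    n = len(sub_attention_maps)
    total = 0.0
    for t, sub_map in enumerate(sub_attention_maps, start=1):
        weight = decay ** (n - t)
        total += weight * np.mean((sub_map - mask) ** 2)
    return float(total)


mask = np.array([[0.0, 1.0], [1.0, 0.0]])
maps = [np.full((2, 2), 0.5), mask.copy()]
print(attention_loss(maps, mask))
```

In this toy case the second sub-attention map matches the mask exactly, so only the first, down-weighted map contributes to the loss.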

S52, constructing the codec loss from the reference depth map and the predicted depth map;

the codec loss comprises the L1 loss and the gradient loss between the reference depth map and the predicted depth map;
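A minimal sketch of such a codec loss follows. The exact form of the gradient loss is not given in the text; here it is assumed to be the mean absolute difference between the forward-difference gradients of the two depth maps, and `grad_weight` is a hypothetical balancing factor.

```python
import numpy as np


def spatial_gradients(depth):
    """Forward differences along the vertical and horizontal axes."""
    return depth[1:, :] - depth[:-1, :], depth[:, 1:] - depth[:, :-1]


def codec_loss(pred, ref, grad_weight=1.0):
    """L1 loss plus an L1 penalty on the difference of spatial gradients."""
    l1 = np.mean(np.abs(pred - ref))
    pred_dy, pred_dx = spatial_gradients(pred)
    ref_dy, ref_dx = spatial_gradients(ref)
    # Assumed gradient loss: mean absolute gradient mismatch per axis.
    grad = np.mean(np.abs(pred_dy - ref_dy)) + np.mean(np.abs(pred_dx - ref_dx))
    return float(l1 + grad_weight * grad)


ref = np.ones((2, 2))
offset_pred = ref + 0.5                        # constant offset, same edges
edge_pred = np.array([[1.0, 1.0], [1.0, 2.0]])  # one wrong pixel, wrong edges
print(codec_loss(offset_pred, ref), codec_loss(edge_pred, ref))
```

The L1 term penalizes the overall depth error, while the gradient term additionally penalizes predictions whose edges differ from those of the reference depth map.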

S53, constructing the adversarial loss from the discrimination result of the discrimination network;

the adversarial loss is the cross-entropy loss of the GAN;

S54, training the generation network: constructing and minimizing the loss function of the generation network from the attention-aware loss, the codec loss and the adversarial loss, and iteratively updating the weight parameters of the generation network;

the loss function of the generation network is a weighted sum of the attention-aware loss, the codec loss and the adversarial loss;
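Written out, the loss function of the generation network described in S54 takes the following form. The symbols and the weights λ are notation introduced here for illustration; the text does not state the weight values.

```latex
\mathcal{L}_{G}
  = \lambda_{\mathrm{att}}\,\mathcal{L}_{\mathrm{att}}
  + \lambda_{\mathrm{codec}}\,\mathcal{L}_{\mathrm{codec}}
  + \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}},
\qquad
\mathcal{L}_{\mathrm{codec}} = \mathcal{L}_{1} + \mathcal{L}_{\mathrm{grad}}
```

Here L_att, L_codec and L_adv denote the three losses constructed in S51 to S53.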

S55, training the discrimination network: minimizing the loss function of the discrimination network, and iteratively updating the weight parameters of the discrimination network;

the loss function of the discrimination network is the adversarial loss;
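The cross-entropy adversarial loss can be sketched for a single discriminator output as follows. The sketch assumes the common convention in which the discrimination network should output 1 for a reference depth map and 0 for a predicted depth map, and in which the generation network uses the non-saturating form; the text itself only states that the GAN cross-entropy loss is used, and the function names are hypothetical.

```python
import math


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one discriminator output in (0, 1)."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(d_ref, d_pred):
    """The reference map should be scored as real (1), the prediction as fake (0)."""
    return bce(d_ref, 1.0) + bce(d_pred, 0.0)


def generator_adversarial_loss(d_pred):
    """The generation network is rewarded when its prediction is scored as real."""
    return bce(d_pred, 1.0)


print(discriminator_loss(0.9, 0.1), generator_adversarial_loss(0.1))
```

The two networks pull the score of the predicted depth map in opposite directions, which is what drives the alternating updates of S54 and S55.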

S56, saving the iteratively updated weight parameters to obtain the trained network model;

S6, feeding the test set into the trained network model, and outputting depth maps with the MPI removed.

Advantageous effects

Compared with the existing multipath interference removing method, the attention mechanism-based ToF multipath interference removing method has the following beneficial effects:

1. The method applies an attention mechanism to the correction of ToF MPI errors; it can focus on the distribution of the MPI, emphasize severely corrupted regions, and better recover local depth information on top of global MPI removal. Compared with other existing neural network methods, it performs better, both qualitatively and quantitatively, at removing multipath interference from single-frequency depth images;

2. The method is applicable to a wide range of ToF depth images; it effectively reduces errors for both simulated ToF depth images and depth images of real scenes, and is therefore robust;

3. Compared with other neural network methods, the method has fewer weights to learn in total, achieves higher performance with a smaller network model, and can conveniently run on a GPU with less memory.

Drawings

FIG. 1 is a flow chart of a ToF multi-path interference removing method based on attention mechanism according to the present invention;

FIG. 2 is a schematic diagram of the structural composition and connection relationship of the attention-based ToF multi-path interference removal method of the present invention;

FIG. 3 shows the effect of removing MPI from a synthetic ToF depth map according to the present invention;

FIG. 4 shows the effect of removing MPI from a real ToF depth map according to the present invention;

FIG. 5 is a diagram of the network model as implemented for the synthetic ToF depth maps according to the present invention;

FIG. 6 shows a comparison of the present method with several MPI removal network models.

Detailed Description

The attention-based ToF multi-path interference removing method of the present invention is further illustrated and described in detail below with reference to the accompanying drawings and embodiments.

Example 1

This example illustrates the MPI removal process on data set 1 using the method of the present invention. Data set 1 comprises paired synthetic multipath-interfered ToF depth maps and corresponding reference depth maps, real multipath-interfered ToF depth maps of unknown reference depth, and paired real multipath-interfered ToF depth maps and corresponding reference depth maps. The synthetic ToF depth maps of data set 1 are 240x320 and the real ToF depth maps are 239x320. The flow chart of the invention is shown in FIG. 1, and FIG. 2 is a schematic diagram of its structural composition and connection relationships. FIGS. 3 and 4 show the results of the present invention in Example 1.

S1, acquiring the initial sample data set 1, and dividing it into a training set and a test set;

the training set consists of paired multipath-interfered synthetic ToF depth maps and corresponding reference depth maps, together with multipath-interfered real ToF depth maps of unknown reference depth; the test set consists of multipath-interfered real ToF depth maps;

S3, thresholding the error map, and processing the multipath-interfered real ToF depth maps of unknown reference depth, to obtain attention mask maps;

S31, thresholding the error map of the synthetic ToF data to obtain an attention mask map;

the thresholding is as follows: an error threshold a is set for the error map of the synthetic ToF data; at positions where the depth error is smaller than a, the mask is set to 0, meaning that the pixel is considered free of multipath interference; otherwise the mask is set to 1, meaning that the depth value of the pixel is affected by multipath interference; the masks of all pixels form the attention mask map;

wherein the attention mask map is a binary map taking only the values 0 and 1;

S32, processing the multipath-interfered real ToF depth maps of unknown reference depth to obtain an attention mask map;

the masks of all pixels are set to 0.5;
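The mask construction of S31 and S32 can be sketched as below. The value of the threshold a is not stated in the text, so the default of 0.05 is a placeholder; the error map is assumed to be the absolute difference between the two depth maps; and the function name `attention_mask` is hypothetical.

```python
import numpy as np


def attention_mask(mpi_depth, ref_depth=None, threshold=0.05):
    """Build the attention mask map for one training sample.

    threshold plays the role of the error threshold a; 0.05 is a
    placeholder, since the text does not state its value.
    """
    if ref_depth is None:
        # Real ToF map with unknown reference depth: every pixel gets 0.5.
        return np.full(mpi_depth.shape, 0.5, dtype=np.float32)
    error = np.abs(mpi_depth - ref_depth)  # error map (assumed absolute difference)
    # 0 where the error is below the threshold, 1 otherwise.
    return (error >= threshold).astype(np.float32)


mpi = np.array([[1.00, 1.20], [2.00, 2.04]])
ref = np.array([[1.00, 1.00], [2.00, 2.00]])
synthetic_mask = attention_mask(mpi, ref)
real_mask = attention_mask(mpi)
print(synthetic_mask)
print(real_mask)
```

The constant 0.5 used for real data without a reference depth expresses that nothing is known about where the MPI occurs in those images.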

S4, constructing an attention-mechanism-based network model for removing multipath interference, which comprises an attention network, a generation network and a discrimination network;

the network model for removing multipath interference takes a GAN as its baseline model, and specifically comprises the following substeps:

the attention network consists of 2 residual attention modules that represent spatial attention; each residual attention module consists of 6 convolution layers with a residual structure and a Sigmoid activation layer, with 3x3 convolution kernels, 8 output channels and a stride of 1;

wherein the MPI attention map is a non-binary map whose pixel values all lie between 0 and 1; the pixel value represents the strength of the MPI: the closer the value is to 1, the more severe the interference, and 0 indicates that the pixel is hardly affected by MPI;
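The chaining of residual attention modules can be sketched with plain NumPy as follows. Several details are assumptions made only so that the example runs: the wiring of the 6 convolutions (one input convolution, two residual blocks of two convolutions each, one output convolution), the Leaky ReLU between them, the single-channel output of the last convolution, the neutral starting map of 0.5, and the random, untrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv3x3(x, w):
    """3x3 convolution, stride 1, zero padding. x: (Cin, H, W), w: (Cout, Cin, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_module(in_ch=2, width=8):
    """Random weights for one residual attention module (6 conv layers)."""
    shapes = [(width, in_ch)] + [(width, width)] * 4 + [(1, width)]
    return [rng.normal(0.0, 0.1, size=(o, i, 3, 3)) for o, i in shapes]


def residual_attention_module(x, weights):
    feat = leaky_relu(conv3x3(x, weights[0]))
    # Two residual blocks, each made of two 3x3 convolutions (assumed wiring).
    for w_a, w_b in (weights[1:3], weights[3:5]):
        feat = feat + conv3x3(leaky_relu(conv3x3(feat, w_a)), w_b)
    return sigmoid(conv3x3(feat, weights[5]))  # sub-attention map, shape (1, H, W)


def attention_network(depth, modules):
    """depth: (H, W). Returns all sub-attention maps; the last is the MPI attention map."""
    attention = np.full((1,) + depth.shape, 0.5)  # assumed neutral starting map
    sub_maps = []
    for weights in modules:
        # Concatenate the depth map and the previous sub-attention map along channels.
        x = np.concatenate([depth[None], attention], axis=0)
        attention = residual_attention_module(x, weights)
        sub_maps.append(attention)
    return sub_maps


depth = np.random.default_rng(1).uniform(0.5, 4.0, size=(12, 16))
sub_maps = attention_network(depth, [make_module(), make_module()])
print(len(sub_maps), sub_maps[-1].shape)
```

The Sigmoid at the end of each module is what keeps every sub-attention map, and therefore the final MPI attention map, within the range 0 to 1.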

the generation network adopts an encoder-decoder structure and comprises a feature extraction module and an up-sampling module;

the feature extraction module consists of 4 layers of Conv + LReLU and Conv + AvgPool + LReLU, with strides of 2 and 1 respectively and 3x3 convolution kernels; the up-sampling module consists of 4 layers of Deconv + LReLU and 1 layer of Conv + LReLU, with 4x4 convolution kernels;
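To see how the feature extraction and up-sampling modules can produce mirror layers of equal size, the sketch below traces the feature-map sizes for a 240x320 synthetic depth map. It assumes that each of the 4 feature extraction stages halves the resolution once, that each Deconv doubles it, and that the padding is 1 in both cases; the text states only the kernel sizes and strides.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a strided convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_shapes(height, width, stages=4):
    """Feature-map sizes through an assumed symmetric encoder-decoder."""
    shape = (height, width)
    encoder = [shape]
    for _ in range(stages):
        shape = (conv_out(shape[0]), conv_out(shape[1]))
        encoder.append(shape)
    decoder = []
    for _ in range(stages):
        shape = (deconv_out(shape[0]), deconv_out(shape[1]))
        decoder.append(shape)
    return encoder, decoder


encoder, decoder = trace_shapes(240, 320)
print(encoder)
print(decoder)
```

Under these assumptions every decoder output has the same size as one encoder layer, so the two can be added element-wise through a skip connection.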

the predicted depth map comprises a predicted depth map for a synthetic ToF and a predicted depth map for a real ToF;

the reference depth map is a reference depth map of a synthetic ToF;

S5, setting the hyper-parameters and training the attention-mechanism-based network for removing multipath interference to obtain a trained network model;

S51, setting the initial learning rate to 0.0001 with exponential decay, and the number of training epochs to 120;
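An exponentially decaying learning rate over the 120 training epochs can be sketched as follows. The decay factor `gamma` is an assumption, since the text gives only the initial value of 0.0001 and the decay type.

```python
def exponential_lr(epoch, base_lr=1e-4, gamma=0.97):
    """Learning rate at a given epoch; gamma is an assumed decay factor."""
    return base_lr * gamma ** epoch


schedule = [exponential_lr(epoch) for epoch in range(120)]
print(f"first epoch: {schedule[0]:.2e}, last epoch: {schedule[-1]:.2e}")
```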

S52, constructing the attention-aware loss from the attention mask map and each sub-attention map of the synthetic ToF data; the attention-aware loss is a weighted sum of the L2 losses between the attention mask map and each sub-attention map;

S53, constructing the codec loss from the reference depth map and the predicted depth map;

the codec loss consists of the L1 loss and the gradient loss between the reference depth map of the synthetic ToF data and the predicted depth map of the synthetic ToF data;

the adversarial loss is the cross-entropy loss of the GAN;

S55, training the generation network: constructing and minimizing the loss function of the generation network from the attention-aware loss, the codec loss and the adversarial loss, and iteratively updating the weight parameters of the generation network;

S56, training the discrimination network: minimizing the loss function of the discrimination network, and iteratively updating the weight parameters of the discrimination network;

the loss function of the discrimination network is the adversarial loss;

S57, saving the iteratively updated weight parameters to obtain the trained network model.

This completes steps S1 to S6 of the attention-mechanism-based ToF multipath interference removing method.

Example 2

This example illustrates the MPI removal process on data set 2 using the method of the present invention. Data set 2 consists of synthetic ToF depth maps covering six different reflectivities, with an image size of 256x256. FIG. 5 is a diagram of the specific network model used by the method in Example 2.

the training set comprises 6696 paired multipath-interfered synthetic ToF depth maps and corresponding reference depth maps, and the test set comprises 1704 multipath-interfered synthetic ToF depth maps;

S3, thresholding the error map, and processing the real ToF depth maps of unknown reference depth, to obtain attention mask maps;

the thresholding is as follows: an error threshold a is set for the error map of the synthetic ToF data; at positions where the depth error is smaller than a, the mask is set to 0, meaning that the pixel is considered free of multipath interference; otherwise the mask is set to 1, meaning that the depth value of the pixel is affected by multipath interference; the masks of all pixels form the attention mask map;

S4, constructing the network model for removing multipath interference, which comprises an attention network, a generation network and a discrimination network;

the attention network consists of 4 residual attention modules that represent spatial attention; each residual attention module consists of 6 convolution layers with a residual structure and a Sigmoid activation layer, with 3x3 convolution kernels, 8 output channels and a stride of 1;

S42, constructing the generation network: the MPI attention map obtained by the attention network is concatenated with the multipath-interfered depth map and fed into the generation network to obtain a predicted depth map;

the feature extraction module consists of 6 layers of Conv + LReLU and Conv + AvgPool + LReLU, with strides of 2 and 1 respectively and 3x3 convolution kernels; the up-sampling module consists of 6 layers of Deconv + LReLU and 1 layer of Conv + LReLU, with 4x4 convolution kernels;

S51, setting the initial learning rate to 0.0001 with exponential decay, and the number of training epochs to 150;

S52, constructing the attention-aware loss from the attention mask map and each sub-attention map;

the attention-aware loss is a weighted sum of the L2 losses between the attention mask map and each sub-attention map;

the adversarial loss is the cross-entropy loss of the GAN;

the loss function of the discrimination network is the adversarial loss.

MPI correction was performed on the two data sets with the present method and with several commonly used network models; the image processing results are shown in FIG. 5, and the corresponding mean absolute error (MAE) and relative error are listed in Table 1. The network models used for comparison are: DeepToF, a two-stage convolutional network; ToF-KPN, a kernel prediction network; Coarse-to-fine, a convolutional network consisting of a coarse sub-network and a fine sub-network; and Sharp-Net, a spatial-hierarchy-aware residual pyramid network.

TABLE 1 quantitative comparison of the method with several other network models

As can be seen from FIG. 6 and Table 1, the method of the present invention removes MPI errors better than the other methods when given the same input depth images.
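The two metrics named above can be computed along the following lines. The text does not define the relative error; the sketch assumes it is the MAE of the corrected depth map divided by the MAE of the uncorrected input, which is one convention, while a per-pixel error divided by the reference depth is another. The function names and the toy values are illustrative only and are not results from Table 1.

```python
import numpy as np


def mean_absolute_error(depth, ref, valid=None):
    """MAE over valid pixels (all pixels if no validity mask is given)."""
    err = np.abs(depth - ref)
    return float(err.mean() if valid is None else err[valid].mean())


def relative_error(pred, mpi_input, ref, valid=None):
    """Residual error as a fraction of the error of the uncorrected input (assumed definition)."""
    return mean_absolute_error(pred, ref, valid) / mean_absolute_error(mpi_input, ref, valid)


ref = np.full((2, 2), 2.0)
mpi_input = ref + 0.2  # toy uncorrected depth map
pred = ref + 0.05      # toy corrected depth map
print(mean_absolute_error(pred, ref), relative_error(pred, mpi_input, ref))
```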

While the foregoing is directed to the preferred embodiment of the present invention, it is not intended that the invention be limited to the embodiment and the drawings disclosed herein. Equivalents and modifications may be made without departing from the spirit of the disclosure, which is to be considered as within the scope of the invention.

Claims

1. A ToF multipath interference removing method based on an attention mechanism, relying on an attention-mechanism-based network model for removing multipath interference, wherein the network model comprises an attention network, a generation network and a discrimination network; the method is characterized by comprising the following steps:

the training set comprises paired depth maps affected by multipath interference and reference depth maps;

the test set consists of depth maps affected by multipath interference;

the method specifically comprises the following substeps:

S41, constructing the attention network and obtaining an MPI attention map, specifically: the sub-attention map output by each residual attention module is concatenated with the multipath-interfered depth map and fed into the next residual attention module, and an MPI (Multi-Path Interference) attention map is obtained after passing through a plurality of residual attention modules;

the attention network is used for obtaining the MPI attention map and comprises a plurality of residual attention modules; each residual attention module is used for obtaining a sub-attention map and comprises a plurality of convolution layers with a residual structure and an activation function layer; the MPI attention map is the sub-attention map obtained by the last residual attention module;

S42, constructing the generation network and obtaining a predicted depth map, specifically: the MPI attention map obtained by the attention network is concatenated with the multipath-interfered depth map and fed into the generation network to obtain the predicted depth map;

the generation network is used for obtaining the predicted depth map with the MPI removed and comprises a feature extraction module and an up-sampling module; the feature extraction module comprises a plurality of Conv + LReLU and Conv + AvgPool + LReLU layers; the up-sampling module comprises a plurality of Deconv + LReLU and Conv + LReLU layers;

wherein Conv denotes a convolution layer, LReLU denotes a Leaky ReLU activation function layer, AvgPool denotes an average pooling layer, and Deconv denotes a deconvolution (transposed convolution) layer;

S43, constructing the discrimination network and outputting a discrimination result, specifically:

the predicted depth map output by the generation network and the reference depth map are fed into the discrimination network; spatial features are extracted by Conv + BN + LReLU to obtain a feature layer; a Conv with 1 output channel is applied to the feature layer to obtain the MPI attention guide map of the discrimination network; the MPI attention guide map is multiplied with the feature layer and followed by Conv + BN + LReLU to obtain a low-dimensional feature representation; and the discrimination result is finally output through FC and Sigmoid;

the discrimination network is used for judging the authenticity of the generated image and comprises a plurality of Conv + BN + LReLU layers, a Conv layer, and FC + Sigmoid; wherein BN denotes a batch normalization layer, FC denotes a fully connected layer, and Sigmoid denotes a Sigmoid activation function layer;

the hyper-parameters comprise the learning rate, the batch size and the number of training epochs, and the attention-mechanism-based network for removing multipath interference is trained to obtain a trained network model;

2. The ToF multipath interference removing method of claim 1, wherein in S3, the attention mask map is a binary map, and the values of the pixels are 0 or 1.

3. The ToF multipath interference removing method of claim 1, wherein in S41, the MPI attention map is a non-binary map and the values of the pixels in the MPI attention map all lie between 0 and 1.

4. The ToF multipath interference removing method of claim 1, wherein the generation network of S42 adopts an encoder-decoder structure comprising a feature extraction module and an up-sampling module;

symmetric feature layers of the same size in the feature extraction module and the up-sampling module are called mirror layers, and the mirror layers are added together through skip connections.

5. The ToF multipath interference removing method of claim 1, wherein the network model for removing multipath interference takes a generative adversarial network (GAN) as its baseline model.

6. The ToF multi-path interference removing method according to claim 1, wherein S5 specifically includes:

and S56, saving the weight parameters updated iteratively to obtain the trained network model.

7. The ToF multipath interference removing method of claim 1 or 5, wherein the attention-aware loss of S51 is a weighted sum of the L2 losses between the attention mask map and each sub-attention map.

8. The ToF multipath interference removing method of claim 1 or 5, wherein the codec loss of S52 comprises the L1 loss and the gradient loss between the reference depth map and the predicted depth map.

9. The ToF multipath interference removing method of claim 1 or 5, wherein the adversarial loss of S53 is the cross-entropy loss of the GAN.

10. The ToF multipath interference removing method of claim 1 or 5, wherein the loss function of the generation network of S54 is a weighted sum of the attention-aware loss, the codec loss and the adversarial loss;

the loss function of the discrimination network of S55 is the adversarial loss.