WO2023201783A1 - Light field depth estimation method and apparatus, and electronic device and storage medium - Google Patents

Light field depth estimation method and apparatus, and electronic device and storage medium

Info

Publication number
WO2023201783A1
Authority
WO
WIPO (PCT)
Prior art keywords
light field
image
depth
rgb image
simulated
Prior art date
Application number
PCT/CN2022/091182
Other languages
French (fr)
Chinese (zh)
Inventor
戴琼海
岳冬晓
于涛
吴嘉敏
Original Assignee
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学
Publication of WO2023201783A1 publication Critical patent/WO2023201783A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics

Definitions

  • This application relates to the field of image processing technology, and in particular to a light field depth estimation method, device, electronic equipment and storage medium.
  • Light field depth estimation is one of the representative technologies of optical three-dimensional measurement and has important application value in the fields of intelligent manufacturing, robot vision, autonomous driving, industrial inspection, virtual reality, aerial exploration and the metaverse.
  • Existing light field depth estimation methods are mainly divided into two categories: optimization-based estimation methods and data-driven deep learning methods.
  • Data-driven deep learning methods rely heavily on the quantity and quality of training data.
  • the models are designed and trained based on supervised learning, and the data sets are artificially synthesized through ray tracing and other methods.
  • the characteristics of light field imaging make it difficult for conventional structured light scanning or depth sensors to obtain ground-truth depth values that meet the requirements of light field datasets. Therefore, light field disparity estimation research has not yet established large-scale datasets containing accurate disparity values for real scenes.
  • the number of samples in synthetic datasets is limited, and even with methods such as data augmentation it is difficult to obtain sufficient training data. As a result, a model trained entirely on synthetic datasets cannot achieve comparable generalization performance in real scenes.
  • the existing light field depth estimation has the following problems: (1) errors in the imaging model and in the optimization algorithm limit the accuracy of light field depth estimation; (2) the complexity of the algorithms leads to a large amount of computation and low efficiency, which cannot meet real-time requirements.
  • This application provides a light field depth estimation method, device, electronic equipment and storage medium. A light field defocus imaging model is built from the forward direction to improve the accuracy of depth estimation, and, based on the optical images produced by this model, an attention learning neural network is constructed from the reverse direction to quickly estimate scene depth while obtaining a fully focused image.
  • the first embodiment of the present application provides a light field depth estimation method, which includes the following steps: performing gamma correction on a light field RGB image to generate a light field gamma corrected RGB image; performing forward RGB generation on the light field gamma corrected RGB image, the light field depth map and the light field simulated point transfer function image to simulate the light field defocus image, obtaining a simulated light field RGB image; and using an attention learning neural network to perform light field depth estimation on the simulated light field RGB image, obtaining an estimated depth map and a fully focused image of the light field.
  • the method, before obtaining the simulated light field RGB image, further includes: calculating the wave function of an object-space point passing through the main lens according to the light field camera parameters and the point-source field propagation process; modulating the wave function with the phase modulation function of the microlens array; collecting the component of the camera pixels at a specific frequency and calculating a point transfer function from the modulated wave function and that frequency component; numerically simulating the point transfer function and applying bilinear interpolation, fitting, symmetry completion and normalization to obtain a simulated point transfer function map; and, based on that map, randomly sampling object-space depth values and simulating the point transfer function at each sampled depth to obtain the light field simulated point transfer function image. A sketch of the post-processing and depth-sampling steps is shown below.
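  • The following is a minimal sketch of the interpolation, symmetry completion, normalization and random depth sampling just described. The function names, the raw-PSF input convention, and the simulate_psf() physics stub are assumptions for illustration, not details taken from the patent.

```python
import numpy as np
from scipy.ndimage import zoom

def postprocess_psf(psf_raw: np.ndarray, out_size: int) -> np.ndarray:
    """Bilinearly upsample a simulated PSF, complete it by symmetry, and normalize."""
    psf = zoom(psf_raw, out_size / psf_raw.shape[0], order=1)  # order=1: bilinear
    psf = np.maximum(psf, 0.0)
    # Symmetry completion (assumed here to mean enforcing mirror symmetry):
    psf = (psf + psf[::-1, :] + psf[:, ::-1] + psf[::-1, ::-1]) / 4.0
    return psf / psf.sum()  # normalize total energy to 1

def sample_psf_stack(simulate_psf, z_range=(0.2, 2.0), n_depths=16, out_size=65):
    """Randomly sample object-space depths z and simulate one PSF per depth."""
    rng = np.random.default_rng(0)
    depths = np.sort(rng.uniform(*z_range, n_depths))
    return depths, np.stack([postprocess_psf(simulate_psf(z), out_size) for z in depths])
```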
  • performing forward RGB generation on the light field gamma corrected RGB image, the light field depth map and the light field simulated point transfer function image to simulate the light field defocus image and obtain the simulated light field RGB image includes: discretizing the light field depth map to obtain a binary depth mask corresponding to the light field depth map; multiplying the light field gamma corrected RGB image by the binary depth mask to obtain the corresponding light field depth-slice RGB images; and inputting the light field depth-slice RGB images, the binary depth mask and the light field simulated point transfer function image into the light field nonlinear imaging model to obtain the simulated light field RGB image.
  • using an attention learning neural network to perform light field depth estimation on the simulated light field RGB image to obtain an estimated depth map of the light field and a fully focused image includes: obtaining an initial light field focal stack from the simulated light field RGB image using the RL iteration and estimation algorithm, and obtaining the estimated depth map of the light field from the light field sub-aperture images through the attention learning neural network; and cascading the light field sub-aperture images with the initial light field focal stack, outputting an estimated light field refocusing sequence through the encoding-decoding network, and multiplying the light field refocusing sequence by the continuous depth volume to obtain the fully focused image.
  • the method further includes: comparing the estimated depth map and the fully focused image with their respective ground-truth values, calculating a loss function, and back-propagating the error to train the attention learning neural network parameters.
  • a second embodiment of the present application provides a light field depth estimation device, including: a processing module for performing gamma correction on a light field RGB image to generate a light field gamma corrected RGB image; a generation module for performing forward RGB generation on the light field gamma corrected RGB image, the light field depth map and the light field simulated point transfer function image to simulate the light field defocus image and obtain a simulated light field RGB image; and an estimation module for using an attention learning neural network to perform light field depth estimation on the simulated light field RGB image to obtain an estimated depth map and a fully focused image of the light field.
  • an output module is used, before the simulated light field RGB image is obtained, to calculate the wave function of an object-space point passing through the main lens according to the light field camera parameters and the point-source field propagation process, modulate the wave function with the phase modulation function of the microlens array, collect the component of the camera pixels at a specific frequency, calculate the point transfer function from the modulated wave function and that frequency component, numerically simulate the point transfer function with bilinear interpolation, fitting, symmetry completion and normalization to obtain a simulated point transfer function map, and, based on that map, randomly sample object-space depth values and simulate the point transfer function at each sampled depth to obtain the light field simulated point transfer function image.
  • the generation module is further configured to discretize the light field depth map to obtain a binary depth mask corresponding to the light field depth map; multiply the light field gamma corrected RGB image by the binary depth mask to obtain the corresponding light field depth-slice RGB images; and input the light field depth-slice RGB images, the binary depth mask and the light field simulated point transfer function image into the light field nonlinear imaging model to obtain the simulated light field RGB image.
  • the estimation module is further configured to obtain the initial light field focal stack from the simulated light field RGB image using the RL iteration and estimation algorithm, and to obtain the estimated depth map of the light field from the light field sub-aperture images through the attention learning neural network; the light field sub-aperture images are cascaded with the initial light field focal stack, the estimated light field refocusing sequence is output through the encoding-decoding network, and the light field refocusing sequence is multiplied by the continuous depth volume to obtain the fully focused image.
  • it also includes: a training module for comparing the estimated depth map and the fully focused image with their respective ground-truth values, calculating a loss function, and back-propagating the error to train the attention learning neural network parameters.
  • a third embodiment of the present application provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the program to perform the light field depth estimation method described in the above embodiment.
  • a fourth embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and the program is executed by a processor to perform the light field depth estimation method as described in the above embodiment.
  • the forward imaging modeling method that considers the defocus characteristics of the light field improves the accuracy of depth estimation.
  • the nonlinear light field defocus imaging model simulates the defocus characteristics at different depths and viewing angles during light field imaging, rather than assuming an ideal pinhole imaging model; this models the imaging process more accurately and thereby improves the accuracy of light field depth estimation.
  • the nonlinear light field imaging model that takes occlusion into consideration improves the accuracy of depth estimation.
  • Light field nonlinear imaging using alpha synthesis takes the impact of occlusion into account and can improve depth estimation accuracy.
  • Depth estimation is faster.
  • the attention deep neural network model uses a symmetric attention map to determine the weight coefficients of the light field viewing angles with fewer training parameters, which is faster and more efficient than traditional iterative methods; a sketch of this weighting idea is shown below.
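  • The following is a minimal sketch of weighting light field views with a symmetric attention map, assuming "symmetric" means the weights are constrained to be centrally symmetric across the view grid; the patent's actual modules (Resblock, SPPModule, CostVolume, AttentionModule, 3DCNN) are not reproduced here, and all names are illustrative.

```python
import torch

def symmetric_view_attention(scores: torch.Tensor) -> torch.Tensor:
    """scores: (U, V) learned per-view logits. Returns attention weights that
    are centrally symmetric over the U x V view grid and sum to 1."""
    sym = 0.5 * (scores + torch.flip(scores, dims=(0, 1)))  # enforce central symmetry
    return torch.softmax(sym.flatten(), dim=0).view_as(sym)

# Usage: weight a stack of per-view feature maps (U, V, C, H, W) before fusion.
# fused = (features * symmetric_view_attention(scores)[..., None, None, None]).sum((0, 1))
```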
  • Figure 1 is a flow chart of a light field depth estimation method provided according to an embodiment of the present application.
  • Figure 2 is a structural framework diagram of a light field depth estimation method provided according to an embodiment of the present application.
  • Figure 3 is a flow chart of a light field transmission model provided according to an embodiment of the present application.
  • Figure 4 is a schematic diagram of a light field forward defocused RGB generation process provided according to an embodiment of the present application.
  • Figure 5 is a schematic diagram of a light field depth inversion process provided according to an embodiment of the present application.
  • Figure 6 is a structural diagram of a light field depth inversion network provided according to an embodiment of the present application.
  • Figure 7 is an example diagram of a light field depth estimation device according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • this application provides a light field depth estimation method.
  • a light field defocus imaging model is established from the forward direction to improve the depth estimation accuracy.
  • based on the optical images produced by the light field defocus imaging model, an attention learning neural network is constructed from the reverse direction to quickly estimate scene depth and obtain a fully focused image at the same time. This improves the accuracy of light field depth estimation, reduces the computation of the algorithm, improves estimation efficiency, and enables real-time estimation.
  • FIG. 1 is a flow chart of a light field depth estimation method provided according to an embodiment of the present application.
  • the light field depth estimation method includes the following steps:
  • step S101 gamma correction is performed on the light field RGB image to generate a light field gamma corrected RGB image.
  • based on an existing open-source light field dataset, the light field RGB image is gamma corrected to obtain the light field gamma corrected RGB image, which is sent to the forward RGB generation model together with the light field depth map to simulate defocused light field images; a sketch of this correction step is shown below.
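  • The following is a minimal sketch of the gamma correction step. The exponent value gamma = 2.2 and the direction of the mapping (the patent does not specify whether the image is being display-encoded or linearized) are assumptions for illustration.

```python
import numpy as np

def gamma_correct(rgb: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Apply a power-law (gamma) mapping to an RGB image normalized to [0, 1]."""
    return np.clip(rgb, 0.0, 1.0) ** gamma  # use 1.0 / gamma for the inverse mapping
```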
  • step S102 the light field gamma corrected RGB image, the light field depth map and the light field simulation point transfer function simulation image are subjected to forward RGB generation to simulate the light field defocus image to obtain a simulated light field RGB image.
  • the simulated point transfer function map is obtained; based on it, the object-space depth values are randomly sampled and the point transfer function at each sampled depth is simulated to obtain the light field simulated point transfer function image.
  • the light field simulated point transfer function image is calculated. Specifically, based on the parameter model of the light field camera, the light field point spread function (PSF) is calculated according to the optical transmission diffraction formula, providing defocus cues for simulating light field defocused images from different viewing angles. Whereas the related art considers only a single-view forward transmission model, this application considers a multi-view light field transmission model.
  • (1) According to the light field camera parameters and the point-source field propagation process, the wave function U(x, y, z, λ) of an object-space point passing through the main lens is computed, where x, y, z are the three-dimensional spatial coordinates of the object point (z denotes depth), λ is the wavelength, r is the radial distance in the aperture plane, and J_0(·) is the zero-order Bessel function of the first kind; the term D(r, λ, z) depends on the distance s between the lens and the sensor and the distance d between the object and the lens.
  • (2) The phase modulation function of the microlens array depends on the center coordinates x_0 and y_0 of the microlens array, the focal length f, and the refractive index n.
  • (3) The light field after microlens phase modulation is U′(ω_x, ω_y) = F_ω(U(x, y, z, λ) · t(x, y, x_0, y_0)), where ω_x and ω_y are frequency-domain samples of the space (x, y) and F_ω(·) is the Fourier transform operation.
  • (4) A camera pixel collects the component at a specific spatial frequency, where ω_u is the spatial frequency position corresponding to angle u, ω_v is the spatial frequency position corresponding to angle v, and rect(·) is the rectangular window function used for the pixel's sampling window.
  • performing forward RGB generation on the light field gamma corrected RGB image, the light field depth map and the light field simulated point transfer function image to simulate the light field defocus image and obtain the simulated light field RGB image includes: discretizing the light field depth map to obtain the binary depth mask corresponding to the light field depth map; multiplying the light field gamma corrected RGB image by the binary depth mask to obtain the corresponding light field depth-slice RGB images; and inputting the light field depth-slice RGB images, the binary depth mask and the light field simulated point transfer function image into the light field nonlinear imaging model to obtain the simulated light field RGB image.
  • the obtained light field gamma corrected RGB image, light field depth map and simulated PSF are sent to the forward RGB generation process to simulate the light field defocused image.
  • This forward simulation process differs from the existing ideal pinhole imaging technique: by introducing the light field point spread function to model optical defocus at different depths and viewing angles, it comes closer to a real captured light field image.
  • the forward generation process is shown in Figure 4.
  • the depth map is quantized into K layers; u, v denote the light field viewing angle, λ is the wavelength, and * denotes the convolution operation. l_k(·) is the depth-slice RGB image of the k-th layer, and PSF_k(·, u, v) is the PSF of viewing angle (u, v) at the k-th depth layer. In the imaging model, one term characterizes the imaging response at the k-th depth layer without considering occlusion, another represents the impact of occlusion on imaging at the k-th depth layer, and η is additive noise. A sketch of this layered model is given below.
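  • The following is a minimal sketch of this layered defocus model for one viewing angle, under stated assumptions: the depth map is discretized into K binary masks, each depth-slice RGB image is convolved with that layer's PSF, occlusion is handled by front-to-back alpha compositing, and additive noise is appended. All function and variable names are illustrative, not from the patent.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_defocused_view(rgb, depth, psf_stack, depth_edges, noise_std=0.01):
    """rgb: (H, W, 3) float gamma-corrected image; depth: (H, W);
    psf_stack: (K, h, w) PSFs for one viewing angle; depth_edges: K+1 bin edges."""
    out = np.zeros_like(rgb)
    transmittance = np.ones(rgb.shape[:2])  # fraction of farther layers still visible
    for k in range(len(psf_stack)):         # layer 0 assumed nearest the camera
        mask = ((depth >= depth_edges[k]) & (depth < depth_edges[k + 1])).astype(float)
        slice_rgb = rgb * mask[..., None]   # depth-slice RGB image l_k
        alpha = fftconvolve(mask, psf_stack[k], mode="same")  # blurred occlusion mask
        for c in range(3):
            blurred = fftconvolve(slice_rgb[..., c], psf_stack[k], mode="same")
            out[..., c] += transmittance * blurred            # composite this layer
        transmittance *= np.clip(1.0 - alpha, 0.0, 1.0)       # occlude farther layers
    return out + np.random.normal(0.0, noise_std, out.shape)  # additive noise eta
```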
  • step S103 the attention learning neural network is used to perform light field depth estimation on the simulated light field RGB image, and an estimated depth map and a fully focused image of the light field are obtained.
  • using the attention learning neural network to perform light field depth estimation on the simulated light field RGB image to obtain the estimated depth map of the light field and the fully focused image includes: obtaining the initial light field focal stack from the simulated light field RGB image using the RL iteration and estimation algorithm, and obtaining the estimated depth map of the light field from the light field sub-aperture images through the attention learning neural network; and cascading the light field sub-aperture images with the initial light field focal stack, outputting the estimated light field refocusing sequence through the encoding-decoding network, and multiplying the light field refocusing sequence by the continuous depth volume to obtain the fully focused image.
  • the method further includes: comparing the estimated depth map and the fully focused image with their respective ground-truth values, calculating the loss function, and back-propagating the error to train the attention learning neural network parameters.
  • depth estimation is performed based on the light field RGB image obtained by the above forward model simulation.
  • the RL iteration and estimation algorithm is used to obtain the initial light field focal stack, and the attention deep learning network is used to train and estimate the light field depth map; a sketch of the focal stack initialization is given below.
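  • The following is a minimal sketch of initializing the focal stack by deconvolution, assuming that "RL iteration" refers to Richardson-Lucy iteration; this reading and all names are illustrative, not confirmed by the patent.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(observed, psf, n_iter=10):
    """Richardson-Lucy deconvolution of one image channel with one depth's PSF."""
    est = np.clip(observed, 1e-3, None)     # positive initialization
    psf_flip = psf[::-1, ::-1]              # adjoint of the blur operator
    for _ in range(n_iter):
        blurred = fftconvolve(est, psf, mode="same")
        ratio = observed / np.maximum(blurred, 1e-8)
        est = est * fftconvolve(ratio, psf_flip, mode="same")
    return est

def initial_focal_stack(observed, psf_stack, n_iter=10):
    """One deconvolved slice per depth layer forms the initial focal stack."""
    return np.stack([richardson_lucy(observed, p, n_iter) for p in psf_stack])
```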
  • the depth inversion neural network used in this application is shown in Figure 6.
  • the input of the depth inversion network is the multi-view light field image obtained by the simulation in the third step, and the output is the estimated depth map and its corresponding all-in-focus image.
  • the depth inversion network mainly consists of two parts: the first part uses the attention perspective selection neural network, which is mainly composed of 2D convolution, Resblock, SPPModule, CostVolume, AttentionModule, 3DCNN and other modules.
  • the input is the light field sub-aperture image, and the output is a volume from which the light field depth map can be obtained by taking the maximum value along the depth dimension.
  • the second part uses a 3D U-Net encoding-decoding network to estimate the fully focused image. The input is the light field sub-aperture image; it first passes through 2D CNN, Resblock, SPPModule and other modules, is then cascaded with the pre-estimated light field focal stack, and then passes through the encoding-decoding network. The estimated light field refocusing sequence is output and multiplied with the continuous depth volume M_AiF to obtain the fully focused image.
  • the depth volume M and the continuous depth volume M_AiF are defined entrywise: i and j denote the i-th row and j-th column, k denotes the k-th depth layer, and M_{i,j,k} and (M_AiF)_{i,j,k} denote the values at row i, column j and depth layer k of the depth volume M and the continuous depth volume M_AiF, respectively. A sketch of how these volumes can be used appears below.
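  • The exact definitions of M and M_AiF appear only as equation images in the source, so the following is an assumed reading offered as a minimal sketch: M as a one-hot volume over depth layers, M_AiF as softmax weights, and the fully focused image formed by weighting each refocused slice and summing over depth.

```python
import torch

def depth_volumes(logits: torch.Tensor):
    """logits: (K, H, W) per-depth-layer scores from the attention network."""
    m_aif = torch.softmax(logits, dim=0)  # continuous depth volume M_AiF
    one_hot = torch.zeros_like(m_aif)
    one_hot.scatter_(0, logits.argmax(dim=0, keepdim=True), 1.0)  # discrete volume M
    return one_hot, m_aif

def all_in_focus(refocus_stack: torch.Tensor, m_aif: torch.Tensor) -> torch.Tensor:
    """refocus_stack: (K, 3, H, W). Weight each refocused slice by M_AiF, sum over k."""
    return (refocus_stack * m_aif.unsqueeze(1)).sum(dim=0)
```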
  • the estimated light field depth map and light field fully focused image are compared with their true values respectively, the loss function is calculated, and the error is back propagated to train the network parameters.
  • the mean squared error is used as the loss function of the light field depth map, and the SSIM (structural similarity) index is used as the loss function of the light field fully focused image; a sketch of these losses follows.
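  • The following is a minimal sketch of the two training losses. The relative weight w and the ssim() helper (for instance from the third-party pytorch-msssim package) are assumptions; the patent does not specify how the two losses are combined.

```python
import torch.nn.functional as F
from pytorch_msssim import ssim  # assumed third-party SSIM implementation

def total_loss(depth_pred, depth_gt, aif_pred, aif_gt, w: float = 1.0):
    """MSE on the depth map plus an SSIM-based loss on the all-in-focus image."""
    depth_loss = F.mse_loss(depth_pred, depth_gt)
    ssim_loss = 1.0 - ssim(aif_pred, aif_gt, data_range=1.0)  # higher SSIM, lower loss
    return depth_loss + w * ssim_loss
```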
  • a light field defocus imaging model is established from the forward direction to improve the depth estimation accuracy.
  • based on the optical images produced by the light field defocus imaging model, an attention learning neural network is constructed from the reverse direction to quickly estimate the scene depth and obtain a fully focused image at the same time. This improves the accuracy of light field depth estimation, reduces the computation of the algorithm, improves estimation efficiency, and enables real-time estimation.
  • Figure 7 is an example diagram of a light field depth estimation device according to an embodiment of the present application.
  • the light field depth estimation device 10 includes: a processing module 100 , a generation module 200 and an estimation module 300 .
  • the processing module 100 is used to perform gamma correction on the light field RGB image and generate a light field gamma corrected RGB image.
  • the generation module 200 is used to perform forward RGB generation on the light field gamma corrected RGB image, the light field depth map and the light field simulation point transfer function simulation image to simulate the light field defocus image to obtain a simulated light field RGB image.
  • the estimation module 300 is used to perform light field depth estimation on the simulated light field RGB image using an attention learning neural network to obtain an estimated depth map and a fully focused image of the light field.
  • the light field depth estimation device 10 also includes: an output module, used, before the simulated light field RGB image is obtained, to calculate the wave function of an object-space point passing through the main lens according to the light field camera parameters and the point-source field propagation process, modulate the wave function with the phase modulation function of the microlens array, collect the component of the camera pixels at a specific frequency, calculate the point transfer function from the modulated wave function and that frequency component, numerically simulate the point transfer function with bilinear interpolation, fitting, symmetry completion and normalization to obtain the simulated point transfer function map, and, based on that map, randomly sample the object-space depth values and simulate the point transfer function at each sampled depth to obtain the light field simulated point transfer function image.
  • the generation module 200 is further configured to discretize the light field depth map to obtain a binary depth mask corresponding to the light field depth map; multiply the light field gamma corrected RGB image by the binary depth mask to obtain the corresponding light field depth-slice RGB images; and input the light field depth-slice RGB images, the binary depth mask and the light field simulated point transfer function image into the light field nonlinear imaging model to obtain the simulated light field RGB image.
  • the estimation module 300 is further configured to obtain the initial light field focal stack from the simulated light field RGB image using the RL iteration and estimation algorithm, and to obtain the estimated depth map of the light field from the light field sub-aperture images through the attention learning neural network; the light field sub-aperture images are cascaded with the initial light field focal stack, the estimated light field refocusing sequence is output through the encoding-decoding network, and the light field refocusing sequence is multiplied by the continuous depth volume to obtain the fully focused image.
  • the light field depth estimation device 10 also includes: a training module for comparing the estimated depth map and the fully focused image with their respective ground-truth values, calculating the loss function, and back-propagating the error to train the attention learning neural network parameters.
  • a light field defocus imaging model is established from the forward direction to improve the depth estimation accuracy.
  • based on the optical images produced by the light field defocus imaging model, an attention learning neural network is constructed from the reverse direction to quickly estimate the scene depth and obtain a fully focused image at the same time. This improves the accuracy of light field depth estimation, reduces the computation of the algorithm, improves estimation efficiency, and enables real-time estimation.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device may include: a memory 801, a processor 802, and a computer program stored on the memory 801 and executable on the processor 802. When the processor 802 executes the program, it implements the light field depth estimation method provided in the above embodiment.
  • the electronic device also includes:
  • Communication interface 803 is used for communication between the memory 801 and the processor 802.
  • Memory 801 is used to store computer programs that can run on the processor 802.
  • the memory 801 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
  • the bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 8, but it does not mean that there is only one bus or one type of bus.
  • if the memory 801, the processor 802 and the communication interface 803 are integrated on one chip, the memory 801, the processor 802 and the communication interface 803 can communicate with each other through an internal interface.
  • the processor 802 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • This embodiment also provides a computer-readable storage medium on which a computer program is stored, which is characterized in that when the program is executed by a processor, the above light field depth estimation method is implemented.
  • references to the terms "one embodiment," "some embodiments," "an example," "specific examples," or "some examples" mean that specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, the schematic expressions of these terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples and the features of different embodiments or examples described in this specification, provided they are not inconsistent with each other.
  • first and second are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features. Therefore, features defined as “first” and “second” may explicitly or implicitly include at least one of these features. In the description of this application, “N” means at least two, such as two, three, etc., unless otherwise clearly and specifically limited.
  • N steps or methods may be implemented using software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one of the following technologies known in the art, or a combination thereof: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), etc.
  • the program can be stored in a computer-readable storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the technical field of image processing, and particularly to a light field depth estimation method and apparatus, an electronic device and a storage medium. The method comprises: performing gamma correction on a light field RGB image to generate a light field gamma-corrected RGB image; performing forward RGB generation on the light field gamma-corrected RGB image, a light field depth map and a light field simulated point transfer function image to simulate a light field defocus image, so as to obtain a simulated light field RGB image; and performing light field depth estimation on the simulated light field RGB image by using an attention learning neural network, so as to obtain an estimated depth map and a fully focused image of a light field. A light field defocus imaging model is established from the forward direction to improve depth estimation accuracy; on the basis of the optical images of the light field defocus imaging model, an attention learning neural network is constructed from the reverse direction to quickly estimate scene depth, and a fully focused image is obtained. Therefore, the accuracy of light field depth estimation is improved, the calculation amount of the algorithm is reduced, estimation efficiency is improved, and real-time estimation can be performed.

Description

Light field depth estimation method, device, electronic equipment and storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202210404127.4, filed by Tsinghua University on April 18, 2022 and entitled "Light field depth estimation method, device, electronic equipment and storage medium".
Technical field
This application relates to the field of image processing technology, and in particular to a light field depth estimation method, device, electronic equipment and storage medium.
Background
Light field depth estimation is one of the representative technologies of optical three-dimensional measurement and has important application value in the fields of intelligent manufacturing, robot vision, autonomous driving, industrial inspection, virtual reality, aerial exploration and the metaverse. Existing light field depth estimation methods are mainly divided into two categories: optimization-based estimation methods and data-driven deep learning methods.
Traditional optimization-based methods rest mainly on an analysis of scene characteristics: scene features are designed by hand and used to build a matching cost function, so that depth estimation is transformed into the problem of optimizing that cost function. Since hand-designed features struggle to fully express the structural characteristics of a scene, traditional optimization methods are unsatisfactory when dealing with occlusion, weak textures and similar scenes. In addition, the models they build often suffer from complex structures and difficult solutions, and cannot meet real-time requirements.
Data-driven deep learning methods rely heavily on the quantity and quality of training data. The models are designed and trained with supervised learning, and the datasets are synthesized artificially through ray tracing and similar methods. On the one hand, the characteristics of light field imaging make it difficult for conventional structured light scanning or depth sensors to obtain ground-truth depth values that meet the requirements of light field datasets; light field disparity estimation research has therefore not yet established large-scale datasets containing accurate disparity values for real scenes. On the other hand, the number of samples in synthetic datasets is limited, and even with methods such as data augmentation it is difficult to obtain sufficient training data. As a result, a model trained entirely on synthetic datasets cannot achieve comparable generalization performance in real scenes.
In addition, both the optimization-based estimation methods and the data-driven deep learning methods start from the image produced by the optical system and use different means to estimate the depth information of the optical image; that is, both assume an ideal pinhole imaging model. However, actual optical imaging systems exhibit varying degrees of aberration, and the error introduced by the imaging model is one of the key factors affecting the accuracy of current depth estimation algorithms.
To sum up, existing light field depth estimation has the following problems:
1. Errors in the imaging model and in the optimization algorithm mean that the accuracy of light field depth estimation needs to be improved.
2. The complexity of the algorithms results in a large amount of calculation and low efficiency, which cannot meet real-time requirements.
Summary of the invention
This application provides a light field depth estimation method, device, electronic equipment and storage medium. A light field defocus imaging model is built from the forward direction to improve the accuracy of depth estimation, and, based on the optical images produced by this model, an attention learning neural network is constructed from the reverse direction to quickly estimate scene depth while obtaining a fully focused image.
The first embodiment of the present application provides a light field depth estimation method, which includes the following steps: performing gamma correction on a light field RGB image to generate a light field gamma corrected RGB image; performing forward RGB generation on the light field gamma corrected RGB image, the light field depth map and the light field simulated point transfer function image to simulate the light field defocus image, obtaining a simulated light field RGB image; and using an attention learning neural network to perform light field depth estimation on the simulated light field RGB image, obtaining an estimated depth map and a fully focused image of the light field.
Optionally, in one embodiment of the present application, before the simulated light field RGB image is obtained, the method further includes: calculating the wave function of an object-space point passing through the main lens according to the light field camera parameters and the point-source field propagation process; modulating the wave function with the phase modulation function of the microlens array; collecting the component of the camera pixels at a specific frequency and calculating a point transfer function from the modulated wave function and that frequency component; numerically simulating the point transfer function and applying bilinear interpolation, fitting, symmetry completion and normalization to obtain a simulated point transfer function map; and, based on that map, randomly sampling object-space depth values and simulating the point transfer function at each sampled depth to obtain the light field simulated point transfer function image.
Optionally, in one embodiment of the present application, performing forward RGB generation on the light field gamma corrected RGB image, the light field depth map and the light field simulated point transfer function image to simulate the light field defocus image and obtain the simulated light field RGB image includes: discretizing the light field depth map to obtain a binary depth mask corresponding to the light field depth map; multiplying the light field gamma corrected RGB image by the binary depth mask to obtain the corresponding light field depth-slice RGB images; and inputting the light field depth-slice RGB images, the binary depth mask and the light field simulated point transfer function image into the light field nonlinear imaging model to obtain the simulated light field RGB image.
Optionally, in one embodiment of the present application, using an attention learning neural network to perform light field depth estimation on the simulated light field RGB image to obtain an estimated depth map of the light field and a fully focused image includes: obtaining an initial light field focal stack from the simulated light field RGB image using the RL iteration and estimation algorithm, and obtaining the estimated depth map of the light field from the light field sub-aperture images through the attention learning neural network; and cascading the light field sub-aperture images with the initial light field focal stack, outputting an estimated light field refocusing sequence through the encoding-decoding network, and multiplying the light field refocusing sequence by the continuous depth volume to obtain the fully focused image.
Optionally, in one embodiment of the present application, after the light field depth estimation is performed on the simulated light field RGB image with the attention learning neural network to obtain the estimated depth map of the light field and the fully focused image, the method further includes: comparing the estimated depth map and the fully focused image with their respective ground-truth values, calculating a loss function, and back-propagating the error to train the attention learning neural network parameters.
The second embodiment of the present application provides a light field depth estimation device, including: a processing module for performing gamma correction on a light field RGB image to generate a light field gamma corrected RGB image; a generation module for performing forward RGB generation on the light field gamma corrected RGB image, the light field depth map and the light field simulated point transfer function image to simulate the light field defocus image and obtain a simulated light field RGB image; and an estimation module for using an attention learning neural network to perform light field depth estimation on the simulated light field RGB image to obtain an estimated depth map and a fully focused image of the light field.
Optionally, in one embodiment of the present application, the device also includes: an output module, used, before the simulated light field RGB image is obtained, to calculate the wave function of an object-space point passing through the main lens according to the light field camera parameters and the point-source field propagation process, modulate the wave function with the phase modulation function of the microlens array, collect the component of the camera pixels at a specific frequency, calculate the point transfer function from the modulated wave function and that frequency component, numerically simulate the point transfer function with bilinear interpolation, fitting, symmetry completion and normalization to obtain a simulated point transfer function map, and, based on that map, randomly sample object-space depth values and simulate the point transfer function at each sampled depth to obtain the light field simulated point transfer function image.
Optionally, in one embodiment of the present application, the generation module is further configured to discretize the light field depth map to obtain a binary depth mask corresponding to the light field depth map; multiply the light field gamma corrected RGB image by the binary depth mask to obtain the corresponding light field depth-slice RGB images; and input the light field depth-slice RGB images, the binary depth mask and the light field simulated point transfer function image into the light field nonlinear imaging model to obtain the simulated light field RGB image.
Optionally, in one embodiment of the present application, the estimation module is further configured to obtain the initial light field focal stack from the simulated light field RGB image using the RL iteration and estimation algorithm, and to obtain the estimated depth map of the light field from the light field sub-aperture images through the attention learning neural network; the light field sub-aperture images are cascaded with the initial light field focal stack, the estimated light field refocusing sequence is output through the encoding-decoding network, and the light field refocusing sequence is multiplied by the continuous depth volume to obtain the fully focused image.
Optionally, in one embodiment of the present application, the device also includes: a training module for comparing the estimated depth map and the fully focused image with their respective ground-truth values, calculating a loss function, and back-propagating the error to train the attention learning neural network parameters.
The third embodiment of the present application provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the program to perform the light field depth estimation method described in the above embodiment.
The fourth embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, the program being executed by a processor to perform the light field depth estimation method described in the above embodiment.
The embodiments of the present application have at least the following beneficial effects:
(1) The forward imaging modeling method that considers the defocus characteristics of the light field improves the accuracy of depth estimation. The nonlinear light field defocus imaging model simulates the defocus characteristics at different depths and viewing angles during light field imaging, rather than assuming an ideal pinhole imaging model; this models the imaging process more accurately and thereby improves the accuracy of light field depth estimation.
(2) The nonlinear light field imaging model that takes occlusion into consideration improves the accuracy of depth estimation. Light field nonlinear imaging using alpha synthesis takes the impact of occlusion into account and can improve depth estimation accuracy.
(3) Depth estimation is faster. The attention deep neural network model uses a symmetric attention map to determine the weight coefficients of the light field viewing angles with fewer training parameters, which is faster and more efficient than traditional iterative methods.
Additional aspects and advantages of the application will be set forth in part in the description that follows, and in part will become obvious from the description or may be learned by practice of the application.
Description of the drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Figure 1 is a flow chart of a light field depth estimation method provided according to an embodiment of the present application;
Figure 2 is a structural framework diagram of a light field depth estimation method provided according to an embodiment of the present application;
Figure 3 is a flow chart of a light field transmission model provided according to an embodiment of the present application;
Figure 4 is a schematic diagram of a light field forward defocused RGB generation process provided according to an embodiment of the present application;
Figure 5 is a schematic diagram of a light field depth inversion process provided according to an embodiment of the present application;
Figure 6 is a structural diagram of a light field depth inversion network provided according to an embodiment of the present application;
Figure 7 is an example diagram of a light field depth estimation device according to an embodiment of the present application;
Figure 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed description of the embodiments
The embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals throughout represent the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present application, but should not be construed as limiting the present application.
The light field depth estimation method, device, electronic device and storage medium of the embodiments of the present application are described below with reference to the accompanying drawings. In view of the problems noted in the background above, namely that errors in the existing light field depth imaging model and in the optimization algorithms keep the accuracy of light field depth estimation low, and that algorithmic complexity leads to heavy computation and low efficiency that cannot meet real-time requirements, this application provides a light field depth estimation method. In this method, a light field defocus imaging model is established from the forward direction to improve depth estimation accuracy, and, based on the optical images of the light field defocus imaging model, an attention learning neural network is constructed from the reverse direction to quickly estimate scene depth and obtain a fully focused image at the same time. This improves the accuracy of light field depth estimation, reduces the calculation amount of the algorithm, improves estimation efficiency, and enables real-time estimation.
Specifically, Figure 1 is a flow chart of a light field depth estimation method provided according to an embodiment of the present application.
As shown in Figure 1, the light field depth estimation method includes the following steps:
In step S101, gamma correction is performed on the light field RGB image to generate a light field gamma corrected RGB image.
In the embodiment of this application, based on an existing open-source light field dataset, the light field RGB image is gamma corrected to obtain the light field gamma corrected RGB image, which is sent to the forward RGB generation model together with the light field depth map to simulate defocused light field images.
In step S102, the light field gamma corrected RGB image, the light field depth map and the light field simulated point transfer function image undergo forward RGB generation to simulate the light field defocus image, obtaining a simulated light field RGB image.
Optionally, in one embodiment of the present application, before the simulated light field RGB image is obtained, the method further includes: calculating the wave function of an object-space point passing through the main lens according to the light field camera parameters and the point-source field propagation process; modulating the wave function with the phase modulation function of the microlens array; collecting the component of the camera pixels at a specific frequency, calculating the point transfer function from the modulated wave function and that frequency component, and numerically simulating the point transfer function with bilinear interpolation, fitting, symmetry completion and normalization to obtain a simulated point transfer function map; and, based on that map, randomly sampling object-space depth values and simulating the point transfer function at each sampled depth to obtain the light field simulated point transfer function image.
As shown in Figure 2 and Figure 3, the light field simulated point transfer function image is calculated. Specifically, based on the parameter model of the light field camera, the light field point spread function (PSF) is calculated according to the optical transmission diffraction formula, providing defocus cues for simulating light field defocused images from different viewing angles. The related art considers only a single-view forward transmission model, while this application considers a multi-view light field transmission model.
(1) First, according to the light field camera parameters and the point-source field propagation process, the wave function U(x, y, z, λ) of an object-space point passing through the main lens can be expressed as:
[Equation rendered as image PCTCN2022091182-appb-000001 in the original]
where x, y, z are the three-dimensional spatial coordinates of the object point (z denotes depth), λ is the wavelength, r is the radial distance in the aperture plane, J_0(·) is the zeroth-order Bessel function of the first kind, and D(r, λ, z) is expressed as follows:
[Equation rendered as image PCTCN2022091182-appb-000002 in the original]
where s is the distance between the lens and the sensor, and d is the distance between the object and the lens.
(2) The phase modulation function of the microlens array is:
[Equation rendered as image PCTCN2022091182-appb-000003 in the original]
where x_0, y_0 are the center coordinates of the microlens array, f is the focal length, and n is the refractive index.
(3) The light field information after phase modulation by the microlens is:
U′(ω_x, ω_y) = F_ω( U(x, y, z, λ) · t(x, y, x_0, y_0) )
where ω_x, ω_y are frequency-domain samples of the spatial coordinates (x, y), and F_ω(·) is the Fourier transform.
(4) The process by which a camera pixel collects a specific frequency component is:
[Equation rendered as image PCTCN2022091182-appb-000004 in the original]
where ω_u is the spatial-frequency position corresponding to angle u, ω_v is the spatial-frequency position corresponding to angle v, and rect(·) is the rectangular window function.
(5) The point transfer function PSF after the microlens can then be expressed as:
[Equation rendered as image PCTCN2022091182-appb-000005 in the original]
(6) The above PSF is simulated numerically, and bilinear interpolation, fitting, symmetric completion, and normalization operations are applied to obtain the simulated PSF map.
(7) The object-space depth value z is randomly sampled, and the PSF at each randomly sampled depth is simulated, yielding robust PSF simulation images for the training samples.
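A hedged sketch of the post-processing in steps (6) and (7) follows. The symmetric-completion and normalization operations track the text directly; the upsampling factor, the depth range, and the callback `simulate_psf` (standing in for the numerical PSF model above, which is not given in closed form here) are assumptions.

```python
import numpy as np
from scipy.ndimage import zoom

def postprocess_psf(psf: np.ndarray, upsample: float = 2.0) -> np.ndarray:
    """Step (6): bilinear interpolation, symmetric completion, and
    normalization of a numerically simulated PSF. The upsampling factor
    is an illustrative choice."""
    psf = zoom(psf, upsample, order=1)       # bilinear interpolation
    psf = 0.5 * (psf + psf[::-1, ::-1])      # symmetric completion
    return psf / psf.sum()                   # energy normalization

def sample_training_psfs(simulate_psf, z_min: float, z_max: float, n: int):
    """Step (7): randomly sample object-space depths z and simulate the PSF
    at each sampled depth. `simulate_psf(z)` is a hypothetical callback that
    evaluates the numerical PSF model described above."""
    depths = np.random.uniform(z_min, z_max, size=n)
    return [postprocess_psf(simulate_psf(z)) for z in depths]
```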
Optionally, in an embodiment of the present application, performing forward RGB generation on the light-field gamma-corrected RGB image, the light field depth map, and the light-field simulated point transfer function images to simulate defocused light field images and obtain the simulated light field RGB image includes: discretizing the light field depth map to obtain binary depth masks corresponding to the light field depth map; multiplying the light-field gamma-corrected RGB image by the binary depth masks to obtain the corresponding light field depth-slice RGB images; and inputting the light field depth-slice RGB images, the binary depth masks, and the light-field simulated point transfer function images into the light field nonlinear imaging model to obtain the simulated light field RGB image.
The obtained light-field gamma-corrected RGB image, the light field depth map, and the simulated PSFs are fed into the forward RGB generation process to simulate defocused light field images. Unlike the existing ideal pinhole imaging technique, this forward simulation models optical defocus at different depths and viewing angles by introducing the light field point spread function, and therefore comes closer to genuinely captured light field images. The forward generation process is shown in FIG. 4.
(1) First, the light field depth map is discretized to obtain the binary depth masks α_k(λ), k = 1, 2, …, K, corresponding to the depth map, where K is the number of discrete depth layers.
(2) The light-field gamma-corrected RGB image obtained in the first step is multiplied by the discrete binary depth masks to obtain the corresponding light field depth-slice RGB images.
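A minimal sketch of steps (1) and (2) is given below; uniform layer edges over the depth range are an illustrative assumption, since the embodiment does not fix the quantization scheme.

```python
import numpy as np

def depth_masks_and_slices(rgb: np.ndarray, depth: np.ndarray, K: int):
    """Quantize the depth map into K layers (binary masks alpha_k) and form
    the per-layer depth-slice RGB images l_k = alpha_k * rgb.

    rgb: (H, W, 3) float array; depth: (H, W) float array.
    """
    edges = np.linspace(depth.min(), depth.max(), K + 1)   # assumed uniform
    labels = np.clip(np.digitize(depth, edges[1:-1]), 0, K - 1)
    masks = [(labels == k).astype(rgb.dtype) for k in range(K)]   # alpha_k
    slices = [rgb * m[..., None] for m in masks]                  # l_k
    return masks, slices
```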
(3) The light field depth-slice RGB images, the binary depth masks α_k(λ), and the simulated light field PSFs are fed into the occlusion-aware, α-compositing-based light field nonlinear imaging model to obtain the simulated light field RGB image b(λ, u, v), as shown in the following formula:
[Equation rendered as image PCTCN2022091182-appb-000006 in the original]
where:
[Equation rendered as image PCTCN2022091182-appb-000007 in the original]
[Equation rendered as image PCTCN2022091182-appb-000008 in the original]
Here the depth map is quantized into K layers; u, v denote the light field viewing angle, λ the wavelength, and * the convolution operation. α_k (k = 1, 2, …, K) are the binary depth masks, l_k(λ) is the depth-slice RGB image of the k-th layer, and PSF_k(λ, u, v) is the PSF at the k-th depth layer under viewing angle (u, v). The term rendered as image PCTCN2022091182-appb-000009 characterizes the imaging response at the k-th depth layer when occlusion is ignored, the term rendered as image PCTCN2022091182-appb-000010 characterizes the effect of occlusion on imaging at the k-th depth layer, and η is additive noise.
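Since the defining equation above is rendered as an image in the original, the sketch below shows only one plausible realization of an occlusion-aware α-compositing defocus model: layers are composited back to front, each slice and its mask blurred with the PSF of that depth layer, for one viewing angle (u, v) and one wavelength. The back-to-front rule and layer ordering are assumptions, not the patented formula itself.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_defocus(slices, masks, psfs, noise_std=0.0):
    """Occlusion-aware alpha compositing of K depth layers (assumed reading
    of the imaging model; layers assumed ordered near k=0 to far k=K-1).

    slices: list of (H, W, 3) depth-slice images l_k
    masks:  list of (H, W) binary masks alpha_k
    psfs:   list of (h, w) normalized PSFs for each depth layer
    """
    K = len(slices)
    b = np.zeros_like(slices[0])
    for k in range(K - 1, -1, -1):                 # farthest layer first
        kern = psfs[k][..., None]                  # broadcast over channels
        blurred = fftconvolve(slices[k], kern, mode="same")
        blurred_a = fftconvolve(masks[k][..., None], kern, mode="same")
        # new layer occludes what lies behind it (occlusion term)
        b = blurred + (1.0 - np.clip(blurred_a, 0.0, 1.0)) * b
    if noise_std > 0:
        b = b + np.random.normal(0.0, noise_std, b.shape)  # additive noise eta
    return b
```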
In step S103, an attention learning neural network is used to perform light field depth estimation on the simulated light field RGB image, obtaining an estimated depth map of the light field and an all-in-focus image.
Optionally, in an embodiment of the present application, using the attention learning neural network to perform light field depth estimation on the simulated light field RGB image and obtain the estimated depth map and the all-in-focus image of the light field includes: obtaining an initial light field focal stack using the simulated light field RGB image and the RL iteration and estimation algorithm, and obtaining the estimated depth map of the light field from the light field sub-aperture images via the attention learning neural network; and concatenating the light field sub-aperture images with the initial light field focal stack, outputting an estimated light field refocus stack through an encoder-decoder network, and multiplying the light field refocus stack by the continuous depth volume to obtain the all-in-focus image.
Optionally, in an embodiment of the present application, after using the attention learning neural network to perform light field depth estimation on the simulated light field RGB image and obtain the estimated depth map and the all-in-focus image of the light field, the method further includes: comparing the estimated depth map and the all-in-focus image with their respective ground-truth values, computing the loss functions, and back-propagating the errors to train the attention learning neural network parameters.
As shown in FIG. 5, depth estimation is performed on the light field RGB images produced by the forward-model simulation above. First, the RL iteration and estimation algorithm is used to obtain the initial light field focal stack, and the attention deep learning network is then trained to estimate the light field depth map.
Specifically, the depth inversion neural network adopted in this application is shown in FIG. 6. The input of the depth inversion network is the multi-view light field images obtained from the simulation in the third step, and the output is the estimated depth map and its corresponding all-in-focus image.
The depth inversion network consists of two main parts. The first part adopts an attention view-selection neural network, composed mainly of 2D convolution, ResBlock, SPP module, cost volume, attention module, and 3D CNN modules; its input is the light field sub-aperture images, and its output is a depth volume M of dimensions H × W × K (K is the number of depth layers, and H and W are the height and width of the light field image), from which the light field depth map is obtained by taking the maximum along the depth dimension.
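Taking the maximum along the depth dimension can be sketched as follows; mapping the winning layer index back to a metric depth via a `layer_depths` calibration array is an illustrative assumption.

```python
import numpy as np

def depth_from_volume(M: np.ndarray, layer_depths: np.ndarray) -> np.ndarray:
    """M: depth volume of shape (H, W, K). Take the arg-max over the K depth
    layers and map the winning index to a depth value. `layer_depths`
    (metric depth of each layer) is an assumed calibration input."""
    return layer_depths[np.argmax(M, axis=-1)]
```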
The second part adopts a 3D U-Net encoder-decoder network to estimate the all-in-focus image. Its input is the light field sub-aperture images, which first pass through 2D CNN, ResBlock, SPP, and other modules, are then concatenated with the pre-estimated light field focal stack, and finally pass through the encoder-decoder network, which outputs the estimated light field refocus stack; multiplying this stack by the continuous depth volume M_AiF yields the all-in-focus image.
The relation between the depth volume M and M_AiF is defined as:
[Equation rendered as image PCTCN2022091182-appb-000011 in the original]
where:
[Equation rendered as image PCTCN2022091182-appb-000012 in the original]
i = 1, …, H; j = 1, …, W; k = 1, …, K
Here the quantity rendered as image PCTCN2022091182-appb-000013 is the continuous depth volume; i and j index the i-th row and the j-th column, and k indexes the k-th depth layer; M_{i,j,k} and the quantity rendered as image PCTCN2022091182-appb-000014 denote the values of the depth volume M and of the continuous depth volume M_AiF, respectively, at row i, column j, depth layer k.
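The defining relation between M and M_AiF is rendered as an image in the original. A common reading, assumed here and not confirmed by the text, is a softmax normalization over the depth dimension, after which the all-in-focus image is the refocus stack weighted by M_AiF and summed over k, consistent with the multiplication described above.

```python
import numpy as np

def all_in_focus(M: np.ndarray, focal_stack: np.ndarray) -> np.ndarray:
    """M: depth volume of shape (H, W, K); focal_stack: estimated refocus
    stack of shape (H, W, K, 3). A softmax over the depth axis is assumed
    for the continuous depth volume M_AiF."""
    e = np.exp(M - M.max(axis=-1, keepdims=True))       # stable softmax
    M_aif = e / e.sum(axis=-1, keepdims=True)           # (H, W, K)
    return (focal_stack * M_aif[..., None]).sum(axis=2) # weighted sum over k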
The estimated light field depth map and the light field all-in-focus image are each compared with their ground-truth values, the loss functions are computed, and the errors are back-propagated to train the network parameters. The minimum mean-square-error function is used as the loss function for the light field depth map, and the SSIM (structural similarity) index is used as the loss function for the light field all-in-focus image.
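A minimal PyTorch sketch of the two losses is given below; the weight balancing the two terms and the use of the third-party pytorch-msssim package for SSIM are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM implementation (assumed choice)

def training_loss(depth_pred, depth_gt, aif_pred, aif_gt, w: float = 1.0):
    """MSE loss on the depth map plus an SSIM-based loss on the all-in-focus
    image. Tensors are assumed to be (N, C, H, W) with values in [0, 1];
    the weight w is an assumption."""
    depth_loss = F.mse_loss(depth_pred, depth_gt)
    aif_loss = 1.0 - ssim(aif_pred, aif_gt, data_range=1.0)  # higher SSIM = better
    return depth_loss + w * aif_loss
```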
According to the light field depth estimation method proposed in the embodiments of this application, a light field defocus imaging model is built in the forward direction to improve depth estimation accuracy, and an attention learning neural network is built in the reverse direction, based on the optical images of the light field defocus imaging model, to rapidly estimate scene depth while simultaneously obtaining an all-in-focus image. This improves the accuracy of light field depth estimation, reduces the computational load of the algorithm, raises estimation efficiency, and enables real-time estimation.
Next, the light field depth estimation apparatus proposed according to the embodiments of this application is described with reference to the accompanying drawings.
FIG. 7 is an exemplary diagram of a light field depth estimation apparatus according to an embodiment of the present application.
As shown in FIG. 7, the light field depth estimation apparatus 10 includes a processing module 100, a generation module 200, and an estimation module 300.
The processing module 100 is configured to perform gamma correction on a light field RGB image to generate a light-field gamma-corrected RGB image. The generation module 200 is configured to perform forward RGB generation on the light-field gamma-corrected RGB image, the light field depth map, and the light-field simulated point transfer function images to simulate defocused light field images, obtaining a simulated light field RGB image. The estimation module 300 is configured to perform light field depth estimation on the simulated light field RGB image using an attention learning neural network, obtaining an estimated depth map of the light field and an all-in-focus image.
Optionally, in an embodiment of the present application, the light field depth estimation apparatus 10 further includes an output module configured to, before the simulated light field RGB image is obtained: calculate the wave function of an object-space point passing through the main lens according to the light field camera parameters and the point-source field propagation process; modulate the wave function with the phase modulation function of the microlens array; collect the camera pixels' responses to specific frequency components, calculate the point transfer function from the modulated wave function and the specific frequency components, simulate the point transfer function numerically, and apply bilinear interpolation, fitting, symmetric completion, and normalization operations to obtain a simulated point transfer function map; and randomly sample object-space depth values based on the simulated point transfer function map, simulating the point transfer function at the randomly sampled depths to obtain the light-field simulated point transfer function images.
Optionally, in an embodiment of the present application, the generation module 200 is further configured to: discretize the light field depth map to obtain the binary depth masks corresponding to the light field depth map; multiply the light-field gamma-corrected RGB image by the binary depth masks to obtain the corresponding light field depth-slice RGB images; and input the light field depth-slice RGB images, the binary depth masks, and the light-field simulated point transfer function images into the light field nonlinear imaging model to obtain the simulated light field RGB image.
Optionally, in an embodiment of the present application, the estimation module 300 is further configured to: obtain an initial light field focal stack using the simulated light field RGB image and the RL iteration and estimation algorithm; obtain the estimated depth map of the light field from the light field sub-aperture images via the attention learning neural network; concatenate the light field sub-aperture images with the initial light field focal stack; output an estimated light field refocus stack through the encoder-decoder network; and multiply the light field refocus stack by the continuous depth volume to obtain the all-in-focus image.
Optionally, in an embodiment of the present application, the light field depth estimation apparatus 10 further includes a training module configured to compare the estimated depth map and the all-in-focus image with their respective ground-truth values, compute the loss functions, and back-propagate the errors to train the attention learning neural network parameters.
It should be noted that the foregoing explanation of the embodiments of the light field depth estimation method also applies to the light field depth estimation apparatus of this embodiment, and is not repeated here.
According to the light field depth estimation apparatus proposed in the embodiments of this application, a light field defocus imaging model is built in the forward direction to improve depth estimation accuracy, and an attention learning neural network is built in the reverse direction, based on the optical images of the light field defocus imaging model, to rapidly estimate scene depth while simultaneously obtaining an all-in-focus image. This improves the accuracy of light field depth estimation, reduces the computational load of the algorithm, raises estimation efficiency, and enables real-time estimation.
FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. The electronic device may include:
a memory 801, a processor 802, and a computer program stored in the memory 801 and executable on the processor 802.
When the processor 802 executes the program, the light field depth estimation method provided in the above embodiments is implemented.
Further, the electronic device also includes:
a communication interface 803, used for communication between the memory 801 and the processor 802.
The memory 801 is used to store computer programs executable on the processor 802.
The memory 801 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
If the memory 801, the processor 802, and the communication interface 803 are implemented independently, the communication interface 803, the memory 801, and the processor 802 may be interconnected by a bus and communicate with one another through it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is drawn in FIG. 8, but this does not mean that there is only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 801, the processor 802, and the communication interface 803 are integrated on a single chip, then the memory 801, the processor 802, and the communication interface 803 may communicate with one another through internal interfaces.
The processor 802 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
This embodiment also provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the light field depth estimation method described above.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or N embodiments or examples. In addition, where no contradiction arises, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of those embodiments or examples.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Accordingly, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of this application, "N" means at least two, for example two, three, and so on, unless otherwise clearly and specifically limited.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process, and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
It should be understood that the various parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one of the following technologies known in the art, or a combination thereof: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing the relevant hardware through a program; the program may be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.

Claims (12)

  1. A light field depth estimation method, characterized by comprising the following steps:
    performing gamma correction on a light field RGB image to generate a light-field gamma-corrected RGB image;
    performing forward RGB generation on the light-field gamma-corrected RGB image, a light field depth map, and light-field simulated point transfer function images to simulate defocused light field images, obtaining a simulated light field RGB image; and
    performing light field depth estimation on the simulated light field RGB image using an attention learning neural network, obtaining an estimated depth map of the light field and an all-in-focus image.
  2. The method according to claim 1, characterized in that, before the simulated light field RGB image is obtained, the method further comprises:
    calculating a wave function of an object-space point passing through a main lens according to light field camera parameters and a point-source field propagation process;
    modulating the wave function using a phase modulation function of a microlens array;
    collecting camera pixels' responses to specific frequency components, calculating a point transfer function from the modulated wave function and the specific frequency components, simulating the point transfer function numerically, and applying bilinear interpolation, fitting, symmetric completion, and normalization operations to obtain a simulated point transfer function map; and
    randomly sampling object-space depth values based on the simulated point transfer function map, and simulating the point transfer function at the randomly sampled depths to obtain the light-field simulated point transfer function images.
  3. The method according to claim 1 or 2, characterized in that performing forward RGB generation on the light-field gamma-corrected RGB image, the light field depth map, and the light-field simulated point transfer function images to simulate defocused light field images and obtain the simulated light field RGB image comprises:
    discretizing the light field depth map to obtain binary depth masks corresponding to the light field depth map;
    multiplying the light-field gamma-corrected RGB image by the binary depth masks to obtain corresponding light field depth-slice RGB images; and
    inputting the light field depth-slice RGB images, the binary depth masks, and the light-field simulated point transfer function images into a light field nonlinear imaging model to obtain the simulated light field RGB image.
  4. The method according to claim 1, characterized in that performing light field depth estimation on the simulated light field RGB image using the attention learning neural network to obtain the estimated depth map of the light field and the all-in-focus image comprises:
    obtaining an initial light field focal stack using the simulated light field RGB image and an RL iteration and estimation algorithm, and obtaining the estimated depth map of the light field from light field sub-aperture images via the attention learning neural network; and
    concatenating the light field sub-aperture images with the initial light field focal stack, outputting an estimated light field refocus stack through an encoder-decoder network, and multiplying the light field refocus stack by a continuous depth volume to obtain the all-in-focus image.
  5. The method according to claim 4, characterized in that, after performing light field depth estimation on the simulated light field RGB image using the attention learning neural network to obtain the estimated depth map of the light field and the all-in-focus image, the method further comprises:
    comparing the estimated depth map and the all-in-focus image with their respective ground-truth values, computing loss functions, and back-propagating errors to train parameters of the attention learning neural network.
  6. A light field depth estimation apparatus, characterized by comprising:
    a processing module configured to perform gamma correction on a light field RGB image to generate a light-field gamma-corrected RGB image;
    a generation module configured to perform forward RGB generation on the light-field gamma-corrected RGB image, a light field depth map, and light-field simulated point transfer function images to simulate defocused light field images, obtaining a simulated light field RGB image; and
    an estimation module configured to perform light field depth estimation on the simulated light field RGB image using an attention learning neural network, obtaining an estimated depth map of the light field and an all-in-focus image.
  7. The apparatus according to claim 6, characterized by further comprising:
    an output module configured to, before the simulated light field RGB image is obtained, calculate a wave function of an object-space point passing through a main lens according to light field camera parameters and a point-source field propagation process; modulate the wave function using a phase modulation function of a microlens array; collect camera pixels' responses to specific frequency components, calculate a point transfer function from the modulated wave function and the specific frequency components, simulate the point transfer function numerically, and apply bilinear interpolation, fitting, symmetric completion, and normalization operations to obtain a simulated point transfer function map; and randomly sample object-space depth values based on the simulated point transfer function map, simulating the point transfer function at the randomly sampled depths to obtain the light-field simulated point transfer function images.
  8. The apparatus according to claim 6 or 7, characterized in that the generation module is further configured to: discretize the light field depth map to obtain binary depth masks corresponding to the light field depth map; multiply the light-field gamma-corrected RGB image by the binary depth masks to obtain corresponding light field depth-slice RGB images; and input the light field depth-slice RGB images, the binary depth masks, and the light-field simulated point transfer function images into a light field nonlinear imaging model to obtain the simulated light field RGB image.
  9. The apparatus according to claim 6, characterized in that the estimation module is further configured to: obtain an initial light field focal stack using the simulated light field RGB image and an RL iteration and estimation algorithm; obtain the estimated depth map of the light field from light field sub-aperture images via the attention learning neural network; concatenate the light field sub-aperture images with the initial light field focal stack; output an estimated light field refocus stack through an encoder-decoder network; and multiply the light field refocus stack by a continuous depth volume to obtain the all-in-focus image.
  10. The apparatus according to claim 9, characterized by further comprising:
    a training module configured to compare the estimated depth map and the all-in-focus image with their respective ground-truth values, compute loss functions, and back-propagate errors to train parameters of the attention learning neural network.
  11. An electronic device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor executing the program to implement the light field depth estimation method according to any one of claims 1 to 5.
  12. A computer-readable storage medium on which a computer program is stored, characterized in that the program is executed by a processor to implement the light field depth estimation method according to any one of claims 1 to 5.
PCT/CN2022/091182 2022-04-18 2022-05-06 Light field depth estimation method and apparatus, and electronic device and storage medium WO2023201783A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210404127.4A CN114511605B (en) 2022-04-18 2022-04-18 Light field depth estimation method and device, electronic equipment and storage medium
CN202210404127.4 2022-04-18

Publications (1)

Publication Number Publication Date
WO2023201783A1 true WO2023201783A1 (en) 2023-10-26

Family

ID=81555405

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091182 WO2023201783A1 (en) 2022-04-18 2022-05-06 Light field depth estimation method and apparatus, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN114511605B (en)
WO (1) WO2023201783A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117541501A (en) * 2024-01-09 2024-02-09 清华大学 Scanning light field self-supervision network denoising method and device, electronic equipment and medium
CN117974478A (en) * 2024-04-02 2024-05-03 武汉工程大学 Visible light to near infrared hyperspectral image reconstruction method and system
CN118075590A (en) * 2024-03-22 2024-05-24 四川大学 Achromatic extended depth of field imaging system and imaging method based on multiple virtual lenses

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115190257A (en) * 2022-05-30 2022-10-14 元潼(北京)技术有限公司 CIS system for meta imaging
CN115375827B (en) * 2022-07-21 2023-09-15 荣耀终端有限公司 Illumination estimation method and electronic equipment
CN116016952B (en) * 2022-12-20 2024-05-14 维悟光子(北京)科技有限公司 Training method for image coding and decoding model of optical imaging system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443882A (en) * 2019-07-05 2019-11-12 清华大学 Light field microscopic three-dimensional method for reconstructing and device based on deep learning algorithm
US20200120328A1 (en) * 2018-10-10 2020-04-16 Avalon Holographics Inc. High-Performance Light Field Display Simulator
CN111127536A (en) * 2019-12-11 2020-05-08 清华大学 Light field multi-plane representation reconstruction method and device based on neural network
CN112102165A (en) * 2020-08-18 2020-12-18 北京航空航天大学 Light field image angular domain super-resolution system and method based on zero sample learning
CN112150526A (en) * 2020-07-27 2020-12-29 浙江大学 Light field image depth estimation method based on depth learning
CN113506336A (en) * 2021-06-30 2021-10-15 上海师范大学 Light field depth prediction method based on convolutional neural network and attention mechanism
CN113554744A (en) * 2021-07-08 2021-10-26 清华大学 Rapid scanning three-dimensional imaging method and device for large-volume scattering sample
CN114092540A (en) * 2021-10-29 2022-02-25 上海师范大学 Attention mechanism-based light field depth estimation method and computer readable medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899870B (en) * 2015-05-15 2017-08-25 清华大学深圳研究生院 The depth estimation method being distributed based on light field data
CN106846463B (en) * 2017-01-13 2020-02-18 清华大学 Microscopic image three-dimensional reconstruction method and system based on deep learning neural network
CN112767466B (en) * 2021-01-20 2022-10-11 大连理工大学 Light field depth estimation method based on multi-mode information

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200120328A1 (en) * 2018-10-10 2020-04-16 Avalon Holographics Inc. High-Performance Light Field Display Simulator
CN110443882A (en) * 2019-07-05 2019-11-12 清华大学 Light field microscopic three-dimensional method for reconstructing and device based on deep learning algorithm
CN111127536A (en) * 2019-12-11 2020-05-08 清华大学 Light field multi-plane representation reconstruction method and device based on neural network
CN112150526A (en) * 2020-07-27 2020-12-29 浙江大学 Light field image depth estimation method based on depth learning
CN112102165A (en) * 2020-08-18 2020-12-18 北京航空航天大学 Light field image angular domain super-resolution system and method based on zero sample learning
CN113506336A (en) * 2021-06-30 2021-10-15 上海师范大学 Light field depth prediction method based on convolutional neural network and attention mechanism
CN113554744A (en) * 2021-07-08 2021-10-26 清华大学 Rapid scanning three-dimensional imaging method and device for large-volume scattering sample
CN114092540A (en) * 2021-10-29 2022-02-25 上海师范大学 Attention mechanism-based light field depth estimation method and computer readable medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHOU, XU ET AL.: "Parameter-Free Gaussian PSF Model for Extended Depth of Field in Brightfield Microscopy", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 29, 28 January 2020 (2020-01-28), pages 3227 - 3238, XP011769254, ISSN: 1057-7149, DOI: 10.1109/TIP.2019.2957941 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117541501A (en) * 2024-01-09 2024-02-09 清华大学 Scanning light field self-supervision network denoising method and device, electronic equipment and medium
CN117541501B (en) * 2024-01-09 2024-05-31 清华大学 Scanning light field self-supervision network denoising method and device, electronic equipment and medium
CN118075590A (en) * 2024-03-22 2024-05-24 四川大学 Achromatic extended depth of field imaging system and imaging method based on multiple virtual lenses
CN117974478A (en) * 2024-04-02 2024-05-03 武汉工程大学 Visible light to near infrared hyperspectral image reconstruction method and system

Also Published As

Publication number Publication date
CN114511605A (en) 2022-05-17
CN114511605B (en) 2022-09-02

Similar Documents

Publication Publication Date Title
WO2023201783A1 (en) Light field depth estimation method and apparatus, and electronic device and storage medium
Ranftl et al. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer
Larsson et al. Beyond grobner bases: Basis selection for minimal solvers
US8761534B2 (en) Optimization of plenoptic imaging systems
US9288389B2 (en) Estimation of metrics using a plenoptic imaging system
JP2023545199A (en) Model training method, human body posture detection method, apparatus, device and storage medium
Bergman et al. Deep adaptive lidar: End-to-end optimization of sampling and depth completion at low sampling rates
Senushkin et al. Decoder modulation for indoor depth completion
US11676294B2 (en) Passive and single-viewpoint 3D imaging system
CN115147709B (en) Underwater target three-dimensional reconstruction method based on deep learning
Zheng et al. A simple framework for 3d lensless imaging with programmable masks
US11967096B2 (en) Methods and apparatuses of depth estimation from focus information
CN114119770B (en) Multi-sensor external parameter joint calibration method and system based on deep learning
CN115170429A (en) Deep learning-based depth of field extension method and system for underwater in-situ microscopic imager
Ceruso et al. Relative multiscale deep depth from focus
Singh et al. A systematic review of the methodologies for the processing and enhancement of the underwater images
Hazineh et al. D-flat: A differentiable flat-optics framework for end-to-end metasurface visual sensor design
Favaro et al. Shape and radiance estimation from the information divergence of blurred images
CN114897955B (en) Depth completion method based on micro-geometric propagation
CN103217147A (en) Measurement device and measurement method
CN114399697A (en) Scene self-adaptive target detection method based on moving foreground
Olszewski Hashcc: Lightweight method to improve the quality of the camera-less nerf scene generation
RU2770153C1 (en) METHOD FOR CORRECTING THE DEPTH MEASUREMENT ERROR OF THE ToF CAMERA
Wu et al. Acoustic camera pose refinement using differentiable rendering
WO2023100774A1 (en) Training method, training system, and training program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22938018

Country of ref document: EP

Kind code of ref document: A1