CN113936117B - High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning - Google Patents
- Publication number: CN113936117B (application CN202111524515.8A)
- Authority
- CN
- China
- Prior art keywords
- layer
- surface normal
- attention weight
- network
- reconstructed
- Prior art date
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/30—Polynomial surface description
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
A photometric stereo system captures several images of the object to be reconstructed, and a deep learning algorithm outputs an accurate three-dimensional reconstruction of the surface normals. A surface normal generation network is designed to generate the surface normals of the object to be reconstructed from the images and the illumination; an attention weight generation network generates an attention weight map of the object to be reconstructed from the images; an attention weight loss function is evaluated pixel by pixel; the trained network is then used for surface normal reconstruction from photometric stereo images. The invention learns the surface normals and the high-frequency information separately through the proposed surface normal generation network and attention weight generation network, and trains with the proposed attention weight loss, thereby improving reconstruction accuracy in high-frequency surface regions such as wrinkles and edges. Compared with conventional photometric stereo methods, three-dimensional reconstruction accuracy is improved, particularly in the surface details of the object to be reconstructed.
Description
Technical Field
The invention relates to a deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement, and belongs to the field of three-dimensional reconstruction.
Background
Three-dimensional reconstruction is a fundamental problem in computer vision. Photometric stereo is a high-precision, pixel-wise three-dimensional reconstruction method that recovers the surface normals of an object from the intensity-variation cues provided by images taken under different illumination directions. Photometric stereo plays an irreplaceable role in many high-precision three-dimensional reconstruction tasks and has important applications in archaeological exploration, pipeline inspection, fine seabed mapping, and similar areas.
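For context (this is background, not the patented method): under a Lambertian reflectance model with at least three known, non-coplanar light directions, the classic photometric stereo solution recovers the per-pixel normal by least squares; this is the baseline that deep learning approaches refine. A minimal NumPy sketch:

```python
import numpy as np

def classic_photometric_stereo(L, m):
    """Recover a unit surface normal from intensities under known lights.

    L : (j, 3) array of light directions l_1..l_j
    m : (j,) array of observed intensities at one pixel
    Solves m = L @ (rho * n) in the least-squares sense (Lambertian
    model), then factors out the albedo rho by normalizing.
    """
    g, *_ = np.linalg.lstsq(L, m, rcond=None)  # g = rho * n
    rho = np.linalg.norm(g)
    return g / rho, rho

# Synthetic check: a known normal lit from three directions, albedo 0.5.
n_true = np.array([0.0, 0.6, 0.8])
L = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
m = 0.5 * (L @ n_true)          # simulated intensities, no shadows
n_est, rho_est = classic_photometric_stereo(L, m)
```

This closed-form solution degrades exactly where the Lambertian assumption breaks (shadows, specularities, high-frequency geometry), which is the gap the learned method below targets.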
However, existing deep-learning-based photometric stereo methods have large errors in high-frequency regions of the object surface, such as wrinkles and edges: they produce blurred three-dimensional reconstructions precisely in the regions that most require accurate reconstruction.
Disclosure of Invention
In view of the above problems, the present invention provides a deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement, so as to overcome the shortcomings of the prior art.
The deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement comprises the following steps:
1) Using a photometric stereo system, take several images of the object to be reconstructed:
An image of the object to be reconstructed is taken under a single parallel white light source. A Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented by a vector l = [x, y, z] in this coordinate system;
The light source position is then changed and an image is taken under the new illumination direction. Typically at least 10 images under different illumination directions are taken, denoted m1, m2, ..., mj, with the corresponding light source positions denoted l1, l2, ..., lj, where j is a natural number greater than or equal to 10;
2) Using a deep learning algorithm, take m1, m2, ..., mj and l1, l2, ..., lj as input and output an accurate three-dimensional reconstruction of the surface normals:
The deep learning algorithm consists of four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training; wherein:
(1) The surface normal generation network is designed to generate the surface normals of the object to be reconstructed from the images m1, m2, ..., mj and the illuminations l1, l2, ..., lj;
(2) The attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m1, m2, ..., mj;
(3) The attention weight loss L is a pixel-wise loss function obtained by averaging the per-pixel losses L_k: L = (1 / (p·q)) Σ_k L_k, where p×q is the resolution of image m, p, q ≥ 2^n, n ≥ 4;
The loss at each pixel position k, L_k, comprises two parts: the first is a gradient loss with coefficient term, L_gradient, and the second is a normal loss with coefficient term, L_normal, i.e. L_k = P_k · L_gradient + λ (1 − P_k) · L_normal;
Here ∇n_k is the gradient at position k of the true surface normal n of the object to be reconstructed; ζ is the neighborhood pixel range used when computing the gradient, with allowed settings 1, 2, 3, 4 and 5; ∇ñ_k is the gradient at position k of the predicted surface normal ñ, where ñ denotes the surface normal predicted by the network and n the true surface normal;
The gradient loss L_gradient = ‖∇ñ_k − ∇n_k‖ sharpens the high-frequency components of the surface normal learned by the network; P_k is the value of the attention weight map P at pixel position k;
Second, the normal loss is L_normal = 1 − ñ_k ● n_k, where ● denotes the dot product; λ is a hyperparameter balancing the gradient loss against the normal loss, with allowed settings {7, 8, 9, 10};
The surface normal generation network (1) and the attention weight generation network (2) are linked through the attention weight loss (3);
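The per-pixel weighting scheme above can be sketched in Python. The finite-difference gradient operator, the L1 form of L_gradient, and the cosine form of L_normal are assumptions filled in for illustration (the patent's formula images are not reproduced in this text); only the combination L_k = P_k·L_gradient + λ(1 − P_k)·L_normal and the role of ζ are taken directly from the description:

```python
import numpy as np

def grad_map(n, zeta=1):
    """Finite-difference gradient of a normal map n of shape (p, q, 3).

    zeta is the neighborhood pixel range from the patent (default 1);
    the exact operator is an assumption: forward differences over zeta
    pixels in x and y, zero-padded at the border.
    """
    gx = np.zeros_like(n)
    gy = np.zeros_like(n)
    gx[:, :-zeta] = n[:, zeta:] - n[:, :-zeta]
    gy[:-zeta, :] = n[zeta:, :] - n[:-zeta, :]
    return np.concatenate([gx, gy], axis=-1)

def attention_weight_loss(n_pred, n_true, P, lam=8, zeta=1):
    """L = mean_k [ P_k * L_gradient + lam * (1 - P_k) * L_normal ].

    L_gradient: L1 distance between predicted and true normal gradients;
    L_normal:   1 - <n_pred, n_true>  (cosine loss, '.' = dot product).
    Both per-pixel forms are assumptions consistent with the text.
    """
    l_grad = np.abs(grad_map(n_pred, zeta) - grad_map(n_true, zeta)).sum(-1)
    l_norm = 1.0 - (n_pred * n_true).sum(-1)
    return (P * l_grad + lam * (1.0 - P) * l_norm).mean()
```

Where P_k is large (wrinkles, edges), the gradient term dominates; elsewhere the λ-scaled normal term dominates, which matches the stated intent of sharpening high-frequency regions without sacrificing overall normal accuracy.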
(4) network training
During network training, the network is continually adjusted and optimized with the back-propagation algorithm to minimize the loss function; training stops when the set number of cycles is reached, at which point the best effect is achieved; alternatively, when L_normal drops below 0.03, training is considered to have reached its best result and is stopped;
3) The trained network is used for surface normal reconstruction from photometric stereo images:
First, s images under different illumination directions are taken, with s ≥ 10; then m1, m2, ..., ms and l1, l2, ..., ls are input into the trained network to obtain the predicted surface normal ñ.
The surface normal generation network (1) is designed to generate the surface normals of the object to be reconstructed from the images m1, m2, ..., mj and the illuminations l1, l2, ..., lj, with the following specific steps:
Denote the resolution of image m as p×q, with p, q ≥ 2^n, n ≥ 4; then m ∈ ℝ^(p×q×3), where 3 corresponds to the RGB channels. According to the resolution p×q of m, the surface normal generation network first tiles the illumination l = [x, y, z] ∈ ℝ^3 into the space ℝ^(p×q×3); the tiled illumination is denoted h, so h ∈ ℝ^(p×q×3). Now h and m have the same spatial size; h and m are concatenated along the third (channel) dimension to form a new tensor in ℝ^(p×q×6). With j images and illuminations as input, j fused tensors are obtained;
These tensors each pass through 4 convolutional layers. The kernels of convolutional layers 1, 2, 3 and 4 are all 3×3 with ReLU activations; layers 2 and 4 use stride 2 while layers 1 and 3 use stride 1; the numbers of feature channels of layers 1, 2, 3 and 4 are 64, 128, 128 and 256 respectively;
A max-pooling layer then pools the j tensors produced by the 4 convolutional layers, each in ℝ^(p/4×q/4×256), into a single tensor in ℝ^(p/4×q/4×256);
Computation continues through convolutional layers 5, 6, 7 and 8, whose kernels are all 3×3 with ReLU activations; layers 5 and 7 are transposed convolutions while layers 6 and 8 are convolutions with stride 1; the numbers of feature channels of layers 5, 6, 7 and 8 are 128, 128, 64 and 3 respectively;
Finally, the tensor produced by the 8th convolutional layer is normalized so that each pixel's vector has modulus 1, giving the surface normal ñ of the object to be reconstructed.
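The tensor plumbing of this network (tiling the light vector, channel concatenation, order-agnostic max pooling over the j observations, and the final unit normalization) can be sketched with NumPy; the learned convolutional layers themselves are omitted, so this is a shape-level sketch rather than the trained model:

```python
import numpy as np

def fuse_image_and_light(m, l):
    """Tile light l = [x, y, z] to shape (p, q, 3) and concatenate it
    with image m of shape (p, q, 3) along the channel axis -> (p, q, 6)."""
    p, q, _ = m.shape
    h = np.broadcast_to(np.asarray(l, dtype=float), (p, q, 3))
    return np.concatenate([m, h], axis=-1)

def max_pool_over_observations(feats):
    """Element-wise max over the j per-image feature tensors, merging a
    variable number of observations into one fixed-size tensor."""
    return np.max(np.stack(feats, axis=0), axis=0)

def unit_normalize(t, eps=1e-8):
    """Normalize a 3-channel output so each pixel's normal has modulus 1."""
    return t / (np.linalg.norm(t, axis=-1, keepdims=True) + eps)
```

Pooling with an element-wise max (rather than concatenation) is what makes the network independent of both the number and the order of the input images.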
The attention weight generation network (2) is designed to generate the attention weight map P of the object to be reconstructed from the images m1, m2, ..., mj, with the following specific steps:
For each image m ∈ ℝ^(p×q×3), the attention weight generation network computes its gradient, which also belongs to ℝ^(p×q×3); the gradient is concatenated with the image along the third (channel) dimension to form a new tensor in ℝ^(p×q×6). With j images and illuminations as input, j fused tensors are obtained;
The fused tensors each first pass through 3 convolutional layers, all with 3×3 kernels and ReLU activations; layer 2 uses stride 2 while layers 1 and 3 use stride 1; the numbers of feature channels of the three convolutional layers are 64, 128 and 128 respectively;
A max-pooling layer then pools the j tensors produced by the 3 convolutional layers, each in ℝ^(p/2×q/2×128), into a single tensor in ℝ^(p/2×q/2×128);
Computation continues through convolutional layers 5, 6 and 7, all with 3×3 kernels and ReLU activations; layer 6 is a transposed convolution while layers 5 and 7 are convolutions with stride 1; the numbers of feature channels of layers 5, 6 and 7 are 128, 64 and 1 respectively, yielding the attention weight map P of the object to be reconstructed.
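The input preparation for this network can be sketched as follows. The patent states only that the image gradient "also belongs to ℝ^(p×q×3)"; collapsing the x and y derivatives into one per-channel gradient magnitude map is an assumption made so the shapes match that statement:

```python
import numpy as np

def attention_net_input(m):
    """Concatenate an image m of shape (p, q, 3) with its spatial
    gradient magnitude per channel (also (p, q, 3)) -> (p, q, 6).

    np.gradient returns the derivative along rows then columns; the
    two are combined into a single magnitude map per color channel.
    """
    gy, gx = np.gradient(m, axis=(0, 1))
    grad = np.hypot(gx, gy)            # per-channel gradient magnitude
    return np.concatenate([m, grad], axis=-1)
```

Feeding the gradient explicitly biases the attention network toward exactly the high-frequency structures (wrinkles, edges) that the weight map P is meant to highlight.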
In the above method, in the resolution p×q of image m, p takes a value in {16, 32, 48, 64} and q takes a value in {16, 32, 48, 64}.
In the above method, ζ is set to 1.
In the above method, λ is set to 8.
In the above method, the number of training cycles is set to 30 epochs.
In the above method, p is set to 32 and q is set to 32.
With the deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement provided by the invention, the surface normals and the high-frequency information are learned separately by the surface normal generation network and the attention weight generation network, and training with the proposed attention weight loss improves the reconstruction accuracy of high-frequency surface regions such as wrinkles and edges. Compared with conventional photometric stereo methods, three-dimensional reconstruction accuracy is improved, particularly in the surface details of the object to be reconstructed.
The attention weight loss proposed by the invention can also be applied to various low-level vision tasks, such as depth estimation, image deblurring and image dehazing, improving task accuracy and enriching image details.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of the surface normal generation network in step 2).
Fig. 3 is a schematic diagram of the attention weight generation network in step 2).
Fig. 4 illustrates the effect of the invention: the first row shows input images, the second row the generated attention weight maps, and the third row the generated surface normals.
Detailed Description
As shown in Fig. 1, the deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement comprises the following steps:
1) Using a photometric stereo system, take several images of the object to be reconstructed:
An image of the object to be reconstructed is taken under a single parallel white light source. A Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented by a vector l = [x, y, z] in this coordinate system;
The light source position is then changed and an image is taken under the new illumination direction. Typically at least 10 images under different illumination directions are taken, denoted m1, m2, ..., mj, with the corresponding light source positions denoted l1, l2, ..., lj, where j is a natural number greater than or equal to 10;
2) Using a deep learning algorithm, take m1, m2, ..., mj and l1, l2, ..., lj as input and output an accurate three-dimensional reconstruction of the surface normals:
The deep learning algorithm consists of four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training;
(1) The surface normal generation network is designed to generate the surface normals of the object to be reconstructed from the images m1, m2, ..., mj and the illuminations l1, l2, ..., lj;
Denote the resolution of image m as p×q, with p, q ≥ 2^n, n ≥ 4; then m ∈ ℝ^(p×q×3), where 3 corresponds to the RGB channels. As shown in Fig. 2, according to the resolution p×q of m, the surface normal generation network first tiles the illumination l = [x, y, z] ∈ ℝ^3 into the space ℝ^(p×q×3); the tiled illumination is denoted h, so h ∈ ℝ^(p×q×3). Now h and m have the same spatial size; h and m are concatenated along the third (channel) dimension to form a new tensor in ℝ^(p×q×6). With j images and illuminations as input, j fused tensors are obtained;
These tensors each pass through 4 convolutional layers. The kernels of convolutional layers 1, 2, 3 and 4 are all 3×3 with ReLU activations; layers 2 and 4 use stride 2 while layers 1 and 3 use stride 1; the numbers of feature channels of layers 1, 2, 3 and 4 are 64, 128, 128 and 256 respectively;
A max-pooling layer then pools the j tensors produced by the 4 convolutional layers, each in ℝ^(p/4×q/4×256), into a single tensor in ℝ^(p/4×q/4×256);
Computation continues through convolutional layers 5, 6, 7 and 8, whose kernels are all 3×3 with ReLU activations; layers 5 and 7 are transposed convolutions while layers 6 and 8 are convolutions with stride 1; the numbers of feature channels of layers 5, 6, 7 and 8 are 128, 128, 64 and 3 respectively;
Finally, the tensor produced by the 8th convolutional layer is normalized so that each pixel's vector has modulus 1, giving the predicted surface normal ñ;
(2) The attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m1, m2, ..., mj:
For each image m ∈ ℝ^(p×q×3), the attention weight generation network computes its gradient, which also belongs to ℝ^(p×q×3); as shown in Fig. 3, the gradient is concatenated with the image along the third (channel) dimension to form a new tensor in ℝ^(p×q×6). With j images and illuminations as input, j fused tensors are obtained;
The fused tensors each first pass through 3 convolutional layers, all with 3×3 kernels and ReLU activations; layer 2 uses stride 2 while layers 1 and 3 use stride 1; the numbers of feature channels of the three convolutional layers are 64, 128 and 128 respectively;
A max-pooling layer then pools the j tensors produced by the 3 convolutional layers, each in ℝ^(p/2×q/2×128), into a single tensor in ℝ^(p/2×q/2×128);
Computation continues through convolutional layers 5, 6 and 7, all with 3×3 kernels and ReLU activations; layer 6 is a transposed convolution while layers 5 and 7 are convolutions with stride 1; the numbers of feature channels of layers 5, 6 and 7 are 128, 64 and 1 respectively, yielding the attention weight map P of the object to be reconstructed;
(3) The attention weight loss L is a pixel-wise loss function obtained by averaging the per-pixel losses L_k: L = (1 / (p·q)) Σ_k L_k;
The loss at each pixel position k, L_k, comprises two parts: the first is a gradient loss with coefficient term, L_gradient, and the second is a normal loss with coefficient term, L_normal, i.e. L_k = P_k · L_gradient + λ (1 − P_k) · L_normal;
∇n_k is the gradient at position k of the true surface normal n of the object to be reconstructed; ζ is the neighborhood pixel range used when computing the gradient, with allowed settings 1, 2, 3, 4 and 5 and a default of 1 in the invention; ∇ñ_k is the gradient at position k of the predicted surface normal ñ, where ñ denotes the surface normal predicted by the network and n the true surface normal;
The gradient loss L_gradient = ‖∇ñ_k − ∇n_k‖ sharpens the high-frequency components of the surface normal learned by the network; P_k is the value of the attention weight map P at pixel position k, and in the pixel-wise attention weight loss L_k it weights the first component, the gradient loss L_gradient: where the attention weight is large, the gradient loss receives a large weight;
Second, the normal loss is L_normal = 1 − ñ_k ● n_k, where ● denotes the dot product; λ is a hyperparameter balancing the gradient loss against the normal loss, set here to 8; in general it can be set in {7, 8, 9, 10}, and taking 8 gives the best results;
The surface normal generation network (1) and the attention weight generation network (2) are linked through the attention weight loss (3);
(4) network training
During network training, the network is continually adjusted and optimized with the back-propagation algorithm to minimize the loss function; training stops after 30 epochs (cycles), at which point the best effect is achieved; alternatively, when L_normal drops below 0.03, training is considered to have reached its best result and is stopped;
In the invention, training ends after 30 epochs, at which point it is considered to have achieved the best effect;
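The two stopping criteria above (a 30-epoch budget, or L_normal dropping below 0.03) can be sketched as a training control loop. Here step_fn is a hypothetical stand-in for one epoch of back-propagation and must return the current L_normal; the real optimizer and loss are not shown:

```python
def train(step_fn, max_epochs=30, normal_loss_threshold=0.03):
    """Run training epochs until either stop criterion is met:
    the fixed 30-epoch budget, or L_normal < 0.03.

    step_fn(epoch) stands in for one epoch of back-propagation and
    returns the current value of the normal loss component L_normal.
    """
    for epoch in range(1, max_epochs + 1):
        l_normal = step_fn(epoch)
        if l_normal < normal_loss_threshold:
            return epoch, l_normal        # early stop: converged
    return max_epochs, l_normal           # budget exhausted

# Toy check: a geometrically decaying loss early-stops once it dips
# below 0.03; a flat loss runs to the full 30-epoch budget.
epoch, loss = train(lambda e: 0.5 * (0.8 ** e))
```

Gating early stopping on L_normal alone (rather than the combined loss L) ties the criterion to overall normal accuracy, independent of the attention-weighted gradient term.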
3) The trained network is used for surface normal reconstruction from photometric stereo images:
First, s images under different illumination directions are taken, with s ≥ 10; m1, m2, ..., ms and l1, l2, ..., ls are input into the trained network to obtain the predicted surface normal ñ.
Here p, q ∈ {16, 32, 48, 64}, λ ∈ {7, 8, 9, 10}, and ζ can be 1, 2, 3, 4 or 5.
Claims (8)
1. A deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement, characterized by comprising the following steps:
1) Using a photometric stereo system, take several images of the object to be reconstructed:
An image of the object to be reconstructed is taken under a single parallel white light source. A Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented by a vector l = [x, y, z] in this coordinate system;
The light source position is then changed and an image is taken under the new illumination direction. Typically at least 10 images under different illumination directions are taken, denoted m1, m2, ..., mj, with the corresponding light source positions denoted l1, l2, ..., lj, where j is a natural number greater than or equal to 10;
2) Using a deep learning algorithm, take m1, m2, ..., mj and l1, l2, ..., lj as input and output an accurate three-dimensional reconstruction of the surface normals:
The deep learning algorithm consists of four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training; wherein:
(1) The surface normal generation network is designed to generate the surface normals of the object to be reconstructed from the images m1, m2, ..., mj and the illuminations l1, l2, ..., lj;
(2) The attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m1, m2, ..., mj;
(3) The attention weight loss L is a pixel-wise loss function obtained by averaging the per-pixel losses L_k: L = (1 / (p·q)) Σ_k L_k, where p×q is the resolution of image m, p, q ≥ 2^n, n ≥ 4;
The loss at each pixel position k, L_k, comprises two parts: the first is a gradient loss with coefficient term, L_gradient, and the second is a normal loss with coefficient term, L_normal, i.e. L_k = P_k · L_gradient + λ (1 − P_k) · L_normal;
wherein ∇n_k is the gradient at position k of the true surface normal n of the object to be reconstructed;
ζ is the neighborhood pixel range used when computing the gradient, with allowed settings 1, 2, 3, 4 and 5; ∇ñ_k is the gradient at position k of the predicted surface normal ñ; the gradient loss is L_gradient = ‖∇ñ_k − ∇n_k‖;
P_k is the value of the attention weight map P at pixel position k;
second, the normal loss is L_normal = 1 − ñ_k ● n_k, where ● denotes the dot product; λ is a hyperparameter with allowed settings {7, 8, 9, 10};
The surface normal generation network (1) and the attention weight generation network (2) are linked through the attention weight loss (3);
(4) network training
During network training, the network is continually adjusted and optimized with the back-propagation algorithm to minimize the loss function; training stops when the set number of cycles is reached, at which point the best effect is achieved; alternatively, when L_normal drops below 0.03, training is considered to have reached its best result and is stopped;
3) The trained network is used for surface normal reconstruction from photometric stereo images.
2. The deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement according to claim 1, wherein (1) the surface normal generation network is designed to generate the surface normals of the object to be reconstructed from the images m1, m2, ..., mj and the illuminations l1, l2, ..., lj, with the following specific steps:
Denote the resolution of image m as p×q, with p, q ≥ 2^n, n ≥ 4; then m ∈ ℝ^(p×q×3), where 3 corresponds to the RGB channels. According to the resolution p×q of m, the surface normal generation network first tiles the illumination l = [x, y, z] ∈ ℝ^3 into the space ℝ^(p×q×3); the tiled illumination is denoted h, so h ∈ ℝ^(p×q×3). Now h and m have the same spatial size; h and m are concatenated along the third (channel) dimension to form a new tensor in ℝ^(p×q×6). With j images and illuminations as input, j fused tensors are obtained;
These tensors each pass through 4 convolutional layers. The kernels of convolutional layers 1, 2, 3 and 4 are all 3×3 with ReLU activations; layers 2 and 4 use stride 2 while layers 1 and 3 use stride 1; the numbers of feature channels of layers 1, 2, 3 and 4 are 64, 128, 128 and 256 respectively;
A max-pooling layer then pools the j tensors produced by the 4 convolutional layers, each in ℝ^(p/4×q/4×256), into a single tensor in ℝ^(p/4×q/4×256);
Computation continues through convolutional layers 5, 6, 7 and 8, whose kernels are all 3×3 with ReLU activations; layers 5 and 7 are transposed convolutions while layers 6 and 8 are convolutions with stride 1; the numbers of feature channels of layers 5, 6, 7 and 8 are 128, 128, 64 and 3 respectively.
3. The deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement according to claim 1, wherein (2) the attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m1, m2, ..., mj, with the following specific steps:
For each image m ∈ ℝ^(p×q×3), the attention weight generation network computes its gradient, which also belongs to ℝ^(p×q×3); the gradient is concatenated with the image along the third (channel) dimension to form a new tensor in ℝ^(p×q×6). With j images and illuminations as input, j fused tensors are obtained;
The fused tensors each first pass through 3 convolutional layers, all with 3×3 kernels and ReLU activations; layer 2 uses stride 2 while layers 1 and 3 use stride 1; the numbers of feature channels of the 3 convolutional layers are 64, 128 and 128 respectively;
A max-pooling layer then pools the j tensors produced by the 3 convolutional layers, each in ℝ^(p/2×q/2×128), into a single tensor in ℝ^(p/2×q/2×128);
Computation continues through convolutional layers 5, 6 and 7, all with 3×3 kernels and ReLU activations; layer 6 is a transposed convolution while layers 5 and 7 are convolutions with stride 1; the numbers of feature channels of layers 5, 6 and 7 are 128, 64 and 1 respectively, yielding the attention weight map P of the object to be reconstructed.
4. The deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement according to claim 1, wherein in the resolution p×q of image m, p takes a value in {16, 32, 48, 64} and q takes a value in {16, 32, 48, 64}.
5. The deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement according to claim 1, wherein ζ is set to 1.
6. The deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement according to claim 1, wherein λ is set to 8.
7. The deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement according to claim 1, wherein the number of cycles is set to 30 epochs.
8. The deep-learning-based photometric stereo three-dimensional reconstruction method with high-frequency region enhancement according to claim 4, wherein p is set to 32 and q is set to 32.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111524515.8A CN113936117B (en) | 2021-12-14 | 2021-12-14 | High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113936117A CN113936117A (en) | 2022-01-14 |
CN113936117B true CN113936117B (en) | 2022-03-08 |
Family
ID=79288969
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111524515.8A Active CN113936117B (en) | 2021-12-14 | 2021-12-14 | High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113936117B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115098563B (en) * | 2022-07-14 | 2022-11-11 | 中国海洋大学 | Time sequence abnormity detection method and system based on GCN and attention VAE |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107862741A (en) * | 2017-12-10 | 2018-03-30 | 中国海洋大学 | A kind of single-frame images three-dimensional reconstruction apparatus and method based on deep learning |
CN109146934A (en) * | 2018-06-04 | 2019-01-04 | 成都通甲优博科技有限责任公司 | A kind of face three-dimensional rebuilding method and system based on binocular solid and photometric stereo |
CN110060212A (en) * | 2019-03-19 | 2019-07-26 | 中国海洋大学 | A kind of multispectral photometric stereo surface normal restoration methods based on deep learning |
CN113538675A (en) * | 2021-06-30 | 2021-10-22 | 同济人工智能研究院(苏州)有限公司 | Neural network for calculating attention weight for laser point cloud and training method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108510573B (en) * | 2018-04-03 | 2021-07-30 | 南京大学 | Multi-view face three-dimensional model reconstruction method based on deep learning |
EP4100925A4 (en) * | 2020-02-03 | 2024-03-06 | Nanotronics Imaging, Inc. | Deep photometric learning (dpl) systems, apparatus and methods |
CN113762358B (en) * | 2021-08-18 | 2024-05-14 | 江苏大学 | Semi-supervised learning three-dimensional reconstruction method based on relative depth training |
- 2021-12-14: CN application CN202111524515.8A filed; granted as patent CN113936117B (status: Active)
Non-Patent Citations (2)
Title |
---|
A Constrained Independent Component Analysis Based Photometric Stereo for 3D Human Face Reconstruction; Cheng-Jian Lin et al.; 2012 International Symposium on Computer, Consumer and Control; 2012-07-02; 710-712 *
Application of Deep Learning in 3D Reconstruction of Objects from a Single Image; Chen Jia et al.; Acta Automatica Sinica; 2018-11-28 (No. 04); 23-34 *
Also Published As
Publication number | Publication date |
---|---|
CN113936117A (en) | 2022-01-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021232687A1 (en) | Deep learning-based point cloud upsampling method | |
CN106355570B (en) | A kind of binocular stereo vision matching method of combination depth characteristic | |
Liu et al. | Exemplar-based image inpainting using multiscale graph cuts | |
CN111627019A (en) | Liver tumor segmentation method and system based on convolutional neural network | |
CN112634149B (en) | Point cloud denoising method based on graph convolution network | |
Pottmann et al. | The isophotic metric and its application to feature sensitive morphology on surfaces | |
CN112348959A (en) | Adaptive disturbance point cloud up-sampling method based on deep learning | |
CN113962858A (en) | Multi-view depth acquisition method | |
CN113936117B (en) | High-frequency region enhanced luminosity three-dimensional reconstruction method based on deep learning | |
Sulc et al. | Reflection Separation in Light Fields based on Sparse Coding and Specular Flow. | |
CN107103610A (en) | Stereo mapping satellite image matches suspicious region automatic testing method | |
CN114549669B (en) | Color three-dimensional point cloud acquisition method based on image fusion technology | |
Shen et al. | 3D shape reconstruction from images in the frequency domain | |
CN112991504B (en) | Improved hole filling method based on TOF camera three-dimensional reconstruction | |
CN115631223A (en) | Multi-view stereo reconstruction method based on self-adaptive learning and aggregation | |
JP2856661B2 (en) | Density converter | |
Gallardo et al. | Using Shading and a 3D Template to Reconstruct Complex Surface Deformations. | |
CN115239559A (en) | Depth map super-resolution method and system for fusion view synthesis | |
CN114972937A (en) | Feature point detection and descriptor generation method based on deep learning | |
US20220172421A1 (en) | Enhancement of Three-Dimensional Facial Scans | |
Gong et al. | Multi-view stereo point clouds visualization | |
Tabb et al. | Camera calibration correction in shape from inconsistent silhouette | |
Schouten et al. | Timed fast exact euclidean distance (tfeed) maps | |
US20230177722A1 (en) | Apparatus and method with object posture estimating | |
JP7508673B2 (en) | Computer vision method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||