CN113936117B - High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning - Google Patents

High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning

Info

Publication number
CN113936117B
CN113936117B (application CN202111524515.8A)
Authority
CN
China
Prior art keywords
layer
surface normal
attention weight
network
reconstructed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111524515.8A
Other languages
Chinese (zh)
Other versions
CN113936117A (en)
Inventor
Yakun Ju (举雅琨)
Junyu Dong (董军宇)
Feng Gao (高峰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202111524515.8A
Publication of CN113936117A
Application granted
Publication of CN113936117B
Active legal status
Anticipated expiration legal status

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/30: Polynomial surface description
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The method takes multiple images of an object to be reconstructed with a photometric stereo system and uses a deep learning algorithm to output an accurate surface normal three-dimensional reconstruction. A surface normal generation network generates the surface normal of the object to be reconstructed from the images and illuminations; an attention weight generation network generates an attention weight map of the object from the images; an attention weight loss function is processed pixel by pixel; the trained network is then used for surface normal reconstruction of photometric stereo images. The invention learns the surface normal and the high-frequency information separately through the proposed surface normal generation network and attention weight generation network, and trains with the proposed attention weight loss, thereby improving reconstruction accuracy on high-frequency surface regions such as wrinkles and edges. Compared with traditional photometric stereo methods, the three-dimensional reconstruction accuracy is improved, particularly in the surface details of the object to be reconstructed.

Description

High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning
Technical Field
The invention relates to a high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, and belongs to the field of photometric three-dimensional reconstruction.
Background
Three-dimensional reconstruction is a fundamental and important problem in computer vision. Photometric stereo is a high-precision, pixel-wise three-dimensional reconstruction method that recovers the surface normal of an object from the intensity-variation cues provided by images taken under different illumination directions. Photometric stereo plays an irreplaceable role in many high-precision three-dimensional reconstruction tasks and has important application value in archaeological exploration, pipeline inspection, fine seabed mapping, and similar areas.
However, existing deep learning-based photometric stereo methods produce large errors in high-frequency regions of the object surface, such as wrinkles and edges; they generate blurred three-dimensional reconstruction results in exactly those regions where accurate reconstruction matters most.
Disclosure of Invention
In view of the above problems, the present invention provides a high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, so as to overcome the disadvantages of the prior art.
The high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning comprises the following steps:
1) using a photometric stereo system, taking several images of the object to be reconstructed:
an image of the object to be reconstructed is taken under the illumination of a single parallel white light source; a Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented by a vector l = [x, y, z] in this coordinate system;
changing the position of the light source yields an image under another illumination direction; usually at least 10 images under different illumination directions are taken, recorded as m_1, m_2, ..., m_j, with the corresponding light source positions recorded as l_1, l_2, ..., l_j, where j is a natural number greater than or equal to 10;
2) using a deep learning algorithm with inputs m_1, m_2, ..., m_j and l_1, l_2, ..., l_j, outputting an accurate surface normal three-dimensional reconstruction:
the deep learning algorithm is divided into four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training; wherein:
(1) the surface normal generation network is designed to generate the surface normal $\tilde{n}$ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j;
(2) the attention weight generation network is designed to generate an attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j;
(3) the attention weight loss L is a pixel-wise loss function, obtained by averaging the per-pixel losses L_k:

$L = \frac{1}{p \times q} \sum_{k=1}^{p \times q} L_k$

where p×q is the resolution of image m, with p, q ≥ 2^n and n ≥ 4;
the loss L_k at each pixel position comprises two parts: the first part is the gradient loss L_gradient with its coefficient term, and the second part is the normal loss L_normal with its coefficient term, i.e.

$L_k = P_k L_{gradient} + \lambda (1 - P_k) L_{normal}$

wherein

$L_{gradient} = \left\| \nabla n_k - \nabla \tilde{n}_k \right\|^2$

$\nabla n_k$ is the gradient of the true surface normal n of the object to be reconstructed at position k, ζ is the neighborhood pixel range used in computing the gradient and is set within {1, 2, 3, 4, 5}, and $\nabla \tilde{n}_k$ is the gradient of the predicted surface normal $\tilde{n}$ at position k; $\tilde{n}$ denotes the surface normal predicted by the network, and n denotes the true surface normal;

the gradient loss sharpens the high-frequency representation of the surface normal in the network; P_k is the value of the attention weight map at pixel position k;

second,

$L_{normal} = 1 - n_k \bullet \tilde{n}_k$

where ● denotes the dot product operation, and λ is a hyperparameter that balances the gradient loss and the normal loss, set within the range {7, 8, 9, 10};
the surface normal generation network (1) and the attention weight generation network (2) are linked through the attention weight loss (3);
(4) network training
when training the network, a back propagation algorithm continuously adjusts and optimizes the parameters to minimize the loss function; training stops when the set number of epochs is reached, at which point the optimal effect is achieved; alternatively, when L_normal falls below 0.03, training is considered to have reached its best result and is stopped;
3) the trained network is used for surface normal reconstruction of photometric stereo images:
first, s images under different illumination directions are taken, s ≥ 10; then m_1, m_2, ..., m_s and l_1, l_2, ..., l_s are input into the trained network to obtain the predicted surface normal $\tilde{n}$.
The surface normal generation network (1) is designed to generate the surface normal $\tilde{n}$ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j, with the following specific steps:
the resolution of image m is denoted p×q, with p, q ≥ 2^n and n ≥ 4, so that m ∈ ℝ^(p×q×3), where 3 denotes the RGB channels; the surface normal generation network first tiles the illumination l = [x, y, z] ∈ ℝ^3 repeatedly to fill a space ℝ^(p×q×3) according to the resolution p×q of m; the filled illumination is recorded as h, so h ∈ ℝ^(p×q×3) and h and m have the same spatial size; h and m are concatenated along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j input images and illuminations, j fused tensors are obtained;
these tensors each pass through 4 convolution layers; the kernel sizes of convolution layers 1, 2, 3 and 4 are all 3×3 with 'relu' activation functions, layers 2 and 4 are convolutions with stride 2, layers 1 and 3 are convolutions with stride 1, and the numbers of feature channels of convolution layers 1, 2, 3 and 4 are 64, 128, 128 and 256 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/4×q/4×256) produced by the 4 convolution layers into one tensor ∈ ℝ^(p/4×q/4×256);
computation continues through convolution layers 5, 6, 7 and 8, whose kernels are all 3×3 with 'relu' activation functions; layers 5 and 7 are transposed convolutions, layers 6 and 8 are convolutions with stride 1, and the numbers of feature channels of convolution layers 5, 6, 7 and 8 are 128, 128, 64 and 3 respectively;
finally, the tensor produced by convolution layer 8 is normalized to modulus 1, giving the surface normal $\tilde{n}$ of the object to be reconstructed.
The attention weight generation network (2) is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j, with the following specific steps:
the attention weight generation network computes, for each image m ∈ ℝ^(p×q×3), its gradient values, which also belong to the space ℝ^(p×q×3); the gradient of the image is concatenated and fused with the image along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j input images and illuminations, j fused tensors are obtained;
first, the fused tensors each pass through 3 convolution layers; the kernel sizes of all 3 layers are 3×3 with 'relu' activation functions, layer 2 has stride 2, layers 1 and 3 have stride 1, and the numbers of feature channels of the three convolution layers are 64, 128 and 128 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/2×q/2×128) produced by the 3 convolution layers into one tensor ∈ ℝ^(p/2×q/2×128);
computation continues through convolution layers 5, 6 and 7, whose kernels are all 3×3 with 'relu' activation functions; layer 6 is a transposed convolution, layers 5 and 7 are convolutions with stride 1, and the numbers of feature channels of convolution layers 5, 6 and 7 are 128, 64 and 1 respectively, yielding the attention weight map P of the object to be reconstructed.
In the above high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, in the resolution p×q of image m, p takes a value of 16, 32, 48 or 64, and q takes a value of 16, 32, 48 or 64.
In the above high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, ζ is set to 1.
In the above high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, λ is set to 8.
In the above high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, the number of training cycles is set to 30 epochs.
In the above high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, p takes the value 32 and q takes the value 32.
According to the high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning provided by the invention, the surface normal and the high-frequency information are learned separately through the surface normal generation network and the attention weight generation network, and training with the proposed attention weight loss improves the reconstruction accuracy on high-frequency surface regions such as wrinkles and edges. Compared with traditional photometric stereo methods, the three-dimensional reconstruction accuracy is improved, particularly in the surface details of the object to be reconstructed.
The attention weight loss provided by the invention can also be applied to various low-level vision tasks, such as depth estimation, image deblurring, and image dehazing, improving task accuracy and enriching image details.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of the surface normal generation network in step 2).
Fig. 3 is a schematic diagram of the attention weight generation network in step 2).
Fig. 4 is a schematic diagram of the application effect of the present invention, in which the first row shows the input images, the second row the generated attention weight maps, and the third row the generated surface normals.
Detailed Description
As shown in fig. 1, the high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning comprises the following steps:
1) using a photometric stereo system, taking several images of the object to be reconstructed:
an image of the object to be reconstructed is taken under the illumination of a single parallel white light source; a Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented by a vector l = [x, y, z] in this coordinate system;
changing the position of the light source yields an image under another illumination direction; usually at least 10 images under different illumination directions are taken, recorded as m_1, m_2, ..., m_j, with the corresponding light source positions recorded as l_1, l_2, ..., l_j, where j is a natural number greater than or equal to 10;
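The capture layout of step 1) can be summarized in a short sketch; this is a non-authoritative illustration, and the array names (images, lights) are hypothetical:

```python
# A minimal sketch of the step 1) data layout; names are illustrative only.
import numpy as np

j = 10                  # number of illumination directions, j >= 10
p, q = 32, 32           # image resolution p*q, with p, q in {16, 32, 48, 64}

# j RGB images m_1, ..., m_j, one per light position
images = np.zeros((j, p, q, 3), dtype=np.float32)

# j light positions l_1, ..., l_j = [x, y, z] in the object-centered
# Cartesian coordinate system
lights = np.zeros((j, 3), dtype=np.float32)
```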
2) using a deep learning algorithm with inputs m_1, m_2, ..., m_j and l_1, l_2, ..., l_j, outputting an accurate surface normal three-dimensional reconstruction:
the deep learning algorithm is divided into four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training;
(1) the surface normal generation network is designed to generate the surface normal $\tilde{n}$ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j;
the resolution of image m is denoted p×q, with p, q ≥ 2^n and n ≥ 4, so that m ∈ ℝ^(p×q×3), where 3 denotes the RGB channels; as shown in FIG. 2, the surface normal generation network first tiles the illumination l = [x, y, z] ∈ ℝ^3 repeatedly to fill a space ℝ^(p×q×3) according to the resolution p×q of m; the filled illumination is recorded as h, so h ∈ ℝ^(p×q×3) and h and m have the same spatial size; h and m are concatenated along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j input images and illuminations, j fused tensors are obtained;
these tensors each pass through 4 convolution layers; the kernel sizes of convolution layers 1, 2, 3 and 4 are all 3×3 with 'relu' activation functions, layers 2 and 4 are convolutions with stride 2, layers 1 and 3 are convolutions with stride 1, and the numbers of feature channels of convolution layers 1, 2, 3 and 4 are 64, 128, 128 and 256 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/4×q/4×256) produced by the 4 convolution layers into one tensor ∈ ℝ^(p/4×q/4×256);
computation continues through convolution layers 5, 6, 7 and 8, whose kernels are all 3×3 with 'relu' activation functions; layers 5 and 7 are transposed convolutions, layers 6 and 8 are convolutions with stride 1, and the numbers of feature channels of convolution layers 5, 6, 7 and 8 are 128, 128, 64 and 3 respectively;
finally, the tensor produced by convolution layer 8 is normalized to modulus 1, giving the predicted surface normal $\tilde{n}$.
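A minimal PyTorch sketch of the surface normal generation network described above is given below. It follows the stated kernel sizes, strides, and channel counts; the decoder channel sequence 128, 128, 64, 3 and the transposed-convolution padding are inferred rather than given in the source, so treat this as an assumption-laden illustration rather than the definitive implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SurfaceNormalNet(nn.Module):
    def __init__(self):
        super().__init__()
        # layers 1-4: 3x3 kernels, ReLU; layers 2 and 4 use stride 2,
        # layers 1 and 3 use stride 1; channels 64, 128, 128, 256
        self.enc = nn.Sequential(
            nn.Conv2d(6, 64, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        # layers 5-8: layers 5 and 7 are transposed convolutions (stride 2
        # assumed, to undo the two stride-2 encoder layers), layers 6 and 8
        # are stride-1 convolutions; channels 128, 128, 64, 3 (assumed)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=1, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            # final ReLU follows the text as written; note it constrains
            # the raw normal components to be non-negative
            nn.Conv2d(64, 3, 3, stride=1, padding=1), nn.ReLU(),
        )

    def forward(self, images, lights):
        # images: (j, 3, p, q); lights: (j, 3)
        j, _, p, q = images.shape
        # tile each light l = [x, y, z] over the p*q resolution -> h
        h = lights.view(j, 3, 1, 1).expand(j, 3, p, q)
        fused = torch.cat([images, h], dim=1)    # j tensors in R^(p*q*6)
        feat = self.enc(fused)                   # j tensors in R^(p/4*q/4*256)
        # max-pool the j feature tensors into a single tensor
        feat = feat.max(dim=0, keepdim=True).values
        out = self.dec(feat)                     # (1, 3, p, q)
        # normalize to modulus 1 to obtain the predicted surface normal
        return F.normalize(out, p=2, dim=1)
```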
(2) the attention weight generation network is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j:
the attention weight generation network computes, for each image m ∈ ℝ^(p×q×3), its gradient values, which also belong to the space ℝ^(p×q×3); as shown in FIG. 3, the gradient is fused with the image along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j input images and illuminations, j fused tensors are obtained;
first, the fused tensors each pass through 3 convolution layers; the kernel sizes of all 3 layers are 3×3 with 'relu' activation functions, layer 2 has stride 2, layers 1 and 3 have stride 1, and the numbers of feature channels of the three convolution layers are 64, 128 and 128 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/2×q/2×128) produced by the 3 convolution layers into one tensor ∈ ℝ^(p/2×q/2×128);
computation continues through convolution layers 5, 6 and 7, whose kernels are all 3×3 with 'relu' activation functions; layer 6 is a transposed convolution, layers 5 and 7 are convolutions with stride 1, and the numbers of feature channels of convolution layers 5, 6 and 7 are 128, 64 and 1 respectively, yielding the attention weight map P of the object to be reconstructed.
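The attention weight generation network admits a similar sketch. The forward-difference image gradient is one plausible choice the text leaves open, and the final sigmoid (so that P_k and 1 - P_k are valid weights; the text itself states 'relu') is an assumption:

```python
import torch
import torch.nn as nn

def image_gradient(m):
    # m: (j, 3, p, q); forward differences along x and y, summed per channel,
    # giving a gradient tensor that also lies in R^(p*q*3)
    gx = torch.zeros_like(m)
    gy = torch.zeros_like(m)
    gx[:, :, :, :-1] = m[:, :, :, 1:] - m[:, :, :, :-1]
    gy[:, :, :-1, :] = m[:, :, 1:, :] - m[:, :, :-1, :]
    return gx + gy

class AttentionWeightNet(nn.Module):
    def __init__(self):
        super().__init__()
        # layers 1-3: 3x3 kernels, ReLU; layer 2 stride 2; channels 64, 128, 128
        self.enc = nn.Sequential(
            nn.Conv2d(6, 64, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=1, padding=1), nn.ReLU(),
        )
        # layers 5-7: layer 6 is a transposed convolution (stride 2 assumed),
        # layers 5 and 7 are stride-1 convolutions; channels 128, 64, 1
        self.dec = nn.Sequential(
            nn.Conv2d(128, 128, 3, stride=1, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, stride=1, padding=1),
            nn.Sigmoid(),  # assumed, so that P_k lies in [0, 1]
        )

    def forward(self, images):
        # fuse each image with its gradient along the channel dimension
        fused = torch.cat([images, image_gradient(images)], dim=1)  # R^(p*q*6)
        feat = self.enc(fused)                  # j tensors in R^(p/2*q/2*128)
        feat = feat.max(dim=0, keepdim=True).values
        return self.dec(feat)                   # attention weight map P, (1, 1, p, q)
```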
(3) the attention weight loss L is a pixel-wise loss function, obtained by averaging the per-pixel losses L_k:

$L = \frac{1}{p \times q} \sum_{k=1}^{p \times q} L_k$
the loss L_k at each pixel position comprises two parts: the first part is the gradient loss L_gradient with its coefficient term, and the second part is the normal loss L_normal with its coefficient term, i.e.

$L_k = P_k L_{gradient} + \lambda (1 - P_k) L_{normal}$

wherein

$L_{gradient} = \left\| \nabla n_k - \nabla \tilde{n}_k \right\|^2$

$\nabla n_k$ is the gradient of the true surface normal n of the object to be reconstructed at position k, ζ is the neighborhood pixel range used in computing the gradient, set within {1, 2, 3, 4, 5} with a default of 1 in the invention, and $\nabla \tilde{n}_k$ is the gradient of the predicted surface normal $\tilde{n}$ at position k; $\tilde{n}$ denotes the surface normal predicted by the network, and n denotes the true surface normal;

the gradient loss sharpens the high-frequency representation of the surface normal in the network; P_k is the value of the attention weight map at pixel position k, and in the pixel-wise attention weight loss L_k it weights the first component, the gradient loss L_gradient: where the attention weight is large, the gradient loss receives a large weight;
second,

$L_{normal} = 1 - n_k \bullet \tilde{n}_k$

where ● denotes the dot product operation; λ is a hyperparameter that balances the gradient loss and the normal loss and is set to 8 here; in general it can be set within {7, 8, 9, 10}, and taking 8 gives the best effect;
the surface normal generation network (1) and the attention weight generation network (2) are linked through the attention weight loss (3);
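Under the reconstruction of the formulas above (a squared gradient difference for L_gradient and one minus the dot product for L_normal; the exact norm in the original equation images is not recoverable), the attention weight loss can be sketched as:

```python
import torch

def gradient(n, zeta=1):
    # finite-difference gradient of a normal map n: (1, 3, p, q),
    # with neighborhood pixel range zeta
    gx = n[:, :, :, zeta:] - n[:, :, :, :-zeta]
    gy = n[:, :, zeta:, :] - n[:, :, :-zeta, :]
    return gx, gy

def attention_weight_loss(n_pred, n_true, P, lam=8.0, zeta=1):
    # n_pred, n_true: (1, 3, p, q); P: (1, 1, p, q)
    gx_p, gy_p = gradient(n_pred, zeta)
    gx_t, gy_t = gradient(n_true, zeta)
    # pixel-wise gradient loss, accumulated back onto the p x q grid
    l_grad = torch.zeros_like(P)
    l_grad[:, :, :, :-zeta] += ((gx_p - gx_t) ** 2).sum(dim=1, keepdim=True)
    l_grad[:, :, :-zeta, :] += ((gy_p - gy_t) ** 2).sum(dim=1, keepdim=True)
    # pixel-wise normal loss: 1 - n_k . n~_k
    l_norm = 1.0 - (n_pred * n_true).sum(dim=1, keepdim=True)
    # L_k = P_k * L_gradient + lambda * (1 - P_k) * L_normal, averaged over p*q
    return (P * l_grad + lam * (1.0 - P) * l_norm).mean()
```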
(4) network training
when training the network, a back propagation algorithm continuously adjusts and optimizes the parameters to minimize the loss function; training stops on reaching 30 epochs (cycles), at which point the optimal effect is achieved; alternatively, when L_normal falls below 0.03, training is considered to have reached its best result and is stopped;
in the invention, training of the network finishes after 30 epochs, at which point training is considered to have achieved the optimal effect;
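A training loop consistent with part (4), reusing the sketches above, might look as follows; the dataset loader, the Adam optimizer, and the learning rate are assumptions not stated in the source:

```python
import torch

normal_net = SurfaceNormalNet()
attn_net = AttentionWeightNet()
params = list(normal_net.parameters()) + list(attn_net.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)   # optimizer and lr assumed

for epoch in range(30):                          # training stops after 30 epochs
    # train_loader (assumed) yields one object per step:
    # images (j, 3, p, q), lights (j, 3), n_true (1, 3, p, q)
    for images, lights, n_true in train_loader:
        n_pred = normal_net(images, lights)
        P = attn_net(images)
        loss = attention_weight_loss(n_pred, n_true, P, lam=8.0, zeta=1)
        optimizer.zero_grad()
        loss.backward()                          # back propagation
        optimizer.step()
    # alternatively, stop early once the normal loss falls below 0.03
```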
3) the trained network is used for surface normal reconstruction of photometric stereo images:
first, s images under different illumination directions are taken, s ≥ 10; then m_1, m_2, ..., m_s and l_1, l_2, ..., l_s are input into the trained network to obtain the predicted surface normal $\tilde{n}$,
where p, q ∈ {16, 32, 48, 64}, λ ∈ {7, 8, 9, 10}, and ζ can be 1, 2, 3, 4 or 5.
The reconstruction effect is shown in fig. 4. The first row shows the images taken of the objects to be reconstructed, the second row the generated attention weight maps P, and the third row the generated surface normals $\tilde{n}$.
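Inference, step 3), then reduces to a single forward pass through the trained surface normal generation network; images_s and lights_s below are assumed, pre-loaded inputs:

```python
import torch

normal_net.eval()
with torch.no_grad():
    # images_s: (s, 3, p, q); lights_s: (s, 3), with s >= 10
    n_pred = normal_net(images_s, lights_s)  # predicted surface normal, (1, 3, p, q)
```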

Claims (8)

1. A high-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning, characterized by comprising the following steps:
1) using a photometric stereo system, taking several images of the object to be reconstructed:
an image of the object to be reconstructed is taken under the illumination of a single parallel white light source; a Cartesian coordinate system is established with the center of the object to be reconstructed as the origin, and the position of the white light source is represented by a vector l = [x, y, z] in this coordinate system;
changing the position of the light source yields an image under another illumination direction; usually at least 10 images under different illumination directions are taken, recorded as m_1, m_2, ..., m_j, with the corresponding light source positions recorded as l_1, l_2, ..., l_j, where j is a natural number greater than or equal to 10;
2) using a deep learning algorithm with inputs m_1, m_2, ..., m_j and l_1, l_2, ..., l_j, outputting an accurate surface normal three-dimensional reconstruction:
the deep learning algorithm is divided into four parts: (1) a surface normal generation network, (2) an attention weight generation network, (3) joint training with an attention weight loss function, and (4) network training; wherein:
(1) the surface normal generation network is designed to generate the surface normal $\tilde{n}$ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j;
(2) the attention weight generation network is designed to generate an attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j;
(3) the attention weight loss L is a pixel-wise loss function, obtained by averaging the per-pixel losses L_k:

$L = \frac{1}{p \times q} \sum_{k=1}^{p \times q} L_k$

where p×q is the resolution of image m, with p, q ≥ 2^n and n ≥ 4;
the loss L_k at each pixel position comprises two parts: the first part is the gradient loss L_gradient with its coefficient term, and the second part is the normal loss L_normal with its coefficient term, i.e.

$L_k = P_k L_{gradient} + \lambda (1 - P_k) L_{normal}$

wherein

$L_{gradient} = \left\| \nabla n_k - \nabla \tilde{n}_k \right\|^2$

$\nabla n_k$ is the gradient of the true surface normal n of the object to be reconstructed at position k;

ζ is the neighborhood pixel range used in computing the gradient, set within {1, 2, 3, 4, 5};

$\nabla \tilde{n}_k$ is the gradient of the predicted surface normal $\tilde{n}$ at position k;

$\tilde{n}$ denotes the surface normal predicted by the network, and n denotes the true surface normal;

P_k is the value of the attention weight map at pixel position k;

second,

$L_{normal} = 1 - n_k \bullet \tilde{n}_k$

where ● denotes the dot product operation, and λ is a hyperparameter set within the range {7, 8, 9, 10};
the surface normal generation network (1) and the attention weight generation network (2) are linked through the attention weight loss (3);
(4) network training
when training the network, a back propagation algorithm continuously adjusts and optimizes the parameters to minimize the loss function; training stops when the set number of epochs is reached, at which point the optimal effect is achieved; alternatively, when L_normal falls below 0.03, training is considered to have reached its best result and is stopped;
3) the trained network is used for surface normal reconstruction of photometric stereo images:
first, s images under different illumination directions are taken, s ≥ 10; then m_1, m_2, ..., m_s and l_1, l_2, ..., l_s are input into the trained network to obtain the predicted surface normal $\tilde{n}$.
2. The deep learning-based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 1, wherein the surface normal generation network (1) is designed to generate the surface normal $\tilde{n}$ of the object to be reconstructed from the images m_1, m_2, ..., m_j and the illuminations l_1, l_2, ..., l_j, with the following specific steps:
the resolution of image m is denoted p×q, with p, q ≥ 2^n and n ≥ 4, so that m ∈ ℝ^(p×q×3), where 3 denotes the RGB channels; the surface normal generation network first tiles the illumination l = [x, y, z] ∈ ℝ^3 repeatedly to fill a space ℝ^(p×q×3) according to the resolution p×q of m; the filled illumination is recorded as h, so h ∈ ℝ^(p×q×3) and h and m have the same spatial size; h and m are concatenated along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j input images and illuminations, j fused tensors are obtained;
these tensors each pass through 4 convolution layers; the kernel sizes of convolution layers 1, 2, 3 and 4 are all 3×3 with 'relu' activation functions, layers 2 and 4 are convolutions with stride 2, layers 1 and 3 are convolutions with stride 1, and the numbers of feature channels of convolution layers 1, 2, 3 and 4 are 64, 128, 128 and 256 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/4×q/4×256) produced by the 4 convolution layers into one tensor ∈ ℝ^(p/4×q/4×256);
computation continues through convolution layers 5, 6, 7 and 8, whose kernels are all 3×3 with 'relu' activation functions; layers 5 and 7 are transposed convolutions, layers 6 and 8 are convolutions with stride 1, and the numbers of feature channels of convolution layers 5, 6, 7 and 8 are 128, 128, 64 and 3 respectively;
finally, the tensor produced by convolution layer 8 is normalized to modulus 1, giving the surface normal $\tilde{n}$ of the object to be reconstructed.
3. The deep learning-based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 1, wherein the attention weight generation network (2) is designed to generate the attention weight map P of the object to be reconstructed from the images m_1, m_2, ..., m_j, with the following specific steps:
the attention weight generation network computes, for each image m ∈ ℝ^(p×q×3), its gradient values, which also belong to the space ℝ^(p×q×3); the gradient of the image is concatenated and fused with the image along the third dimension to form a new tensor belonging to ℝ^(p×q×6); with j input images and illuminations, j fused tensors are obtained;
first, the fused tensors each pass through 3 convolution layers; the kernel sizes of all 3 layers are 3×3 with 'relu' activation functions, layer 2 has stride 2, layers 1 and 3 have stride 1, and the numbers of feature channels of the 3 convolution layers are 64, 128 and 128 respectively;
then a max pooling layer pools the j tensors ∈ ℝ^(p/2×q/2×128) produced by the 3 convolution layers into one tensor ∈ ℝ^(p/2×q/2×128);
computation continues through convolution layers 5, 6 and 7, whose kernels are all 3×3 with 'relu' activation functions; layer 6 is a transposed convolution, layers 5 and 7 are convolutions with stride 1, and the numbers of feature channels of convolution layers 5, 6 and 7 are 128, 64 and 1 respectively, yielding the attention weight map P of the object to be reconstructed.
4. The deep learning-based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 1, wherein in the resolution p×q of image m, p takes a value of 16, 32, 48 or 64, and q takes a value of 16, 32, 48 or 64.
5. The deep learning-based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 1, wherein ζ is set to 1.
6. The deep learning-based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 1, wherein λ is set to 8.
7. The deep learning-based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 1, wherein the number of training cycles is set to 30 epochs.
8. The deep learning-based high-frequency region enhanced photometric stereo three-dimensional reconstruction method according to claim 4, wherein p takes the value 32 and q takes the value 32.
CN202111524515.8A 2021-12-14 2021-12-14 High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning Active CN113936117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111524515.8A CN113936117B (en) 2021-12-14 2021-12-14 High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111524515.8A CN113936117B (en) 2021-12-14 2021-12-14 High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning

Publications (2)

Publication Number Publication Date
CN113936117A (en) 2022-01-14
CN113936117B (en) 2022-03-08

Family

ID=79288969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111524515.8A Active CN113936117B (en) 2021-12-14 2021-12-14 High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning

Country Status (1)

Country Link
CN (1) CN113936117B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098563B (en) * 2022-07-14 2022-11-11 中国海洋大学 Time sequence abnormity detection method and system based on GCN and attention VAE

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862741A (en) * 2017-12-10 2018-03-30 中国海洋大学 A kind of single-frame images three-dimensional reconstruction apparatus and method based on deep learning
CN109146934A (en) * 2018-06-04 2019-01-04 成都通甲优博科技有限责任公司 A kind of face three-dimensional rebuilding method and system based on binocular solid and photometric stereo
CN110060212A (en) * 2019-03-19 2019-07-26 中国海洋大学 A kind of multispectral photometric stereo surface normal restoration methods based on deep learning
CN113538675A (en) * 2021-06-30 2021-10-22 同济人工智能研究院(苏州)有限公司 Neural network for calculating attention weight for laser point cloud and training method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510573B (en) * 2018-04-03 2021-07-30 南京大学 Multi-view face three-dimensional model reconstruction method based on deep learning
EP4100925A4 (en) * 2020-02-03 2024-03-06 Nanotronics Imaging, Inc. Deep photometric learning (dpl) systems, apparatus and methods
CN113762358B (en) * 2021-08-18 2024-05-14 江苏大学 Semi-supervised learning three-dimensional reconstruction method based on relative depth training

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862741A (en) * 2017-12-10 2018-03-30 中国海洋大学 A kind of single-frame images three-dimensional reconstruction apparatus and method based on deep learning
CN109146934A (en) * 2018-06-04 2019-01-04 成都通甲优博科技有限责任公司 A kind of face three-dimensional rebuilding method and system based on binocular solid and photometric stereo
CN110060212A (en) * 2019-03-19 2019-07-26 中国海洋大学 A kind of multispectral photometric stereo surface normal restoration methods based on deep learning
CN113538675A (en) * 2021-06-30 2021-10-22 同济人工智能研究院(苏州)有限公司 Neural network for calculating attention weight for laser point cloud and training method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Constrained Independent Component Analysis Based Photometric Stereo for 3D Human Face Reconstruction; Cheng-Jian Lin et al.; 2012 International Symposium on Computer, Consumer and Control; 2012-07-02; 710-712 *
Application of Deep Learning in Three-Dimensional Reconstruction of Objects from a Single Image; Chen Jia et al.; Acta Automatica Sinica; 2018-11-28 (No. 04); 23-34 *

Also Published As

Publication number Publication date
CN113936117A (en) 2022-01-14

Similar Documents

Publication Publication Date Title
WO2021232687A1 (en) Deep learning-based point cloud upsampling method
CN106355570B (en) A kind of binocular stereo vision matching method of combination depth characteristic
Liu et al. Exemplar-based image inpainting using multiscale graph cuts
CN111627019A (en) Liver tumor segmentation method and system based on convolutional neural network
CN112634149B (en) Point cloud denoising method based on graph convolution network
Pottmann et al. The isophotic metric and its application to feature sensitive morphology on surfaces
CN112348959A (en) Adaptive disturbance point cloud up-sampling method based on deep learning
CN113962858A (en) Multi-view depth acquisition method
CN113936117B (en) High-frequency region enhanced photometric stereo three-dimensional reconstruction method based on deep learning
Sulc et al. Reflection Separation in Light Fields based on Sparse Coding and Specular Flow.
CN107103610A (en) Stereo mapping satellite image matches suspicious region automatic testing method
CN114549669B (en) Color three-dimensional point cloud acquisition method based on image fusion technology
Shen et al. 3D shape reconstruction from images in the frequency domain
CN112991504B (en) Improved hole filling method based on TOF camera three-dimensional reconstruction
CN115631223A (en) Multi-view stereo reconstruction method based on self-adaptive learning and aggregation
JP2856661B2 (en) Density converter
Gallardo et al. Using Shading and a 3D Template to Reconstruct Complex Surface Deformations.
CN115239559A (en) Depth map super-resolution method and system for fusion view synthesis
CN114972937A (en) Feature point detection and descriptor generation method based on deep learning
US20220172421A1 (en) Enhancement of Three-Dimensional Facial Scans
Gong et al. Multi-view stereo point clouds visualization
Tabb et al. Camera calibration correction in shape from inconsistent silhouette
Schouten et al. Timed fast exact euclidean distance (tfeed) maps
US20230177722A1 (en) Apparatus and method with object posture estimating
JP7508673B2 (en) Computer vision method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant