WO2022237249A1 - Three-dimensional reconstruction method, device and system, medium and computer equipment


Info

Publication number
WO2022237249A1
WO2022237249A1 PCT/CN2022/075636 CN2022075636W
Authority
WO
WIPO (PCT)
Prior art keywords
parameter
value
target object
optimized
dimensional
Prior art date
Application number
PCT/CN2022/075636
Other languages
English (en)
French (fr)
Inventor
曹智杰
汪旻
刘文韬
钱晨
马利庄
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 filed Critical 上海商汤智能科技有限公司
Priority to KR1020237014677A priority Critical patent/KR20230078777A/ko
Priority to JP2023525021A priority patent/JP2023547888A/ja
Publication of WO2022237249A1 publication Critical patent/WO2022237249A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/97Determining parameters from multiple pictures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20036Morphological image processing
    • G06T2207/20044Skeletonization; Medial axis transform

Definitions

  • the present disclosure relates to the technical field of computer vision, and in particular to a three-dimensional reconstruction method, device and system, media and computer equipment.
  • 3D reconstruction is one of the important technologies in computer vision, and has many potential applications in fields such as augmented reality and virtual reality. By performing three-dimensional reconstruction on the target object, the posture and limb rotation of the target object can be reconstructed. However, traditional 3D reconstruction methods cannot balance the accuracy and reliability of reconstruction results.
  • the present disclosure provides a three-dimensional reconstruction method, device and system, medium and computer equipment.
  • a 3D reconstruction method comprising: performing 3D reconstruction on a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to establish a three-dimensional model of the target object; optimizing the initial values of the parameters based on pre-acquired supervision information representing characteristics of the target object, to obtain optimized values of the parameters; and performing bone skinning based on the optimized values of the parameters to establish the three-dimensional model of the target object.
  • the supervision information includes first supervision information, or the supervision information includes first supervision information and second supervision information; the first supervision information includes at least one of the following: the initial Two-dimensional key points, semantic information of multiple pixel points on the target object in the image; the second supervisory information includes an initial three-dimensional point cloud of the target object surface.
  • the initial two-dimensional key points or the semantic information of pixels of the target object can be used as supervision information to optimize the initial values of the parameters, which offers high optimization efficiency and low optimization complexity; alternatively, the initial 3D point cloud of the target object's surface can be used together with the aforementioned initial 2D key points or pixel semantic information as supervision information, thereby improving the accuracy of the resulting optimized parameter values.
  • the method further includes: extracting information of initial two-dimensional key points of the target object from the image through a key point extraction network. Using the information of the initial two-dimensional key points extracted by the key point extraction network as supervision information can generate more natural and reasonable actions for the three-dimensional model.
  • the image includes a depth image of the target object; the method further includes: extracting depth information of a plurality of pixels on the target object from the depth image; and, based on the depth information, back-projecting the plurality of pixels on the target object in the depth image into three-dimensional space to obtain an initial three-dimensional point cloud of the target object's surface.
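The back-projection step described above can be sketched with the standard pinhole camera model. The intrinsics `fx, fy, cx, cy`, the array shapes, and the toy depth values below are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def backproject_depth(depth, mask, fx, fy, cx, cy):
    """Back-project the masked depth pixels into 3D camera coordinates
    using the pinhole model. depth is an (H, W) map in metres; mask marks
    the pixels belonging to the target object."""
    vs, us = np.nonzero(mask)            # pixel rows (v) and columns (u)
    z = depth[vs, us]
    x = (us - cx) * z / fx               # inverse of u = fx * x / z + cx
    y = (vs - cy) * z / fy               # inverse of v = fy * y / z + cy
    return np.stack([x, y, z], axis=1)   # (N, 3) point cloud

# Toy 4x4 depth map with a two-pixel object region and made-up intrinsics.
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1, 2] = mask[2, 2] = True
cloud = backproject_depth(depth, mask, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(cloud.shape)  # prints (2, 3)
```

Each masked pixel contributes one 3D point whose depth is taken directly from the depth map, which is why the result approximates the object's surface rather than its volume.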
  • in this way, the initial three-dimensional point cloud of the target object's surface can be obtained, so that it can be used as supervision information to optimize the initial parameter values, further improving the accuracy of parameter optimization.
  • the method further includes: filtering outliers from the initial three-dimensional point cloud, and using the filtered initial three-dimensional point cloud as the second supervisory information. By filtering the outliers, the interference of the outliers is reduced, and the accuracy of the parameter optimization process is further improved.
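The outlier filtering mentioned above can be sketched as a statistical-distance filter, a common point-cloud technique. The disclosure does not specify a particular filter; the neighbour count `k` and `std_ratio` threshold below are illustrative choices:

```python
import numpy as np

def filter_outliers(points, k=3, std_ratio=2.0):
    """Statistical outlier removal: drop points whose mean distance to
    their k nearest neighbours exceeds mean + std_ratio * std over the
    whole cloud. A brute-force O(N^2) sketch; a real pipeline would use
    a KD-tree for the neighbour search."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)                        # column 0 is the self-distance
    knn_mean = d[:, 1:k + 1].mean(axis=1)
    thresh = knn_mean.mean() + std_ratio * knn_mean.std()
    return points[knn_mean <= thresh]

rng = np.random.default_rng(0)
cloud = rng.normal(size=(50, 3))                     # dense inlier cluster
noisy = np.vstack([cloud, [[100.0, 100.0, 100.0]]])  # one far-away outlier
filtered = filter_outliers(noisy)
```

Points far from the surface (e.g. depth-sensor noise) have large neighbour distances and are dropped, so they no longer pull the parameter optimization away from the true surface.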
  • the image of the target object is acquired by an image acquisition device
  • the parameters include: a global rotation parameter of the target object, key point rotation parameters of each key point of the target object, body shape parameters of the target object, and displacement parameters of the image acquisition device; optimizing the initial values of the parameters based on the pre-acquired supervision information representing characteristics of the target object includes: with the initial values of the body shape parameters and the initial values of the key point rotation parameters held fixed, optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter; and, based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, optimizing the initial values of the key point rotation parameters and the initial values of the body shape parameters, to obtain optimized values of the key point rotation parameters and optimized values of the body shape parameters.
  • the supervision information includes the initial two-dimensional key points of the target object; the current value of the displacement parameter of the image acquisition device based on the supervision information and the initial value of the displacement parameter
  • optimizing the initial value of the global rotation parameter includes: obtaining the target two-dimensional projection key points corresponding to the two-dimensional projection key points corresponding to the three-dimensional key points of the target object belonging to the preset position of the target object; wherein, The 3D key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the posture parameter, and the 2D projection key point is based on the current value of the displacement parameter and the global
  • the initial value of the rotation parameter is obtained by projecting the three-dimensional key point of the target object; obtaining the first loss between the target two-dimensional projection key point and the initial two-dimensional key point; obtaining the initial value of the displacement parameter and a second loss between the current value of the displacement parameter; optimizing the current value of the displacement parameter and the initial value of the global rotation
  • the preset part may be the torso or a similar part. Since different actions have little influence on the key points of the torso, determining the first loss using the torso key points reduces the influence of different actions on key point positions and improves the accuracy of the optimization result. Since the two-dimensional key points are supervision information on the two-dimensional plane, while the displacement parameters of the image acquisition device are parameters in three-dimensional space, obtaining the second loss reduces the risk that the optimization falls into a local optimum on the two-dimensional plane and deviates from the real situation.
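The first and second losses of this stage can be sketched as follows. The projection model, index convention, and loss weight `w` are illustrative assumptions; the disclosure does not give concrete formulas:

```python
import numpy as np

def project(points3d, t, fx=1.0, fy=1.0, cx=0.0, cy=0.0):
    """Pinhole projection of 3D key points after adding the camera
    translation t. Intrinsics default to illustrative values."""
    p = points3d + t
    return np.stack([fx * p[:, 0] / p[:, 2] + cx,
                     fy * p[:, 1] / p[:, 2] + cy], axis=1)

def stage1_loss(t_current, t_init, joints3d, torso_idx, kpts2d, w=0.1):
    """First loss: reprojection error restricted to torso key points.
    Second loss: penalise drift of the translation from its initial
    (network-regressed) value, so 2D supervision cannot pull the 3D
    translation into an implausible local optimum. w is a made-up weight."""
    proj = project(joints3d, t_current)
    first = np.sum((proj[torso_idx] - kpts2d[torso_idx]) ** 2)
    second = np.sum((t_current - t_init) ** 2)
    return first + w * second

# Toy usage: three key points 5 m from the camera, torso = first two points.
joints3d = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0], [0.0, 1.0, 5.0]])
t_init = np.zeros(3)
kpts2d = project(joints3d, t_init)    # "observed" 2D key points
loss_at_init = stage1_loss(t_init, t_init, joints3d, [0, 1], kpts2d)
```

Restricting `first` to `torso_idx` implements the preset-part idea, and the regularizing `second` term keeps the recovered translation near the network's estimate.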
  • the supervision information includes the initial two-dimensional key points of the target object; optimizing the initial values of the key point rotation parameters and the initial values of the body shape parameters based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter includes: obtaining a third loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the initial values of the key point rotation parameters and the initial values of the body shape parameters; obtaining a fourth loss, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the initial values of the key point rotation parameters and the initial values of the body shape parameters; and optimizing the initial values of the key point rotation parameters and the initial values of the body shape parameters based on the third loss and the fourth loss.
  • This embodiment optimizes the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, which improves the stability of the optimization process.
  • the fourth loss ensures the plausibility of the pose corresponding to the optimized parameters.
  • the method further includes: after optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter , performing joint optimization on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter and the optimized value of the displacement parameter.
  • on the basis of the aforementioned optimization, the optimized parameters are jointly optimized, thereby further improving the accuracy of the optimization result.
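The staged-then-joint schedule described above can be sketched with a toy gradient-descent loop. The four scalar "parameters", the quadratic stand-in loss, and the step sizes are all illustrative; the real method optimizes rotation, shape, and displacement vectors against the reprojection and prior losses:

```python
import numpy as np

def num_grad(f, x, eps=1e-5):
    """Finite-difference gradient; fine for this 4-parameter toy."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def descend(f, x, steps=200, lr=0.1):
    for _ in range(steps):
        x = x - lr * num_grad(f, x)
    return x

# Toy parameter vector: [global_rot, joint_rot, shape, displacement].
target = np.array([0.3, -0.2, 0.1, 1.5])
loss = lambda p: float(np.sum((p - target) ** 2))

p = np.zeros(4)

# Stage 1: optimise displacement and global rotation; freeze the rest.
mask1 = np.array([1.0, 0.0, 0.0, 1.0])
frozen1 = p * (1 - mask1)
p = frozen1 + descend(lambda q: loss(frozen1 + q * mask1), p) * mask1

# Stage 2: optimise joint rotations and body shape using stage-1 results.
mask2 = np.array([0.0, 1.0, 1.0, 0.0])
frozen2 = p * (1 - mask2)
p = frozen2 + descend(lambda q: loss(frozen2 + q * mask2), p) * mask2

# Stage 3: joint refinement of all four parameter groups together.
p = descend(loss, p)
```

Freezing parameter groups via a mask mirrors the claim's structure: each stage only moves the parameters it names, and the final joint pass refines everything at once.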
  • the supervision information includes the initial two-dimensional key points of the target object and the initial three-dimensional point cloud of the surface of the target object; based on the supervision information and the initial value of the displacement parameter, the Optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter includes: obtaining presets belonging to the target object in the two-dimensional projection key points corresponding to the three-dimensional key points of the target object The target two-dimensional projection key point of the part; wherein, the three-dimensional key point of the target object is obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the posture parameter, and the two-dimensional projection key The point is obtained by projecting the 3D key point of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtaining the first 2D key point between the target 2D projection key point and the initial 2D key point A loss; obtain the second loss between the initial value of the displacement parameter and the current value of the displacement parameter;
  • the joint optimization of the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, the optimized values of the body shape parameters and the optimized value of the displacement parameter includes: obtaining a sixth loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters and the optimized values of the body shape parameters; obtaining a seventh loss, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters and the optimized values of the body shape parameters; and obtaining an eighth loss between a second three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud, where the second three-dimensional point cloud is obtained based on the optimized values of the parameters.
  • a 3D reconstruction device comprising: a first 3D reconstruction module, configured to perform 3D reconstruction on a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to establish a three-dimensional model of the target object;
  • an optimization module, configured to optimize the initial values of the parameters based on pre-acquired supervision information representing characteristics of the target object, to obtain optimized values of the parameters;
  • and a second three-dimensional reconstruction module, configured to perform bone skinning based on the optimized values of the parameters and establish the three-dimensional model of the target object.
  • the supervision information includes first supervision information, or the supervision information includes first supervision information and second supervision information; the first supervision information includes at least one of the following: the initial Two-dimensional key points, semantic information of multiple pixel points on the target object in the image; the second supervisory information includes an initial three-dimensional point cloud of the target object surface.
  • the initial two-dimensional key points or the semantic information of pixels of the target object can be used as supervision information to optimize the initial values of the parameters, which offers high optimization efficiency and low optimization complexity; alternatively, the initial 3D point cloud of the target object's surface can be used together with the aforementioned initial 2D key points or pixel semantic information as supervision information, thereby improving the accuracy of the resulting optimized parameter values.
  • the device further includes: a two-dimensional key point extraction module, configured to extract initial two-dimensional key point information of the target object from the image through a key point extraction network. Using the information of the initial two-dimensional key points extracted by the key point extraction network as supervision information can generate more natural and reasonable actions for the three-dimensional model.
  • the image includes a depth image of the target object; the device further includes: a depth information extraction module, configured to extract depth information of multiple pixels on the target object from the depth image; and a back-projection module, configured to back-project, based on the depth information, the multiple pixels on the target object in the depth image into three-dimensional space, to obtain an initial three-dimensional point cloud of the target object's surface.
  • the image further includes an RGB image of the target object;
  • the depth information extraction module includes: an image segmentation unit, configured to perform image segmentation on the RGB image; an image area determination unit, configured to determine, based on the image segmentation result, the image area where the target object is located in the RGB image, and to determine the image area where the target object is located in the depth image based on the image area where the target object is located in the RGB image;
  • and a depth information acquisition unit, configured to acquire depth information of multiple pixels in the image area where the target object is located in the depth image.
  • the device further includes: a filtering module, configured to filter out outliers from the initial 3D point cloud, and use the filtered initial 3D point cloud as the second supervisory information. By filtering the outliers, the interference of the outliers is reduced, and the accuracy of the parameter optimization process is further improved.
  • the image of the target object is acquired by an image acquisition device
  • the parameters include: the global rotation parameter of the target object, the key point rotation parameters of each key point of the target object, the body shape parameters of the target object, and the displacement parameters of the image acquisition device;
  • the optimization module includes: a first optimization unit, configured to, with the initial values of the body shape parameters and the initial values of the key point rotation parameters held fixed, optimize the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter;
  • and a second optimization unit, configured to optimize the initial values of the key point rotation parameters and the initial values of the body shape parameters based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain optimized values of the key point rotation parameters and optimized values of the body shape parameters.
  • the supervision information includes the initial two-dimensional key points of the target object; the first optimization unit is configured to: obtain, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to the preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial values of the key point rotation parameters and the initial values of the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtain a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtain a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss.
  • the preset part may be the torso or a similar part. Since different actions have little influence on the key points of the torso, determining the first loss using the torso key points reduces the influence of different actions on key point positions and improves the accuracy of the optimization result. Since the two-dimensional key points are supervision information on the two-dimensional plane, while the displacement parameters of the image acquisition device are parameters in three-dimensional space, obtaining the second loss reduces the risk that the optimization falls into a local optimum on the two-dimensional plane and deviates from the real situation.
  • the supervision information includes the initial two-dimensional key points of the target object; the second optimization unit is configured to: obtain a third loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the initial values of the key point rotation parameters and the initial values of the body shape parameters; obtain a fourth loss, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the initial values of the key point rotation parameters and the initial values of the body shape parameters; and optimize the initial values of the key point rotation parameters and the initial values of the body shape parameters based on the third loss and the fourth loss.
  • This embodiment optimizes the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, which improves the stability of the optimization process.
  • the fourth loss ensures the plausibility of the pose corresponding to the optimized parameters.
  • the device further includes: a joint optimization module, configured to perform, after the initial values of the key point rotation parameters and the initial values of the body shape parameters are optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, joint optimization of the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, the optimized values of the body shape parameters and the optimized value of the displacement parameter. In this embodiment, on the basis of the aforementioned optimization, the optimized parameters are jointly optimized, thereby further improving the accuracy of the optimization result.
  • the supervisory information includes the initial two-dimensional key points of the target object and the initial three-dimensional point cloud of the surface of the target object;
  • the first optimization unit is configured to: acquire, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to the preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial values of the key point rotation parameters and the initial values of the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtain a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtain a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and obtain a fifth loss between a first three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud.
  • the joint optimization module includes: a first acquisition unit, configured to acquire a sixth loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters and the optimized values of the body shape parameters; a second acquisition unit, configured to acquire a seventh loss, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters and the optimized values of the body shape parameters; and a third acquisition unit, configured to acquire an eighth loss between a second three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud, where the second three-dimensional point cloud is obtained based on the optimized values of the parameters.
  • a three-dimensional reconstruction system comprising: an image acquisition device, configured to acquire an image of a target object; and a processing unit communicatively connected to the image acquisition device, configured to: perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to establish a three-dimensional model of the target object; optimize the initial values of the parameters based on supervision information representing characteristics of the target object, to obtain optimized values of the parameters; and perform bone skinning based on the optimized values of the parameters to establish the three-dimensional model of the target object.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method described in any embodiment is implemented.
  • a computer device including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the method described in any embodiment is implemented.
  • a computer program product is provided, stored in a storage medium and including a computer program runnable on a processor; when the processor executes the computer program, the method described in any embodiment is implemented.
  • the initial values of the parameters are obtained by three-dimensionally reconstructing the image of the target object through the three-dimensional reconstruction network; the initial values are then optimized based on the supervision information, and the three-dimensional model of the target object is established from the optimized parameter values. The advantage of the parameter optimization approach is that it can give more accurate 3D reconstruction results that conform to the 2D observed characteristics of the image, but it often produces unnatural and implausible actions, so its reliability is low. Network regression through the 3D reconstruction network can give more natural and plausible action results. Therefore, using the output of the 3D reconstruction network as the initial parameter values for optimization ensures both the reliability and the accuracy of the 3D reconstruction.
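The complementary behaviour described above can be illustrated with a toy hybrid of the two approaches. The stand-in "network" and "optimizer" below are purely illustrative; the real method regresses full model parameters and refines them against the 2D/3D supervision losses:

```python
import numpy as np

def regression_init(obs):
    """Stand-in for the 3D reconstruction network: returns a plausible
    estimate that is systematically offset from the observation."""
    return obs + 0.5

def refine(p, obs, steps=50, lr=0.2):
    """Stand-in for the optimization stage: gradient descent on the
    squared error between the parameters and the supervision."""
    for _ in range(steps):
        p = p - lr * 2.0 * (p - obs)   # gradient of (p - obs)^2
    return p

obs = np.array([1.0, 2.0])             # stand-in for 2D/3D supervision
refined = refine(regression_init(obs), obs)
```

Starting the refinement from the network's plausible estimate, rather than an arbitrary standard pose, is what lets the optimization converge accurately without drifting into implausible configurations.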
  • FIG. 1A and 1B are schematic illustrations of three-dimensional models of some embodiments.
  • FIG. 2 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present disclosure.
  • FIG. 3 is an overall flowchart of an embodiment of the present disclosure.
  • FIG. 4A and FIG. 4B are schematic diagrams of application scenarios of embodiments of the present disclosure, respectively.
  • FIG. 5 is a block diagram of a three-dimensional reconstruction device according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a three-dimensional reconstruction system according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
  • the terms first, second, third, etc. may be used in the present disclosure to describe various information, but the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
  • the 3D reconstruction of the target object needs to reconstruct the body posture and limb rotation of the target object.
  • a parametric model is used to express the body posture and limb rotation of the target object, not just the 3D key points.
  • a 3D model of a thinner person as shown in Figure 1A
  • a 3D model of a fatter person as shown in Figure 1B
  • the person shown in Figure 1B is in the same posture as the person shown in Figure 1A, and their key point information is the same; the difference in body shape between the two cannot be represented through the key point information alone.
  • 3D reconstruction is generally carried out by means of parameter optimization and network regression.
  • the parameter optimization method usually selects a set of standard parameters and uses the gradient descent method to iteratively optimize the initial values of the parameters of the 3D model of the target object according to the 2D visual features of the image of the target object, where 2D key points, for example, may be selected as the 2D visual features of the image.
  • the advantage of the parameter optimization method is that it can give more accurate parameter estimation results that conform to the two-dimensional visual characteristics of the image, but it often gives unnatural and unreasonable action results, and the final performance of parameter optimization depends heavily on the initial values of the parameters, leading to low reliability of the 3D reconstruction method based on parameter optimization.
  • Methods for network regression typically train an end-to-end neural network to learn the mapping from images to 3D model parameters.
  • the advantage of the network regression method is that it can give more natural and reasonable action results.
  • the 3D reconstruction results may not match the 2D visual features in the image; therefore, the accuracy of the 3D reconstruction method based on network regression is relatively low.
  • the 3D reconstruction method in the related art cannot take into account the accuracy and reliability of the 3D reconstruction results.
  • an embodiment of the present disclosure provides a three-dimensional reconstruction method, as shown in FIG. 2 , the method includes:
  • Step 201 Perform 3D reconstruction on the target object in the image through a 3D reconstruction network to obtain initial values of parameters of the target object, wherein the initial values of the parameters are used to establish a 3D model of the target object;
  • Step 202 Optimizing the initial value of the parameter based on the pre-acquired supervisory information representing the characteristics of the target object to obtain the optimized value of the parameter;
  • Step 203 Perform bone skinning processing based on the optimized values of the parameters, and establish a 3D model of the target object.
  • the target object may be a three-dimensional object, such as a person, an animal, a robot, etc. in a physical space, or one or more regions on the three-dimensional object, such as a human face or a limb.
  • the target object is a human being
  • the three-dimensional reconstruction performed on the target object is a human body reconstruction as an example for description.
  • the image of the target object may be a single image, or may include multiple images obtained by shooting the target object from multiple different angles of view.
  • 3D human body reconstruction based on a single image is called monocular 3D human body reconstruction, and 3D human body reconstruction based on multiple images from different perspectives is called multi-eye 3D human body reconstruction.
  • Each image can be a grayscale image, RGB image or RGBD image.
  • the image may be an image collected in real time by an image acquisition device (for example, a camera or a camera) around the target object, or an image collected and stored in advance.
  • the image of the target object can be reconstructed in 3D through a 3D reconstruction network, wherein the 3D reconstruction network can be a pre-trained neural network.
  • the 3D reconstruction network can perform 3D reconstruction based on images, and estimate the initial values of natural and reasonable parameters.
  • the initial values of the parameters here can be represented by a vector.
  • the dimension of the vector can be, for example, 85 dimensions, and the vector contains the rotation information of the movable limbs of the human body, that is, the initial values of the posture parameters, including the initial values of the global rotation parameters of the human body and the initial values of the key point rotation parameters of 23 key points.
  • the human body can be represented by key points and limb bones connecting these key points.
  • the key points of the human body can include one or more of key points such as the top of the head, nose, neck, left and right eyes, left and right ears, chest, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right buttocks, left and right knees, and left and right ankles; the initial value of the pose parameter is used to determine the positions of the key points of the human body in three-dimensional space.
  • the initial value of the body shape parameter is used to determine body shape information such as height, shortness, fatness, and thinness of the human body.
  • the initial values of the parameters of the camera are used to determine the absolute position of the human body in three-dimensional space under the camera coordinate system; the parameters of the camera include a displacement parameter between the camera and the human body and a posture parameter of the camera, wherein the initial value of the posture parameter of the camera can be replaced by the initial value of the global rotation parameter of the human body.
  • the parameters of the human body can be expressed using a parametric form of a Skinned Multi-Person Linear (SMPL) model (referred to as SMPL parameters).
  • the bone skinning process can be performed based on the values of the SMPL parameters, that is, a mapping function M(β, θ) is used to map the initial value of the body shape parameter β and the initial value of the posture parameter θ to a three-dimensional model of the human body surface; the 3D model includes 6890 vertices, and the vertices form triangular patches through a fixed connection relationship.
  • a pre-trained regressor W can be used to further regress the 3D key points of the human body from the vertices of the human body surface model.
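The shape-to-vertices mapping and the vertex-to-keypoint regression described above might be sketched as follows. This is a hedged toy sketch: the template mesh, blendshape tensor, and regressor `W` below are placeholder arrays rather than the real learned SMPL assets, and pose-dependent deformation and linear blend skinning are omitted.

```python
import numpy as np

def skin_and_regress(beta, template, shape_dirs, W):
    """Toy sketch of the SMPL-style mapping: shape blendshapes deform a
    template mesh of 6890 vertices, then a pre-trained regressor W maps
    the surface vertices to 3D key points (pose deformation omitted)."""
    v_shaped = template + shape_dirs @ beta   # (6890, 3) surface vertices
    joints = W @ v_shaped                     # (24, 3) regressed 3D key points
    return v_shaped, joints

# toy placeholder assets (the real SMPL assets are learned from scan data)
template = np.zeros((6890, 3))
shape_dirs = np.full((6890, 3, 10), 0.01)
W = np.full((24, 6890), 1.0 / 6890)
beta = np.zeros(10)
verts, joints = skin_and_regress(beta, template, shape_dirs, W)
```

With the zero shape vector the toy template is returned unchanged; a nonzero β displaces every vertex by the corresponding blendshape combination.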
  • the supervisory information can be two-dimensional visual features of the image (also called two-dimensional observation features), for example, the two-dimensional key points of the target object in the image and the semantics of multiple pixel points on the target object at least one of the information.
  • the semantic information of a pixel is used to represent which area the pixel is located on the target object, and the area may be, for example, the area where the head, arm, torso, leg, etc. are located.
  • the two-dimensional key point extraction network can be used to estimate the position of human key points in the image.
  • any two-dimensional pose estimation method can be used, such as OpenPose.
  • 2D visual features and the initial 3D point cloud of the target object surface can also be used as supervision information to further improve the accuracy of 3D reconstruction.
  • the depth information of multiple pixels on the target object can be extracted from the depth image, and the multiple pixel points on the target object in the depth image are projected into three-dimensional space to obtain an initial three-dimensional point cloud of the surface of the target object.
  • the plurality of pixels may be part or all of the pixels on the target object in the image.
  • it may include pixel points of various areas on the target object that need to be three-dimensionally reconstructed, and the number of pixel points in each area should be greater than or equal to the number required for three-dimensional reconstruction.
  • the image generally includes both the target object and the background area; therefore, image segmentation can be performed on the RGB image included in the image to obtain the image area where the target object is located in the RGB image, and the image area where the target object is located in the depth image is determined based on the image area where the target object is located in the RGB image; depth information of multiple pixels in the image area where the target object is located in the depth image is then acquired.
  • the pixels in the depth image correspond one-to-one to the pixels in the RGB image.
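A minimal sketch of the back-projection step, assuming a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy` (the camera model and parameter names are assumptions; the text above does not specify them):

```python
import numpy as np

def backproject_depth(depth, mask, fx, fy, cx, cy):
    """Back-project the masked target-object pixels of a depth image
    into 3D space using assumed pinhole intrinsics (fx, fy, cx, cy)."""
    v, u = np.nonzero(mask)              # pixel rows/cols on the target object
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)   # (N, 3) initial surface point cloud

# toy example: two object pixels at depth 2, principal point at (2, 2)
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[2, 2] = mask[2, 3] = True
cloud = backproject_depth(depth, mask, fx=1.0, fy=1.0, cx=2.0, cy=2.0)
```

The mask here plays the role of the segmented image area where the target object is located.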
  • the image may also be an RGBD image.
  • outliers can also be filtered out from the 3D point cloud (ie, the initial 3D point cloud), and the supervision information can include the filtered 3D point cloud.
  • the filtering can be implemented using a point cloud filter. By filtering out outliers, a finer 3D point cloud of the surface of the target object can be obtained, thereby further improving the accuracy of 3D reconstruction.
  • for each target 3D point in the 3D point cloud, the average distance from the n 3D points nearest to the target 3D point to the target 3D point is obtained; assuming that the average distances corresponding to the target 3D points obey a statistical distribution (for example, a Gaussian distribution), the mean and variance of the distribution can be calculated and a threshold s set based on the mean and variance; 3D points whose average distance falls outside the range defined by the threshold s are then regarded as outliers and filtered out of the 3D point cloud.
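The statistical outlier filtering just described might be sketched as follows; the neighbor count `n_neighbors` and the mean-plus-s-standard-deviations form of the threshold are illustrative choices, not values fixed by the text:

```python
import numpy as np

def filter_outliers(points, n_neighbors=4, s=2.0):
    """Statistical outlier removal: for each point, compute the average
    distance to its n nearest neighbors; under a Gaussian assumption,
    points whose average distance exceeds mean + s * std are removed."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                        # ignore self-distance
    avg = np.sort(d, axis=1)[:, :n_neighbors].mean(axis=1)
    threshold = avg.mean() + s * avg.std()
    return points[avg <= threshold]

# dense cluster near the origin plus one isolated outlier
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(scale=0.1, size=(30, 3)),
                   [[50.0, 50.0, 50.0]]])
filtered = filter_outliers(cloud)
```

The pairwise-distance matrix is quadratic in the number of points; a point cloud filter with a spatial index would be used for dense clouds.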
  • the initial values of the parameters can be iteratively optimized using the two-dimensional observation features as supervisory information.
  • the image is an RGBD image
  • the two-dimensional observation features and the three-dimensional point cloud of the surface of the target object can be used as supervisory information to iteratively optimize the initial value of the parameter.
  • the optimization method may, for example, use a gradient descent method, which is not limited in the present disclosure.
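As an illustration of the gradient-descent step, the toy sketch below refines only the displacement t between the camera and the body so that projected 3D key points match observed 2D key points; `optimize_displacement` is a hypothetical name, the camera is a unit-focal-length pinhole, and the rotation is held fixed for brevity, so this is far simpler than the full parameter optimization described above.

```python
import numpy as np

def optimize_displacement(joints3d, kp2d, t0, lr=0.5, steps=10000, f=1.0):
    """Gradient descent on the squared reprojection error: starting from
    the network-predicted initial value t0, refine the displacement t so
    the pinhole projections of the displaced 3D key points match kp2d."""
    t = t0.astype(float).copy()
    for _ in range(steps):
        p = joints3d + t                         # displaced 3D key points
        proj = f * p[:, :2] / p[:, 2:3]          # pinhole projection
        r = proj - kp2d                          # reprojection residual
        grad = np.array([                        # analytic gradient of 0.5*sum(r^2)
            np.sum(r[:, 0] * f / p[:, 2]),
            np.sum(r[:, 1] * f / p[:, 2]),
            np.sum(-(r[:, 0] * p[:, 0] + r[:, 1] * p[:, 1]) * f / p[:, 2] ** 2),
        ])
        t -= lr * grad
    return t

# synthetic check: recover a known displacement from its own projections
joints3d = np.array([[0.5, 0.2, 5.0], [-0.4, 0.6, 4.5],
                     [0.1, -0.7, 5.5], [0.9, 0.3, 4.8]])
t_true = np.array([0.1, -0.2, 0.3])
gt = joints3d + t_true
kp2d = gt[:, :2] / gt[:, 2:3]
t_opt = optimize_displacement(joints3d, kp2d, t_true + 0.05)
```

The depth component converges much more slowly than the in-plane components, which is one reason the staged optimization below treats the camera displacement separately.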
  • bone skinning processing may be performed based on the optimized values of the parameters to obtain a three-dimensional model of the target object.
  • the RGB image can be reconstructed three-dimensionally through the three-dimensional reconstruction network to obtain the human body parameter value of the person in the image, and the key point extraction network can be used to extract the key points of the person in the image to obtain the two-dimensional human body key point.
  • the human body parameter value is used as the initial value of the parameter
  • the two-dimensional key points of the human body are used as the supervision information
  • the initial value of the human body parameter is optimized through the parameter optimization module to obtain the optimized value of the human body parameter, and bone skinning processing is performed based on the optimized value of the human body parameter to obtain the human body reconstruction model.
  • the image can be decomposed into an RGB image and a TOF (Time of Flight, time of flight) depth map.
  • the TOF depth map includes the depth information of each pixel in the RGB image.
  • the RGB image can be reconstructed three-dimensionally through the three-dimensional reconstruction network to obtain the human body parameter value of the person in the image, and the key point extraction network can be used to extract the key point of the person in the image to obtain the two-dimensional key point of the human body.
  • the point cloud reconstruction module can also be used to reconstruct the surface point cloud of the human body based on the depth information in the TOF depth map.
  • the human body parameter value is used as the initial value of the parameter, and the two-dimensional key points of the human body and the point cloud of the human body surface are jointly used as supervision information.
  • bone skinning processing is performed on the optimized values of the parameters to obtain the human body reconstruction model.
  • color processing may be performed on the human body reconstruction model based on the color information in the RGB image or the RGBD image, so that the human body reconstruction model matches the color information of the person in the image.
  • the target object in the image is reconstructed three-dimensionally through the three-dimensional reconstruction network to obtain the initial value of the parameter; the initial value of the parameter is then optimized based on the supervision information, and a 3D model of the target object is established based on the optimized value of the parameter.
  • the advantage of the parameter optimization method is that it can give more accurate 3D reconstruction results that conform to the 2D observation characteristics of the image, but it often gives unnatural and unreasonable action results with low reliability.
  • the network regression through the 3D reconstruction network can give more natural and reasonable action results; therefore, using the output of the 3D reconstruction network as the initial value of the parameters for parameter optimization can ensure the reliability of the 3D reconstruction results while taking into account the accuracy of the 3D reconstruction.
  • a multi-stage optimization method may be used in the parameter optimization stage.
  • the multi-stage optimization method may include a camera optimization stage and a pose optimization stage.
  • the optimization targets are the value R of the global rotation parameter and the current value t of the displacement parameter between the image acquisition device and the target object.
  • t and R are three-dimensional vectors, and R is expressed in axis-angle form.
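In axis-angle form, the direction of the 3-vector R encodes the rotation axis and its norm encodes the rotation angle. Converting such a vector to a rotation matrix can be sketched with Rodrigues' formula:

```python
import numpy as np

def axis_angle_to_matrix(r):
    """Rodrigues' formula: the direction of r is the rotation axis and
    its norm is the rotation angle; returns the 3x3 rotation matrix."""
    angle = np.linalg.norm(r)
    if angle < 1e-8:
        return np.eye(3)                # near-zero rotation
    k = r / angle                       # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product matrix of k
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

# a rotation of 90 degrees about the z axis
R90 = axis_angle_to_matrix(np.array([0.0, 0.0, np.pi / 2]))
```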
  • in the pose optimization stage, the optimization targets are the values of the key point rotation parameters and the body shape parameters.
  • in the camera optimization stage, the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter are optimized to obtain the optimized value of the displacement parameter and the optimized value of the global rotation parameter; then, keeping the optimized value of the displacement parameter and the optimized value of the global rotation parameter unchanged, the initial value of the key point rotation parameter and the initial value of the body shape parameter are optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain the optimized value of the key point rotation parameter and the optimized value of the body shape parameter.
  • among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target 2D projection key points belonging to preset parts of the target object can be acquired; wherein the 3D key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter, and the two-dimensional projection key points are obtained by projecting the 3D key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter.
  • a first loss between the target 2D projection keypoint and the initial 2D keypoint is obtained.
  • a second loss between an initial value of the displacement parameter and a current value of the displacement parameter is obtained.
  • the current value of the displacement parameter and the initial value of the global rotation parameter are optimized based on the first loss and the second loss.
  • the preset part may be a trunk part
  • the key points of the target two-dimensional projection may include left and right shoulder points, left and right hip points, spine center points and other key points. Since different actions have less influence on the key points of the torso, by using the key points of the torso to establish the first loss, the influence of different actions on the position of the key points can be reduced and the accuracy of the optimization result can be improved.
  • the first loss can also be called torso key point projection loss
  • the second loss can also be called camera displacement regularization loss.
  • the first loss can be obtained by the following formula (1), and the second loss can be obtained by the following formula (2):
  • L torso and L cam denote the first loss and the second loss respectively
  • t and t net represent the current value of the displacement parameter between the image acquisition device and the target object and the initial value of the displacement parameter respectively.
  • the first target loss L 1 can be determined based on the first loss and the second loss.
  • the first target loss can be determined as the sum of the first loss and the second loss, that is, as the following formula (3): L 1 = L torso + L cam (3).
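A sketch of the first target loss; since only the quantities being compared are stated above, the squared-error forms of formulas (1) and (2) are assumptions:

```python
import numpy as np

def camera_stage_loss(torso_proj, torso_kp2d, t, t_net):
    """First target loss L1 = L_torso + L_cam: torso key point projection
    loss plus a regularizer keeping the displacement t close to the
    network-predicted initial value t_net (squared-error forms assumed)."""
    l_torso = np.sum((torso_proj - torso_kp2d) ** 2)   # formula (1), assumed form
    l_cam = np.sum((t - t_net) ** 2)                   # formula (2), assumed form
    return l_torso + l_cam                             # formula (3)
```

The regularizer L_cam is what anchors the camera displacement to the reliable network output during the gradient-descent iterations.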
  • a third loss between the optimized 2D projection key points of the target object and the initial 2D key points may be obtained, wherein the optimized 2D projection key points are obtained by projecting the optimized 3D key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized 3D key points are obtained based on the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter.
  • the fourth loss is obtained, and the fourth loss is used to characterize the rationality of the posture corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; the initial value of the key point rotation parameter and the initial value of the body shape parameter are then optimized based on the third loss and the fourth loss.
  • the third loss can also be called the two-dimensional key point projection loss
  • the fourth loss can also be called the attitude rationality loss
  • the third loss can be determined by the following formula (4):
  • L 2d is the third loss
  • x and x̂ represent the optimized two-dimensional projection key points and the initial two-dimensional key points, respectively.
  • the second target loss may be determined based on the third loss and the fourth loss.
  • the second target loss may be determined as the sum of the third loss and the fourth loss, which may be determined by the following formula (5): L 2 = L 2d + L prior (5).
  • L 2 is the second target loss
  • L prior is the fourth loss, which can be obtained by using a Gaussian Mixture Model (GMM); the GMM is used to judge whether the posture corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter is reasonable, and outputs a large loss for an unreasonable posture.
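A hedged sketch of such a GMM-based pose prior: the loss is taken here as the negative log-likelihood of the pose vector under a Gaussian mixture, so unlikely (unreasonable) poses receive a large value. The mixture parameters below are toy placeholders, not ones fitted to natural poses.

```python
import numpy as np

def gmm_pose_prior(theta, means, covs, weights):
    """Negative log-likelihood of pose vector theta under a Gaussian
    mixture; large for poses far from the modes of the mixture."""
    k = len(theta)
    logps = []
    for w, mu, cov in zip(weights, means, covs):
        d = theta - mu
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        logps.append(np.log(w) - 0.5 * (d @ inv @ d)
                     - 0.5 * (k * np.log(2 * np.pi) + logdet))
    m = max(logps)  # log-sum-exp for numerical stability
    return -(m + np.log(sum(np.exp(lp - m) for lp in logps)))

# toy single-component prior centred on a "natural" pose of zeros
means, covs, weights = [np.zeros(2)], [np.eye(2)], [1.0]
nll_natural = gmm_pose_prior(np.zeros(2), means, covs, weights)
nll_extreme = gmm_pose_prior(np.array([5.0, 0.0]), means, covs, weights)
```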
  • the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter may also be jointly optimized, that is, a three-stage optimization method is adopted.
  • the supervision information includes the information of the 3D point cloud on the surface of the target object
  • the three-stage optimization method can be adopted, including the camera optimization stage, the attitude optimization stage and the point cloud optimization stage.
  • among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points belonging to preset parts of the target object can be obtained; wherein the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter, and the two-dimensional projection key points are obtained by projecting the 3D key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter.
  • a first loss between the target 2D projection keypoint and the initial 2D keypoint is obtained.
  • a second loss between an initial value of the displacement parameter and a current value of the displacement parameter is obtained.
  • the fifth loss can also be called the Iterative Closest Point (ICP) point cloud registration loss, which can be determined by the following formula (6):
  • L icp is the fifth loss
  • the initial 3D point cloud is regarded as point cloud P
  • the first 3D point cloud is regarded as point cloud Q
  • K 1 = {(p, q)} is the set of point pairs formed by each point in point cloud P and its closest point in point cloud Q, and K 2 = {(p, q)} is the set of point pairs formed by each point in point cloud Q and its closest point in point cloud P.
  • the first loss and the second loss are represented by the following formula (7) and formula (8) respectively:
  • L torso and L cam denote the first loss and the second loss respectively
  • x torso and x̂ torso represent the target two-dimensional projection key points and the initial two-dimensional key points, respectively.
  • t and t net represent the current value of the displacement parameter and the initial value of the displacement parameter respectively.
  • the first target loss L 1 can be determined based on the sum of the first loss, the second loss, and the fifth loss, and the current value of the displacement parameter and the initial value of the global rotation parameter are then optimized based on the first target loss, that is, as in the following formula (9):
  • L 1 = L torso + L cam + L icp (9).
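A sketch of the bidirectional ICP point cloud registration loss over the pair sets K1 (each point of P to its closest point in Q) and K2 (the reverse); the squared-distance form is an assumption, since formula (6) is not reproduced above:

```python
import numpy as np

def icp_loss(P, Q):
    """Bidirectional nearest-point loss: squared distances over K1
    (each point of P to its closest point in Q) and K2 (each point of
    Q to its closest point in P), summed over both pair sets."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return np.sum(d.min(axis=1) ** 2) + np.sum(d.min(axis=0) ** 2)

P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
Q = P + np.array([0.5, 0.0, 0.0])   # Q is P shifted by 0.5 along x
```

Pulling the reconstructed surface point cloud toward the observed depth point cloud in both directions prevents either cloud from collapsing onto a subset of the other.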
  • the attitude optimization stage in the three-stage optimization process is the same as the attitude optimization stage in the two-stage optimization process, and will not be repeated here.
  • the sixth loss between the optimized 2D projection key points of the target object and the initial 2D key points can be obtained, wherein the optimized 2D projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter.
  • a seventh loss is obtained, and the seventh loss is used to characterize the rationality of the posture corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter.
  • x is the optimized two-dimensional projection key point, and x̂ is the initial two-dimensional key point.
  • the seventh loss can be obtained by using a Gaussian mixture model, which is used to judge whether the posture corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter is reasonable, and outputs a large loss for an unreasonable posture.
  • the initial 3D point cloud is regarded as point cloud P, and the second 3D point cloud is regarded as point cloud Q; K 1 is the set of point pairs formed by each point in point cloud P and its closest point in point cloud Q, and K 2 is the set of point pairs formed by each point in point cloud Q and its closest point in point cloud P.
  • the sum of the sixth loss, the seventh loss, and the eighth loss can be determined as the third target loss L 3; based on the third target loss, the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter are jointly optimized, as in the following formula (12):
  • parameter optimization can be performed based on the aforementioned two-stage optimization method including the camera optimization stage and the attitude optimization stage; alternatively, the parameters are optimized by the three-stage optimization method including the camera optimization stage, the attitude optimization stage, and the point cloud optimization stage.
  • This solution can be used in a wide range of scenarios, and can provide natural, reasonable and accurate human body reconstruction models in scenarios such as virtual fitting rooms, virtual anchors, and video action migration.
  • FIG. 4A it is a schematic diagram of an application scene of a virtual fitting room according to an embodiment of the present disclosure.
  • the image of the user 401 can be collected by the camera 403, and the collected image is sent to a processor (not shown in the figure) for three-dimensional human body reconstruction, so as to obtain the human body reconstruction model 404 corresponding to the user 401; the human body reconstruction model 404 is displayed on the display interface 402 for the user 401 to watch.
  • the user 401 can select the required clothing 405, including but not limited to clothing 4051 and hat 4052, etc., and the clothing 405 can be displayed on the display interface 402 based on the human body reconstruction model 404, so that the user 401 can watch the wearing effect of the clothing 405.
  • FIG. 4B it is a schematic diagram of an application scenario of a virtual live broadcast room according to an embodiment of the present disclosure.
  • the image of the anchor user 406 can be collected through the anchor client 407, and the image of the anchor user 406 can be sent to the server 408 for three-dimensional reconstruction to obtain the human body reconstruction model of the anchor user, that is, the virtual anchor.
  • the server 408 can return the human body reconstruction model of the host user to the host client 407 for display, as shown in the model 4071 in the figure.
  • the host client 407 can also collect the voice information of the host user, and send the voice information to the server 408, so that the server 408 can fuse the reconstruction model of the human body and the voice information.
  • the server 408 can send the fused human body reconstruction model and voice information to the viewer client 409 watching the live program for display and playback, wherein the displayed human body reconstruction model is shown as model 4091 in the figure.
  • the live broadcast screen of the virtual anchor can be displayed on the viewer client 409 .
  • the writing order of each step does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • the present disclosure also provides a three-dimensional reconstruction device, which includes:
  • the first three-dimensional reconstruction module 501 is configured to perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain an initial value of a parameter of the target object, and the initial value of the parameter is used to establish a three-dimensional model of the target object ;
  • An optimization module 502 configured to optimize the initial value of the parameter based on the pre-acquired supervisory information used to represent the characteristics of the target object, to obtain the optimized value of the parameter;
  • the second three-dimensional reconstruction module 503 is configured to perform bone skinning processing based on the optimized values of the parameters, and establish a three-dimensional model of the target object.
  • the supervision information includes first supervision information, or the supervision information includes first supervision information and second supervision information; the first supervision information includes at least one of the following: the initial two-dimensional key points of the target object in the image and the semantic information of multiple pixel points on the target object in the image; the second supervision information includes an initial three-dimensional point cloud of the surface of the target object.
  • the initial two-dimensional key points or the semantic information of pixels of the target object can be used as supervisory information to optimize the initial values of the parameters, which has high optimization efficiency and low optimization complexity; alternatively, the initial 3D point cloud of the surface of the target object together with the aforementioned initial 2D key points or pixel semantic information can be used as supervisory information, thereby improving the accuracy of the obtained optimized values of the parameters.
  • the device further includes: a two-dimensional key point extraction module, configured to extract initial two-dimensional key point information of the target object from the image through a key point extraction network. Using the information of the initial two-dimensional key points extracted by the key point extraction network as supervision information can generate more natural and reasonable actions for the three-dimensional model.
  • the image includes a depth image of the target object; the device further includes: a depth information extraction module, configured to extract depth information of multiple pixels on the target object from the depth image; and a back-projection module, configured to back-project the multiple pixel points on the target object in the depth image into three-dimensional space based on the depth information, to obtain an initial three-dimensional point cloud of the surface of the target object.
  • the image further includes an RGB image of the target object;
  • the depth information extraction module includes: an image segmentation unit, configured to perform image segmentation on the RGB image; an image area determination unit, configured to determine, based on the result of the image segmentation, the image area where the target object is located in the RGB image, and to determine the image area where the target object is located in the depth image based on the image area where the target object is located in the RGB image; and a depth information acquisition unit, configured to acquire depth information of multiple pixels in the image area where the target object is located in the depth image.
  • the device further includes: a filtering module, configured to filter out outliers from the initial 3D point cloud, and use the filtered initial 3D point cloud as the second supervisory information. By filtering the outliers, the interference of the outliers is reduced, and the accuracy of the parameter optimization process is further improved.
  • the image of the target object is acquired by an image acquisition device, and the parameters include: the global rotation parameter of the target object, the key point rotation parameter of each key point of the target object, the body shape parameter of the target object, and the displacement parameter of the image acquisition device;
  • the optimization module includes: a first optimization unit, configured to, while the initial value of the body shape parameter and the initial value of the key point rotation parameter remain unchanged, optimize the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervisory information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter;
  • a second optimization unit, configured to optimize the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain an optimized value of the key point rotation parameter and an optimized value of the body shape parameter.
  • the supervisory information includes the initial two-dimensional key points of the target object; the first optimization unit is configured to: acquire, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial values of the global rotation parameter, the key point rotation parameter and the body shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquire a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; acquire a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss.
  • the preset part may be the torso or a similar part. Since different actions have little influence on torso key points, determining the first loss from torso key points reduces the influence of different actions on key point positions and improves the accuracy of the optimization result. Since the two-dimensional key points are supervisory information on the two-dimensional plane while the displacement parameter of the image acquisition device is a three-dimensional quantity, acquiring the second loss reduces the chance that the optimization result falls into a local optimum on the two-dimensional plane and deviates from the true solution.
  • the supervisory information includes the initial two-dimensional key points of the target object; the second optimization unit is configured to: acquire a third loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized values of the displacement parameter and the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter and the initial values of the key point rotation parameter and the body shape parameter; acquire a fourth loss, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter and the initial values of the key point rotation parameter and the body shape parameter; and optimize the initial values of the key point rotation parameter and the body shape parameter based on the third loss and the fourth loss.
  • This embodiment optimizes the initial values of the key point rotation parameter and the body shape parameter based on the optimized values of the displacement parameter and the global rotation parameter, which improves the stability of the optimization process, while the fourth loss ensures that the pose corresponding to the optimized parameters is plausible.
  • the device further includes: a joint optimization module, configured to, after the initial values of the key point rotation parameter and the body shape parameter are optimized based on the optimized values of the displacement parameter and the global rotation parameter, jointly optimize the optimized values of the global rotation parameter, the key point rotation parameter, the body shape parameter and the displacement parameter. On top of the preceding optimization, this embodiment jointly optimizes the optimized parameters, further improving the accuracy of the optimization result.
  • the supervisory information includes the initial two-dimensional key points of the target object and the initial three-dimensional point cloud of the surface of the target object;
  • the first optimization unit is configured to: acquire, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial values of the global rotation parameter, the key point rotation parameter and the body shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquire the first loss between the target two-dimensional projection key points and the initial two-dimensional key points; acquire the second loss between the initial value and the current value of the displacement parameter; acquire a fifth loss between a first 3D point cloud of the target object's surface and the initial 3D point cloud, where the first 3D point cloud is obtained based on the initial values of the global rotation parameter, the key point rotation parameter and the body shape parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first, second and fifth losses.
  • the joint optimization module includes: a first acquisition unit, configured to acquire a sixth loss between the optimized 2D projection key points of the target object and the initial 2D key points, where the optimized 2D projection key points are obtained by projecting the optimized 3D key points of the target object based on the optimized values of the displacement parameter and the global rotation parameter, and the optimized 3D key points are obtained based on the optimized values of the global rotation parameter, the key point rotation parameter and the body shape parameter; a second acquisition unit, configured to acquire a seventh loss, which characterizes the plausibility of the pose corresponding to the optimized values of the global rotation parameter, the key point rotation parameter and the body shape parameter; a third acquisition unit, configured to acquire an eighth loss between a second 3D point cloud of the target object's surface and the initial 3D point cloud, where the second 3D point cloud is obtained based on the optimized values of the global rotation parameter, the key point rotation parameter and the body shape parameter; and a joint optimization unit, configured to jointly optimize the optimized values of the global rotation parameter, the key point rotation parameter, the body shape parameter and the displacement parameter based on the sixth, seventh and eighth losses.
  • the functions or modules included in the device provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above; for specific implementation, refer to the description of those method embodiments, which is not repeated here for brevity.
  • the present disclosure also provides a three-dimensional reconstruction system, which includes:
  • An image acquisition device 601 configured to acquire an image of a target object
  • the processing unit 602, communicatively connected with the image acquisition device 601, is configured to perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain initial values of the parameters of the target object, where the initial parameter values are used to establish a three-dimensional model of the target object; optimize the initial parameter values based on pre-acquired supervisory information representing features of the target object to obtain optimized parameter values; and perform skeletal skinning based on the optimized parameter values to establish the three-dimensional model of the target object.
  • the image acquisition device 601 in the embodiments of the present disclosure may be a device with an image acquisition function, such as a camera or a video camera. The images collected by the image acquisition device 601 may be transmitted to the processing unit 602 in real time, or stored and transmitted from storage to the processing unit 602 when needed.
  • the processing unit 602 may be a single server or a server cluster composed of multiple servers. For the method executed by the processing unit 602, refer to the above-mentioned embodiment of the three-dimensional reconstruction method for details, and details are not repeated here.
  • the embodiments of this specification also provide a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the method described in any of the preceding embodiments.
  • FIG. 7 shows a schematic diagram of a more specific hardware structure of a computing device provided by the embodiment of this specification.
  • the device may include: a processor 701 , a memory 702 , an input/output interface 703 , a communication interface 704 and a bus 705 .
  • the processor 701 , the memory 702 , the input/output interface 703 and the communication interface 704 are connected to each other within the device through the bus 705 .
  • the processor 701 may be implemented by a general-purpose CPU (Central Processing Unit, central processing unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute related programs to realize the technical solutions provided by the embodiments of this specification.
  • the processor 701 may also include a graphics card, and the graphics card may be an Nvidia titan X graphics card or a 1080Ti graphics card.
  • the memory 702 can be implemented in the form of ROM (Read Only Memory, read-only memory), RAM (Random Access Memory, random access memory), static storage device, dynamic storage device, etc.
  • the memory 702 can store an operating system and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 702 and invoked by the processor 701 for execution.
  • the input/output interface 703 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, mouse, touch screen, microphone, various sensors, etc.
  • the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 704 is used to connect with a communication module (not shown in the figure), so as to realize communication interaction between the device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.), and can also realize communication through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 705 includes a path for transferring information between the various components of the device (eg, processor 701, memory 702, input/output interface 703, and communication interface 704).
  • although the above device only shows the processor 701, the memory 702, the input/output interface 703, the communication interface 704, and the bus 705, the device may also include other components in a specific implementation.
  • the above-mentioned device may only include components necessary to implement the solutions of the embodiments of this specification, and does not necessarily include all the components shown in the figure.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any one of the foregoing embodiments is implemented.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridge, tape magnetic disk storage or other magnetic storage device or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media excludes transitory computer-readable media, such as modulated data signals and carrier waves.
  • a typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, e-mail device, game console, desktop computer, tablet computer, wearable device, or any combination of these devices.
  • the embodiments in this specification are described in a progressive manner; the same or similar parts of the embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments.
  • In particular, the device embodiment is described relatively simply since it is substantially similar to the method embodiment; for relevant parts, refer to the description of the method embodiment.
  • The device embodiments described above are only illustrative; the modules described as separate components may or may not be physically separated, and the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment, which can be understood and implemented by those skilled in the art without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)
  • Image Generation (AREA)

Abstract

The present disclosure provides a three-dimensional reconstruction method, apparatus and system, a medium, and a computer device. Three-dimensional reconstruction is performed on a target object in an image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, where the initial parameter values are used to establish a three-dimensional model of the target object; the initial parameter values are optimized based on pre-acquired supervisory information representing features of the target object to obtain optimized parameter values; and skeletal skinning is performed based on the optimized parameter values to establish the three-dimensional model of the target object.

Description

Three-dimensional reconstruction method, apparatus and system, medium, and computer device

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure claims priority to Chinese Patent Application No. 202110506464X, filed on May 10, 2021 and entitled "Three-dimensional reconstruction method, apparatus and system, medium, and computer device", which is incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates to the field of computer vision, and in particular to a three-dimensional reconstruction method, apparatus and system, a medium, and a computer device.
BACKGROUND

Three-dimensional reconstruction is one of the important technologies in computer vision, with many potential applications in fields such as augmented reality and virtual reality. By performing three-dimensional reconstruction on a target object, the body shape and limb rotations of the target object can be reconstructed. However, conventional three-dimensional reconstruction approaches cannot achieve both accuracy and reliability of the reconstruction result.
SUMMARY

The present disclosure provides a three-dimensional reconstruction method, apparatus and system, a medium, and a computer device.

According to a first aspect of the embodiments of the present disclosure, a three-dimensional reconstruction method is provided, the method including: performing three-dimensional reconstruction on a target object in an image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to establish a three-dimensional model of the target object; optimizing the initial values of the parameters based on pre-acquired supervisory information representing features of the target object to obtain optimized values of the parameters; and performing skeletal skinning based on the optimized values of the parameters to establish the three-dimensional model of the target object.
In some embodiments, the supervisory information includes first supervisory information, or includes first supervisory information and second supervisory information; the first supervisory information includes at least one of: initial two-dimensional key points of the target object, or semantic information of multiple pixels on the target object in the image; the second supervisory information includes an initial three-dimensional point cloud of the surface of the target object. Embodiments of the present disclosure may use only the initial two-dimensional key points or the pixel semantic information as supervisory information to optimize the initial parameter values, which is efficient and of low complexity; alternatively, the initial three-dimensional point cloud of the target object's surface may be used together with the initial two-dimensional key points or the pixel semantic information as supervisory information, improving the accuracy of the obtained optimized parameter values.

In some embodiments, the method further includes: extracting information of the initial two-dimensional key points of the target object from the image through a key point extraction network. Using the initial two-dimensional key points extracted by the key point extraction network as supervisory information enables relatively natural and plausible motions to be generated for the three-dimensional model.

In some embodiments, the image includes a depth image of the target object; the method further includes: extracting depth information of multiple pixels on the target object from the depth image; and back-projecting, based on the depth information, the multiple pixels on the target object in the depth image into three-dimensional space to obtain an initial three-dimensional point cloud of the surface of the target object. By extracting depth information and back-projecting the pixels of the two-dimensional image into three-dimensional space based on it, the initial three-dimensional point cloud of the target object's surface is obtained and can serve as supervisory information for optimizing the initial parameter values, further improving the accuracy of parameter optimization.

In some embodiments, the image further includes an RGB image of the target object; extracting the depth information of multiple pixels on the target object from the depth image includes: performing image segmentation on the RGB image, determining the image region where the target object is located in the RGB image based on the segmentation result, and determining the image region where the target object is located in the depth image based on the image region where the target object is located in the RGB image; and acquiring the depth information of multiple pixels in the image region where the target object is located in the depth image. Image segmentation of the RGB image allows the position of the target object to be determined accurately, so that its depth information can be extracted accurately.

In some embodiments, the method further includes: filtering outliers out of the initial three-dimensional point cloud, and using the filtered initial three-dimensional point cloud as the second supervisory information. Filtering out outliers reduces their interference and further improves the accuracy of the parameter optimization process.
In some embodiments, the image of the target object is acquired by an image acquisition device, and the parameters include: a global rotation parameter of the target object, a key point rotation parameter of each key point of the target object, body shape parameters of the target object, and a displacement parameter of the image acquisition device. Optimizing the initial values of the parameters based on the pre-acquired supervisory information representing features of the target object includes: while the initial values of the body shape parameters and the key point rotation parameters remain unchanged, optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervisory information and the initial value of the displacement parameter, to obtain optimized values of the displacement parameter and the global rotation parameter; and optimizing the initial values of the key point rotation parameters and the body shape parameters based on the optimized values of the displacement parameter and the global rotation parameter, to obtain optimized values of the key point rotation parameters and the body shape parameters. During optimization, both moving the image acquisition device and moving the three-dimensional key points can change the two-dimensional projection of the three-dimensional key points, which makes the optimization process very unstable. A two-stage optimization scheme, which first fixes the initial values of the key point rotation parameters and the body shape parameters while optimizing the initial values of the displacement parameter and the global rotation parameter, and then fixes the optimized displacement and global rotation values while optimizing the initial values of the key point rotation parameters and the body shape parameters, improves the stability of the optimization process.

In some embodiments, the supervisory information includes the initial two-dimensional key points of the target object; optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervisory information and the initial value of the displacement parameter includes: acquiring, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial values of the global rotation parameter, the key point rotation parameters and the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquiring a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; acquiring a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss. The preset part may be the torso or a similar part; since different actions have little influence on torso key points, determining the first loss from torso key points reduces the influence of different actions on key point positions and improves the accuracy of the optimization result. Since the two-dimensional key points are supervisory information on the two-dimensional plane while the displacement parameter of the image acquisition device is a three-dimensional quantity, acquiring the second loss reduces the chance that the optimization result falls into a local optimum on the two-dimensional plane and deviates from the true point.

In some embodiments, the supervisory information includes the initial two-dimensional key points of the target object; optimizing the initial values of the key point rotation parameters and the body shape parameters based on the optimized values of the displacement parameter and the global rotation parameter includes: acquiring a third loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized values of the displacement parameter and the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter and the initial values of the key point rotation parameters and the body shape parameters; acquiring a fourth loss, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter and the initial values of the key point rotation parameters and the body shape parameters; and optimizing the initial values of the key point rotation parameters and the body shape parameters based on the third loss and the fourth loss. This embodiment improves the stability of the optimization process, while the fourth loss ensures that the pose corresponding to the optimized parameters is plausible.

In some embodiments, the method further includes: after optimizing the initial values of the key point rotation parameters and the body shape parameters based on the optimized values of the displacement parameter and the global rotation parameter, jointly optimizing the optimized values of the global rotation parameter, the key point rotation parameters, the body shape parameters and the displacement parameter. On top of the preceding optimization, joint optimization of the optimized parameters further improves the accuracy of the optimization result.

In some embodiments, the supervisory information includes the initial two-dimensional key points of the target object and the initial three-dimensional point cloud of the target object's surface; optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervisory information and the initial value of the displacement parameter includes: acquiring, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to a preset part of the target object, where the three-dimensional key points are obtained based on the initial values of the global rotation parameter, the key point rotation parameters and the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquiring a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; acquiring a second loss between the initial value and the current value of the displacement parameter; acquiring a fifth loss between a first three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud, where the first three-dimensional point cloud is obtained based on the initial values of the global rotation parameter, the key point rotation parameters and the body shape parameters; and optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first, second and fifth losses. This embodiment adds the three-dimensional point cloud to the supervisory information for optimizing the initial parameters, improving the accuracy of the optimization result.

In some embodiments, jointly optimizing the optimized values of the global rotation parameter, the key point rotation parameters, the body shape parameters and the displacement parameter includes: acquiring a sixth loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized values of the displacement parameter and the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized values of the global rotation parameter, the key point rotation parameters and the body shape parameters; acquiring a seventh loss, which characterizes the plausibility of the pose corresponding to the optimized values of the global rotation parameter, the key point rotation parameters and the body shape parameters; acquiring an eighth loss between a second three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud, where the second three-dimensional point cloud is obtained based on the optimized values of the global rotation parameter, the key point rotation parameters and the body shape parameters; and jointly optimizing the optimized values of the global rotation parameter, the key point rotation parameters, the body shape parameters and the displacement parameter based on the sixth, seventh and eighth losses. This embodiment adds the three-dimensional point cloud to the supervisory information for optimizing the parameters, improving the accuracy of the optimization result.
According to a second aspect of the embodiments of the present disclosure, a three-dimensional reconstruction apparatus is provided, the apparatus including: a first three-dimensional reconstruction module, configured to perform three-dimensional reconstruction on a target object in an image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, where the initial parameter values are used to establish a three-dimensional model of the target object; an optimization module, configured to optimize the initial parameter values based on pre-acquired supervisory information representing features of the target object to obtain optimized parameter values; and a second three-dimensional reconstruction module, configured to perform skeletal skinning based on the optimized parameter values to establish the three-dimensional model of the target object.
In some embodiments, the supervisory information includes first supervisory information, or includes first supervisory information and second supervisory information; the first supervisory information includes at least one of: initial two-dimensional key points of the target object, or semantic information of multiple pixels on the target object in the image; the second supervisory information includes an initial three-dimensional point cloud of the surface of the target object. Embodiments of the present disclosure may use only the initial two-dimensional key points or the pixel semantic information as supervisory information to optimize the initial parameter values, which is efficient and of low complexity; alternatively, the initial three-dimensional point cloud of the target object's surface may be used together with the initial two-dimensional key points or the pixel semantic information as supervisory information, improving the accuracy of the obtained optimized parameter values.

In some embodiments, the apparatus further includes: a two-dimensional key point extraction module, configured to extract information of the initial two-dimensional key points of the target object from the image through a key point extraction network. Using the initial two-dimensional key points extracted by the key point extraction network as supervisory information enables relatively natural and plausible motions to be generated for the three-dimensional model.

In some embodiments, the image includes a depth image of the target object; the apparatus further includes: a depth information extraction module, configured to extract depth information of multiple pixels on the target object from the depth image; and a back-projection module, configured to back-project the multiple pixels on the target object in the depth image into three-dimensional space based on the depth information, to obtain an initial three-dimensional point cloud of the target object's surface. By extracting depth information and back-projecting the pixels of the two-dimensional image into three-dimensional space based on it, the initial three-dimensional point cloud of the target object's surface is obtained and can serve as supervisory information for optimizing the initial parameter values, further improving the accuracy of parameter optimization.

In some embodiments, the image further includes an RGB image of the target object; the depth information extraction module includes: an image segmentation unit, configured to perform image segmentation on the RGB image; an image region determination unit, configured to determine the image region where the target object is located in the RGB image based on the segmentation result, and to determine the image region where the target object is located in the depth image based on the image region where the target object is located in the RGB image; and a depth information acquisition unit, configured to acquire depth information of multiple pixels in the image region where the target object is located in the depth image. Image segmentation of the RGB image allows the position of the target object to be determined accurately, so that its depth information can be extracted accurately.

In some embodiments, the apparatus further includes: a filtering module, configured to filter outliers out of the initial three-dimensional point cloud and use the filtered initial three-dimensional point cloud as the second supervisory information. Filtering out outliers reduces their interference and further improves the accuracy of the parameter optimization process.
In some embodiments, the image of the target object is acquired by an image acquisition device, and the parameters include: a global rotation parameter of the target object, a key point rotation parameter of each key point of the target object, body shape parameters of the target object, and a displacement parameter of the image acquisition device; the optimization module includes: a first optimization unit, configured to, while the initial values of the body shape parameters and the key point rotation parameters remain unchanged, optimize the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervisory information and the initial value of the displacement parameter, to obtain optimized values of the displacement parameter and the global rotation parameter; and a second optimization unit, configured to optimize the initial values of the key point rotation parameters and the body shape parameters based on the optimized values of the displacement parameter and the global rotation parameter, to obtain optimized values of the key point rotation parameters and the body shape parameters. During optimization, both moving the image acquisition device and moving the three-dimensional key points can change the two-dimensional projection of the three-dimensional key points, which makes the optimization process very unstable. A two-stage optimization scheme, which first fixes the initial values of the key point rotation parameters and the body shape parameters while optimizing the initial values of the displacement parameter and the global rotation parameter, and then fixes the optimized displacement and global rotation values while optimizing the initial values of the key point rotation parameters and the body shape parameters, improves the stability of the optimization process.

In some embodiments, the supervisory information includes the initial two-dimensional key points of the target object; the first optimization unit is configured to: acquire, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial values of the global rotation parameter, the key point rotation parameters and the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquire a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; acquire a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss. The preset part may be the torso or a similar part; since different actions have little influence on torso key points, determining the first loss from torso key points reduces the influence of different actions on key point positions and improves the accuracy of the optimization result. Since the two-dimensional key points are supervisory information on the two-dimensional plane while the displacement parameter of the image acquisition device is a three-dimensional quantity, acquiring the second loss reduces the chance that the optimization result falls into a local optimum on the two-dimensional plane and deviates from the true point.

In some embodiments, the supervisory information includes the initial two-dimensional key points of the target object; the second optimization unit is configured to: acquire a third loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized values of the displacement parameter and the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter and the initial values of the key point rotation parameters and the body shape parameters; acquire a fourth loss, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter and the initial values of the key point rotation parameters and the body shape parameters; and optimize the initial values of the key point rotation parameters and the body shape parameters based on the third loss and the fourth loss. This embodiment optimizes the initial values of the key point rotation parameters and the body shape parameters based on the optimized values of the displacement parameter and the global rotation parameter, which improves the stability of the optimization process, while the fourth loss ensures that the pose corresponding to the optimized parameters is plausible.

In some embodiments, the apparatus further includes: a joint optimization module, configured to, after the initial values of the key point rotation parameters and the body shape parameters are optimized based on the optimized values of the displacement parameter and the global rotation parameter, jointly optimize the optimized values of the global rotation parameter, the key point rotation parameters, the body shape parameters and the displacement parameter. On top of the preceding optimization, this embodiment jointly optimizes the optimized parameters, further improving the accuracy of the optimization result.

In some embodiments, the supervisory information includes the initial two-dimensional key points of the target object and the initial three-dimensional point cloud of the target object's surface; the first optimization unit is configured to: acquire, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to a preset part of the target object, where the three-dimensional key points are obtained based on the initial values of the global rotation parameter, the key point rotation parameters and the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquire the first loss between the target two-dimensional projection key points and the initial two-dimensional key points; acquire the second loss between the initial value and the current value of the displacement parameter; acquire a fifth loss between a first three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud, where the first three-dimensional point cloud is obtained based on the initial values of the global rotation parameter, the key point rotation parameters and the body shape parameters; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first, second and fifth losses. This embodiment adds the three-dimensional point cloud to the supervisory information for optimizing the initial parameters, improving the accuracy of the optimization result.

In some embodiments, the joint optimization module includes: a first acquisition unit, configured to acquire a sixth loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized values of the displacement parameter and the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized values of the global rotation parameter, the key point rotation parameters and the body shape parameters; a second acquisition unit, configured to acquire a seventh loss, which characterizes the plausibility of the pose corresponding to the optimized values of the global rotation parameter, the key point rotation parameters and the body shape parameters; a third acquisition unit, configured to acquire an eighth loss between a second three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud, where the second three-dimensional point cloud is obtained based on the optimized values of the global rotation parameter, the key point rotation parameters and the body shape parameters; and a joint optimization unit, configured to jointly optimize the optimized values of the global rotation parameter, the key point rotation parameters, the body shape parameters and the displacement parameter based on the sixth, seventh and eighth losses. This embodiment adds the three-dimensional point cloud to the supervisory information for optimizing the parameters, improving the accuracy of the optimization result.
According to a third aspect of the embodiments of the present disclosure, a three-dimensional reconstruction system is provided, the system including: an image acquisition device, configured to acquire an image of a target object; and a processing unit communicatively connected with the image acquisition device, configured to perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, where the initial parameter values are used to establish a three-dimensional model of the target object; optimize the initial parameter values based on pre-acquired supervisory information representing features of the target object to obtain optimized parameter values; and perform skeletal skinning based on the optimized parameter values to establish the three-dimensional model of the target object.

According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the method of any of the embodiments.

According to a fifth aspect of the embodiments of the present disclosure, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method of any of the embodiments.

According to a sixth aspect of the embodiments of the present disclosure, a computer program product is provided, which is stored in a storage medium and includes a computer program executable on a processor, where the processor, when executing the computer program, implements the method of any of the embodiments.

In the embodiments of the present disclosure, three-dimensional reconstruction is performed on the image of the target object through the three-dimensional reconstruction network to obtain the initial parameter values, the initial parameter values are then optimized based on supervisory information, and the three-dimensional model of the target object is established based on the optimized parameter values obtained by the optimization. Parameter optimization can give relatively precise reconstruction results that conform to the two-dimensional observation features of the image, but it often yields unnatural and implausible motion results with low reliability, whereas network regression through a three-dimensional reconstruction network gives relatively natural and plausible motion results. Therefore, using the output of the three-dimensional reconstruction network as the initial parameter values for optimization ensures the reliability of the reconstruction result while also achieving accuracy.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.

FIG. 1A and FIG. 1B are schematic diagrams of three-dimensional models of some embodiments.

FIG. 2 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present disclosure.

FIG. 3 is an overall flowchart of an embodiment of the present disclosure.

FIG. 4A and FIG. 4B are schematic diagrams of application scenarios of embodiments of the present disclosure.

FIG. 5 is a block diagram of a three-dimensional reconstruction apparatus according to an embodiment of the present disclosure.

FIG. 6 is a schematic diagram of a three-dimensional reconstruction system according to an embodiment of the present disclosure.

FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
DETAILED DESCRIPTION

Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

The terms used in the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. The singular forms "a", "the" and "said" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items.

It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various pieces of information, the information should not be limited to these terms; the terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while" or "in response to determining".

To enable those skilled in the art to better understand the technical solutions in the embodiments of the present disclosure, and to make the above objects, features and advantages of the embodiments of the present disclosure more apparent and understandable, the technical solutions in the embodiments of the present disclosure are further described in detail below with reference to the accompanying drawings.
Three-dimensional reconstruction of a target object requires reconstructing its body shape and limb rotations, which are usually expressed with a parametric model rather than three-dimensional key points alone. For example, when three-dimensional reconstruction is performed on different people, a model of a thinner person (as shown in FIG. 1A) and a model of a fatter person (as shown in FIG. 1B) are reconstructed; since the person in FIG. 1A and the person in FIG. 1B are in the same pose, their key point information is identical, and the difference in body shape between the two cannot be expressed by key point information alone.

In the related art, three-dimensional reconstruction is generally performed in two ways: parameter optimization and network regression. Parameter optimization methods usually select a set of standard parameters and iteratively optimize the initial values of the parameters of the three-dimensional model of the target object with gradient descent, according to two-dimensional visual features of the image of the target object, such as two-dimensional key points. Parameter optimization can give relatively accurate parameter estimates that conform to the two-dimensional visual features of the image, but it often yields unnatural and implausible motion results, and its final performance depends heavily on the initial parameter values, making reconstruction based on parameter optimization less reliable.

Network regression methods usually train an end-to-end neural network to learn the mapping from the image to the parameters of the three-dimensional model. Network regression can give relatively natural and plausible motion results, but due to the lack of large amounts of training data, the reconstruction result may not conform to the two-dimensional visual features in the image, so reconstruction based on network regression is less accurate. The three-dimensional reconstruction approaches in the related art cannot achieve both accuracy and reliability of the reconstruction result.
Based on this, an embodiment of the present disclosure provides a three-dimensional reconstruction method. As shown in FIG. 2, the method includes:

Step 201: perform three-dimensional reconstruction on a target object in an image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, where the initial parameter values are used to establish a three-dimensional model of the target object;

Step 202: optimize the initial parameter values based on pre-acquired supervisory information representing features of the target object to obtain optimized parameter values;

Step 203: perform skeletal skinning based on the optimized parameter values to establish the three-dimensional model of the target object.
In step 201, the target object may be a three-dimensional object, for example a person, an animal or a robot in physical space, or one or more regions on the three-dimensional object, such as a face or limbs. For ease of description, the following takes the case where the target object is a person and the three-dimensional reconstruction is human body reconstruction as an example. The image of the target object may be a single image, or may include multiple images of the target object captured from different viewing angles. Three-dimensional human body reconstruction based on a single image is called monocular reconstruction, and reconstruction based on multiple images from different viewing angles is called multi-view reconstruction. Each image may be a grayscale image, an RGB image or an RGBD image. The image may be captured in real time by an image acquisition device (for example, a camera) around the target object, or may be captured and stored in advance.
Three-dimensional reconstruction may be performed on the image of the target object through a three-dimensional reconstruction network, which may be a pre-trained neural network. The three-dimensional reconstruction network reconstructs from the image and estimates natural, plausible initial parameter values, which may be represented by a vector of, for example, 85 dimensions. The vector contains three parts: the limb rotation information of the human body (i.e., the initial values of the pose parameters, including the initial value of the global rotation parameter of the human body and the initial values of the key point rotation parameters of 23 key points), the initial values of the body shape parameters, and the initial values of the camera parameters. The human body can be represented by key points and the limb skeleton connecting them; the key points may include one or more of the top of the head, nose, neck, left/right eyes, left/right ears, chest, left/right shoulders, left/right elbows, left/right wrists, left/right hips, left/right buttocks, left/right knees, left/right ankles, and so on. The initial values of the pose parameters determine the positions of the human body key points in three-dimensional space. The initial values of the body shape parameters determine figure information such as height and build. The initial values of the camera parameters determine the absolute position of the human body in three-dimensional space in the camera coordinate system; the camera parameters include the displacement parameter between the camera and the human body and the camera pose parameter, where the initial value of the camera pose parameter can be replaced by the initial value of the global rotation parameter of the human body. The human body parameters may be expressed in the parameter form of the Skinned Multi-Person Linear (SMPL) model (called SMPL parameters). After the SMPL parameter values are obtained, skeletal skinning may be performed based on them, i.e., a mapping function M(θ, β) maps the initial values of the body shape parameters and the pose parameters to a three-dimensional model of the human body surface containing 6890 vertices, which are connected into triangular faces by a fixed topology. A pre-trained regressor W can then be used to further regress the three-dimensional key points X̂ of the human body from the vertices of the body surface model, that is:

X̂ = W M(θ, β)
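The mapping from parameters to key points described above can be sketched numerically. The following Python fragment is an illustrative stand-in, not the real SMPL implementation: the bases, the regressor W and the key point count K = 24 are random or assumed placeholders; only the vertex count 6890 and the form X̂ = W·M(θ, β) come from the text.

```python
import numpy as np

# Illustrative stand-in for X_hat = W * M(theta, beta): a parametric
# function M produces the 6890 mesh vertices from pose (theta) and shape
# (beta) parameters, and a regressor W linearly combines the vertices
# into K 3D key points. Bases and W here are random placeholders.
rng = np.random.default_rng(0)
N, K = 6890, 24                                    # K is an assumption
template = rng.normal(size=(N, 3))                 # mean body mesh
shape_basis = 0.01 * rng.normal(size=(10, N, 3))   # 10 shape directions
pose_basis = 0.001 * rng.normal(size=(72, N, 3))   # 24 joints x 3 dims

def M(theta, beta):
    # Deform the template by linear shape and pose offsets; real SMPL
    # additionally applies linear blend skinning to a posed skeleton.
    return (template
            + np.tensordot(beta, shape_basis, axes=1)
            + np.tensordot(theta, pose_basis, axes=1))

W = np.abs(rng.normal(size=(K, N)))
W /= W.sum(axis=1, keepdims=True)          # each key point: vertex mix

theta, beta = np.zeros(72), np.zeros(10)   # rest pose, mean shape
keypoints = W @ M(theta, beta)             # (K, 3) regressed key points
```

With zero pose and shape offsets, the regressed key points reduce to W applied to the template mesh, which makes the linear structure of the regression easy to verify.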
In step 202, the supervisory information may be two-dimensional visual features of the image (also called two-dimensional observation features), for example at least one of the two-dimensional key points of the target object in the image and the semantic information of multiple pixels on the target object. The semantic information of a pixel characterizes which region of the target object the pixel belongs to, such as the head, an arm, the torso or a leg. When two-dimensional key point information is used as supervisory information, a two-dimensional key point extraction network may be used to estimate the positions of the human body key points in the image; any two-dimensional pose estimation method, such as OpenPose, may be chosen. Besides using two-dimensional visual features alone as supervisory information, the two-dimensional visual features and the initial three-dimensional point cloud of the target object's surface may jointly serve as supervisory information, further improving the accuracy of the three-dimensional reconstruction.
When the image includes a depth image (for example, when the image is an RGBD image), depth information of multiple pixels on the target object may be extracted from the depth image, and the multiple pixels on the target object in the depth image may be back-projected into three-dimensional space based on the depth information, to obtain the initial three-dimensional point cloud of the target object's surface.
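A minimal sketch of this back-projection, assuming a pinhole camera model with hypothetical intrinsics fx, fy, cx, cy (the text does not specify the camera model):

```python
import numpy as np

def backproject(depth, mask, fx, fy, cx, cy):
    """Back-project the masked depth pixels into 3D space with a pinhole
    model; fx, fy, cx, cy are hypothetical intrinsics, since the text
    only states that pixels on the target are lifted to 3D."""
    v, u = np.nonzero(mask)              # rows/cols of target pixels
    z = depth[v, u]
    x = (u - cx) * z / fx                # invert the pinhole projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)   # (M, 3) initial point cloud

depth = np.full((4, 4), 2.0)             # toy depth image: 2 m everywhere
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                    # segmentation: object region
cloud = backproject(depth, mask, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
```

Only the pixels selected by the segmentation mask are lifted, which mirrors the use of the segmented target region described in the surrounding text.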
The multiple pixels may be some or all of the pixels on the target object in the image. For example, they may include pixels of each region of the target object that needs to be reconstructed, and the number of pixels in each region should be greater than or equal to the number required for three-dimensional reconstruction.
Since an image generally includes both the target object and a background region, image segmentation may be performed on the RGB image included in the image to obtain the image region where the target object is located in the RGB image; the image region where the target object is located in the depth image is determined based on that region, and the depth information of multiple pixels in that region of the depth image is acquired. Image segmentation extracts the image region of the target object to be reconstructed and avoids the influence of the background region on the reconstruction. In some embodiments, the pixels of the depth image correspond one-to-one with the pixels of the RGB image; for example, the image may be an RGBD image.
Further, outliers may be filtered out of the three-dimensional point cloud (i.e., the initial three-dimensional point cloud), and the supervisory information may include the filtered point cloud. The filtering may be implemented with a point cloud filter. Filtering out outliers yields a finer point cloud of the target object's surface and further improves the accuracy of the reconstruction. For each target three-dimensional point in the point cloud, the average distance from the n three-dimensional points nearest to that target point is obtained. Assuming the average distances of the target points follow a statistical distribution (for example, a Gaussian distribution), the mean and variance of the distribution can be computed and a threshold s set based on them; three-dimensional points whose average distance falls outside the threshold s can be regarded as outliers and filtered out of the point cloud.
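The statistical filtering just described can be sketched as follows; the neighbour count n and the multiple of the standard deviation used for the threshold s are illustrative choices, since the text leaves them unspecified:

```python
import numpy as np

def filter_outliers(points, n_neighbors=3, k_sigma=2.0):
    """Statistical outlier removal as described: per point, average the
    distances to its n nearest neighbours, fit mean/std over all points,
    and drop points whose average distance exceeds the threshold
    s = mean + k_sigma * std. n_neighbors and k_sigma are illustrative."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # ignore self-distances
    mean_d = np.sort(d, axis=1)[:, :n_neighbors].mean(axis=1)
    s = mean_d.mean() + k_sigma * mean_d.std()   # the threshold s
    return points[mean_d <= s]

rng = np.random.default_rng(1)
cluster = 0.05 * rng.normal(size=(50, 3))        # dense surface patch
cloud = np.vstack([cluster, [[5.0, 5.0, 5.0]]])  # one far-away outlier
filtered = filter_outliers(cloud)                # outlier removed
```

The brute-force distance matrix keeps the sketch short; a production filter (for example, the statistical outlier removal found in common point cloud libraries) would use a spatial index for large clouds.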
In practice, if the image is an RGB image, the two-dimensional observation features may serve as supervisory information for iteratively optimizing the initial parameter values. If the image is an RGBD image, the two-dimensional observation features and the three-dimensional point cloud of the target object's surface may jointly serve as supervisory information. The optimization may, for example, use gradient descent, which is not limited by the present disclosure.
In step 203, skeletal skinning may be performed based on the optimized parameter values to obtain the three-dimensional model of the target object.

FIG. 3 is an overall flowchart of an embodiment of the present disclosure. When the input is an RGB image, three-dimensional reconstruction may be performed on the RGB image through the three-dimensional reconstruction network to obtain the human body parameter values of the person in the image, and key point extraction may be performed on the person with the key point extraction network to obtain the two-dimensional human body key points. Then, with the human body parameter values as the initial parameter values and the two-dimensional key points as supervisory information, the parameter optimization module optimizes the initial human body parameter values to obtain optimized values, and skeletal skinning is performed based on the optimized values to obtain the human body reconstruction model.

When the input is an RGBD image, the image may be decomposed into an RGB image and a TOF (Time of Flight) depth map, which contains the depth information of each pixel of the RGB image. Three-dimensional reconstruction may be performed on the RGB image through the three-dimensional reconstruction network to obtain the human body parameter values, key point extraction yields the two-dimensional human body key points, and a point cloud reconstruction module reconstructs the human body surface point cloud from the depth information in the TOF depth map. Then, with the human body parameter values as the initial parameter values and the two-dimensional key points and the surface point cloud jointly as supervisory information, the parameter optimization module optimizes the initial human body parameter values to obtain optimized values, and skeletal skinning is performed based on them to obtain the human body reconstruction model.

Further, after the human body reconstruction model is obtained, color processing may be performed on it based on the color information in the RGB or RGBD image, so that the model matches the color information of the person in the image.

In the embodiments of the present disclosure, three-dimensional reconstruction is performed on the target object in the image through the three-dimensional reconstruction network to obtain the initial parameter values, the initial parameter values are then optimized based on the supervisory information, and the three-dimensional model of the target object is established based on the optimized parameter values. Parameter optimization can give relatively precise reconstruction results that conform to the two-dimensional observation features of the image, but it often yields unnatural and implausible motion results with low reliability, whereas network regression through the three-dimensional reconstruction network gives relatively natural and plausible motion results. Therefore, using the output of the three-dimensional reconstruction network as the initial parameter values for optimization ensures the reliability of the reconstruction result while also achieving accuracy.
In some embodiments, a multi-stage optimization method may be adopted in the parameter optimization phase, including a camera optimization stage and a pose optimization stage. In the camera optimization stage, the optimization targets are the value R of the global rotation parameter and the current value t of the displacement parameter between the image acquisition device and the target object, where t and R are both three-dimensional vectors and R is expressed in axis-angle form. In the pose optimization stage, the optimization targets are the values of the key point rotation parameters and the body shape parameters.

During optimization, both moving the camera and moving the three-dimensional human body key points can change the two-dimensional projection of the three-dimensional key points, which makes the optimization process very unstable. Therefore, the human pose is fixed during the camera optimization stage and the camera position is fixed during the pose optimization stage, improving the stability of the optimization process. That is, while the initial values of the body shape parameters and the key point rotation parameters remain unchanged, the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter are optimized based on the supervisory information and the initial value of the displacement parameter, yielding optimized values of the displacement parameter and the global rotation parameter; then, keeping those optimized values unchanged, the initial values of the key point rotation parameters and the body shape parameters are optimized based on them, yielding optimized values of the key point rotation parameters and the body shape parameters.
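The two-stage schedule, fixing one parameter group while updating the other, can be illustrated with a toy objective. The quadratic loss, step counts and learning rate below are placeholders, not the patent's actual losses:

```python
import numpy as np

# Toy sketch of the two-stage schedule: stage 1 updates only the camera
# displacement t and global rotation R (theta and beta frozen); stage 2
# freezes t and R and updates the key point rotations theta and the
# body shape beta. The objective 0.5 * ||x||^2 is purely illustrative.
params = {"t": np.ones(3), "R": np.ones(3),
          "theta": np.ones(23 * 3), "beta": np.ones(10)}

def run_stage(active, steps=200, lr=0.1):
    for _ in range(steps):
        for name in active:              # only this group is updated
            grad = params[name]          # gradient of 0.5 * ||x||^2
            params[name] = params[name] - lr * grad

run_stage(["t", "R"])                    # camera optimization stage
run_stage(["theta", "beta"])             # pose optimization stage
```

Keeping the inactive group out of the update loop is exactly the freezing described above; in an autodiff framework the same effect is usually achieved by toggling which tensors require gradients per stage.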
Further, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to a preset part of the target object may be acquired, where the three-dimensional key points of the target object are obtained based on the initial values of the global rotation parameter, the key point rotation parameters and the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points based on the current value of the displacement parameter and the initial value of the global rotation parameter. A first loss between the target two-dimensional projection key points and the initial two-dimensional key points is acquired; a second loss between the initial value and the current value of the displacement parameter is acquired; and the current value of the displacement parameter and the initial value of the global rotation parameter are optimized based on the first loss and the second loss.

The preset part may be the torso, and the target two-dimensional projection key points may include key points such as the left and right shoulder points, the left and right hip points, and the spine center point. Since different actions have little influence on torso key points, building the first loss from torso key points reduces the influence of different actions on key point positions and improves the accuracy of the optimization result. The first loss may also be called the torso keypoint projection loss, and the second loss the camera displacement regularization loss; they can be obtained by formulas (1) and (2):

L_torso = ||x_torso − x̂_torso||_2    (1)

L_cam = ||t − t_net||_2    (2)

where L_torso and L_cam denote the first loss and the second loss, x_torso and x̂_torso denote the target two-dimensional projection key points and the initial two-dimensional key points, and t and t_net denote the current value and the initial value of the displacement parameter between the image acquisition device and the target object. A first target loss L_1 can be determined based on the first loss and the second loss; for example, it may be determined as their sum by formula (3):

L_1 = L_torso + L_cam    (3).
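A sketch of the first target loss corresponding to formulas (1) to (3). Squared distances are used here so the toy example has smooth gradients; the equal 1:1 weighting of the two terms follows the sum form of formula (3):

```python
import numpy as np

def camera_stage_loss(x_torso, x_torso_init, t, t_net):
    """First target loss L1 = L_torso + L_cam of formulas (1)-(3).
    x_torso:      projected 2D torso key points of the model, (K, 2)
    x_torso_init: initial (observed) 2D torso key points, (K, 2)
    t, t_net:     current and network-predicted camera displacement, (3,)
    Squared L2 distances are an illustrative choice here."""
    L_torso = np.sum((x_torso - x_torso_init) ** 2)   # formula (1)
    L_cam = np.sum((t - t_net) ** 2)                  # formula (2)
    return L_torso + L_cam                            # formula (3)

x_proj = np.array([[0.1, 0.2], [0.3, 0.4]])
x_init = np.array([[0.1, 0.2], [0.3, 0.5]])
loss = camera_stage_loss(x_proj, x_init,
                         t=np.zeros(3), t_net=np.array([0.0, 0.0, 0.1]))
```

The L_cam term pulls the current displacement t back toward the network prediction t_net, which is the regularization against two-dimensional local optima described above.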
A third loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points may be acquired, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized values of the displacement parameter and the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter and the initial values of the key point rotation parameters and the body shape parameters. A fourth loss is acquired, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter and the initial values of the key point rotation parameters and the body shape parameters. The initial values of the key point rotation parameters and the body shape parameters are optimized based on the third loss and the fourth loss.

The third loss may also be called the two-dimensional keypoint projection loss, and the fourth loss the pose plausibility loss. The third loss can be determined by formula (4):

L_2d = ||x − x̂||_2    (4)

where L_2d is the third loss, and x and x̂ denote the optimized two-dimensional projection key points and the initial two-dimensional key points, respectively. A second target loss can be determined based on the third loss and the fourth loss; for example, it may be determined as their sum by formula (5):

L_2 = L_2d + L_prior    (5)

where L_2 is the second target loss and L_prior is the fourth loss, which may be obtained with a Gaussian Mixture Model (GMM) and judges whether the pose corresponding to the optimized value of the global rotation parameter and the initial values of the key point rotation parameters and the body shape parameters is plausible, outputting a large loss for implausible poses.
After the initial values of the key point rotation parameters and the body shape parameters are optimized based on the optimized values of the displacement parameter and the global rotation parameter, the optimized values of the global rotation parameter, the key point rotation parameters, the body shape parameters and the displacement parameter may be jointly optimized, i.e., a three-stage optimization scheme is adopted. When the supervisory information includes the three-dimensional point cloud of the target object's surface, this three-stage scheme, comprising a camera optimization stage, a pose optimization stage and a point cloud optimization stage, may be used.
在摄像机优化阶段,可以获取所述目标对象的三维关键点对应的二维投影关键点中属于所述目标对象的预设部位的目标二维投影关键点;其中,所述目标对象的三维关键点基于所述全局旋转参数的初始值、关键点旋转参数的初始值和体态参数的初始值得到,所述二维投影关键点基于所述位移参数的当前值和全局旋转参数的初始值对所述目标对象的三维关键点进行投影得到。获取所述目标二维投影关键点与所述初始二维关键点之间的第一损失。获取所述位移参数的初始值与所述位移参数的当前值之间的第二损失。获取所述目标对象表面的第一三维点云与所述初始三维点云之间的第五损失;其中,所述第一三维点云基于所述全局旋转参数的初始值、关键点旋转参数的初始值和体态参数的初始值得到。基于所述第一损失、第二损失和第五损失对所述位移参数的当前值和全局旋转参数的初始值进行优化。所述第五损失也可以称为最近点迭代(Iterative Closest Point,ICP)点云配准损失,可通过如下公式(6)确定:
L_icp=Σ_{(p,q)∈K_1}||p-q||²+Σ_{(p,q)∈K_2}||p-q||²    (6);
式中,L_icp为所述第五损失,将所述初始三维点云看作点云P,将所述第一三维点云看作点云Q,K_1={(p,q)}为点云P中的每个点到点云Q中距离最近的点构成的点对集合,K_2={(p,q)}为点云Q中的每个点到点云P中距离最近的点构成的点对集合。第一损失和第二损失分别通过如下公式(7)和公式(8)表示:
L_torso=||x_torso-x̂_torso||²    (7);
L_cam=||t-t_net||²    (8);
其中,L_torso和L_cam分别表示第一损失和第二损失,x_torso和x̂_torso分别表示目标二维投影关键点和初始二维关键点,t和t_net分别表示所述位移参数的当前值以及所述位移参数的初始值。可以基于第一损失、第二损失和第五损失之和确定第一目标损失L_1,再基于第一目标损失对所述位移参数的当前值和全局旋转参数的初始值进行优化,即,如以下公式(9):
L_1=L_torso+L_cam+L_icp    (9)。
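公式(6)中的最近点迭代(ICP)点云配准损失可示意性地实现如下(暴力最近邻搜索仅用于说明,实际实现通常借助KD树等加速结构;函数名为假设):

```python
def nearest_pairs(src, dst):
    # 为 src 中的每个点找到 dst 中距离最近的点,构成点对集合
    pairs = []
    for p in src:
        q = min(dst, key=lambda d: sum((a - b) ** 2 for a, b in zip(p, d)))
        pairs.append((p, q))
    return pairs

def icp_loss(cloud_p, cloud_q):
    # 公式(6):K1(P->Q)与K2(Q->P)两个方向最近点对的平方距离之和
    k1 = nearest_pairs(cloud_p, cloud_q)
    k2 = nearest_pairs(cloud_q, cloud_p)
    return sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in k1 + k2)
```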
三阶段优化过程中的姿态优化阶段与二阶段优化过程中的姿态优化阶段的优化方式相同,此处不再赘述。
在点云优化阶段,可以获取所述目标对象的优化二维投影关键点与所述初始二维关键点之间的第六损失,其中,所述优化二维投影关键点基于所述位移参数的优化值和全局旋转参数的优化值对所述目标对象的优化三维关键点进行投影得到,所述优化三维关键点基于所述全局旋转参数的优化值、关键点旋转参数的优化值和体态参数的优化值得到。获取第七损失,所述第七损失用于表征所述全局旋转参数的优化值、关键点旋转参数的优化值和体态参数的优化值对应的姿态的合理性。获取所述目标对象表面的第二三维点云与所述初始三维点云之间的第八损失;其中,所述第二三维点云基于所述全局旋转参数的优化值、关键点旋转参数的优化值和体态参数的优化值得到。基于所述第六损失、第七损失和第八损失对所述全局旋转参数的优化值、所述关键点旋转参数的优化值、体态参数的优化值以及所述位移参数的优化值进行联合优化,可通过以下公式(10)和公式(11)进行优化:
L_2d=||x̃-x̂||²    (10);
L_icp=Σ_{(p,q)∈K̃_1}||p-q||²+Σ_{(p,q)∈K̃_2}||p-q||²    (11);
式中,L_2d为第六损失,x̃为优化二维投影关键点,x̂为初始二维关键点。第七损失L_prior可以采用高斯混合模型来获取,用于判断全局旋转参数的优化值、关键点旋转参数的优化值和体态参数的优化值对应的姿态是否合理,对不合理的姿态输出较大的损失。L_icp为第八损失,P为所述初始三维点云,Q̃为所述第二三维点云,K̃_1为点云P中的每个点到点云Q̃中距离最近的点构成的点对集合,K̃_2为点云Q̃中的每个点到点云P中距离最近的点构成的点对集合。进一步地,可以将第六损失、第七损失和第八损失之和确定为第三目标损失L_3,并基于第三目标损失对所述全局旋转参数的优化值、所述关键点旋转参数的优化值、体态参数的优化值以及所述位移参数的优化值进行联合优化,可通过以下公式(12)进行联合优化:
L_3=L_2d+L_prior+L_icp    (12)。
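联合优化阶段可将全局旋转参数、关键点旋转参数、体态参数与位移参数拼接为一个参数向量,对第三目标损失做迭代下降。下面给出一个基于数值梯度的最小示意(实际实现通常采用自动求导框架;loss_fn、lr、steps 等均为说明而设的假设):

```python
def numerical_gradient(loss_fn, params, eps=1e-5):
    # 中心差分估计损失函数关于每个参数分量的梯度
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((loss_fn(plus) - loss_fn(minus)) / (2 * eps))
    return grads

def joint_optimize(loss_fn, params, lr=0.1, steps=200):
    # 对拼接后的参数向量做梯度下降,联合优化所有参数
    params = list(params)
    for _ in range(steps):
        grads = numerical_gradient(loss_fn, params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params
```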
在目标对象的图像为RGB图像的情况下,可以基于前述包括摄像机优化阶段与姿态优化阶段的二阶段优化方法进行参数优化;在目标对象的图像为RGBD图像的情况下,可以基于前述包括摄像机优化阶段、姿态优化阶段与点云优化阶段的三阶段优化方法进行参数优化。
本方案的使用场景广泛,可以在虚拟试衣间、虚拟主播、视频动作迁移等场景中给出自然合理且准确的人体重建模型。
如图4A所示,是本公开实施例的虚拟试衣间应用场景的示意图。可以通过摄像头403采集用户401的图像,并将采集的图像发送给处理器(图中未示出)进行三维人体重建,以便获取用户401对应的人体重建模型404,并将人体重建模型404展示在显示界面402上供用户401观看。同时,用户401可以选择所需的服饰405,包括但不限于衣服4051和帽子4052等,可以基于人体重建模型404在显示界面402上显示服饰405,从而使用户401观看服饰405的穿戴效果。
如图4B所示,是本公开实施例的虚拟直播间应用场景的示意图。在进行直播的过程中,可以通过主播客户端407采集主播用户406的图像,将主播用户406的图像发送至服务器408进行三维重建,得到主播用户的人体重建模型,即虚拟主播。服务器408可以将主播用户的人体重建模型返回至主播客户端407进行展示,如图中的模型4071所示。此外,主播客户端407还可以采集主播用户的语音信息,并将语音信息发送至服务器408,以使服务器408对人体重建模型以及语音信息进行融合。服务器408可以将融合后的人体重建模型和语音信息发送至观看直播节目的观众客户端409进行显示和播放,其中,显示的人体重建模型如图中的模型4091所示。通过上述方式,可以在观众客户端409上显示虚拟主播进行直播的画面。
本领域技术人员可以理解,在具体实施方式的上述方法中,各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程构成任何限定,各步骤的具体执行顺序应当以其功能和可能的内在逻辑确定。
如图5所示,本公开还提供一种三维重建装置,所述装置包括:
第一三维重建模块501,用于通过三维重建网络对图像中的目标对象进行三维重建,得到所述目标对象的参数的初始值,所述参数的初始值用于建立所述目标对象的三维模型;
优化模块502,用于基于预先获取的用于表示目标对象的特征的监督信息对所述参数的初始值进行优化,得到所述参数的优化值;
第二三维重建模块503,用于基于所述参数的优化值进行骨骼蒙皮处理,建立所述目标对象的三维模型。
在一些实施例中,所述监督信息包括第一监督信息,或者所述监督信息包括第一监督信息和第二监督信息;所述第一监督信息包括以下至少一者:所述目标对象的初始二维关键点,所述图像中所述目标对象上的多个像素点的语义信息;所述第二监督信息包括所述目标对象表面的初始三维点云。本公开实施例可以仅采用目标对象的初始二维关键点或者像素点的语义信息作为监督信息来对所述参数的初始值进行优化,优化效率较高,优化复杂度低;或者,也可以将目标对象表面的初始三维点云与前述的初始二维关键点或者像素点的语义信息共同作为监督信息,从而提高获取的参数的优化值的准确度。
在一些实施例中,所述装置还包括:二维关键点提取模块,用于通过关键点提取网络从所述图像中提取所述目标对象的初始二维关键点的信息。将关键点提取网络提取出的初始二维关键点的信息作为监督信息,能够为三维模型生成较为自然合理的动作。
在一些实施例中,所述图像包括所述目标对象的深度图像;所述装置还包括:深度信息提取模块,用于从所述深度图像中提取所述目标对象上多个像素点的深度信息;反向投影模块,用于基于所述深度信息将所述深度图像中所述目标对象上的多个像素点反向投影到三维空间,得到所述目标对象表面的初始三维点云。通过提取深度信息,并基于深度信息将二维图像上的像素点反向投影到三维空间,得到目标对象表面的初始三维点云,从而能够将该初始三维点云作为监督信息来优化参数的初始值,进一步提高了参数优化的准确性。
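基于深度信息的反向投影可按针孔相机模型实现,示意如下(fx、fy、cx、cy 为相机内参,此处取值仅为说明而设的假设):

```python
def back_project(pixels, depths, fx, fy, cx, cy):
    # 针孔相机模型:Z = 深度值, X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
    points = []
    for (u, v), z in zip(pixels, depths):
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points
```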
在一些实施例中,所述图像还包括所述目标对象的RGB图像;所述深度信息提取模块包括:图像分割单元,用于对所述RGB图像进行图像分割,图像区域确定单元,用于基于图像分割的结果确定所述RGB图像中目标对象所在的图像区域,基于所述RGB图像中目标对象所在的图像区域确定所述深度图像中目标对象所在的图像区域;深度信息获取单元,用于获取所述深度图像中所述目标对象所在的图像区域中多个像素点的深度信息。通过对RGB图像进行图像分割,能够准确地确定目标对象的位置,从而准确地提取出目标对象的深度信息。
在一些实施例中,所述装置还包括:过滤模块,用于从所述初始三维点云中过滤掉离群点,将过滤后的所述初始三维点云作为所述第二监督信息。通过过滤离群点,从而减轻离群点的干扰,进一步提高了参数优化过程的准确性。
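离群点过滤可采用基于k近邻平均距离的统计滤波,示意如下(k 与 std_ratio 为假设的超参数,实际取值可按点云密度调节):

```python
import math

def filter_outliers(points, k=3, std_ratio=1.0):
    # 统计滤波:计算每个点到其 k 个最近邻的平均距离,
    # 移除平均距离超过「全局均值 + std_ratio * 标准差」的点
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    avg_dists = []
    for i, p in enumerate(points):
        neighbors = sorted(dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        avg_dists.append(sum(neighbors) / len(neighbors))
    mean = sum(avg_dists) / len(avg_dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in avg_dists) / len(avg_dists))
    threshold = mean + std_ratio * std
    return [p for p, d in zip(points, avg_dists) if d <= threshold]
```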
在一些实施例中,所述目标对象的图像通过图像采集装置采集得到,所述参数包括:所述目标对象的全局旋转参数、所述目标对象各个关键点的关键点旋转参数、所述目标对象的体态参数以及所述图像采集装置的位移参数;所述优化模块包括:第一优化单元,用于在所述体态参数的初始值和关键点旋转参数的初始值保持不变的情况下,基于所述监督信息和所述位移参数的初始值,对所述图像采集装置的位移参数的当前值以及所述全局旋转参数的初始值进行优化,得到位移参数的优化值和全局旋转参数的优化值;第二优化单元,用于基于所述位移参数的优化值和全局旋转参数的优化值,对所述关键点旋转参数的初始值和所述体态参数的初始值进行优化,得到关键点旋转参数的优化值和体态参数的优化值。由于在优化过程中,改变图像采集装置的位置与改变三维关键点位置均可以导致三维关键点的二维投影产生变化,这将会导致优化过程很不稳定。通过采用两阶段优化的方式,先固定关键点旋转参数的初始值和体态参数的初始值来对图像采集装置的位移参数的初始值和全局旋转参数的初始值进行优化,再固定位移参数的初始值和全局旋转参数的初始值,对关键点旋转参数的初始值和体态参数的初始值进行优化,提高了优化过程的稳定性。
在一些实施例中,所述监督信息包括所述目标对象的初始二维关键点;所述第一优化单元用于:获取所述目标对象的三维关键点对应的二维投影关键点中属于所述目标对象的预设部位的目标二维投影关键点;其中,所述目标对象的三维关键点基于所述全局旋转参数的初始值、关键点旋转参数的初始值和体态参数的初始值得到,所述二维投影关键点基于所述位移参数的当前值和全局旋转参数的初始值对所述目标对象的三维关键点进行投影得到;获取所述目标二维投影关键点与所述初始二维关键点之间的第一损失;获取所述位移参数的初始值与所述位移参数的当前值之间的第二损失;基于所述第一损失和第二损失对所述位移参数的当前值和全局旋转参数的初始值进行优化。预设部位可以是躯干等部位,由于不同的动作对躯干部位的关键点的影响较小,因此,通过采用躯干部位的关键点确定第一损失,能够减轻不同动作对关键点位置的影响,提高优化结果的准确性。由于二维关键点是二维平面上的监督信息,而图像采集装置的位移参数是三维平面上的参数,通过获取第二损失,能够减少优化结果落入二维平面上的局部最优点从而偏离真实点的情况。
在一些实施例中,所述监督信息包括所述目标对象的初始二维关键点;所述第二优化单元用于:获取所述目标对象的优化二维投影关键点与所述初始二维关键点之间的第三损失,所述优化二维投影关键点基于所述位移参数的优化值和全局旋转参数的优化值对所述目标对象的优化三维关键点进行投影得到,所述优化三维关键点基于所述全局旋转参数的优化值、关键点旋转参数的初始值和体态参数的初始值得到;获取第四损失,所述第四损失用于表征所述全局旋转参数的优化值、关键点旋转参数的初始值和体态参数的初始值对应的姿态的合理性;基于所述第三损失和所述第四损失对所述关键点旋转参数的初始值和所述体态参数的初始值进行优化。本实施例基于位移参数的优化值和全局旋转参数的优化值对关键点旋转参数的初始值和体态参数的初始值进行优化,提高了优化过程的稳定性,同时,通过第四损失保证了优化后的参数对应的姿态的合理性。
在一些实施例中,所述装置还包括:联合优化模块,用于在基于所述位移参数的优化值和全局旋转参数的优化值,对所述关键点旋转参数的初始值和所述体态参数的初始值进行优化之后,对所述全局旋转参数的优化值,所述关键点旋转参数的优化值,体态参数的优化值以及所述位移参数的优化值进行联合优化。本实施例在前述优化的基础上,对优化后的各项参数进行联合优化,从而进一步提高了优化结果的准确性。
在一些实施例中,所述监督信息包括所述目标对象的初始二维关键点和所述目标对象表面的初始三维点云;所述第一优化单元用于:获取所述目标对象的三维关键点对应的二维投影关键点中属于所述目标对象的预设部位的目标二维投影关键点;其中,所述目标对象的三维关键点基于所述全局旋转参数的初始值、关键点旋转参数的初始值和体态参数的初始值得到,所述二维投影关键点基于所述位移参数的当前值和全局旋转参数的初始值对所述目标对象的三维关键点进行投影得到;获取所述目标二维投影关键点与所述初始二维关键点之间的第一损失;获取所述位移参数的初始值与所述位移参数的当前值之间的第二损失;获取所述目标对象表面的第一三维点云与所述初始三维点云之间的第五损失;所述第一三维点云基于所述全局旋转参数的初始值、关键点旋转参数的初始值和体态参数的初始值得到;基于所述第一损失、第二损失和第五损失对所述位移参数的当前值和全局旋转参数的初始值进行优化。本实施例将三维点云加入到监督信息中对初始的各项参数进行优化,从而提高了优化结果的准确性。
在一些实施例中,所述联合优化模块包括:第一获取单元,用于获取所述目标对象的优化二维投影关键点与所述初始二维关键点之间的第六损失,所述优化二维投影关键点基于所述位移参数的优化值和全局旋转参数的优化值对所述目标对象的优化三维关键点进行投影得到,所述优化三维关键点基于所述全局旋转参数的优化值、关键点旋转参数的优化值和体态参数的优化值得到;第二获取单元,用于获取第七损失,所述第七损失用于表征所述全局旋转参数的优化值、关键点旋转参数的优化值和体态参数的优化值对应的姿态的合理性;第三获取单元,用于获取所述目标对象表面的第二三维点云与所述初始三维点云之间的第八损失;所述第二三维点云基于所述全局旋转参数的优化值、关键点旋转参数的优化值和体态参数的优化值得到;联合优化单元,用于基于所述第六损失、第七损失和第八损失对所述全局旋转参数的优化值,所述关键点旋转参数的优化值,体态参数的优化值以及所述位移参数的优化值进行联合优化。本实施例将三维点云加入到监督信息中对初始的各项参数进行优化,从而提高了优化结果的准确性。
在一些实施例中,本公开实施例提供的装置具有的功能或包含的模块可以用于执行上文方法实施例描述的方法,其具体实现可以参照上文方法实施例的描述,为了简洁,这里不再赘述。
如图6所示,本公开还提供一种三维重建***,所述***包括:
图像采集装置601,用于采集目标对象的图像;以及
与所述图像采集装置601通信连接的处理单元602,用于通过三维重建网络对所述图像中的目标对象进行三维重建,得到所述目标对象的参数的初始值,所述参数的初始值用于建立所述目标对象的三维模型;基于预先获取的用于表示目标对象的特征的监督信息对所述参数的初始值进行优化,得到所述参数的优化值;基于所述参数的优化值进行骨骼蒙皮处理,建立所述目标对象的三维模型。
本公开实施例中的图像采集装置601可以是相机或者摄像头等具有图像采集功能的设备,图像采集装置601采集的图像可以实时传输给处理单元602,或者经过存储,并在需要时从存储空间传输到处理单元602。处理单元602可以是单个服务器或者是由多个服务器构成的服务器集群。处理单元602所执行的方法详见前述三维重建方法的实施例,此处不再赘述。
本说明书实施例还提供一种计算机设备,其至少包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,其中,处理器执行所述程序时实现前述任一实施例所述的方法。
图7示出了本说明书实施例所提供的一种更为具体的计算设备硬件结构示意图,该设备可以包括:处理器701、存储器702、输入/输出接口703、通信接口704和总线705。其中处理器701、存储器702、输入/输出接口703和通信接口704通过总线705实现彼此之间在设备内部的通信连接。
处理器701可以采用通用的CPU(Central Processing Unit,中央处理器)、微处理器、应用专用集成电路(Application Specific Integrated Circuit,ASIC)、或者一个或多个集成电路等方式实现,用于执行相关程序,以实现本说明书实施例所提供的技术方案。处理器701还可以包括显卡,所述显卡可以是Nvidia titan X显卡或者1080Ti显卡等。
存储器702可以采用ROM(Read Only Memory,只读存储器)、RAM(Random Access Memory,随机存取存储器)、静态存储设备,动态存储设备等形式实现。存储器702可以存储操作***和其他应用程序,在通过软件或者固件来实现本说明书实施例所提供的技术方案时,相关的程序代码保存在存储器702中,并由处理器701来调用执行。
输入/输出接口703用于连接输入/输出模块,以实现信息输入及输出。输入/输出模块可以作为组件配置在设备中(图中未示出),也可以外接于设备以提供相应功能。其中输入设备可以包括键盘、鼠标、触摸屏、麦克风、各类传感器等,输出设备可以包括显示器、扬声器、振动器、指示灯等。
通信接口704用于连接通信模块(图中未示出),以实现本设备与其他设备的通信交互。其中通信模块可以通过有线方式(例如USB、网线等)实现通信,也可以通过无线方式(例如移动网络、WIFI、蓝牙等)实现通信。
总线705包括一通路,在设备的各个组件(例如处理器701、存储器702、输入/输出接口703和通信接口704)之间传输信息。
需要说明的是,尽管上述设备仅示出了处理器701、存储器702、输入/输出接口703、通信接口704以及总线705,但是在具体实施过程中,该设备还可以包括实现正常运行所必需的其他组件。此外,本领域的技术人员可以理解的是,上述设备中也可以仅包含实现本说明书实施例方案所必需的组件,而不必包含图中所示的全部组件。
本公开实施例还提供一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现前述任一实施例所述的方法。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体,可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带、磁带磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
通过以上的实施方式的描述可知,本领域的技术人员可以清楚地了解到本说明书实施例可借助软件加必需的通用硬件平台的方式来实现。基于这样的理解,本说明书实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本说明书实施例各个实施例或者实施例的某些部分所述的方法。
上述实施例阐明的***、装置、模块或单元,具体可以由计算机芯片或实体实现,或者由具有某种功能的产品来实现。一种典型的实现设备为计算机,计算机的具体形式可以是个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件收发设备、游戏控制台、平板计算机、可穿戴设备或者这些设备中的任意几种设备的组合。
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于装置实施例而言,由于其基本相似于方法实施例,所以描述得比较简单,相关之处参见方法实施例的部分说明即可。以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,在实施本说明书实施例方案时可以把各模块的功能在同一个或多个软件和/或硬件中实现。也可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。

Claims (16)

  1. 一种三维重建方法,所述方法包括:
    通过三维重建网络对图像中的目标对象进行三维重建,得到所述目标对象的参数的初始值,其中,所述参数的初始值用于建立所述目标对象的三维模型;
    基于预先获取的用于表示所述目标对象的特征的监督信息对所述参数的初始值进行优化,得到所述参数的优化值;
    基于所述参数的优化值进行骨骼蒙皮处理,建立所述目标对象的三维模型。
  2. 根据权利要求1所述的方法,其特征在于,所述监督信息包括第一监督信息,或者所述监督信息包括第一监督信息和第二监督信息;
    所述第一监督信息包括以下至少一者:所述目标对象的初始二维关键点,所述图像中所述目标对象上的多个像素点的语义信息;
    所述第二监督信息包括所述目标对象表面的初始三维点云。
  3. 根据权利要求2所述的方法,其特征在于,所述方法还包括:
    通过关键点提取网络从所述图像中提取所述目标对象的初始二维关键点的信息。
  4. 根据权利要求2或3所述的方法,其特征在于,所述图像包括所述目标对象的深度图像;所述方法还包括:
    从所述深度图像中提取所述目标对象上所述多个像素点的深度信息;
    基于所述深度信息将所述深度图像中所述目标对象上的所述多个像素点反向投影到三维空间,得到所述目标对象表面的所述初始三维点云。
  5. 根据权利要求4所述的方法,其特征在于,所述图像还包括所述目标对象的RGB图像;从所述深度图像中提取所述目标对象上所述多个像素点的深度信息,包括:
    对所述RGB图像进行图像分割;
    基于图像分割的结果确定所述RGB图像中所述目标对象所在的图像区域;
    基于所述RGB图像中所述目标对象所在的图像区域确定所述深度图像中所述目标对象所在的图像区域;
    获取所述深度图像中所述目标对象所在的图像区域中所述多个像素点的深度信息。
  6. 根据权利要求2至5任意一项所述的方法,其特征在于,所述方法还包括:
    从所述初始三维点云中过滤掉离群点,将过滤后的所述初始三维点云作为所述第二监督信息。
  7. 根据权利要求1至6任意一项所述的方法,其特征在于,所述目标对象的图像通过图像采集装置采集得到,所述参数包括:所述目标对象的全局旋转参数、所述目标对象各个关键点的关键点旋转参数、所述目标对象的体态参数以及所述图像采集装置的位移参数;
    基于预先获取的用于表示所述目标对象的特征的监督信息对所述参数的初始值进行优化,包括:
    在所述体态参数的初始值和所述关键点旋转参数的初始值保持不变的情况下,基于所述监督信息和所述位移参数的初始值,对所述图像采集装置的所述位移参数的当前值以及所述全局旋转参数的初始值进行优化,得到所述位移参数的优化值和所述全局旋转参数的优化值;
    基于所述位移参数的优化值和所述全局旋转参数的优化值,对所述关键点旋转参数的初始值和所述体态参数的初始值进行优化,得到所述关键点旋转参数的优化值和所述体态参数的优化值。
  8. 根据权利要求7所述的方法,其特征在于,所述监督信息包括所述目标对象的初始二维关键点;
    基于所述监督信息和所述位移参数的初始值,对所述图像采集装置的所述位移参数的当前值以及所述全局旋转参数的初始值进行优化,包括:
    获取所述目标对象的三维关键点对应的二维投影关键点中属于所述目标对象的预设部位的目标二维投影关键点;其中,所述目标对象的三维关键点基于所述全局旋转参数的初始值、所述关键点旋转参数的初始值和所述体态参数的初始值得到,所述二维投影关键点基于所述位移参数的当前值和所述全局旋转参数的初始值对所述目标对象的三维关键点进行投影得到;
    获取所述目标二维投影关键点与所述初始二维关键点之间的第一损失;
    获取所述位移参数的初始值与所述位移参数的当前值之间的第二损失;
    基于所述第一损失和所述第二损失对所述位移参数的当前值和所述全局旋转参数的初始值进行优化。
  9. 根据权利要求7或8所述的方法,其特征在于,所述监督信息包括所述目标对象的初始二维关键点;基于所述位移参数的优化值和所述全局旋转参数的优化值,对所述关键点旋转参数的初始值和所述体态参数的初始值进行优化,包括:
    获取所述目标对象的优化二维投影关键点与所述初始二维关键点之间的第三损失,其中,所述优化二维投影关键点基于所述位移参数的优化值和所述全局旋转参数的优化值对所述目标对象的优化三维关键点进行投影得到,所述优化三维关键点基于所述全局旋转参数的优化值、所述关键点旋转参数的初始值和所述体态参数的初始值得到;
    获取第四损失,所述第四损失用于表征所述全局旋转参数的优化值、所述关键点旋转参数的初始值和所述体态参数的初始值对应的姿态的合理性;
    基于所述第三损失和所述第四损失对所述关键点旋转参数的初始值和所述体态参数的初始值进行优化。
  10. 根据权利要求7至9任意一项所述的方法,其特征在于,在基于所述位移参数的优化值和所述全局旋转参数的优化值对所述关键点旋转参数的初始值和所述体态参数的初始值进行优化之后,所述方法还包括:
    对所述全局旋转参数的优化值、所述关键点旋转参数的优化值、所述体态参数的优化值以及所述位移参数的优化值进行联合优化。
  11. 根据权利要求10所述的方法,其特征在于,所述监督信息包括所述目标对象的初始二维关键点和所述目标对象表面的初始三维点云;基于所述监督信息和所述位移参数的初始值,对所述图像采集装置的所述位移参数的当前值以及所述全局旋转参数的初始值进行优化,包括:
    获取所述目标对象的三维关键点对应的二维投影关键点中属于所述目标对象的预设部位的目标二维投影关键点;其中,所述目标对象的三维关键点基于所述全局旋转参数的初始值、所述关键点旋转参数的初始值和所述体态参数的初始值得到,所述二维投影关键点基于所述位移参数的当前值和所述全局旋转参数的初始值对所述目标对象的三维关键点进行投影得到;
    获取所述目标二维投影关键点与所述初始二维关键点之间的第一损失;
    获取所述位移参数的初始值与所述位移参数的当前值之间的第二损失;
    获取所述目标对象表面的第一三维点云与所述初始三维点云之间的第五损失;其中,所述第一三维点云基于所述全局旋转参数的初始值、所述关键点旋转参数的初始值和所述体态参数的初始值得到;
    基于所述第一损失、所述第二损失和所述第五损失对所述位移参数的当前值和所述全局旋转参数的初始值进行优化。
  12. 根据权利要求10或11所述的方法,其特征在于,对所述全局旋转参数的优化值、所述关键点旋转参数的优化值、所述体态参数的优化值以及所述位移参数的优化值进行联合优化,包括:
    获取所述目标对象的优化二维投影关键点与所述初始二维关键点之间的第六损失,其中,所述优化二维投影关键点基于所述位移参数的优化值和所述全局旋转参数的优化值对所述目标对象的优化三维关键点进行投影得到,所述优化三维关键点基于所述全局旋转参数的优化值、所述关键点旋转参数的优化值和所述体态参数的优化值得到;
    获取第七损失,所述第七损失用于表征所述全局旋转参数的优化值、所述关键点旋转参数的优化值和所述体态参数的优化值对应的姿态的合理性;
    获取所述目标对象表面的第二三维点云与所述初始三维点云之间的第八损失;所述第二三维点云基于所述全局旋转参数的优化值、所述关键点旋转参数的优化值和所述体态参数的优化值得到;
    基于所述第六损失、所述第七损失和所述第八损失对所述全局旋转参数的优化值、所述关键点旋转参数的优化值、所述体态参数的优化值以及所述位移参数的优化值进行联合优化。
  13. 一种三维重建装置,所述装置包括:
    第一三维重建模块,用于通过三维重建网络对图像中的目标对象进行三维重建,得到所述目标对象的参数的初始值,其中,所述参数的初始值用于建立所述目标对象的三维模型;
    优化模块,用于基于预先获取的用于表示所述目标对象的特征的监督信息对所述参数的初始值进行优化,得到所述参数的优化值;
    第二三维重建模块,用于基于所述参数的优化值进行骨骼蒙皮处理,建立所述目标对象的三维模型。
  14. 一种三维重建***,所述***包括:
    图像采集装置,用于采集目标对象的图像;以及
    与所述图像采集装置通信连接的处理单元,用于通过三维重建网络对所述图像中的所述目标对象进行三维重建,得到所述目标对象的参数的初始值,其中,所述参数的初始值用于建立所述目标对象的三维模型;基于预先获取的用于表示所述目标对象的特征的监督信息对所述参数的初始值进行优化,得到所述参数的优化值;基于所述参数的优化值进行骨骼蒙皮处理,建立所述目标对象的三维模型。
  15. 一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现根据权利要求1至12任意一项所述的方法。
  16. 一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现根据权利要求1至12任意一项所述的方法。
PCT/CN2022/075636 2021-05-10 2022-02-09 三维重建方法、装置和***、介质及计算机设备 WO2022237249A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020237014677A KR20230078777A (ko) 2021-05-10 2022-02-09 3차원 재구성 방법, 장치와 시스템, 매체 및 컴퓨터 기기
JP2023525021A JP2023547888A (ja) 2021-05-10 2022-02-09 三次元再構成方法、装置、システム、媒体及びコンピュータデバイス

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110506464.X 2021-05-10
CN202110506464.XA CN113160418A (zh) 2021-05-10 2021-05-10 三维重建方法、装置和***、介质及计算机设备

Publications (1)

Publication Number Publication Date
WO2022237249A1 true WO2022237249A1 (zh) 2022-11-17

Family

ID=76874172

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/075636 WO2022237249A1 (zh) 2021-05-10 2022-02-09 三维重建方法、装置和***、介质及计算机设备

Country Status (5)

Country Link
JP (1) JP2023547888A (zh)
KR (1) KR20230078777A (zh)
CN (1) CN113160418A (zh)
TW (1) TW202244853A (zh)
WO (1) WO2022237249A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030189A (zh) * 2022-12-20 2023-04-28 中国科学院空天信息创新研究院 一种基于单视角遥感图像的目标三维重建方法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160418A (zh) * 2021-05-10 2021-07-23 上海商汤智能科技有限公司 三维重建方法、装置和***、介质及计算机设备
CN113724378B (zh) * 2021-11-02 2022-02-25 北京市商汤科技开发有限公司 三维建模方法和装置、计算机可读存储介质及计算机设备
CN115375856B (zh) * 2022-10-25 2023-02-07 杭州华橙软件技术有限公司 三维重建方法、设备以及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840939A (zh) * 2019-01-08 2019-06-04 北京达佳互联信息技术有限公司 三维重建方法、装置、电子设备及存储介质
CN110288696A (zh) * 2019-06-13 2019-09-27 南京航空航天大学 一种完备一致生物体三维特征表征模型的建立方法
CN111862299A (zh) * 2020-06-15 2020-10-30 上海非夕机器人科技有限公司 人体三维模型构建方法、装置、机器人和存储介质
CN112037320A (zh) * 2020-09-01 2020-12-04 腾讯科技(深圳)有限公司 一种图像处理方法、装置、设备以及计算机可读存储介质
CN112419454A (zh) * 2020-11-25 2021-02-26 北京市商汤科技开发有限公司 一种人脸重建方法、装置、计算机设备及存储介质
CN112509144A (zh) * 2020-12-09 2021-03-16 深圳云天励飞技术股份有限公司 人脸图像处理方法、装置、电子设备及存储介质
CN113160418A (zh) * 2021-05-10 2021-07-23 上海商汤智能科技有限公司 三维重建方法、装置和***、介质及计算机设备

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103236082B (zh) * 2013-04-27 2015-12-02 南京邮电大学 面向捕获静止场景的二维视频的准三维重建方法
CN107945269A (zh) * 2017-12-26 2018-04-20 清华大学 基于多视点视频的复杂动态人体对象三维重建方法及***


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030189A (zh) * 2022-12-20 2023-04-28 中国科学院空天信息创新研究院 一种基于单视角遥感图像的目标三维重建方法
CN116030189B (zh) * 2022-12-20 2023-07-04 中国科学院空天信息创新研究院 一种基于单视角遥感图像的目标三维重建方法

Also Published As

Publication number Publication date
TW202244853A (zh) 2022-11-16
KR20230078777A (ko) 2023-06-02
CN113160418A (zh) 2021-07-23
JP2023547888A (ja) 2023-11-14

Similar Documents

Publication Publication Date Title
WO2022237249A1 (zh) 三维重建方法、装置和***、介质及计算机设备
US11238606B2 (en) Method and system for performing simultaneous localization and mapping using convolutional image transformation
US10846903B2 (en) Single shot capture to animated VR avatar
CN113012282B (zh) 三维人体重建方法、装置、设备及存储介质
WO2022001236A1 (zh) 三维模型生成方法、装置、计算机设备及存储介质
CN110264509A (zh) 确定图像捕捉设备的位姿的方法、装置及其存储介质
JP7387202B2 (ja) 3次元顔モデル生成方法、装置、コンピュータデバイス及びコンピュータプログラム
WO2022205762A1 (zh) 三维人体重建方法、装置、设备及存储介质
WO2023109753A1 (zh) 虚拟角色的动画生成方法及装置、存储介质、终端
WO2023071964A1 (zh) 数据处理方法, 装置, 电子设备及计算机可读存储介质
KR20160098560A (ko) 동작 분석 장치 및 방법
CN109242950A (zh) 多人紧密交互场景下的多视角人体动态三维重建方法
WO2023071790A1 (zh) 目标对象的姿态检测方法、装置、设备及存储介质
CN111710035B (zh) 人脸重建方法、装置、计算机设备及存储介质
KR20230071588A (ko) 디오라마 적용을 위한 다수 참여 증강현실 콘텐츠 제공 장치 및 그 방법
CN115496864B (zh) 模型构建方法、重建方法、装置、电子设备及存储介质
CN113723317A (zh) 3d人脸的重建方法、装置、电子设备和存储介质
CN116342782A (zh) 生成虚拟形象渲染模型的方法和装置
KR20220149717A (ko) 단안 카메라로부터 전체 골격 3d 포즈 복구
CN115775300B (zh) 人体模型的重建方法、人体重建模型的训练方法及装置
WO2023160074A1 (zh) 一种图像生成方法、装置、电子设备以及存储介质
CN109615688A (zh) 一种移动设备上的实时人脸三维重建***及方法
US20230290101A1 (en) Data processing method and apparatus, electronic device, and computer-readable storage medium
CN116229008B (zh) 图像处理方法和装置
US20240096041A1 (en) Avatar generation based on driving views

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22806226

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023525021

Country of ref document: JP

ENP Entry into the national phase

Ref document number: 20237014677

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE