WO2022156532A1 - Three-dimensional face model reconstruction method and apparatus, electronic device, and storage medium

Three-dimensional face model reconstruction method and apparatus, electronic device, and storage medium

Info

Publication number
WO2022156532A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
dimensional
parameters
model
camera
Prior art date
Application number
PCT/CN2022/070257
Other languages
English (en)
French (fr)
Inventor
张建杰
柴金祥
熊兴堂
王志勇
李妙鹏
蒋利国
陈磊
Original Assignee
魔珐(上海)信息科技有限公司
上海墨舞科技有限公司
Priority date
Filing date
Publication date
Application filed by 魔珐(上海)信息科技有限公司 and 上海墨舞科技有限公司
Publication of WO2022156532A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 — Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 — Detection; localisation; normalisation

Definitions

  • The present application relates to the technical field of computer vision, and in particular to a method, apparatus, electronic device, and storage medium for reconstructing a three-dimensional face model.
  • Facial expression capture plays an important role in many fields, such as movies, games, criminal investigation, and video surveillance.
  • In the related art, relatively accurate facial expressions can indeed be captured by some costly and complicated methods; for example, accurate facial expressions can be obtained by attaching reflective marker points to the human face.
  • However, the large cost of this method and the discomfort it causes users seriously hinder its development and adoption.
  • Single-camera face image acquisition is low-cost, easy to install, and user-friendly, but a single-camera face image generally provides two-dimensional information from only one viewpoint, making it difficult to obtain three-dimensional information. Therefore, in order to obtain vivid facial expressions, a 3D face model must be reconstructed from the single-camera face image.
  • However, 3D face models currently reconstructed from single-camera face images often have low accuracy, making it difficult to capture realistic facial expressions.
  • The purpose of the embodiments of the present application is to provide a three-dimensional face model reconstruction method and apparatus, an electronic device, and a storage medium, which can quickly and accurately reconstruct a three-dimensional face model.
  • The three-dimensional face model reconstruction method and apparatus, electronic device, and storage medium provided by the embodiments of the present application are implemented as follows:
  • A three-dimensional face model reconstruction method comprises: acquiring a single-camera face image; inputting the single-camera face image into a face information prediction model, and outputting, through the face information prediction model, target face two-dimensional feature points and target texture mapping information in the single-camera face image; and
  • determining a three-dimensional face model using identity parameters, face pose parameters, and expression parameters, so that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  • Optionally, the acquiring a single-camera face image includes:
  • acquiring a single-camera image that contains a face image; and performing face detection on the single-camera image and cropping the single-camera face image from the single-camera image.
  • Optionally, the face information prediction model is set to be obtained by training in the following manner:
  • acquiring a plurality of single-camera face sample images annotated with face two-dimensional feature points and texture mapping information; building a face information prediction model in which model parameters are set; inputting the single-camera face sample images into the face information prediction model to generate a prediction result, where the prediction result includes the predicted face two-dimensional feature points and texture mapping information; and
  • iteratively adjusting the model parameters based on the difference between the prediction result and the annotated face two-dimensional feature points and texture mapping information, until the difference meets a preset requirement.
  • Optionally, the single-camera face sample images are set to be acquired in the following manner:
  • using multiple cameras to simultaneously acquire multiple single-camera images of the same face from different angles; reconstructing a three-dimensional face model of the face using the multiple single-camera images; projecting the three-dimensional face model into the multiple single-camera images respectively, and obtaining the face two-dimensional feature points and texture mapping information in each image; and segmenting face images from the multiple single-camera images according to the face two-dimensional feature points and/or the texture mapping information, and using the multiple face images as the single-camera face sample images for training the face information prediction model.
  • Optionally, the determining a three-dimensional face model using identity parameters, face pose parameters, and expression parameters, so that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information, includes:
  • alternately fixing one or two of the identity parameters, the face pose parameters, and the expression parameters, and adjusting the other two parameters or one parameter, to generate a three-dimensional face model, so that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  • Optionally, the alternately fixing one or two of the identity parameters, the face pose parameters, and the expression parameters and adjusting the other two parameters or one parameter to generate a three-dimensional face model includes:
  • alternately fixing one or two of the identity parameters, the face pose parameters, and the expression parameters, and adjusting the other two parameters or one parameter, to generate a predicted three-dimensional face model; projecting the predicted three-dimensional face model into the single-camera face image, and obtaining predicted face two-dimensional feature points and predicted texture mapping information; and
  • iteratively adjusting the other two parameters or one parameter based on the differences between the predicted face two-dimensional feature points and the target face two-dimensional feature points and between the predicted texture mapping information and the target texture mapping information, until at least one of the difference or the number of iterations meets a preset requirement.
  • Optionally, the iteratively adjusting the other two parameters or one parameter based on the differences between the predicted face two-dimensional feature points and the target face two-dimensional feature points and between the predicted texture mapping information and the target texture mapping information, until at least one of the difference or the number of iterations meets the preset requirement, includes:
  • obtaining a prior probability distribution result and a prior probability target value of at least one of the identity parameter, the face pose parameter, and the expression parameter; and iteratively adjusting the other two parameters or one parameter based on the above differences and on the difference between the prior probability distribution result and the prior probability target value, until at least one of the difference or the number of iterations meets the preset requirement.
  • Optionally, when the number N of single-camera face images is greater than or equal to 2 and the N single-camera face images belong to the same face, the determining a three-dimensional face model using identity parameters, face pose parameters, and expression parameters includes:
  • alternately fixing the identity parameters or the face pose parameters and expression parameters based on the N single-camera face images, adjusting the face pose parameters, the expression parameters, or the identity parameters, and jointly optimizing to generate N three-dimensional face models with the same identity parameters, so that the face two-dimensional feature points determined by the N three-dimensional face models match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  • Optionally, the jointly optimizing to generate N three-dimensional face models with the same identity parameters includes:
  • alternately fixing the identity parameters or the face pose parameters and expression parameters, adjusting the face pose parameters, the expression parameters, or the identity parameters, and generating N predicted three-dimensional face models, where the identity parameters of the N predicted three-dimensional face models are jointly optimized when the face pose parameters and expression parameters are fixed; projecting the N predicted three-dimensional face models into the corresponding single-camera face images respectively, to obtain predicted face two-dimensional feature points and predicted texture mapping information; and iteratively adjusting the face pose parameters, the expression parameters, or the identity parameters based on the corresponding differences, until at least one of the difference or the number of iterations meets a preset requirement.
  • Optionally, the method further includes:
  • in the process of reconstructing a three-dimensional face model for a subsequent single-camera face image, using and fixing the jointly optimized identity parameters of the N predicted three-dimensional face models, and adjusting the face pose parameters and the expression parameters until at least one of the difference or the number of iterations meets a preset requirement.
  • Optionally, the three-dimensional face model includes a three-dimensional model composed of a preset number of interconnected polygon meshes, and the positions of the mesh vertices of the polygon meshes are determined by the identity parameters, the face pose parameters, and the expression parameters.
  • Optionally, the method further includes:
  • acquiring a three-dimensional eyeball model, where the three-dimensional eyeball model includes eye gaze information; and
  • combining the three-dimensional face model and the three-dimensional eyeball model into a new three-dimensional face model.
  • A three-dimensional face model reconstruction apparatus comprises:
  • an acquisition module, used to acquire a single-camera face image;
  • an information prediction module, used to input the single-camera face image into a face information prediction model and output, through the face information prediction model, the target face two-dimensional feature points and target texture mapping information in the single-camera face image; and
  • a model determination module, used to determine a three-dimensional face model using identity parameters, face pose parameters, and expression parameters, so that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  • An electronic device includes a processor and a memory for storing instructions executable by the processor; when the processor executes the instructions, the three-dimensional face model reconstruction method is implemented.
  • A non-transitory computer-readable storage medium: when the instructions in the storage medium are executed by a processor, the processor is enabled to execute the three-dimensional face model reconstruction method.
  • The three-dimensional face model reconstruction method provided by the present application can reconstruct a three-dimensional face model from a single-camera face image, taking advantage of the low cost, easy installation, and user-friendliness of single-camera image acquisition. The ease of acquiring single-camera face images not only reduces the construction cost of the face information prediction model, but also makes the reconstruction of 3D face models faster and easier. In the reconstruction process, the face two-dimensional feature points and texture mapping information output by the face information prediction model can effectively improve the accuracy and robustness of the model reconstruction.
  • The three-dimensional face model is reconstructed from identity parameters, face pose parameters, and expression parameters, providing an accurate and reliable technical solution for technical fields such as single-camera virtual live broadcast, single-camera intelligent interaction, face recognition, criminal investigation and surveillance, movies and games, and expression analysis.
  • The present invention focuses on solving the above pain points and proposes a high-precision, real-time single-camera facial expression capture scheme.
  • FIG. 1 is a schematic flowchart of a method for reconstructing a three-dimensional face model according to an exemplary embodiment.
  • FIG. 2 is a schematic flowchart of a method for reconstructing a three-dimensional face model according to an exemplary embodiment.
  • FIG. 3 is a schematic flowchart of a method for reconstructing a three-dimensional face model according to an exemplary embodiment.
  • FIG. 4 is a schematic flowchart of a method for reconstructing a three-dimensional face model according to an exemplary embodiment.
  • FIG. 5 is a schematic flowchart of a method for reconstructing a three-dimensional face model according to an exemplary embodiment.
  • FIG. 6 is a schematic flowchart of a method for reconstructing a three-dimensional face model according to an exemplary embodiment.
  • FIG. 7 is a block diagram of an apparatus for reconstructing a three-dimensional face model according to an exemplary embodiment.
  • FIG. 8 is a block diagram of an apparatus for reconstructing a three-dimensional face model according to an exemplary embodiment.
  • FIG. 1 is a schematic flowchart of an embodiment of the three-dimensional face model reconstruction method provided by the present application.
  • The present application provides the method operation steps shown in the following embodiments or drawings, but the method may include more or fewer operation steps based on routine practice or without creative effort. For steps that have no logically necessary causal relationship, the execution order of these steps is not limited to that provided by the embodiments of the present application.
  • The method may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the methods shown in the embodiments or the accompanying drawings.
  • As shown in FIG. 1, an embodiment of the three-dimensional face model reconstruction method provided by the present application may include:
  • S101 Acquire a single-camera face image.
  • S103 Input the single-camera face image into a face information prediction model, and output the target face two-dimensional feature points and target texture mapping information in the single-camera face image through the face information prediction model.
  • S105 Determine a three-dimensional face model using identity parameters, face pose parameters, and expression parameters, so that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  • The single-camera face image may include a face image captured by a single camera.
  • The single camera may be a single camera device, such as a single-lens reflex camera or a smart device with a camera function (such as a smartphone, tablet computer, or smart wearable device); the camera may be an RGB camera, an RGBD camera, or the like.
  • The single-camera face image may be an image in any format, such as an RGB image or a grayscale image. In an actual application environment, the image captured by the camera includes not only the face but also the background around it.
  • Based on this, a single-camera face image containing, as far as possible, only the human face can be cropped from the image captured by the single camera.
  • Specifically, a single-camera image that includes a face image may be acquired.
  • Then, face detection can be performed on the single-camera image, and the single-camera face image can be cropped from the single-camera image.
  • For example, a machine-learning-based face detection algorithm can be used to detect the face in the single-camera image, and the single-camera face image can be cropped out accordingly.
  • The face detection algorithm may include algorithms such as R-CNN, Fast R-CNN, Faster R-CNN, TCDCN, MTCNN, YOLOv3, and SSD, which are not limited here. A minimal detection-and-crop sketch follows.
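  • The detectors named above are interchangeable here; the following is a minimal sketch of the detection-and-crop step, assuming OpenCV's bundled Haar cascade as a readily available stand-in for detectors such as MTCNN or YOLOv3.

```python
import cv2

def crop_single_camera_face(image_path: str):
    """Detect a face in a single-camera image and crop the face region."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face found in the single-camera image
    # Keep the largest detection and crop it out of the full frame.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return image[y:y + h, x:x + w]
```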
  • After the single-camera face image is acquired, it can be input into a face information prediction model, and the target face two-dimensional feature points and target texture mapping information in the single-camera face image are output through the face information prediction model.
  • The face two-dimensional feature points include key points used to characterize the facial features of the human face.
  • For example, 73 feature points may be used: 15 face contour feature points, 16 eye feature points (8 each for the left and right eyes), 12 eyebrow feature points (6 each for the left and right eyebrows), 12 nose feature points, and 18 mouth feature points.
  • The edges of each polygon mesh are shared with the adjacent polygon meshes. Since the number of polygon meshes is fixed, the number of mesh vertices of the three-dimensional face model is also fixed. Since the identity parameters, face pose parameters, and expression parameters are unknown at the initial stage, the positions of the mesh vertices in the three-dimensional face model start at default positions; in the subsequent embodiments, the process of reconstructing the three-dimensional face model is the process of adjusting the positions of these mesh vertices, as sketched below.
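  • The application leaves open how the three groups of parameters determine vertex positions; the sketch below assumes a common linear (3DMM-style) parameterization, in which identity and expression act as blendshape offsets and the face pose acts as a rigid transform. All basis names are hypothetical.

```python
import numpy as np

def face_vertices(mean_shape, id_basis, exp_basis,
                  identity, expression, rotation, translation):
    """Assumed linear face model: V = (mean + B_id @ a + B_exp @ b) @ R^T + t.

    mean_shape: (V, 3) default mesh vertex positions
    id_basis:   (V, 3, K_id) identity blendshape basis
    exp_basis:  (V, 3, K_exp) expression blendshape basis
    identity:   (K_id,) identity parameters; expression: (K_exp,) expression parameters
    rotation:   (3, 3) and translation: (3,) face pose parameters
    """
    shape = mean_shape + id_basis @ identity + exp_basis @ expression
    return shape @ rotation.T + translation  # adjusted mesh vertex positions
```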
  • The mesh vertices may have unique identifiers; for example, the unique identifiers may include the (u, v) coordinates of the texture mapping. Thus, the texture mapping information may include the mapping relationship from face image pixels to the unique identifiers of the mesh vertices; for example, the pixel at coordinate (34, 17) in the single-camera image may correspond to the mesh vertex whose texture coordinate is (0.2, 0.5) in the three-dimensional face model.
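  • One way to hold such a pixel-to-vertex mapping, shown here as an assumption rather than the patent's actual data layout, is a dense per-pixel (u, v) array that the prediction network can regress directly:

```python
import numpy as np

H, W = 256, 256
# uv_map[y, x] holds the texture coordinate of the mesh point visible at pixel
# (x, y); entries lie in [0, 1], with a negative sentinel for background pixels.
uv_map = np.full((H, W, 2), -1.0, dtype=np.float32)
uv_map[17, 34] = (0.2, 0.5)  # pixel (34, 17) maps to texture coordinate (0.2, 0.5)
```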
  • The face information prediction model may include a multi-task machine learning model; for example, a multi-task deep learning network can implement the two prediction tasks (face two-dimensional feature points and texture mapping information).
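  • The application describes the model only as a multi-task network covering these two prediction tasks; the sketch below is one assumed layout, pairing a shared ResNet backbone (one of the backbones named later) with a landmark-regression head and a coarse dense (u, v) regression head.

```python
import torch.nn as nn
import torchvision.models as models

class FacePredictionNet(nn.Module):
    """Hypothetical two-task face information prediction model."""
    def __init__(self, num_landmarks: int = 73, uv_size: int = 64):
        super().__init__()
        self.num_landmarks = num_landmarks
        self.uv_size = uv_size
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()            # keep the 512-d pooled feature
        self.backbone = backbone
        # Task 1: regress the 73 face two-dimensional feature points (x, y).
        self.landmark_head = nn.Linear(512, num_landmarks * 2)
        # Task 2: regress a coarse dense (u, v) texture-mapping map.
        self.uv_head = nn.Linear(512, uv_size * uv_size * 2)

    def forward(self, x):
        feat = self.backbone(x)
        landmarks = self.landmark_head(feat).view(-1, self.num_landmarks, 2)
        uv_map = self.uv_head(feat).view(-1, 2, self.uv_size, self.uv_size)
        return landmarks, uv_map
```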
  • S201 Acquire a plurality of single-camera face sample images, where the single-camera face sample images are marked with face two-dimensional feature points and texture mapping information.
  • S203 Build a face information prediction model, where model parameters are set in the face information prediction model.
  • S205 Input the single-camera face sample image into the face information prediction model to generate a prediction result, where the prediction result includes the predicted two-dimensional feature points of the face and texture mapping information.
  • The preset requirement may include, for example, that the value of the difference is smaller than a preset threshold.
  • The prediction result may include multiple kinds of information, such as the face two-dimensional feature points and the texture mapping information, and the difference may accordingly include the differences between each part of the prediction result and the corresponding annotated face two-dimensional feature points and texture mapping information.
  • The face information that can be output by the face information prediction model is not limited to the above face two-dimensional feature points and texture mapping information; it can also include any other face information, which is not limited in this application.
  • The machine learning algorithm for training the face information prediction model may use a ResNet backbone network, a MobileNet backbone network, a VGG backbone network, or the like, which is not limited here.
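  • A minimal sketch of one training iteration follows, reusing the hypothetical FacePredictionNet above; the equal loss weighting and the Adam optimizer are assumptions, not details fixed by the application.

```python
import torch
import torch.nn.functional as F

model = FacePredictionNet()                       # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, gt_landmarks, gt_uv_map):
    """One iteration: the differences between the prediction result and the
    annotated labels drive the adjustment of the model parameters."""
    pred_landmarks, pred_uv = model(images)
    landmark_loss = F.mse_loss(pred_landmarks, gt_landmarks)
    uv_loss = F.mse_loss(pred_uv, gt_uv_map)
    loss = landmark_loss + uv_loss                # equal weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Training repeats until the difference meets the preset requirement.
    return loss.item()
```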
  • the single-camera face sample image can be obtained in the following manner:
  • S301 Use multiple cameras to simultaneously acquire multiple single-camera images of the same face from different angles.
  • S303 Reconstruct a three-dimensional face model of the face using the multiple single-camera images.
  • S305 Project the three-dimensional face model into the multiple single-camera images, and obtain the face two-dimensional feature points and texture mapping information in each of the multiple single-camera images.
  • S307 According to the face two-dimensional feature points and/or the texture mapping information, segment face images from the multiple single-camera images, and use the multiple face images as the single-camera face sample images for training the face information prediction model.
  • Specifically, multiple cameras can be used to photograph the same face from multiple angles at the same time, so that multiple single-camera images of the face are acquired.
  • For example, 5 cameras can be used, so that 5 single-camera images are obtained in one shot.
  • Then, a three-dimensional face model of the face can be reconstructed from the multiple single-camera images, and identity parameters, face pose parameters, and expression parameters can be determined through the three-dimensional face model.
  • The three-dimensional face model can then be projected back into the multiple single-camera images to obtain the face two-dimensional feature points and texture mapping information in each single-camera image, as in the projection sketch below.
  • Note that the 3D face model used for multi-camera reconstruction and the 3D face model used for subsequent single-camera reconstruction need to have the same topology, that is, the same vertex connection relationship.
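  • The application does not fix a camera model for this projection; a standard pinhole projection, sketched below as an assumption, maps each mesh vertex into a calibrated camera's image.

```python
import numpy as np

def project_vertices(vertices, K, R, t):
    """Pinhole projection of (V, 3) world-space mesh vertices into one image.

    K: (3, 3) camera intrinsics; R: (3, 3) and t: (3,) camera extrinsics.
    Returns (V, 2) pixel coordinates, e.g. of the landmark-bearing vertices.
    """
    cam = vertices @ R.T + t             # world -> camera coordinates
    img = cam @ K.T                      # camera -> homogeneous image coordinates
    return img[:, :2] / img[:, 2:3]      # perspective divide
```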
  • Since each single-camera image now carries face information, the single-camera image can be segmented according to that face information, and the segmented face image can be used as a single-camera face sample image for training the face information prediction model. Here, the face information can include the face two-dimensional feature points and the texture mapping information.
  • Through the above projection, the face two-dimensional feature points and texture mapping information of each single-camera image are obtained, so a single-camera face sample image can be obtained by segmenting the face region from the single-camera image according to the face two-dimensional feature points; for example, the BoundingBox algorithm can be used for the image separation.
  • This way of generating single-camera face sample images can greatly save the cost of manual labeling and can acquire a large amount of sample data in less time, thereby reducing the cost of acquiring training samples.
  • After the target face two-dimensional feature points and the target texture mapping information are obtained, a three-dimensional face model determined by the identity parameters, the face pose parameters, and the expression parameters can be reconstructed, so that the face two-dimensional feature points determined from the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  • Specifically, the identity parameters, the face pose parameters, and the expression parameters can be continuously adjusted until the face two-dimensional feature points determined by the generated three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  • an analysis-by-synthesis algorithm can be used to adjust parameters to determine the three-dimensional face model.
  • In one embodiment, one or two of the identity parameters, face pose parameters, and expression parameters can be alternately fixed, and the other two parameters or one parameter adjusted, to generate a three-dimensional face model, so that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  • For example, the alternating optimization strategies "fix the identity parameters and optimize the face pose parameters and expression parameters" and "fix the face pose parameters and expression parameters and optimize the identity parameters" can be adopted; this alternating optimization can make the three-dimensional face model converge quickly and improve optimization efficiency.
  • In one embodiment, alternately fixing one or two of the identity parameters, face pose parameters, and expression parameters and adjusting the other two parameters or one parameter to generate a three-dimensional face model can include:
  • S401 Alternately fix one or two of the identity parameters, the face pose parameters, and the expression parameters, and adjust the other two parameters or one parameter to generate a predicted three-dimensional face model;
  • S403 Project the predicted three-dimensional face model into the single-camera face image, and obtain predicted face two-dimensional feature points and predicted texture mapping information;
  • S405 Based on the differences between the predicted face two-dimensional feature points and the target face two-dimensional feature points, and between the predicted texture mapping information and the target texture mapping information, iteratively adjust the other two parameters or one parameter until at least one of the difference or the number of iterations meets a preset requirement.
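  • A minimal sketch of this alternating analysis-by-synthesis loop follows, using gradient-based optimization (one of the optimizers the application names later); `render_landmarks` is a hypothetical differentiable stand-in for "generate a predicted 3D face model and project it into the single-camera face image", and the texture-mapping difference term is omitted for brevity.

```python
import torch

def fit_parameters(render_landmarks, target_landmarks,
                   identity, pose, expression,
                   rounds: int = 5, steps: int = 50, tol: float = 1e-4):
    """Alternating optimization of S401-S405: fix one parameter group, adjust
    the other, and iterate until the difference or iteration count converges."""
    loss = torch.tensor(float("inf"))
    for _ in range(rounds):
        # Fix the identity parameters; optimize face pose and expression.
        opt = torch.optim.Adam([pose, expression], lr=1e-2)
        for _ in range(steps):
            pred = render_landmarks(identity.detach(), pose, expression)
            loss = ((pred - target_landmarks) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
        # Fix face pose and expression; optimize the identity parameters.
        opt = torch.optim.Adam([identity], lr=1e-2)
        for _ in range(steps):
            pred = render_landmarks(identity, pose.detach(), expression.detach())
            loss = ((pred - target_landmarks) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
        if loss.item() < tol:  # the difference meets the preset requirement
            break
    return identity, pose, expression
```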
  • Before the optimization starts, an initial three-dimensional face model can be provided.
  • The initial three-dimensional face model is the three-dimensional face model whose parameters have not yet been optimized.
  • It can be generated from default identity parameters, default face pose parameters, and default expression parameters.
  • The default parameters can be determined from the average values of identity parameters, face pose parameters, and expression parameters stored in a preset database, or from the identity parameters, face pose parameters, and expression parameters reconstructed from the previous frame of single-camera face image, which is not limited in this application. In addition, this application does not limit whether to first "fix the identity parameters and optimize the face pose parameters and expression parameters" or to first "fix the face pose parameters and expression parameters and optimize the identity parameters".
  • In the following example, the identity parameters are fixed first to optimize the face pose parameters and the expression parameters.
  • Using the implementations of S101 and S103, the target face two-dimensional feature points and target texture mapping information of single-camera face image 1 can be determined, for example 73 target feature points and the texture mapping information.
  • Then, the initial three-dimensional face model can be projected into single-camera face image 1 to obtain predicted face two-dimensional feature points and predicted texture mapping information, and the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information can be determined.
  • Based on these differences, the face pose parameters and the expression parameters may be adjusted.
  • Next, the face pose parameters and expression parameters are fixed to optimize the identity parameters; the adjustment method is the same as that used when fixing the identity parameters to optimize the face pose parameters and expression parameters, and is not repeated here.
  • In this way, the identity parameters, the face pose parameters, and the expression parameters are adjusted alternately and iteratively until at least one of the differences (between the predicted and target face two-dimensional feature points, and between the predicted and target texture mapping information) or the number of iterations meets the preset requirement.
  • the iterative adjustment method may include a gradient descent optimization algorithm (Gradient-based Optimization), a particle swarm optimization algorithm (Particle Swarm Optimization), etc., which is not limited in this application.
  • the preset requirement corresponding to the difference may include that the value of the difference is less than or equal to a preset threshold, and the preset threshold may be set to a value such as 0 or 0.01.
  • the preset requirement corresponding to the number of iterations may include that the number of iterations reaches a preset number, and the preset number may be set to, for example, 5 times or 7 times.
  • If the set of parameters determined when at least one of the difference or the number of iterations meets the preset requirement is (identity parameter 1, face pose parameter 1, expression parameter 1), then (identity parameter 1, face pose parameter 1, expression parameter 1) determines the predicted three-dimensional face model.
  • In some cases the reconstructed 3D face model has many possibilities, and therefore a certain degree of ambiguity may occur; that is, the reconstructed 3D face model is not a natural, realistic face state.
  • To address this, the prior probability distribution result and prior probability target value of at least one of the identity parameter, the face pose parameter, and the expression parameter may also be obtained, and by comparing the prior probability distribution result with the prior probability target value, the prior probability distribution result is prevented from exceeding a reasonable range.
  • The prior probability target value can be determined from a large amount of real collected face data; therefore, the ambiguity of the reconstructed three-dimensional face model can be effectively reduced.
  • S501 Obtain a prior probability distribution result and a prior probability target value of at least one of the identity parameter, the face pose parameter, and the expression parameter;
  • S503 Based on the differences between the predicted face two-dimensional feature points and the target face two-dimensional feature points, between the predicted texture mapping information and the target texture mapping information, and between the prior probability distribution result of at least one of the identity parameter, the face pose parameter, and the expression parameter and the prior probability target value, iteratively adjust the other two parameters or one parameter until at least one of the difference or the number of iterations meets the preset requirement.
  • In this embodiment, the difference between the prior probability distribution result and the prior probability target value of at least one of the identity parameter, the face pose parameter, and the expression parameter is also used as a convergence condition of the three-dimensional face model, which can effectively reduce the ambiguity of the reconstructed 3D face model.
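  • The application does not specify the form of the prior; a common assumption, sketched here, is a Gaussian prior over the parameters whose penalty is added to the fitting loss, discouraging parameters that drift outside the range seen in real collected face data. The 0.1 weight is an assumption.

```python
import torch

def prior_loss(params, mean, std):
    """Hypothetical Gaussian prior term: penalizes parameters that drift far
    from statistics gathered over a large amount of real collected face data."""
    return (((params - mean) / std) ** 2).mean()

# Example total fitting loss with the prior as an additional constraint:
# loss = landmark_loss + uv_loss + 0.1 * prior_loss(expression, exp_mean, exp_std)
```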
  • For subsequent single-camera face images of the same user, the identity parameters can be fixed: the identity parameters determined for single-camera face image 1 can continue to be used, and during the optimization only the face pose parameters and the expression parameters need to be optimized, which simplifies the optimization process and improves the reconstruction efficiency of the face model.
  • In another embodiment, multiple single-camera face images of the same user can be jointly optimized at the same time to improve reconstruction efficiency.
  • For example, multiple single-camera face images of the same user may be acquired, such as 20 frames of face images with different expressions captured in real time.
  • Specifically, the alternating optimization strategies "fix the identity parameters and optimize the face pose parameters and expression parameters" and "fix the face pose parameters and expression parameters and optimize the identity parameters" can also be used.
  • Assume the number of single-camera face images participating in the joint optimization is N, where the N single-camera face images belong to the same face.
  • Based on the N single-camera face images, the identity parameters or the face pose parameters and expression parameters are alternately fixed, the face pose parameters, the expression parameters, or the identity parameters are adjusted, and joint optimization generates N three-dimensional face models with the same identity parameters, so that the face two-dimensional feature points determined by each of the N three-dimensional face models match the corresponding target face two-dimensional feature points and the determined texture mapping information matches the corresponding target texture mapping information.
  • Specifically, this includes:
  • S601 Alternately fix the identity parameters or the face pose parameters and expression parameters, adjust the face pose parameters, the expression parameters, or the identity parameters, and generate N predicted three-dimensional face models, where, in the case of fixing the face pose parameters and expression parameters and adjusting the identity parameters, the identity parameters of the N predicted three-dimensional face models are jointly optimized;
  • S603 Project the N predicted three-dimensional face models into the corresponding single-camera face images, respectively, to obtain predicted face two-dimensional feature points and predicted texture mapping information;
  • S605 Based on the differences between the predicted face two-dimensional feature points and the target face two-dimensional feature points, and between the predicted texture mapping information and the target texture mapping information, iteratively adjust the face pose parameters, the expression parameters, or the identity parameters until at least one of the difference or the number of iterations meets a preset requirement.
  • Since the identity parameters of the same face are the same, the identity parameters of the N predicted three-dimensional face models are jointly optimized in the case where the face pose parameters and expression parameters are fixed and the identity parameters are adjusted.
  • The technical solutions of the above embodiments not only retain the fast convergence and high reconstruction efficiency of alternating optimization, but also jointly optimize multiple single-camera face images by exploiting the fact that the identity parameters of the same face are identical; in this way, multiple 3D face models can be reconstructed through one optimization, which greatly improves reconstruction efficiency.
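  • A minimal sketch of the shared-identity step of this joint optimization, under the same assumptions as the earlier fitting sketch: a single identity tensor is shared by all N frames, while each frame keeps its own pose and expression tensors.

```python
import torch

def joint_identity_step(render_landmarks, targets, identity, poses, expressions,
                        steps: int = 50):
    """Fix per-frame pose/expression; jointly optimize the shared identity.

    targets: list of N target 2D feature-point tensors, one per face image;
    poses, expressions: lists of N per-frame parameter tensors;
    render_landmarks is the same hypothetical differentiable projection as before.
    """
    opt = torch.optim.Adam([identity], lr=1e-2)
    for _ in range(steps):
        loss = sum(((render_landmarks(identity, p.detach(), e.detach()) - t) ** 2).mean()
                   for p, e, t in zip(poses, expressions, targets)) / len(targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return identity  # the shared identity parameter X for all N face models
```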
  • In one embodiment, in the process of reconstructing a three-dimensional face model for a subsequent single-camera face image, the jointly optimized identity parameters of the N predicted three-dimensional face models can be used and fixed, and the face pose parameters and the expression parameters are adjusted until at least one of the difference or the number of iterations meets a preset requirement.
  • For example, the identity parameters can be fixed first to optimize the face pose parameters and the expression parameters.
  • Using the implementations of S101 and S103, the target face two-dimensional feature points and target texture mapping information of the N single-camera face images are determined respectively.
  • Then, N initial three-dimensional face models may be obtained; the manner of obtaining the initial three-dimensional face models may refer to the foregoing embodiment and is not limited here. Projecting the N initial three-dimensional face models into the corresponding single-camera face images yields N sets of first predicted face two-dimensional feature points and first predicted texture mapping information, and the N corresponding sets of differences can be determined.
  • Based on these differences, the face pose parameters and expression parameters of the N models are adjusted respectively, and N groups of parameters are obtained: (identity parameter 1, face pose parameter 1, expression parameter 1), (identity parameter 1, face pose parameter 2, expression parameter 2), ..., (identity parameter 1, face pose parameter N, expression parameter N). According to the N groups of parameters, N first predicted three-dimensional face models can be determined. Then, the face pose parameters and expression parameters can be fixed, and the identity parameters optimized.
  • Specifically, the N first predicted three-dimensional face models can be projected into the corresponding single-camera face images to obtain N second predicted face two-dimensional feature points and N sets of second predicted texture mapping information.
  • Based on the corresponding differences, the identity parameters of the N models are jointly adjusted, and N groups of parameters are obtained: (identity parameter X, face pose parameter 1, expression parameter 1), (identity parameter X, face pose parameter 2, expression parameter 2), ..., (identity parameter X, face pose parameter N, expression parameter N). According to the N groups of parameters, N second predicted three-dimensional face models can be determined. The face pose parameters, the expression parameters, or the identity parameters are adjusted alternately and iteratively in this way until at least one of the differences (between the predicted and target face two-dimensional feature points, and between the predicted and target texture mapping information) or the number of iterations meets the preset requirement.
  • In the process of reconstructing a three-dimensional face model for a single-camera face image obtained after the joint optimization, the identity parameter X obtained in the joint optimization can be used and fixed, and only the face pose parameters and the expression parameters need to be optimized, which simplifies the optimization process and improves the reconstruction efficiency of the face model.
  • the iterative adjustment method may include a gradient descent optimization algorithm (Gradient-based Optimization), a particle swarm optimization algorithm (Particle Swarm Optimization), etc., which is not limited in this application.
  • the preset requirement corresponding to the difference may include that the value of the difference is less than or equal to a preset threshold, and the preset threshold may be set to a value such as 0 or 0.01.
  • the preset requirement corresponding to the number of iterations may include that the number of iterations reaches a preset number, and the preset number may be set to, for example, 5 times or 7 times.
  • The prior probability may also be used to constrain the identity parameters, face pose parameters, and expression parameters in the scenario where N single-camera face images are jointly optimized, so that the reconstructed three-dimensional face model is more realistic.
  • Existing single-camera capture technology also has problems such as poor accuracy and inability to capture the eyeball state, and the capture of the eyeball state plays a decisive role in the fidelity of restored facial expressions.
  • In one embodiment, a three-dimensional eyeball model can therefore be acquired, where the three-dimensional eyeball model includes eye gaze information, and the three-dimensional face model and the three-dimensional eyeball model are then combined into a new three-dimensional face model. In this way, a face model with an eyeball state can be captured, which is more realistic.
  • Specifically, the methods for establishing the three-dimensional eyeball model may include, but are not limited to, the following: eye capture based on infrared devices, in which the user wears specified infrared glasses or installs a specific infrared device and the eyeball is reconstructed by comparing the intensity of the reflected infrared light to determine the eye state; and eye capture based on a single camera, in which a synthesis-analysis method obtains the final eye state by comparing the difference between the synthesized eye and the eye observed in the picture. The specific method is not limited here.
  • In some application scenarios, a vivid and realistic face image can be rendered according to the three-dimensional face model.
  • For example, in a virtual live broadcast scenario, the three-dimensional face model of a backstage actor can be rendered into an animated character to generate a vivid live broadcast of the animated character.
  • In a game scenario, the three-dimensional face model of a player can be rendered into a game character to generate a vivid game scene.
  • The method can also be used in animation production, movie production, and the like, which is not limited in this application.
  • The three-dimensional face model reconstruction method provided by this application can be used in an offline mode and a real-time mode.
  • The offline mode reconstructs three-dimensional face models from an offline video; it does not need to output the three-dimensional face model immediately and can be used in the post-production of animation, film, and television.
  • The real-time mode can run in applications that require real-time interaction with users, such as interactive games and live broadcasts. With GPU acceleration, real-time applications can run in real time (that is, a 3D face model is output immediately after a picture is obtained, with a delay that is not easily perceived by the user).
  • Thus, the three-dimensional face model reconstruction method has both an offline mode and a real-time mode, so that it can be applied more widely.
  • With the three-dimensional face model reconstruction method provided above, a three-dimensional face model can be reconstructed from a single-camera face image, taking advantage of the low cost, easy installation, and user-friendliness of single-camera image acquisition. The ease of acquiring single-camera face images not only reduces the construction cost of the face information prediction model, but also makes the reconstruction of 3D face models faster and easier.
  • In the reconstruction process, the face two-dimensional feature points and texture mapping information output by the face information prediction model can effectively improve the accuracy and robustness of the model reconstruction.
  • The three-dimensional face model is reconstructed from identity parameters, face pose parameters, and expression parameters, providing an accurate and reliable technical solution for technical fields such as single-camera virtual live broadcast, single-camera intelligent interaction, face recognition, criminal investigation and surveillance, movies and games, and expression analysis.
  • Another aspect of the present application also provides an electronic device, comprising a processor and a memory for storing instructions executable by the processor; when the processor executes the instructions, the three-dimensional face model reconstruction method described in any of the above embodiments is implemented.
  • The apparatus 800 may include:
  • an acquisition module 801, configured to acquire a single-camera face image;
  • an information prediction module 803, used to input the single-camera face image into a face information prediction model and output, through the face information prediction model, the target face two-dimensional feature points and target texture mapping information in the single-camera face image; and
  • a model determination module 805, used to determine a three-dimensional face model using the identity parameters, the face pose parameters, and the expression parameters, so that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  • the acquisition module includes:
  • an image acquisition sub-module, used to acquire a single-camera image that contains a face image; and
  • a face detection sub-module, used to perform face detection on the single-camera image and crop the single-camera face image from the single-camera image.
  • the face information prediction model is set to be obtained by training with the following sub-modules:
  • a sample acquisition sub-module, used to acquire a plurality of single-camera face sample images, where the single-camera face sample images are annotated with face two-dimensional feature points and texture mapping information;
  • a model construction sub-module, used to build a face information prediction model in which model parameters are set;
  • a prediction result generation sub-module, used to input the single-camera face sample images into the face information prediction model to generate a prediction result, where the prediction result includes the predicted face two-dimensional feature points and texture mapping information; and
  • an iterative adjustment sub-module, used to iteratively adjust the model parameters based on the difference between the prediction result and the annotated face two-dimensional feature points and texture mapping information, until the difference meets a preset requirement.
  • the single-camera face sample images are set to be acquired by the following modules:
  • an image acquisition sub-module, used to simultaneously acquire multiple single-camera images of the same face from different angles using multiple cameras;
  • a model reconstruction sub-module, used to reconstruct a three-dimensional face model of the face using the multiple single-camera images;
  • a face information acquisition sub-module, used to project the three-dimensional face model of the face into the multiple single-camera images respectively, and obtain the face two-dimensional feature points and texture mapping information in each of the multiple single-camera images; and
  • an image segmentation sub-module, used to segment face images from the multiple single-camera images according to the face two-dimensional feature points and/or the texture mapping information, and to use the multiple face images as the single-camera face sample images for training the face information prediction model.
  • the model determination module includes:
  • an alternate adjustment sub-module, used to alternately fix one or two of the identity parameters, face pose parameters, and expression parameters and adjust the other two parameters or one parameter to generate a three-dimensional face model, so that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  • the alternate adjustment sub-module includes:
  • a prediction model generation unit, used to alternately fix one or two of the identity parameters, the face pose parameters, and the expression parameters, and adjust the other two parameters or one parameter, to generate a predicted three-dimensional face model;
  • a face information acquisition unit, used to project the predicted three-dimensional face model into the single-camera face image and obtain predicted face two-dimensional feature points and predicted texture mapping information; and
  • an iterative adjustment unit, used to iteratively adjust the other two parameters or one parameter based on the differences between the predicted face two-dimensional feature points and the target face two-dimensional feature points and between the predicted texture mapping information and the target texture mapping information, until at least one of the difference or the number of iterations meets a preset requirement.
  • the iterative adjustment unit includes:
  • a prior result obtaining subunit, used to obtain a prior probability distribution result and a prior probability target value of at least one of the identity parameter, the face pose parameter, and the expression parameter; and
  • an iterative adjustment subunit, used to iteratively adjust the other two parameters or one parameter based on the differences between the predicted face two-dimensional feature points and the target face two-dimensional feature points, between the predicted texture mapping information and the target texture mapping information, and between the prior probability distribution result of at least one of the identity parameter, the face pose parameter, and the expression parameter and the prior probability target value, until at least one of the difference or the number of iterations meets the preset requirement.
  • when the number N of single-camera face images is greater than or equal to 2 and the N single-camera face images belong to the same face, the model determination module includes:
  • a multi-model determination sub-module, used to alternately fix the identity parameters or the face pose parameters and expression parameters based on the N single-camera face images, adjust the face pose parameters, the expression parameters, or the identity parameters, and jointly optimize to generate N three-dimensional face models with the same identity parameters, so that the face two-dimensional feature points determined by the N three-dimensional face models match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  • the multi-model determination sub-module includes:
  • a prediction model generation unit, used to alternately fix the identity parameters or the face pose parameters and expression parameters, adjust the face pose parameters, the expression parameters, or the identity parameters, and generate N predicted three-dimensional face models, where, in the case of fixing the face pose parameters and expression parameters and adjusting the identity parameters, the identity parameters of the N predicted three-dimensional face models are jointly optimized;
  • a predicted face information acquisition unit, used to project the N predicted three-dimensional face models into the corresponding single-camera face images respectively, to obtain predicted face two-dimensional feature points and predicted texture mapping information; and
  • an iterative adjustment unit, used to iteratively adjust the face pose parameters, the expression parameters, or the identity parameters based on the differences between the predicted face two-dimensional feature points and the target face two-dimensional feature points and between the predicted texture mapping information and the target texture mapping information, until at least one of the difference or the number of iterations meets a preset requirement.
  • the multi-model determination sub-module further includes:
  • an optimization adjustment unit, used to use and fix the jointly optimized identity parameters of the N predicted three-dimensional face models in the process of reconstructing a three-dimensional face model for a subsequent single-camera face image, and to adjust the face pose parameters and the expression parameters until at least one of the difference or the number of iterations meets a preset requirement.
  • the three-dimensional face model includes a three-dimensional model composed of a preset number of interconnected polygon meshes, and the positions of the mesh vertices of the polygon meshes are determined by the identity parameters, the face pose parameters, and the expression parameters.
  • the device further includes:
  • an eyeball model acquisition module, used to acquire a three-dimensional eyeball model, where the three-dimensional eyeball model includes eye gaze information; and
  • a model combining module, used to combine the three-dimensional face model and the three-dimensional eyeball model into a new three-dimensional face model.
  • Another aspect of the present application further provides a computer-readable storage medium on which computer instructions are stored; when the instructions are executed, the steps of the method in any of the foregoing embodiments are implemented.
  • The computer-readable storage medium may include a physical device for storing information; typically, the information is digitized and then stored in a medium using electrical, magnetic, or optical means.
  • The computer-readable storage medium described in this embodiment may include: devices that use electrical energy to store information, such as various memories, e.g., RAM and ROM; devices that use magnetic energy to store information, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, magnetic bubble memories, and USB flash drives; and devices that use optical means to store information, such as CDs or DVDs.
  • Of course, there are also other kinds of readable storage media, such as quantum memory and graphene memory.
  • A Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. Such programming is mostly done in a hardware description language (HDL), of which there are many kinds, for example ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), HDCal, JHDL, Lava, Lola, MyHDL, PALASM, RHDL, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language), and Verilog.
  • The controller may be implemented in any suitable manner; for example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory.
  • In addition to implementing the controller in the form of pure computer-readable program code, the method steps can be logically programmed so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, this kind of controller can be regarded as a hardware component, and the means included in it for realizing various functions can also be regarded as structures within the hardware component; or even, the means for realizing various functions can be regarded as both software modules implementing a method and structures within a hardware component.
  • a typical implementation device is a computer.
  • the computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM).
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • the embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • computer-usable storage media including, but not limited to, disk storage, CD-ROM, optical storage, etc.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Collating Specific Patterns (AREA)

Abstract

This application relates to a three-dimensional face model reconstruction method, apparatus, electronic device and storage medium. The method includes: acquiring a single-camera face image; inputting the single-camera face image into a face information prediction model, which outputs target face two-dimensional feature points and target texture mapping information of the single-camera face image; and determining a three-dimensional face model using identity parameters, face pose parameters and expression parameters, such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information. With the three-dimensional face model reconstruction method, apparatus, electronic device and storage medium provided by the embodiments of this application, a three-dimensional face model can be reconstructed quickly and accurately.

Description

Three-dimensional face model reconstruction method, apparatus, electronic device and storage medium
This application claims priority to Chinese patent application No. 202110082037.3, entitled "Three-dimensional face model reconstruction method, apparatus, electronic device and storage medium", filed with the China National Intellectual Property Administration on January 21, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer vision technology, and in particular to a three-dimensional face model reconstruction method, apparatus, electronic device and storage medium.
Background
Facial expression capture plays an important role in many fields, such as film, games, criminal investigation and video surveillance. In the related art, fairly accurate facial expressions can indeed be captured with certain high-cost and complicated methods; for example, accurate expressions can be obtained by attaching reflective marker points to the face. However, the enormous cost of this approach and the discomfort it causes users seriously hinder its development and adoption.
Single-camera face image acquisition is low-cost, easy to install and user-friendly, but a single-camera face image generally carries two-dimensional information from only one viewpoint and can hardly provide three-dimensional information. Therefore, to obtain vivid and lifelike facial expressions, a three-dimensional face model must be reconstructed from the single-camera face image. However, three-dimensional face models currently reconstructed from single-camera face images tend to have low accuracy, making it difficult to capture realistic facial expressions.
Therefore, a more accurate way of reconstructing three-dimensional face models from single-camera face images is urgently needed in the related art.
Summary
The purpose of the embodiments of this application is to provide a three-dimensional face model reconstruction method, apparatus, electronic device and storage medium that can reconstruct a three-dimensional face model quickly and accurately.
The three-dimensional face model reconstruction method, apparatus, electronic device and storage medium provided by the embodiments of this application are implemented as follows:
A three-dimensional face model reconstruction method, the method comprising:
acquiring a single-camera face image;
inputting the single-camera face image into a face information prediction model, which outputs target face two-dimensional feature points and target texture mapping information of the single-camera face image;
determining a three-dimensional face model using identity parameters, face pose parameters and expression parameters, such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points, and the determined texture mapping information matches the target texture mapping information.
Optionally, in one embodiment of this application, acquiring the single-camera face image includes:
acquiring a single-camera image that contains a face image;
performing face detection on the single-camera image, and cropping the single-camera face image out of the single-camera image.
Optionally, in one embodiment of this application, the face information prediction model is trained as follows:
acquiring multiple single-camera face sample images, the single-camera face sample images being annotated with face two-dimensional feature points and texture mapping information;
constructing a face information prediction model provided with model parameters;
inputting the single-camera face sample images into the face information prediction model to generate prediction results, the prediction results including predicted face two-dimensional feature points and texture mapping information;
iteratively adjusting the model parameters based on the differences between the prediction results and the annotated face two-dimensional feature points and texture mapping information, until the differences meet a preset requirement.
Optionally, in one embodiment of this application, the single-camera face sample images are acquired as follows:
capturing multiple single-camera images of the same face simultaneously from different angles with multiple cameras;
reconstructing a three-dimensional face model of the face from the multiple single-camera images;
projecting the three-dimensional face model of the face into each of the multiple single-camera images to obtain the face two-dimensional feature points and texture mapping information in each image;
segmenting a face image out of each of the multiple single-camera images according to the face two-dimensional feature points and/or the texture mapping information, and using the multiple face images as the single-camera face sample images for training the face information prediction model.
Optionally, in one embodiment of this application, determining the three-dimensional face model using identity parameters, face pose parameters and expression parameters such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information includes:
alternately fixing one or two of the identity parameters, face pose parameters and expression parameters while adjusting the other two or one, to generate a three-dimensional face model such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
Optionally, in one embodiment of this application, alternately fixing one or two of the identity parameters, face pose parameters and expression parameters while adjusting the other two or one to generate the three-dimensional face model includes:
alternately fixing one or two of the identity parameters, face pose parameters and expression parameters while adjusting the other two or one, to generate a predicted three-dimensional face model;
projecting the predicted three-dimensional face model into the single-camera face image to obtain predicted face two-dimensional feature points and predicted texture mapping information;
iteratively adjusting the other two or one parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, until at least one of the differences or the number of iterations meets a preset requirement.
Optionally, in one embodiment of this application, iteratively adjusting the other two or one parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information until at least one of the differences or the number of iterations meets a preset requirement includes:
obtaining a prior probability distribution result and a prior probability target value for at least one of the identity parameters, the face pose parameters and the expression parameters;
iteratively adjusting the other two or one parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, together with the difference between the prior probability distribution result of at least one of the identity parameters, face pose parameters and expression parameters and the prior probability target value, until at least one of the differences or the number of iterations meets a preset requirement.
Optionally, in one embodiment of this application, when the number N of single-camera face images is greater than or equal to 2 and the N single-camera face images belong to the same face, determining the three-dimensional face model using identity parameters, face pose parameters and expression parameters such that the determined face two-dimensional feature points match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information includes:
based on the N single-camera face images, alternately fixing either the identity parameters or the face pose and expression parameters while adjusting the face pose and expression parameters or the identity parameters, and jointly optimizing to generate N three-dimensional face models sharing the same identity parameters, such that the face two-dimensional feature points determined by each of the N three-dimensional face models match the corresponding target face two-dimensional feature points and the determined texture mapping information matches the corresponding target texture mapping information.
Optionally, in one embodiment of this application, alternately fixing either the identity parameters or the face pose and expression parameters while adjusting the face pose and expression parameters or the identity parameters, and jointly optimizing to generate N three-dimensional face models sharing the same identity parameters includes:
alternately fixing either the identity parameters or the face pose and expression parameters while adjusting the face pose and expression parameters or the identity parameters, to generate N predicted three-dimensional face models, wherein, when the face pose and expression parameters are fixed and the identity parameters are adjusted, the identity parameters of the N predicted three-dimensional face models are jointly optimized;
projecting each of the N predicted three-dimensional face models into the corresponding single-camera face image to obtain predicted face two-dimensional feature points and predicted texture mapping information;
iteratively adjusting the face pose parameters, the expression parameters or the identity parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, until at least one of the differences or the number of iterations meets a preset requirement.
Optionally, in one embodiment of this application, after jointly optimizing the identity parameters of the N predicted three-dimensional face models, the method further includes:
when reconstructing three-dimensional faces for subsequent single-camera face images, using and fixing the jointly optimized identity parameters of the N predicted three-dimensional face models, and adjusting the face pose parameters and the expression parameters until at least one of the differences or the number of iterations meets a preset requirement.
Optionally, in one embodiment of this application, the three-dimensional face model includes a three-dimensional model composed of a preset number of interconnected polygon meshes, the positions of whose mesh vertices are determined by the identity parameters, the face pose parameters and the expression parameters.
Optionally, in one embodiment of this application, the method further includes:
acquiring a three-dimensional eyeball model, the three-dimensional eyeball model including gaze information;
combining the three-dimensional face model and the three-dimensional eyeball model into a new three-dimensional face model.
A three-dimensional face model reconstruction apparatus, comprising:
an acquisition module, configured to acquire a single-camera face image;
an information prediction module, configured to input the single-camera face image into a face information prediction model, which outputs target face two-dimensional feature points and target texture mapping information of the single-camera face image;
a model determination module, configured to determine a three-dimensional face model using identity parameters, face pose parameters and expression parameters, such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
An electronic device, comprising a processor and a memory for storing processor-executable instructions, the processor implementing the three-dimensional face model reconstruction method when executing the instructions.
A non-transitory computer-readable storage medium, wherein, when instructions in the storage medium are executed by a processor, the processor is enabled to perform the three-dimensional face model reconstruction method.
The three-dimensional face model reconstruction method provided by this application can reconstruct a three-dimensional face model from a single-camera face image, leveraging the low cost, easy installation and user-friendliness of single-camera image acquisition. On this basis, the ease of collecting single-camera face images not only reduces the cost of building the face information prediction model, but also makes reconstructing the three-dimensional face model faster and simpler. During reconstruction, the face two-dimensional feature points and texture mapping information output by the face information prediction model effectively improve the accuracy and robustness of model reconstruction. Reconstructing the three-dimensional face model from identity parameters, face pose parameters and expression parameters provides an accurate and reliable technical solution for single-camera virtual livestreaming, single-camera intelligent interaction, face recognition, criminal investigation surveillance, film and games, expression analysis and other fields. The present invention addresses the pain points above and proposes a high-accuracy, real-time single-camera facial expression capture solution.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this application and, together with the specification, serve to explain the principles of this application.
FIG. 1 is a schematic flowchart of a three-dimensional face model reconstruction method according to an exemplary embodiment.
FIG. 2 is a schematic flowchart of a three-dimensional face model reconstruction method according to an exemplary embodiment.
FIG. 3 is a schematic flowchart of a three-dimensional face model reconstruction method according to an exemplary embodiment.
FIG. 4 is a schematic flowchart of a three-dimensional face model reconstruction method according to an exemplary embodiment.
FIG. 5 is a schematic flowchart of a three-dimensional face model reconstruction method according to an exemplary embodiment.
FIG. 6 is a schematic flowchart of a three-dimensional face model reconstruction method according to an exemplary embodiment.
FIG. 7 is a block diagram of a three-dimensional face model reconstruction apparatus according to an exemplary embodiment.
FIG. 8 is a block diagram of a three-dimensional face model reconstruction apparatus according to an exemplary embodiment.
Detailed Description
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
The three-dimensional face model reconstruction method of this application is described in detail below with reference to the drawings. FIG. 1 is a schematic flowchart of one embodiment of the three-dimensional face model reconstruction method provided by this application. Although this application provides the method operation steps shown in the following embodiments or drawings, the method may include more or fewer steps based on routine practice or without inventive effort. For steps with no necessary causal relationship in logic, the execution order is not limited to that given in the embodiments of this application. In an actual reconstruction process, or when the method is executed by an apparatus, it may be executed in the order shown in the embodiments or drawings, or in parallel (for example, in a parallel-processor or multi-threaded environment).
Specifically, an embodiment of the three-dimensional face model reconstruction method provided by this application is shown in FIG. 1. The method may include:
S101: acquiring a single-camera face image.
S103: inputting the single-camera face image into a face information prediction model, which outputs target face two-dimensional feature points and target texture mapping information of the single-camera face image.
S105: determining a three-dimensional face model using identity parameters, face pose parameters and expression parameters, such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
In the embodiments of this application, the single-camera face image may be a face image captured with a single camera, that is, a single imaging device, for example a single-lens reflex camera or a smart device with a camera (such as a smartphone, tablet or smart wearable device); the camera may be an RGB camera, an RGBD camera, and so on. The single-camera face image may be an image in any format, such as an RGB image or a grayscale image. In a real application environment, the image captured by the camera contains not only the face but also background beyond the face. On this basis, a single-camera face image containing as little as possible besides the face can be cropped out of the captured image. Specifically, in one embodiment, a single-camera image containing a face image is first acquired; face detection is then performed on the single-camera image, and the single-camera face image is cropped out of it. In one example, a machine-learning-based face detection algorithm detects the face in the single-camera image, after which the single-camera face image is cropped out. The face detection algorithm may include, without limitation, R-CNN, Fast R-CNN, Faster R-CNN, TCDCN, MTCNN, YOLOv3, SSD and the like; a minimal sketch of this detect-and-crop step is given below.
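As an illustration only (the patent does not prescribe any particular detector or library), the following minimal sketch performs the detect-and-crop step with OpenCV's stock Haar cascade standing in for the algorithms listed above; the function name and the `margin` padding are assumptions, not part of the original disclosure.

```python
# Hypothetical detect-and-crop sketch; the Haar cascade is a stand-in for the
# MTCNN / YOLOv3 / SSD detectors named in the text.
import cv2

def crop_single_camera_face(image, margin: float = 0.2):
    """Return the largest detected face region, padded by `margin`, or None."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None                                     # no face detected
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])  # keep the largest face
    dx, dy = int(w * margin), int(h * margin)           # pad the crop slightly
    H, W = image.shape[:2]
    return image[max(0, y - dy):min(H, y + h + dy),
                 max(0, x - dx):min(W, x + w + dx)]
```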
In the embodiments of this application, after the single-camera face image is acquired, it can be input into the face information prediction model, which outputs the target face two-dimensional feature points and the target texture mapping information of the single-camera face image. The face two-dimensional feature points are key points that characterize facial features. In some examples, a set of 73 face two-dimensional feature points may consist of 15 face contour points, 16 eye points (8 per eye), 12 eyebrow points (6 per side), 12 nose points and 18 mouth points. Of course, in other embodiments the number of face two-dimensional feature points may be 68, 81 and so on; this application does not limit the number. The texture mapping information refers to the mapping, established through texture coordinates, from face image pixels to the three-dimensional face model. In one embodiment of this application, the three-dimensional face model may be a three-dimensional model composed of a preset number of interconnected polygon meshes, the positions of whose mesh vertices are determined by identity parameters, face pose parameters and expression parameters. The polygon meshes may be triangle meshes, pentagon meshes, hexagon meshes and so on. Note that each polygon's edges are shared with its neighboring polygons. Since the number of polygons is fixed, the number of mesh vertices of the three-dimensional face model is also fixed. Because the identity, face pose and expression parameters are unknown at the initial stage, the mesh vertex positions of the three-dimensional face model start at default positions; in the subsequent embodiments, reconstructing the three-dimensional face model is precisely the process of adjusting these vertex positions. In some examples, each mesh vertex may carry a unique identifier, for example its texture-mapping (u, v) coordinates, so that the texture mapping information consists of mappings from face image pixels to vertex identifiers. In one example, such a mapping may associate the pixel at position (34, 17) in the single-camera image with the mesh vertex whose texture coordinates in the three-dimensional face model are (0.2, 0.5).
In the embodiments of this application, the face information prediction model can determine at least two kinds of face information from the single-camera face image, including the target face two-dimensional feature points and the target texture mapping information. In one embodiment, the face information prediction model may be a multi-task machine learning model capable of several tasks, for example a multi-task deep learning network; the multi-task deep learning network of the embodiments of this application can perform two prediction tasks (a sketch of such a network follows). Because different kinds of face information, such as the face two-dimensional feature points and the texture mapping information, are correlated, fusing them into a single model for learning exploits these correlations to improve the accuracy of the face information prediction model.
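As a sketch only, one plausible shape for such a two-task network is a shared backbone with a landmark head and a texture-mapping head. Everything below (the ResNet-18 backbone, 73 landmarks, a coarse 64x64 UV map regressed from global features) is an assumption for illustration; the patent does not fix the architecture.

```python
# Hypothetical multi-task face information prediction model: one shared
# backbone, two heads (2D feature points + a coarse UV texture-mapping grid).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FaceInfoNet(nn.Module):
    def __init__(self, num_landmarks: int = 73, uv_size: int = 64):
        super().__init__()
        self.num_landmarks, self.uv_size = num_landmarks, uv_size
        backbone = resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.landmark_head = nn.Linear(512, num_landmarks * 2)  # (x, y) per point
        self.uv_head = nn.Sequential(nn.Linear(512, uv_size * uv_size * 2),
                                     nn.Sigmoid())              # UVs in [0, 1]

    def forward(self, x: torch.Tensor):
        f = self.features(x).flatten(1)                          # (B, 512)
        landmarks = self.landmark_head(f).view(-1, self.num_landmarks, 2)
        uv = self.uv_head(f).view(-1, self.uv_size, self.uv_size, 2)
        return landmarks, uv
```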
In one embodiment of training the face information prediction model, as shown in FIG. 2, the following steps may be included:
S201: acquiring multiple single-camera face sample images, the single-camera face sample images being annotated with face two-dimensional feature points and texture mapping information.
S203: constructing a face information prediction model provided with model parameters.
S205: inputting the single-camera face sample images into the face information prediction model to generate prediction results, the prediction results including predicted face two-dimensional feature points and texture mapping information.
S207: iteratively adjusting the model parameters based on the differences between the prediction results and the annotated face two-dimensional feature points and texture mapping information, until the differences meet a preset requirement.
In the embodiments of this application, the preset requirement may be, for example, that the value of the difference is smaller than a preset threshold. Because this is multi-task learning, the prediction results include several kinds of information such as the face two-dimensional feature points and texture mapping information, and the difference may be the sum of the differences between each prediction and its corresponding annotated feature points and texture mapping information (a sketch of one training step follows). Of course, the face information output by the face information prediction model is not limited to the face two-dimensional feature points and texture mapping information above; it may include any other face information, which this application does not restrict. Note that the machine learning architecture used to train the face information prediction model may include, without limitation, ResNet, MobileNet or VGG backbones.
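A hypothetical single training step consistent with the description above, where the total loss is the sum of the per-task differences; the mean-squared-error choice and the function names are assumptions.

```python
# One multi-task training step: total loss = landmark term + texture-mapping term.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, images, gt_landmarks, gt_uv):
    optimizer.zero_grad()
    pred_landmarks, pred_uv = model(images)
    loss_lmk = F.mse_loss(pred_landmarks, gt_landmarks)  # 2D feature point term
    loss_uv = F.mse_loss(pred_uv, gt_uv)                 # texture mapping term
    loss = loss_lmk + loss_uv                            # sum of the task differences
    loss.backward()
    optimizer.step()
    return loss.item()
```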
In practice, training an accurate model usually requires a large amount of sample data, and annotating sample data costs considerable time and labor; this is especially true for the multi-task face information prediction model, whose single-camera face sample images must be annotated with both face two-dimensional feature points and texture mapping information. On this basis, in one embodiment of this application, as shown in FIG. 3, the single-camera face sample images may be acquired as follows:
S301: capturing multiple single-camera images of the same face simultaneously from different angles with multiple cameras.
S303: reconstructing a three-dimensional face model of the face from the multiple single-camera images.
S305: projecting the three-dimensional face model into each of the multiple single-camera images to obtain the face two-dimensional feature points and texture mapping information in each image.
S307: segmenting a face image out of each of the multiple single-camera images according to the face two-dimensional feature points and/or the texture mapping information, and using the multiple face images as the single-camera face sample images for training the face information prediction model.
In the embodiments of this application, multiple cameras can photograph the same face from multiple angles simultaneously, yielding multiple single-camera images of that face. For example, 5 cameras capture 5 images, so 5 single-camera images are obtained at once. A three-dimensional face model of the face can then be reconstructed from the multiple single-camera images, from which the identity parameters, face pose parameters and expression parameters can be determined. Finally, the three-dimensional face model can be projected back into each of the single-camera images to obtain each image's face two-dimensional feature points and texture mapping information. The three-dimensional face model used in this multi-camera reconstruction and the one used in the subsequent single-camera reconstruction must share the same topology, that is, the same vertex connectivity.
In a practical application scenario, each single-camera image contains face information, according to which a face image can be segmented out of the single-camera image and used as a single-camera face sample image for training the face information prediction model; the face information may include the face two-dimensional feature points and the texture mapping information. In the above embodiment, projecting the three-dimensional face model into the single-camera image yields that image's face two-dimensional feature points and texture mapping information, so the face image can be segmented out of the single-camera image according to the face two-dimensional feature points to obtain the single-camera face sample image. In one example, a bounding-box algorithm can be used for this separation; a sketch of the projection and cropping is given below.
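For illustration, a minimal pinhole-projection sketch of the labeling step: the mesh vertices of the reconstructed model are projected into one camera view, and a bounding box around the projected feature points gives the crop window. The calibration inputs K, R, t and the pixel margin are assumptions.

```python
# Hypothetical projection + bounding-box crop for generating annotated samples.
import numpy as np

def project_vertices(vertices, K, R, t):
    """Project (N, 3) world-space vertices to (N, 2) pixel coordinates."""
    cam = vertices @ R.T + t          # world -> camera coordinates
    uvw = cam @ K.T                   # apply camera intrinsics
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide

def face_bounding_box(points_2d, margin: int = 20):
    """Axis-aligned crop window around projected 2D feature points."""
    x0, y0 = points_2d.min(axis=0) - margin
    x1, y1 = points_2d.max(axis=0) + margin
    return int(x0), int(y0), int(x1), int(y1)
```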
Generating single-camera face sample images in this way saves a great deal of manual annotation cost and yields a large number of samples in relatively little time, lowering the cost of obtaining training data.
In the embodiments of this application, after the face information prediction model outputs the target face two-dimensional feature points and the target texture mapping information, a three-dimensional face model determined by identity parameters, face pose parameters and expression parameters can be reconstructed, such that the face two-dimensional feature points determined from the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information. On this basis, reconstruction in this application proceeds by continually adjusting the identity, face pose and expression parameters so that the generated three-dimensional face model satisfies both matching conditions.
In one embodiment of this application, an analysis-by-synthesis algorithm may adjust the parameters that determine the three-dimensional face model. One or two of the identity parameters, face pose parameters and expression parameters can be alternately fixed while the other two or one are adjusted, generating a three-dimensional face model whose determined face two-dimensional feature points match the target feature points and whose determined texture mapping information matches the target texture mapping information. This embodiment can adopt the alternating strategy of "fix the identity parameters, optimize the face pose and expression parameters" and "fix the face pose and expression parameters, optimize the identity parameters". Compared with optimizing the identity, face pose and expression parameters simultaneously, this alternating scheme makes the three-dimensional face model converge quickly and improves optimization efficiency.
Specifically, in one embodiment of this application, as shown in FIG. 4, alternately fixing one or two of the identity, face pose and expression parameters while adjusting the other two or one to generate the three-dimensional face model may include:
S401: alternately fixing one or two of the identity, face pose and expression parameters while adjusting the other two or one, to generate a predicted three-dimensional face model;
S403: projecting the predicted three-dimensional face model into the single-camera face image to obtain predicted face two-dimensional feature points and predicted texture mapping information;
S405: iteratively adjusting the other two or one parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, until at least one of the differences or the number of iterations meets a preset requirement.
The above embodiment provides a concrete implementation of alternating optimization. At the initial moment of optimization, an initial three-dimensional face model can be provided, that is, the model before any parameters have been optimized, generated from default identity, face pose and expression parameters. The default parameters may be determined from the averages of the identity, face pose and expression parameters stored in a preset database, or from the identity, face pose and expression parameters reconstructed from the previous frame of single-camera face image; this application does not restrict this. Likewise, whether "fix identity, optimize face pose and expression" or "fix face pose and expression, optimize identity" comes first is not restricted here.
In one concrete example, the identity parameters are first fixed to optimize the face pose and expression parameters. Specifically, the target face two-dimensional feature points and target texture mapping information of single-camera face image 1 are determined, for example 73 target feature points plus texture mapping information. The initial three-dimensional face model is then projected into single-camera face image 1 to obtain predicted face two-dimensional feature points and predicted texture mapping information, and the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information are determined. Based on these differences, the face pose and expression parameters are adjusted. Afterwards, the face pose and expression parameters are fixed to optimize the identity parameters, which are adjusted in the same way as above and not repeated here. The identity, face pose and expression parameters are adjusted by alternating iterations until at least one of the differences between the predicted and target face two-dimensional feature points and texture mapping information, or the number of iterations, meets the preset requirement.
Note that the iterative adjustment may use gradient-based optimization, particle swarm optimization and the like, which this application does not restrict (a gradient-based sketch follows). The preset requirement for the difference may be that its value is smaller than or equal to a preset threshold, which may be set to 0, 0.01 or another value. The preset requirement for the number of iterations may be that it is smaller than a preset count, for example 5 or 7. If the parameter set determined when at least one of the difference or the iteration count meets its requirement is (identity parameters 1, face pose parameters 1, expression parameters 1), then the predicted three-dimensional face model is determined by (identity parameters 1, face pose parameters 1, expression parameters 1).
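A gradient-based sketch of the alternating analysis-by-synthesis loop described above. The `synthesize` callable, assumed to return the feature points and texture-mapping information of the mesh generated from the current parameters and to be differentiable, is a stand-in the patent does not specify, as are the round/step counts and learning rate.

```python
# Hypothetical alternating optimization: freeze identity while adjusting pose +
# expression, then freeze pose + expression while adjusting identity, repeating.
import torch

def fit_face(id_p, pose_p, expr_p, target_lmk, target_uv, synthesize,
             rounds: int = 5, steps: int = 50, lr: float = 1e-2):
    for _ in range(rounds):
        # First phase adjusts (pose, expression); second phase adjusts identity.
        for active in ([pose_p, expr_p], [id_p]):
            opt = torch.optim.Adam(active, lr=lr)  # only `active` is updated
            for _ in range(steps):
                opt.zero_grad()
                pred_lmk, pred_uv = synthesize(id_p, pose_p, expr_p)
                loss = ((pred_lmk - target_lmk) ** 2).mean() \
                     + ((pred_uv - target_uv) ** 2).mean()
                loss.backward()
                opt.step()
    return id_p, pose_p, expr_p
```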
In practice, the reconstructed three-dimensional face model can take many possible forms, so a certain degree of ambiguity may arise; for example, the reconstructed model may not be a natural, lifelike face. On this basis, in the embodiments of this application, a prior probability distribution result and a prior probability target value can also be obtained for at least one of the identity, face pose and expression parameters; comparing the two prevents the prior probability distribution result from leaving a reasonable range. The prior probability target value can be determined from a large amount of genuinely captured face data, so the ambiguity of the reconstructed three-dimensional face model can be effectively reduced.
Specifically, in one embodiment of this application, as shown in FIG. 5, iteratively adjusting the other two or one parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information until at least one of the differences or the number of iterations meets a preset requirement may include:
S501: obtaining a prior probability distribution result and a prior probability target value for at least one of the identity parameters, the face pose parameters and the expression parameters;
S503: iteratively adjusting the other two or one parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, together with the difference between the prior probability distribution result of at least one of the identity, face pose and expression parameters and the prior probability target value, until at least one of the differences or the number of iterations meets a preset requirement.
In the embodiments of this application, taking the difference between the prior probability distribution result of at least one of the identity, face pose and expression parameters and the prior probability target value as an additional convergence condition for the three-dimensional face model effectively reduces the ambiguity of the reconstruction; one way such a prior term could look is sketched below.
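As one assumed realization of the prior term, the parameters can be scored under a Gaussian prior fitted to captured face data, and the gap between that score and the prior probability target value added to the fitting loss; the Gaussian form, `mu`, `cov_inv` and the one-sided penalty are all illustrative choices, not taken from the patent.

```python
# Hypothetical prior penalty: deviation of the (negative-log) prior probability
# of the parameters from a target value, penalized only when implausible.
import torch

def prior_penalty(theta, mu, cov_inv, target_nll):
    d = theta - mu
    nll = 0.5 * d @ cov_inv @ d               # negative log prior (up to a constant)
    return (nll - target_nll).clamp(min=0.0)  # zero while within the plausible range
```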
In many application scenarios of real-time three-dimensional face model reconstruction, such as livestreaming or filming, only one person, that is, one and the same face, is filmed over a period of time. For the same face, the identity parameters are fixed, so when reconstructing three-dimensional face models for subsequent single-camera face images, the identity parameters of single-camera face image 1 can be reused and only the face pose and expression parameters optimized, simplifying the optimization and improving reconstruction efficiency.
For this scenario of continuously filming the same face over a long period, the stability and accuracy of the identity parameters can be improved by jointly optimizing multiple single-camera face images of that user, which also improves reconstruction efficiency. In one embodiment, multiple single-camera face images of the same user can be acquired, for example 20 frames of different expressions captured in real time. The same alternating strategy of "fix identity, optimize face pose and expression" and "fix face pose and expression, optimize identity" can be adopted. In this example, assume that N single-camera face images take part in the joint optimization and that the N images belong to the same face.
Further, based on the N single-camera face images, the identity parameters or the face pose and expression parameters are alternately fixed while the other group is adjusted, and N three-dimensional face models sharing the same identity parameters are generated by joint optimization, such that each of the N models' determined face two-dimensional feature points match the corresponding target feature points and its determined texture mapping information matches the corresponding target texture mapping information. As shown in FIG. 6, one concrete embodiment includes:
S601: alternately fixing the identity parameters or the face pose and expression parameters while adjusting the face pose and expression parameters or the identity parameters, to generate N predicted three-dimensional face models, wherein, when the face pose and expression parameters are fixed and the identity parameters adjusted, the identity parameters of the N predicted three-dimensional face models are jointly optimized;
S603: projecting each of the N predicted three-dimensional face models into the corresponding single-camera face image to obtain predicted face two-dimensional feature points and predicted texture mapping information;
S605: iteratively adjusting the face pose parameters, the expression parameters or the identity parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, until at least one of the differences or the number of iterations meets a preset requirement.
In the embodiments of this application, since the identity parameters of the same face are identical, the identity parameters of the N predicted three-dimensional face models are jointly optimized when the face pose and expression parameters are fixed. The technical solution of the above embodiment thus not only retains the fast convergence and high reconstruction efficiency of alternating optimization, but also exploits the fact that the same face shares identity parameters to jointly optimize multiple single-camera face images; one round of optimization reconstructs multiple three-dimensional face models, greatly improving reconstruction efficiency.
In the embodiments of this application, after the identity parameters of the N predicted three-dimensional face models have been jointly optimized, reconstruction for subsequent single-camera face images uses and fixes those jointly optimized identity parameters and adjusts the face pose and expression parameters until at least one of the differences or the number of iterations meets the preset requirement.
The above embodiment is illustrated with a concrete example. First, the identity parameters can be fixed to optimize the face pose and expression parameters. Specifically, using the implementations of S101 and S103, the target face two-dimensional feature points and target texture mapping information of the N single-camera face images are determined. Then N initial three-dimensional face models can be obtained; for how, refer to the embodiments above, without restriction here. Projecting the N initial models into their corresponding single-camera face images yields N sets of first predicted face two-dimensional feature points and first predicted texture mapping information, and the N differences between the first predicted feature points and their corresponding targets, as well as the N differences between the first predicted texture mappings and their corresponding targets, are determined. Based on these 2N differences, the face pose and expression parameters of the N models are adjusted, giving N parameter sets (identity parameters 1, face pose parameters 1, expression parameters 1), (identity parameters 1, face pose parameters 2, expression parameters 2), ..., (identity parameters 1, face pose parameters N, expression parameters N), which determine N first predicted three-dimensional face models. Next, the face pose and expression parameters can be fixed to optimize the identity parameters. Specifically, projecting the N first predicted models into their corresponding single-camera face images yields N second predicted feature point sets and N second predicted texture mappings; their differences from the targets are Δ1, Δ2, ..., ΔN for the feature points and Δ-1, Δ-2, ..., Δ-N for the texture mappings, with sums ΣΔ1 = Δ1 + Δ2 + ... + ΔN and ΣΔ2 = Δ-1 + Δ-2 + ... + Δ-N. Based on ΣΔ1 and ΣΔ2, the identity parameters of the N models are adjusted, giving N parameter sets (identity parameters X, face pose parameters 1, expression parameters 1), (identity parameters X, face pose parameters 2, expression parameters 2), ..., (identity parameters X, face pose parameters N, expression parameters N), which determine the N second predicted three-dimensional face models. The face pose, expression or identity parameters are adjusted by alternating iterations until at least one of the differences between the predicted and target feature points and texture mappings, or the number of iterations, meets the preset requirement.
Further, after identity parameters X are obtained, reconstruction of subsequently acquired single-camera face images can reuse the identity parameters X obtained from the above joint optimization and optimize only the face pose and expression parameters, simplifying the optimization and improving reconstruction efficiency; a joint-optimization sketch is given below.
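A joint-optimization sketch under the same assumptions as the single-frame loop above: pose and expression tensors are per-frame, while one shared identity tensor is updated against the summed differences of all N frames (the ΣΔ terms above).

```python
# Hypothetical joint optimization over N frames of the same face with one
# shared identity parameter tensor; `synthesize` is the same stand-in as above.
import torch

def frame_loss(pred, target):
    (pred_lmk, pred_uv), (t_lmk, t_uv) = pred, target
    return ((pred_lmk - t_lmk) ** 2).mean() + ((pred_uv - t_uv) ** 2).mean()

def fit_shared_identity(id_p, poses, exprs, targets, synthesize,
                        rounds: int = 5, steps: int = 50, lr: float = 1e-2):
    N = len(poses)
    per_frame = [p for pair in zip(poses, exprs) for p in pair]
    for _ in range(rounds):
        # Alternate: per-frame pose/expression phase, then shared identity phase.
        for active in (per_frame, [id_p]):
            opt = torch.optim.Adam(active, lr=lr)
            for _ in range(steps):
                opt.zero_grad()
                loss = sum(frame_loss(synthesize(id_p, poses[i], exprs[i]),
                                      targets[i]) for i in range(N))
                loss.backward()   # identity phase sees the sum over all N frames
                opt.step()
    return id_p, poses, exprs
```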
Note that the iterative adjustment may again use gradient-based optimization, particle swarm optimization and the like, which this application does not restrict. The preset requirement for the difference may be that its value is smaller than or equal to a preset threshold, such as 0 or 0.01; the preset requirement for the iteration count may be that it is smaller than a preset count, such as 5 or 7.
In the embodiments of this application, prior probabilities can also be used to constrain the identity, face pose and expression parameters in the scenario of jointly optimizing N single-camera face images, making the reconstructed three-dimensional face models more lifelike.
Existing single-camera capture techniques still suffer from poor accuracy and an inability to capture the eyeball state, yet capturing the eyeball state is decisive for restoring the realism of facial expressions. On this basis, a three-dimensional eyeball model including gaze information can be acquired, and the three-dimensional face model and the three-dimensional eyeball model can then be combined into a new three-dimensional face model. In this way, a face model with eyeball state can be captured, which is more lifelike.
After the reconstructed three-dimensional face model is obtained by the method provided by the embodiments of the present invention, a three-dimensional eyeball model including gaze can be obtained. Methods for building the three-dimensional eyeball model may include, without limitation: infrared-device-based gaze capture, where the user wears designated infrared glasses or installs a specific infrared device and the gaze state is determined by comparing the strength of reflected infrared light so as to reconstruct the eyeball; and single-camera gaze capture, which uses an analysis-by-synthesis approach, optimizing by comparing the synthesized gaze with the gaze observed in the picture to obtain the final gaze. The specific method is not limited here; combining the two meshes can be as simple as the sketch below.
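Purely as an illustration of the final combination step, concatenating the face mesh and eyeball mesh vertex/face arrays (with re-indexed eyeball faces) yields a single model; real pipelines would also merge UVs and materials, which is omitted here.

```python
# Hypothetical mesh combination: stack vertices, offset the eyeball face indices.
import numpy as np

def combine_meshes(face_v, face_f, eye_v, eye_f):
    verts = np.vstack([face_v, eye_v])                 # (Nf + Ne, 3) vertices
    faces = np.vstack([face_f, eye_f + len(face_v)])   # shift eyeball indices
    return verts, faces
```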
In the embodiments of this application, once the three-dimensional face model is determined, a vivid and lifelike face image can be obtained from it. For example, in a livestreaming scenario, the three-dimensional face model of a backstage actor can be rendered onto an animated character, producing a vivid animated-character livestream; in a gaming scenario, the player's three-dimensional face model can be rendered onto a game character, producing vivid game scenes. Of course, it can also be used in animation production, film production and many other application scenarios, which this application does not restrict.
The three-dimensional face model reconstruction method provided by this application can run in both an offline mode and a real-time mode. The offline mode reconstructs three-dimensional face models from offline video without needing to output the models immediately, and can be used in post-production of animated film and television. The real-time mode can run in fields requiring real-time interaction with users, such as interactive games and livestreaming; with GPU acceleration, real-time applications can run in real time (that is, the three-dimensional face model is output as soon as a picture is obtained, with a delay that is hardly perceptible to the user). Having both an offline mode and a real-time mode allows the method to be applied much more widely.
The three-dimensional face model reconstruction method provided by this application can reconstruct a three-dimensional face model from a single-camera face image, leveraging the low cost, easy installation and user-friendliness of single-camera image acquisition. On this basis, the ease of collecting single-camera face images not only reduces the cost of building the face information prediction model, but also makes reconstructing the three-dimensional face model faster and simpler. During reconstruction, the face two-dimensional feature points and texture mapping information output by the face information prediction model effectively improve the accuracy and robustness of model reconstruction. Reconstructing the three-dimensional face model from identity parameters, face pose parameters and expression parameters provides an accurate and reliable technical solution for single-camera virtual livestreaming, single-camera intelligent interaction, face recognition, criminal investigation surveillance, film and games, expression analysis and other fields.
Corresponding to the above three-dimensional face model reconstruction method, as shown in FIG. 7, this application further provides an electronic device, including a processor and a memory for storing processor-executable instructions; when executing the instructions, the processor can implement the three-dimensional face model reconstruction method of any of the above embodiments.
In another aspect, this application further provides a three-dimensional face model reconstruction apparatus. As shown in FIG. 8, the apparatus 800 may include:
an acquisition module 801, configured to acquire a single-camera face image;
an information prediction module 803, configured to input the single-camera face image into a face information prediction model, which outputs target face two-dimensional feature points and target texture mapping information of the single-camera face image;
a model determination module 805, configured to determine a three-dimensional face model using identity parameters, face pose parameters and expression parameters, such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
Optionally, in one embodiment of this application, the acquisition module includes:
an image acquisition submodule, configured to acquire a single-camera image that contains a face image;
a face detection submodule, configured to perform face detection on the single-camera image and crop the single-camera face image out of the single-camera image.
Optionally, in one embodiment of this application, the face information prediction model is trained using the following submodules:
a sample acquisition submodule, configured to acquire multiple single-camera face sample images annotated with face two-dimensional feature points and texture mapping information;
a model construction submodule, configured to construct a face information prediction model provided with model parameters;
a prediction result generation submodule, configured to input the single-camera face sample images into the face information prediction model to generate prediction results, the prediction results including predicted face two-dimensional feature points and texture mapping information;
an iterative adjustment submodule, configured to iteratively adjust the model parameters based on the differences between the prediction results and the annotated face two-dimensional feature points and texture mapping information, until the differences meet a preset requirement.
Optionally, in one embodiment of this application, the single-camera face sample images are acquired by the following submodules:
an image acquisition submodule, configured to capture multiple single-camera images of the same face simultaneously from different angles with multiple cameras;
a model reconstruction submodule, configured to reconstruct a three-dimensional face model of the face from the multiple single-camera images;
a face information acquisition submodule, configured to project the three-dimensional face model of the face into each of the multiple single-camera images to obtain the face two-dimensional feature points and texture mapping information in each image;
an image segmentation submodule, configured to segment a face image out of each of the multiple single-camera images according to the face two-dimensional feature points and/or the texture mapping information, and to use the multiple face images as the single-camera face sample images for training the face information prediction model.
Optionally, in one embodiment of this application, the model determination module includes:
an alternating adjustment submodule, configured to alternately fix one or two of the identity, face pose and expression parameters while adjusting the other two or one, to generate a three-dimensional face model whose determined face two-dimensional feature points match the target face two-dimensional feature points and whose determined texture mapping information matches the target texture mapping information.
Optionally, in one embodiment of this application, the alternating adjustment submodule includes:
a prediction model generation unit, configured to alternately fix one or two of the identity, face pose and expression parameters while adjusting the other two or one, to generate a predicted three-dimensional face model;
a face information acquisition unit, configured to project the predicted three-dimensional face model into the single-camera face image to obtain predicted face two-dimensional feature points and predicted texture mapping information;
an iterative adjustment unit, configured to iteratively adjust the other two or one parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, until at least one of the differences or the number of iterations meets a preset requirement.
Optionally, in one embodiment of this application, the iterative adjustment unit includes:
a prior result acquisition subunit, configured to obtain a prior probability distribution result and a prior probability target value for at least one of the identity, face pose and expression parameters;
an iterative adjustment subunit, configured to iteratively adjust the other two or one parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, together with the difference between the prior probability distribution result of at least one of the identity, face pose and expression parameters and the prior probability target value, until at least one of the differences or the number of iterations meets a preset requirement.
Optionally, in one embodiment of this application, when the number N of single-camera face images is greater than or equal to 2 and the N single-camera face images belong to the same face, the model determination module includes:
a multi-model determination submodule, configured to, based on the N single-camera face images, alternately fix the identity parameters or the face pose and expression parameters while adjusting the face pose and expression parameters or the identity parameters, and jointly optimize to generate N three-dimensional face models sharing the same identity parameters, such that each of the N models' determined face two-dimensional feature points match the corresponding target feature points and its determined texture mapping information matches the corresponding target texture mapping information.
Optionally, in one embodiment of this application, the multi-model determination submodule includes:
a prediction model generation unit, configured to alternately fix the identity parameters or the face pose and expression parameters while adjusting the face pose and expression parameters or the identity parameters, to generate N predicted three-dimensional face models, wherein, when the face pose and expression parameters are fixed and the identity parameters adjusted, the identity parameters of the N predicted three-dimensional face models are jointly optimized;
a predicted face information acquisition unit, configured to project each of the N predicted three-dimensional face models into the corresponding single-camera face image to obtain predicted face two-dimensional feature points and predicted texture mapping information;
an iterative adjustment unit, configured to iteratively adjust the face pose parameters, the expression parameters or the identity parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, until at least one of the differences or the number of iterations meets a preset requirement.
Optionally, in one embodiment of this application, the multi-model determination submodule further includes:
an optimization adjustment unit, configured to, when reconstructing three-dimensional faces for subsequent single-camera face images, use and fix the jointly optimized identity parameters of the N predicted three-dimensional face models and adjust the face pose and expression parameters until at least one of the differences or the number of iterations meets a preset requirement.
Optionally, in one embodiment of this application, the three-dimensional face model includes a three-dimensional model composed of a preset number of interconnected polygon meshes, the positions of whose mesh vertices are determined by the identity parameters, the face pose parameters and the expression parameters.
Optionally, in one embodiment of this application, the apparatus further includes:
an eyeball model acquisition module, configured to acquire a three-dimensional eyeball model including gaze information;
a model combination module, configured to combine the three-dimensional face model and the three-dimensional eyeball model into a new three-dimensional face model.
In another aspect, this application further provides a computer-readable storage medium on which computer instructions are stored, the instructions implementing the steps of the method of any of the above embodiments when executed.
The computer-readable storage medium may include physical means for storing information, typically by digitizing the information and then storing it in media that work electrically, magnetically, optically or the like. The computer-readable storage medium of this embodiment may include: devices that store information electrically, such as various kinds of memory like RAM and ROM; devices that store information magnetically, such as hard disks, floppy disks, magnetic tapes, magnetic-core memory, bubble memory and USB drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other kinds of readable storage media as well, such as quantum memory and graphene memory.
In the 1990s, an improvement of a technology could clearly be distinguished as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors or switches) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized with a hardware entity module. For example, a programmable logic device (PLD), such as a field-programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. Designers program a digital system "integrated" onto a single PLD by themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated-circuit chip. Moreover, instead of manually fabricating integrated-circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the original source code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art will also understand that a hardware circuit implementing a logical method flow can easily be obtained merely by slightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by that (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller or an embedded microcontroller. Examples of controllers include, without limitation, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicone Labs C8051F320; a memory controller may also be implemented as part of the memory's control logic. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for realizing various functions can also be regarded as structures within the hardware component; or the means for realizing various functions can even be regarded as both software modules implementing a method and structures within the hardware component.
The systems, apparatuses, modules or units set forth in the above embodiments may specifically be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer; specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described with its functions divided into various units. Of course, when implementing this application, the functions of the units may be realized in one or more pieces of software and/or hardware.
Those skilled in the art will understand that embodiments of the present invention may be provided as a method, a system or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory. The memory may include non-persistent memory, random-access memory (RAM) and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact-disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic-tape or magnetic-disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article or device that includes that element.
Those skilled in the art will understand that the embodiments of this application may be provided as a method, a system or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in local and remote computer storage media including storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively simple, and for relevant parts reference may be made to the description of the method embodiment.
The above are merely embodiments of this application and are not intended to limit it. Various modifications and variations of this application are possible for those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.

Claims (15)

  1. A three-dimensional face model reconstruction method, characterized in that the method comprises:
    acquiring a single-camera face image;
    inputting the single-camera face image into a face information prediction model, which outputs target face two-dimensional feature points and target texture mapping information of the single-camera face image;
    determining a three-dimensional face model using identity parameters, face pose parameters and expression parameters, such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points, and the determined texture mapping information matches the target texture mapping information.
  2. The method according to claim 1, characterized in that acquiring the single-camera face image comprises:
    acquiring a single-camera image that contains a face image;
    performing face detection on the single-camera image, and cropping the single-camera face image out of the single-camera image.
  3. The method according to claim 1, characterized in that the face information prediction model is trained as follows:
    acquiring multiple single-camera face sample images, the single-camera face sample images being annotated with face two-dimensional feature points and texture mapping information;
    constructing a face information prediction model provided with model parameters;
    inputting the single-camera face sample images into the face information prediction model to generate prediction results, the prediction results comprising predicted face two-dimensional feature points and texture mapping information;
    iteratively adjusting the model parameters based on the differences between the prediction results and the annotated face two-dimensional feature points and texture mapping information, until the differences meet a preset requirement.
  4. The method according to claim 3, characterized in that the single-camera face sample images are acquired as follows:
    capturing multiple single-camera images of the same face simultaneously from different angles with multiple cameras;
    reconstructing a three-dimensional face model of the face from the multiple single-camera images;
    projecting the three-dimensional face model of the face into each of the multiple single-camera images to obtain the face two-dimensional feature points and texture mapping information in each of the multiple single-camera images;
    segmenting a face image out of each of the multiple single-camera images according to the face two-dimensional feature points and/or the texture mapping information, and using the multiple face images as the single-camera face sample images for training the face information prediction model.
  5. The method according to claim 1, characterized in that determining the three-dimensional face model using identity parameters, face pose parameters and expression parameters such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information comprises:
    alternately fixing one or two of the identity parameters, face pose parameters and expression parameters while adjusting the other two or one, to generate a three-dimensional face model such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  6. The method according to claim 5, characterized in that alternately fixing one or two of the identity parameters, face pose parameters and expression parameters while adjusting the other two or one to generate the three-dimensional face model comprises:
    alternately fixing one or two of the identity parameters, face pose parameters and expression parameters while adjusting the other two or one, to generate a predicted three-dimensional face model;
    projecting the predicted three-dimensional face model into the single-camera face image to obtain predicted face two-dimensional feature points and predicted texture mapping information;
    iteratively adjusting the other two or one parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, until at least one of the differences or the number of iterations meets a preset requirement.
  7. The method according to claim 6, characterized in that iteratively adjusting the other two or one parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information until at least one of the differences or the number of iterations meets a preset requirement comprises:
    obtaining a prior probability distribution result and a prior probability target value for at least one of the identity parameters, the face pose parameters and the expression parameters;
    iteratively adjusting the other two or one parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, together with the difference between the prior probability distribution result of at least one of the identity parameters, face pose parameters and expression parameters and the prior probability target value, until at least one of the differences or the number of iterations meets a preset requirement.
  8. The method according to claim 1, characterized in that, when the number N of single-camera face images is greater than or equal to 2 and the N single-camera face images belong to the same face, determining the three-dimensional face model using identity parameters, face pose parameters and expression parameters such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information comprises:
    based on the N single-camera face images, alternately fixing either the identity parameters or the face pose and expression parameters while adjusting the face pose and expression parameters or the identity parameters, and jointly optimizing to generate N three-dimensional face models sharing the same identity parameters, such that the face two-dimensional feature points determined by each of the N three-dimensional face models match the corresponding target face two-dimensional feature points and the determined texture mapping information matches the corresponding target texture mapping information.
  9. The method according to claim 8, characterized in that alternately fixing either the identity parameters or the face pose and expression parameters while adjusting the face pose and expression parameters or the identity parameters, and jointly optimizing to generate N three-dimensional face models sharing the same identity parameters comprises:
    alternately fixing either the identity parameters or the face pose and expression parameters while adjusting the face pose and expression parameters or the identity parameters, to generate N predicted three-dimensional face models, wherein, when the face pose and expression parameters are fixed and the identity parameters are adjusted, the identity parameters of the N predicted three-dimensional face models are jointly optimized;
    projecting each of the N predicted three-dimensional face models into the corresponding single-camera face image to obtain predicted face two-dimensional feature points and predicted texture mapping information;
    iteratively adjusting the face pose parameters, the expression parameters or the identity parameters based on the differences between the predicted and target face two-dimensional feature points and between the predicted and target texture mapping information, until at least one of the differences or the number of iterations meets a preset requirement.
  10. The method according to claim 9, characterized in that, after jointly optimizing the identity parameters of the N predicted three-dimensional face models, the method further comprises:
    when reconstructing three-dimensional faces for subsequent single-camera face images, using and fixing the jointly optimized identity parameters of the N predicted three-dimensional face models, and adjusting the face pose parameters and the expression parameters until at least one of the differences or the number of iterations meets a preset requirement.
  11. The method according to claim 1, characterized in that the three-dimensional face model comprises a three-dimensional model composed of a preset number of interconnected polygon meshes, the positions of whose mesh vertices are determined by the identity parameters, the face pose parameters and the expression parameters.
  12. The method according to claim 1, characterized in that the method further comprises:
    acquiring a three-dimensional eyeball model, the three-dimensional eyeball model including gaze information;
    combining the three-dimensional face model and the three-dimensional eyeball model into a new three-dimensional face model.
  13. A three-dimensional face model reconstruction apparatus, characterized by comprising:
    an acquisition module, configured to acquire a single-camera face image;
    an information prediction module, configured to input the single-camera face image into a face information prediction model, which outputs target face two-dimensional feature points and target texture mapping information of the single-camera face image;
    a model determination module, configured to determine a three-dimensional face model using identity parameters, face pose parameters and expression parameters, such that the face two-dimensional feature points determined by the three-dimensional face model match the target face two-dimensional feature points and the determined texture mapping information matches the target texture mapping information.
  14. An electronic device, characterized by comprising a processor and a memory for storing processor-executable instructions, the processor implementing the three-dimensional face model reconstruction method according to any one of claims 1-12 when executing the instructions.
  15. A non-transitory computer-readable storage medium, characterized in that, when instructions in the storage medium are executed by a processor, the processor is enabled to perform the three-dimensional face model reconstruction method according to any one of claims 1-12.
PCT/CN2022/070257 2021-01-21 2022-01-05 三维人脸模型重建方法、装置、电子设备及存储介质 WO2022156532A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110082037.3A CN112884881B (zh) 2021-01-21 2021-01-21 三维人脸模型重建方法、装置、电子设备及存储介质
CN202110082037.3 2021-01-21

Publications (1)

Publication Number Publication Date
WO2022156532A1 true WO2022156532A1 (zh) 2022-07-28

Family

ID=76051743

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070257 WO2022156532A1 (zh) 2021-01-21 2022-01-05 三维人脸模型重建方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN112884881B (zh)
WO (1) WO2022156532A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315154A (zh) * 2023-10-12 2023-12-29 北京汇畅数宇科技发展有限公司 一种可量化的人脸模型重建方法及***
CN117409466A (zh) * 2023-11-02 2024-01-16 之江实验室 一种基于多标签控制的三维动态表情生成方法及装置

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884881B (zh) * 2021-01-21 2022-09-27 魔珐(上海)信息科技有限公司 三维人脸模型重建方法、装置、电子设备及存储介质
CN113256799A (zh) * 2021-06-07 2021-08-13 广州虎牙科技有限公司 一种三维人脸模型训练方法和装置
CN113327278B (zh) * 2021-06-17 2024-01-09 北京百度网讯科技有限公司 三维人脸重建方法、装置、设备以及存储介质
CN113469091B (zh) * 2021-07-09 2022-03-25 北京的卢深视科技有限公司 人脸识别方法、训练方法、电子设备及存储介质
CN113762147B (zh) * 2021-09-06 2023-07-04 网易(杭州)网络有限公司 人脸表情迁移方法、装置、电子设备及存储介质
CN114339190B (zh) * 2021-12-29 2023-06-23 中国电信股份有限公司 通讯方法、装置、设备及存储介质
CN114898244B (zh) * 2022-04-08 2023-07-21 马上消费金融股份有限公司 一种信息处理方法、装置、计算机设备及存储介质
CN114782864B (zh) * 2022-04-08 2023-07-21 马上消费金融股份有限公司 一种信息处理方法、装置、计算机设备及存储介质
CN114783022B (zh) * 2022-04-08 2023-07-21 马上消费金融股份有限公司 一种信息处理方法、装置、计算机设备及存储介质
CN117689801B (zh) * 2022-09-02 2024-07-30 影眸科技(上海)有限公司 人脸模型的建立方法、装置及电子设备
CN115393532B (zh) * 2022-10-27 2023-03-14 科大讯飞股份有限公司 脸部绑定方法、装置、设备及存储介质
CN116503524B (zh) * 2023-04-11 2024-04-12 广州赛灵力科技有限公司 一种虚拟形象的生成方法、***、装置及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104966316A (zh) * 2015-05-22 2015-10-07 腾讯科技(深圳)有限公司 一种3d人脸重建方法、装置及服务器
CN110956691A (zh) * 2019-11-21 2020-04-03 Oppo广东移动通信有限公司 一种三维人脸重建方法、装置、设备及存储介质
CN111274944A (zh) * 2020-01-19 2020-06-12 中北大学 一种基于单张图像的三维人脸重建方法
CN112819944A (zh) * 2021-01-21 2021-05-18 魔珐(上海)信息科技有限公司 三维人体模型重建方法、装置、电子设备及存储介质
CN112884881A (zh) * 2021-01-21 2021-06-01 魔珐(上海)信息科技有限公司 三维人脸模型重建方法、装置、电子设备及存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978764B (zh) * 2014-04-10 2017-11-17 华为技术有限公司 三维人脸网格模型处理方法和设备
CN105096377B (zh) * 2014-05-14 2019-03-19 华为技术有限公司 一种图像处理方法和装置
CN108596827B (zh) * 2018-04-18 2022-06-17 太平洋未来科技(深圳)有限公司 三维人脸模型生成方法、装置及电子设备
WO2020037676A1 (zh) * 2018-08-24 2020-02-27 太平洋未来科技(深圳)有限公司 三维人脸图像生成方法、装置及电子设备
CN109191507B (zh) * 2018-08-24 2019-11-05 北京字节跳动网络技术有限公司 三维人脸图像重建方法、装置和计算机可读存储介质
CN109377544B (zh) * 2018-11-30 2022-12-23 腾讯科技(深圳)有限公司 一种人脸三维图像生成方法、装置和可读介质
CN110428491B (zh) * 2019-06-24 2021-05-04 北京大学 基于单帧图像的三维人脸重建方法、装置、设备及介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104966316A (zh) * 2015-05-22 2015-10-07 腾讯科技(深圳)有限公司 一种3d人脸重建方法、装置及服务器
CN110956691A (zh) * 2019-11-21 2020-04-03 Oppo广东移动通信有限公司 一种三维人脸重建方法、装置、设备及存储介质
CN111274944A (zh) * 2020-01-19 2020-06-12 中北大学 一种基于单张图像的三维人脸重建方法
CN112819944A (zh) * 2021-01-21 2021-05-18 魔珐(上海)信息科技有限公司 三维人体模型重建方法、装置、电子设备及存储介质
CN112884881A (zh) * 2021-01-21 2021-06-01 魔珐(上海)信息科技有限公司 三维人脸模型重建方法、装置、电子设备及存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315154A (zh) * 2023-10-12 2023-12-29 北京汇畅数宇科技发展有限公司 一种可量化的人脸模型重建方法及***
CN117409466A (zh) * 2023-11-02 2024-01-16 之江实验室 一种基于多标签控制的三维动态表情生成方法及装置

Also Published As

Publication number Publication date
CN112884881B (zh) 2022-09-27
CN112884881A (zh) 2021-06-01

Similar Documents

Publication Publication Date Title
WO2022156532A1 (zh) 三维人脸模型重建方法、装置、电子设备及存储介质
WO2022156533A1 (zh) 三维人体模型重建方法、装置、电子设备及存储介质
Kartynnik et al. Real-time facial surface geometry from monocular video on mobile GPUs
US11238606B2 (en) Method and system for performing simultaneous localization and mapping using convolutional image transformation
Baruch et al. Arkitscenes: A diverse real-world dataset for 3d indoor scene understanding using mobile rgb-d data
US11756223B2 (en) Depth-aware photo editing
CN111243093B (zh) 三维人脸网格的生成方法、装置、设备及存储介质
WO2022156640A1 (zh) 一种图像的视线矫正方法、装置、电子设备、计算机可读存储介质及计算机程序产品
JP2023545199A (ja) モデル訓練方法、人体姿勢検出方法、装置、デバイスおよび記憶媒体
JP2022503647A (ja) クロスドメイン画像変換
WO2022156626A1 (zh) 一种图像的视线矫正方法、装置、电子设备、计算机可读存储介质及计算机程序产品
WO2014117446A1 (zh) 基于单个视频摄像机的实时人脸动画方法
US20240046557A1 (en) Method, device, and non-transitory computer-readable storage medium for reconstructing a three-dimensional model
WO2023071790A1 (zh) 目标对象的姿态检测方法、装置、设备及存储介质
WO2022148248A1 (zh) 图像处理模型的训练方法、图像处理方法、装置、电子设备及计算机程序产品
Wang et al. Instance shadow detection with a single-stage detector
Baudron et al. E3d: event-based 3d shape reconstruction
US10783704B2 (en) Dense reconstruction for narrow baseline motion observations
CN116977547A (zh) 一种三维人脸重建方法、装置、电子设备和存储介质
Zhang et al. 3D Gesture Estimation from RGB Images Based on DB-InterNet
Shamalik et al. Effective and efficient approach for gesture detection in video through monocular RGB frames
Jiang et al. Complementing event streams and rgb frames for hand mesh reconstruction
WO2017173977A1 (zh) 一种移动终端目标跟踪方法、装置和移动终端
US20240161391A1 (en) Relightable neural radiance field model
WO2023172353A1 (en) Probabilistic keypoint regression with uncertainty

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22742011

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22742011

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11.12.2023)