CN114266860A - Three-dimensional face model establishing method and device, electronic equipment and storage medium - Google Patents

Three-dimensional face model establishing method and device, electronic equipment and storage medium

Info

Publication number
CN114266860A
CN114266860A
Authority
CN
China
Prior art keywords
dimensional
model
image
sample image
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111579569.4A
Other languages
Chinese (zh)
Inventor
黄开竹
杨超龙
赵伟光
叶嘉楠
闫毓垚
杨曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong Liverpool University
Original Assignee
Xian Jiaotong Liverpool University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong Liverpool University filed Critical Xian Jiaotong Liverpool University
Priority to CN202111579569.4A
Publication of CN114266860A
Legal status: Pending

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the invention discloses a method and a device for establishing a three-dimensional face model, electronic equipment and a storage medium. The method comprises the following steps: acquiring a two-dimensional sample image containing a face area, and determining sample image characteristics of the two-dimensional sample image; determining a model parameter value in a pre-established three-dimensional deformation model and a sample attitude parameter corresponding to a two-dimensional sample image based on the sample image characteristics, and generating a three-dimensional face model based on each model parameter value; inputting the three-dimensional face model and the attitude parameters into a pre-established differentiable rendering model, and generating a two-dimensional rendering image corresponding to the two-dimensional sample image; and comparing the two-dimensional sample image with the two-dimensional rendering image, adjusting the parameter value of the model based on the comparison result, and determining the three-dimensional deformation model corresponding to the adjusted parameter value of the model as the target three-dimensional face model. The technical scheme of the embodiment of the invention can improve the accuracy of the model parameter value and is beneficial to improving the recognition effect of the target three-dimensional face model.

Description

Three-dimensional face model establishing method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computer vision, in particular to a method and a device for establishing a three-dimensional face model, electronic equipment and a storage medium.
Background
In recent years, three-dimensional faces have been widely used in various fields. However, many real scenes have only two-dimensional face images, and thus a technology for reconstructing a three-dimensional face by using two-dimensional face images is an important requirement of the real scenes.
At present, a three-dimensional face reconstruction method is based on the establishment of a three-dimensional face database based on a three-dimensional deformation model (3DMM), a statistical method is utilized to represent the three-dimensional face model by parameters, the three-dimensional face model can be changed by changing the parameters, and the problem of three-dimensional face reconstruction is converted into the problem of predicting the 3DMM parameters. In the prior art, a Convolutional Neural Network (CNN) can be used to extract information of a two-dimensional face image to determine 3DMM parameters; the method comprises the steps of constructing a sample data set containing a large number of real 3D face images in advance, manually establishing labels for the 3D face images, training parameters in a convolutional neural network based on the real 3D face images and the labels in the sample data set, and determining 3DMM parameters.
However, the prior art needs to manually construct a large number of three-dimensional faces as training labels, which consumes a large amount of human and time resources, and the construction process is prone to errors, so that the 3DMM parameters cannot be accurately determined and the recognition effect of the established three-dimensional face model is affected.
Disclosure of Invention
The embodiment of the invention provides a method and a device for establishing a three-dimensional face model, electronic equipment and a storage medium, which are used for saving human resources and time resources and improving the accuracy of model parameter values and the recognition effect of a target three-dimensional face model.
In a first aspect, an embodiment of the present invention provides a method for building a three-dimensional face model, including:
acquiring a two-dimensional sample image containing a face region, and determining sample image characteristics of the two-dimensional sample image;
determining a model parameter value in a pre-established three-dimensional deformation model and a sample posture parameter corresponding to the two-dimensional sample image based on the sample image characteristics, and generating a three-dimensional face model based on each model parameter value;
inputting the three-dimensional face model and the posture parameters into a pre-established differentiable rendering model to generate a two-dimensional rendering image corresponding to the two-dimensional sample image;
and comparing the two-dimensional sample image with the two-dimensional rendering image, adjusting the parameter value of the model based on the comparison result, and determining the three-dimensional deformation model corresponding to the adjusted parameter value of the model as a target three-dimensional face model.
In a second aspect, an embodiment of the present invention further provides a device for building a three-dimensional face model, where the device includes:
a sample image feature determining module, configured to acquire a two-dimensional sample image containing a face region and determine sample image features of the two-dimensional sample image;
the three-dimensional face model generation module is used for determining a model parameter value in a pre-established three-dimensional deformation model and a sample posture parameter corresponding to the two-dimensional sample image based on the sample image characteristics, and generating a three-dimensional face model based on each model parameter value;
a rendering image generation module, configured to input the three-dimensional face model and the pose parameters into a pre-established differentiable rendering model, and generate a two-dimensional rendering image corresponding to the two-dimensional sample image;
and the model parameter adjusting module is used for comparing the two-dimensional sample image with the two-dimensional rendering image, adjusting the model parameter value based on the comparison result, and determining the three-dimensional deformation model corresponding to the adjusted model parameter value as the target three-dimensional face model.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for building a three-dimensional face model provided by any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the three-dimensional face model building method provided in any embodiment of the present invention.
The embodiment of the invention provides a three-dimensional face model building method, which comprises the steps of obtaining a two-dimensional sample image containing a face area, and determining sample image characteristics of the two-dimensional sample image; determining a model parameter value in a pre-established three-dimensional deformation model and a sample attitude parameter corresponding to a two-dimensional sample image based on the sample image characteristics, and generating a three-dimensional face model based on each model parameter value; inputting the three-dimensional face model and the attitude parameters into a pre-established differentiable rendering model, and generating a two-dimensional rendering image corresponding to the two-dimensional sample image; and comparing the two-dimensional sample image with the two-dimensional rendering image, adjusting the parameter value of the model based on the comparison result, and determining the three-dimensional deformation model corresponding to the adjusted parameter value of the model as the target three-dimensional face model. According to the embodiment of the invention, the model parameter values of the three-dimensional deformation model are determined through the two-dimensional sample image, a large number of three-dimensional face images do not need to be constructed in advance, and the human resources and the time resources are saved; and the two-dimensional sample image reflecting the real face information is compared with the two-dimensional rendering image obtained through the three-dimensional face model, and the model parameter value is adjusted, so that the accuracy of the model parameter value is improved, and the identification effect of the target three-dimensional face model is favorably improved.
In addition, the three-dimensional face model establishing device, the electronic equipment and the storage medium provided by the invention correspond to the method, and have the same beneficial effects.
Drawings
In order to illustrate the embodiments of the present invention more clearly, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a method for building a three-dimensional face model according to an embodiment of the present invention;
fig. 2 is a flowchart of another three-dimensional face model building method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a process for building a three-dimensional face model according to an embodiment of the present invention;
fig. 4 is a structural diagram of a three-dimensional face model building apparatus according to an embodiment of the present invention;
fig. 5 is a structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Example one
Fig. 1 is a flowchart of a method for building a three-dimensional face model according to an embodiment of the present invention. The method can be executed by a three-dimensional face model establishing device, the device can be realized by software and/or hardware, and the device can be configured in a terminal and/or a server to realize the three-dimensional face model establishing method in the embodiment of the invention.
As shown in fig. 1, the method of the embodiment may specifically include:
s101, obtaining a two-dimensional sample image containing a face area, and determining sample image characteristics of the two-dimensional sample image.
In a specific implementation, a two-dimensional sample image set may be constructed in advance, and two or more two-dimensional sample images each including a face region are stored in the two-dimensional sample image set. Illustratively, two-dimensional sample images at different viewing angles can be acquired for the same face. For example, for the same face information, the two-dimensional sample image may include a left-view sample image, a middle-view sample image, and a right-view sample image.
In order to better extract the sample image features in the two-dimensional sample image, image pre-processing may be performed on the two-dimensional sample image. Specifically, the third-party support library face-alignment can be used to detect the key points of the face in the two-dimensional sample image, and the two-dimensional sample image is cropped based on the position of each face key point.
Illustratively, the minimum face area containing each key point in the two-dimensional sample image may be determined, and the two-dimensional sample image may be cropped so that only the minimum face area is retained. The cropped image may be represented as I_v, where v denotes the index of the viewing angle. Illustratively, when two-dimensional sample images are acquired from three viewing angles, I_v = (I_1, I_2, I_3); the cropped image has dimensions width × height × channels = 224 × 224 × 3.
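This cropping step can be sketched as follows, assuming the 68 face key points have already been detected (e.g. by the face-alignment library); the function names and the nearest-neighbour resize stand-in are illustrative, not taken from the patent:

```python
import numpy as np

def crop_to_min_face_area(image: np.ndarray, landmarks: np.ndarray,
                          margin: int = 0) -> np.ndarray:
    """Crop `image` (H x W x 3) to the minimal box containing all landmarks (N x 2, (x, y))."""
    x_min, y_min = np.floor(landmarks.min(axis=0)).astype(int) - margin
    x_max, y_max = np.ceil(landmarks.max(axis=0)).astype(int) + margin
    h, w = image.shape[:2]
    # Clamp the box to the image bounds before slicing.
    x_min, y_min = max(x_min, 0), max(y_min, 0)
    x_max, y_max = min(x_max, w), min(y_max, h)
    return image[y_min:y_max, x_min:x_max]

def resize_nearest(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize to size x size (a stand-in for a proper resize call)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]
```

In practice each view's cropped image I_v would then be resized to 224 × 224 × 3 before feature extraction.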
Optionally, determining a sample image feature of the two-dimensional sample image includes: inputting the two-dimensional sample image into a pre-established face analysis network, and generating a two-dimensional mask image which corresponds to the two-dimensional sample image and contains face key points; and performing splicing operation on the two-dimensional sample image and the two-dimensional mask image to generate a spliced image, and inputting the spliced image into an encoding and decoding network to generate sample image characteristics.
Specifically, the cropped two-dimensional sample image may be input into a pre-established face analysis network. The face analysis network comprises a BiSeNet-based semantic segmentation model, through which a two-dimensional mask image highlighting the face key point information contained in the two-dimensional sample image can be generated; the two-dimensional mask image has dimensions width × height × channels = 224 × 224 × 1. The face analysis network better highlights the face by removing redundant elements such as hair and neck, and retains key parts such as the face, nose, lower lip, upper lip, left eyebrow, right eyebrow, left eye, right eye and mouth. The retained parts may be marked with different numbers to form the two-dimensional mask image. Common parts across two-dimensional mask images may be marked with the same number, so as to provide regional guidance for subsequent network learning and make the network pay more attention to the key regions of the human face.
For example, the number of the face key points may be 68, the face key points may be marked in the two-dimensional mask image, and the marking may be a numerical marking, for example, the face key points include a face 1, a left eyebrow 2, a right eyebrow 3, a left eye 4, a right eye 5, a nose 10, a mouth 11, an upper lip 12, a lower lip 13, and a remaining background 0.
Furthermore, the two-dimensional sample image and the two-dimensional mask image can be spliced to generate a spliced image, and the spliced image is input into the coding and decoding network. The coding and decoding network can be used for extracting the facial image characteristics in the two-dimensional sample image and the two-dimensional mask image. The coding and decoding network provided by the embodiment of the invention can be composed of three encoders and one decoder so as to realize the depth fusion of multi-view information.
Illustratively, the stitched image has dimensions width × height × channels = 224 × 224 × 4. After it is input into the encoding-decoding network, sample image features with a preset number of channels may be output. For example, when the preset number of channels is 64, the output sample image features have dimensions 224 × 224 × 64.
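The stitching operation described above amounts to a channel-wise concatenation of the image and its mask; a minimal sketch (the encoder-decoder itself is not shown):

```python
import numpy as np

def stitch_image_and_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Concatenate a 224x224x3 image and a 224x224x1 mask into a 224x224x4 network input."""
    assert image.shape[:2] == mask.shape[:2], "image and mask must share spatial dims"
    if mask.ndim == 2:              # also accept a 2-D label map
        mask = mask[..., None]
    return np.concatenate([image, mask], axis=-1)
```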
S102, determining model parameter values in a pre-established three-dimensional deformation model and sample posture parameters corresponding to a two-dimensional sample image based on sample image characteristics, and generating a three-dimensional face model based on the model parameter values.
Specifically, a three-dimensional deformation model may be established in advance, and the three-dimensional deformation model may identify a morphological state of a target object in an input image. The three-dimensional deformation model comprises three model parameter values, and the recognition effect of the three-dimensional deformation model on the two-dimensional target object image can be adjusted by changing the model parameter values. And determining a model parameter value in the three-dimensional deformation model based on the determined sample image characteristics containing the face key points, and generating a three-dimensional face model for identifying the two-dimensional face image based on the model parameter value.
Furthermore, a sample attitude parameter corresponding to the two-dimensional sample image can be determined, and the sample attitude parameter can reflect the attitude state of the face in the two-dimensional sample image. Illustratively, the sample attitude parameters include at least one of a pitch parameter, a yaw parameter, and a roll parameter.
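The pitch, yaw and roll parameters can be composed into a head rotation matrix in the standard way; this is a generic construction for illustration, not a formula taken from the patent:

```python
import numpy as np

def pose_to_rotation(pitch: float, yaw: float, roll: float) -> np.ndarray:
    """Compose a 3x3 rotation matrix from pitch (x-axis), yaw (y-axis) and roll (z-axis) angles in radians."""
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx   # apply pitch, then yaw, then roll
```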
S103, inputting the three-dimensional face model and the posture parameters into a pre-established differentiable rendering model, and generating a two-dimensional rendering image corresponding to the two-dimensional sample image.
Specifically, the differentiable rendering model may be the third-party support library PyTorch3D, which may be configured to generate a two-dimensional rendered image corresponding to the two-dimensional sample image based on the input three-dimensional face model and the pose parameters. Illustratively, the output two-dimensional rendered image has dimensions width × height × channels = 224 × 224 × 3, and the image corresponding to each channel is used for displaying face information of a different perspective.
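Independently of PyTorch3D, the geometric core of such a renderer — posing the 3-D vertices and projecting them onto the 224 × 224 image plane — can be sketched with a weak-perspective camera. This is an illustrative stand-in, not the patent's actual renderer:

```python
import numpy as np

def project_vertices(vertices: np.ndarray, rotation: np.ndarray,
                     scale: float = 1.0, translation=(0.0, 0.0),
                     image_size: int = 224) -> np.ndarray:
    """Weak-perspective projection of N x 3 mesh vertices into N x 2 pixel coordinates."""
    posed = vertices @ rotation.T                        # rotate the mesh by the pose
    xy = scale * posed[:, :2] + np.asarray(translation)  # drop depth, scale, translate
    # Map from a [-1, 1] normalised image plane to pixel coordinates.
    return (xy + 1.0) * 0.5 * (image_size - 1)
```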
S104, comparing the two-dimensional sample image with the two-dimensional rendering image, adjusting the parameter value of the model based on the comparison result, and determining the three-dimensional deformation model corresponding to the adjusted parameter value of the model as the target three-dimensional face model.
In a particular implementation, a two-dimensional sample image may be compared to a two-dimensional rendered image. For example, the two-dimensional sample image and the two-dimensional rendered image at each viewing angle may be compared, and the model parameter value may be adjusted based on the comparison result at each viewing angle.
Optionally, comparing the two-dimensional sample image with the two-dimensional rendering image, and adjusting a model parameter value based on a comparison result, including: calculating a model loss function based on the two-dimensional sample image and the two-dimensional rendering image; and determining the optimal model parameter value of the corresponding three-dimensional deformation model when the model loss function meets the preset state, and updating the model parameter value based on the optimal model parameter value.
Specifically, the model loss can be divided into three aspects, namely image loss, mask image loss and key point information loss. Thus, the model loss function can be determined by calculating the image loss function, the mask loss function, and the keypoint loss function, respectively.
Optionally, calculating a model loss function based on the two-dimensional sample image and the two-dimensional rendered image includes: determining an image loss function based on the two-dimensional sample image, the two-dimensional mask image and the two-dimensional rendered image; determining a rendering mask image corresponding to the two-dimensional rendered image, and determining a mask loss function based on the rendering mask image and the two-dimensional mask image; respectively inputting the two-dimensional sample image and the two-dimensional rendered image into a key point detection model, and determining the sample key point coordinates of each key point in the two-dimensional sample image and the rendering key point coordinates of each key point in the two-dimensional rendered image; determining a key point loss function based on the sample key point coordinates and the rendering key point coordinates; and determining the function obtained by weighted summation of the image loss function, the mask loss function and the key point loss function as the model loss function.
Specifically, the image loss function L_p may be determined based on the two-dimensional sample image, the two-dimensional mask image and the two-dimensional rendered image:

L_p = Σ_v ( Σ_{i∈P_v} M_v^i ||I_v^i − I_v'^i||_2 ) / ( Σ_{i∈P_v} M_v^i )

where v denotes the index of the viewing angle; in the embodiment of the invention two-dimensional sample images of three viewing angles are acquired, so v may be 1, 2 or 3, and those skilled in the art may determine the value of v according to the number of viewing angles actually acquired. P_v denotes the region in which the two-dimensional rendered image I_v' and the two-dimensional sample image I_v intersect in the current view; i denotes the pixel index; ||·||_2 denotes the L2 norm; and M_v denotes the two-dimensional mask image.
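Assuming the image loss is a mask-weighted per-pixel L2 photometric error over the face region (a sketch of the loss described above, for one view):

```python
import numpy as np

def image_loss(sample: np.ndarray, rendered: np.ndarray, mask: np.ndarray) -> float:
    """Mask-weighted per-pixel L2 loss between a sample image and its rendering (one view).

    sample, rendered: H x W x 3 arrays; mask: H x W array, non-zero inside the face region.
    """
    diff = np.linalg.norm(sample - rendered, axis=-1)  # per-pixel L2 norm over channels
    weights = mask.astype(float)
    return float((weights * diff).sum() / max(weights.sum(), 1e-8))
```

The total image loss would then be the sum of this quantity over the views v.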
Further, the two-dimensional rendered image can be input into the face analysis network to obtain a rendering mask image corresponding to the two-dimensional rendered image, and a mask loss function can be determined based on the rendering mask image and the two-dimensional mask image. The mask loss function L_m is given by:

L_m = Σ_v Σ_{i∈Q_v} ||M_v^i − M_v'^i||_2

where v denotes the index of the viewing angle; Q_v denotes the region in which the two-dimensional mask image M_v and the rendering mask image M_v' intersect in the current view; i denotes the pixel index; and ||·||_2 denotes the L2 norm.
Further, the two-dimensional sample image and the two-dimensional rendering image are respectively input into the key point detection model, and the sample key point coordinates of each key point in the two-dimensional sample image and the rendering key point coordinates of each key point in the two-dimensional rendering image are determined; a keypoint loss function is determined based on the sample keypoint coordinates and the rendering keypoint coordinates.
For example, the key point detection model may detect 68 key points of the human face. After the two-dimensional sample image is input into the key point detection model, the sample key point coordinates q_n^v can be obtained; after the two-dimensional rendered image is input into the key point detection model, the rendering key point coordinates q_n'^v can be obtained, where n denotes the serial number of the key point and v denotes the index of the viewing angle. The key point loss function L_l is given by:

L_l = Σ_v (1/N) Σ_{n=1}^{N} w_n ||q_n^v − q_n'^v||_2, with N = 68
where w_n denotes the weight parameter of each key point; the weights of the key points at the nose and the mouth corners may be set to 20 and those of the remaining key points to 1, so as to highlight the influence of the nose and the mouth corners on three-dimensional face recognition. The function obtained when the image loss function, the mask loss function and the key point loss function are weighted and summed may be determined as the model loss function.
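A sketch of this weighted key point loss for one view, with the emphasised points (e.g. nose and mouth corners) weighted 20 and the rest weighted 1; the index convention is an illustrative assumption:

```python
import numpy as np

def keypoint_loss(sample_kp: np.ndarray, rendered_kp: np.ndarray,
                  emphasised: tuple = ()) -> float:
    """Weighted mean L2 distance between N x 2 sample and rendered key point coordinates.

    `emphasised` lists the indices (e.g. nose and mouth corners) whose weight is 20;
    all other key points get weight 1.
    """
    n = sample_kp.shape[0]
    w = np.ones(n)
    w[list(emphasised)] = 20.0
    dists = np.linalg.norm(sample_kp - rendered_kp, axis=-1)
    return float((w * dists).sum() / n)
```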
Further, in order to suppress the generation of distorted faces, a regularization loss function L_reg of the model may also be determined:

L_reg = w_α||α||_2 + w_β||β||_2 + w_γ||γ||_2

where α, β and γ are the three model parameter values in the three-dimensional deformation model, and w_α, w_β and w_γ are the weights of the respective model parameter values, which may be set as w_α = 1, w_β = 0.8, w_γ = 0.017.
The model loss function may be calculated based on the image loss function, the mask loss function, the key point loss function and the regularization loss function as follows:

L_all = w_p L_p + w_m L_m + w_l L_l + w_reg L_reg

where L_all is the model loss function, and w_p, w_m, w_l and w_reg are the weights of the image loss function, the mask loss function, the key point loss function and the regularization loss function respectively, which may be set as: w_p = 24, w_m = 3, w_l = 1, w_reg = 0.0003.
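The weighted sum with the stated weights can be written down directly (the individual loss values passed in are placeholders):

```python
def total_loss(l_p: float, l_m: float, l_l: float, l_reg: float,
               w_p: float = 24.0, w_m: float = 3.0, w_l: float = 1.0,
               w_reg: float = 0.0003) -> float:
    """Weighted sum L_all = w_p*L_p + w_m*L_m + w_l*L_l + w_reg*L_reg."""
    return w_p * l_p + w_m * l_m + w_l * l_l + w_reg * l_reg
```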
Optionally, determining the optimal model parameter value of the corresponding three-dimensional deformation model when the model loss function satisfies the preset state, and updating the model parameter value based on the optimal model parameter value, includes: determining a corresponding model loss value according to the model loss function and, if the model loss value does not satisfy the preset state, updating the model parameter value based on the model loss value; and acquiring a two-dimensional sample image again from the pre-established two-dimensional sample image set, re-determining the model loss value by gradient descent based on the newly acquired two-dimensional sample image and the updated model parameter value, and iterating the computation until the model loss value satisfies the preset state or the number of iterations reaches a preset threshold.
Specifically, it may be determined whether the determined model loss value satisfies a preset state. For example, the preset state may include whether the model loss value reaches a threshold range, and if the model loss value is within the threshold range, the model loss value satisfies the preset state. When the model loss value does not meet the preset state, the model parameter value can be updated based on the model loss value so as to improve the recognition accuracy of the generated three-dimensional face model and reduce the model loss.
Specifically, the two-dimensional sample image can be acquired again in the pre-established two-dimensional sample image set, the steps are repeated based on the two-dimensional sample image, the three-dimensional face model is generated again, the three-dimensional face model and the posture parameters are input into the pre-established differentiable rendering model, and the two-dimensional rendering image corresponding to the two-dimensional sample image is generated; and comparing the two-dimensional sample image with the two-dimensional rendering image, re-determining a model loss value in a gradient descending manner, performing iterative computation until the model loss value meets a preset state or the iteration frequency reaches a preset threshold value, stopping the iterative process, and determining the three-dimensional face model generated at the last time as the target three-dimensional face model.
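The iterative fitting described above — recompute the loss, update the parameters by gradient descent, and stop when the loss enters the threshold range or the iteration count reaches the cap — can be sketched abstractly. The loss and gradient functions below are toy stand-ins, not the patent's actual rendering pipeline:

```python
def fit_parameters(params, loss_fn, grad_fn, lr=0.01,
                   loss_threshold=1e-4, max_iters=1000):
    """Generic gradient-descent fitting with the two stopping rules described above."""
    for step in range(max_iters):
        loss = loss_fn(params)
        if loss <= loss_threshold:          # loss satisfies the preset state
            break
        grad = grad_fn(params)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params, loss_fn(params)

# Toy example: minimise (x - 3)^2 from a starting value of 0.
params, final = fit_parameters([0.0],
                               loss_fn=lambda p: (p[0] - 3.0) ** 2,
                               grad_fn=lambda p: [2.0 * (p[0] - 3.0)],
                               lr=0.1)
```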
According to the embodiment of the invention, the model parameter values of the three-dimensional deformation model are determined through the two-dimensional sample image, a large number of three-dimensional face images do not need to be constructed in advance, and the human resources and the time resources are saved; and the two-dimensional sample image reflecting the real face information is compared with the two-dimensional rendering image obtained through the three-dimensional face model, the model loss is calculated, the model parameter value is adjusted based on the model loss, the accuracy of the model parameter value is improved, and the identification effect of the target three-dimensional face model is favorably improved.
Example two
Fig. 2 is a flowchart of another three-dimensional face model building method according to an embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. Optionally, determining a model parameter value in a pre-established three-dimensional deformation model and a sample posture parameter corresponding to a two-dimensional sample image based on the sample image characteristics includes: and inputting the sample image characteristics into a pre-established parameter regression network, and determining a model parameter value and a sample attitude parameter in the three-dimensional deformation model based on an output result of the parameter regression network. The same or corresponding terms as those in the above embodiments are not explained in detail herein.
As shown in fig. 2, the method of the embodiment may specifically include:
s201, acquiring a two-dimensional sample image containing a face area, and determining sample image characteristics of the two-dimensional sample image.
S202, inputting the sample image characteristics into a pre-established parameter regression network, and determining model parameter values and sample attitude parameters in the three-dimensional deformation model based on the output result of the parameter regression network.
Specifically, the parameter regression network may include a residual neural network determined based on a residual network, and may be generated by changing the convolution operators of the residual network to inner convolution operators. Illustratively, the network depth of the residual network is 50.
Optionally, the sample attitude parameters include a pitch parameter, a yaw parameter, and a roll parameter; the method comprises the following steps of inputting sample image features into a pre-established parameter regression network, and determining model parameter values and sample attitude parameters in a three-dimensional deformation model based on an output result of the parameter regression network, wherein the method comprises the following steps: and inputting the sample image characteristics into a parameter regression network constructed based on a residual error neural network, and determining a pitching parameter, a yawing parameter, a rolling parameter and a model parameter value based on an output result. Specifically, the sample image features of each view may be input into a parameter regression network, and the pitch parameter, yaw parameter, roll parameter, and model parameter value corresponding to each view may be output.
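Per view, the regression head's flat output vector can be split into the three pose parameters and the three model parameter groups. The group dimensionalities and layout below are illustrative assumptions, not values fixed by the patent:

```python
import numpy as np

def split_regression_output(output: np.ndarray, n_alpha=80, n_beta=64, n_gamma=80):
    """Split a flat regression output into (pitch, yaw, roll) and the 3DMM parameter groups.

    Layout assumed: [pitch, yaw, roll, alpha (n_alpha), beta (n_beta), gamma (n_gamma)].
    """
    pitch, yaw, roll = output[:3]
    cursor = 3
    alpha = output[cursor:cursor + n_alpha]; cursor += n_alpha
    beta = output[cursor:cursor + n_beta]; cursor += n_beta
    gamma = output[cursor:cursor + n_gamma]
    return (pitch, yaw, roll), (alpha, beta, gamma)
```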
And S203, generating a three-dimensional face model based on the model parameter values.
And S204, inputting the three-dimensional face model and the posture parameters into a pre-established differentiable rendering model, and generating a two-dimensional rendering image corresponding to the two-dimensional sample image.
S205, comparing the two-dimensional sample image with the two-dimensional rendering image, adjusting the parameter value of the model based on the comparison result, and determining the three-dimensional deformation model corresponding to the adjusted parameter value of the model as the target three-dimensional face model.
In the embodiment of the present invention, considering that the face information reflected by face images differs under different view-angle conditions and different poses, the pitch, yaw and roll parameters are determined separately for each view angle. The face information in the two-dimensional sample image can thus be reflected more faithfully, which helps improve the accuracy of the model parameter values and further improves the recognition effect of the target three-dimensional face model.
EXAMPLE III
The three-dimensional face model building method has been described in detail in the embodiments above. To make the technical solution of the method clearer to those skilled in the art, a specific application scenario is given below.
A two-dimensional sample image set can be established in advance. The set comprises two or more two-dimensional sample images; different two-dimensional sample images can show the face information of different target objects, and for the face information of the same target object, two-dimensional sample images under three view angles can be stored, namely a left view angle, a right view angle and a middle view angle. The training of the target three-dimensional face model is completed on this two-dimensional sample image set. Fig. 3 is a schematic diagram of a three-dimensional face model building process provided by the embodiment of the present invention; as shown in fig. 3, the training steps are as follows:
step one, acquiring two-dimensional sample images of the same target object in a two-dimensional sample image set under 3 visual angles, and respectively carrying out image preprocessing. Specifically, a third-party support library face-alignment is used for detecting key points of a human face of a person image in a two-dimensional sample image, the area of the human face is cut according to the key points of the human face, the cut image is represented by 224 × 3 wide-pass channels, the cut image is updated to a two-dimensional sample image I, v represents the labels of visual angles, the number of the visual angles of the person is 3 in the embodiment of the invention, and the two-dimensional sample image I is compared with the two-dimensional sample image Iv=(I1,I2,I3)。
Step two, inputting the obtained two-dimensional sample images Iv into a face analysis network to obtain two-dimensional mask images Mv = (M1, M2, M3), and stitching Iv and Mv into new image representations IMv = (IM1, IM2, IM3).
Step three, inputting the obtained IMv into an encoding and decoding network, so as to obtain a multi-view fused sample image feature F.
Specifically, the encoding and decoding network is a novel network structure comprising three encoding ends and one decoding end. The two-dimensional sample image and the two-dimensional mask image of each view angle are stitched respectively; the stitched image of each of the three view angles is subjected to convolution, max pooling, convolution, max pooling and convolution operations; the three resulting feature maps are concatenated; and multiple deconvolution, concatenation and convolution operations are then performed to obtain the multi-view fused sample image feature F.
Step four, inputting the sample image feature F into the parameter regression network to predict the model parameter values of the three-dimensional deformation model and the sample pose parameters Posev; the model parameters can be divided into three groups of parameters α, β and γ.

Posev = (Pitch1, Yaw1, Roll1; Pitch2, Yaw2, Roll2; Pitch3, Yaw3, Roll3)

wherein Pitch is a pitch parameter, Yaw is a yaw parameter, and Roll is a roll parameter.
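The three pose parameters per view define a rigid head rotation. A minimal sketch of turning them into a rotation matrix follows; the composition order Rz·Ry·Rx is an assumption, since the patent does not fix one.

```python
import math

# Sketch: build a 3x3 rotation matrix from Euler angles (radians).
# Pitch rotates about x, yaw about y, roll about z; order Rz·Ry·Rx is assumed.

def rotation_from_pose(pitch, yaw, roll):
    cx, sx = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cz, sz = math.cos(roll), math.sin(roll)
    rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return matmul(rz, matmul(ry, rx))
```

Such a matrix would be applied to the three-dimensional face model vertices before rendering each view.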
Step five, inputting the model parameter values of the three-dimensional deformation model into the three-dimensional deformation model to generate a three-dimensional face model.
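A three-dimensional deformation model of this kind is conventionally a linear model: a mean shape plus parameter-weighted deformation bases. The sketch below assumes α and β weight identity and expression bases respectively; the patent only names the parameter groups, so this split (and all names here) is illustrative.

```python
# Minimal sketch of a linear three-dimensional deformation model (3DMM):
# vertices = mean shape + id_basis·alpha + exp_basis·beta.
# The identity/expression roles of alpha and beta are assumptions.

def morphable_shape(mean_shape, id_basis, exp_basis, alpha, beta):
    """mean_shape: flat list of 3*N vertex coordinates; bases: lists of such vectors."""
    out = list(mean_shape)
    for j, a in enumerate(alpha):            # identity deformation
        for i in range(len(out)):
            out[i] += id_basis[j][i] * a
    for j, b in enumerate(beta):             # expression deformation
        for i in range(len(out)):
            out[i] += exp_basis[j][i] * b
    return out
```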
Step six, inputting the three-dimensional face model and the sample pose parameters Posev into a pre-established differentiable rendering model to generate two-dimensional rendering images Iv′ = (I1′, I2′, I3′).
Step seven, inputting the obtained two-dimensional rendering images Iv′ into the face analysis network to obtain rendering mask images Mv′ = (M1′, M2′, M3′).
Step eight, inputting the two-dimensional sample images Iv and the two-dimensional rendering images Iv′ into a key point detection model to obtain, respectively, the 68 sample key point coordinates kv,n and the 68 rendering key point coordinates kv,n′ of each view (n = 1, …, 68).
Step nine, the two-dimensional sample images Iv, the two-dimensional mask images Mv = (M1, M2, M3) and the two-dimensional rendering images Iv′ are used for calculating the image loss function Lp; the two-dimensional mask images Mv = (M1, M2, M3) and the rendering mask images Mv′ are used for calculating the mask loss function Lm; the sample key point coordinates kv,n, the rendering key point coordinates kv,n′ and the preset key point weights are used for calculating the key point loss function Ll; the model parameter values in the three-dimensional deformation model are used for calculating the regularization loss function Lreg. All the above loss functions are as follows:
1) Image loss function:

Lp = Σv [ Σi∈Pv Mv,i · ||Iv,i − Iv,i′||2 ] / [ Σi∈Pv Mv,i ]

wherein v represents the label of a view angle; two-dimensional sample images of three view angles are acquired in the embodiment of the present invention, so v can be 1, 2 or 3, and the range of v can be determined by a person skilled in the art according to the number of actually acquired view angles. Pv represents the region in which the two-dimensional rendering image Iv′ and the two-dimensional sample image Iv intersect in the current view, i represents the pixel index, ||·||2 represents the L2 norm, and Mv represents the two-dimensional mask image.
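A minimal plain-Python sketch of such a mask-weighted image loss, accumulated over views; the normalization by the mask sum and the per-pixel L2 distance over RGB triples are assumptions for illustration.

```python
import math

# Sketch of the image loss Lp: mask-weighted L2 distance between sample and
# rendered pixels over the intersection region, summed over views.

def image_loss(samples, renders, masks):
    """samples/renders: per-view lists of RGB tuples; masks: per-view 0/1 weights."""
    total = 0.0
    for I, Ir, M in zip(samples, renders, masks):
        num = sum(m * math.dist(p, q) for p, q, m in zip(I, Ir, M))
        den = sum(M)
        total += num / den if den else 0.0
    return total
```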
2) Mask loss function:

Lm = Σv [ Σi∈Pv′ ||Mv,i − Mv,i′||2 ] / |Pv′|

wherein v represents the label of a view angle, Pv′ represents the region in which the two-dimensional mask image Mv and the rendering mask image Mv′ intersect in the current view, i represents the pixel index, ||·||2 represents the L2 norm, and Mv represents the two-dimensional mask image.
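A minimal sketch of the per-view mask comparison; for simplicity the intersection region is taken here to be the whole pixel list, which is an assumption.

```python
# Sketch of the mask loss Lm: average per-pixel difference between the
# two-dimensional mask image and the rendering mask image, summed over views.

def mask_loss(masks, rendered_masks):
    total = 0.0
    for M, Mr in zip(masks, rendered_masks):
        diffs = [abs(a - b) for a, b in zip(M, Mr)]  # |·| is the L2 norm for scalars
        total += sum(diffs) / len(diffs)
    return total
```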
3) Key point loss function:

Ll = Σv Σn wn · ||kv,n − kv,n′||2

wherein wn represents the weight parameter of the n-th key point; to highlight the influence of the nose and the mouth corners on three-dimensional face recognition, the weights of the nose and mouth-corner key points can be set to 20, and those of the remaining key points can be set to 1.
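A sketch of the weighted key point loss in plain Python; the concrete index set for the nose and mouth corners follows the common 68-point landmark convention and is an assumption here.

```python
import math

# Sketch of the key point loss Ll: nose and mouth-corner points get weight 20,
# all other points weight 1. Index set is assumed (iBUG 68-point convention).
NOSE_AND_MOUTH_CORNERS = set(range(27, 36)) | {48, 54}

def keypoint_loss(sample_pts, render_pts):
    loss = 0.0
    for n, (p, q) in enumerate(zip(sample_pts, render_pts)):
        w = 20.0 if n in NOSE_AND_MOUTH_CORNERS else 1.0
        loss += w * math.dist(p, q)
    return loss
```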
4) Regularization loss function:
Lreg = wα·||α||2 + wβ·||β||2 + wγ·||γ||2

wherein α, β and γ can be the three groups of model parameter values in the three-dimensional deformation model, and wα, wβ and wγ are the weights of the respective model parameter values in the three-dimensional deformation model, which can be set as wα = 1, wβ = 0.8, wγ = 0.017.
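The regularization term can be computed directly from the parameter vectors; a sketch with the example weights:

```python
import math

# Sketch of Lreg = wα·||α||2 + wβ·||β||2 + wγ·||γ||2 with the example weights.

def regularization_loss(alpha, beta, gamma, wa=1.0, wb=0.8, wg=0.017):
    l2 = lambda v: math.sqrt(sum(x * x for x in v))
    return wa * l2(alpha) + wb * l2(beta) + wg * l2(gamma)
```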
5) Model loss function:
the model loss function may be calculated based on the image loss function, the mask loss function, the keypoint loss function, and the regularization loss function, with the following calculation formula:
Lall = wp·Lp + wm·Lm + wl·Ll + wreg·Lreg

wherein Lall is the model loss function, and wp, wm, wl and wreg are respectively the weights of the image loss function, the mask loss function, the key point loss function and the regularization loss function, which can be set as: wp = 24, wm = 3, wl = 1, wreg = 0.0003.
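Combining the four terms with the example weights is a plain weighted sum:

```python
# Sketch: model loss Lall as the weighted sum of the four component losses,
# using the example weights given in the embodiment.

def model_loss(lp, lm, ll, lreg, wp=24.0, wm=3.0, wl=1.0, wreg=0.0003):
    return wp * lp + wm * lm + wl * ll + wreg * lreg
```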
Step ten: adjust the parameters of the encoding and decoding network and the parameter regression network based on the calculated model loss function. Iterative calculation is performed on each image in the two-dimensional sample image set according to the above steps, and the encoding and decoding network and the parameter regression network are optimized by minimizing the model loss function; in each iteration, the parameters of the encoding and decoding network and the parameter regression network are updated with the Adam optimization algorithm. The iteration stops when the number of iterations reaches the threshold; the model parameter values are then determined based on the parameters of the encoding and decoding network and the parameter regression network determined in the last iteration, and the target three-dimensional face model is generated.
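Step ten updates the network parameters with Adam. A minimal sketch of a single Adam update on a flat parameter list follows; the gradient would come from backpropagating the model loss, which is not shown, and the hyperparameter defaults are the commonly used ones, not values from the patent.

```python
import math

# Sketch of one Adam optimization step: exponential moving averages of the
# gradient (m) and squared gradient (v), bias-corrected, drive the update.

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Update `theta` in place at iteration t (t starts at 1); returns (m, v)."""
    for i, g in enumerate(grad):
        m[i] = b1 * m[i] + (1 - b1) * g
        v[i] = b2 * v[i] + (1 - b2) * g * g
        m_hat = m[i] / (1 - b1 ** t)          # bias correction
        v_hat = v[i] / (1 - b2 ** t)
        theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return m, v
```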
Example four
Fig. 4 is a structural diagram of a three-dimensional face model building apparatus according to an embodiment of the present invention, where the apparatus is configured to execute the three-dimensional face model building method according to any of the embodiments. The device and the three-dimensional face model establishing method of each embodiment belong to the same inventive concept, and details which are not described in detail in the embodiment of the three-dimensional face model establishing device can refer to the embodiment of the three-dimensional face model establishing method. The device may specifically comprise:
a sample image feature determining module 10, configured to obtain a two-dimensional sample image including a face region, and determine a sample image feature of the two-dimensional sample image;
a three-dimensional face model generation module 11, configured to determine a model parameter value in a pre-established three-dimensional deformation model and a sample posture parameter corresponding to a two-dimensional sample image based on sample image characteristics, and generate a three-dimensional face model based on each model parameter value;
a rendering image generation module 12, configured to input the three-dimensional face model and the pose parameters into a pre-established differentiable rendering model, and generate a two-dimensional rendering image corresponding to the two-dimensional sample image;
and the model parameter adjusting module 13 is configured to compare the two-dimensional sample image with the two-dimensional rendering image, adjust a model parameter value based on a comparison result, and determine a three-dimensional deformation model corresponding to the adjusted model parameter value as a target three-dimensional face model.
On the basis of any optional technical solution in the embodiment of the present invention, optionally, the sample image feature determining module 10 includes:
the generating sample image characteristic unit is used for inputting the two-dimensional sample image into a pre-established face analysis network and generating a two-dimensional mask image which corresponds to the two-dimensional sample image and contains face key points; and performing splicing operation on the two-dimensional sample image and the two-dimensional mask image to generate a spliced image, and inputting the spliced image into an encoding and decoding network to generate sample image characteristics.
On the basis of any optional technical solution in the embodiment of the present invention, optionally, the model parameter adjusting module 13 includes:
a calculation loss function unit for calculating a model loss function based on the two-dimensional sample image and the two-dimensional rendered image; and determining the optimal model parameter value of the corresponding three-dimensional deformation model when the model loss function meets the preset state, and updating the model parameter value based on the optimal model parameter value.
On the basis of any optional technical solution in the embodiment of the present invention, optionally, the unit for calculating a loss function includes:
the weighted summation unit is used for determining an image loss function based on the two-dimensional sample image, the two-dimensional mask image and the two-dimensional rendering image; determining a rendering mask image corresponding to the two-dimensional rendering image, and determining a mask loss function based on the rendering mask image and the two-dimensional mask image; respectively inputting the two-dimensional sample image and the two-dimensional rendering image into a key point detection model, and determining sample key point coordinates of each key point in the two-dimensional sample image and rendering key point coordinates of each key point in the two-dimensional rendering image; determining a key point loss function based on the sample key point coordinates and the rendering key point coordinates; and determining the corresponding function as a model loss function when the image loss function, the mask loss function and the key point loss function are subjected to weighted summation calculation.
On the basis of any optional technical solution in the embodiment of the present invention, optionally, the unit for calculating a loss function includes:
the iterative computation unit is used for determining a corresponding model loss value according to the model loss function, and if the model loss value does not meet a preset state, updating a model parameter value based on the model loss value; and acquiring a two-dimensional sample image again from a pre-established two-dimensional sample image set, re-determining the model loss value in a gradient descent manner and performing iterative computation on the basis of the re-acquired two-dimensional sample image and the updated model parameter value, until the model loss value meets the preset state or the number of iterations reaches a preset threshold value.
On the basis of any optional technical solution in the embodiment of the present invention, optionally, the three-dimensional face model generation module 11 includes:
and the parameter regression calculation unit is used for inputting the characteristics of the sample image into a pre-established parameter regression network and determining the model parameter values and the sample attitude parameters in the three-dimensional deformation model based on the output result of the parameter regression network.
On the basis of any optional technical scheme in the embodiment of the invention, optionally, the sample attitude parameters comprise a pitch parameter, a yaw parameter and a roll parameter; wherein, the parameter regression calculation unit includes:
and the characteristic input unit is used for inputting the characteristics of the sample image into a parameter regression network constructed based on the residual error neural network and determining a pitching parameter, a yawing parameter, a rolling parameter and a model parameter value based on an output result.
According to the embodiment of the invention, the model parameter values of the three-dimensional deformation model are determined through the two-dimensional sample image, a large number of three-dimensional face images do not need to be constructed in advance, and the human resources and the time resources are saved; and the two-dimensional sample image reflecting the real face information is compared with the two-dimensional rendering image obtained through the three-dimensional face model, and the model parameter value is adjusted, so that the accuracy of the model parameter value is improved, and the identification effect of the target three-dimensional face model is favorably improved.
It should be noted that, in the embodiment of the three-dimensional face model building apparatus, each unit and each module included in the embodiment are only divided according to functional logic, but are not limited to the above division, as long as the corresponding function can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
EXAMPLE five
Fig. 5 is a structural diagram of an electronic device according to an embodiment of the present invention. FIG. 5 illustrates a block diagram of an exemplary electronic device 20 suitable for use in implementing embodiments of the present invention. The illustrated electronic device 20 is merely an example and should not be used to limit the functionality or scope of embodiments of the present invention.
As shown in fig. 5, the electronic device 20 is embodied in the form of a general purpose computing device. The components of the electronic device 20 may include, but are not limited to: one or more processors or processing units 201, a system memory 202, and a bus 203 that couples the various system components (including the system memory 202 and the processing unit 201).
Bus 203 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 20 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 20 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 202 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 204 and/or cache memory 205. The electronic device 20 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, the storage system 206 may be used to read from and write to non-removable, nonvolatile magnetic media. A magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 203 by one or more data media interfaces. Memory 202 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 208 having a set (at least one) of program modules 207 may be stored, for example, in memory 202, such program modules 207 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 207 generally perform the functions and/or methodologies of embodiments of the present invention as described herein.
The electronic device 20 may also communicate with one or more external devices 209 (e.g., keyboard, pointing device, display 210, etc.), with one or more devices that enable a user to interact with the electronic device 20, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 20 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 211. Also, the electronic device 20 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 212. As shown, the network adapter 212 communicates with other modules of the electronic device 20 over the bus 203. It should be understood that other hardware and/or software modules may be used in conjunction with electronic device 20, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 201 executes various functional applications and data processing by running a program stored in the system memory 202.
The electronic equipment provided by the invention can realize the following method: acquiring a two-dimensional sample image containing a face area, and determining sample image characteristics of the two-dimensional sample image; determining a model parameter value in a pre-established three-dimensional deformation model and a sample attitude parameter corresponding to a two-dimensional sample image based on the sample image characteristics, and generating a three-dimensional face model based on each model parameter value; inputting the three-dimensional face model and the attitude parameters into a pre-established differentiable rendering model, and generating a two-dimensional rendering image corresponding to the two-dimensional sample image; and comparing the two-dimensional sample image with the two-dimensional rendering image, adjusting the parameter value of the model based on the comparison result, and determining the three-dimensional deformation model corresponding to the adjusted parameter value of the model as the target three-dimensional face model.
According to the embodiment of the invention, the model parameter values of the three-dimensional deformation model are determined through the two-dimensional sample image, a large number of three-dimensional face images do not need to be constructed in advance, and the human resources and the time resources are saved; and the two-dimensional sample image reflecting the real face information is compared with the two-dimensional rendering image obtained through the three-dimensional face model, and the model parameter value is adjusted, so that the accuracy of the model parameter value is improved, and the identification effect of the target three-dimensional face model is favorably improved.
EXAMPLE six
An embodiment of the present invention provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a method for building a three-dimensional face model, the method including:
acquiring a two-dimensional sample image containing a face area, and determining sample image characteristics of the two-dimensional sample image; determining a model parameter value in a pre-established three-dimensional deformation model and a sample attitude parameter corresponding to a two-dimensional sample image based on the sample image characteristics, and generating a three-dimensional face model based on each model parameter value; inputting the three-dimensional face model and the attitude parameters into a pre-established differentiable rendering model, and generating a two-dimensional rendering image corresponding to the two-dimensional sample image; and comparing the two-dimensional sample image with the two-dimensional rendering image, adjusting the parameter value of the model based on the comparison result, and determining the three-dimensional deformation model corresponding to the adjusted parameter value of the model as the target three-dimensional face model.
According to the embodiment of the invention, the model parameter values of the three-dimensional deformation model are determined through the two-dimensional sample image, a large number of three-dimensional face images do not need to be constructed in advance, and the human resources and the time resources are saved; and the two-dimensional sample image reflecting the real face information is compared with the two-dimensional rendering image obtained through the three-dimensional face model, and the model parameter value is adjusted, so that the accuracy of the model parameter value is improved, and the identification effect of the target three-dimensional face model is favorably improved.
Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the above-described method operations, and may also perform related operations in the three-dimensional face model building method provided by any embodiments of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A three-dimensional face model building method is characterized by comprising the following steps:
acquiring a two-dimensional sample image containing a face region, and determining sample image characteristics of the two-dimensional sample image;
determining a model parameter value in a pre-established three-dimensional deformation model and a sample posture parameter corresponding to the two-dimensional sample image based on the sample image characteristics, and generating a three-dimensional face model based on each model parameter value;
inputting the three-dimensional face model and the posture parameters into a pre-established differentiable rendering model to generate a two-dimensional rendering image corresponding to the two-dimensional sample image;
and comparing the two-dimensional sample image with the two-dimensional rendering image, adjusting the parameter value of the model based on the comparison result, and determining the three-dimensional deformation model corresponding to the adjusted parameter value of the model as a target three-dimensional face model.
2. The method of claim 1, wherein determining sample image features of the two-dimensional sample image comprises:
inputting the two-dimensional sample image into a pre-established face analysis network, and generating a two-dimensional mask image which corresponds to the two-dimensional sample image and contains face key points;
and performing splicing operation on the two-dimensional sample image and the two-dimensional mask image to generate a spliced image, inputting the spliced image into an encoding and decoding network, and generating the sample image characteristics.
3. The method of claim 2, wherein comparing the two-dimensional sample image to the two-dimensional rendered image, and adjusting the model parameter values based on the comparison comprises:
calculating a model loss function based on the two-dimensional sample image and the two-dimensional rendered image;
and determining the optimal model parameter value of the corresponding three-dimensional deformation model when the model loss function meets the preset state, and updating the model parameter value based on the optimal model parameter value.
4. The method of claim 3, wherein the computing a model loss function based on the two-dimensional sample image and the two-dimensional rendered image comprises:
determining an image loss function based on the two-dimensional sample image, the two-dimensional mask image, and the two-dimensional rendered image;
determining a rendering mask image corresponding to the two-dimensional rendering image, and determining a mask loss function based on the rendering mask image and the two-dimensional mask image;
respectively inputting the two-dimensional sample image and the two-dimensional rendering image into a key point detection model, and determining sample key point coordinates of each key point in the two-dimensional sample image and rendering key point coordinates of each key point in the two-dimensional rendering image;
determining a keypoint loss function based on the sample keypoint coordinates and the rendering keypoint coordinates;
and determining a corresponding function when the image loss function, the mask loss function and the key point loss function are subjected to weighted summation calculation as the model loss function.
5. The method according to claim 3, wherein the determining an optimal model parameter value of the three-dimensional deformation model corresponding to the model loss function satisfying a preset state, and updating the model parameter value based on the optimal model parameter value comprises:
determining a corresponding model loss value according to the model loss function, and updating the model parameter value based on the model loss value if the model loss value does not meet the preset state;
and acquiring a two-dimensional sample image again from a pre-established two-dimensional sample image set, re-determining the model loss value by gradient descent based on the newly acquired two-dimensional sample image and the updated model parameter value, and performing the iterative computation until the model loss value meets the preset state or the iteration count reaches a preset threshold.
6. The method of claim 1, wherein determining model parameter values in a pre-established three-dimensional deformation model and sample pose parameters corresponding to the two-dimensional sample image based on the sample image features comprises:
inputting the sample image characteristics into a pre-established parameter regression network, and determining the model parameter values and the sample attitude parameters in the three-dimensional deformation model based on the output result of the parameter regression network.
7. The method of claim 6, wherein the sample attitude parameters include a pitch parameter, a yaw parameter, and a roll parameter; and wherein
inputting the sample image features into a pre-established parameter regression network, and determining the model parameter values and the sample attitude parameters in the three-dimensional deformation model based on the output result of the parameter regression network, includes:
inputting the sample image features into a parameter regression network constructed based on a residual error neural network, and determining the pitching parameter, the yawing parameter, the rolling parameter and the model parameter value based on the output result.
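For illustration only (not part of the claims), two pieces of claim 7 can be sketched in NumPy: a single identity-shortcut unit of a residual network, and the split of the regression head's output vector into the three pose angles plus the deformation-model parameter values. The output ordering and function names are assumptions:

```python
import numpy as np

def residual_block(x, W1, W2):
    """One identity-shortcut unit of a residual network:
    y = x + W2 @ relu(W1 @ x)."""
    return x + W2 @ np.maximum(W1 @ x, 0.0)

def split_regression_output(out_vec, n_model_params):
    """Split the parameter-regression network's output vector into the
    pitch, yaw, and roll parameters followed by the model parameter
    values of the three-dimensional deformation model."""
    pitch, yaw, roll = float(out_vec[0]), float(out_vec[1]), float(out_vec[2])
    model_params = out_vec[3:3 + n_model_params]
    return pitch, yaw, roll, model_params
```

The identity shortcut is what distinguishes a residual network: when the learned branch contributes nothing, the block degenerates to the identity mapping, which eases optimization of deep regression backbones.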
8. A three-dimensional face model building device, characterized by comprising:
a sample image feature determination module, configured to acquire a two-dimensional sample image containing a face region and determine sample image features of the two-dimensional sample image;
a three-dimensional face model generation module, configured to determine a model parameter value in a pre-established three-dimensional deformation model and a sample posture parameter corresponding to the two-dimensional sample image based on the sample image features, and generate a three-dimensional face model based on each model parameter value;
a rendering image generation module, configured to input the three-dimensional face model and the pose parameters into a pre-established differentiable rendering model, and generate a two-dimensional rendering image corresponding to the two-dimensional sample image;
and a model parameter adjustment module, configured to compare the two-dimensional sample image with the two-dimensional rendering image, adjust the model parameter value based on the comparison result, and determine the three-dimensional deformation model corresponding to the adjusted model parameter value as the target three-dimensional face model.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the three-dimensional face model building method as claimed in any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of three-dimensional face model building as claimed in any one of claims 1 to 7.
CN202111579569.4A 2021-12-22 2021-12-22 Three-dimensional face model establishing method and device, electronic equipment and storage medium Pending CN114266860A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111579569.4A CN114266860A (en) 2021-12-22 2021-12-22 Three-dimensional face model establishing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114266860A true CN114266860A (en) 2022-04-01

Family

ID=80828700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111579569.4A Pending CN114266860A (en) 2021-12-22 2021-12-22 Three-dimensional face model establishing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114266860A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115482557A (en) * 2022-10-09 2022-12-16 China Telecom Co., Ltd. Human body image generation method, system, device and storage medium
CN115482557B (en) * 2022-10-09 2023-11-17 China Telecom Co., Ltd. Human body image generation method, system, equipment and storage medium
CN115965736A (en) * 2023-03-16 2023-04-14 Tencent Technology (Shenzhen) Co., Ltd. Image processing method, device, equipment and storage medium
WO2024113290A1 (en) * 2022-12-01 2024-06-06 BOE Technology Group Co., Ltd. Image processing method and apparatus, interactive device, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020199693A1 (en) * 2019-03-29 2020-10-08 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Large-pose face recognition method and apparatus, and device
CN112529999A (en) * 2020-11-03 2021-03-19 Baiguoyuan Technology (Singapore) Co., Ltd. Parameter estimation model training method, device, equipment and storage medium
WO2021051543A1 (en) * 2019-09-18 2021-03-25 Ping An Technology (Shenzhen) Co., Ltd. Method for generating face rotation model, apparatus, computer device and storage medium
CN112950775A (en) * 2021-04-27 2021-06-11 Nanjing University Three-dimensional face model reconstruction method and system based on self-supervision learning
WO2021174939A1 (en) * 2020-03-03 2021-09-10 Ping An Technology (Shenzhen) Co., Ltd. Facial image acquisition method and system



Similar Documents

Publication Publication Date Title
CN110163198B (en) Table identification reconstruction method and device and storage medium
CN114266860A (en) Three-dimensional face model establishing method and device, electronic equipment and storage medium
CN110189336B (en) Image generation method, system, server and storage medium
CN112016638B (en) Method, device and equipment for identifying steel bar cluster and storage medium
CN113159056B (en) Image segmentation method, device, equipment and storage medium
CN111582021A (en) Method and device for detecting text in scene image and computer equipment
CN113221743A (en) Table analysis method and device, electronic equipment and storage medium
CN113326851B (en) Image feature extraction method and device, electronic equipment and storage medium
CN115861462B (en) Training method and device for image generation model, electronic equipment and storage medium
CN112288831A (en) Scene image generation method and device based on generation countermeasure network
CN116958492B (en) VR editing method for reconstructing three-dimensional base scene rendering based on NeRf
CN114549557A (en) Portrait segmentation network training method, device, equipment and medium
CN112907569A (en) Head image area segmentation method and device, electronic equipment and storage medium
CN113537026B (en) Method, device, equipment and medium for detecting graphic elements in building plan
CN114170575A (en) Flame identification method and device, electronic equipment and storage medium
CN112465050B (en) Image template selection method, device, equipment and storage medium
CN111815748B (en) Animation processing method and device, storage medium and electronic equipment
CN113177957A (en) Cell image segmentation method and device, electronic equipment and storage medium
CN117058384A (en) Method and system for semantic segmentation of three-dimensional point cloud
CN113780040A (en) Lip key point positioning method and device, storage medium and electronic equipment
CN112528707A (en) Image processing method, device, equipment and storage medium
CN113255756B (en) Image fusion method and device, electronic equipment and storage medium
CN112530554B (en) Scanning positioning method and device, storage medium and electronic equipment
CN111870954B (en) Altitude map generation method, device, equipment and storage medium
CN113610856A (en) Method and device for training image segmentation model and image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination