CN116563432A - Three-dimensional digital person generating method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116563432A
Authority
CN
China
Prior art keywords
face
feature vector
image
dimensional digital
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310544701.0A
Other languages
Chinese (zh)
Other versions
CN116563432B (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd
Priority to CN202310544701.0A
Publication of CN116563432A
Application granted
Publication of CN116563432B
Legal status: Active (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a three-dimensional digital person generating method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: performing key point detection on the face image to be processed to obtain first specific key point data; determining a first face feature vector corresponding to the face image to be processed; generating initial digital face data according to the first face feature vector, and updating the initial digital face data by utilizing the first specific key point data to obtain three-dimensional digital face data; and processing the three-dimensional digital face data by using digital person generating software to obtain the target three-dimensional digital person. Therefore, the efficiency and accuracy of determining the target three-dimensional digital person can be improved.

Description

Three-dimensional digital person generating method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of animation, and in particular, to a three-dimensional digital person generating method and apparatus, an electronic device, and a storage medium.
Background
There is a need to build three-dimensional digital persons in many application scenarios, such as industrial interfaces, movies, and games.
Typically, engineers first reconstruct a three-dimensional digital face and then attach it to a three-dimensional digital body. Because the quality of the source two-dimensional images is uneven, the accuracy of the reconstructed face is low, so manual face sculpting ("face pinching") is also required; this remains inaccurate and reduces the efficiency of generating three-dimensional digital persons.
Disclosure of Invention
In view of this, the present disclosure proposes a three-dimensional digital person generating scheme, which can improve the accuracy and efficiency of generating three-dimensional digital persons.
According to an aspect of the present disclosure, there is provided a three-dimensional digital person generating method including: performing key point detection on the face image to be processed to obtain first specific key point data; determining a first face feature vector corresponding to the face image to be processed; generating initial digital face data according to the first face feature vector, and updating the initial digital face data by utilizing the first specific key point data to obtain three-dimensional digital face data; and processing the three-dimensional digital face data by using digital person generating software to obtain the target three-dimensional digital person.
In one possible implementation, the first specific key point data includes: coordinates of first specific key points; the initial digital face data includes: coordinates of initial three-dimensional face key points; and the updating of the initial digital face data by using the first specific key point data to obtain three-dimensional digital face data includes: replacing the coordinates of the initial three-dimensional face key points corresponding to the first specific key points with the coordinates of the first specific key points to obtain the three-dimensional digital face data.
In a possible implementation manner, the determining a first face feature vector corresponding to the face image to be processed includes: performing first segmentation operation on the face image to be processed to obtain a first face segmentation result; and determining the first face feature vector in the first face segmentation result.
In one possible implementation, the first face feature vector includes: a first reflection feature vector, a first detail feature vector, a first pose feature vector, and a first expression feature vector, and the method further includes: obtaining a normal map according to the first detail feature vector, the first pose feature vector and the first expression feature vector; and obtaining a texture map according to the first reflection feature vector.
In one possible implementation, the first face feature vector further includes: a first shape feature vector, and the generating of initial digital face data according to the first face feature vector includes: obtaining the initial digital face data according to the first shape feature vector, the first pose feature vector and the first expression feature vector.
In one possible implementation manner, the processing the three-dimensional digital face data by using digital person generating software to obtain a target three-dimensional digital person includes: based on the coordinates of key points in the three-dimensional digital face data, adjusting the coordinates of the face key points of the standard three-dimensional digital person to obtain first coordinates of the face key points of the target three-dimensional digital person; and rendering the texture map and/or the normal map with the target three-dimensional digital person based on the first coordinates, and determining the target three-dimensional digital person.
In one possible implementation, the method is applied to a first neural network, and the training process of the first neural network includes: performing a first down-sampling operation on an image sample to determine a third potential feature vector; performing a first up-sampling operation on the third potential feature vector to determine a third face image; performing a second down-sampling operation on the image sample to determine a third detail feature vector of the image sample, wherein the third detail feature vector characterizes coordinates of detail key points of the face of the image sample; performing a second up-sampling operation on the third detail feature vector and the third potential feature vector to determine a fourth face image; and adjusting parameters of the first neural network based on a first difference between the image sample and the third face image and a second difference between the image sample and the fourth face image.
In one possible implementation, the third potential feature vector includes: a third camera feature vector, a third reflection feature vector, a third light feature vector, a third shape feature vector, a third pose feature vector, and a third expression feature vector of the image sample, and the performing of the first up-sampling operation on the third potential feature vector to determine a third face image includes: performing a third up-sampling operation on the third reflection feature vector to determine a sample texture map, wherein the sample texture map characterizes the colors of face key points in the image sample; performing a fourth up-sampling operation on the third light feature vector to determine light information of the image sample, wherein the light information characterizes the incident light intensity of the image sample; performing a fifth up-sampling operation on the third shape feature vector, the third pose feature vector and the third expression feature vector to determine coordinates of fourth face key points and the reflected light intensity of the fourth face key points; and rendering the third camera feature vector, the light information, the sample texture map, the coordinates of the fourth face key points and the reflected light intensity of the fourth face key points to obtain the third face image. The performing of the second up-sampling operation on the third detail feature vector and the third potential feature vector to determine a fourth face image includes: performing a sixth up-sampling operation on the third detail feature vector, the third pose feature vector and the third expression feature vector to determine a sample normal map, wherein the sample normal map characterizes the reflected light intensity of each detail key point in the image sample; and rendering the coordinates of the fourth face key points, the reflected light intensity of the fourth face key points, the sample texture map and the sample normal map to determine the fourth face image.
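The two-branch training objective described above can be sketched numerically as follows (a minimal sketch; mean-squared error and the weights `w1`/`w2` are assumptions, since the text only specifies "differences"):

```python
import numpy as np

def mse(a, b):
    """Mean-squared error between two images (an assumed distance measure)."""
    return float(np.mean((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

def training_loss(sample, third_face_img, fourth_face_img, w1=1.0, w2=1.0):
    """Loss used to adjust the first neural network: a first difference
    between the image sample and the third (coarse) face image, plus a
    second difference between the sample and the fourth (detailed) face
    image. The weights w1/w2 are an added assumption."""
    return w1 * mse(sample, third_face_img) + w2 * mse(sample, fourth_face_img)
```

In practice both differences would be back-propagated through the down-sampling encoders and up-sampling decoders jointly, so the coarse and detail branches are trained together.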
According to another aspect of the present disclosure, there is provided a three-dimensional digital person generating apparatus including:
the key point detection unit is used for carrying out key point detection on the face image to be processed to obtain first specific key point data;
the first face feature vector determining unit is used for determining a first face feature vector corresponding to the face image to be processed;
the three-dimensional digital face data determining unit is used for generating initial digital face data according to the first face feature vector, and updating the initial digital face data by utilizing the first specific key point data to obtain three-dimensional digital face data;
and the target three-dimensional digital person generating unit is used for processing the three-dimensional digital face data by utilizing digital person generating software to obtain a target three-dimensional digital person.
In one possible implementation, the first specific key point data includes: coordinates of first specific key points; the initial digital face data includes: coordinates of initial three-dimensional face key points; and the three-dimensional digital face data determining unit includes:
and the coordinate replacing unit is used for replacing the coordinates of the initial three-dimensional face key point corresponding to the first specific key point by using the coordinates of the first specific key point to obtain the three-dimensional digital face data.
In one possible implementation manner, the first face feature vector determining unit includes:
the segmentation unit is used for carrying out a first segmentation operation on the face image to be processed to obtain a first face segmentation result;
and the first face feature determining subunit is used for determining the first face feature vector in the first face segmentation result.
In one possible implementation, the first face feature vector includes: a first reflection feature vector, a first detail feature vector, a first pose feature vector, a first expression feature vector, the apparatus further comprising:
the normal map generating unit is used for obtaining a normal map according to the first detail feature vector, the first pose feature vector and the first expression feature vector;
and the texture map generating unit is used for obtaining the texture map according to the first reflection feature vector.
In one possible implementation, the first face feature vector further includes: a first shape feature vector, the three-dimensional digital face data determination unit including:
and the initial digital face data generating unit is used for obtaining the initial digital face data according to the first shape feature vector, the first pose feature vector and the first expression feature vector.
In one possible implementation, the target three-dimensional digital person generating unit includes:
the first coordinate determining unit is used for adjusting the coordinates of the face key points of the standard three-dimensional digital person based on the coordinates of the key points in the three-dimensional digital face data to obtain the first coordinates of the face key points of the target three-dimensional digital person;
and the rendering unit is used for rendering the texture map and/or the normal map and the target three-dimensional digital person based on the first coordinates, and determining the target three-dimensional digital person.
In one possible implementation, the apparatus is applied to a first neural network, and the training process of the first neural network includes:
performing a first downsampling operation on the image samples to determine a third potential feature vector;
performing a first up-sampling operation on the third potential feature vector to determine a third face image;
performing a second downsampling operation on the image sample, and determining a third detail feature vector of the image sample, wherein the third detail feature vector represents coordinates of detail key points of the face of the image sample;
performing a second up-sampling operation on the third detail feature vector and the third potential feature vector to determine a fourth face image;
and adjusting parameters of the first neural network based on a first difference between the image sample and the third face image and a second difference between the image sample and the fourth face image.
In one possible implementation, the third potential feature vector includes: a third camera feature vector, a third reflection feature vector, a third light feature vector, a third shape feature vector, a third pose feature vector, a third expression feature vector of the image sample,
the performing a first upsampling operation on the third potential feature vector to determine a third face image comprises:
performing a third up-sampling operation on the third reflection feature vector to determine a sample texture map, wherein the sample texture map characterizes the colors of face key points in the image sample;
performing fourth upsampling operation on the third light feature vector to determine light information of the image sample, wherein the light information represents the incident light intensity of the image sample;
performing a fifth up-sampling operation on the third shape feature vector, the third pose feature vector and the third expression feature vector, and determining coordinates of fourth face key points and the reflected light intensity of the fourth face key points;
rendering the third camera feature vector, the light information, the sample texture map, the coordinates of the fourth face key points and the reflected light intensity of the fourth face key points to obtain the third face image;
the step of up-sampling the third detail feature vector and the third potential feature vector to determine a fourth face image includes:
performing a sixth up-sampling operation on the third detail feature vector, the third pose feature vector and the third expression feature vector to determine a sample normal map, wherein the sample normal map characterizes the reflected light intensity of each detail key point in the image sample;
and rendering the coordinates of the fourth face key points, the reflected light intensity of the fourth face key points, the sample texture map and the sample normal map to determine the fourth face image.
According to another aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions stored by the memory.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the above-described method.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
In the embodiment of the disclosure, the key point detection can be performed on the face image to be processed independently to obtain the first specific key point. And determining a first face feature vector corresponding to the face image to be processed. And generating initial digital face data according to the first face feature vector. And updating the initial face data by using the first specific key point data to obtain three-dimensional digital face data representing more accurate information, thereby improving the accuracy of the subsequent generation of the target three-dimensional digital person. And then, processing the three-dimensional digital face data by using digital person generating software to obtain the target three-dimensional digital person. Because the initial face data can be updated by utilizing the first specific key point, the accuracy of the three-dimensional digital face data is improved, and manual face pinching is not needed. Therefore, the accuracy and efficiency of generating three-dimensional digital persons are generally improved.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 provides a flow diagram of a three-dimensional digital person generation method of an embodiment of the present disclosure.
Fig. 2 provides another flow diagram of a three-dimensional digital person generation method of an embodiment of the present disclosure.
Fig. 3 provides a schematic structural view of a three-dimensional digital person generating apparatus according to an embodiment of the present disclosure.
Fig. 4 provides a schematic structural diagram of an electronic device for three-dimensional digital human generation in accordance with an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
Generally, in a scenario of generating a three-dimensional digital person, an engineer may generate a three-dimensional digital face from a two-dimensional image through a model. However, because the quality of two-dimensional images is uneven, the accuracy of the three-dimensional digital face is not high. Although increasing the number of layers of the model can improve the accuracy to some extent, the cost of re-modeling and re-training the model is high. Moreover, the model's performance is strongly tied to its training samples, so the accuracy improvement is not significant.
In addition, the generated three-dimensional digital face can be adjusted by manual face sculpting ("face pinching"), but the accuracy is not high, and it reduces the efficiency of generating three-dimensional digital persons.
Fig. 1 provides a flow diagram of a three-dimensional digital person generation method of an embodiment of the present disclosure. As shown in fig. 1, the method includes:
S11, performing key point detection on the face image to be processed to obtain first specific key point data.
The face image to be processed may be a two-dimensional image containing a face. The first specific key point data may be a subset of the key point data of the face in the face image to be processed, for example, the key point data of the facial features and the face contour. The first specific key point data may characterize the shape of partial regions of the three-dimensional face.
In embodiments of the present disclosure, a specific key point data extraction model may be pre-trained, and key point detection may be performed on the face image to be processed by using this model to obtain the first specific key point data.
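The selection of specific key points from a full detection can be sketched as follows (a minimal illustration; the detector itself and the 68-point landmark layout are assumptions, since the patent fixes neither a detection model nor a landmark scheme):

```python
import numpy as np

# Index layout under the common 68-point facial landmark convention
# (an assumption for illustration only).
CONTOUR_IDX = list(range(0, 17))    # jawline / face contour
FEATURE_IDX = list(range(17, 68))   # brows, nose, eyes, mouth

def select_specific_keypoints(all_keypoints, indices):
    """Keep only the 'specific' key points (e.g. contour + facial features)
    from the (N, 2) array produced by the assumed detection model."""
    return np.asarray(all_keypoints)[list(indices)]
```

For example, `select_specific_keypoints(kps, CONTOUR_IDX)` keeps only the face-contour points, while `CONTOUR_IDX + FEATURE_IDX` keeps contour and facial features together.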
S12, determining a first face feature vector corresponding to the face image to be processed.
The first face feature vector may be a feature vector characterizing a face in the face image to be processed. In the embodiment of the disclosure, the first face feature vector may be extracted from the face image to be processed.
S13, generating initial digital face data according to the first face feature vector, and updating the initial digital face data by utilizing the first specific key point data to obtain three-dimensional digital face data.
The initial digital face data may characterize an initial state of the three-dimensional digital face, as well as an initial light intensity at each location of the three-dimensional digital face. The initial state may include: the initial shape, initial pose, and initial relative positions of the facial features of the three-dimensional digital face. In the embodiment of the disclosure, the initial digital face data may be updated by using the first specific key point data, so as to update the state of the three-dimensional digital face and the light intensity at each of its locations, thereby obtaining the three-dimensional digital face data. The three-dimensional digital face data may characterize the updated state of the three-dimensional digital face and the updated light intensity at each location. The updated state may include: the updated shape, updated pose, and updated relative positions of the facial features of the three-dimensional digital face. The information characterized by the three-dimensional digital face data is more accurate than that characterized by the initial digital face data.
S14, processing the three-dimensional digital face data by using digital person generating software to obtain the target three-dimensional digital person.
The digital person generating software can attach the three-dimensional digital face to a three-dimensional digital body to obtain an initial three-dimensional digital person. The initial three-dimensional digital person is then rendered, and its facial lighting and color are updated to generate the target three-dimensional digital person.
In the embodiment of the disclosure, the key point detection can be performed on the face image to be processed independently to obtain the first specific key point. And determining a first face feature vector corresponding to the face image to be processed. And generating initial digital face data according to the first face feature vector. And updating the initial face data by using the first specific key point data to obtain three-dimensional digital face data representing more accurate information, thereby improving the accuracy of the subsequent generation of the target three-dimensional digital person. And then, processing the three-dimensional digital face data by using digital person generating software to obtain the target three-dimensional digital person. Because the initial face data can be updated by utilizing the first specific key point, the accuracy of the three-dimensional digital face data is improved, and manual face pinching is not needed. Therefore, the accuracy and efficiency of generating three-dimensional digital persons are generally improved.
In one possible implementation, the first specific key point data includes: coordinates of first specific key points; the initial digital face data includes: coordinates of initial three-dimensional face key points; and the updating of the initial digital face data by using the first specific key point data to obtain three-dimensional digital face data includes: replacing the coordinates of the initial three-dimensional face key points corresponding to the first specific key points with the coordinates of the first specific key points to obtain the three-dimensional digital face data.
The first specific key points may correspond to a portion of the initial three-dimensional face key points, for example, the key points of the facial features and the face contour. The coordinates of a first specific key point are more accurate than the coordinates of its corresponding initial three-dimensional face key point. In the embodiment of the disclosure, the coordinates of the initial three-dimensional face key points corresponding to the first specific key points may therefore be replaced with the coordinates of the first specific key points, which improves the accuracy of the three-dimensional digital face data.
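The coordinate replacement can be sketched as follows (a minimal illustration; the `correspondence` mapping and the lifting of the detected key points into the same coordinate space as the initial data are assumptions not detailed in the text):

```python
import numpy as np

def update_initial_face(initial_kps, specific_kps, correspondence):
    """Replace the coordinates of each initial three-dimensional face key
    point that corresponds to a first specific key point.

    `correspondence` maps specific-key-point index -> initial-key-point
    index; both point sets are assumed to share one coordinate space.
    Returns the three-dimensional digital face key point coordinates."""
    updated = np.asarray(initial_kps, dtype=float).copy()
    for spec_i, init_i in correspondence.items():
        updated[init_i] = specific_kps[spec_i]
    return updated
```

Only the key points named in `correspondence` change; the remaining initial key points are kept as generated from the first face feature vector.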
In a possible implementation manner, the determining a first face feature vector corresponding to the face image to be processed includes: performing first segmentation operation on the face image to be processed to obtain a first face segmentation result; and determining the first face feature vector in the first face segmentation result.
The contrast and color between the person and the background vary across images to be processed: in some images the background color is close to that of the person; in others the person occupies a relatively small area. The background is therefore prone to interfering with extraction of the first face feature vector. Accordingly, in the embodiment of the present disclosure, the face in the image to be processed may be segmented first; that is, the first segmentation operation is performed to obtain the first face segmentation result, and the first face feature vector is then determined from the first face segmentation result. This reduces the influence of the background and improves the accuracy of determining the first face feature vector.
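The motivation for segmenting first can be illustrated by masking out the background before feature extraction (a minimal sketch; the segmentation network that produces `face_mask` is assumed and not shown):

```python
import numpy as np

def mask_background(image, face_mask):
    """Zero out non-face pixels so the background cannot interfere with
    feature extraction. `image` is (H, W, C); `face_mask` is a binary
    (H, W) map from the assumed first segmentation operation."""
    return np.asarray(image, dtype=float) * np.asarray(face_mask)[..., None]
```

The feature extractor then only sees face pixels, so background color and scale no longer leak into the first face feature vector.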
In one possible implementation, the first face feature vector includes: a first reflection feature vector, a first detail feature vector, a first pose feature vector, and a first expression feature vector, and the method further includes: obtaining a normal map according to the first detail feature vector, the first pose feature vector and the first expression feature vector; and obtaining the texture map according to the first reflection feature vector.
The first reflection feature vector may characterize colors in the image to be processed at key points of the face.
Human faces differ from one another, and faces can be distinguished by comparing certain parts, for example the facial features, cheekbones, and facial contour. For convenience of the following description, these distinguishing parts are referred to as detail parts. The above is merely an example; the detail parts are not limited thereto.
The first detail feature vector may characterize coordinates of key points of a detail portion of a face in the image to be processed. The first pose feature vector may characterize the orientation of a face in the image to be processed, as well as the position offset angle relative to a standard head. The first expression feature vector may represent a mapping relationship between each key point of a face in the image to be processed and a standard face key point.
In the embodiment of the present disclosure, the first detail feature vector, the first pose feature vector, and the first expression feature vector may be spliced into a whole, that is, a first spliced vector. The first spliced vector is then up-sampled to obtain the normal map. The normal map may represent the reflected light intensities at the face key points characterized by the three-dimensional digital face data.
The first reflection feature vector may also be processed, for example by performing an up-sampling operation on it, to obtain the texture map. The texture map may represent the colors at the face key points characterized by the three-dimensional digital face data.
Since the positions of the detail parts of the face, the orientation and rotation angle of the face, and the expression of the face all affect the reflected light intensity at each key point, processing the first detail feature vector, the first pose feature vector, and the first expression feature vector together improves the accuracy of determining the normal map. In addition, the texture map can be obtained directly from the first reflection feature vector, which improves the efficiency of obtaining the texture map.
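The splice-then-upsample flow for the normal map and texture map can be sketched as follows (a toy illustration of the data flow; `toy_upsample` merely tiles the vector into a map, whereas a real decoder would use learned up-sampling layers):

```python
import numpy as np

def splice(*vectors):
    """Join feature vectors end-to-end, e.g. detail + pose + expression
    into the first spliced vector."""
    return np.concatenate([np.asarray(v, dtype=float) for v in vectors])

def toy_upsample(vec, out_hw=(8, 8)):
    """Stand-in for the up-sampling decoder: tile the vector into an
    (H, W) map. Applied to the first spliced vector this would yield the
    normal map; applied to the reflection vector alone, the texture map."""
    h, w = out_hw
    return np.resize(vec, h * w).reshape(h, w)

# Data flow (feature vectors themselves are assumed):
#   normal_map  = toy_upsample(splice(detail, pose, expression))
#   texture_map = toy_upsample(reflection)
```

The point of the sketch is the asymmetry: the normal map depends on three spliced vectors, while the texture map is decoded from the reflection vector alone.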
In one possible implementation, the first face feature vector further includes: a first shape feature vector, and the generating of initial digital face data according to the first face feature vector includes: obtaining the initial digital face data according to the first shape feature vector, the first pose feature vector and the first expression feature vector.
The first shape feature vector may characterize the contour of the face in the image to be processed, the contours of parts of the face, for example, the facial features, cheeks, and forehead, and the positional relationships among the parts of the face.
In the embodiment of the present disclosure, the first shape feature vector, the first pose feature vector, and the first expression feature vector may be spliced into a whole, that is, a second spliced vector. The second spliced vector is then upsampled to obtain the initial digital face data. Because the shape, pose, and expression of the face in the image to be processed are all integrated, the accuracy of the initial digital face data is improved.
In one possible implementation manner, the processing the three-dimensional digital face data by using digital person generating software to obtain a target three-dimensional digital person includes: based on the coordinates of key points in the three-dimensional digital face data, adjusting the coordinates of the face key points of the standard three-dimensional digital person to obtain first coordinates of the face key points of the target three-dimensional digital person; and rendering the texture map and/or the normal map with the target three-dimensional digital person based on the first coordinates, and determining the target three-dimensional digital person.
The standard three-dimensional digital person may be a pre-generated three-dimensional digital person, including a standard three-dimensional face and a standard three-dimensional body. A standard three-dimensional face may be composed of face key points; by changing the coordinates of those key points, the shape, pose, expression, etc. of the standard three-dimensional face can be changed. The face key points of the standard three-dimensional face may correspond one-to-one with the key points in the three-dimensional digital face data.
In the embodiment of the disclosure, the coordinates of the face key points of the standard three-dimensional digital person can be adjusted based on the coordinates of the key points in the three-dimensional digital face data. After adjustment, the first coordinates of the face key points of the target three-dimensional digital person are obtained; each first coordinate is the same as the coordinate of the corresponding key point in the three-dimensional digital face data. That is, the standard three-dimensional face is automatically adjusted into the three-dimensional face represented by the three-dimensional digital face data, which improves efficiency.
In addition, since each first coordinate is the same as the coordinate of the corresponding key point in the three-dimensional digital face data, the normal map can show the reflected light intensity at each key point of the face represented by the three-dimensional digital face data, and the texture map can show the color at each of those key points. Thus, the texture map and/or the normal map may be rendered with the face of the target three-dimensional digital person based on the first coordinates to determine the target three-dimensional digital person.
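The key-point retargeting step above can be sketched as follows. The one-to-one correspondence is taken from the text; the array shapes, the identity mapping, and the function name are illustrative assumptions.

```python
import numpy as np

def retarget_face(standard_kpts, face_data_kpts, correspondence):
    """Replace the standard digital person's face key-point coordinates with
    the coordinates from the three-dimensional digital face data.

    correspondence maps a standard-face key-point index to the index of its
    one-to-one counterpart in the face data.
    """
    first_coords = standard_kpts.copy()           # keep the standard model intact
    for std_idx, data_idx in correspondence.items():
        first_coords[std_idx] = face_data_kpts[data_idx]
    return first_coords

standard  = np.zeros((4, 3))                                  # 4 standard key points
face_data = np.arange(12, dtype=float).reshape(4, 3)          # reconstructed coords
mapping   = {i: i for i in range(4)}                          # identity correspondence

first = retarget_face(standard, face_data, mapping)
print(np.array_equal(first, face_data))  # True: first coordinates now match
```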
In one possible implementation, the method is applied to a first neural network; that is, the first face feature vector is extracted by the first neural network, which further generates the corresponding initial digital face data. The first neural network may be pre-trained; in other words, it is trained before being used for inference. Specifically, the training process of the first neural network includes: performing a first downsampling operation on an image sample to determine a third potential feature vector; performing a first up-sampling operation on the third potential feature vector to determine a third face image; performing a second downsampling operation on the image sample to determine a third detail feature vector of the image sample, where the third detail feature vector characterizes the coordinates of the detail key points of the face in the image sample; performing a second up-sampling operation on the third detail feature vector and the third potential feature vector to determine a fourth face image; and adjusting parameters of the first neural network based on a first difference between the image sample and the third face image and a second difference between the image sample and the fourth face image.
The specific type of the first neural network is not limited; it may be, for example, a convolutional neural network or a recurrent neural network. The first neural network may include a plurality of encoders and a plurality of decoders that cooperate with one another, and the output data of the different encoders and decoders can characterize the physical meaning of the input image in some respect, such as light, reflection, shape, pose, or detail. In one possible implementation, the third potential feature vector may be regarded as the face feature vector corresponding to the image sample, extracted by the first neural network based on its current network parameters. In another implementation, the third potential feature vector may cooperate with the third detail feature vector to form the face feature vector corresponding to the image sample. In yet another implementation, the initial digital face data extracted by the first neural network based on its current parameters is obtained after a decoder processes the third potential feature vector, or the third potential feature vector together with the third detail feature vector. The third potential feature vector may be a low-dimensional feature vector. It may characterize the features of the image sample, including features of the image itself and features of the face in the image, such as the pose, shape, and expression of the face, as well as the reflected light intensity features of each face key point determined from that pose, shape, and expression. The face detail key points of the image sample may be a subset of the face key points of the image sample, located at the detail parts of the face.
In an embodiment of the disclosure, on the one hand, a first downsampling operation is performed on the image sample to determine the third potential feature vector, and a first up-sampling operation is then performed on the third potential feature vector to determine the third face image. On the other hand, a second downsampling operation is performed on the image sample to determine the third detail feature vector, and a second up-sampling operation is performed on the third detail feature vector and the third potential feature vector to determine the fourth face image. Both the third face image and the fourth face image may be two-dimensional images, and the fourth face image is more accurate than the third face image in the details of the face. Here, the first downsampling and the second downsampling may differ, as may the first up-sampling and the second up-sampling.
In practice, the third face image may be a reconstructed image sample. The fourth face image is another reconstructed image sample. Therefore, the closer the third face image and the fourth face image are to the image sample, the higher the accuracy of the first neural network is. Thus, parameters of the first neural network may be adjusted based on a first difference of the image sample and the third face image, a second difference of the image sample and the fourth face image.
Thus, the trained first neural network not only can accurately represent the outline, the appearance and the posture of the human face, but also can accurately represent the characteristics of the detail part of the human face, and the accuracy of the first neural network is improved.
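The two-branch reconstruction objective described above can be sketched numerically as follows. This is a toy illustration under stated assumptions: the encoders and decoders are single random linear/tanh layers rather than trained networks, all sizes are illustrative, and a real implementation would minimize the two differences with an optimizer.

```python
import numpy as np

rng = np.random.default_rng(1)

def down(x, w):   # stand-in encoder (first/second downsampling operation)
    return np.tanh(w @ x)

def up(z, w):     # stand-in decoder (first/second up-sampling operation)
    return w @ z

sample = rng.standard_normal(64)  # flattened image sample (illustrative size)

# Branch 1: image sample -> third potential feature vector -> third face image.
w_enc1, w_dec1 = rng.standard_normal((16, 64)), rng.standard_normal((64, 16))
z_potential = down(sample, w_enc1)
third_image = up(z_potential, w_dec1)

# Branch 2: image sample -> third detail feature vector; fused with the
# potential vector -> fourth face image.
w_enc2, w_dec2 = rng.standard_normal((8, 64)), rng.standard_normal((64, 24))
z_detail = down(sample, w_enc2)
fourth_image = up(np.concatenate([z_detail, z_potential]), w_dec2)

# The first and second differences drive the parameter update (e.g. via SGD).
first_diff  = np.mean((sample - third_image) ** 2)
second_diff = np.mean((sample - fourth_image) ** 2)
loss = first_diff + second_diff
```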
In one possible implementation, the third detail feature vector includes at least one of: nose detail feature vector, eye corner detail feature vector, mouth corner detail feature vector, chin detail feature vector, forehead detail feature vector, and cheekbone detail feature vector.
There may be a large difference between the faces in the image samples and the faces in the input images at the time of actual use. For example, the faces in the image samples may be the faces of western people while the faces in the input images are the faces of Asian people. Therefore, even if the accuracy of the first neural network meets the requirement during training, its accuracy may decrease during use.
Whether across different ethnic groups or different individuals, there are significant differences in the nose, eye corners, mouth corners, chin, forehead, and cheekbones. Therefore, the nose, eye corners, mouth corners, chin, forehead, and cheekbones can be determined as the detail parts, and one or more of the feature vectors corresponding to these detail parts can be used as the third detail feature vector. This reduces the dependence of the accuracy of the first neural network on the image samples and improves the stability and universality of the first neural network.
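Assembling a detail feature vector from selected detail regions can be sketched as follows. The index ranges per region, the number of key points, and the chosen regions are all illustrative assumptions; only the set of region names comes from the text.

```python
import numpy as np

# Hypothetical index ranges of each detail region within the face key points.
DETAIL_REGIONS = {
    "nose": range(0, 9), "eye_corners": range(9, 13),
    "mouth_corners": range(13, 17), "chin": range(17, 22),
    "forehead": range(22, 30), "cheekbones": range(30, 36),
}

def detail_vector(keypoints, regions=("nose", "eye_corners", "chin")):
    """Collect the 3-D coordinates of the chosen detail regions into one vector."""
    idx = [i for r in regions for i in DETAIL_REGIONS[r]]
    return keypoints[idx].ravel()

kpts = np.arange(36 * 3, dtype=float).reshape(36, 3)  # 36 toy key points
vec = detail_vector(kpts)
print(vec.size)  # (9 + 4 + 5) detail points x 3 coordinates = 54
```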
In one possible implementation, the third potential feature vector includes: a third camera feature vector, a third reflection feature vector, a third light feature vector, a third shape feature vector, a third pose feature vector, and a third expression feature vector of the image sample. Performing the first up-sampling operation on the third potential feature vector to determine the third face image includes: performing a third up-sampling operation on the third reflection feature vector to determine a sample texture map, where the sample texture map characterizes the colors of the key points of the face in the image sample; performing a fourth up-sampling operation on the third light feature vector to determine light information of the image sample, where the light information characterizes the incident light intensity of the image sample; performing a fifth up-sampling operation on the third shape feature vector, the third pose feature vector, and the third expression feature vector to determine the coordinates of fourth face key points and the reflected light intensity at the fourth face key points; and rendering the third camera feature vector, the light information, the sample texture map, the coordinates of the fourth face key points, and the reflected light intensity at the fourth face key points to obtain the third face image. Performing the second up-sampling operation on the third detail feature vector and the third potential feature vector to determine the fourth face image includes: performing a sixth up-sampling operation on the third detail feature vector, the third pose feature vector, and the third expression feature vector to determine a sample normal map, where the sample normal map characterizes the reflected light intensity at each detail key point in the image sample; and rendering the coordinates of the fourth face key points, the reflected light intensity at the fourth face key points, the sample texture map, and the sample normal map to determine the fourth face image.
The third camera feature vector may characterize the angle, distance of the image acquisition device that acquired the image sample relative to the object being acquired. The third reflective feature vector may characterize the color of each key point of the face in the image sample. The third ray feature vector may characterize the intensity, angle of incident light when the image sample was taken. The third shape feature vector may characterize the coordinates of a key point of a fourth three-dimensional face (i.e., the coordinates of a fourth face key point) corresponding to the face in the image sample. The third shape feature vector may represent the contour of the face and the contour of the five sense organs in the image sample. The third pose feature vector may represent the overall orientation of the face in the image sample, as well as the offset, rotation angle relative to a standard face. The third expression feature vector may represent an offset of each portion of the face in the image sample relative to the corresponding each portion of the standard face, and a positional relationship of each portion of the face in the image sample.
In the embodiment of the disclosure, the coordinates of the fourth face key points and the reflected light intensity at the fourth face key points may be determined using the third shape feature vector, the third pose feature vector, and the third expression feature vector. Therefore, the fourth three-dimensional face represented by the fourth face key points shows not only the appearance, pose, and expression of the face in the image sample but also the light intensity at each part of that face. The coordinates of the fourth face key points are three-dimensional coordinates.
In addition, a sample texture map characterizing the colors at the fourth face key points may be determined using the third reflection feature vector. Rendering is then performed based on the third camera feature vector, the light information, the sample texture map, the coordinates of the fourth face key points, and the reflected light intensity at the fourth face key points to determine the third face image.
The third detail feature vector characterizes the coordinates of the detail key points of the face in the image sample. In the embodiment of the disclosure, a sample normal map characterizing the reflected light intensity at each detail key point in the image sample may be determined using the third detail feature vector, the third pose feature vector, and the third expression feature vector. The coordinates of the fourth face key points, the reflected light intensity at the fourth face key points, and the sample normal map are then rendered to obtain the coordinates of fifth face key points and the reflected light intensity at the fifth face key points. The fifth face key points characterize a fifth three-dimensional face, which displays the features of the detail parts of the face in the image sample more accurately than the fourth three-dimensional face. Finally, the coordinates of the fifth face key points, the reflected light intensity at the fifth face key points, and the sample texture map can be rendered to determine the fourth face image.
In the embodiment of the disclosure, the image sample, the third face image, and the fourth face image are all two-dimensional images. The embodiment comprises two stages: generating the third face image and generating the fourth face image. The stage of generating the third face image focuses on the shape, pose, expression, and color of the whole face in the image sample, while the stage of generating the fourth face image focuses on the features of the face detail parts and displays the influence of those detail features on the overall shape, pose, and expression of the face. Therefore, using the differences between the third and fourth face images and the image sample (the first difference and the second difference) to adjust the parameters of the first neural network can improve the accuracy of three-dimensional face generation both overall and in the details.
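Splitting the third potential feature vector into its components and routing them to the two branches can be sketched as follows. The slice layout and component sizes are illustrative assumptions; only the component names and the inputs of the fifth and sixth up-sampling operations come from the text.

```python
import numpy as np

# Hypothetical layout of the third potential feature vector; the slice
# boundaries are illustrative, not taken from the patent.
SLICES = {
    "camera": (0, 3), "reflection": (3, 13), "light": (13, 22),
    "shape": (22, 42), "pose": (42, 48), "expression": (48, 60),
}

def split_potential(z):
    """Split the concatenated potential vector into its named components."""
    return {name: z[a:b] for name, (a, b) in SLICES.items()}

z = np.arange(60, dtype=float)
parts = split_potential(z)

# The third-face-image branch feeds shape + pose + expression into the
# fifth up-sampling operation ...
coarse_in = np.concatenate([parts["shape"], parts["pose"], parts["expression"]])

# ... while the fourth-face-image branch swaps shape for the (separately
# encoded) detail vector in the sixth up-sampling operation.
detail_vec = np.ones(5)  # hypothetical third detail feature vector
fine_in = np.concatenate([detail_vec, parts["pose"], parts["expression"]])

print(coarse_in.size, fine_in.size)  # 38 23
```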
Fig. 2 provides another flow diagram of a three-dimensional digital person generation method of an embodiment of the present disclosure. As shown in fig. 2, in generating the target three-dimensional digital person, an artificial intelligence service platform and a rendering engine platform are required.
In the artificial intelligence service platform, the following operations are performed:
S21, preprocessing a face image to obtain the face image to be processed, where the preprocessing may include operations such as attribute editing and image enhancement on the face image;
S22, generating initial digital face data, a normal map and a texture map according to the image to be processed;
at the rendering engine platform, the following operations are performed:
S23, adjusting the coordinates of the face key points of the standard three-dimensional digital person based on the coordinates of the key points in the three-dimensional digital face data to obtain first coordinates of the face key points of the target three-dimensional digital person;
S24, rendering the texture map and/or the normal map with the target three-dimensional digital person based on the first coordinates to determine the target three-dimensional digital person;
S25, adding at least one of the following to the target three-dimensional digital person based on the first coordinates: skin, teeth, hair, material.
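The S21–S25 flow across the two platforms can be sketched as a simple orchestration. Every function below is a hypothetical placeholder for the platform component described in the text, not a real API.

```python
# Hypothetical placeholders for the artificial intelligence service platform
# and the rendering engine platform; all names and return values are assumed.

def preprocess(face_image):                  # S21: attribute editing, enhancement
    return {"image": face_image, "enhanced": True}

def ai_service(preprocessed):                # S22: face data, normal/texture maps
    return {"face_data": "3d_face_keypoints",
            "normal_map": "normal", "texture_map": "texture"}

def adjust_keypoints(face_data):             # S23: retarget the standard person
    return "first_coordinates"

def render(coords, texture_map, normal_map):  # S24: render maps onto the person
    return {"coords": coords, "maps": (texture_map, normal_map)}

def add_assets(person):                      # S25: skin, teeth, hair, material
    person["assets"] = ["skin", "teeth", "hair", "material"]
    return person

pre    = preprocess("face.png")
out    = ai_service(pre)
coords = adjust_keypoints(out["face_data"])
person = add_assets(render(coords, out["texture_map"], out["normal_map"]))
print(sorted(person["assets"]))
```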
Fig. 3 provides a schematic structural view of a three-dimensional digital person generating apparatus according to an embodiment of the present disclosure. The apparatus 30 comprises:
the key point detection unit 31 is configured to perform key point detection on a face image to be processed, so as to obtain first specific key point data;
a first face feature vector determining unit 32, configured to determine a first face feature vector corresponding to the face image to be processed;
a three-dimensional digital face data determining unit 33, configured to generate initial digital face data according to the first face feature vector, and update the initial digital face data with the first specific key point data to obtain three-dimensional digital face data;
The target three-dimensional digital person generating unit 34 is configured to process the three-dimensional digital face data by using digital person generating software to obtain a target three-dimensional digital person.
In one possible implementation, the first specific key point data includes: coordinates of a first specific key point, and the initial digital face data includes: coordinates of initial three-dimensional face key points. The three-dimensional digital face data determining unit 33 includes:
and the coordinate replacing unit is used for replacing the coordinates of the initial three-dimensional face key point corresponding to the first specific key point by using the coordinates of the first specific key point to obtain the three-dimensional digital face data.
In one possible implementation manner, the first face feature vector determining unit 32 includes:
the segmentation unit is used for carrying out a first segmentation operation on the face image to be processed to obtain a first face segmentation result;
and the first face feature determining subunit is used for determining the first face feature vector in the first face segmentation result.
In one possible implementation, the first face feature vector includes: a first reflection feature vector, a first detail feature vector, a first pose feature vector, a first expression feature vector, the device 30 further comprising:
the normal map generating unit is used for obtaining a normal map according to the first detail feature vector, the first pose feature vector and the first expression feature vector;
and the texture map generating unit is used for obtaining the texture map according to the first reflection characteristic vector.
In one possible implementation, the first face feature vector further includes: a first shape feature vector, and the three-dimensional digital face data determining unit 33 includes:
an initial digital face data generating unit, used for obtaining the initial digital face data according to the first shape feature vector, the first pose feature vector and the first expression feature vector.
In one possible implementation, the target three-dimensional digital person generating unit 34 includes:
the first coordinate determining unit is used for adjusting the coordinates of the face key points of the standard three-dimensional digital person based on the coordinates of the key points in the three-dimensional digital face data to obtain the first coordinates of the face key points of the target three-dimensional digital person;
and the rendering unit is used for rendering the texture map and/or the normal map and the target three-dimensional digital person based on the first coordinates, and determining the target three-dimensional digital person.
In one possible implementation, the apparatus 30 is applied to a first neural network, a training process of the first neural network, including:
performing a first downsampling operation on the image samples to determine a third potential feature vector;
performing a first up-sampling operation on the third potential feature vector to determine a third face image;
performing a second downsampling operation on the image sample, and determining a third detail feature vector of the image sample, wherein the third detail feature vector represents coordinates of detail key points of the face of the image sample;
performing a second up-sampling operation on the third detail feature vector and the third potential feature vector to determine a fourth face image;
and adjusting parameters of the first neural network based on a first difference between the image sample and the third face image and a second difference between the image sample and the fourth face image.
In one possible implementation, the third potential feature vector includes: a third camera feature vector, a third reflection feature vector, a third light feature vector, a third shape feature vector, a third pose feature vector, a third expression feature vector of the image sample,
The performing a first upsampling operation on the third potential feature vector to determine a third face image comprises:
performing a third up-sampling operation on the third reflection feature vector to determine a sample texture map, wherein the sample texture map characterizes the colors of the key points of the face in the image sample;
performing fourth upsampling operation on the third light feature vector to determine light information of the image sample, wherein the light information represents the incident light intensity of the image sample;
performing a fifth upsampling operation on the third shape feature vector, the third pose feature vector and the third expression feature vector, and determining coordinates of a fourth face key point and the reflected light intensity of the fourth face key point;
rendering the third camera feature vector, the ray information, the sample texture map, the coordinates of the fourth face key point and the reflected light intensity of the fourth face key point to obtain the third face image;
the step of up-sampling the third detail feature vector and the third potential feature vector to determine a fourth face image includes:
performing a sixth upsampling operation on the third detail feature vector, the third pose feature vector and the third expression feature vector, and determining a sample normal map, wherein the sample normal map characterizes the reflected light intensity of each detail key point in the image sample;
and rendering the coordinates of the fourth face key points, the reflected light intensity of the fourth face key points, the sample texture map and the sample normal map, and determining the fourth face image.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions stored by the memory.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
Fig. 4 provides a schematic structural diagram of an electronic device for three-dimensional digital human generation in accordance with an embodiment of the present disclosure. For example, electronic device 1900 may be provided as a server or terminal device. Referring to FIG. 4, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure can be assembly instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, which electronic circuitry can execute the computer readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. A method of three-dimensional digital person generation, comprising:
performing key point detection on a face image to be processed to obtain first specific key point data;
determining a first face feature vector corresponding to the face image to be processed;
generating initial digital face data according to the first face feature vector, and updating the initial digital face data by utilizing the first specific key point data to obtain three-dimensional digital face data;
and processing the three-dimensional digital face data by using digital person generating software to obtain a target three-dimensional digital person.
2. The method of claim 1, wherein the first specific key point data comprises coordinates of a first specific key point, the initial digital face data comprises coordinates of initial three-dimensional face key points, and updating the initial digital face data by using the first specific key point data to obtain the three-dimensional digital face data comprises:
replacing the coordinates of the initial three-dimensional face key points corresponding to the first specific key points with the coordinates of the first specific key points, to obtain the three-dimensional digital face data.
3. The method according to claim 1, wherein determining the first face feature vector corresponding to the face image to be processed comprises:
performing first segmentation operation on the face image to be processed to obtain a first face segmentation result;
and determining the first face feature vector from the first face segmentation result.
4. The method of claim 1, wherein the first face feature vector comprises a first reflection feature vector, a first detail feature vector, a first pose feature vector, and a first expression feature vector, and the method further comprises:
obtaining a normal map according to the first detail feature vector, the first pose feature vector, and the first expression feature vector;
and obtaining a texture map according to the first reflection feature vector.
5. The method of claim 4, wherein the first face feature vector further comprises a first shape feature vector, and generating the initial digital face data according to the first face feature vector comprises:
obtaining the initial digital face data according to the first shape feature vector, the first pose feature vector, and the first expression feature vector.
6. The method of claim 4, wherein processing the three-dimensional digital face data using digital person generation software to obtain a target three-dimensional digital person comprises:
based on the coordinates of key points in the three-dimensional digital face data, adjusting the coordinates of face key points of a standard three-dimensional digital person to obtain first coordinates of face key points of the target three-dimensional digital person;
and rendering the texture map and/or the normal map onto the three-dimensional digital person based on the first coordinates, to determine the target three-dimensional digital person.
7. The method of claim 1, wherein the first face feature vector or the initial digital face data is generated by a first neural network, and a training process of the first neural network comprises:
performing a first downsampling operation on an image sample by using the first neural network to determine a third latent feature vector;
performing a first upsampling operation on the third latent feature vector to determine a third face image;
performing a second downsampling operation on the image sample by using the first neural network to determine a third detail feature vector of the image sample, wherein the third detail feature vector represents coordinates of detail key points of the face in the image sample;
performing a second upsampling operation on the third detail feature vector and the third latent feature vector to determine a fourth face image;
and adjusting parameters of the first neural network based on a first difference between the image sample and the third face image and a second difference between the image sample and the fourth face image.
8. The method of claim 7, wherein the third latent feature vector comprises a third camera feature vector, a third reflection feature vector, a third light feature vector, a third shape feature vector, a third pose feature vector, and a third expression feature vector of the image sample;
performing the first upsampling operation on the third latent feature vector to determine the third face image comprises:
performing a third upsampling operation on the third reflection feature vector to determine a sample texture map, wherein the sample texture map represents colors of face key points in the image sample;
performing a fourth upsampling operation on the third light feature vector to determine light information of the image sample, wherein the light information represents the incident light intensity of the image sample;
performing a fifth upsampling operation on the third shape feature vector, the third pose feature vector, and the third expression feature vector to determine coordinates of fourth face key points and reflected light intensities of the fourth face key points;
and rendering the third camera feature vector, the light information, the sample texture map, the coordinates of the fourth face key points, and the reflected light intensities of the fourth face key points to obtain the third face image;
and performing the second upsampling operation on the third detail feature vector and the third latent feature vector to determine the fourth face image comprises:
performing a sixth upsampling operation on the third detail feature vector, the third pose feature vector, and the third expression feature vector to determine a sample normal map, wherein the sample normal map represents the reflected light intensity of each detail key point in the image sample;
and rendering the coordinates of the fourth face key points, the reflected light intensities of the fourth face key points, the sample texture map, and the sample normal map to determine the fourth face image.
9. A three-dimensional digital person generating apparatus, comprising:
a key point detection unit, configured to perform key point detection on a face image to be processed to obtain first specific key point data;
a first face feature vector determining unit, configured to determine a first face feature vector corresponding to the face image to be processed;
a three-dimensional digital face data determining unit, configured to generate initial digital face data according to the first face feature vector, and update the initial digital face data by using the first specific key point data to obtain three-dimensional digital face data;
and a target three-dimensional digital person generating unit, configured to process the three-dimensional digital face data by using digital person generating software to obtain a target three-dimensional digital person.
10. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the method of any one of claims 1 to 8 when executing the instructions stored by the memory.
11. A non-transitory computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 8.
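The key point replacement step recited in claims 1 and 2 can be illustrated with a short, hypothetical sketch: coordinates of detected "first specific key points" overwrite the coordinates of the corresponding initial three-dimensional face key points. The sketch is not part of the claimed invention; the array shapes, the correspondence mapping, and the function name `update_initial_face_data` are illustrative assumptions.

```python
import numpy as np

def update_initial_face_data(initial_keypoints, specific_keypoints, correspondence):
    """Return three-dimensional digital face key points (claim 2, illustrative).

    initial_keypoints:  (N, 3) array derived from the first face feature vector.
    specific_keypoints: (M, 3) array from key point detection on the input image.
    correspondence:     dict mapping a detected key point index to the index of
                        the initial key point whose coordinates it replaces.
    """
    updated = initial_keypoints.copy()  # leave the initial digital face data intact
    for detected_idx, initial_idx in correspondence.items():
        updated[initial_idx] = specific_keypoints[detected_idx]
    return updated

# Minimal demonstration with toy coordinates.
initial = np.zeros((4, 3))
detected = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
mapping = {0: 1, 1: 3}  # detected point 0 replaces initial point 1, etc.
result = update_initial_face_data(initial, detected, mapping)
print(result[1])  # [1. 2. 3.]
```

In the patent's terms, `initial_keypoints` would come from the initial digital face data generated from the first face feature vector, and the correspondence between detected key points and model key points would be fixed by the face model in use.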
CN202310544701.0A 2023-05-15 2023-05-15 Three-dimensional digital person generating method and device, electronic equipment and storage medium Active CN116563432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310544701.0A CN116563432B (en) 2023-05-15 2023-05-15 Three-dimensional digital person generating method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN116563432A true CN116563432A (en) 2023-08-08
CN116563432B CN116563432B (en) 2024-02-06

Family

ID=87491300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310544701.0A Active CN116563432B (en) 2023-05-15 2023-05-15 Three-dimensional digital person generating method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116563432B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643397A (en) * 2021-08-05 2021-11-12 广州帕克西软件开发有限公司 Virtual makeup trying method based on face recognition
CN114373044A (en) * 2021-12-16 2022-04-19 深圳云天励飞技术股份有限公司 Method, device, computing equipment and storage medium for generating three-dimensional face model
CN114429518A (en) * 2021-12-28 2022-05-03 清华大学 Face model reconstruction method, device, equipment and storage medium
CN114693876A (en) * 2022-04-06 2022-07-01 北京字跳网络技术有限公司 Digital human generation method, device, storage medium and electronic equipment
WO2022148083A1 (en) * 2021-01-07 2022-07-14 深圳追一科技有限公司 Simulation 3d digital human interaction method and apparatus, and electronic device and storage medium
CN114758090A (en) * 2020-12-29 2022-07-15 华为技术有限公司 Three-dimensional model generation method and device
CN114972601A (en) * 2022-06-13 2022-08-30 北京奇艺世纪科技有限公司 Model generation method, face rendering device and electronic equipment
CN115393486A (en) * 2022-10-27 2022-11-25 科大讯飞股份有限公司 Method, device and equipment for generating virtual image and storage medium
CN115512039A (en) * 2022-08-10 2022-12-23 科大讯飞华南人工智能研究院(广州)有限公司 3D face construction method
CN115661322A (en) * 2022-09-26 2023-01-31 北京百度网讯科技有限公司 Method and device for generating face texture image
CN115965735A (en) * 2022-12-22 2023-04-14 百度时代网络技术(北京)有限公司 Texture map generation method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHONG LI ET AL.: "Animated 3D Human Avatars from a Single Image with GAN-based Texture Inference", Computers & Graphics *
WANG Jingting et al.: "A Survey of Single-Image Three-Dimensional Face Reconstruction Methods", Computer Engineering and Applications *

Also Published As

Publication number Publication date
CN116563432B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
CN113793408B (en) Real-time audio driving face generation method, device and server
US20230237841A1 (en) Occlusion Detection
CN113327278B (en) Three-dimensional face reconstruction method, device, equipment and storage medium
CN111183405A (en) Adjusting digital representation of head region
CN108734078B (en) Image processing method, image processing apparatus, electronic device, storage medium, and program
CN111553284A (en) Face image processing method and device, computer equipment and storage medium
CN115345980A (en) Generation method and device of personalized texture map
CN110458924B (en) Three-dimensional face model establishing method and device and electronic equipment
CN112967196A (en) Image restoration method and device, electronic device and medium
US20230154089A1 (en) Synthesizing sequences of 3d geometries for movement-based performance
CN114187624A (en) Image generation method, image generation device, electronic equipment and storage medium
CN112967355A (en) Image filling method and device, electronic device and medium
CN117274491A (en) Training method, device, equipment and medium for three-dimensional reconstruction model
WO2011162352A1 (en) Three-dimensional data generating apparatus, three-dimensional data generating method, and three-dimensional data generating program
CN108573192B (en) Glasses try-on method and device matched with human face
CN111899159A (en) Method, device, apparatus and storage medium for changing hairstyle
US10803677B2 (en) Method and system of automated facial morphing for eyebrow hair and face color detection
CN111754431A (en) Image area replacement method, device, equipment and storage medium
CN115147261A (en) Image processing method, device, storage medium, equipment and product
CN114049290A (en) Image processing method, device, equipment and storage medium
CN117315758A (en) Facial expression detection method and device, electronic equipment and storage medium
US11893681B2 (en) Method for processing two-dimensional image and device for executing method
CN116563432B (en) Three-dimensional digital person generating method and device, electronic equipment and storage medium
CN113223128B (en) Method and apparatus for generating image
CN116229548A (en) Model generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant