WO2023185395A1 - Facial expression capturing method and apparatus, computer device, and storage medium - Google Patents

Facial expression capturing method and apparatus, computer device, and storage medium

Info

Publication number
WO2023185395A1
WO2023185395A1 (PCT/CN2023/080015)
Authority
WO
WIPO (PCT)
Prior art keywords
face
image
virtual
driven
sub
Prior art date
Application number
PCT/CN2023/080015
Other languages
French (fr)
Chinese (zh)
Inventor
徐国智
温翔
李嘉豪
周佳庆
胡天磊
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023185395A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

Definitions

  • the present disclosure relates to the field of computer technology, and specifically to a facial expression capturing method, device, computer equipment and storage medium.
  • Embodiments of the present disclosure provide at least a facial expression capturing method, device, computer equipment and storage medium.
  • embodiments of the present disclosure provide a method for capturing facial expressions, including: acquiring face-driven image data; performing first feature extraction on the face-driven image data to obtain a first expression feature of the face-driven image data; generating a virtual face image based on the first expression feature, and performing second feature extraction on the virtual face image to obtain a second expression feature of the virtual face image; and, based on the second expression feature, generating shape fusion BS coefficients; the BS coefficients are used as input to a three-dimensional game engine to generate a three-dimensional virtual face model.
  • the generation process of the BS coefficient is performed after inputting the face driving image data into a pre-trained face processing model
  • the face processing model includes a first sub-face processing model and a second sub-face processing model; the first sub-face processing model is used to output the second expression feature based on the face-driven image data, and the second sub-face processing model is used to obtain the BS coefficients based on the second expression feature;
  • the first sub-face processing model includes an encoder, a first decoder and a second decoder.
  • the encoder is used to extract features from the image to obtain expression features.
  • the first decoder is used to decode the expression features to obtain the virtual face generated image, and the second decoder is used to decode the expression features to obtain the face driven data generated image.
  • the first sub-face processing model is trained through the following steps:
  • the face driven image data sample is encoded by the encoder of the first sub-face processing model to obtain the first controller expression feature, and the first controller expression feature is input into the first decoder of the first sub-face processing model to obtain a first virtual face generated image; and, the virtual face image sample is encoded by the encoder of the first sub-face processing model to obtain the first virtual expression feature, and the first virtual expression feature is input into the second decoder of the first sub-face processing model to obtain a first face-driven data generated image;
  • the first virtual face generated image is encoded by the encoder to obtain the second virtual expression feature; and, the first face driving data generated image is encoded by the encoder to obtain the second controller expression feature;
  • based on the first loss information between the first controller expression feature and the second virtual expression feature, and the second loss information between the first virtual expression feature and the second controller expression feature, the model parameter information of the first sub-face processing model is adjusted.
  • the method further includes:
  • based on the first loss information between the first controller expression feature and the second virtual expression feature, the second loss information between the first virtual expression feature and the second controller expression feature, the third loss information, and the fourth loss information, the model parameter information of the first sub-face processing model is adjusted.
  • determining the third loss information based on the first image information of the face driven image data sample and the second image information of the second face driven data generated image includes:
  • Third loss information is determined based on the first image quality loss information and the second image quality loss information.
  • the method further includes:
  • the face driven image data sample and the first virtual face generated image are input into a pre-trained discriminator to obtain a first authenticity discrimination result of the face driven image data sample and a second authenticity discrimination result of the first virtual face generated image;
  • based on the first authenticity discrimination result and the second authenticity discrimination result, the model parameter information of the first sub-face processing model is adjusted until the second authenticity discrimination result of the first virtual face generated image matches the first authenticity discrimination result of the face driven image data sample;
  • the second sub-face processing model is trained through the following steps:
  • model parameter information of the second sub-face processing model is adjusted.
  • the acquisition of face driven image data samples and virtual face image samples includes:
  • the separate segmentation of the augmented face-driven image data and the augmented virtual face image, to obtain a face-driven image data sample including the first face area and a virtual face image sample including the second face area, includes:
  • embodiments of the present disclosure also provide a facial expression capturing device, including:
  • the first acquisition module is used to acquire face-driven image data
  • a first extraction module configured to perform first feature extraction on the face-driven image data to obtain the first expression feature of the face-driven image data
  • a second extraction module configured to generate a virtual face image based on the first expression feature, and perform second feature extraction on the virtual face image to obtain the second expression feature of the virtual face image;
  • a generation module configured to generate shape fusion BS coefficients based on the second expression feature; the BS coefficients are used to input into a three-dimensional game engine to generate a three-dimensional virtual face model.
  • embodiments of the present disclosure also provide a computer device, including: a processor, a memory, and a bus.
  • the memory stores machine-readable instructions executable by the processor.
  • the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps in the above-mentioned first aspect, or in any possible implementation manner of the first aspect, are performed.
  • embodiments of the present disclosure also provide a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • when the computer program is executed, the steps in the above-mentioned first aspect, or in any possible implementation manner of the first aspect, are performed.
  • embodiments of the present disclosure also provide a computer program product.
  • the computer program product carries program code.
  • the instructions included in the program code can be used to perform the steps in the above-mentioned first aspect, or in any possible implementation manner of the first aspect.
  • the facial expression capture method performs first feature extraction on the acquired face driven image data to obtain the first expression feature of the face driven image data; then generates a virtual face image based on the first expression feature and performs second feature extraction on the virtual face image to obtain the second expression feature of the virtual face image; and finally generates BS coefficients based on the second expression feature.
  • the BS coefficients can be automatically obtained based on the face driving image data, so that the 3D virtual character model can be quickly driven based on the automatically obtained BS coefficients.
  • by adjusting the model parameter information of the first sub-face processing model based on expression-feature loss information across different domains, that is, the first loss information between the first controller expression feature of the face-driven image data sample and the second virtual expression feature of the first virtual face generated image, and the second loss information between the first virtual expression feature of the virtual face image sample and the second controller expression feature of the first face driven data generated image, the encoder in the trained first sub-face processing model gains a better ability to obtain expression features through cross-domain encoding;
  • by adjusting the model parameter information of the first sub-face processing model based on image loss information across different domains, that is, the third loss information between the first image information of the face-driven image data sample and the second image information of the second face-driven data generated image, and the fourth loss information between the third image information of the virtual face image sample and the fourth image information of the second virtual face generated image, the decoders in the trained first sub-face processing model output images that are more similar to their inputs.
  • in this way, the training process of the face processing model can be realized without labeling BS coefficients for the face-driven image data, so that the trained first sub-face processing model can output expression features, and the trained second sub-face processing model can automatically obtain relatively accurate BS coefficients based on the expression features.
  • Figure 1 shows a flow chart of a method for capturing facial expressions provided by an embodiment of the present disclosure
  • Figure 2 shows a flow chart of another facial expression capturing method provided by an embodiment of the present disclosure
  • Figure 3 shows a training flow chart of the first sub-face processing model provided by an embodiment of the present disclosure
  • Figure 4 shows a schematic flowchart of obtaining training samples provided by an embodiment of the present disclosure
  • Figure 5 shows a training flow chart of another first sub-face processing model provided by an embodiment of the present disclosure
  • Figure 6 shows a schematic flowchart of using a discriminator to train a first sub-face processing model provided by an embodiment of the present disclosure
  • Figure 7 shows a training flow chart of the second sub-face processing model provided by an embodiment of the present disclosure
  • Figure 8 shows a training flow chart of another second sub-face processing model provided by an embodiment of the present disclosure
  • Figure 9 shows a schematic structural diagram of a facial expression capturing device provided by an embodiment of the present disclosure.
  • Figure 10 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
  • the present disclosure provides a facial expression capturing method, which performs first feature extraction on the acquired face-driven image data to obtain the first expression feature of the face-driven image data; then generates a virtual face image based on the first expression feature and performs second feature extraction on the virtual face image to obtain the second expression feature of the virtual face image; and finally generates BS coefficients based on the second expression feature.
  • the BS coefficients can be automatically obtained based on the face driving image data, so that the 3D virtual character model can be quickly driven based on the automatically obtained BS coefficients.
  • the execution subject of the facial expression capturing method provided by the embodiment of the disclosure is generally a computer device with certain computing capabilities.
  • the facial expression capturing method provided by the embodiment of the present disclosure will be described below by taking the execution subject as a server as an example.
  • the method includes S101 to S104, where:
  • S101: Acquire face-driven image data.
  • S102: Perform first feature extraction on the face-driven image data to obtain the first expression feature of the face-driven image data.
  • S103 Generate a virtual face image based on the first expression feature, and perform second feature extraction on the virtual face image to obtain the second expression feature of the virtual face image.
  • S104 Based on the second expression feature, generate shape fusion BS coefficients; the BS coefficients are used to input into a three-dimensional game engine to generate a three-dimensional virtual face model.
  • the face-driven image data may refer to an image including the controller's face, and the face-driven image data may be images under various facial expressions, such as images under expressions such as smile, anger, fear, and so on.
  • the image including the controller's face can be obtained by collecting images of the controller.
  • the image including the controller's face can be collected by taking photos or recording video. It should be noted here that the process of obtaining the face-driven image data can be performed by the controller after the controller triggers the image acquisition operation, or with the controller's authorization.
  • the collected image containing the controller's face may be an original image containing parts other than the face (such as the neck, hair, etc.). These other parts of the original image may have a certain impact on the process of capturing the controller's facial expressions, for example, they may affect the extraction of expression features. Therefore, after the original image is collected, the original image can be preprocessed to obtain face-driven image data that contains only the controller's face.
  • specifically, the collected original image can first be subjected to face detection, face key point detection, face registration and other processing in order to determine the face area in the original image; then, based on the face area, the original image is segmented to obtain face-driven image data that contains only the controller's face.
  • the above-mentioned process of preprocessing the original image can, to a certain extent, eliminate the influence of other parts of the original image on the process of capturing the facial expressions of the controller.
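  • For illustration only, the following is a minimal sketch of such a preprocessing step, assuming OpenCV is available; the key point detection and face registration steps are shown as hypothetical placeholders because the disclosure does not name specific algorithms:

```python
import cv2
import numpy as np

def preprocess_original_image(original_bgr: np.ndarray) -> np.ndarray:
    """Crop an original capture down to face-driven image data (face region only)."""
    gray = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2GRAY)
    # Face detection; a Haar cascade is used here purely for illustration.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("no face found in the original image")
    x, y, w, h = faces[0]

    # Hypothetical key point detection and face registration placeholders:
    # landmarks = detect_face_keypoints(original_bgr, (x, y, w, h))
    # aligned = register_face(original_bgr, landmarks)

    # Segment out the face area so only the controller's face remains.
    face_crop = original_bgr[y:y + h, x:x + w]
    return cv2.resize(face_crop, (256, 256))  # a fixed model input size is assumed
```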
  • the first expression feature obtained by performing first feature extraction on the face driven image data may be an expression feature in the red, green, blue (Red Green Blue, RGB) color domain.
  • the first expression feature can be loaded on the virtual face to generate a virtual face image with the same facial expression as the face-driven image data.
  • the second expression feature obtained by performing second feature extraction on the virtual face image may be an expression feature in the computer graphics (Computer Graphics, CG) domain.
  • the shape fusion BS coefficient is the shape fusion Blend Shape coefficient.
  • the BS coefficients can be used as input to a 3D game engine to generate a three-dimensional virtual face model; that is, the BS coefficients are input into the three-dimensional game engine, and the three-dimensional virtual face model is generated.
  • the three-dimensional virtual face model generated based on the BS coefficients has a facial expression consistent with that of the virtual face image, and therefore consistent with that of the face-driven image data, thus realizing the process of capturing a facial expression from the face-driven image data and applying it to the three-dimensional virtual face model; a sketch of how blend shape coefficients drive such a model is given below.
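  • For context only, the sketch below shows the standard linear blend-shape formula that a 3D engine typically applies, V = V_base + sum_k w_k * (V_k - V_base); the variable names and the numpy implementation are illustrative and are not taken from the disclosure:

```python
import numpy as np

def apply_blend_shapes(base_vertices: np.ndarray,
                       shape_targets: np.ndarray,
                       bs_coefficients: np.ndarray) -> np.ndarray:
    """Blend a neutral face mesh with expression targets using BS coefficients.

    base_vertices:   (V, 3) neutral mesh vertices
    shape_targets:   (K, V, 3) one target mesh per blend shape
    bs_coefficients: (K,) weights, typically in [0, 1]
    """
    deltas = shape_targets - base_vertices[None, :, :]      # (K, V, 3) per-shape offsets
    offset = np.tensordot(bs_coefficients, deltas, axes=1)  # (V, 3) weighted sum of offsets
    return base_vertices + offset
```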
  • the above process of capturing facial expressions can be implemented based on a pre-trained face processing model; that is, the generation of the BS coefficients is performed after the face-driven image data is input into the pre-trained face processing model. In other words, by inputting the acquired face-driven image data into the pre-trained face processing model, the BS coefficients can be obtained.
  • the face processing model may include a first sub-face processing model and a second sub-face processing model.
  • the first sub-face processing model is used to output the second expression feature based on the face driving image data;
  • the second sub-face processing model is used to obtain the BS coefficient based on the second expression feature.
  • the first sub-face processing model may be an autoencoder structure, and the second sub-face processing model may be a deep neural network (Deep Neural Networks, DNN).
  • the first sub-face processing model may include an encoder, a first decoder and a second decoder; the encoder is used to perform feature extraction on an image to obtain expression features; the first decoder is used to decode expression features to obtain a virtual face generated image; and the second decoder is used to decode expression features to obtain a face driven data generated image.
  • in a specific implementation, after the original image containing the controller's face is obtained, the original image is subjected to face key point detection and face registration to obtain the face-driven image data; the face-driven image data is then input into the encoder (Encoder) of the first sub-face processing model, which performs first feature extraction on the face-driven image data to obtain the first expression feature of the face-driven image data;
  • then, the first decoder (Decoder) decodes the first expression feature to obtain the virtual face image; next, the encoder performs second feature extraction on the virtual face image to obtain the second expression feature of the virtual face image;
  • finally, the second sub-face processing model (a DNN) obtains the BS coefficients based on the second expression feature; an illustrative sketch of this pipeline is given below.
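  • For illustration only, the following PyTorch sketch shows one way the described pipeline (a shared encoder, two decoders and a DNN that regresses BS coefficients) could be organized; all layer sizes, the number of BS coefficients and the module names are assumptions rather than details taken from the disclosure:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Shared encoder: image -> expression feature (used in both RGB and CG domains)."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.net(img)

class Decoder(nn.Module):
    """Decoder: expression feature -> image (one instance per target domain)."""
    def __init__(self, feat_dim: int = 256, out_size: int = 64):
        super().__init__()
        self.out_size = out_size
        self.fc = nn.Linear(feat_dim, 64 * (out_size // 4) ** 2)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        x = self.fc(feat).view(-1, 64, self.out_size // 4, self.out_size // 4)
        return self.net(x)

class BSRegressor(nn.Module):
    """Second sub-face processing model (DNN): expression feature -> BS coefficients."""
    def __init__(self, feat_dim: int = 256, num_bs: int = 52):  # 52 is an assumed count
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_bs), nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)

def capture_expression(face_img, encoder, first_decoder, bs_regressor):
    """Inference: RGB image -> first feature -> virtual face image -> second feature -> BS."""
    first_feat = encoder(face_img)             # first expression feature (RGB domain)
    virtual_face = first_decoder(first_feat)   # virtual face image (first decoder output)
    second_feat = encoder(virtual_face)        # second expression feature (CG domain)
    return bs_regressor(second_feat)           # BS coefficients for the 3D engine
```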
  • the training process of the first sub-face processing model and the second sub-face processing model in the above-mentioned face processing model is not performed at the same time.
  • the training process of the second sub-face processing model may be performed after the training of the first sub-face processing model is completed.
  • the following will introduce the training process of the first sub-face processing model and the second sub-face processing model respectively according to the training sequence of the first sub-face processing model and the second sub-face processing model.
  • the first sub-face processing model is trained through the following steps:
  • the face-driven image data sample refers to an image sample including the face of the controller.
  • Face-driven image data samples can be image samples under various facial expressions, such as image samples under various expressions such as smile, anger, fear, etc.
  • the image sample including the face of the controller can be obtained by collecting images of the controller.
  • the image can be collected by taking photos or video recordings to obtain the image sample including the face of the controller.
  • the process of obtaining face-driven image data can be performed by the controller himself after triggering the image acquisition operation, or by the controller's authorization.
  • Virtual face image samples refer to image samples containing virtual faces.
  • Virtual face image samples can be image samples under various facial expressions, such as image samples under expressions such as smile, anger, fear, and so on.
  • a plurality of different BS coefficients can be generated in advance, and image samples containing the face of the virtual object are then generated from these BS coefficients.
  • the collected image samples containing the controller's face and the generated image samples containing the virtual object's face may be original face images containing parts other than the face (such as the neck, hair, etc.), that is, original face-driven image data and original virtual face images.
  • the other parts of the original face-driven image data and the original virtual face image may have a certain impact on the training process of the first sub-face processing model, for example, they may affect the extraction of expression features. Therefore, after the original face-driven image data and the original virtual face image are obtained, they can be preprocessed respectively to obtain face-driven image data samples that contain only the controller's face and virtual face image samples that contain only the virtual object's face.
  • the original face-driven image data and the original virtual face image can be augmented respectively.
  • augmentation processing can add information to or transform features of the original images, selectively highlight or suppress certain features in the original images, and expand the number of image samples. Increasing the number of image samples improves the training of the first sub-face processing model, making the second expression features obtained by the trained first sub-face processing model more accurate.
  • the augmented face-driven image data and the augmented virtual face image can be segmented separately to obtain a face-driven image data sample containing the first face area and a virtual face image sample containing the second face area; the first face area is the face area of the controller, and the second face area is the face area of the virtual object.
  • the augmented face driving image data and the augmented virtual face image can be segmented respectively.
  • the augmented face-driven image data and the augmented virtual face image are sequentially subjected to face detection, face key point detection and face registration to determine the first face area of the augmented face-driven image data and the second face area of the augmented virtual face image; then, based on the first face area, the augmented face-driven image data is segmented to obtain a face-driven image data sample containing the first face area, and, based on the second face area, the augmented virtual face image is segmented to obtain a virtual face image sample containing the second face area.
  • the face driven image data sample including the first face area and the virtual face image sample including the second face area can also be masked respectively to obtain a first mask image and a second mask image.
  • the above-mentioned preprocessing of the original face-driven image data and the original virtual face image can, to a certain extent, eliminate the impact of their non-face parts on the training process of the first sub-face processing model; a sketch of the augmentation step is given below.
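  • For illustration only, a minimal augmentation sketch assuming torchvision is used; the specific transforms (flip, color jitter, small rotation) are examples, since the disclosure does not enumerate concrete augmentation operations:

```python
from torchvision import transforms

# Example augmentation pipeline, applied independently to the original face-driven
# image data and to the original virtual face images to enlarge both sample sets.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomRotation(degrees=10),
    transforms.ToTensor(),
])

# augmented_sample = augment(pil_image)  # run several times per image to expand the set
```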
  • S302: Encode the face driven image data sample through the encoder of the first sub-face processing model to obtain the first controller expression feature, and input the first controller expression feature into the first decoder of the first sub-face processing model to obtain a first virtual face generated image; and, encode the virtual face image sample through the encoder of the first sub-face processing model to obtain the first virtual expression feature, and input the first virtual expression feature into the second decoder of the first sub-face processing model to obtain a first face driven data generated image.
  • the first controller expression feature may be an expression feature in the RGB domain. After the first controller expression feature is input into the first decoder of the first sub-face processing model, the first decoder can obtain the first virtual face generated image based on the first controller expression feature, and the generated first virtual face generated image has the same facial expression as the face-driven image data sample.
  • the first virtual expression feature is also an expression feature in the RGB domain.
  • after the first virtual expression feature is input into the second decoder of the first sub-face processing model, the second decoder can obtain the first face driven data generated image based on the first virtual expression feature, and the generated first face driven data generated image has the same facial expression as the virtual face image sample.
  • S303: Encode the first virtual face generated image through the encoder to obtain the second virtual expression feature; and encode the first face driving data generated image through the encoder to obtain the second controller expression feature.
  • the second virtual expression feature and the second controller expression feature may be expression features in the CG domain.
  • the expression feature loss information in different domains can be obtained. Based on the expression feature loss information, the model parameter information of the first sub-face processing model is adjusted. Through the loss information of expression features in different domains, the encoder in the trained first sub-face processing model can have better ability to encode expression features across domains.
  • S304: Based on the first loss information between the first controller expression feature and the second virtual expression feature, and the second loss information between the first virtual expression feature and the second controller expression feature, adjust the model parameter information of the first sub-face processing model.
  • the generated first virtual face generated image and the face driven image data sample have the same facial expression, that is, the first controller expression feature and the second virtual expression feature are expression features corresponding to the same facial expression,
  • the first controller's expression features and the second virtual expression features are expression features in the RGB domain and the CG domain respectively. Therefore, the first loss information can be determined based on the first controller's expression features and the second virtual expression features. In the same way, the second loss information is determined based on the first virtual expression feature and the second controller expression feature.
  • based on the first loss information and the second loss information, the model parameter information of the first sub-face processing model is adjusted; the first sub-face processing model is thereby trained, and a trained first sub-face processing model is obtained.
  • the first loss information and the second loss information can each be calculated using a cycle consistency loss (Cycle Consistency Loss, CCL) that measures the distance between the expression feature of an input image and the expression feature re-encoded from the corresponding generated image, i.e. a loss of the form d(Enc(x_i), Enc(Dec_j(Enc(x_i)))), where d(·, ·) denotes a distance between expression features, and:
  • for the first loss information, x_i represents the face-driven image data sample, Enc(x_i) is the first controller expression feature of the face-driven image data sample, Dec_j(Enc(x_i)) is the first virtual face generated image generated based on the first controller expression feature, and Enc(Dec_j(Enc(x_i))) is the second virtual expression feature of the first virtual face generated image;
  • for the second loss information, x_i represents the virtual face image sample, Enc(x_i) is the first virtual expression feature of the virtual face image sample, Dec_j(Enc(x_i)) is the first face driven data generated image generated based on the first virtual expression feature, and Enc(Dec_j(Enc(x_i))) is the second controller expression feature of the first face driven data generated image; a sketch of this computation is given below.
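  • For illustration only, the sketch below computes the two cross-domain feature losses; an L1 distance is used here as an assumption, since the exact CCL formula is not reproduced above:

```python
import torch
import torch.nn.functional as F

def cycle_feature_losses(encoder, first_decoder, second_decoder, face_sample, virtual_sample):
    """First/second loss: compare expression features before and after crossing domains."""
    # RGB -> CG cycle: first controller feature vs. second virtual feature.
    first_controller_feat = encoder(face_sample)
    first_virtual_face = first_decoder(first_controller_feat)       # virtual face generated image
    second_virtual_feat = encoder(first_virtual_face)
    loss_1 = F.l1_loss(second_virtual_feat, first_controller_feat)  # first loss information

    # CG -> RGB cycle: first virtual feature vs. second controller feature.
    first_virtual_feat = encoder(virtual_sample)
    first_face_driven = second_decoder(first_virtual_feat)          # face driven data generated image
    second_controller_feat = encoder(first_face_driven)
    loss_2 = F.l1_loss(second_controller_feat, first_virtual_feat)  # second loss information

    return loss_1, loss_2
```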
  • in addition, image quality loss information between the image information of the generated face images in the CG domain and the image information of the face images in the RGB domain can also be used to adjust the model parameter information of the first sub-face processing model.
  • specifically, after the face driven image data sample is encoded by the encoder of the first sub-face processing model to obtain the first controller expression feature, the first controller expression feature can also be input into the second decoder of the first sub-face processing model to obtain a second face-driven data generated image.
  • the second face-driven data generated image is a face image in the CG domain.
  • the second face-driven data generated image has the same facial expression as the face-driven image data sample.
  • third loss information is determined based on the first image information of the face-driven image data sample and the second image information of the second face-driven data generated image.
  • the third loss information is the image quality loss information between the image information of the image generated by the second face driving data in the CG domain and the image information of the face driving image data in the RGB domain.
  • similarly, after the virtual face image sample is encoded by the encoder to obtain the first virtual expression feature, the first virtual expression feature can be input into the first decoder, and a second virtual face generated image is obtained.
  • the second virtual face generated image is a face image in the CG domain.
  • the second virtual face generated image has the same facial expression as the virtual face image sample.
  • fourth loss information is determined based on the third image information of the virtual face image sample and the fourth image information of the second virtual face generated image.
  • the fourth loss information is image quality loss information between the image information of the second virtual face generated image in the CG domain and the image information of the virtual face image in the RGB domain.
  • the model parameter information of the first sub-face processing model is then adjusted based on the above-mentioned third loss information and fourth loss information.
  • the image information may include pixel value information of each pixel point in the image, brightness information of the image, contrast information of the image, structural information of the image, etc.
  • the first image quality loss information may be determined based on the pixel value information of each pixel point in the images, and the third loss information or the fourth loss information may then be determined based on the first image quality loss information; or the second image quality loss information may be determined based on the brightness information, contrast information and structure information of the images, and the third loss information or the fourth loss information may then be determined based on the second image quality loss information; or the third loss information or the fourth loss information may be determined based on both the first image quality loss information and the second image quality loss information.
  • the image quality loss information can also be determined based on other image information, which will not be described in detail here.
  • in a specific implementation, the face driven image data sample is input into the encoder (Encoder) of the first sub-face processing model, and the first controller expression feature is obtained by encoding;
  • the first controller expression feature is input into the first decoder of the first sub-face processing model to obtain the first virtual face generated image, and the first controller expression feature is also input into the second decoder (Decoder) of the first sub-face processing model to obtain the second face driven data generated image;
  • the first virtual face generated image is input into the encoder, and the second virtual expression feature is obtained by encoding; and the first face driving data generated image is input into the encoder, and the second controller expression feature is obtained by encoding;
  • based on the first controller expression feature and the second virtual expression feature, the first loss information is determined; based on the first virtual expression feature and the second controller expression feature, the second loss information is determined; the third loss information is determined based on the first image information of the face driven image data sample and the second image information of the second face driven data generated image; and the fourth loss information is determined based on the third image information of the virtual face image sample and the fourth image information of the second virtual face generated image.
  • the model parameter information of the first sub-face processing model can be adjusted based on the first loss information, the second loss information, the third loss information, and the fourth loss information.
  • the following will take the determination of the third loss information as an example to introduce the process of determining the third loss information based on the first image quality loss information and the second image quality loss information.
  • the first image quality loss information may be determined based on the first pixel value information of each pixel point in the face driven image data sample and the second pixel value information of each pixel point in the second face driven data generated image.
  • mask processing can be performed in advance on the face driven image data sample and the second face driven data generated image respectively, to obtain a mask image corresponding to the face driven image data sample and a mask image corresponding to the second face driven data generated image; then, the first image quality loss information is determined based on the first pixel value information of each pixel point in the mask image corresponding to the face driven image data sample and the second pixel value information of each pixel point in the mask image corresponding to the second face driven data generated image.
  • the image brightness loss information is determined based on the first brightness information of the face driven image data sample and the second brightness information of the second face driven data generated image; the image contrast loss information is determined based on the first contrast information of the face driven image data sample and the second contrast information of the second face driven data generated image; and the image structure loss information is determined based on the first structural information of the face driven image data sample and the second structural information of the second face driven data generated image.
  • second image quality loss information is determined based on the image brightness loss information, image contrast loss information, and image structure loss information.
  • third loss information is determined based on the first image quality loss information and the second image quality loss information.
  • the first image quality loss information and the second image quality loss information can be weighted and summed to obtain the third loss information.
  • for example, the pixel-level component of the third loss information can be calculated from the difference between f(x_i) and y_i, where x_i represents the face-driven image data sample, f(x_i) represents the first pixel value information of each pixel in the face-driven image data sample, and y_i represents the second pixel value information of each pixel in the second face-driven data generated image.
  • the determination process of the fourth loss information is similar to the determination process of the third loss information mentioned above, and will not be described again here.
  • the fourth loss information can likewise be calculated using the Structural Similarity Index Measure (SSIM) loss function. The SSIM between two images x and y is commonly defined as SSIM(x, y) = ((2·μ_x·μ_y + C_1)(2·σ_xy + C_2)) / ((μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2)), where C_1 = (K_1·L)² and C_2 = (K_2·L)² are small constants that stabilize the division, μ denotes the brightness (mean) information, σ² denotes the contrast (variance) information, and σ_xy denotes the structure (covariance) information.
  • when calculating the third loss information, μ_x is the brightness information of the face driven image data sample and μ_y is the brightness information of the second face driven data generated image; when calculating the fourth loss information, μ_x is the brightness information of the virtual face image sample and μ_y is the brightness information of the second virtual face generated image.
  • the discriminator can also be used to train the first sub-face processing model.
  • the discriminator can be pre-trained.
  • the discriminator and the first sub-face processing model form an adversarial network, and the discriminator optimizes the first sub-face processing model through its discrimination results on the generated images, so that the first virtual face generated image produced by the first sub-face processing model is more similar to the face-driven image data sample, and the first face-driven data generated image is more similar to the virtual face image sample.
  • specifically, the face driving image data sample and the first virtual face generated image can be input into the pre-trained discriminator, and the first authenticity discrimination result of the face-driven image data sample and the second authenticity discrimination result of the first virtual face generated image are obtained; based on the first authenticity discrimination result and the second authenticity discrimination result, the model parameter information of the first sub-face processing model is adjusted until the second authenticity discrimination result of the first virtual face generated image matches the first authenticity discrimination result of the face driven image data sample.
  • the pre-trained discriminator can obtain the authenticity judgment result that the face-driven image data sample is real, that is, the first authenticity judgment result.
  • for the first virtual face generated image produced by the first sub-face processing model, the discriminator may obtain an authenticity judgment result indicating that the image is not real, that is, the second authenticity judgment result.
  • the model parameter information of the first sub-face processing model can be adjusted based on the first authenticity judgment result and the second authenticity judgment result.
  • the first virtual face generated image generated by the first sub-face processing model after its model parameter information has been adjusted is input into the discriminator again, and a new second authenticity judgment result of the first virtual face generated image is obtained. If the second authenticity judgment result still indicates that the first virtual face generated image is not authentic, the model parameter information of the first sub-face processing model continues to be adjusted until the second authenticity judgment result of the first virtual face generated image matches the first authenticity judgment result of the face-driven image data sample.
  • the first authenticity judgment result and the second authenticity judgment result can be represented by probability values; for example, a result of "real" can be represented by 1 and a result of "not real" by 0. When the difference between the second authenticity judgment result of the first virtual face generated image and the first authenticity judgment result of the face driven image data sample is less than a set threshold, the second authenticity discrimination result of the first virtual face generated image can be considered to match the first authenticity discrimination result of the face-driven image data sample, and the training can be ended at this point.
  • similarly, the virtual face image sample and the first face-driven data generated image can be input into the pre-trained discriminator to obtain the third authenticity discrimination result of the virtual face image sample and the fourth authenticity discrimination result of the first face-driven data generated image; based on the third authenticity discrimination result and the fourth authenticity discrimination result, the model parameter information of the first sub-face processing model is adjusted until the fourth authenticity discrimination result of the first face-driven data generated image matches the third authenticity discrimination result of the virtual face image sample.
  • in this case, the model parameter information of the first sub-face processing model can be adjusted by referring to the aforementioned process, and the details are not repeated here; a sketch of this adversarial objective is given below.
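  • For illustration only, a sketch of the generator-side adversarial objective in which the pre-trained discriminator's outputs push the generated images toward their target domains; the binary cross-entropy formulation is an assumption:

```python
import torch
import torch.nn.functional as F

def adversarial_generator_loss(discriminator, first_virtual_face, first_face_driven):
    """Encourage both generated images to be judged 'real' by the pre-trained discriminator."""
    pred_virtual = discriminator(first_virtual_face)  # second authenticity discrimination result
    pred_driven = discriminator(first_face_driven)    # fourth authenticity discrimination result
    # Target is the 'real' judgment (1) for both generated images.
    loss_virtual = F.binary_cross_entropy_with_logits(pred_virtual, torch.ones_like(pred_virtual))
    loss_driven = F.binary_cross_entropy_with_logits(pred_driven, torch.ones_like(pred_driven))
    return loss_virtual + loss_driven
```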
  • the training process of the first sub-face processing model is introduced above, and the training process of the second sub-face processing model is introduced below.
  • the second sub-face processing model is trained through the following steps:
  • S701 Input the virtual face image sample into the encoder of the trained first sub-face processing model to obtain a third virtual expression feature
  • S702 Input the third virtual expression feature into the second sub-face processing model to obtain the predicted BS coefficient corresponding to the virtual face image sample;
  • S703 Determine the fifth loss information based on the predicted BS coefficient and the known BS coefficient corresponding to the virtual face image sample;
  • since the first sub-face processing model has already been trained, the third virtual expression feature obtained after the encoder of the trained first sub-face processing model encodes the virtual face image sample is relatively accurate.
  • the second sub-face processing model may be a DNN, and the DNN may predict based on the third virtual expression feature to obtain the predicted BS coefficient corresponding to the virtual face image sample.
  • the known BS coefficient corresponding to the virtual face image sample refers to the BS coefficient used to generate the virtual face image sample.
  • the mean square error (Mean Square Error, MSE) loss, that is, the fifth loss information, can be calculated based on the predicted BS coefficient and the known BS coefficient corresponding to the virtual face image sample.
  • the model parameter information of the second sub-face processing model is adjusted to obtain the trained second sub-face processing model.
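  • For illustration only, a sketch of the training loop for the second sub-face processing model, using the frozen, already-trained encoder and an MSE loss against the known BS coefficients; the optimizer choice and learning rate are assumptions:

```python
import torch
import torch.nn.functional as F

def train_bs_regressor(encoder, bs_regressor, loader, epochs: int = 10, lr: float = 1e-3):
    """loader yields (virtual_face_image, known_bs_coefficients) pairs."""
    encoder.eval()  # the first sub-face processing model is already trained and kept fixed
    optimizer = torch.optim.Adam(bs_regressor.parameters(), lr=lr)
    for _ in range(epochs):
        for virtual_img, known_bs in loader:
            with torch.no_grad():
                third_virtual_feat = encoder(virtual_img)   # third virtual expression feature
            pred_bs = bs_regressor(third_virtual_feat)      # predicted BS coefficients
            fifth_loss = F.mse_loss(pred_bs, known_bs)      # fifth loss information (MSE)
            optimizer.zero_grad()
            fifth_loss.backward()
            optimizer.step()
    return bs_regressor
```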
  • the writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • the embodiments of the present disclosure also provide a facial expression capture device corresponding to the facial expression capture method. Since the principle by which the device solves the problem is similar to that of the above-mentioned facial expression capture method of the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not described again.
  • the device includes: a first acquisition module 901, a first extraction module 902, a second extraction module 903, and a generation module 904; wherein:
  • the first acquisition module 901 is used to acquire face driven image data
  • the first extraction module 902 is used to perform first feature extraction on the face-driven image data to obtain the first expression feature of the face-driven image data;
  • the second extraction module 903 is used to generate a virtual face image based on the first expression feature, and perform second feature extraction on the virtual face image to obtain the second expression feature of the virtual face image;
  • the generation module 904 is configured to generate shape fusion BS coefficients based on the second expression feature; the BS coefficients are used to input into a three-dimensional game engine to generate a three-dimensional virtual face model.
  • the generation process of the BS coefficient is performed after inputting the face driving image data into a pre-trained face processing model
  • the face processing model includes a first sub-face processing model and a second sub-face processing model; the first sub-face processing model is used to output the second expression feature based on the face-driven image data, and the second sub-face processing model is used to obtain the BS coefficients based on the second expression feature;
  • the first sub-face processing model includes an encoder, a first decoder and a second decoder.
  • the encoder is used to extract features from the image to obtain expression features.
  • the first decoder is used to decode the expression features to obtain the virtual face generated image, and the second decoder is used to decode the expression features to obtain the face driven data generated image.
  • the device further includes:
  • the second acquisition module is used to acquire face-driven image data samples and virtual face image samples
  • the first input module is used to encode the face driven image data sample through the encoder of the first sub-face processing model to obtain the first controller expression feature, and input the first controller expression feature into the first decoder of the first sub-face processing model to obtain the first virtual face generated image; and to encode the virtual face image sample through the encoder of the first sub-face processing model to obtain the first virtual expression feature, and input the first virtual expression feature into the second decoder of the first sub-face processing model to obtain the first face driven data generated image;
  • an encoding module configured to encode the first virtual face generated image through the encoder to obtain the second virtual expression feature, and to encode the first face driving data generated image through the encoder to obtain the second controller expression feature;
  • a first adjustment module configured to adjust the model parameter information of the first sub-face processing model based on the first loss information between the first controller expression feature and the second virtual expression feature, and the second loss information between the first virtual expression feature and the second controller expression feature.
  • the device further includes:
  • a second input module used to, after the face driving image data sample is encoded by the encoder of the first sub-face processing model to obtain the first controller expression feature, input the first controller expression feature into the second decoder of the first sub-face processing model to obtain the second face driven data generated image;
  • a first determination module configured to determine third loss information based on the first image information of the face-driven image data sample and the second image information of the second face-driven data generated image;
  • the device also includes:
  • a third input module configured to, after the virtual face image sample is encoded by the encoder of the first sub-face processing model to obtain the first virtual expression feature, input the first virtual expression feature into the first decoder of the first sub-face processing model to obtain the second virtual face generated image;
  • a second determination module configured to determine fourth loss information based on the third image information of the virtual face image sample and the fourth image information of the second virtual face generated image
  • the first adjustment module is specifically used for:
  • based on the first loss information between the first controller expression feature and the second virtual expression feature, the second loss information between the first virtual expression feature and the second controller expression feature, the third loss information, and the fourth loss information, adjust the model parameter information of the first sub-face processing model.
  • the first determination module is specifically used to:
  • Third loss information is determined based on the first image quality loss information and the second image quality loss information.
  • the device further includes:
  • a fourth input module used to, after the first virtual face generated image and the first face driven data generated image are obtained, input the face driven image data sample and the first virtual face generated image into the pre-trained discriminator to obtain the first authenticity discrimination result of the face driven image data sample and the second authenticity discrimination result of the first virtual face generated image; and, based on the first authenticity discrimination result and the second authenticity discrimination result, adjust the model parameter information of the first sub-face processing model until the second authenticity discrimination result of the first virtual face generated image matches the first authenticity discrimination result of the face driven image data sample;
  • the device further includes:
  • a fifth input module configured to input the virtual face image sample into the encoder of the trained first sub-face processing model to obtain the third virtual expression feature
  • a sixth input module used to input the third virtual expression feature into the second sub-face processing model to obtain the predicted BS coefficient corresponding to the virtual face image sample;
  • a third determination module configured to determine fifth loss information based on the predicted BS coefficient and the known BS coefficient corresponding to the virtual face image sample;
  • a second adjustment module configured to adjust model parameter information of the second sub-face processing model based on the fifth loss information.
  • the second acquisition module is specifically used for:
  • a schematic structural diagram of a computer device 1000 provided by an embodiment of the present disclosure is shown; the computer device includes a processor 1001, a memory 1002, and a bus 1003.
  • the memory 1002 is used to store execution instructions and includes an internal memory 10021 and an external memory 10022; the internal memory 10021 is used to temporarily store operation data in the processor 1001 and data exchanged with the external memory 10022, such as a hard disk; the processor 1001 exchanges data with the external memory 10022 through the internal memory 10021.
  • the processor 1001 and the memory 1002 communicate through the bus 1003, so that the processor 1001 executes the following instructions:
  • acquire face-driven image data; perform first feature extraction on the face-driven image data to obtain the first expression feature of the face-driven image data; generate a virtual face image based on the first expression feature, and perform second feature extraction on the virtual face image to obtain the second expression feature of the virtual face image; and, based on the second expression feature, generate shape fusion BS coefficients; the BS coefficients are used as input to a three-dimensional game engine to generate a three-dimensional virtual face model.
  • Embodiments of the present disclosure also provide a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • Embodiments of the present disclosure also provide a computer program product.
  • the computer program product carries program code.
  • the instructions included in the program code can be used to execute the steps of the facial expression capturing method described in the above method embodiments; for details, please refer to the above method embodiments, which are not described again here.
  • the above-mentioned computer program product can be specifically implemented by hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium.
  • the computer program product is embodied as a software product, such as a Software Development Kit (SDK) and the like.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some communication interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium that is executable by a processor.
  • the technical solution of the present disclosure, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure provides a facial expression capturing method and apparatus, a computer device, and a storage medium. The method comprises: acquiring face driving image data; performing first feature extraction on the face driving image data to obtain a first expression feature of the face driving image data; generating a virtual face image on the basis of the first expression feature, and performing second feature extraction on the virtual face image to obtain a second expression feature of the virtual face image; and generating a blend shape (BS) coefficient on the basis of the second expression feature, the BS coefficient being used as input to a three-dimensional game engine to generate a three-dimensional virtual face model. According to embodiments of the present disclosure, the BS coefficient can be automatically obtained on the basis of the face driving image data, such that a 3D virtual character model can be quickly driven on the basis of the automatically obtained BS coefficient.

Description

Facial expression capturing method and apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on March 30, 2022, with application number 202210326965.4 and entitled "Facial expression capturing method and apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to a facial expression capturing method and apparatus, a computer device, and a storage medium.
Background
In some activities, an actor wishes to control a three-dimensional (3D) virtual character model through his or her own facial expressions. The actor can drive relevant equipment to capture his or her facial expressions, so that the 3D virtual character model presents the same facial expressions as the actor. To obtain a 3D virtual character model that presents the same facial expression as the actor, it is usually necessary to manually annotate the blend shape (BS) coefficients corresponding to the actor's facial expression and then use the BS coefficients to drive the 3D virtual character model. This manual annotation is inefficient and affects the driving efficiency of the 3D virtual character model.
Summary
Embodiments of the present disclosure provide at least a facial expression capturing method and apparatus, a computer device, and a storage medium.
In a first aspect, embodiments of the present disclosure provide a facial expression capturing method, including:
acquiring face-driven image data;
performing first feature extraction on the face-driven image data to obtain a first expression feature of the face-driven image data;
generating a virtual face image based on the first expression feature, and performing second feature extraction on the virtual face image to obtain a second expression feature of the virtual face image; and
generating a blend shape (BS) coefficient based on the second expression feature, where the BS coefficient is used for input into a three-dimensional game engine to generate a three-dimensional virtual face model.
In an optional implementation, the BS coefficient is generated after the face-driven image data is input into a pre-trained face processing model.
The face processing model includes a first sub-face processing model and a second sub-face processing model; the first sub-face processing model is configured to output the second expression feature based on the face-driven image data, and the second sub-face processing model is configured to obtain the BS coefficient based on the second expression feature.
The first sub-face processing model includes an encoder, a first decoder and a second decoder. The encoder is configured to perform feature extraction on an image to obtain expression features, the first decoder is configured to decode expression features into a virtual face generated image, and the second decoder is configured to decode expression features into a face-driven-data generated image.
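A minimal PyTorch sketch of one possible realization of this structure is given below: a single shared encoder and two domain-specific decoders. The layer sizes, image resolution and latent dimension are illustrative assumptions and are not taken from the disclosure.

```python
# Sketch of the first sub-face processing model: one shared encoder and two decoders.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, latent_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 128 -> 64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 32 -> 16
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, latent_dim),
        )

    def forward(self, x):          # x: (B, 3, 128, 128) face crop
        return self.net(x)         # expression feature

class Decoder(nn.Module):
    """Decodes an expression feature back into a face image of one domain."""
    def __init__(self, latent_dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 128, 16, 16)
        return self.net(h)

encoder = Encoder()
decoder_virtual = Decoder()  # "first decoder": expression feature -> virtual face image
decoder_real = Decoder()     # "second decoder": expression feature -> face-driven-data image
```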
In an optional implementation, the first sub-face processing model is trained through the following steps (a sketch of one such training step is given after these steps):
acquiring face-driven image data samples and virtual face image samples;
encoding the face-driven image data sample with the encoder of the first sub-face processing model to obtain a first controller expression feature, and inputting the first controller expression feature into the first decoder of the first sub-face processing model to obtain a first virtual face generated image; and encoding the virtual face image sample with the encoder of the first sub-face processing model to obtain a first virtual expression feature, and inputting the first virtual expression feature into the second decoder of the first sub-face processing model to obtain a first face-driven-data generated image;
encoding the first virtual face generated image with the encoder to obtain a second virtual expression feature, and encoding the first face-driven-data generated image with the encoder to obtain a second controller expression feature; and
adjusting model parameter information of the first sub-face processing model based on first loss information between the first controller expression feature and the second virtual expression feature, and second loss information between the first virtual expression feature and the second controller expression feature.
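The following is a hedged sketch of one training iteration using the two cross-domain feature losses described above. The L1 distance and the use of a single optimizer step are assumptions; the disclosure only specifies which quantities are compared.

```python
# One training step of the first sub-face processing model (cross-domain feature losses).
import torch
import torch.nn.functional as F

def training_step(encoder, dec_virtual, dec_real, real_face, virtual_face, optimizer):
    # real_face: face-driven image data sample; virtual_face: virtual face image sample
    feat_real = encoder(real_face)                # first controller expression feature
    fake_virtual = dec_virtual(feat_real)         # first virtual face generated image
    feat_fake_virtual = encoder(fake_virtual)     # second virtual expression feature

    feat_virtual = encoder(virtual_face)          # first virtual expression feature
    fake_real = dec_real(feat_virtual)            # first face-driven-data generated image
    feat_fake_real = encoder(fake_real)           # second controller expression feature

    loss_1 = F.l1_loss(feat_fake_virtual, feat_real)  # first loss information
    loss_2 = F.l1_loss(feat_fake_real, feat_virtual)  # second loss information
    loss = loss_1 + loss_2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```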
In an optional implementation, after the face-driven image data sample is encoded by the encoder of the first sub-face processing model to obtain the first controller expression feature, the method further includes:
inputting the first controller expression feature into the second decoder of the first sub-face processing model to obtain a second face-driven-data generated image; and
determining third loss information based on first image information of the face-driven image data sample and second image information of the second face-driven-data generated image.
After the virtual face image sample is encoded by the encoder of the first sub-face processing model to obtain the first virtual expression feature, the method further includes:
inputting the first virtual expression feature into the first decoder of the first sub-face processing model to obtain a second virtual face generated image; and
determining fourth loss information based on third image information of the virtual face image sample and fourth image information of the second virtual face generated image.
The adjusting of the model parameter information of the first sub-face processing model based on the first loss information between the first controller expression feature and the second virtual expression feature, and the second loss information between the first virtual expression feature and the second controller expression feature includes:
adjusting the model parameter information of the first sub-face processing model based on the first loss information between the first controller expression feature and the second virtual expression feature, the second loss information between the first virtual expression feature and the second controller expression feature, the third loss information, and the fourth loss information.
In an optional implementation, the determining of the third loss information based on the first image information of the face-driven image data sample and the second image information of the second face-driven-data generated image includes:
determining first image quality loss information based on first pixel value information of each pixel in the face-driven image data sample and second pixel value information of each pixel in the second face-driven-data generated image;
determining image brightness loss information based on first brightness information of the face-driven image data sample and second brightness information of the second face-driven-data generated image;
determining image contrast loss information based on first contrast information of the face-driven image data sample and second contrast information of the second face-driven-data generated image;
determining image structure loss information based on first structure information of the face-driven image data sample and second structure information of the second face-driven-data generated image;
determining second image quality loss information based on the image brightness loss information, the image contrast loss information and the image structure loss information; and
determining the third loss information based on the first image quality loss information and the second image quality loss information.
In an optional implementation, after the first virtual face generated image and the first face-driven-data generated image are obtained, the method further includes:
inputting the face-driven image data sample and the first virtual face generated image into a pre-trained discriminator to obtain a first authenticity discrimination result of the face-driven image data sample and a second authenticity discrimination result of the first virtual face generated image, and adjusting the model parameter information of the first sub-face processing model based on the first authenticity discrimination result and the second authenticity discrimination result until the second authenticity discrimination result of the first virtual face generated image matches the first authenticity discrimination result of the face-driven image data sample;
and/or,
inputting the virtual face image sample and the first face-driven-data generated image into a pre-trained discriminator to obtain a third authenticity discrimination result of the virtual face image sample and a fourth authenticity discrimination result of the first face-driven-data generated image, and adjusting the model parameter information of the first sub-face processing model based on the third authenticity discrimination result and the fourth authenticity discrimination result until the fourth authenticity discrimination result of the first face-driven-data generated image matches the third authenticity discrimination result of the virtual face image sample.
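A minimal sketch of how such a discriminator term could be used to update the first sub-face processing model is shown below. The binary-cross-entropy formulation is an assumption; the disclosure only states that the verdict on the generated image should be driven to match the verdict on the sample image.

```python
# Adversarial term for the generator (the first sub-face processing model), assuming a
# pre-trained discriminator that outputs a real/fake logit per image.
import torch
import torch.nn.functional as F

def adversarial_generator_loss(discriminator, generated_image):
    fake_score = discriminator(generated_image)   # second / fourth authenticity result
    # Drive the discriminator's verdict on the generated image toward "real",
    # i.e. toward the verdict it gives the genuine sample images.
    return F.binary_cross_entropy_with_logits(fake_score, torch.ones_like(fake_score))
```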
In an optional implementation, the second sub-face processing model is trained through the following steps (a sketch of one such training step is given after these steps):
inputting the virtual face image sample into the encoder of the trained first sub-face processing model to obtain a third virtual expression feature;
inputting the third virtual expression feature into the second sub-face processing model to obtain a predicted BS coefficient corresponding to the virtual face image sample;
determining fifth loss information based on the predicted BS coefficient and a known BS coefficient corresponding to the virtual face image sample; and
adjusting model parameter information of the second sub-face processing model based on the fifth loss information.
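The sketch below illustrates one plausible realization of this training step: a small DNN regresses BS coefficients from the expression feature produced by the already-trained, frozen encoder. The layer sizes, the number of BS coefficients and the use of an MSE loss are illustrative assumptions.

```python
# Training step for the second sub-face processing model (BS-coefficient regressor).
import torch
import torch.nn as nn

NUM_BS = 52          # assumed number of blend shape coefficients
LATENT_DIM = 256     # must match the encoder's feature size

bs_regressor = nn.Sequential(
    nn.Linear(LATENT_DIM, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, NUM_BS), nn.Sigmoid(),   # BS coefficients typically lie in [0, 1]
)

def second_model_step(encoder, virtual_face_sample, known_bs, optimizer):
    with torch.no_grad():                        # encoder is already trained and kept fixed
        feat = encoder(virtual_face_sample)      # third virtual expression feature
    pred_bs = bs_regressor(feat)                 # predicted BS coefficients
    loss = nn.functional.mse_loss(pred_bs, known_bs)   # fifth loss information
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```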
In an optional implementation, the acquiring of the face-driven image data samples and the virtual face image samples includes:
acquiring original face-driven image data and an original virtual face image;
performing augmentation processing on the original face-driven image data and the original virtual face image respectively, to obtain augmented face-driven image data samples and augmented virtual face image samples; and
performing segmentation processing on the augmented face-driven image data and the augmented virtual face image respectively, to obtain a face-driven image data sample containing a first face region and a virtual face image sample containing a second face region.
In an optional implementation, the performing of the segmentation processing on the augmented face-driven image data and the augmented virtual face image respectively, to obtain the face-driven image data sample containing the first face region and the virtual face image sample containing the second face region, includes:
performing face detection, face key point detection and face registration in sequence on the augmented face-driven image data and the augmented virtual face image respectively, to determine the first face region of the augmented face-driven image data and the second face region of the augmented virtual face image; and
performing segmentation processing on the augmented face-driven image data based on the first face region to obtain the face-driven image data sample containing the first face region, and performing segmentation processing on the augmented virtual face image based on the second face region to obtain the virtual face image sample containing the second face region.
In a second aspect, embodiments of the present disclosure further provide a facial expression capturing apparatus, including:
a first acquisition module, configured to acquire face-driven image data;
a first extraction module, configured to perform first feature extraction on the face-driven image data to obtain a first expression feature of the face-driven image data;
a second extraction module, configured to generate a virtual face image based on the first expression feature, and perform second feature extraction on the virtual face image to obtain a second expression feature of the virtual face image; and
a generation module, configured to generate a blend shape (BS) coefficient based on the second expression feature, where the BS coefficient is used for input into a three-dimensional game engine to generate a three-dimensional virtual face model.
In a third aspect, embodiments of the present disclosure further provide a computer device, including a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor. When the computer device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the above first aspect, or of any possible implementation of the first aspect, are performed.
In a fourth aspect, embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored. When the computer program is run by a processor, the steps of the above first aspect, or of any possible implementation of the first aspect, are performed.
In a fifth aspect, embodiments of the present disclosure further provide a computer program product carrying program code, where the instructions included in the program code can be used to perform the steps of the above first aspect, or of any possible implementation of the first aspect.
According to the facial expression capturing method provided by the embodiments of the present disclosure, first feature extraction is performed on the acquired face-driven image data to obtain a first expression feature of the face-driven image data; a virtual face image is then generated based on the first expression feature, and second feature extraction is performed on the virtual face image to obtain a second expression feature of the virtual face image; finally, a BS coefficient is generated based on the second expression feature. Through the above process, the BS coefficient can be obtained automatically based on the face-driven image data, so that a 3D virtual character model can be quickly driven based on the automatically obtained BS coefficient.
Furthermore, in the process of training the face processing model in the embodiments of the present disclosure, the model parameter information of the first sub-face processing model is adjusted using expression feature loss information across different domains, namely the first loss information between the first controller expression feature of the face-driven image data sample and the second virtual expression feature of the first virtual face generated image, and the second loss information between the first virtual expression feature of the virtual face image sample and the second controller expression feature of the first face-driven-data generated image; this gives the encoder of the trained first sub-face processing model a better ability to encode expression features across domains. The model parameter information of the first sub-face processing model is also adjusted using image loss information across different domains, namely the third loss information between the first image information of the face-driven image data sample and the second image information of the second face-driven-data generated image, and the fourth loss information between the third image information of the virtual face image sample and the fourth image information of the second virtual face generated image; this makes the images output by the decoders of the trained first sub-face processing model more similar to the input images. Through the above process, the face processing model can be trained without annotating BS coefficients for the face-driven image data, so that the trained first sub-face processing model outputs expression features and the trained second sub-face processing model can automatically obtain relatively accurate BS coefficients based on the expression features.
To make the above objects, features and advantages of the present disclosure more obvious and understandable, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings required in the embodiments are briefly introduced below. The drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings only illustrate certain embodiments of the present disclosure and therefore should not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
Figure 1 shows a flowchart of a facial expression capturing method provided by an embodiment of the present disclosure;
Figure 2 shows a flowchart of another facial expression capturing method provided by an embodiment of the present disclosure;
Figure 3 shows a training flowchart of the first sub-face processing model provided by an embodiment of the present disclosure;
Figure 4 shows a schematic flowchart of acquiring training samples provided by an embodiment of the present disclosure;
Figure 5 shows another training flowchart of the first sub-face processing model provided by an embodiment of the present disclosure;
Figure 6 shows a schematic flowchart of training the first sub-face processing model using a discriminator provided by an embodiment of the present disclosure;
Figure 7 shows a training flowchart of the second sub-face processing model provided by an embodiment of the present disclosure;
Figure 8 shows another training flowchart of the second sub-face processing model provided by an embodiment of the present disclosure;
Figure 9 shows a schematic structural diagram of a facial expression capturing apparatus provided by an embodiment of the present disclosure;
Figure 10 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort shall fall within the scope of protection of the present disclosure.
In practice, to obtain a 3D virtual character model that presents the same facial expression as an actor, after the actor drives relevant equipment to acquire an image of his or her own face, the BS coefficients corresponding to the actor's facial image need to be annotated manually, and the 3D virtual character model is then driven based on the actor's facial image and the corresponding BS coefficients. This manual annotation is inefficient and affects the driving efficiency of the 3D virtual character model.
On this basis, in order to overcome the above drawback, the present disclosure provides a facial expression capturing method: first feature extraction is performed on the acquired face-driven image data to obtain a first expression feature of the face-driven image data; a virtual face image is then generated based on the first expression feature, and second feature extraction is performed on the virtual face image to obtain a second expression feature of the virtual face image; finally, a BS coefficient is generated based on the second expression feature. Through the above process, the BS coefficient can be obtained automatically based on the face-driven image data, so that a 3D virtual character model can be quickly driven based on the automatically obtained BS coefficient.
The above defects and the proposed solutions are all results obtained by the inventors through practice and careful study. Therefore, both the process of discovering the above problems and the solutions proposed below in the present disclosure for the above problems should be regarded as the inventors' contribution to the present disclosure in the course of making it.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
To facilitate the understanding of this embodiment, a facial expression capturing method disclosed in an embodiment of the present disclosure is first introduced in detail. The execution subject of the facial expression capturing method provided by the embodiments of the present disclosure is generally a computer device with certain computing capabilities.
The facial expression capturing method provided by the embodiments of the present disclosure is described below by taking a server as the execution subject as an example.
Referring to Figure 1, which shows a flowchart of a facial expression capturing method provided by an embodiment of the present disclosure, the method includes S101 to S104:
S101: acquire face-driven image data.
S102: perform first feature extraction on the face-driven image data to obtain a first expression feature of the face-driven image data.
S103: generate a virtual face image based on the first expression feature, and perform second feature extraction on the virtual face image to obtain a second expression feature of the virtual face image.
S104: generate a blend shape (BS) coefficient based on the second expression feature, where the BS coefficient is used for input into a three-dimensional game engine to generate a three-dimensional virtual face model.
In the embodiments of the present disclosure, the face-driven image data may refer to an image containing the face of a controller, captured under any of various facial expressions such as a smile, anger or fear. An image containing the controller's face can be obtained by performing image acquisition on the controller, for example by taking photographs or recording video. It should be noted that the acquisition of the face-driven image data may be executed after the controller triggers the image acquisition operation, or after the controller grants authorization.
The acquired image containing the controller's face may be an original image that also contains parts other than the face (for example, the neck or hair). These other parts may affect the process of capturing the controller's facial expression, for example the extraction of expression features. Therefore, after the original image is acquired, it can be preprocessed to obtain face-driven image data containing only the controller's face.
In one implementation of preprocessing the acquired original image, face detection, face key point detection and face registration are first performed in sequence on the original image to determine the face region in the original image; the original image is then segmented based on the face region to obtain face-driven image data containing only the controller's face. This preprocessing can, to a certain extent, eliminate the influence of the other parts of the original image on the capture of the controller's facial expression.
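The sketch below illustrates the registration-and-crop part of this preprocessing. It assumes that face detection and landmark detection have already been performed by an upstream detector; the canonical template coordinates and crop size are illustrative assumptions.

```python
# Register a detected face to a canonical template and crop the face region.
import cv2
import numpy as np

# Canonical positions of (left eye, right eye, nose tip, left mouth, right mouth)
# in a 128x128 crop -- illustrative values.
TEMPLATE = np.float32([
    [40, 50], [88, 50], [64, 75], [46, 100], [82, 100]
])

def align_and_crop(image: np.ndarray, landmarks: np.ndarray, size: int = 128) -> np.ndarray:
    """Map detected landmarks onto the template (face registration) and crop."""
    landmarks = np.float32(landmarks)
    # Similarity transform (rotation + scale + translation) from landmarks to template.
    matrix, _ = cv2.estimateAffinePartial2D(landmarks, TEMPLATE)
    return cv2.warpAffine(image, matrix, (size, size))
```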
The first expression feature obtained by performing first feature extraction on the face-driven image data may be an expression feature in the red-green-blue (RGB) color domain. After the first expression feature of the face-driven image data is obtained, it can be loaded onto a virtual face to generate a virtual face image with the same facial expression as the face-driven image data.
The second expression feature obtained by performing second feature extraction on the virtual face image may be an expression feature in the computer graphics (CG) domain.
The shape-fusion BS coefficients, i.e., blend shape coefficients, are used for input into a three-dimensional game engine to generate a three-dimensional virtual face model; that is, inputting the BS coefficients into the three-dimensional game engine generates the three-dimensional virtual face model. The facial expression of the three-dimensional virtual face model generated from the BS coefficients is consistent with that of the virtual face image, and therefore with that of the face-driven image data, so the facial expression captured from the face-driven image data can be applied to the three-dimensional virtual face model.
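The short NumPy sketch below illustrates what the BS coefficients mean to the 3D engine: each coefficient weights one expression offset added to a neutral face mesh. The mesh data layout and coefficient count are illustrative assumptions, not taken from the disclosure.

```python
# Combine blend shapes: vertices = neutral + sum_k w_k * delta_k
import numpy as np

def apply_blend_shapes(neutral_vertices: np.ndarray,
                       blend_shape_deltas: np.ndarray,
                       bs_coefficients: np.ndarray) -> np.ndarray:
    """
    neutral_vertices:   (V, 3) neutral face mesh
    blend_shape_deltas: (K, V, 3) per-blend-shape vertex offsets
    bs_coefficients:    (K,) weights, typically in [0, 1]
    """
    return neutral_vertices + np.tensordot(bs_coefficients, blend_shape_deltas, axes=1)
```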
在一种实施方式中,上述面部表情捕捉的过程可以是基于预先训练的脸部处理模型实现的,即BS系数的生成过程为将人脸驱动图像数据输入预先训练的脸部处理模型后执行的。也就是,将获取到的人脸驱动图像数据输入至预先训练的脸部处理模型中,可以得到BS系数。In one implementation, the above process of capturing facial expressions can be implemented based on a pre-trained facial processing model, that is, the generation process of BS coefficients is performed after inputting the face driving image data into the pre-trained facial processing model. . That is, by inputting the acquired face-driven image data into the pre-trained face processing model, the BS coefficient can be obtained.
其中,脸部处理模型可以包括第一子脸部处理模型和第二子脸部处理模型。其中,第一子脸部处理模型用于基于人脸驱动图像数据输出第二表情特征;第二子脸部处理模型用于基于第二表情特征得到BS系数。示例性地,第一子脸部处理模型可以为自编码器Autoencoder结构,第二子脸部处理模型可以为深度神经网络(Deep Neural Networks,DNN)。Wherein, the face processing model may include a first sub-face processing model and a second sub-face processing model. Among them, the first sub-face processing model is used to output the second expression feature based on the face driving image data; the second sub-face processing model is used to obtain the BS coefficient based on the second expression feature. For example, the first sub-face processing model may be an autoencoder structure, and the second sub-face processing model may be a deep neural network (Deep Neural Networks, DNN).
进一步地,第一子脸部处理模型中可以包括编码器、第一解码器和第二解码器;其中,编码器用于对图像进行特征提取得到表情特征;第一解码器用于对表情特征进行解码得到虚拟人脸生成图像;第二解码器用于对表情特征进行解码得到人脸驱动数据生成图像。Further, the first sub-face processing model may include an encoder, a first decoder and a second decoder; wherein the encoder is used to extract features from the image to obtain expression features; and the first decoder is used to decode the expression features. A virtual face generated image is obtained; the second decoder is used to decode expression features to obtain a face driven data generated image.
在具体实施中,参见图2所示的另一种面部表情捕捉方法的流程图中,在获取到的包含控制人员脸部的原始图像后,将对原始图像进行人脸关键点检测处理、以及人脸配准处理后,得到人脸驱动图像数据,然后人脸驱动图像数据输入至第一子脸部处理模型中的编码器Encoder中,对人脸驱动图像数据进行第一特征提取,得到人脸驱动图像数据的第一表情特征;然后,第一解码器Decoder对第一表情特征进行解码得到虚拟人脸图像;接下来,编码器Encoder对虚拟人脸图像进行第二特征提取,得到虚拟人脸图像的第二表情特征;最后,第二子脸部处理模型DNN基于第二表情特征得到BS系数。In a specific implementation, referring to the flow chart of another facial expression capturing method shown in Figure 2, after obtaining the original image containing the face of the controller, the original image will be subjected to facial key point detection processing, and After the face registration process, the face-driven image data is obtained, and then the face-driven image data is input into the encoder Encoder in the first sub-face processing model, and the first feature extraction is performed on the face-driven image data to obtain the face The first expression feature of the face driver image data; then, the first decoder Decoder decodes the first expression feature to obtain the virtual face image; next, the encoder Encoder extracts the second feature of the virtual face image to obtain the virtual human face The second expression feature of the face image; finally, the second sub-face processing model DNN obtains the BS coefficient based on the second expression feature.
The first sub-face processing model and the second sub-face processing model of the above face processing model are not trained at the same time. In a specific implementation, the second sub-face processing model may be trained after the training of the first sub-face processing model is completed.
The training processes of the first sub-face processing model and the second sub-face processing model are introduced below in that order.
The training process of the first sub-face processing model is introduced first. Referring to the training flowchart of the first sub-face processing model shown in Figure 3, the first sub-face processing model is trained through the following steps.
S301: acquire face-driven image data samples and virtual face image samples.
In the embodiments of the present disclosure, a face-driven image data sample refers to an image sample containing the controller's face, and the samples may cover various facial expressions, such as a smile, anger or fear.
Image samples containing the controller's face can be obtained by performing image acquisition on the controller, for example by taking photographs or recording video. It should be noted that the acquisition of the face-driven image data may be executed after the controller triggers the image acquisition operation, or after the controller grants authorization.
A virtual face image sample refers to an image sample containing a virtual face, and these samples may likewise cover various facial expressions such as a smile, anger or fear. Here, a plurality of different BS coefficients can be generated in advance, and image samples containing the face of a virtual object are then generated from these BS coefficients.
The acquired image samples containing the controller's face and the generated image samples containing the virtual object's face may be original face images that also contain parts other than the face (for example, the neck or hair), i.e., original face-driven image data and original virtual face images. The other parts of these original images may affect the training of the first sub-face processing model, for example the extraction of expression features. Therefore, after the original face-driven image data and the original virtual face images are acquired, they can be preprocessed respectively to obtain face-driven image data samples containing only the controller's face and image samples containing only the virtual object's face.
In one implementation of preprocessing the acquired original face-driven image data and original virtual face images, augmentation processing is first performed on the original face-driven image data and the original virtual face images respectively, to obtain augmented face-driven image data samples and augmented virtual face image samples. Augmentation can add information to the original images or transform image features, selectively emphasizing or suppressing certain features, and it expands the number of image samples; increasing the number of image samples can improve the training accuracy of the first sub-face processing model, so that the second expression features obtained by the trained model are more accurate.
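One possible augmentation pipeline is sketched below. The specific transforms and their parameters are assumptions; the disclosure only states that augmentation adds information to, or transforms features of, the original images.

```python
# Illustrative augmentation pipeline for the (PIL) face image samples.
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.1),
    transforms.RandomAffine(degrees=5, translate=(0.02, 0.02), scale=(0.95, 1.05)),
    transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 1.0)),
])

# augmented_sample = augment(pil_face_image)  # applied to both real and virtual samples
```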
Next, segmentation processing is performed on the augmented face-driven image data and the augmented virtual face images respectively, to obtain face-driven image data samples containing a first face region and virtual face image samples containing a second face region, where the first face region is the controller's face region and the second face region is the virtual object's face region.
In a further implementation, as shown in the schematic flowchart of acquiring training samples in Figure 4, face detection, face key point detection and face registration are performed in sequence on the augmented face-driven image data and the augmented virtual face images respectively, to determine the first face region of the augmented face-driven image data and the second face region of the augmented virtual face images; the augmented face-driven image data is then segmented based on the first face region to obtain the face-driven image data samples containing the first face region, and the augmented virtual face images are segmented based on the second face region to obtain the virtual face image samples containing the second face region. Mask processing may also be performed on the face-driven image data samples containing the first face region and the virtual face image samples containing the second face region respectively, to obtain a first mask image and a second mask image. This preprocessing can, to a certain extent, eliminate the influence of the other parts of the original face-driven image data and the original virtual face images on the training of the first sub-face processing model.
S302: encode the face-driven image data sample with the encoder of the first sub-face processing model to obtain a first controller expression feature, and input the first controller expression feature into the first decoder of the first sub-face processing model to obtain a first virtual face generated image; and encode the virtual face image sample with the encoder of the first sub-face processing model to obtain a first virtual expression feature, and input the first virtual expression feature into the second decoder of the first sub-face processing model to obtain a first face-driven-data generated image.
The first controller expression feature may be an expression feature in the RGB domain. After the first controller expression feature is input into the first decoder of the first sub-face processing model, the first decoder obtains the first virtual face generated image based on it; the generated image has the same facial expression as the face-driven image data sample.
The first virtual expression feature is also an expression feature in the RGB domain. After the first virtual expression feature is input into the second decoder of the first sub-face processing model, the second decoder obtains the first face-driven-data generated image based on it; the generated image has the same facial expression as the virtual face image sample.
S303: encode the first virtual face generated image with the encoder to obtain a second virtual expression feature; and encode the first face-driven-data generated image with the encoder to obtain a second controller expression feature.
The second virtual expression feature and the second controller expression feature may be expression features in the CG domain.
Based on the expression features in the RGB domain and the expression features in the CG domain, expression feature loss information across the different domains can be obtained, and the model parameter information of the first sub-face processing model is adjusted based on this loss information. The cross-domain expression feature loss information gives the encoder of the trained first sub-face processing model a better ability to encode expression features across domains.
S304: adjust the model parameter information of the first sub-face processing model based on first loss information between the first controller expression feature and the second virtual expression feature, and second loss information between the first virtual expression feature and the second controller expression feature.
As described above, the first virtual face generated image has the same facial expression as the face-driven image data sample; that is, the first controller expression feature and the second virtual expression feature correspond to the same facial expression, except that they lie in the RGB domain and the CG domain respectively. The first loss information can therefore be determined from the first controller expression feature and the second virtual expression feature, and similarly the second loss information is determined from the first virtual expression feature and the second controller expression feature.
The model parameter information of the first sub-face processing model is adjusted based on the first loss information and the second loss information, and the first sub-face processing model is trained in this way until a trained first sub-face processing model is obtained.
The first loss information and the second loss information can each be calculated using a cycle consistency loss (CCL) function:
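A cycle-consistency loss consistent with the symbol definitions given in the next paragraph can be written as follows; the summation over samples and the choice of the squared L2 norm are assumptions.

```latex
\mathcal{L}_{\mathrm{cc}} = \sum_{i} \left\lVert \mathrm{Enc}\big(\mathrm{Dec}_{j}(\mathrm{Enc}(x_{i}))\big) - \mathrm{Enc}(x_{i}) \right\rVert_{2}^{2}
```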
When calculating the first loss information, x_i denotes the face-driven image data sample, Enc(x_i) is the first controller expression feature of the face-driven image data sample, Dec_j(Enc(x_i)) is the first virtual face generated image generated based on the first controller expression feature, and Enc(Dec_j(Enc(x_i))) is the second virtual expression feature of the first virtual face generated image. When calculating the second loss information, x_i denotes the virtual face image sample, Enc(x_i) is the first virtual expression feature of the virtual face image sample, Dec_j(Enc(x_i)) is the first face-driven-data generated image generated based on the first virtual expression feature, and Enc(Dec_j(Enc(x_i))) is the second controller expression feature of the first face-driven-data generated image.
To improve the image quality of the generated images, in one implementation, the model parameter information of the first sub-face processing model may also be adjusted based on image quality loss information between the image information of the generated face image in the CG domain and the image information of the face image in the RGB domain.
Specifically, after the face-driven image data sample is encoded by the encoder of the first sub-face processing model to obtain the first controller expression feature, the first controller expression feature may also be input into the second decoder of the first sub-face processing model to obtain a second face-driven-data generated image. The second face-driven-data generated image is a face image in the CG domain and has the same facial expression as the face-driven image data sample.
Then, third loss information is determined based on first image information of the face-driven image data sample and second image information of the second face-driven-data generated image. The third loss information is the image quality loss between the image information of the second face-driven-data generated image in the CG domain and the image information of the face-driven image data in the RGB domain.
Similarly, after the first virtual expression feature is input into the first decoder of the first sub-face processing model, a second virtual face generated image is obtained. The second virtual face generated image is a face image in the CG domain and has the same facial expression as the virtual face image sample.
Then, fourth loss information is determined based on third image information of the virtual face image sample and fourth image information of the second virtual face generated image. The fourth loss information is the image quality loss between the image information of the second virtual face generated image in the CG domain and the image information of the virtual face image in the RGB domain.
Finally, the model parameter information of the first sub-face processing model is adjusted based on the aforementioned first loss information between the first controller expression feature and the second virtual expression feature, the aforementioned second loss information between the first virtual expression feature and the second controller expression feature, the above third loss information, and the above fourth loss information.
In the above implementation, the image information may include pixel value information of each pixel in the image, brightness information of the image, contrast information of the image, structure information of the image, and so on.
In a specific implementation, when determining the third loss information or the fourth loss information, first image quality loss information may be determined based on the pixel value information of each pixel in the images and the third or fourth loss information determined from it; or second image quality loss information may be determined based on the brightness, contrast and structure information of the images and the third or fourth loss information determined from it; or the third or fourth loss information may be determined from both the first image quality loss information and the second image quality loss information.
It should be noted that, in other implementations, the image quality loss information may also be determined based on other image information, which is not described in detail here.
In a specific implementation, referring to the training flow chart of another first sub-face processing model shown in Figure 5, the face driven image data sample is input into the encoder (Encoder) of the first sub-face processing model and encoded to obtain the first controller expression feature; the first controller expression feature is input into the first decoder (Decoder) of the first sub-face processing model to obtain the first virtual face generated image, and the first controller expression feature is also input into the second decoder of the first sub-face processing model to obtain the second face driven data generated image.
The virtual face image sample is input into the encoder of the first sub-face processing model and encoded to obtain the first virtual expression feature; the first virtual expression feature is input into the second decoder of the first sub-face processing model to obtain the first face driven data generated image, and the first virtual expression feature is also input into the first decoder of the first sub-face processing model to obtain the second virtual face generated image.
Then, the first virtual face generated image is input into the encoder and encoded to obtain the second virtual expression feature; and the first face driven data generated image is input into the encoder and encoded to obtain the second controller expression feature.
Finally, the first loss information is determined based on the first controller expression feature and the second virtual expression feature; the second loss information is determined based on the first virtual expression feature and the second controller expression feature; the third loss information is determined based on the first image information of the face driven image data sample and the second image information of the second face driven data generated image; and the fourth loss information is determined based on the third image information of the virtual face image sample and the fourth image information of the second virtual face generated image.
Then, the model parameter information of the first sub-face processing model can be adjusted based on the first loss information, the second loss information, the third loss information and the fourth loss information.
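For readers who prefer code, a minimal sketch of one training step of the first sub-face processing model is given below. It assumes a PyTorch setting; the module names (encoder, decoder_virtual, decoder_driven), the injected image_loss callable, and the use of an L1 distance for the expression-feature losses are assumptions for illustration only and are not prescribed by this disclosure.

```python
import torch.nn.functional as F

def train_step(encoder, decoder_virtual, decoder_driven, image_loss,
               driven_sample, virtual_sample, optimizer):
    # Encode both domains with the shared encoder.
    ctrl_feat_1 = encoder(driven_sample)         # first controller expression feature
    virt_feat_1 = encoder(virtual_sample)        # first virtual expression feature

    # Cross-domain and same-domain decoding.
    virt_gen_1   = decoder_virtual(ctrl_feat_1)  # first virtual face generated image
    driven_gen_2 = decoder_driven(ctrl_feat_1)   # second face driven data generated image
    driven_gen_1 = decoder_driven(virt_feat_1)   # first face driven data generated image
    virt_gen_2   = decoder_virtual(virt_feat_1)  # second virtual face generated image

    # Re-encode the cross-domain generated images.
    virt_feat_2 = encoder(virt_gen_1)            # second virtual expression feature
    ctrl_feat_2 = encoder(driven_gen_1)          # second controller expression feature

    # First/second losses: expression-feature consistency across domains.
    loss1 = F.l1_loss(ctrl_feat_1, virt_feat_2)
    loss2 = F.l1_loss(virt_feat_1, ctrl_feat_2)

    # Third/fourth losses: image-level reconstruction quality within each domain.
    loss3 = image_loss(driven_gen_2, driven_sample)
    loss4 = image_loss(virt_gen_2, virtual_sample)

    loss = loss1 + loss2 + loss3 + loss4
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```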
The following takes the third loss information as an example to describe the process of determining the third loss information based on the first image quality loss information and the second image quality loss information.
Here, the first image quality loss information may be determined based on the first pixel value information of each pixel in the face driven image data sample and the second pixel value information of each pixel in the second face driven data generated image.
In a specific implementation, mask processing may be performed in advance on the face driven image data sample and the second face driven data generated image respectively, to obtain a mask image corresponding to the face driven image data sample and a mask image corresponding to the second face driven data generated image; the first image quality loss is then determined based on the first pixel value information of each pixel in the mask image corresponding to the face driven image data sample and the second pixel value information of each pixel in the mask image corresponding to the second face driven data generated image.
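A minimal sketch of this masked pixel comparison follows, assuming the images and mask are PyTorch tensors and that an L1 distance is used; how the face mask is produced (for example, by face parsing) is not fixed by this disclosure and is an assumption here.

```python
import torch.nn.functional as F

def masked_pixel_loss(sample_img, generated_img, face_mask):
    """First image quality loss: per-pixel difference restricted to the face region.

    face_mask is a {0, 1} tensor with the same spatial size as the two images.
    """
    masked_sample = sample_img * face_mask
    masked_generated = generated_img * face_mask
    # Average absolute pixel difference over the masked images (L1 is one possible choice).
    return F.l1_loss(masked_generated, masked_sample)
```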
In addition, image brightness loss information is determined based on the first brightness information of the face driven image data sample and the second brightness information of the second face driven data generated image; image contrast loss information is determined based on the first contrast information of the face driven image data sample and the second contrast information of the second face driven data generated image; and image structure loss information is determined based on the first structure information of the face driven image data sample and the second structure information of the second face driven data generated image.
Here, the mask image corresponding to the face driven image data sample and the mask image corresponding to the second face driven data generated image may also be obtained first, and the image brightness loss information, the image contrast loss information and the image structure loss information may then be determined separately.
Then, the second image quality loss information is determined based on the image brightness loss information, the image contrast loss information and the image structure loss information.
Finally, the third loss information is determined based on the first image quality loss information and the second image quality loss information. Here, the first image quality loss information and the second image quality loss information may be weighted and summed to obtain the third loss information.
Here, the third loss information may, for example, take a pixel-wise form such as $L=\frac{1}{n}\sum_{i=1}^{n}\left|f(x_i)-y_i\right|$, where, when $x_i$ denotes the face driven image data sample, $f(x_i)$ denotes the first pixel value information of each pixel in the face driven image data sample and $y_i$ denotes the second pixel value information of each pixel in the second face driven data generated image; when $x_i$ denotes the virtual face image sample, $f(x_i)$ denotes the first pixel value information of each pixel in the virtual face image sample and $y_i$ denotes the second pixel value information of each pixel in the second virtual face generated image.
The process of determining the fourth loss information is similar to the above process of determining the third loss information, and will not be repeated here.
Here, the fourth loss information may be calculated using the Structural Similarity Index Measure (SSIM) loss function. Specifically, the fourth loss information is $L_{SSIM}(x,y)=\left[l(x,y)^{\alpha}\cdot c(x,y)^{\beta}\cdot s(x,y)^{\gamma}\right]$, where $\alpha$, $\beta$ and $\gamma$ are greater than 0 and are used to adjust the relative importance of the three components.

Here, $l(x,y)=\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1}$ represents the image brightness loss information, where $C_1$ is a constant introduced to avoid system instability when the sum of squares in the denominator approaches 0, $C_1=(K_1L)^2$, $L$ is the number of image gray levels, generally $L=255$, and $K_1\ll 1$. When $x$ is the face driven image data sample and $y$ is the second face driven data generated image, $\mu_x$ is the brightness information of the face driven image data sample and $\mu_y$ is the brightness information of the second face driven data generated image. When $x$ is the virtual face image sample and $y$ is the second virtual face generated image, $\mu_x$ is the brightness information of the virtual face image sample and $\mu_y$ is the brightness information of the second virtual face generated image.

$c(x,y)=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2}$ represents the image contrast loss information, where the constant $C_2=(K_2L)^2$ and $K_2\ll 1$. When $x$ is the face driven image data sample and $y$ is the second face driven data generated image, $\sigma_x$ is the contrast information of the face driven image data sample and $\sigma_y$ is the contrast information of the second face driven data generated image. When $x$ is the virtual face image sample and $y$ is the second virtual face generated image, $\sigma_x$ is the contrast information of the virtual face image sample and $\sigma_y$ is the contrast information of the second virtual face generated image.

$s(x,y)=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3}$ represents the image structure loss information, and, following the standard SSIM definition, $C_3=C_2/2$, where $\sigma_{xy}$ denotes the covariance between $x$ and $y$.
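The sketch below shows one way the SSIM-style loss could be computed in code. It uses whole-image statistics rather than the sliding windows of the full SSIM algorithm, sets $\alpha=\beta=\gamma=1$ by default, and returns $1-\mathrm{SSIM}$ so that the loss decreases as similarity increases; these simplifications are assumptions, not requirements of this disclosure.

```python
import torch

def ssim_loss(x, y, k1=0.01, k2=0.03, gray_levels=255.0,
              alpha=1.0, beta=1.0, gamma=1.0):
    """SSIM-style image quality loss over whole-image statistics.

    x, y: tensors of shape (batch, channels, H, W) with values in [0, gray_levels].
    """
    c1 = (k1 * gray_levels) ** 2
    c2 = (k2 * gray_levels) ** 2
    c3 = c2 / 2.0  # standard SSIM convention (an assumption here)

    mu_x, mu_y = x.mean(dim=(-2, -1)), y.mean(dim=(-2, -1))        # brightness
    sigma_x, sigma_y = x.std(dim=(-2, -1)), y.std(dim=(-2, -1))    # contrast
    sigma_xy = ((x - mu_x[..., None, None]) *
                (y - mu_y[..., None, None])).mean(dim=(-2, -1))    # structure (covariance)

    l = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    c = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    s = (sigma_xy + c3) / (sigma_x * sigma_y + c3)

    ssim = (l ** alpha) * (c ** beta) * (s ** gamma)
    # Higher SSIM means more similar images, so 1 - SSIM is used as the loss value.
    return (1.0 - ssim).mean()
```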
In order to improve the quality of the generated images, a discriminator may also be used to train the first sub-face processing model. The discriminator may be pre-trained. The discriminator and the first sub-face processing model form an adversarial network; the first sub-face processing model is optimized based on the discriminator's discrimination results for the generated images, so that the first virtual face generated image produced by the first sub-face processing model becomes more similar to the face driven image data sample, and the first face driven data generated image becomes more similar to the virtual face image sample.
In one embodiment, referring to the schematic flow chart of training the first sub-face processing model using a discriminator shown in Figure 6, the face driven image data sample and the first virtual face generated image may be input into a pre-trained discriminator to obtain a first authenticity discrimination result of the face driven image data sample and a second authenticity discrimination result of the first virtual face generated image; based on the first authenticity discrimination result and the second authenticity discrimination result, the model parameter information of the first sub-face processing model is adjusted until the second authenticity discrimination result of the first virtual face generated image matches the first authenticity discrimination result of the face driven image data sample.
The pre-trained discriminator can produce an authenticity discrimination result indicating that the face driven image data sample is real, i.e., the first authenticity discrimination result. At the beginning of training of the first sub-face processing model, since the first virtual face generated image produced by the first sub-face processing model has low similarity to the face driven image data sample, the discriminator may produce an authenticity discrimination result indicating that the first virtual face generated image is not real, i.e., the second authenticity discrimination result. At this point, the model parameter information of the first sub-face processing model can be adjusted based on the first authenticity discrimination result and the second authenticity discrimination result.
Then, the first virtual face generated image produced by the first sub-face processing model with the adjusted model parameter information is input into the discriminator again to obtain a new second authenticity discrimination result of the first virtual face generated image; if the second authenticity discrimination result still indicates that the first virtual face generated image is not real, the model parameter information of the first sub-face processing model continues to be adjusted until the second authenticity discrimination result of the first virtual face generated image matches the first authenticity discrimination result of the face driven image data sample.
In one implementation, the first authenticity discrimination result and the second authenticity discrimination result may be represented by probability values; for example, a result of real may be represented by 1 and a result of fake may be represented by 0. When the difference between the second authenticity discrimination result of the first virtual face generated image and the first authenticity discrimination result of the face driven image data sample is less than a set threshold, the second authenticity discrimination result of the first virtual face generated image may be considered to match the first authenticity discrimination result of the face driven image data sample, and training can end at this point.
And/or, the virtual face image sample and the first face driven data generated image are input into a pre-trained discriminator to obtain a third authenticity discrimination result of the virtual face image sample and a fourth authenticity discrimination result of the first face driven data generated image; based on the third authenticity discrimination result and the fourth authenticity discrimination result, the model parameter information of the first sub-face processing model is adjusted until the fourth authenticity discrimination result of the first face driven data generated image matches the third authenticity discrimination result of the virtual face image sample.
Here, adjusting the model parameter information of the first sub-face processing model based on the third authenticity discrimination result of the virtual face image sample and the fourth authenticity discrimination result of the first face driven data generated image can refer to the foregoing process, and will not be detailed here.
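A rough sketch of the generator-side adversarial term and the matching check for one discriminator branch is given below, assuming a pre-trained discriminator that returns a realness probability in [0, 1]; the use of a mean-squared error to push the two results together and the 0.05 threshold are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def adversarial_generator_loss(discriminator, real_sample, generated_image):
    """Generator-side adversarial term for one discriminator branch."""
    with torch.no_grad():
        real_score = discriminator(real_sample)   # first (or third) authenticity result
    fake_score = discriminator(generated_image)   # second (or fourth) authenticity result
    # Push the generated image's score toward the real sample's score.
    return F.mse_loss(fake_score, real_score)

def results_match(real_score, fake_score, threshold=0.05):
    # Training for this branch can stop once the two results are close enough.
    return (real_score - fake_score).abs().max().item() < threshold
```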
The training process of the first sub-face processing model has been described above; the training process of the second sub-face processing model is described below. Referring to the training flow chart of the second sub-face processing model shown in Figure 7, the second sub-face processing model is trained through the following steps:
S701: Input the virtual face image sample into the encoder of the trained first sub-face processing model to obtain a third virtual expression feature;
S702: Input the third virtual expression feature into the second sub-face processing model to obtain a predicted BS coefficient corresponding to the virtual face image sample;
S703: Determine fifth loss information based on the predicted BS coefficient and a known BS coefficient corresponding to the virtual face image sample;
S704: Adjust the model parameter information of the second sub-face processing model based on the fifth loss information.
Here, the first sub-face processing model has already been trained; encoding the virtual face image sample with the encoder of the trained first sub-face processing model yields a more accurate third virtual expression feature.
Here, the second sub-face processing model may be a DNN, and the DNN can predict the BS coefficient corresponding to the virtual face image sample based on the third virtual expression feature.
The known BS coefficient corresponding to the virtual face image sample refers to the BS coefficient used to generate the virtual face image sample as described above. Here, a mean square error (MSE) loss, i.e., the fifth loss information, can be calculated based on the predicted BS coefficient and the known BS coefficient corresponding to the virtual face image sample.
Based on the fifth loss information, the model parameter information of the second sub-face processing model is adjusted to obtain the trained second sub-face processing model.
In a specific implementation, referring to the training flow chart of another second sub-face processing model shown in Figure 8, the virtual face image sample is input into the encoder of the trained first sub-face processing model to obtain the third virtual expression feature; the third virtual expression feature is input into the DNN to obtain the predicted BS coefficient corresponding to the virtual face image sample; then, the fifth loss information is determined based on the predicted BS coefficient and the known BS coefficient corresponding to the virtual face image sample.
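A minimal sketch of this second training stage follows. It assumes the trained encoder is frozen and outputs a flat feature vector, that the second sub-face processing model is a small fully connected network, and that the ground-truth BS coefficients used to render each virtual face image sample are available; the feature dimension of 256, the 52 output coefficients and all names are illustrative only.

```python
import torch
import torch.nn as nn

class BSCoefficientPredictor(nn.Module):
    """Second sub-face processing model: maps expression features to BS coefficients."""
    def __init__(self, feature_dim=256, num_bs_coefficients=52):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(),
            nn.Linear(128, num_bs_coefficients), nn.Sigmoid(),  # coefficients kept in [0, 1]
        )

    def forward(self, expression_feature):
        return self.net(expression_feature)

def train_bs_step(encoder, bs_model, virtual_sample, known_bs, optimizer):
    with torch.no_grad():                        # the first sub-model is already trained
        expr_feat = encoder(virtual_sample)      # third virtual expression feature
    pred_bs = bs_model(expr_feat)                # predicted BS coefficient
    loss = nn.functional.mse_loss(pred_bs, known_bs)  # fifth loss information (MSE)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```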
Those skilled in the art can understand that, in the above methods of the specific embodiments, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
Based on the same inventive concept, embodiments of the present disclosure also provide a facial expression capturing apparatus corresponding to the facial expression capturing method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to that of the above facial expression capturing method of the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
Referring to Figure 9, which is a schematic structural diagram of a facial expression capturing apparatus provided by an embodiment of the present disclosure, the apparatus includes: a first acquisition module 901, a first extraction module 902, a second extraction module 903, and a generation module 904; wherein,
the first acquisition module 901 is configured to acquire face driving image data;
the first extraction module 902 is configured to perform first feature extraction on the face driving image data to obtain a first expression feature of the face driving image data;
the second extraction module 903 is configured to generate a virtual face image based on the first expression feature, and perform second feature extraction on the virtual face image to obtain a second expression feature of the virtual face image;
the generation module 904 is configured to generate shape fusion BS coefficients based on the second expression feature; the BS coefficients are used to be input into a three-dimensional game engine to generate a three-dimensional virtual face model.
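The four modules above correspond to the inference path of the facial expression capturing method; a minimal sketch of how they could be wired together is shown below. All class and attribute names are illustrative, and the encoder, first decoder and BS-coefficient DNN are assumed to be the trained components described earlier.

```python
import torch

class FacialExpressionCapture:
    """Inference pipeline: driving image -> expression features -> BS coefficients."""
    def __init__(self, encoder, virtual_decoder, bs_model):
        self.encoder = encoder                  # shared encoder of the first sub-model
        self.virtual_decoder = virtual_decoder  # first decoder (virtual face domain)
        self.bs_model = bs_model                # second sub-face processing model (DNN)

    @torch.no_grad()
    def capture(self, driving_image):
        first_feat = self.encoder(driving_image)         # first expression feature
        virtual_face = self.virtual_decoder(first_feat)  # virtual face image
        second_feat = self.encoder(virtual_face)         # second expression feature
        bs_coefficients = self.bs_model(second_feat)     # shape fusion BS coefficients
        return bs_coefficients  # to be passed to the 3D game engine
```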
In an optional implementation, the generation process of the BS coefficients is performed after inputting the face driving image data into a pre-trained face processing model;
the face processing model includes a first sub-face processing model and a second sub-face processing model; the first sub-face processing model is configured to output the second expression feature based on the face driving image data, and the second sub-face processing model is configured to obtain the BS coefficients based on the second expression feature;
the first sub-face processing model includes an encoder, a first decoder and a second decoder; the encoder is configured to perform feature extraction on an image to obtain an expression feature, the first decoder is configured to decode an expression feature to obtain a virtual face generated image, and the second decoder is configured to decode an expression feature to obtain a face driven data generated image.
In an optional implementation, the apparatus further includes:
a second acquisition module, configured to acquire a face driven image data sample and a virtual face image sample;
a first input module, configured to encode the face driven image data sample with the encoder of the first sub-face processing model to obtain a first controller expression feature, and input the first controller expression feature into the first decoder of the first sub-face processing model to obtain a first virtual face generated image; and to encode the virtual face image sample with the encoder of the first sub-face processing model to obtain a first virtual expression feature, and input the first virtual expression feature into the second decoder of the first sub-face processing model to obtain a first face driven data generated image;
an encoding module, configured to encode the first virtual face generated image with the encoder to obtain a second virtual expression feature, and to encode the first face driven data generated image with the encoder to obtain a second controller expression feature;
a first adjustment module, configured to adjust the model parameter information of the first sub-face processing model based on first loss information between the first controller expression feature and the second virtual expression feature, and second loss information between the first virtual expression feature and the second controller expression feature.
In an optional implementation, the apparatus further includes:
a second input module, configured to, after the face driven image data sample is encoded by the encoder of the first sub-face processing model to obtain the first controller expression feature, input the first controller expression feature into the second decoder of the first sub-face processing model to obtain a second face driven data generated image;
a first determination module, configured to determine third loss information based on first image information of the face driven image data sample and second image information of the second face driven data generated image;
the apparatus further includes:
a third input module, configured to, after the virtual face image sample is encoded by the encoder of the first sub-face processing model to obtain the first virtual expression feature, input the first virtual expression feature into the first decoder of the first sub-face processing model to obtain a second virtual face generated image;
a second determination module, configured to determine fourth loss information based on third image information of the virtual face image sample and fourth image information of the second virtual face generated image;
the first adjustment module is specifically configured to:
adjust the model parameter information of the first sub-face processing model based on the first loss information between the first controller expression feature and the second virtual expression feature, the second loss information between the first virtual expression feature and the second controller expression feature, the third loss information, and the fourth loss information.
In an optional implementation, the first determination module is specifically configured to:
determine first image quality loss information based on first pixel value information of each pixel in the face driven image data sample and second pixel value information of each pixel in the second face driven data generated image;
determine image brightness loss information based on first brightness information of the face driven image data sample and second brightness information of the second face driven data generated image;
determine image contrast loss information based on first contrast information of the face driven image data sample and second contrast information of the second face driven data generated image;
determine image structure loss information based on first structure information of the face driven image data sample and second structure information of the second face driven data generated image;
determine the second image quality loss information based on the image brightness loss information, the image contrast loss information and the image structure loss information;
determine the third loss information based on the first image quality loss information and the second image quality loss information.
In an optional implementation, the apparatus further includes:
a fourth input module, configured to, after the first virtual face generated image and the first face driven data generated image are obtained, input the face driven image data sample and the first virtual face generated image into a pre-trained discriminator to obtain a first authenticity discrimination result of the face driven image data sample and a second authenticity discrimination result of the first virtual face generated image, and adjust the model parameter information of the first sub-face processing model based on the first authenticity discrimination result and the second authenticity discrimination result until the second authenticity discrimination result of the first virtual face generated image matches the first authenticity discrimination result of the face driven image data sample;
and/or,
input the virtual face image sample and the first face driven data generated image into a pre-trained discriminator to obtain a third authenticity discrimination result of the virtual face image sample and a fourth authenticity discrimination result of the first face driven data generated image, and adjust the model parameter information of the first sub-face processing model based on the third authenticity discrimination result and the fourth authenticity discrimination result until the fourth authenticity discrimination result of the first face driven data generated image matches the third authenticity discrimination result of the virtual face image sample.
In an optional implementation, the apparatus further includes:
a fifth input module, configured to input the virtual face image sample into the encoder of the trained first sub-face processing model to obtain the third virtual expression feature;
a sixth input module, configured to input the third virtual expression feature into the second sub-face processing model to obtain a predicted BS coefficient corresponding to the virtual face image sample;
a third determination module, configured to determine fifth loss information based on the predicted BS coefficient and a known BS coefficient corresponding to the virtual face image sample;
a second adjustment module, configured to adjust the model parameter information of the second sub-face processing model based on the fifth loss information.
In an optional implementation, the second acquisition module is specifically configured to:
acquire original face driving image data and an original virtual face image;
perform augmentation processing on the original face driving image data and the original virtual face image respectively to obtain augmented face driven image data samples and augmented virtual face image samples;
perform segmentation processing on the augmented face driven image data and the augmented virtual face image respectively to obtain a face driven image data sample containing a first face region and a virtual face image sample containing a second face region.
In an optional implementation, the second acquisition module is specifically configured to:
perform face detection, face key point detection processing and face registration processing in sequence on the augmented face driven image data and the augmented virtual face image respectively, to determine a first face region of the augmented face driven image data and a second face region of the augmented virtual face image;
based on the first face region, perform segmentation processing on the augmented face driven image data to obtain the face driven image data sample containing the first face region, and based on the second face region, perform segmentation processing on the augmented virtual face image to obtain the virtual face image sample containing the second face region.
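A rough sketch of this preprocessing pipeline is given below; the four injected helpers (augment, detect_face, detect_landmarks, align_face) are placeholders, since this disclosure does not prescribe specific augmentation operations, detectors, landmark models or alignment methods.

```python
def prepare_sample(image, augment, detect_face, detect_landmarks, align_face):
    """Preprocess one raw image (driven or virtual) into a face-region training sample.

    Concrete helper implementations (e.g. flips/colour jitter for augmentation,
    a face detector, a landmark model, a similarity-transform alignment) are
    left open here and must be supplied by the caller.
    """
    augmented = augment(image)                            # augmentation processing
    box = detect_face(augmented)                          # face detection
    landmarks = detect_landmarks(augmented, box)          # face key point detection
    aligned, (top, bottom, left, right) = align_face(augmented, landmarks)  # face registration
    return aligned[top:bottom, left:right]                # segment out the face region
```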
For a description of the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the above method embodiments, which will not be detailed here.
Based on the same technical concept, embodiments of the present disclosure also provide a computer device. Referring to Figure 10, which is a schematic structural diagram of a computer device 1000 provided by an embodiment of the present disclosure, the computer device includes a processor 1001, a memory 1002 and a bus 1003. The memory 1002 is configured to store execution instructions and includes an internal memory 10021 and an external memory 10022; the internal memory 10021, also called internal storage, is configured to temporarily store operation data in the processor 1001 and data exchanged with the external memory 10022 such as a hard disk. The processor 1001 exchanges data with the external memory 10022 through the internal memory 10021. When the computer device 1000 runs, the processor 1001 and the memory 1002 communicate through the bus 1003, so that the processor 1001 executes the following instructions:
acquiring face driving image data;
performing first feature extraction on the face driving image data to obtain a first expression feature of the face driving image data;
generating a virtual face image based on the first expression feature, and performing second feature extraction on the virtual face image to obtain a second expression feature of the virtual face image;
generating shape fusion BS coefficients based on the second expression feature; the BS coefficients are used to be input into a three-dimensional game engine to generate a three-dimensional virtual face model.
Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the facial expression capturing method described in the above method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
Embodiments of the present disclosure also provide a computer program product carrying program code; the instructions included in the program code can be used to execute the steps of the facial expression capturing method described in the above method embodiments. For details, reference may be made to the above method embodiments, which are not repeated here.
The above computer program product may be implemented by hardware, software or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK), and so on.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiments, and is not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, apparatuses or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
Finally, it should be noted that the above-described embodiments are only specific implementations of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field can still, within the technical scope disclosed in the present disclosure, modify the technical solutions recorded in the foregoing embodiments, easily conceive of changes, or make equivalent substitutions of some of the technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (13)

1. A facial expression capturing method, comprising:
    acquiring face driving image data;
    performing first feature extraction on the face driving image data to obtain a first expression feature of the face driving image data;
    generating a virtual face image based on the first expression feature, and performing second feature extraction on the virtual face image to obtain a second expression feature of the virtual face image;
    generating shape fusion BS coefficients based on the second expression feature, wherein the BS coefficients are used to be input into a three-dimensional game engine to generate a three-dimensional virtual face model.
2. The method according to claim 1, wherein the generation process of the BS coefficients is performed after inputting the face driving image data into a pre-trained face processing model;
    the face processing model comprises a first sub-face processing model and a second sub-face processing model; the first sub-face processing model is configured to output the second expression feature based on the face driving image data, and the second sub-face processing model is configured to obtain the BS coefficients based on the second expression feature;
    the first sub-face processing model comprises an encoder, a first decoder and a second decoder; the encoder is configured to perform feature extraction on an image to obtain an expression feature, the first decoder is configured to decode an expression feature to obtain a virtual face generated image, and the second decoder is configured to decode an expression feature to obtain a face driven data generated image.
3. The method according to claim 2, wherein the first sub-face processing model is trained through the following steps:
    acquiring a face driven image data sample and a virtual face image sample;
    encoding the face driven image data sample with the encoder of the first sub-face processing model to obtain a first controller expression feature, and inputting the first controller expression feature into the first decoder of the first sub-face processing model to obtain a first virtual face generated image; and encoding the virtual face image sample with the encoder of the first sub-face processing model to obtain a first virtual expression feature, and inputting the first virtual expression feature into the second decoder of the first sub-face processing model to obtain a first face driven data generated image;
    encoding the first virtual face generated image with the encoder to obtain a second virtual expression feature; and encoding the first face driven data generated image with the encoder to obtain a second controller expression feature;
    adjusting model parameter information of the first sub-face processing model based on first loss information between the first controller expression feature and the second virtual expression feature, and second loss information between the first virtual expression feature and the second controller expression feature.
4. The method according to claim 3, wherein after encoding the face driven image data sample with the encoder of the first sub-face processing model to obtain the first controller expression feature, the method further comprises:
    inputting the first controller expression feature into the second decoder of the first sub-face processing model to obtain a second face driven data generated image;
    determining third loss information based on first image information of the face driven image data sample and second image information of the second face driven data generated image;
    after encoding the virtual face image sample with the encoder of the first sub-face processing model to obtain the first virtual expression feature, the method further comprises:
    inputting the first virtual expression feature into the first decoder of the first sub-face processing model to obtain a second virtual face generated image;
    determining fourth loss information based on third image information of the virtual face image sample and fourth image information of the second virtual face generated image;
    wherein the adjusting the model parameter information of the first sub-face processing model based on the first loss information between the first controller expression feature and the second virtual expression feature, and the second loss information between the first virtual expression feature and the second controller expression feature comprises:
    adjusting the model parameter information of the first sub-face processing model based on the first loss information between the first controller expression feature and the second virtual expression feature, the second loss information between the first virtual expression feature and the second controller expression feature, the third loss information, and the fourth loss information.
5. The method according to claim 4, wherein the determining the third loss information based on the first image information of the face driven image data sample and the second image information of the second face driven data generated image comprises:
    determining first image quality loss information based on first pixel value information of each pixel in the face driven image data sample and second pixel value information of each pixel in the second face driven data generated image;
    determining image brightness loss information based on first brightness information of the face driven image data sample and second brightness information of the second face driven data generated image;
    determining image contrast loss information based on first contrast information of the face driven image data sample and second contrast information of the second face driven data generated image;
    determining image structure loss information based on first structure information of the face driven image data sample and second structure information of the second face driven data generated image;
    determining the second image quality loss information based on the image brightness loss information, the image contrast loss information and the image structure loss information;
    determining the third loss information based on the first image quality loss information and the second image quality loss information.
6. The method according to claim 3, wherein after obtaining the first virtual face generated image and the first face driven data generated image, the method further comprises:
    inputting the face driven image data sample and the first virtual face generated image into a pre-trained discriminator to obtain a first authenticity discrimination result of the face driven image data sample and a second authenticity discrimination result of the first virtual face generated image; and adjusting the model parameter information of the first sub-face processing model based on the first authenticity discrimination result and the second authenticity discrimination result until the second authenticity discrimination result of the first virtual face generated image matches the first authenticity discrimination result of the face driven image data sample;
    and/or,
    inputting the virtual face image sample and the first face driven data generated image into a pre-trained discriminator to obtain a third authenticity discrimination result of the virtual face image sample and a fourth authenticity discrimination result of the first face driven data generated image; and adjusting the model parameter information of the first sub-face processing model based on the third authenticity discrimination result and the fourth authenticity discrimination result until the fourth authenticity discrimination result of the first face driven data generated image matches the third authenticity discrimination result of the virtual face image sample.
7. The method according to claim 2, wherein the second sub-face processing model is trained through the following steps:
    inputting the virtual face image sample into the encoder of the trained first sub-face processing model to obtain the third virtual expression feature;
    inputting the third virtual expression feature into the second sub-face processing model to obtain a predicted BS coefficient corresponding to the virtual face image sample;
    determining fifth loss information based on the predicted BS coefficient and a known BS coefficient corresponding to the virtual face image sample;
    adjusting model parameter information of the second sub-face processing model based on the fifth loss information.
8. The method according to claim 3, wherein the acquiring the face driven image data sample and the virtual face image sample comprises:
    acquiring original face driving image data and an original virtual face image;
    performing augmentation processing on the original face driving image data and the original virtual face image respectively to obtain augmented face driven image data samples and augmented virtual face image samples;
    performing segmentation processing on the augmented face driven image data and the augmented virtual face image respectively to obtain a face driven image data sample containing a first face region and a virtual face image sample containing a second face region.
9. The method according to claim 8, wherein the performing segmentation processing on the augmented face driven image data and the augmented virtual face image respectively to obtain the face driven image data sample containing the first face region and the virtual face image sample containing the second face region comprises:
    performing face detection, face key point detection processing and face registration processing in sequence on the augmented face driven image data and the augmented virtual face image respectively, to determine a first face region of the augmented face driven image data and a second face region of the augmented virtual face image;
    based on the first face region, performing segmentation processing on the augmented face driven image data to obtain the face driven image data sample containing the first face region, and based on the second face region, performing segmentation processing on the augmented virtual face image to obtain the virtual face image sample containing the second face region.
10. A facial expression capturing apparatus, comprising:
    a first acquisition module, configured to acquire face driving image data;
    a first extraction module, configured to perform first feature extraction on the face driving image data to obtain a first expression feature of the face driving image data;
    a second extraction module, configured to generate a virtual face image based on the first expression feature, and perform second feature extraction on the virtual face image to obtain a second expression feature of the virtual face image;
    a generation module, configured to generate shape fusion BS coefficients based on the second expression feature, wherein the BS coefficients are used to be input into a three-dimensional game engine to generate a three-dimensional virtual face model.
11. A computer device, comprising a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor and the memory communicate through the bus; and when the machine-readable instructions are executed by the processor, the steps of the facial expression capturing method according to any one of claims 1 to 9 are performed.
12. A computer-readable storage medium, having a computer program stored thereon, wherein when the computer program is run by a processor, the steps of the facial expression capturing method according to any one of claims 1 to 9 are performed.
13. A computer program product, carrying program code, wherein the program code comprises instructions that can be used to execute the steps of the facial expression capturing method according to any one of claims 1 to 9.
PCT/CN2023/080015 2022-03-30 2023-03-07 Facial expression capturing method and apparatus, computer device, and storage medium WO2023185395A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210326965.4 2022-03-30
CN202210326965.4A CN114677739A (en) 2022-03-30 2022-03-30 Facial expression capturing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023185395A1 true WO2023185395A1 (en) 2023-10-05

Family

ID=82076845

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/080015 WO2023185395A1 (en) 2022-03-30 2023-03-07 Facial expression capturing method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN114677739A (en)
WO (1) WO2023185395A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677739A (en) * 2022-03-30 2022-06-28 北京字跳网络技术有限公司 Facial expression capturing method and device, computer equipment and storage medium
CN116188640B (en) * 2022-12-09 2023-09-08 北京百度网讯科技有限公司 Three-dimensional virtual image generation method, device, equipment and medium
CN117540789B (en) * 2024-01-09 2024-04-26 腾讯科技(深圳)有限公司 Model training method, facial expression migration method, device, equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363973A (en) * 2018-02-07 2018-08-03 电子科技大学 A kind of unconfined 3D expressions moving method
CN108564641A (en) * 2018-03-16 2018-09-21 中国科学院自动化研究所 Expression method for catching and device based on UE engines
US10970907B1 (en) * 2019-07-02 2021-04-06 Facebook Technologies, Llc System and method for applying an expression to an avatar
CN113205449A (en) * 2021-05-21 2021-08-03 珠海金山网络游戏科技有限公司 Expression migration model training method and device and expression migration method and device
CN114677739A (en) * 2022-03-30 2022-06-28 北京字跳网络技术有限公司 Facial expression capturing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114677739A (en) 2022-06-28

Similar Documents

Publication Publication Date Title
WO2023185395A1 (en) Facial expression capturing method and apparatus, computer device, and storage medium
Chen et al. Fsrnet: End-to-end learning face super-resolution with facial priors
US11410457B2 (en) Face reenactment
Wang et al. Identity- and pose-robust facial expression recognition through adversarial feature learning
CN108596024B (en) Portrait generation method based on face structure information
WO2020258668A1 (en) Facial image generation method and apparatus based on adversarial network model, and nonvolatile readable storage medium and computer device
KR102605077B1 (en) Methods and systems for compositing realistic head rotations and facial animation on mobile devices
WO2023040679A1 (en) Fusion method and apparatus for facial images, and device and storage medium
US11475608B2 (en) Face image generation with pose and expression control
CN108198177A (en) Image acquiring method, device, terminal and storage medium
US11880957B2 (en) Few-shot image generation via self-adaptation
WO2020263541A1 (en) Portrait editing and synthesis
CN107423689B (en) Intelligent interactive face key point marking method
JP2022133378A (en) Face biological detection method, device, electronic apparatus, and storage medium
WO2023000895A1 (en) Image style conversion method and apparatus, electronic device and storage medium
CN115914505B (en) Video generation method and system based on voice-driven digital human model
WO2024051480A1 (en) Image processing method and apparatus, computer device, and storage medium
CN114127776A (en) Method and system for training generative adversarial network with formation data
CN109636867B (en) Image processing method and device and electronic equipment
CN111582066A (en) Heterogeneous face recognition model training method, face recognition method and related device
WO2023185398A1 (en) Facial processing method and apparatus, and computer device and storage medium
Xu et al. RelightGAN: Instance-level generative adversarial network for face illumination transfer
CN112200236A (en) Training method of face parameter recognition model and face parameter recognition method
CN115631285B (en) Face rendering method, device, equipment and storage medium based on unified driving
CN111275778A (en) Face sketch generating method and device

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23777773

Country of ref document: EP

Kind code of ref document: A1