CN115393488B - Method and device for driving virtual character expression, electronic equipment and storage medium - Google Patents

Method and device for driving virtual character expression, electronic equipment and storage medium Download PDF

Info

Publication number
CN115393488B
Authority
CN
China
Prior art keywords: mixed, deformations, face, deformation, hybrid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211338132.6A
Other languages
Chinese (zh)
Other versions
CN115393488A (en)
Inventor
梁柏荣
周航
徐志良
何栋梁
刘经拓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211338132.6A
Publication of CN115393488A
Application granted
Publication of CN115393488B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00: Animation
    • G06T 13/20: 3D [Three Dimensional] animation
    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; Face representation
    • G06V 40/174: Facial expression recognition
    • G06T 2200/00: Indexing scheme for image data processing or generation, in general
    • G06T 2200/04: Indexing scheme for image data processing or generation, in general, involving 3D image data


Abstract

The application discloses a method and device for driving the expression of a virtual character, an electronic device, and a storage medium. It relates to the technical field of artificial intelligence, in particular to augmented reality, virtual reality, computer vision, and deep learning, and can be applied to scenarios such as the metaverse and virtual digital humans. The specific scheme is as follows: acquire a face image and input it into a three-dimensional face model to obtain a first face state vector corresponding to the face image; input the first face state vector into a coefficient mapping model to obtain the coefficients respectively corresponding to a plurality of first mixed deformations of the face image; determine the coefficients respectively corresponding to a plurality of second mixed deformations of the virtual character according to the coefficients of the first mixed deformations; and drive the expression of the virtual character according to the coefficients of the second mixed deformations. The method improves the accuracy of expression capture, improves the accuracy of expression driving of the virtual character, and improves the expressiveness of the virtual character.

Description

Method and device for driving virtual character expression, electronic equipment and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to augmented reality, virtual reality, computer vision, and deep learning; it can be applied to scenarios such as the metaverse and virtual digital humans, and specifically concerns a method and device for driving the expression of a virtual character, an electronic device, and a storage medium.
Background
With the development of artificial intelligence technology, application scenarios for three-dimensional virtual characters are becoming increasingly common. Three-dimensional facial expression capture is one of the key technologies behind three-dimensional virtual characters: its goal is to capture a facial expression and transfer it to the virtual character, so that the character's expression can be controlled. The accuracy of expression capture is therefore crucial.
Disclosure of Invention
The application provides a driving method and device for virtual character expressions, electronic equipment and a storage medium. The specific scheme is as follows:
according to an aspect of the present application, there is provided a method for driving an expression of a virtual character, including:
acquiring a face image, and inputting the face image into a three-dimensional face model to obtain a first face state vector corresponding to the face image;
inputting the first face state vector into a coefficient mapping model to obtain coefficients respectively corresponding to a plurality of first mixed deformations of the face image, wherein the first mixed deformations are three-dimensional models for forming facial expressions;
determining coefficients respectively corresponding to a plurality of second mixed deformations of the virtual character according to the coefficients respectively corresponding to the first mixed deformations, wherein the second mixed deformations are three-dimensional models for forming the expression of the virtual character;
and driving the expression of the virtual character according to the coefficients corresponding to the plurality of second mixed deformations respectively.
According to another aspect of the present application, there is provided a driving apparatus of an expression of a virtual character, including:
the first acquisition module is used for acquiring a face image and inputting the face image into the three-dimensional face model to obtain a first face state vector corresponding to the face image;
the second acquisition module is used for inputting the first face state vector into the coefficient mapping model to obtain coefficients respectively corresponding to a plurality of first mixed deformations of the face image, wherein the first mixed deformations are three-dimensional models used for forming facial expressions;
the determining module is used for determining the coefficients respectively corresponding to a plurality of second mixed deformations of the virtual character according to the coefficients corresponding to the first mixed deformations, wherein the second mixed deformations are three-dimensional models used for forming the expression of the virtual character;
and the driving module is used for driving the expression of the virtual character according to the coefficients respectively corresponding to the plurality of second mixed deformations.
According to another aspect of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of the above embodiments.
According to another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the above-described embodiments.
According to another aspect of the present application, a computer program product is provided, comprising a computer program which, when executed by a processor, carries out the steps of the method of the above-mentioned embodiments.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be considered limiting of the present application. Wherein:
fig. 1 is a schematic flowchart illustrating a method for driving an expression of a virtual character according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart illustrating a method for driving an expression of a virtual character according to another embodiment of the present application;
fig. 3 is a schematic flowchart illustrating a method for driving an expression of a virtual character according to another embodiment of the present application;
fig. 4 is a flowchart illustrating a method for driving an expression of a virtual character according to another embodiment of the present application;
fig. 5 is a schematic structural diagram of a driving apparatus for virtual character expression according to an embodiment of the present application;
fig. 6 is a block diagram of an electronic device for implementing a method for driving an expression of a virtual character according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application to assist in understanding, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
A method, an apparatus, an electronic device, and a storage medium for driving an expression of a virtual character according to an embodiment of the present application are described below with reference to the drawings.
Artificial intelligence is the discipline of using computers to simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it spans both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies include computer vision, speech recognition, natural language processing, deep learning, big data processing, knowledge graph technology, and the like.
Augmented reality technology skillfully fuses virtual information with the real world. It draws on technical means such as multimedia, three-dimensional modeling, real-time tracking and registration, intelligent interaction, and sensing, and applies computer-generated virtual information such as text, images, three-dimensional models, music, and video to the real world after simulation, so that the two kinds of information complement each other and the real world is thereby "augmented".
Virtual reality technology combines computing, electronic information, and simulation. Its basic implementation takes computer technology as the core and integrates the latest achievements of three-dimensional graphics, multimedia, simulation, display, servo, and other high technologies to generate, by means of computers and related equipment, a virtual world with realistic three-dimensional visual, tactile, olfactory, and other sensory experiences, giving people in the virtual world an immersive feeling.
Computer vision is the science of studying how to make machines "see": cameras and computers are used in place of human eyes to identify, track, and measure targets, and the resulting images are further processed so that they become better suited to human observation or to transmission to instruments for detection.
Deep learning is a newer research direction in the field of machine learning. It learns the intrinsic regularities and representation levels of sample data, and the information obtained during learning is very helpful for interpreting data such as text, images, and sound. Its ultimate goal is to give machines human-like analysis and learning capabilities, enabling them to recognize data such as text, images, and sound.
In the related art, three-dimensional facial expression capture mainly performs three-dimensional face reconstruction from facial key points, so the capture capability is limited by the expressive power of the key points, and some expressions may not be representable.
Based on this, an embodiment of the present application provides a method for driving the expression of a virtual character. A three-dimensional face model is used to obtain the face state vector of a face image; a coefficient mapping model maps the state vector to the coefficients respectively corresponding to a plurality of first mixed deformations of the face image; the coefficients of the second mixed deformations of the virtual character are then determined from the coefficients of the first mixed deformations; and the expression of the virtual character is driven according to the coefficients of the second mixed deformations, which improves the accuracy of expression capture.
Fig. 1 is a flowchart illustrating a method for driving an expression of a virtual character according to an embodiment of the present application.
The method for driving the expression of the virtual character according to the embodiment of the present application may be executed by the apparatus for driving the expression of the virtual character according to the embodiment of the present application, and the apparatus may be configured in an electronic device to implement a function of driving the expression of the virtual character.
The electronic device may be any device with computing capability, for example, a personal computer, a mobile terminal, a server, and the like, and the mobile terminal may be a hardware device with various operating systems, touch screens, and/or display screens, such as an in-vehicle device, a mobile phone, a tablet computer, a personal digital assistant, a wearable device, and the like.
As shown in fig. 1, the method for driving the expression of the virtual character includes:
step 101, obtaining a face image, and inputting the face image into a three-dimensional face model to obtain a first face state vector corresponding to the face image.
The face image may be obtained by shooting a face of a person, or may be an image of a face region cut out from an image obtained by shooting the person, or may be obtained by other methods, which is not limited in the present application.
After the face image is obtained, the face image may be input into the three-dimensional face model to obtain a first face state vector output by the three-dimensional face model.
The three-dimensional face model may be a 3DMM (3D Morphable Model, a deformable three-dimensional face model) or another three-dimensional face model.
In the application, the first face state vector may be used to represent state information of a face in a face image, and the first face state vector may include a facial expression vector, a facial shape vector, a facial pose vector, and the like of the face in the face image.
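For illustration, the first face state vector can be thought of as the concatenation of these component vectors. The following sketch is a hypothetical container, not something defined by the patent; the field names and dimensionalities are assumptions:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FaceStateVector:
    """Hypothetical container for the output of a 3DMM-style face model."""
    expression: np.ndarray  # facial expression coefficients
    shape: np.ndarray       # facial shape (identity) coefficients
    pose: np.ndarray        # facial pose, e.g. rotation and translation

    def concat(self) -> np.ndarray:
        # Flatten into the single vector that is fed to the coefficient
        # mapping model in step 102.
        return np.concatenate([self.expression, self.shape, self.pose])
```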
Step 102, inputting the first face state vector into a coefficient mapping model to obtain the coefficients respectively corresponding to a plurality of first mixed deformations of the face image.
The coefficient mapping model may be obtained by pre-training. It maps the first face state vector output by the three-dimensional face model to the coefficients respectively corresponding to the first mixed deformations, and using the coefficient mapping model can improve the accuracy of the first mixed deformation coefficients.
In this application, a first mixed deformation may refer to a three-dimensional model for forming a facial expression (in facial animation such deformation targets are commonly called blend shapes), and a facial expression may be represented by a weighted combination of a plurality of first mixed deformations and their coefficients.
In this application, the first mixed deformation may be understood as a fixed deformed face model, the coefficient of the first mixed deformation may be understood as a weight of the first mixed deformation, and the coefficient of the first mixed deformation may be used to characterize an action range of the first mixed deformation.
For example, the facial expression may be divided into 51 first mixed deformations, such as the right eye looking right, the left eye looking left, the left eye looking forward, the mouth opening, and the like, and a facial expression may be obtained from the weighted combination of the 51 first mixed deformations and their respective coefficients.
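As a minimal sketch of this weighted combination (the delta-from-neutral formulation below is a common blend-shape convention and an assumption here, since the patent does not fix one):

```python
import numpy as np


def blend_expression(neutral: np.ndarray,
                     blendshapes: np.ndarray,
                     coefficients: np.ndarray) -> np.ndarray:
    """Combine K mixed deformations into one facial expression.

    neutral:      (V, 3) vertex positions of the neutral face
    blendshapes:  (K, V, 3) per-deformation vertex offsets from the neutral face
    coefficients: (K,) weights, e.g. K = 51 first mixed deformations
    """
    # Each coefficient scales the action range of its mixed deformation.
    offsets = np.tensordot(coefficients, blendshapes, axes=1)  # -> (V, 3)
    return neutral + offsets
```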
Step 103, determining the coefficients corresponding to the second mixed deformations of the virtual character according to the coefficients corresponding to the first mixed deformations.
In this application, the second mixed deformation may be a three-dimensional model for composing the expression of the virtual character, and the expression of the virtual character may be expressed by a weighted combination of a plurality of second mixed deformations and their coefficients.
In this application, the second mixed deformation may also be understood as a fixed deformed face model, the coefficient of the second mixed deformation may be understood as a weight of the second mixed deformation, and the coefficient of the second mixed deformation may be used to characterize an action range of the second mixed deformation.
It should be noted that the second mixed deformations may be the same or different for different virtual characters, and may be set according to actual needs.
In the present application, the number of the second mixed deformation may be the same as or different from the number of the first mixed deformation, and the present application does not limit this.
In practical applications, the number of the second mixed deformations may be larger than the number of the first mixed deformations in order to improve the expressive power of the virtual character. For example, the number of the first mixed deformation may be 51, and the number of the second mixed deformation may be 300.
In this application, the coefficients respectively corresponding to the plurality of second mixed deformations may be determined based on the correspondence between the first and second mixed deformations together with the coefficients of the first mixed deformations; alternatively, the second mixed deformations related to each first mixed deformation may be found according to the semantics of each first mixed deformation, and the coefficients of the related second mixed deformations may then be determined from the coefficient of each first mixed deformation.
Step 104, driving the expression of the virtual character according to the coefficients respectively corresponding to the plurality of second mixed deformations.
In this application, since the coefficient of the second mixed deformation may be used to represent the action range of the second mixed deformation, the area corresponding to each second mixed deformation in the face model of the virtual character may be controlled according to the coefficient of each second mixed deformation, so as to obtain the expression of the virtual character.
Alternatively, the plurality of second mixed deformations and their corresponding coefficients may be weighted and combined to obtain the expression of the virtual character, thereby driving the expression of the virtual character. Driving the expression based on the weighted combination of the plurality of second mixed deformations and their coefficients improves the accuracy of the virtual character's expression control.
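Putting steps 101 to 104 together, the driving flow can be sketched as below; all five names are assumed interfaces introduced for illustration, not an API defined by the patent:

```python
def drive_avatar_expression(face_image, face_model, coeff_mapper,
                            coeff_transfer, avatar_rig):
    """End-to-end sketch of the expression-driving flow."""
    state = face_model(face_image)                # step 101: 3D face model
    first_coeffs = coeff_mapper(state)            # step 102: mapping model
    second_coeffs = coeff_transfer(first_coeffs)  # step 103: first -> second
    avatar_rig.apply(second_coeffs)               # step 104: drive the avatar
    return second_coeffs
```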
In the embodiment of the application, a face image is input into a three-dimensional face model, a first face state vector corresponding to the face image is obtained by using the three-dimensional face model, the first face state vector is input into a coefficient mapping model, coefficients corresponding to a plurality of first mixed deformations corresponding to the face image are obtained, the coefficients corresponding to a plurality of second mixed deformations of a virtual character are determined according to the coefficients corresponding to the first mixed deformations, and finally the expression of the virtual character is driven according to the coefficients corresponding to the second mixed deformations. Therefore, the expression of the virtual character is driven based on the mapping from the first human face state vector to the first mixed deformation coefficient and the mapping from the first mixed deformation coefficient to the second mixed deformation coefficient of the virtual character, so that the expression capturing accuracy, the expression driving accuracy and the expressive force of the virtual character are improved.
Fig. 2 is a flowchart illustrating a method for driving an expression of a virtual character according to another embodiment of the present application.
As shown in fig. 2, the method for driving the expression of the virtual character includes:
step 201, obtaining a face image, and inputting the face image into a three-dimensional face model to obtain a first face state vector corresponding to the face image.
Step 202, inputting the first face state vector into a coefficient mapping model to obtain the coefficients respectively corresponding to a plurality of first mixed deformations of the face image.
In the present application, steps 201 to 202 are similar to those described in the above embodiments, and therefore are not described herein again.
Step 203, acquiring a corresponding relation between the plurality of first mixed deformations and the plurality of second mixed deformations.
In this application, one first mixed deformation may correspond to one or more second mixed deformations, and a first mixed deformation may be associated with the same face region as its corresponding second mixed deformations. For example, if a first mixed deformation is the left eye looking forward, the face region associated with the corresponding second mixed deformation is also the left eye.
Step 204, determining, among the plurality of second mixed deformations, the second mixed deformations associated with each first mixed deformation according to the correspondence.
In this application, for each first mixed deformation, the corresponding relationship may be queried to determine a second mixed deformation corresponding to each first mixed deformation, and the second mixed deformation corresponding to each first mixed deformation may be used as the second mixed deformation associated with the first mixed deformation.
For example, if in the correspondence the second mixed deformations corresponding to a certain first mixed deformation C1 include D3, D4, D5, and D6, then the second mixed deformations associated with C1 are D3, D4, D5, and D6. Here C1, D3, D4, D5, and D6 can be understood as identifiers of mixed deformations.
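A minimal sketch of such a correspondence lookup, using the hypothetical identifiers from the example above:

```python
# Hypothetical correspondence: one first mixed deformation may map to
# one or several second mixed deformations.
CORRESPONDENCE = {
    "C1": ["D3", "D4", "D5", "D6"],
}


def associated_second_deformations(first_name: str) -> list[str]:
    # Querying the correspondence yields the associated second deformations.
    return CORRESPONDENCE.get(first_name, [])
```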
In step 205, the coefficients of the associated second mixed deformations are determined from the coefficient of each first mixed deformation.
In the present application, the coefficient of each first mixed deformation may be used as the coefficient of the associated second mixed deformation.
To meet diversified requirements on the control of virtual character expressions, a preset coefficient mapping rule corresponding to each first mixed deformation may be obtained in the present application, where a coefficient mapping rule is a mapping rule between the coefficient of a first mixed deformation and the coefficients of second mixed deformations. The coefficient mapping rule corresponding to each first mixed deformation may then be applied to map the coefficient of that first mixed deformation to the coefficients of its associated second mixed deformations.
For example, suppose the coefficient mapping rule corresponding to a first mixed deformation A11 is: the coefficient of the second mixed deformation B21 is 1.0 times the coefficient of A11, and the coefficients of the second mixed deformations B22 and B23 are each 0.8 times the coefficient of A11. If the coefficient of A11 is 0.8, the coefficients of B21, B22, and B23 are 0.8, 0.64, and 0.64, respectively.
In the present application, the coefficients of the associated second mixed deformations may thus be determined based on the coefficient mapping rule corresponding to each first mixed deformation. Diversified requirements on the expression driving of the virtual character can therefore be met by adjusting the coefficient mapping rules corresponding to the first mixed deformations.
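A sketch of such a coefficient mapping rule, following the A11/B21 example above (the table contents are illustrative):

```python
# Each rule scales the first deformation's coefficient into the
# coefficients of its associated second deformations.
MAPPING_RULES = {
    "A11": {"B21": 1.0, "B22": 0.8, "B23": 0.8},
}


def map_coefficients(first_coeffs: dict[str, float]) -> dict[str, float]:
    second_coeffs: dict[str, float] = {}
    for first_name, coeff in first_coeffs.items():
        for second_name, scale in MAPPING_RULES.get(first_name, {}).items():
            second_coeffs[second_name] = coeff * scale
    return second_coeffs


# map_coefficients({"A11": 0.8}) -> {"B21": 0.8, "B22": 0.64, "B23": 0.64}
```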
Alternatively, a preset coefficient adjustment rule for each second mixed deformation may be obtained. The coefficient of each first mixed deformation is taken as the initial coefficient of the associated second mixed deformation, and that initial coefficient is then adjusted according to the coefficient adjustment rule of the associated second mixed deformation to obtain its final coefficient.
For example, suppose a second mixed deformation associated with the first mixed deformation A12 is B24, and the coefficient adjustment rule of B24 is to subtract 0.1 from the initial coefficient while keeping the coefficient of B24 greater than or equal to 0. If the coefficient of A12 is 0.6, then the coefficient of B24 is 0.6 - 0.1 = 0.5.
As another example, suppose the second mixed deformation associated with the first mixed deformation A13 is B25, and the coefficient adjustment rule of B25 is to scale the initial coefficient to 0.9 times its value. If the coefficient of A13 is 0.8, the coefficient of B25 is 0.8 × 0.9 = 0.72.
In the present application, the coefficient of each second mixed deformation may thus be adjusted, starting from the coefficient of the associated first mixed deformation, according to that second mixed deformation's coefficient adjustment rule. Diversified requirements on the expression driving of the virtual character can therefore be met by adjusting the coefficient adjustment rules of the second mixed deformations.
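A sketch of coefficient adjustment rules matching the B24/B25 examples above (the rule table and clamping behavior are illustrative):

```python
# Each second mixed deformation may carry its own adjustment rule that is
# applied to the initial coefficient inherited from the first deformation.
ADJUSTMENT_RULES = {
    "B24": lambda c: max(c - 0.1, 0.0),  # subtract 0.1, clamp at 0
    "B25": lambda c: c * 0.9,            # scale to 0.9 times
}


def adjust_coefficient(second_name: str, initial: float) -> float:
    rule = ADJUSTMENT_RULES.get(second_name, lambda c: c)  # default: identity
    return rule(initial)


# adjust_coefficient("B24", 0.6) -> 0.5
# adjust_coefficient("B25", 0.8) -> 0.72
```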
Step 206, driving the expression of the virtual character according to the coefficients respectively corresponding to the plurality of second mixed deformations.
In the present application, the content of step 206 is similar to that described in the above embodiments, and therefore, the description thereof is omitted.
In the embodiment of the application, when the coefficients corresponding to the second mixed deformations of the virtual character are determined according to the coefficients corresponding to the first mixed deformations, the coefficient of each second mixed deformation can be determined based on the corresponding relationship between the first mixed deformations and the second mixed deformations and the coefficient of each first mixed deformation, so that the mapping from the first mixed deformation to the second mixed deformation is realized, the expression capturing accuracy is improved, and the accuracy of the expression driving of the virtual character is improved.
Fig. 3 is a flowchart illustrating a method for driving an expression of a virtual character according to another embodiment of the present application.
As shown in fig. 3, the method for driving the expression of the virtual character includes:
step 301, acquiring a face image, and inputting the face image into a three-dimensional face model to obtain a first face state vector corresponding to the face image.
Step 302, inputting the first face state vector into a coefficient mapping model to obtain the coefficients respectively corresponding to a plurality of first mixed deformations of the face image.
In the present application, steps 301 to 302 are similar to those described in the above embodiments, and therefore are not described herein again.
Step 303, determining, among the plurality of second mixed deformations, the second mixed deformation associated with each first mixed deformation according to the semantics of the plurality of first mixed deformations and the semantics of the plurality of second mixed deformations.
In the present application, each first mixed deformation has a certain semantic meaning, and so does each second mixed deformation. For example, if a first mixed deformation is a face model of the left eye looking forward, the semantics of that first mixed deformation are "left eye looking forward".
In the application, semantics of a plurality of first mixed deformations and semantics of a plurality of second mixed deformations may be obtained, a semantic matching degree between each first mixed deformation and each second mixed deformation is calculated, and the second mixed deformation whose semantic matching degree is greater than a preset threshold may be used as the second mixed deformation associated with each first mixed deformation. Alternatively, the second mixed deformations associated with the first mixed deformation may be determined one by one based on the semantic matching degree.
In the application, the second mixed deformation associated with each first mixed deformation is determined according to the semantic matching degree between the first mixed deformation and the second mixed deformation, so that the accuracy is high.
Alternatively, the face region associated with each first mixed deformation may be determined from its semantics, and the face region associated with each second mixed deformation may be determined from its semantics; the face region associated with each first mixed deformation is then compared with the face regions associated with the plurality of second mixed deformations, and any second mixed deformation associated with the same face region as a given first mixed deformation may be determined as a second mixed deformation associated with that first mixed deformation.
For example, if the face region associated with a certain first mixed deformation C1 is the right eye, and the face regions associated with the second mixed deformations D1 and D2 are also the right eye, then the second mixed deformations D1 and D2 may be determined as the second mixed deformation associated with C1.
Therefore, in the application, the face regions associated with the first and second mixed deformations can be determined from their semantics, and the second mixed deformations associated with each first mixed deformation can be determined by comparing these face regions; this determination method is simple and convenient.
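A sketch of this region-comparison association, with hypothetical semantic region tags:

```python
# Hypothetical region tags derived from each deformation's semantics.
FIRST_REGIONS = {"C1": "right_eye"}
SECOND_REGIONS = {"D1": "right_eye", "D2": "right_eye", "D3": "mouth"}


def associate_by_region(first_name: str) -> list[str]:
    # Second deformations tagged with the same face region are associated.
    region = FIRST_REGIONS[first_name]
    return [name for name, r in SECOND_REGIONS.items() if r == region]


# associate_by_region("C1") -> ["D1", "D2"]
```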
In step 304, the coefficients of the associated second mixed deformations are determined from the coefficient of each first mixed deformation.
Step 305, driving the expression of the virtual character according to the coefficients corresponding to the plurality of second mixed deformations.
In the present application, steps 304-305 are similar to those described in the above embodiments, and therefore are not described herein again.
In the embodiment of the application, when the coefficients corresponding to the second mixed deformations of the virtual character are determined according to the coefficients corresponding to the first mixed deformations, the second mixed deformation associated with each first mixed deformation can be determined based on the semantics of the first mixed deformations and the semantics of the second mixed deformations, and then the coefficient of the associated second mixed deformation is determined according to the coefficient of each first mixed deformation, so that the mapping from the first mixed deformation to the second mixed deformation is realized, the accuracy of expression capture is improved, and the accuracy of expression driving of the virtual character is improved.
Fig. 4 is a flowchart illustrating a method for driving an expression of a virtual character according to another embodiment of the present application.
As shown in fig. 4, the coefficient mapping model may be obtained through the following training steps:
step 401, obtaining a face sample image and labeling coefficients corresponding to a plurality of first mixed deformations corresponding to the face sample image.
In the present application, the labeling coefficients respectively corresponding to the plurality of first mixed deformations of a face sample image may be the coefficients provided by the capture software used to shoot that face sample image.
When acquiring face sample images, the expressions of multiple people may be recorded to obtain multiple sample videos. To reduce the amount of computation, frames may be extracted from each sample video to obtain multiple frames per video. Then, for each sample video, face key point detection is performed on each extracted frame to obtain the face key points in that frame; the face region in the frame is determined from those key points; the frame is cropped to obtain a face region image; and the face region image is resized to a preset size to obtain a face sample image.
When the sample video is subjected to frame extraction, as a possible implementation manner, the sample video may be subjected to frame extraction according to a set frame extraction interval or a set frame extraction frequency, so as to obtain a multi-frame image.
As another possible implementation manner, in consideration of the similarity of contents between consecutive video frames, in order to reduce the amount of calculation and improve the processing efficiency, in the present application, the sample video may be subjected to de-duplication processing according to the similarity between the contents of the video frames in the sample video, so as to obtain a plurality of video frames (i.e., multi-frame images).
Therefore, by performing face key point detection on each frame of the sample video and extracting the face region image from each frame, the computation of the subsequent three-dimensional face model can be reduced and the accuracy of the model output improved. Moreover, unifying the face sample images to a fixed size facilitates processing by the three-dimensional face model.
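A sketch of this sample-preparation pipeline using OpenCV for decoding, cropping, and resizing; `detect_face_keypoints` is a hypothetical stand-in for any face landmark detector, assumed to return (x, y) keypoints for a face or None:

```python
import cv2  # OpenCV for video decoding, cropping and resizing


def extract_face_samples(video_path: str, frame_interval: int = 10,
                         out_size: int = 256):
    """Extract fixed-size face sample images from one sample video."""
    samples = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % frame_interval == 0:  # frame extraction at a set interval
            keypoints = detect_face_keypoints(frame)  # hypothetical detector
            if keypoints is not None:
                # Bound the face region by its keypoints, then crop and resize.
                xs = [x for x, _ in keypoints]
                ys = [y for _, y in keypoints]
                x0, x1 = int(min(xs)), int(max(xs))
                y0, y1 = int(min(ys)), int(max(ys))
                face = frame[y0:y1, x0:x1]
                samples.append(cv2.resize(face, (out_size, out_size)))
        index += 1
    cap.release()
    return samples
```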
Step 402, inputting the face sample image into the three-dimensional face model to obtain a second face state vector corresponding to the face sample image.
The second face state vector may include a facial expression vector, a facial shape vector, a facial pose vector, and the like of a face in the face sample image.
To reduce the training time of the coefficient mapping model and accelerate its convergence, the pixel values of all pixels in the face sample image may be normalized, and the normalized face sample image may then be input into the three-dimensional face model; this reduces the model's computation, shortens training time, and speeds up convergence.
When normalizing the face sample image, the pixel value of each pixel may be scaled by the maximum pixel value and then shifted by a specified value to obtain the normalized pixel value.
For example, with a maximum pixel value of 255 and a specified value of 1, each pixel value may be divided by half the maximum (127.5) and then reduced by 1, so that the normalized pixel values lie in [-1, 1]. Alternatively, with a specified value of 0, each pixel value may simply be divided by 255, so that the normalized pixel values lie in [0, 1].
It should be noted that the above-mentioned specified values are merely examples, and should not be construed as limiting the present application.
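A sketch of the normalization into [-1, 1]; the scale-and-shift constants follow the example above:

```python
import numpy as np


def normalize_face_image(image: np.ndarray) -> np.ndarray:
    """Map uint8 pixel values 0..255 into [-1, 1]."""
    # Dividing by half the maximum pixel value and subtracting 1 gives
    # [-1, 1]; dividing by 255 alone would give [0, 1] instead.
    return image.astype(np.float32) / 127.5 - 1.0
```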
Step 403, inputting the second face state vector into the initial coefficient mapping model to obtain the prediction coefficients respectively corresponding to the plurality of first mixed deformations of the face sample image.
In the application, the second face state vector corresponding to the face sample image may be input into the initial coefficient mapping model to obtain the prediction coefficients respectively corresponding to the plurality of first mixed deformations output by the initial coefficient mapping model.
Step 404, training the initial coefficient mapping model according to the difference between the prediction coefficient and the labeling coefficient to obtain a coefficient mapping model.
In the application, a loss value for each first mixed deformation may be determined according to the difference between its prediction coefficient and its labeling coefficient; the loss values of the plurality of first mixed deformations are summed to obtain the model loss value of the initial coefficient mapping model; the parameters of the initial coefficient mapping model are adjusted according to the model loss value; and the adjusted model continues to be trained until a training end condition is met, yielding the coefficient mapping model.
When training the initial coefficient mapping model, a deep learning approach may be used; compared with other machine learning methods, deep learning performs better on large data sets.
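A minimal PyTorch sketch of such a training step. The MLP architecture, hidden width, and the use of an L1 difference are assumptions; the patent specifies only per-deformation differences summed into a model loss:

```python
import torch
import torch.nn as nn


class CoefficientMapper(nn.Module):
    """Hypothetical coefficient mapping model: state vector -> coefficients."""

    def __init__(self, state_dim: int, num_deformations: int = 51):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_deformations),
            nn.Sigmoid(),  # keep predicted coefficients in [0, 1]
        )

    def forward(self, state_vector: torch.Tensor) -> torch.Tensor:
        return self.net(state_vector)


def training_step(model, optimizer, state_vectors, label_coeffs):
    pred = model(state_vectors)
    # Per-deformation differences, summed over deformations and averaged
    # over the batch, as described above.
    loss = (pred - label_coeffs).abs().sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```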
In the embodiment of the application, the second face state vector of the face sample image is input into the initial coefficient mapping model to obtain the prediction coefficients respectively corresponding to a plurality of first mixed deformations of the face sample image, and the initial coefficient mapping model is trained according to the differences between the prediction coefficients and the labeling coefficients, so that the coefficient mapping model is obtained; using the coefficient mapping model improves the accuracy of the first mixed deformation coefficients.
In order to implement the foregoing embodiments, the present application further provides a driving apparatus for virtual character expressions. Fig. 5 is a schematic structural diagram of a driving apparatus for virtual character expressions according to an embodiment of the present application.
As shown in fig. 5, the virtual character expression driving apparatus 500 includes:
a first obtaining module 510, configured to obtain a face image, and input the face image into a three-dimensional face model to obtain a first face state vector corresponding to the face image;
a second obtaining module 520, configured to input the first face state vector into the coefficient mapping model to obtain the coefficients respectively corresponding to a plurality of first mixed deformations of the face image, where the first mixed deformations are three-dimensional models used for forming facial expressions;
a determining module 530, configured to determine, according to coefficients corresponding to the first mixed deformations, coefficients corresponding to second mixed deformations of the virtual character, where the second mixed deformations are three-dimensional models used for forming the expression of the virtual character;
and a driving module 540, configured to drive the expression of the virtual character according to the coefficients respectively corresponding to the plurality of second mixed deformations.
In a possible implementation manner of the embodiment of the present application, the determining module 530 is configured to:
acquiring a corresponding relation between the plurality of first mixed deformations and the plurality of second mixed deformations;
determining a second mixed deformation associated with each first mixed deformation in the plurality of second mixed deformations according to the corresponding relation;
and determining the coefficients of the associated second mixed deformations according to the coefficient of each first mixed deformation.
In a possible implementation manner of the embodiment of the present application, the determining module 530 is configured to:
determining a second mixed deformation associated with each first mixed deformation in the plurality of second mixed deformations according to the semantics of the plurality of first mixed deformations and the semantics of the plurality of second mixed deformations;
and determining the coefficients of the associated second mixed deformations according to the coefficient of each first mixed deformation.
In a possible implementation manner of the embodiment of the present application, the determining module 530 is configured to:
determining semantic matching degrees between each first mixed deformation and the plurality of second mixed deformations according to the semantics of the plurality of first mixed deformations and the semantics of the plurality of second mixed deformations;
and determining a second mixed deformation associated with each first mixed deformation according to the plurality of semantic matching degrees corresponding to each first mixed deformation.
In a possible implementation manner of the embodiment of the present application, the determining module 530 is configured to:
determining the face region associated with each first mixed deformation according to its semantics;
determining the face region associated with each second mixed deformation according to its semantics;
comparing the face region associated with each first mixed deformation with the face regions associated with the plurality of second mixed deformations;
and determining, among the plurality of second mixed deformations, those associated with the same face region as each first mixed deformation as the associated second mixed deformations.
In a possible implementation manner of the embodiment of the present application, the determining module 530 is configured to:
acquiring a coefficient mapping rule corresponding to each first mixed deformation;
and determining the coefficient of the associated second mixed deformation according to the coefficient mapping rule and the coefficient of each first mixed deformation.
In a possible implementation manner of the embodiment of the present application, the determining module 530 is configured to:
acquiring a coefficient adjustment rule of each second mixed deformation;
determining the coefficient of each first mixed deformation as the initial coefficient of the associated second mixed deformation;
and adjusting the initial coefficient of the associated second mixed deformation according to the coefficient adjustment rule of the associated second mixed deformation to determine the coefficient of the associated second mixed deformation.
In a possible implementation manner of the embodiment of the present application, the apparatus may further include:
the third acquisition module is used for acquiring the face sample image and the labeling coefficients respectively corresponding to the plurality of first mixed deformations of the face sample image;
the fourth acquisition module is used for inputting the face sample image into the three-dimensional face model to obtain a second face state vector corresponding to the face sample image;
a fifth obtaining module, configured to input the second face state vector into the initial coefficient mapping model to obtain the prediction coefficients respectively corresponding to a plurality of first mixed deformations of the face sample image;
and the training module is used for training the initial coefficient mapping model according to the difference between the prediction coefficient and the labeling coefficient so as to obtain the coefficient mapping model.
In a possible implementation manner of the embodiment of the present application, the fourth obtaining module is configured to:
normalizing the pixel value of each pixel point in the face sample image to obtain a normalized face sample image;
and inputting the normalized face sample image into the three-dimensional face model to obtain a second face state vector.
In a possible implementation manner of the embodiment of the present application, the third obtaining module is configured to:
acquiring a plurality of sample videos, and performing frame extraction on each sample video to acquire a multi-frame image of each sample video;
carrying out face key point detection on each frame of image in multiple frames of images to obtain face key points in each frame of image;
cropping each frame of image according to the face key points in that frame to obtain a face region image in each frame;
and adjusting the face region image to a preset size to obtain a face sample image.
In a possible implementation manner of the embodiment of the present application, the driving module 540 is configured to:
and carrying out weighted combination on the plurality of second mixed deformations according to the coefficients corresponding to the plurality of second mixed deformations respectively so as to drive the expression of the virtual character.
It should be noted that the explanation of the foregoing embodiment of the method for driving the virtual character expression is also applicable to the driving apparatus for the virtual character expression of this embodiment, and therefore, the description is omitted here.
In the embodiment of the application, a first face state vector corresponding to a face image is obtained by inputting the face image into a three-dimensional face model, then the first face state vector is input into a coefficient mapping model, coefficients corresponding to a plurality of first mixed deformations corresponding to the face image are obtained, coefficients corresponding to a plurality of second mixed deformations of a virtual character are determined according to the coefficients corresponding to the first mixed deformations, and finally, the expression of the virtual character is driven according to the coefficients corresponding to the second mixed deformations. Therefore, the expression of the virtual character is driven based on the mapping from the first human face state vector to the first mixed deformation coefficient and the mapping from the first mixed deformation coefficient to the second mixed deformation coefficient of the virtual character, so that the expression capturing accuracy, the expression driving accuracy and the expressive force of the virtual character are improved.
According to embodiments of the present application, an electronic device, a readable storage medium, and a computer program product are also provided.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a ROM (Read-Only Memory) 602 or a computer program loaded from a storage unit 608 into a RAM (Random Access Memory) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The calculation unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An I/O (Input/Output) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Computing unit 601 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, a DSP (Digital Signal Processor), and any suitable processor, controller, microcontroller, and the like. The computing unit 601 executes the respective methods and processes described above, such as the method for driving the expression of a virtual character. For example, in some embodiments, the method for driving the expression of a virtual character may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded onto and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method for driving the expression of a virtual character described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the method for driving the expression of a virtual character.
Various implementations of the systems and techniques described here above may be realized in digital electronic circuitry, integrated circuitry, FPGAs (Field-Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application-Specific Standard Products), SOCs (System On Chip), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this application, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an EPROM (Erasable Programmable Read-Only Memory) or flash memory, an optical fiber, a CD-ROM (Compact Disc Read-Only Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a LAN (Local Area Network), a WAN (Wide Area Network), the Internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that remedies the drawbacks of difficult management and weak service scalability found in conventional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
According to an embodiment of the present application, there is also provided a computer program product which, when its instructions are executed by a processor, performs the method for driving the expression of a virtual character provided in the above embodiments of the present application.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; this is not limited herein, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (20)

1. A method for driving the expression of a virtual character, comprising the following steps:
acquiring a face image, and inputting the face image into a three-dimensional face model to obtain a first face state vector corresponding to the face image;
inputting the first face state vector into a coefficient mapping model to obtain coefficients respectively corresponding to a plurality of first mixed deformations corresponding to the face image, wherein the first mixed deformations are three-dimensional models for forming facial expressions, and a facial expression is represented by a weighted combination of the plurality of first mixed deformations and the coefficients;
determining coefficients respectively corresponding to a plurality of second mixed deformations of the virtual character according to the coefficients respectively corresponding to the plurality of first mixed deformations, wherein the second mixed deformations are three-dimensional models for forming the expression of the virtual character; and
driving the expression of the virtual character according to the coefficients respectively corresponding to the plurality of second mixed deformations;
wherein the determining coefficients respectively corresponding to the plurality of second mixed deformations of the virtual character according to the coefficients respectively corresponding to the plurality of first mixed deformations comprises:
acquiring a correspondence between the plurality of first mixed deformations and the plurality of second mixed deformations;
determining, in the plurality of second mixed deformations and according to the correspondence, a plurality of second mixed deformations associated with each first mixed deformation; and
determining the coefficients of the associated plurality of second mixed deformations according to the coefficient of each first mixed deformation;
the method further comprising:
determining, in the plurality of second mixed deformations, a second mixed deformation associated with each first mixed deformation according to the semantics of the plurality of first mixed deformations and the semantics of the plurality of second mixed deformations; and
determining the coefficient of the associated second mixed deformation according to the coefficient of each first mixed deformation.
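As an illustrative, non-limiting sketch of the data flow recited in claim 1 (the "mixed deformations" correspond to what graphics literature commonly calls blend shapes), the following Python fragment shows one way the claimed steps could be wired together. The model objects (`face_3d_model`, `coeff_mapper`), the correspondence table, and the array shapes are assumptions made for illustration, not part of the claimed subject matter.

```python
import numpy as np

def drive_avatar_expression(face_image, face_3d_model, coeff_mapper,
                            correspondence, second_blendshapes):
    """Sketch of the claim-1 pipeline; all interfaces are assumed."""
    # Step 1: face image -> first face state vector.
    state_vec = face_3d_model(face_image)

    # Step 2: state vector -> coefficients of the first mixed deformations.
    first_coeffs = coeff_mapper(state_vec)          # shape: (num_first,)

    # Step 3: map the coefficients onto the avatar's second mixed
    # deformations via the correspondence relation (one first deformation
    # may be associated with several second deformations).
    second_coeffs = np.zeros(len(second_blendshapes))
    for i, associated in correspondence.items():    # dict: int -> list[int]
        for j in associated:
            second_coeffs[j] = first_coeffs[i]

    # Step 4: a weighted combination of the second mixed deformations
    # drives the avatar's expression (see also claim 9).
    return sum(c * bs for c, bs in zip(second_coeffs, second_blendshapes))
```

Here the correspondence is deliberately one-to-many, mirroring the claim language in which a plurality of second mixed deformations may be associated with each first mixed deformation.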
2. The method of claim 1, wherein the determining, in the plurality of second mixed deformations, a second mixed deformation associated with each first mixed deformation according to the semantics of the plurality of first mixed deformations and the semantics of the plurality of second mixed deformations comprises:
determining semantic matching degrees between each first mixed deformation and the plurality of second mixed deformations according to the semantics of the plurality of first mixed deformations and the semantics of the plurality of second mixed deformations; and
determining the second mixed deformation associated with each first mixed deformation according to the plurality of semantic matching degrees corresponding to each first mixed deformation.
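A minimal sketch of the semantic-matching-degree step of claim 2, assuming the semantics are short text labels; the string-similarity measure below is a stand-in, and an embedding-based similarity would fill the same role.

```python
from difflib import SequenceMatcher

def semantic_match_degree(sem_a: str, sem_b: str) -> float:
    # Assumed measure: normalized string similarity between two labels.
    return SequenceMatcher(None, sem_a.lower(), sem_b.lower()).ratio()

def associate_by_semantics(first_semantics, second_semantics):
    """For each first mixed deformation, pick the second mixed deformation
    with the highest semantic matching degree."""
    association = {}
    for i, s1 in enumerate(first_semantics):
        degrees = [semantic_match_degree(s1, s2) for s2 in second_semantics]
        association[i] = max(range(len(degrees)), key=degrees.__getitem__)
    return association
```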
3. The method of claim 1, wherein the determining, in the plurality of second mixed deformations, a second mixed deformation associated with each first mixed deformation according to the semantics of the plurality of first mixed deformations and the semantics of the plurality of second mixed deformations comprises:
determining a face region associated with each first mixed deformation according to the semantics of each first mixed deformation;
determining a face region associated with each second mixed deformation according to the semantics of each second mixed deformation;
comparing the face region associated with each first mixed deformation with the face regions associated with the plurality of second mixed deformations; and
determining, as the associated second mixed deformation, a second mixed deformation in the plurality of second mixed deformations whose associated face region is the same as the face region associated with each first mixed deformation.
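The region-comparison variant of claim 3 can be sketched as follows; the label-to-region convention in `region_of` is an assumption, since the patent does not prescribe how the semantics encode face regions.

```python
def region_of(semantic_label: str) -> str:
    # Assumed naming convention (e.g. "jawOpen" -> "jaw"), purely illustrative.
    for region in ("jaw", "brow", "eye", "mouth", "cheek", "nose"):
        if semantic_label.lower().startswith(region):
            return region
    return "other"

def associate_by_region(first_semantics, second_semantics):
    """Claim-3 sketch: a second mixed deformation is associated with a
    first one when both affect the same face region."""
    association = {}
    for i, s1 in enumerate(first_semantics):
        association[i] = [j for j, s2 in enumerate(second_semantics)
                          if region_of(s2) == region_of(s1)]
    return association
```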
4. The method of any one of claims 1-3, wherein the determining the coefficient of the associated second mixed deformation according to the coefficient of each first mixed deformation comprises:
acquiring a coefficient mapping rule corresponding to each first mixed deformation; and
determining the coefficient of the associated second mixed deformation according to the coefficient mapping rule and the coefficient of each first mixed deformation.
5. The method of any one of claims 1-3, wherein the determining the coefficient of the associated second mixed deformation according to the coefficient of each first mixed deformation comprises:
acquiring a coefficient adjustment rule of each second mixed deformation;
determining the coefficient of each first mixed deformation as an initial coefficient of the associated second mixed deformation; and
adjusting the initial coefficient of the associated second mixed deformation according to the coefficient adjustment rule of the associated second mixed deformation, to determine the coefficient of the associated second mixed deformation.
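Claims 4 and 5 leave the concrete form of the coefficient mapping and adjustment rules open; a minimal sketch, assuming both rules are simple callables, might look like this.

```python
def map_coefficient(first_coeff, mapping_rule=None, adjust_rule=None):
    """Sketch of claims 4 and 5; the rules' concrete forms are assumptions."""
    # Claim 4: apply a per-deformation coefficient mapping rule, e.g. a
    # linear rescaling tuned to the avatar's geometry.
    coeff = mapping_rule(first_coeff) if mapping_rule else first_coeff
    # Claim 5: treat the mapped value as the initial coefficient, then
    # adjust it per the second deformation's own rule (e.g. clamping).
    if adjust_rule:
        coeff = adjust_rule(coeff)
    return coeff

# Illustrative rules only: rescale by 0.8, then clamp to [0, 1].
mapped = map_coefficient(0.9, lambda c: 0.8 * c,
                         lambda c: min(max(c, 0.0), 1.0))
```

With these example rules, `mapped` evaluates to 0.72: the first coefficient 0.9 is rescaled by the mapping rule and then clamped by the adjustment rule.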
6. The method of claim 1, wherein the coefficient mapping model is trained by:
acquiring a face sample image and labeled coefficients respectively corresponding to the plurality of first mixed deformations for the face sample image;
inputting the face sample image into the three-dimensional face model to obtain a second face state vector corresponding to the face sample image;
inputting the second face state vector into an initial coefficient mapping model to obtain predicted coefficients respectively corresponding to the plurality of first mixed deformations for the face sample image; and
training the initial coefficient mapping model according to the difference between the predicted coefficients and the labeled coefficients, to obtain the coefficient mapping model.
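A minimal training sketch for the coefficient mapping model of claim 6, written with PyTorch. The MLP architecture, MSE loss, and Adam optimizer are assumptions, as are the tensor-valued samples and labels; the claim only requires training on the difference between the predicted and labeled coefficients.

```python
import torch
import torch.nn as nn

class CoeffMapper(nn.Module):
    """Assumed architecture: a small MLP from state vector to coefficients."""
    def __init__(self, state_dim: int, num_blendshapes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, num_blendshapes), nn.Sigmoid(),  # coeffs in [0, 1]
        )

    def forward(self, state_vec):
        return self.net(state_vec)

def train_coeff_mapper(model, face_3d_model, samples, labels, epochs=10):
    """samples: face sample image tensors; labels: labeled coefficients."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for image, coeffs in zip(samples, labels):
            state_vec = face_3d_model(image)   # second face state vector
            pred = model(state_vec)            # predicted coefficients
            loss = loss_fn(pred, coeffs)       # prediction vs. annotation
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```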
7. The method of claim 6, wherein the inputting the face sample image into the three-dimensional face model to obtain a second face state vector corresponding to the face sample image comprises:
normalizing the pixel value of each pixel in the face sample image to obtain a normalized face sample image; and
inputting the normalized face sample image into the three-dimensional face model to obtain the second face state vector.
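The normalization step of claim 7, sketched under the common assumption of scaling 8-bit pixel values into [0, 1]; the patent does not fix the exact normalization scheme.

```python
import numpy as np

def normalize_face_image(image: np.ndarray) -> np.ndarray:
    # Assumed scheme: scale each 8-bit pixel value into [0, 1]; per-channel
    # statistics or a [-1, 1] range would be equally valid choices.
    return image.astype(np.float32) / 255.0
```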
8. The method of claim 6, wherein the acquiring a face sample image comprises:
acquiring a plurality of sample videos, and performing frame extraction on each sample video to obtain multiple frames of images of each sample video;
performing face key point detection on each frame of image in the multiple frames of images to obtain face key points in each frame of image;
cropping each frame of image according to the face key points in each frame of image to obtain a face region image in each frame of image; and
adjusting the face region image to a preset size to obtain the face sample image.
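The sample-collection steps of claim 8 can be sketched with OpenCV for frame extraction and resizing; `detect_face_keypoints` is a hypothetical landmark detector returning an (N, 2) array of (x, y) points, and the frame step and output size are assumed parameters.

```python
import cv2

def collect_face_samples(video_path, detect_face_keypoints, size=256, step=5):
    """Claim-8 sketch: frame extraction, keypoint-guided cropping, resizing."""
    samples = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:                    # frame extraction
            pts = detect_face_keypoints(frame)   # face key point detection
            if pts is not None and len(pts):
                # Crop the face region from the keypoints' bounding box.
                x0, y0 = pts.min(axis=0).astype(int)
                x1, y1 = pts.max(axis=0).astype(int)
                face = frame[max(y0, 0):y1, max(x0, 0):x1]
                if face.size:
                    # Adjust the face region image to the preset size.
                    samples.append(cv2.resize(face, (size, size)))
        index += 1
    cap.release()
    return samples
```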
9. The method of claim 1, wherein the driving the expression of the virtual character according to the coefficients respectively corresponding to the plurality of second mixed deformations comprises:
performing a weighted combination of the plurality of second mixed deformations according to the coefficients respectively corresponding to the plurality of second mixed deformations, so as to drive the expression of the virtual character.
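The weighted combination of claim 9 is, under a standard delta-blendshape assumption, a single linear operation; the delta formulation below is illustrative, as the claim requires only a weighted combination.

```python
import numpy as np

def blend(neutral, deltas, coeffs):
    """neutral: rest-pose vertices (V, 3); deltas: per-deformation vertex
    offsets (K, V, 3); coeffs: weights (K,). Sums the weighted offsets
    over the K second mixed deformations onto the neutral face."""
    return neutral + np.tensordot(coeffs, deltas, axes=1)
```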
10. An apparatus for driving the expression of a virtual character, comprising:
a first obtaining module, configured to acquire a face image and input the face image into a three-dimensional face model to obtain a first face state vector corresponding to the face image;
a second obtaining module, configured to input the first face state vector into a coefficient mapping model to obtain coefficients respectively corresponding to a plurality of first mixed deformations corresponding to the face image, wherein the first mixed deformations are three-dimensional models for forming facial expressions, and a facial expression is represented by a weighted combination of the plurality of first mixed deformations and the coefficients;
a determining module, configured to determine coefficients respectively corresponding to a plurality of second mixed deformations of the virtual character according to the coefficients respectively corresponding to the plurality of first mixed deformations, wherein the second mixed deformations are three-dimensional models for forming the expression of the virtual character; and
a driving module, configured to drive the expression of the virtual character according to the coefficients respectively corresponding to the plurality of second mixed deformations;
wherein the determining module is configured to:
acquire a correspondence between the plurality of first mixed deformations and the plurality of second mixed deformations;
determine, in the plurality of second mixed deformations and according to the correspondence, a plurality of second mixed deformations associated with each first mixed deformation; and determine the coefficients of the associated plurality of second mixed deformations according to the coefficient of each first mixed deformation;
and is further configured to:
determine, in the plurality of second mixed deformations, a second mixed deformation associated with each first mixed deformation according to the semantics of the plurality of first mixed deformations and the semantics of the plurality of second mixed deformations; and
determine the coefficient of the associated second mixed deformation according to the coefficient of each first mixed deformation.
11. The apparatus of claim 10, wherein the determining module is configured to:
determine semantic matching degrees between each first mixed deformation and the plurality of second mixed deformations according to the semantics of the plurality of first mixed deformations and the semantics of the plurality of second mixed deformations; and
determine the second mixed deformation associated with each first mixed deformation according to the plurality of semantic matching degrees corresponding to each first mixed deformation.
12. The apparatus of claim 10, wherein the determining module is configured to:
determine a face region associated with each first mixed deformation according to the semantics of each first mixed deformation;
determine a face region associated with each second mixed deformation according to the semantics of each second mixed deformation;
compare the face region associated with each first mixed deformation with the face regions associated with the plurality of second mixed deformations; and
determine, as the associated second mixed deformation, a second mixed deformation in the plurality of second mixed deformations whose associated face region is the same as the face region associated with each first mixed deformation.
13. The apparatus of any one of claims 10-12, wherein the determining module is configured to:
acquire a coefficient mapping rule corresponding to each first mixed deformation; and
determine the coefficient of the associated second mixed deformation according to the coefficient mapping rule and the coefficient of each first mixed deformation.
14. The apparatus of any one of claims 10-12, wherein the determining module is configured to:
acquire a coefficient adjustment rule of each second mixed deformation;
determine the coefficient of each first mixed deformation as an initial coefficient of the associated second mixed deformation; and
adjust the initial coefficient of the associated second mixed deformation according to the coefficient adjustment rule of the associated second mixed deformation, to determine the coefficient of the associated second mixed deformation.
15. The apparatus of claim 10, further comprising:
a third obtaining module, configured to acquire a face sample image and labeled coefficients respectively corresponding to the plurality of first mixed deformations for the face sample image;
a fourth obtaining module, configured to input the face sample image into the three-dimensional face model to obtain a second face state vector corresponding to the face sample image;
a fifth obtaining module, configured to input the second face state vector into an initial coefficient mapping model to obtain predicted coefficients respectively corresponding to the plurality of first mixed deformations for the face sample image; and
a training module, configured to train the initial coefficient mapping model according to the difference between the predicted coefficients and the labeled coefficients, to obtain the coefficient mapping model.
16. The apparatus of claim 15, wherein the fourth obtaining module is configured to:
normalize the pixel value of each pixel in the face sample image to obtain a normalized face sample image; and
input the normalized face sample image into the three-dimensional face model to obtain the second face state vector.
17. The apparatus of claim 15, wherein the third obtaining module is configured to:
acquire a plurality of sample videos, and perform frame extraction on each sample video to obtain multiple frames of images of each sample video;
perform face key point detection on each frame of image in the multiple frames of images to obtain face key points in each frame of image;
crop each frame of image according to the face key points in each frame of image to obtain a face region image in each frame of image; and
adjust the face region image to a preset size to obtain the face sample image.
18. The apparatus of claim 10, wherein the driving module is configured to:
perform a weighted combination of the plurality of second mixed deformations according to the coefficients respectively corresponding to the plurality of second mixed deformations, so as to drive the expression of the virtual character.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.
CN202211338132.6A 2022-10-28 2022-10-28 Method and device for driving virtual character expression, electronic equipment and storage medium Active CN115393488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211338132.6A CN115393488B (en) 2022-10-28 2022-10-28 Method and device for driving virtual character expression, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211338132.6A CN115393488B (en) 2022-10-28 2022-10-28 Method and device for driving virtual character expression, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115393488A CN115393488A (en) 2022-11-25
CN115393488B 2023-03-03

Family

ID=84114928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211338132.6A Active CN115393488B (en) 2022-10-28 2022-10-28 Method and device for driving virtual character expression, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115393488B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188640B (en) * 2022-12-09 2023-09-08 北京百度网讯科技有限公司 Three-dimensional virtual image generation method, device, equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109727303A (en) * 2018-12-29 2019-05-07 广州华多网络科技有限公司 Video display method, system, computer equipment, storage medium and terminal
CN113066156A (en) * 2021-04-16 2021-07-02 广州虎牙科技有限公司 Expression redirection method, device, equipment and medium
CN114821734A (en) * 2022-05-13 2022-07-29 北京沃东天骏信息技术有限公司 Method and device for driving expression of virtual character

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101354795A (en) * 2008-08-28 2009-01-28 北京中星微电子有限公司 Method and system for driving three-dimensional human face cartoon based on video
CN110517339B (en) * 2019-08-30 2021-05-25 腾讯科技(深圳)有限公司 Animation image driving method and device based on artificial intelligence
CN111144266B (en) * 2019-12-20 2022-11-22 北京达佳互联信息技术有限公司 Facial expression recognition method and device
CN112927328B (en) * 2020-12-28 2023-09-01 北京百度网讯科技有限公司 Expression migration method and device, electronic equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109727303A (en) * 2018-12-29 2019-05-07 广州华多网络科技有限公司 Video display method, system, computer equipment, storage medium and terminal
CN113066156A (en) * 2021-04-16 2021-07-02 广州虎牙科技有限公司 Expression redirection method, device, equipment and medium
CN114821734A (en) * 2022-05-13 2022-07-29 北京沃东天骏信息技术有限公司 Method and device for driving expression of virtual character

Also Published As

Publication number Publication date
CN115393488A (en) 2022-11-25

Similar Documents

Publication Publication Date Title
US11436863B2 (en) Method and apparatus for outputting data
CN113643412A (en) Virtual image generation method and device, electronic equipment and storage medium
CN113963110B (en) Texture map generation method and device, electronic equipment and storage medium
CN113657289B (en) Training method and device of threshold estimation model and electronic equipment
CN112967315B (en) Target tracking method and device and electronic equipment
CN114723888B (en) Three-dimensional hair model generation method, device, equipment, storage medium and product
CN113591566A (en) Training method and device of image recognition model, electronic equipment and storage medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN115393488B (en) Method and device for driving virtual character expression, electronic equipment and storage medium
CN113365146A (en) Method, apparatus, device, medium and product for processing video
CN115170703A (en) Virtual image driving method, device, electronic equipment and storage medium
CN113223125B (en) Face driving method, device, equipment and medium for virtual image
CN114120413A (en) Model training method, image synthesis method, device, equipment and program product
CN113177466A (en) Identity recognition method and device based on face image, electronic equipment and medium
CN116402914B (en) Method, device and product for determining stylized image generation model
EP4123605A2 (en) Method of transferring image, and method and apparatus of training image transfer model
CN115359166B (en) Image generation method and device, electronic equipment and medium
CN111862030A (en) Face synthetic image detection method and device, electronic equipment and storage medium
EP4152269A1 (en) Method and apparatus of generating 3d video, method and apparatus of training model, device, and medium
CN112560848B (en) Training method and device for POI (Point of interest) pre-training model and electronic equipment
CN113781653B (en) Object model generation method and device, electronic equipment and storage medium
CN113327311B (en) Virtual character-based display method, device, equipment and storage medium
CN113903071A (en) Face recognition method and device, electronic equipment and storage medium
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN113240780A (en) Method and device for generating animation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant