CN116246014B - Image generation method and device, storage medium and electronic equipment

Image generation method and device, storage medium and electronic equipment

Info

Publication number: CN116246014B
Application number: CN202211693006.2A
Authority: CN (China)
Prior art keywords: basis function, image, domain, vector, dimensional
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN116246014A
Inventor: 曹佳炯
Assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd; priority to CN202211693006.2A

Classifications

    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T 15/04: 3D [Three Dimensional] image rendering; texture mapping
    • G06T 15/10: 3D [Three Dimensional] image rendering; geometric effects
    • G06V 10/40: Extraction of image or video features
    • G06V 10/765: Recognition using classification rules for partitioning the feature space
    • G06V 10/82: Recognition or understanding using neural networks
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Abstract

The specification discloses an image generation method, an image generation device, a storage medium and electronic equipment. In the method, a service platform acquires a first basis function vector for an object face reconstruction model in a source domain and sends the first basis function vector to a terminal device. The terminal device acquires a target domain two-dimensional image in a target domain, obtains a second basis function vector based on the first basis function vector and the target domain two-dimensional image, controls the object face reconstruction model to perform virtual object image driving based on the second basis function vector to obtain a virtual object three-dimensional image, and performs image quality enhancement processing based on the virtual object three-dimensional image to obtain a target virtual object image.

Description

Image generation method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image generating method, an image generating device, a storage medium, and an electronic device.
Background
With the rapid development of computer technology, virtual scenes such as the metaverse and virtual reality have become increasingly widely applied in recent years. Related virtual scene applications such as the metaverse are currently in a stage of rapid development, and most research and application focus on the generation of "virtual object images"; generating vivid, high-quality virtual object images is therefore particularly critical.
Disclosure of Invention
The specification provides an image generation method, an image generation device, a storage medium and electronic equipment, wherein the technical scheme is as follows:
in a first aspect, the present specification provides an image generation method, applied to a service platform, the method including:
Acquiring a first basis function vector for an object face reconstruction model in a source domain;
And transmitting the first basis function vector to at least one terminal device so that the terminal device obtains at least one target domain two-dimensional image in a target domain, obtains a second basis function vector based on the first basis function vector and the target domain two-dimensional image, controls the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a virtual object three-dimensional image, and carries out image quality enhancement processing based on the virtual object three-dimensional image to obtain a target virtual object image.
In a second aspect, the present specification provides an image generation method, applied to a terminal device, the method including:
acquiring a first basis function vector from a service platform, wherein the first basis function vector is a basis function vector of an object facial reconstruction model acquired from a source domain by the service platform;
Acquiring at least one target domain two-dimensional image in a target domain, and carrying out self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
and controlling the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and carrying out image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
In a third aspect, the present specification provides an image generation apparatus, the apparatus comprising:
The data acquisition module is used for acquiring a first basis function vector aiming at the object facial reconstruction model in the source domain;
The data transmission module is used for transmitting the first basis function vector to at least one terminal device so that the terminal device can acquire at least one target domain two-dimensional image in a target domain, carry out self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector, control the object face reconstruction model to carry out virtual object image driving based on the second basis function vector to obtain a virtual object three-dimensional image, and carry out image quality enhancement processing based on the virtual object three-dimensional image to obtain a target virtual object image.
In a fourth aspect, the present specification provides an image generation apparatus, the apparatus comprising:
the data acquisition module is used for acquiring a first basis function vector from the service platform;
the object driving module is used for acquiring at least one target domain two-dimensional image in a target domain, and carrying out self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
and controlling the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and carrying out image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
In a fifth aspect, the present description provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.
In a sixth aspect, the present description provides an electronic device, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
In a seventh aspect, the present description provides a computer program product storing at least one instruction for loading by a processor and performing the above-described method steps.
The technical scheme provided by some embodiments of the present specification has the following beneficial effects:
In one or more embodiments of the present disclosure, a service platform may obtain a first basis function vector for an object face reconstruction model in a source domain and send the first basis function vector to a terminal device. The terminal device may obtain a target domain two-dimensional image in a target domain and obtain a second basis function vector adapted to the target domain based on the first basis function vector and the target domain two-dimensional image, so as to control the object face reconstruction model to perform virtual object image driving based on the second basis function vector to obtain a virtual object three-dimensional image, and perform image quality enhancement processing based on the virtual object three-dimensional image to obtain a target virtual object image.
Drawings
In order to more clearly illustrate the technical solutions of the present specification or the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present specification, and that other drawings may be obtained from these drawings without inventive effort by a person skilled in the art.
FIG. 1 is a schematic illustration of a scenario of an object processing system provided herein;
FIG. 2 is a flow chart of an image generation method provided in the present specification;
FIG. 3 is a flow chart of an exemplary first basis function vector retrieving process provided herein;
FIG. 4 is a flow chart of another embodiment of an avatar generation method provided herein;
FIG. 5 is a flow chart of an image generation method provided in the present specification;
FIG. 6 is a flow chart of a second basis function vector retrieving process provided in the present specification;
fig. 7 is a schematic structural view of an image generating apparatus provided in the present specification;
fig. 8 is a schematic structural view of an image generating apparatus provided in the present specification;
fig. 9 is a schematic structural view of an electronic device provided in the present specification;
Fig. 10 is a schematic structural view of an electronic device provided in the present specification;
FIG. 11 is a schematic diagram of the architecture of the operating system and user space provided herein;
FIG. 12 is an architecture diagram of the android operating system of FIG. 11;
FIG. 13 is an architecture diagram of the IOS operating system of FIG. 11.
Detailed Description
The technical solutions in the embodiments of the present specification are described clearly and completely below with reference to the drawings of the present specification; it is apparent that the described embodiments are only some, rather than all, of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without inventive effort shall fall within the scope of protection of the present specification.
In the description of the present specification, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless expressly specified and limited otherwise, "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. The specific meaning of the terms in this specification will be understood by those of ordinary skill in the art in light of the specific circumstances. In addition, in the description of the present specification, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
In the related art, a trained object face reconstruction model is directly adapted to the end side (such as the terminal device side) for virtual object image generation, and the object face reconstruction model is usually generated based on several basis function vector sets obtained in the training stage. At present, the training data used by the object face reconstruction model in the training stage can hardly be made consistent with the data characteristics of the end-side application scene, and can hardly cover the actual end-side application scene; the fitting capability of the object face reconstruction model for its output image is therefore reduced in the end-side application scene, so that the quality of the generated virtual object image is poor.
The present specification is described in detail below with reference to specific examples.
Referring to fig. 1, a schematic view of a scenario of an object processing system provided in the present specification is provided. As shown in fig. 1, the object processing system may include at least a client cluster and a service platform 100.
The client cluster may include at least one client, as shown in fig. 1, specifically including a client 1 corresponding to a user 1, a client 2 corresponding to a user 2, …, and a client n corresponding to a user n, where n is an integer greater than 0.
Each client in the client cluster may be a communication-enabled electronic device, including but not limited to: wearable devices, handheld devices, personal computers, tablet computers, vehicle-mounted devices, smart phones, computing devices, or other processing devices connected to a wireless modem. Electronic devices in different networks may be called different names, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote terminal, mobile device, user terminal, wireless communication device, user agent or user apparatus, cellular telephone, cordless telephone, personal digital assistant (PDA), or an electronic device in a 5G network or a future evolution network, etc.
The service platform 100 may be a separate server device, such as a rack-mounted, blade, tower or cabinet server device, or hardware with strong computing capability such as a workstation or mainframe computer; it may also be a server cluster formed by a plurality of servers. Each server in the server cluster may be deployed in a symmetrical manner, wherein each server is functionally equivalent in a transaction link and may independently provide services to the outside, independent provision being understood as requiring no assistance from another server.
The application scenarios of the image generation method of one or more embodiments of the present disclosure include, but are not limited to, one or a combination of a metaverse scene, an augmented reality scene, a virtual reality scene, a visual special effects scene, etc. In the foregoing scenarios, an object face reconstruction model for three-dimensional face image generation may face a mismatch between the source domain of the training stage and the target domain of the application stage. The image generation method can implement cross-domain adaptation through the service platform and the terminal device (the device corresponding to the client) under the target domain, and accurately assist in generating a high-quality three-dimensional face image. After an accurate second basis function vector is obtained, the terminal device or the service platform may control the object face reconstruction model to perform virtual object image driving based on the second basis function vector and the target domain two-dimensional image acquired by the terminal device for the entity object in the target domain, to obtain a virtual object three-dimensional image, and perform image quality enhancement on the virtual object three-dimensional image to obtain the target virtual object image.
In one or more embodiments of the present disclosure, the service platform 100 may establish a communication connection with at least one client in the client cluster, and complete data interaction during the object processing based on the communication connection, for example, the service platform 100 may issue data to the client based on a first basis function vector for the object face reconstruction model obtained by the image generating method of the present disclosure; as another example, the service platform 100 may obtain a target domain two-dimensional image from a target domain corresponding to the client, and so on.
It should be noted that, the service platform 100 establishes a communication connection with at least one client in the client cluster through a network for interactive communication, where the network may be a wireless network, or may be a wired network, where the wireless network includes, but is not limited to, a cellular network, a wireless local area network, an infrared network, or a bluetooth network, and the wired network includes, but is not limited to, an ethernet network, a universal serial bus (universal serial bus, USB), or a controller area network. In one or more embodiments of the specification, techniques and/or formats including HyperText Mark-up Language (HTML), extensible markup Language (Extensible Markup Language, XML), and the like are used to represent data exchanged over a network (e.g., a target compression package). All or some of the links may also be encrypted using conventional encryption techniques such as secure sockets layer (Secure Socket Layer, SSL), transport layer security (Transport Layer Security, TLS), virtual private network (Virtual Private Network, VPN), internet protocol security (Internet Protocol Security, IPsec), etc. In other embodiments, custom and/or dedicated data communication techniques may also be used in place of or in addition to the data communication techniques described above.
The embodiments of the object processing system provided in the present specification and the image generating method in one or more embodiments belong to the same concept, and an execution subject corresponding to the image generating method in one or more embodiments in the present specification may be the service platform 100 described above; the execution subject corresponding to the image generation method in one or more embodiments of the specification may also be a terminal device corresponding to the client, which is specifically determined based on an actual application environment. The implementation process of the object processing system embodiment may be described in detail in the following method embodiment, which is not described herein.
Based on the scene diagram shown in fig. 1, the image generation method provided in one or more embodiments of the present specification will be described in detail.
Referring to fig. 2, a flow diagram of an image generation method is provided for one or more embodiments of the present description; the method may be implemented in dependence on a computer program and may run on an image generation device based on the von Neumann architecture. The computer program may be integrated in an application or may run as a stand-alone tool-class application. The image generating apparatus may be a service platform.
Specifically, the image generation method comprises the following steps:
s102: acquiring a first basis function vector of a reconstruction model of a face of a subject in a source domain;
The object face reconstruction model may be a model for three-dimensional face reconstruction in the related art, and through the object face reconstruction model, a three-dimensional avatar under the virtual world may be output based on a two-dimensional image of the object face in the real physical world as a model input. For example, the object face reconstruction model may be a 3DMM model in the related art.
In the related art, an object face reconstruction model such as a 3DMM combines expert prior knowledge to define the three-dimensional face space as a combination of one or more basis functions, such as a face shape basis function, a facial expression basis function, a texture basis function, and the like; by adjusting the weights of the respective basis functions, the object face reconstruction model can output different three-dimensional virtual face images. Further, the foregoing basis functions are typically characterized in the form of basis function vectors; for example, the 3DMM model corresponds to at least a face shape basis function and a facial expression basis function, each characterized by a series of principal component vectors (e.g., 10 vectors of 128 dimensions), which is why they are commonly referred to as basis function vectors.
In practical applications, the basis function vectors of the object face reconstruction model are critical to the whole object processing process: with a two-dimensional face image as the input of the object face reconstruction model, the model can characterize different three-dimensional virtual faces by changing the weight parameters applied to the basis function vectors.
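As a minimal sketch of this weighted-basis formulation: the arrays below stand in for a trained model's mean face and principal-component bases, and all dimensions (5,000 vertices, 10 shape and 10 expression basis vectors) are assumptions for illustration, not values taken from this specification.

```python
import numpy as np

# Assumed toy dimensions: a mesh of 5,000 vertices (x, y, z flattened),
# 10 shape basis vectors and 10 expression basis vectors, 3DMM-style.
NUM_COORDS = 5000 * 3
mean_face = np.zeros(NUM_COORDS)               # mean face geometry
shape_basis = np.random.randn(10, NUM_COORDS)  # basis function vectors (shape)
expr_basis = np.random.randn(10, NUM_COORDS)   # basis function vectors (expression)

def reconstruct_face(shape_weights: np.ndarray, expr_weights: np.ndarray) -> np.ndarray:
    """Weighted combination of basis function vectors -> 3D face geometry."""
    return mean_face + shape_weights @ shape_basis + expr_weights @ expr_basis

# Driving the model amounts to regressing these weights from a 2D image
# and evaluating the combination:
face_geometry = reconstruct_face(np.random.randn(10) * 0.1, np.random.randn(10) * 0.1)
print(face_geometry.shape)  # (15000,)
```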
In one or more embodiments of the present specification, the object face reconstruction model may be understood as a model that has been trained using sample data of a source domain, and a basis function vector corresponding to the object face reconstruction model that has been trained is a first basis function vector.
The source domain refers to the domain corresponding to the training data set used by the object face reconstruction model. The object face reconstruction model is usually a trained model, such as a 3DMM model obtained directly from the related art; the source domain two-dimensional images in its corresponding training set and the reference basis function vectors it uses can be obtained directly, and the obtained reference basis function vectors can serve as the first basis function vector. In some embodiments, considering that most source training data sets deviate in terms of object type, years, and the like, further optimization adjustment may be performed on the obtained reference basis function vectors to obtain the first basis function vector.
The first basis function vector may be understood as a basis function vector of the object facial reconstruction model in a source domain basis vector space.
In the practical application stage, the object face reconstruction model is directly adapted to the end side (such as the terminal device side), and the domain, or even the scene, corresponding to the end side can differ greatly from the source domain. For example, the object face reconstruction model may use data from domain A in the training stage while the end side corresponds to domain B in the practical application stage; the fitting capability in the new end-side target domain is then reduced, and because the expression capability of the first basis function vector in the target domain is limited, the output virtual object image may be under-fitted. Executing the image generation method of one or more embodiments can improve this situation.
S104: and transmitting the first basis function vector to at least one terminal device so that the terminal device obtains at least one target domain two-dimensional image in a target domain, obtains a second basis function vector based on the first basis function vector and the target domain two-dimensional image, controls the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a virtual object three-dimensional image, and carries out image quality enhancement processing based on the virtual object three-dimensional image to obtain a target virtual object image.
The second basis function vector is a basis function vector of the object facial reconstruction model in a target domain basis vector space.
The virtual object three-dimensional image is a three-dimensional face image which is virtual based on the second basis function vector through the object face reconstruction model, and the virtual object three-dimensional image can be generally characterized in the forms of point cloud data, three-dimensional image data and the like.
The target virtual object image is a three-dimensional object face image obtained by enhancing the image quality of the virtual object three-dimensional image, and the quality of the virtual digital face can be improved.
It can be appreciated that the service platform generally interacts with the terminal devices where multiple clients are located, and each terminal device is located in a different domain, i.e., a target domain. The service platform can send the first basis function vector to at least one terminal device. After receiving the first basis function vector, the terminal device can collect at least one target domain two-dimensional image of the user object in the current target domain, where the target domain two-dimensional image is a two-dimensional image collected by the terminal device that at least contains the face of the user object, such as an RGB face image. A second basis function vector can then be obtained based on the first basis function vector and the target domain two-dimensional image; the object face reconstruction model (which can be obtained in advance from the service platform) can then be controlled to perform virtual object image driving based on the second basis function vector to obtain a virtual object three-dimensional image, and image quality enhancement processing is performed based on the virtual object three-dimensional image to obtain the target virtual object image.
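The end-side flow just described can be summarized in a short orchestration sketch; every function below (capture_face_image, adapt_basis, drive_reconstruction, enhance_quality) is a hypothetical placeholder for the corresponding step, not an API defined by this specification.

```python
import numpy as np

# Hypothetical placeholders for the four end-side steps.
def capture_face_image() -> np.ndarray:
    """Acquire a target-domain 2D RGB face image from the device camera."""
    return np.zeros((256, 256, 3), dtype=np.uint8)

def adapt_basis(first_basis: np.ndarray, image: np.ndarray) -> np.ndarray:
    """Adaptive cross-domain adjustment: first basis vector -> second basis vector."""
    return first_basis  # stand-in; the real step runs the cross-domain model

def drive_reconstruction(second_basis: np.ndarray, image: np.ndarray) -> np.ndarray:
    """Control the face reconstruction model to output a virtual object 3D image."""
    return np.zeros((5000, 3))  # stand-in point cloud

def enhance_quality(mesh: np.ndarray) -> np.ndarray:
    """Image quality enhancement on the 3D image -> target virtual object image."""
    return mesh

first_basis = np.random.randn(10, 128)   # received from the service platform
image = capture_face_image()             # collected in the target domain
second_basis = adapt_basis(first_basis, image)
target_image = enhance_quality(drive_reconstruction(second_basis, image))
```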
In one or more embodiments of the present disclosure, the image quality of the target virtual object image is greatly improved, so that the object face reconstruction model under the source domain can avoid the situations of reduced fitting capability and reduced model processing capability after being applied to the target domain at the end side, and the image generation mode is closer to the actual application scene, and has good cross-domain robustness.
Illustratively, in one or more embodiments of the present disclosure, fig. 3 is a flow diagram of an exemplary first basis function vector retrieving process. Optionally, based on the one or more embodiments, acquiring the first basis function vector for the object face reconstruction model in the source domain may include the following:
S202: acquiring a source domain two-dimensional image in a source domain, and acquiring a reference basis function vector corresponding to an object face reconstruction model determined based on the source domain two-dimensional image;
The source domain two-dimensional image may be understood as a training image in a source domain dataset used by the object face reconstruction model in a training phase.
The reference basis function vector is the basis function vector corresponding to the object face reconstruction model obtained after model training is performed on the initial object face reconstruction model using the source domain two-dimensional images in the source domain data set. For example, the 3DMM model corresponds to at least a face shape basis function vector and a facial expression basis function vector, each characterized by a series of principal component vectors (for example, 10 vectors of 128 dimensions), which is why the basis function is also commonly called a basis function vector.
Alternatively, a source domain two-dimensional image of the object face reconstruction model in the source training dataset of the training phase and a reference basis function vector may be obtained from the public network.
In one or more embodiments of the present disclosure, it is considered that most source training data sets, that is, the plurality of source domain two-dimensional images, deviate in terms of object type, years, and the like. Based on this, further optimization adjustment is performed on the obtained reference basis function vector in combination with the pose angle estimation of the source domain two-dimensional images, so as to obtain the first basis function vector.
S204: and determining a pose angle corresponding to the source domain two-dimensional image, and performing basis vector adjustment on the reference basis function vector based on the pose angle to obtain a first basis function vector for the object face reconstruction model.
The pose angles include, but are not limited to, forms characterized in terms of the yaw angle (yaw), the pitch angle (pitch), and the roll angle (roll).
It can be understood that facial pose angle recognition can be performed on the user object in the source domain two-dimensional image to obtain the pose angle of the source domain two-dimensional image; a related pose angle recognition method for user objects can be adopted, with the source domain two-dimensional image as a reference, to recognize the pose angle in the source domain two-dimensional image.
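As one hedged illustration of how such a pose estimate can be used for correction (this specification does not fix the mechanism), a face observed with pitch α, yaw β and roll γ can be rotated back to a preset frontal pose by inverting the composed rotation:

R(α, β, γ) = R_z(γ) · R_y(β) · R_x(α),    p_corrected = R(α, β, γ)^(-1) · p_observed

where R_x, R_y, R_z are the elementary rotation matrices about the three axes and p denotes a 3D facial point.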
Optionally, a source domain basis function generation model based on a machine learning model may be trained in advance. The source domain basis function generation model takes a source domain two-dimensional image and a reference basis function vector as model inputs and outputs a first basis function vector for the object face reconstruction model, the first basis function vector being obtained by performing two-stage optimization on the reference basis function vector through the pose angle of the source domain two-dimensional image.
Alternatively, the source domain basis function generation model may be implemented based on one or a combination of machine learning networks including, but not limited to, a convolutional neural network (Convolutional Neural Network, CNN) model, a deep neural network (Deep Neural Network, DNN) model, a recurrent neural network (Recurrent Neural Network, RNN) model, an embedding model, a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) model, a logistic regression (Logistic Regression, LR) model, and the like.
Illustratively, the source domain two-dimensional image and the reference basis function vector can be input into the source domain basis function generation model. The model determines the pose angle corresponding to the source domain two-dimensional image, performs pose correction processing on the source domain two-dimensional image based on the pose angle to obtain a source domain two-dimensional corrected image, extracts the source domain object facial features of the source domain two-dimensional corrected image, performs basis vector adjustment based on the source domain object facial features and the reference basis function vector to obtain the first basis function vector, and outputs the first basis function vector.
Optionally, the source domain basis function generation model may include several network parts, such as an object pose estimation network, a first object feature extraction network, and a basis function generation network. The object pose estimation network is used to determine the pose angle corresponding to the source domain two-dimensional image; optionally, it may perform pose correction on the object in the source domain two-dimensional image based on the pose angle, generally correcting the object to a preset angle, to obtain a source domain (sample) corrected image. The first object feature extraction network is used to perform object (face) feature extraction on the corrected image, and the basis function generation network is used to perform basis vector adjustment on the reference basis function vector.
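A minimal PyTorch sketch of this three-network forward pass follows; the names E, I and D match the letters used later in this description, while all layer shapes and the correct_pose placeholder are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PoseEstimationNet(nn.Module):      # "E": image -> (pitch, yaw, roll)
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 3))

    def forward(self, img):
        return self.net(img)

class IdentityFeatureNet(nn.Module):     # "I": corrected image -> identity features
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))

    def forward(self, img):
        return self.net(img)

class BasisGenerationNet(nn.Module):     # "D": (features, old basis) -> new basis
    def __init__(self, feat_dim=256, n_basis=10, basis_dim=128):
        super().__init__()
        self.n_basis, self.basis_dim = n_basis, basis_dim
        self.fc = nn.Sequential(
            nn.Linear(feat_dim + n_basis * basis_dim, 512), nn.ReLU(),
            nn.Linear(512, n_basis * basis_dim))

    def forward(self, feat, old_basis):
        x = torch.cat([feat, old_basis.flatten(1)], dim=1)
        return self.fc(x).view(-1, self.n_basis, self.basis_dim)

def correct_pose(img, pose):
    # Placeholder: the real model would warp the face to a preset frontal angle.
    return img

E, I, D = PoseEstimationNet(), IdentityFeatureNet(), BasisGenerationNet()
img = torch.randn(1, 3, 128, 128)        # source-domain 2D image
old_basis = torch.randn(1, 10, 128)      # reference basis function vectors
corrected = correct_pose(img, E(img))    # pose estimate -> corrected image
new_basis = D(I(corrected), old_basis)   # first basis function vector
print(new_basis.shape)                   # torch.Size([1, 10, 128])
```

In training, as described next, E and I would be kept frozen and only D's parameters adjusted.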
Illustratively, in the training phase of the source domain basis function generation model: the object pose estimation network and the first object feature extraction network may be pre-trained networks, that is, machine learning networks that have already been trained; in the training stage of the source domain basis function generation model, network model parameters of the object pose estimation network and the first object feature extraction network may not be adjusted, that is, the network states of these two networks remain unchanged during the training stage.
Illustratively, in the training phase of the source domain basis function generation model: training mainly combines the training data with the object pose estimation network and the first object feature extraction network to assist in training the basis function generation network.
In one possible implementation, the training process of the source domain basis function generation model is as follows:
A2: the execution main body can acquire a source domain two-dimensional sample image under a source domain and acquire a reference basis function sample vector corresponding to an object face reconstruction model determined based on the source domain two-dimensional sample image, and model training is carried out on an initial source domain basis function generation model by adopting the source domain two-dimensional sample image and the reference basis function sample vector, wherein the initial source domain basis function generation model at least comprises an object posture estimation network, a first object feature extraction network and a basis function generation network;
In one or more embodiments of the present disclosure, the source domain two-dimensional sample image and the reference basis function sample vector are used as inputs in the model training stage of the initial source domain basis function generation model. The initial source domain basis function generation model may be obtained based on one or more machine learning networks and includes at least a pre-trained object pose estimation network, a pre-trained first object feature extraction network, and a basis function generation network. The object pose estimation network and the first object feature extraction network may be pre-trained directly in other machine learning tasks or obtained directly from the public network. In the model training process, a back-propagation algorithm is generally adopted, in combination with the model loss, to adjust the model parameters of the basis function generation network only.
A4: in the model training process, an object posture estimation network is adopted to obtain a sample posture angle based on the source domain two-dimensional sample image, posture correction processing is carried out on the source domain two-dimensional sample image based on the sample posture angle to obtain a source domain sample correction image, a first object characteristic extraction network is adopted to extract source domain object sample characteristics based on the source domain sample correction image, and a basis function generation network is adopted to carry out basis vector adjustment based on the source domain object sample characteristics and the reference basis function sample vector to obtain a first basis function sample vector;
Illustratively, the input of the object pose estimation network E is a source domain two-dimensional sample image, for example an RGB face image, and the output is the corresponding sample pose angle, that is, the pitch-yaw-roll pose angle;
Illustratively, the first object feature extraction network may be understood as an object identity feature extractor I; its input is the pose-corrected source domain sample corrected image, and its output is the corresponding source domain object sample features. It will be appreciated that features are re-extracted from the corrected image in order to resist image quality deviations in the source data set.
Illustratively, the inputs of the basis function generation network D are the source domain object sample features and the reference basis function sample vector (which may be the 3DMM basis function group of an open-source data set), and the output is a new first basis function sample vector, such as a 3DMM basis function vector;
Illustratively, the above process is the forward-propagation process of each round in the model training phase.
A6: and calculating a second model loss based on the first basis function sample vector, the source domain two-dimensional sample image, the source domain virtual three-dimensional image tag and the reference basis function sample vector, and performing model parameter adjustment on a basis function generation network in the initial source domain basis function generation model by adopting the second model loss to obtain a source domain basis function generation model.
Illustratively, in each round of back-propagation training in the model training stage, a second model loss can be calculated based on the first basis function sample vector, the source domain two-dimensional sample image, the source domain virtual three-dimensional image label, and the reference basis function sample vector; model parameters of the basis function generation network in the initial source domain basis function generation model are adjusted using the second model loss until the initial source domain basis function generation model meets the training-end condition, so as to obtain the source domain basis function generation model.
In one or more embodiments herein, the model ending training condition may include, for example, a loss function having a value less than or equal to a preset loss function threshold, a number of iterations reaching a preset number of times threshold, and so on. The specific model end training conditions may be determined based on actual conditions and are not specifically limited herein.
In a possible implementation, the calculating the second model loss based on the first basis function sample vector, the source domain two-dimensional sample image, a source domain avatar label, and the reference basis function sample vector may be:
Specifically, in each round of the model training stage of the initial source domain basis function generation model, the source domain sample corrected image is input into the object face reconstruction model, and the object face reconstruction model is controlled to perform virtual object image driving based on the first basis function sample vector, so that a source domain virtual object three-dimensional image is obtained;
It can be understood that, since a new first basis function sample vector has been obtained, the basis function vectors of the object face reconstruction model can be updated at this time; that is, the object face reconstruction model is controlled to perform object image driving based on the first basis function sample vector, and the source domain sample corrected image can be used as input to control the object face reconstruction model to drive the virtual three-dimensional object to generate a face image of the corresponding expression type.
Specifically, a source domain virtual three-dimensional image label corresponding to the source domain two-dimensional sample image is obtained while the source domain virtual object three-dimensional image is obtained;
In each round of the model training stage of the initial source domain basis function generation model, the three-dimensional 3D information corresponding to the source domain two-dimensional sample image is acquired; the three-dimensional 3D information includes a pre-labeled virtual three-dimensional image label, which can be labeled in advance by an expert three-dimensional labeling service.
Specifically, a corrected reconstruction loss Loss_recons is obtained by a first loss calculation formula based on the source domain sample corrected image and the source domain virtual three-dimensional image label; a basis function expression loss Loss_pre is obtained by a second loss calculation formula based on the first basis function sample vector and the reference basis function sample vector; and the second model loss is determined based on the corrected reconstruction loss Loss_recons and the basis function expression loss Loss_pre;
Further, the first loss calculation formula satisfies the following formula:

Loss_recons = || V_3DMM - V_GT ||^2

wherein Loss_recons is the corrected reconstruction loss, V_3DMM is the source domain virtual object three-dimensional image, and V_GT is the source domain virtual three-dimensional image label;
Illustratively, the corrected reconstruction loss corresponding to the first loss calculation formula yields a reconstruction loss that is based on the pose angle, since the reconstruction is driven from the pose-corrected image.
The second loss calculation formula satisfies the following formula:

Loss_pre = || Base_3DMM-OLD - X · Base_3DMM ||^2

wherein Loss_pre is the basis function expression loss, Base_3DMM is the first basis function sample vector, Base_3DMM-OLD is the reference basis function sample vector, and X is a conversion matrix between the first basis function sample vector and the reference basis function sample vector.
Illustratively, the basis function expression loss corresponding to the second loss calculation formula can be used to quantify how well each principal component of the open-source or existing reference basis function sample vector is expressed by the new first basis function sample vector; including this loss in the model loss ensures that the basis vector adjustment does not attenuate the image expression capability of the basis function vectors.
Illustratively, since the first basis function sample vector and the reference basis function sample vector have already been obtained, the conversion matrix X can be solved automatically while the model training optimizes the model parameters; typically, the conversion matrix can be output by a fully connected layer of the initial source domain basis function generation model.
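Under the loss forms written above, the second model loss can be sketched as follows; the squared-error norms, the learnable conversion matrix X (modeled here as the weights a fully connected layer would output), and all dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

n_basis, basis_dim = 10, 128   # assumed basis layout (10 vectors of 128 dims)

# Conversion matrix X, solved jointly with the other parameters during training;
# modeled as a learnable linear map between the new and old basis vectors.
X = nn.Parameter(torch.eye(n_basis))

def second_model_loss(v_3dmm, v_gt, base_new, base_old):
    """Corrected reconstruction loss + basis function expression loss."""
    loss_recons = ((v_3dmm - v_gt) ** 2).mean()          # ||V_3DMM - V_GT||^2
    loss_pre = ((base_old - X @ base_new) ** 2).mean()   # ||Base_old - X*Base_new||^2
    return loss_recons + loss_pre

v_3dmm = torch.randn(5000, 3)                # driven source-domain virtual 3D image
v_gt = torch.randn(5000, 3)                  # pre-labeled virtual 3D image label
base_new = torch.randn(n_basis, basis_dim)   # first basis function sample vector
base_old = torch.randn(n_basis, basis_dim)   # reference basis function sample vector
loss = second_model_loss(v_3dmm, v_gt, base_new, base_old)
loss.backward()   # gradients flow to X (and, in the full model, to network D only)
```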
In one or more embodiments of the present disclosure, the image quality of the target virtual object image is greatly improved, so that the quality of the virtual digital face can be improved, and the situation that the fitting capability is reduced and the model processing capability is reduced after the object face reconstruction model is applied to the end side can be avoided.
Referring to fig. 4, fig. 4 is a flow chart illustrating another embodiment of an avatar generation method according to one or more embodiments of the present disclosure. Specific:
S302: acquiring a first basis function vector for an object face reconstruction model in a source domain;
reference may be made specifically to the method steps of other embodiments of the present disclosure, and details are not repeated here.
S304: acquiring at least one target domain two-dimensional image in a target domain from at least one terminal device;
In one or more embodiments of the present description, the service platform may obtain, from the terminal device, several target domain two-dimensional images of the target domain in which it is located.
For example, the service platform may send a target domain image acquisition request for target domain two-dimensional images to a plurality of terminal devices; in response to the request, each terminal device may acquire target domain two-dimensional images of the user object in the target domain where it is located. The target domain two-dimensional images may be several images from different angles; the terminal devices then send the target domain two-dimensional images to the service platform.
For example, the terminal device actively collects the target domain two-dimensional image of the user object in the target domain, and then the terminal device sends the target domain two-dimensional image to the service platform.
Optionally, the plurality of terminal devices may be devices under different types of target domain scenes, for example a payment target domain scene, a face recognition access-control target domain scene, a three-dimensional game target domain scene, a three-dimensional shopping virtual experience target domain scene, and so on. For terminal devices under different types of target domain scenes, the service platform can perform object processing separately on the target domain two-dimensional images under each type of target domain.
S306: performing self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
reference may be made specifically to the method steps of other embodiments of the present disclosure, where the steps may be performed on a service platform or a terminal device, and are not described herein.
S308: and sending the second basis function vector to the terminal equipment so that the terminal equipment controls the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and performing image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
It can be understood that, after the second basis function vector is regenerated, it can be sent to the corresponding terminal device. After receiving the second basis function vector, the terminal device can update the basis functions of the object face reconstruction model, control the object face reconstruction model to perform virtual object image driving based on the second basis function vector to obtain the virtual object three-dimensional image, and perform image quality enhancement processing based on the virtual object three-dimensional image to obtain the target virtual object image.
In one or more embodiments of the present disclosure, the image quality of the target virtual object image is greatly improved, so that the quality of the virtual digital face can be improved, and the situation that the fitting capability is reduced and the model processing capability is reduced after the object face reconstruction model is applied to the end side can be avoided.
Referring to fig. 5, a flow diagram of an image generation method is provided for one or more embodiments of the present description; the method may be implemented in dependence on a computer program and may run on an image generation device based on the von Neumann architecture. The computer program may be integrated in an application or may run as a stand-alone tool-class application. The image generating apparatus may be a terminal device.
Specifically, the image generation method comprises the following steps:
S402: acquiring a first basis function vector from a service platform, wherein the first basis function vector is a basis function vector of an object facial reconstruction model acquired from a source domain by the service platform;
It will be appreciated that the service platform may obtain a first basis function vector for the object facial reconstruction model in the source domain and then send the first basis function vector to the terminal device.
S404: acquiring at least one target domain two-dimensional image in a target domain, and carrying out self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
The target domain is an application domain scene related to three-dimensional face images; the target domain can be a payment target domain, a face recognition access-control target domain, a three-dimensional game target domain, a three-dimensional shopping virtual experience target domain, and the like.
Illustratively, after the terminal device acquires the first basis function vector, at least one target domain two-dimensional image of the user object under the current target domain may be collected by an image acquisition component (such as a camera), where the target domain two-dimensional image is a two-dimensional image collected by the terminal device that at least contains the face of the user object, such as an RGB face image.
Schematically, the target domain two-dimensional image is usually an image from the actual application stage and does not carry three-dimensional 3D information. The terminal device can use the unlabeled target domain two-dimensional image to perform cross-domain adaptive adaptation of the basis functions on the basis of the new first basis function vector, that is, perform adaptive cross-domain adjustment on the first basis function vector from the source domain to obtain the second basis function vector.
In one or more embodiments of the present disclosure, the adaptive cross-domain adjustment may be implemented by a basis function cross-domain adjustment model that is trained based on machine learning in advance, and the adaptive cross-domain adjustment of the first basis function vector is implemented by the basis function cross-domain adjustment model, to obtain the second basis function vector.
Illustratively, in one or more embodiments of the present disclosure, fig. 6 is a flow diagram of an exemplary second basis function vector retrieving process. Optionally, based on the one or more embodiments, performing the adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector may include the following:
s502: inputting the first basis function vector and the target domain two-dimensional image into a basis function cross-domain adjustment model;
the basis function cross-domain adjustment model may be model trained based on a fit of one or more of the machine learning models.
On the basis of the obtained source domain basis functions such as the source domain 3DMM basis (i.e., the first basis function vector), the basis function cross-domain adjustment model performs cross-domain adaptive adaptation of the basis functions using unlabeled target domain two-dimensional images (RGB two-dimensional images only, with no corresponding 3D information), generating a second basis function vector adapted to the target domain.
Illustratively, the input of the basis function cross-domain adjustment model is a first basis function vector and a target domain two-dimensional image, and the output is a second basis function vector of the object face reconstruction model under the target domain.
S504: extracting the object facial features of the target domain two-dimensional image through the basis function cross-domain adjustment model, carrying out self-adaptive cross-domain adjustment based on the object facial features and the first basis function vector to obtain a second basis function vector, and outputting the second basis function vector.
Optionally, the basis function cross-domain adjustment model includes at least a second object feature extraction network, a basis function adaptive network, and a cross-domain countermeasure classification network. In some embodiments, the trained basis function cross-domain adjustment model may omit the cross-domain countermeasure classification network and include at least the second object feature extraction network and the basis function adaptive network.
Optionally, the second object feature extraction network may be a pre-trained network, that is, an acquired machine learning network pre-trained to extract image features from two-dimensional images; during model training of the basis function cross-domain adjustment model, the second object feature extraction network may be left unadjusted.
Alternatively, the second object feature extraction network may be the same as the first object feature extraction network described above, or may be a different object feature extraction network, which is not specifically limited herein.
In a possible implementation manner, the extracting, by the basis function cross-domain adjustment model, the object facial feature of the target domain two-dimensional image, so as to perform adaptive cross-domain adjustment based on the object facial feature and the first basis function vector to obtain a second basis function vector may be:
The execution body may perform feature extraction on the target domain two-dimensional image by using the second object feature extraction network to obtain an object facial feature; adaptively adjust the first basis function vector by using the basis function adaptive network based on the object facial feature and the first basis function vector to obtain a second basis function vector; and perform domain class recognition by using the cross-domain countermeasure classification network based on the second basis function vector to obtain a reference domain class for the second basis function vector, where the reference domain class is used for domain class recognition supervision of the second basis function vector.
Schematically, the target domain two-dimensional image and the first basis function vector are input into the basis function cross-domain adjustment model. The processing object of the second object feature extraction network is the target domain two-dimensional image, and its processing result is the object facial feature. The processing objects of the basis function adaptive network are the object facial feature and the first basis function vector: the object facial feature assists in adapting the first basis function vector to the basis function of the target domain, yielding the second basis function vector. The input of the cross-domain countermeasure classification network is the second basis function vector, and its output is the identified reference domain class, that is, the domain class corresponding to the second basis function vector; the cross-domain countermeasure classification network can be used for domain class recognition supervision of the second basis function vector in the actual application stage, typically in a model error adjustment stage after the model goes online.
In some embodiments, the cross-domain countermeasure classification network may be deactivated after the model goes online, i.e., it no longer outputs the reference domain class to the current execution body.
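The following is a minimal Python sketch of the basis function cross-domain adjustment model described above. All class names, layer sizes, and the basis-vector dimension BASIS_DIM are illustrative assumptions, not part of the original disclosure.

```python
import torch
import torch.nn as nn

BASIS_DIM = 199  # assumed length of a flattened 3DMM basis function vector
FEAT_DIM = 256   # assumed length of the extracted object facial feature

class BasisFunctionCrossDomainModel(nn.Module):
    """Sketch of the basis function cross-domain adjustment model."""

    def __init__(self, feature_extractor: nn.Module):
        super().__init__()
        # Second object feature extraction network: a frozen pre-trained backbone.
        self.feature_extractor = feature_extractor
        for p in self.feature_extractor.parameters():
            p.requires_grad = False
        # Basis function adaptive network: adjusts the first basis function
        # vector conditioned on the facial feature of the target domain image.
        self.adaptive_net = nn.Sequential(
            nn.Linear(FEAT_DIM + BASIS_DIM, 512), nn.ReLU(),
            nn.Linear(512, BASIS_DIM),
        )
        # Cross-domain countermeasure classification network: predicts whether
        # a basis function vector belongs to the source or the target domain.
        self.domain_classifier = nn.Sequential(
            nn.Linear(BASIS_DIM, 128), nn.ReLU(), nn.Linear(128, 2),
        )

    def forward(self, first_basis: torch.Tensor, target_image: torch.Tensor):
        face_feat = self.feature_extractor(target_image)       # object facial feature
        second_basis = self.adaptive_net(
            torch.cat([face_feat, first_basis], dim=-1))       # second basis function vector
        domain_logits = self.domain_classifier(second_basis)   # reference domain class logits
        return second_basis, domain_logits
```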
In one possible implementation, the training process of the basis function cross-domain adjustment model is as follows:
B2: acquiring a first reference sample basis function vector under the source domain and a target domain two-dimensional sample image under the target domain, and performing model training on an initial basis function cross-domain adjustment model by adopting the target domain two-dimensional sample image and the first reference sample basis function vector;
It will be appreciated that an initial basis function cross-domain adjustment model based on a machine learning model may be created in advance, and may include at least a second object feature extraction network, a basis function adaptation network, and a cross-domain countermeasure classification network.
B4: in the model training process, extracting a sample object facial feature through the second object feature extraction network based on a target domain two-dimensional sample image, obtaining a second sample basis function vector through a basis function self-adaptive network based on the sample object facial feature and a first reference sample basis function vector, and obtaining a sample domain category through a cross-domain countermeasure classification network based on the sample basis function vector, wherein the sample basis function vector comprises a first sample basis function vector and/or a second sample basis function vector;
It can be understood that the input of the cross-domain countermeasure classification network is the cross-domain adapted basis function and/or the source domain 3DMM basis function, that is, at least one of the first sample basis function vector and the second sample basis function vector. By introducing this supervised recognition signal through the cross-domain countermeasure classification network, the training process of the whole model is supervised; the output of the cross-domain countermeasure classification network is the class corresponding to the basis function, that is, the sample domain class.
Illustratively, the above process is a forward propagation training process for each round of the model training phase.
B6: calculating a second model loss based on the sample domain class, the sample domain class label and the sample basis function vector, and performing model parameter adjustment on the basis function adaptive network and the cross-domain countermeasure classification network in the initial basis function cross-domain adjustment model based on the second model loss.
The sample domain class label can be understood as: the sample domain class label of the first sample basis function vector is a source domain class label, and the sample domain class label of the second sample basis function vector is a target domain class label;
Illustratively, the calculating of the second model loss based on the sample domain class, the sample domain class label, and the sample basis function vector may be:
Obtaining a classification loss by adopting a third loss calculation formula based on the sample domain class and the sample domain class label, determining a mean basis vector and a variance basis vector based on the sample basis function vector, obtaining a distribution loss by adopting a fourth loss calculation formula based on the mean basis vector and the variance basis vector, and determining a second model loss based on the classification loss and the distribution loss;
The third loss calculation satisfies the following formula:
Loss_acc-cls = -SoftminLoss(pred, y)
wherein Loss_acc-cls is the classification loss, pred is the sample domain class, and y is the sample domain class label;
The classification loss is a cross-domain classification loss, and the third loss calculation formula minimizes the accuracy of cross-domain classification. "-SoftminLoss(pred, y)" is a preset cross-domain classification accuracy function: in some embodiments, when the sample domain class pred and the sample domain class label y are consistent, the function outputs a smaller value than when they are inconsistent. As a result, the initial basis function cross-domain adjustment model tends to be unable to accurately distinguish basis function vectors of the source domain from those of the target domain. The form of "-SoftminLoss(pred, y)" is not limited herein and may be any piecewise function, linear function, nonlinear function, or the like.
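The following is a minimal Python sketch of one plausible realization of this loss; treating SoftminLoss(pred, y) as an ordinary cross-entropy classification loss is an assumption, since the form of the function is left open above.

```python
import torch
import torch.nn.functional as F

def classification_loss(domain_logits: torch.Tensor,
                        domain_labels: torch.Tensor) -> torch.Tensor:
    # Loss_acc-cls = -SoftminLoss(pred, y); cross-entropy is assumed here.
    # Minimizing the negated loss pushes the domain predictions away from the
    # labels, so source- and target-domain basis function vectors become hard
    # to tell apart.
    return -F.cross_entropy(domain_logits, domain_labels)
```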
The fourth loss calculation satisfies the following formula:
Loss_distribution = ||avg(Base_3DMM) - 0|| + ||std(Base_3DMM) - 1||
wherein Loss_distribution is the distribution loss, avg(Base_3DMM) is the mean basis vector, std(Base_3DMM) is the variance basis vector, 0 denotes the zero vector, and 1 denotes the all-ones vector.
The distribution loss can be understood as a loss over the distribution of the basis function across different domain classes: based on the fourth loss calculation formula, the basis function vector is treated as following a Gaussian distribution, so that the mean basis vector and the variance basis vector are kept as consistent as possible.
Optionally, the mean basis vector may be understood as a vector obtained by performing mean operation on a plurality of component basis vectors included in the basis function vector;
Optionally, the variance base vector may be understood as a vector obtained by performing variance operation on a plurality of component base vectors included in the base function vector;
Optionally, through the fourth loss calculation formula, during model back propagation the mean basis vector at each stage is expected to approach the zero vector, and the variance basis vector at each stage is expected to approach the all-ones vector, by adjusting the model parameters.
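As a sketch, the distribution loss can be computed as follows; the squared-norm form and stacking the component basis vectors as rows before taking the mean and variance are assumptions consistent with the reconstruction above.

```python
import torch

def distribution_loss(basis_vectors: torch.Tensor) -> torch.Tensor:
    # basis_vectors: (num_vectors, dim); rows are the component basis vectors
    # over which the mean and variance basis vectors are computed.
    mean_basis = basis_vectors.mean(dim=0)   # avg(Base_3DMM), pushed toward 0
    std_basis = basis_vectors.std(dim=0)     # std(Base_3DMM), pushed toward 1
    return mean_basis.pow(2).sum() + (std_basis - 1.0).pow(2).sum()
```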
Schematically, in each round of back propagation in the model training stage, the second model loss may be calculated based on the sample domain class, the sample domain class label, and the sample basis function vector, and model parameters of the basis function adaptive network and the cross-domain countermeasure classification network in the initial basis function cross-domain adjustment model may be adjusted based on the second model loss, until the initial basis function cross-domain adjustment model meets a model ending training condition, so as to obtain the basis function cross-domain adjustment model.
In one or more embodiments herein, the model ending training condition may include, for example, a loss function having a value less than or equal to a preset loss function threshold, a number of iterations reaching a preset number of times threshold, and so on. The specific model end training conditions may be determined based on actual conditions and are not specifically limited herein.
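Combining the sketches above, one training round (the forward pass of B4 followed by the loss and backward pass of B6) might look as follows; the optimizer, learning rate, equal weighting of the two loss terms, and the stand-in backbone with an assumed 112x112 RGB input are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Stand-in for the pre-trained second object feature extraction network.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 112 * 112, FEAT_DIM))
model = BasisFunctionCrossDomainModel(feature_extractor=backbone)

# Only the adaptive and countermeasure networks receive parameter adjustment.
optimizer = torch.optim.Adam(
    list(model.adaptive_net.parameters()) +
    list(model.domain_classifier.parameters()),
    lr=1e-4)

def train_step(first_basis, target_images, domain_labels):
    second_basis, domain_logits = model(first_basis, target_images)
    # Second model loss = classification loss + distribution loss (see the
    # sketches above); equal weighting is an assumption.
    loss = (classification_loss(domain_logits, domain_labels)
            + distribution_loss(second_basis))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```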
S506: and controlling the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and carrying out image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
Optionally, an image quality enhancement mode can be adopted to enhance the image quality of the three-dimensional image of the virtual object so as to improve the super-resolution effect of the three-dimensional space and ensure the quality of the digital three-dimensional image.
In a possible implementation manner, a face tuning model may be trained in advance, and the three-dimensional image of the virtual object may be input to the face tuning model for image quality enhancement processing, and the target virtual object image may be output.
The facial tuning model may be derived based on one or more of the machine learning models; for example, the facial tuning model may be a CNN model applied in three-dimensional space.
Schematically, the input of the face tuning model is the three-dimensional image of the virtual object, and the output is the target virtual object image after image quality enhancement processing.
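A minimal sketch of such a face tuning model follows, assuming the virtual object three-dimensional image is represented as a voxel feature volume; the channel counts and the upsampling factor of 2 are illustrative assumptions.

```python
import torch.nn as nn

class FaceTuningModel(nn.Module):
    """Sketch of a face tuning model as a CNN applied in three dimensions."""

    def __init__(self, channels: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            # Trilinear upsampling stands in for the three-dimensional space
            # super-resolution step; the factor of 2 is an assumption.
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            nn.Conv3d(32, channels, kernel_size=3, padding=1),
        )

    def forward(self, volume):
        # volume: (batch, channels, depth, height, width) virtual object 3D image
        return self.net(volume)  # enhanced target virtual object volume
```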
In one possible implementation, the training process of the face tuning model is as follows:
specifically, at least one virtual object three-dimensional sample image generated by the object face reconstruction model based on the second basis function vector is obtained, and a virtual three-dimensional sample image label corresponding to the virtual object three-dimensional sample image is obtained;
specifically, inputting the three-dimensional sample image of the virtual object into an initial face tuning model to perform at least one round of model training;
Schematically, the virtual object three-dimensional sample image is generated by the object face reconstruction model after fitting the basis function obtained through cross-domain self-adaptation, and serves as the input of the initial face tuning model.
Specifically, in the model training process, three-dimensional space super-resolution processing is performed on the virtual object three-dimensional sample image through the initial face tuning model to obtain a target virtual object sample image; a two-dimensional projection image of the target virtual object sample image at at least one preset angle in two-dimensional space is determined, and a two-dimensional projection image label of the virtual object three-dimensional sample image at the at least one preset angle in two-dimensional space is determined;
Optionally, the preset angle may be any angle i within a certain range, and there may generally be a plurality of preset angles.
Optionally, the two-dimensional projection image projected from three-dimensional space into two-dimensional space can be obtained by performing two-dimensional projection processing on the target virtual object sample image or the virtual object three-dimensional sample image at a preset angle; the projection image of the target virtual object sample image is the two-dimensional projection image, and the projection image of the virtual object three-dimensional sample image is the two-dimensional projection image label.
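The projection step can be sketched as follows, assuming the three-dimensional image is available as a set of vertices and an orthographic camera rotating about the vertical axis; both are assumptions, since the disclosure does not fix a camera model.

```python
import math
import torch

def project_at_angle(vertices: torch.Tensor, angle_rad: float) -> torch.Tensor:
    """Orthographically project (N, 3) vertices to 2D at a preset angle."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # Rotation about the vertical (y) axis by the preset angle i.
    rot_y = torch.tensor([[c, 0.0, s],
                          [0.0, 1.0, 0.0],
                          [-s, 0.0, c]], dtype=vertices.dtype)
    rotated = vertices @ rot_y.T
    return rotated[:, :2]  # drop depth: the two-dimensional projection
```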
Specifically, a third model loss is determined by adopting a fifth loss calculation formula based on the two-dimensional projection image and the two-dimensional projection image label, and model parameters of the initial face tuning model are adjusted by adopting the third model loss to obtain the face tuning model;
The fifth loss calculation satisfies the following formula:
Loss_sup = Σ_{i=1..N} ||2D_sup-i - 2D_GT-i||
wherein Loss_sup is the third model loss, 2D_sup-i is the two-dimensional projection image of the target virtual object sample image at the preset angle i in two-dimensional space, 2D_GT-i is the two-dimensional projection image label of the virtual object three-dimensional sample image at the preset angle i in two-dimensional space, and i and N are positive integers.
Schematically, in each round of back propagation in the model training stage, the third model loss can be determined by adopting the fifth loss calculation formula based on the two-dimensional projection image and the two-dimensional projection image label, and model parameters of the initial face tuning model are adjusted by adopting the third model loss until the initial face tuning model meets the model ending training condition, so as to obtain the face tuning model.
In one or more embodiments herein, the model ending training condition may include, for example, a loss function having a value less than or equal to a preset loss function threshold, a number of iterations reaching a preset number of times threshold, and so on. The specific model end training conditions may be determined based on actual conditions and are not specifically limited herein.
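Combining the projection sketch above with the fifth loss calculation, the supervision for one training round could be computed as follows; the set of preset angles and the squared-error comparison are assumptions.

```python
import torch

def projection_supervision_loss(enhanced_vertices, label_vertices, angles):
    # enhanced_vertices: target virtual object sample image (as vertices)
    # label_vertices: virtual object three-dimensional sample image (label)
    loss = torch.tensor(0.0)
    for angle in angles:  # the N preset angles i
        proj_sup = project_at_angle(enhanced_vertices, angle)  # 2D_sup-i
        proj_gt = project_at_angle(label_vertices, angle)      # 2D_GT-i
        loss = loss + (proj_sup - proj_gt).pow(2).mean()
    return loss / len(angles)  # third model loss (Loss_sup)
```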
In one or more embodiments of the present disclosure, the terminal device may acquire at least one target domain two-dimensional image in a target domain, and send the target domain two-dimensional image to a service platform, so that the service platform performs adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image, to obtain a second basis function vector; and the terminal equipment can acquire the second basis function vector from the service platform, control the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and perform image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
In one or more embodiments of the present disclosure, the image quality of the target virtual object image is greatly improved, so that the quality of the virtual digital face can be improved, and the situation that the fitting capability and the model processing capability are reduced after the object face reconstruction model is applied to the end side can be avoided, so that the image generation mode is closer to the actual application scene and has good robustness.
The image generating apparatus provided in the present specification will be described in detail below with reference to fig. 7. Note that the image generating apparatus shown in fig. 7 is used to perform the methods of the embodiments shown in fig. 1 to 6 of the present specification; for convenience of explanation, only the portions relevant to the present specification are shown. For undisclosed technical details, please refer to the embodiments shown in fig. 1 to 6 of the present specification.
Referring to fig. 7, a schematic structural diagram of the image generating apparatus of the present specification is shown. The image generating apparatus 1 may be implemented as all or part of a user terminal by software, hardware, or a combination of the two. According to some embodiments, the image generating apparatus 1 includes a data acquisition module 11 and a data sending module 12, which are specifically configured as follows:
a data acquisition module 11 for acquiring a first basis function vector for a subject facial reconstruction model in a source domain;
The data sending module 12 is configured to send the first basis function vector to at least one terminal device, so that the terminal device obtains at least one target domain two-dimensional image in a target domain, perform adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector, control the object face reconstruction model to perform virtual object image driving based on the second basis function vector to obtain a virtual object three-dimensional image, and perform image quality enhancement processing based on the virtual object three-dimensional image to obtain a target virtual object image.
Optionally, the data acquisition module 11 is configured to:
acquiring a source domain two-dimensional image in a source domain, and acquiring a reference basis function vector corresponding to an object face reconstruction model determined based on the source domain two-dimensional image;
And determining a posture angle corresponding to the source domain two-dimensional image, and performing basis vector adjustment on the reference basis function vector based on the posture angle to obtain a first basis function vector aiming at the object face reconstruction model.
Optionally, the data acquisition module 11 is configured to:
Inputting the source domain two-dimensional image and the reference basis function vector into a source domain basis function generation model, determining an attitude angle corresponding to the source domain two-dimensional image through the source domain basis function generation model, carrying out attitude correction processing on the source domain two-dimensional image based on the attitude angle to obtain a source domain two-dimensional correction image, extracting source domain object facial features of the source domain two-dimensional correction image, carrying out basis vector adjustment based on the source domain object facial features and the reference basis function vector to obtain a first basis function vector, and outputting the first basis function vector.
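As an illustration of the flow described above, the following is a minimal sketch with hypothetical stand-in networks for posture estimation, feature extraction, and basis function generation; the posture correction helper below is a placeholder, not an implementation from the disclosure.

```python
import torch
import torch.nn as nn

def rotate_to_frontal(image: torch.Tensor, pose_angle: torch.Tensor) -> torch.Tensor:
    # Hypothetical placeholder for the posture correction processing; a real
    # system would warp the face to a frontal view using the posture angle.
    return image

class SourceDomainBasisGenerator(nn.Module):
    """Sketch of the source domain basis function generation model."""

    def __init__(self, pose_net: nn.Module, feature_net: nn.Module,
                 basis_net: nn.Module):
        super().__init__()
        self.pose_net = pose_net        # object posture estimation network
        self.feature_net = feature_net  # first object feature extraction network
        self.basis_net = basis_net      # basis function generation network

    def forward(self, source_image: torch.Tensor, reference_basis: torch.Tensor):
        pose_angle = self.pose_net(source_image)                # posture angle
        corrected = rotate_to_frontal(source_image, pose_angle) # source domain 2D correction image
        face_feat = self.feature_net(corrected)                 # source domain object facial features
        first_basis = self.basis_net(
            torch.cat([face_feat, reference_basis], dim=-1))    # basis vector adjustment
        return first_basis
```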
Optionally, the data acquisition module 11 is configured to:
Acquiring a source domain two-dimensional sample image under the source domain and acquiring a reference basis function sample vector corresponding to an object face reconstruction model determined based on the source domain two-dimensional sample image, and performing model training on an initial source domain basis function generation model by adopting the source domain two-dimensional sample image and the reference basis function sample vector, wherein the initial source domain basis function generation model at least comprises an object posture estimation network, a first object feature extraction network and a basis function generation network;
In the model training process, an object posture estimation network is adopted to obtain a sample posture angle based on the source domain two-dimensional sample image, posture correction processing is carried out on the source domain two-dimensional sample image based on the sample posture angle to obtain a source domain sample correction image, a first object characteristic extraction network is adopted to extract source domain object sample characteristics based on the source domain sample correction image, and a basis function generation network is adopted to carry out basis vector adjustment based on the source domain object sample characteristics and the reference basis function sample vector to obtain a first basis function sample vector;
and calculating a first model loss based on the first basis function sample vector, the source domain two-dimensional sample image, the source domain virtual three-dimensional image tag and the reference basis function sample vector, and carrying out model parameter adjustment on a basis function generation network in the initial source domain basis function generation model by adopting the first model loss to obtain a source domain basis function generation model.
Optionally, the data acquisition module 11 is configured to:
Inputting the source domain sample correction image into an object face reconstruction model, and controlling the object face reconstruction model to drive an virtual object image based on the first basis function sample vector to obtain a source domain virtual object three-dimensional image;
Acquiring a source domain virtual three-dimensional image tag corresponding to the source domain two-dimensional sample image;
obtaining corrected reconstruction loss by adopting a first loss calculation formula based on the source domain sample corrected image and the source domain virtual three-dimensional image tag, obtaining basis function expression loss by adopting a second loss calculation formula based on the first basis function sample vector and the reference basis function sample vector, and determining first model loss based on the corrected reconstruction loss and the basis function expression loss;
The first loss calculation satisfies the following formula:
Loss_recons = ||V_3DMM - V_GT||
wherein Loss_recons is the correction reconstruction loss, V_3DMM is the source domain virtual object three-dimensional image, and V_GT is the source domain virtual three-dimensional image label;
The second loss calculation satisfies the following formula:
Loss_pre = ||Base_3DMM - X · Base_3DMM-OLD||
wherein Loss_pre is the basis function expression loss, Base_3DMM is the first basis function sample vector, Base_3DMM-OLD is the reference basis function sample vector, and X is a conversion matrix between the first basis function sample vector and the reference basis function sample vector.
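Minimal sketches of the first and second loss calculations as reconstructed above follow; representing the norm as a mean squared distance is an assumption.

```python
import torch

def correction_reconstruction_loss(v_3dmm: torch.Tensor,
                                   v_gt: torch.Tensor) -> torch.Tensor:
    # Loss_recons = ||V_3DMM - V_GT||: driven 3D image vs. its virtual 3D label.
    return (v_3dmm - v_gt).pow(2).mean()

def basis_expression_loss(base_new: torch.Tensor, base_old: torch.Tensor,
                          x_matrix: torch.Tensor) -> torch.Tensor:
    # Loss_pre = ||Base_3DMM - X · Base_3DMM-OLD||: the adjusted basis should
    # remain expressible as a linear transform of the reference basis.
    return (base_new - x_matrix @ base_old).pow(2).mean()
```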
Optionally, the device 1 is further configured to:
acquiring at least one target domain two-dimensional image in a target domain from at least one terminal device;
performing self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
And sending the second basis function vector to the terminal equipment so that the terminal equipment controls the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and performing image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
It should be noted that, when the image generating apparatus provided in the above embodiment performs the image generating method, the division into the above functional modules is merely used as an example for illustration; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the image generating apparatus and the image generating method provided in the foregoing embodiments belong to the same concept; the detailed implementation process is embodied in the method embodiments and is not described here again.
The foregoing description is provided for the purpose of illustration only and does not represent the advantages or disadvantages of the embodiments.
In one or more embodiments of the present disclosure, the image quality of the avatar of the virtual object is greatly improved, so that the quality of the virtual digital face can be improved, and the situation that the fitting capability is reduced and the model processing capability is reduced after the object face reconstruction model is applied to the end side can be avoided, so that the avatar generation mode is closer to the actual application scene, and has good robustness.
Referring to fig. 8, a schematic structural diagram of another image generating apparatus of the present specification is shown. The image generating apparatus 2 may be implemented as all or part of a user terminal by software, hardware, or a combination of the two. According to some embodiments, the image generating apparatus 2 includes a data acquisition module 21, an object driving module 22, and an image processing module 23, which are specifically configured as follows:
A data acquisition module 21, configured to acquire a first basis function vector from a service platform;
The object driving module 22 is configured to obtain at least one target domain two-dimensional image in a target domain, and perform adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
The image processing module 23 is configured to control the object face reconstruction model to perform virtual object image driving based on the second basis function vector to obtain a virtual object three-dimensional image, and perform image quality enhancement processing based on the virtual object three-dimensional image to obtain a target virtual object image;
optionally, the object driving module 22 is configured to:
inputting the first basis function vector and the target domain two-dimensional image into a basis function cross-domain adjustment model, extracting the object facial features of the target domain two-dimensional image through the basis function cross-domain adjustment model, carrying out self-adaptive cross-domain adjustment based on the object facial features and the first basis function vector to obtain a second basis function vector, and outputting the second basis function vector.
Optionally, the basis function cross-domain adjustment model includes at least a second object feature extraction network, a basis function adaptive network, and a cross-domain countermeasure classification network, and the object driving module 22 is configured to:
And carrying out feature extraction on the target domain two-dimensional image by adopting the second object feature extraction network to obtain an object facial feature, carrying out self-adaptive adjustment on the first basis function vector by adopting the basis function self-adaptive network based on the object facial feature and the first basis function vector to obtain a second basis function vector, carrying out domain type recognition on the second basis function vector by adopting the cross-domain countermeasure classification network to obtain a reference domain type aiming at the second basis function vector, wherein the reference domain type is used for carrying out domain type recognition supervision on the second basis function vector.
Optionally, the object driving module 22 is configured to:
acquiring a first reference sample basis function vector under the source domain and a target domain two-dimensional sample image under the target domain, and performing model training on an initial basis function cross-domain adjustment model by adopting the target domain two-dimensional sample image and the first reference sample basis function vector;
In the model training process, extracting a sample object facial feature through the second object feature extraction network based on a target domain two-dimensional sample image, obtaining a second sample basis function vector through a basis function self-adaptive network based on the sample object facial feature and a first reference sample basis function vector, and obtaining a sample domain category through a cross-domain countermeasure classification network based on the sample basis function vector, wherein the sample basis function vector comprises a first sample basis function vector and/or a second sample basis function vector;
and calculating a second model loss based on the sample domain class, the sample domain class label and the sample basis function vector, and performing model parameter adjustment on a basis function adaptive network and the cross-domain countermeasure classification network in the initial basis function cross-domain adjustment model based on the second model loss.
Optionally, the object driving module 22 is configured to:
Obtaining classification loss by adopting a third loss calculation formula based on the sample domain type and the sample domain type label, determining a mean basis vector and a variance basis vector based on the sample basis function vector, obtaining distribution loss by adopting a fourth loss calculation formula based on the mean basis vector and the variance basis vector, and determining a second model loss based on the classification loss and the distribution loss;
The third loss calculation satisfies the following formula:
Loss_acc-cls = -SoftminLoss(pred, y)
wherein Loss_acc-cls is the classification loss, pred is the sample domain class, and y is the sample domain class label;
The fourth loss calculation satisfies the following formula:
Loss_distribution = ||avg(Base_3DMM) - 0|| + ||std(Base_3DMM) - 1||
wherein Loss_distribution is the distribution loss, avg(Base_3DMM) is the mean basis vector, and std(Base_3DMM) is the variance basis vector.
Optionally, the image processing module 23 is configured to:
And inputting the three-dimensional image of the virtual object into a face tuning model for image quality enhancement processing, and outputting the target virtual object image.
Optionally, the image processing module 23 is configured to:
acquiring at least one virtual object three-dimensional sample image generated by the object face reconstruction model based on the second basis function vector, and acquiring a virtual three-dimensional sample image tag corresponding to the virtual object three-dimensional sample image;
Inputting the three-dimensional sample image of the virtual object into an initial face tuning model to perform at least one round of model training;
In the model training process, performing three-dimensional space super-resolution processing on the three-dimensional sample image of the virtual object to obtain a target virtual object sample image, determining a two-dimensional projection image of the target virtual object sample image at least one preset angle in a two-dimensional space, and determining a two-dimensional projection image tag of the three-dimensional sample image of the virtual object at least one preset angle in the two-dimensional space;
Determining a third model loss by adopting a fifth loss calculation type based on the two-dimensional projection image and the two-dimensional projection image label, and carrying out model parameter adjustment on the initial face tuning model by adopting the third model loss to obtain a face tuning model;
The fifth loss calculation satisfies the following formula:
Loss_sup = Σ_{i=1..N} ||2D_sup-i - 2D_GT-i||
wherein Loss_sup is the third model loss, 2D_sup-i is the two-dimensional projection image of the target virtual object sample image at the preset angle i in two-dimensional space, 2D_GT-i is the two-dimensional projection image label of the virtual object three-dimensional sample image at the preset angle i in two-dimensional space, and i and N are positive integers.
Optionally, the image processing module 23 is configured to:
acquiring at least one target domain two-dimensional image in a target domain, and sending the target domain two-dimensional image to a service platform so that the service platform carries out self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
And acquiring the second basis function vector from a service platform, controlling the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and performing image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
It should be noted that, when the image generating apparatus provided in the above embodiment performs the image generating method, the division into the above functional modules is merely used as an example for illustration; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the image generating apparatus and the image generating method provided in the foregoing embodiments belong to the same concept; the detailed implementation process is embodied in the method embodiments and is not described here again.
The foregoing description is provided for the purpose of illustration only and does not represent the advantages or disadvantages of the embodiments.
The present disclosure further provides a computer storage medium, where the computer storage medium may store a plurality of instructions adapted to be loaded and executed by a processor to perform the image generating method according to the embodiments shown in fig. 1 to 6; for the specific execution process, refer to the description of the embodiments shown in fig. 1 to 6, which is not repeated here.
The present disclosure further provides a computer program product storing at least one instruction, where the at least one instruction is loaded and executed by a processor to perform the image generating method according to the embodiments shown in fig. 1 to 6; for the specific execution process, refer to the description of the embodiments shown in fig. 1 to 6, which is not repeated here.
Referring to fig. 9, a schematic structural diagram of another electronic device is provided in an embodiment of the present application. As shown in fig. 9, the electronic device 1000 may include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, at least one communication bus 1002.
Wherein the communication bus 1002 is used to enable connected communication between these components.
The user interface 1003 may include a Display (Display), and the optional user interface 1003 may further include a standard wired interface, a wireless interface, among others.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Wherein the processor 1001 may include one or more processing cores. The processor 1001 connects various parts within the entire electronic device 1000 using various interfaces and lines, and performs various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1005 and calling data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). The processor 1001 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing the content to be displayed by the display screen; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 1001 and may be implemented by a single chip.
The memory 1005 may include a random access memory (RAM) or a read-only memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a stored program area and a stored data area, where the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the above respective method embodiments, and the like; the stored data area may store the data and the like referred to in the above respective method embodiments. The memory 1005 may optionally also be at least one storage device located remotely from the processor 1001. As shown in fig. 9, the memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and application programs.
In the electronic device 1000 shown in fig. 9, the electronic device 1000 may be a service platform, and the user interface 1003 is mainly used for providing an input interface for a user to obtain data input by the user; and the processor 1001 may be configured to call an application program stored in the memory 1005, and specifically perform the following operations:
Acquiring a first basis function vector of a reconstruction model of a face of a subject in a source domain;
And transmitting the first basis function vector to at least one terminal device so that the terminal device obtains at least one target domain two-dimensional image in a target domain, obtains a second basis function vector based on the first basis function vector and the target domain two-dimensional image, controls the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a virtual object three-dimensional image, and carries out image quality enhancement processing based on the virtual object three-dimensional image to obtain a target virtual object image.
In one embodiment, the processor 1001, in executing the first basis function vector for the object facial reconstruction model in the acquisition source domain, performs the steps of:
acquiring a source domain two-dimensional image in a source domain, and acquiring a reference basis function vector corresponding to an object face reconstruction model determined based on the source domain two-dimensional image;
And determining a posture angle corresponding to the source domain two-dimensional image, and performing basis vector adjustment on the reference basis function vector based on the posture angle to obtain a first basis function vector aiming at the object face reconstruction model.
In one embodiment, when executing the determining of the posture angle corresponding to the source domain two-dimensional image and the basis vector adjustment of the reference basis function vector based on the posture angle to obtain a first basis function vector for the object face reconstruction model, the processor 1001 performs the following steps:
Inputting the source domain two-dimensional image and the reference basis function vector into a source domain basis function generation model, determining an attitude angle corresponding to the source domain two-dimensional image through the source domain basis function generation model, carrying out attitude correction processing on the source domain two-dimensional image based on the attitude angle to obtain a source domain two-dimensional correction image, extracting source domain object facial features of the source domain two-dimensional correction image, carrying out basis vector adjustment based on the source domain object facial features and the reference basis function vector to obtain a first basis function vector, and outputting the first basis function vector.
In one embodiment, the processor 1001, when executing the avatar generation method, further performs the steps of:
Acquiring a source domain two-dimensional sample image under the source domain and acquiring a reference basis function sample vector corresponding to an object face reconstruction model determined based on the source domain two-dimensional sample image, and performing model training on an initial source domain basis function generation model by adopting the source domain two-dimensional sample image and the reference basis function sample vector, wherein the initial source domain basis function generation model at least comprises an object posture estimation network, a first object feature extraction network and a basis function generation network;
In the model training process, an object posture estimation network is adopted to obtain a sample posture angle based on the source domain two-dimensional sample image, posture correction processing is carried out on the source domain two-dimensional sample image based on the sample posture angle to obtain a source domain sample correction image, a first object characteristic extraction network is adopted to extract source domain object sample characteristics based on the source domain sample correction image, and a basis function generation network is adopted to carry out basis vector adjustment based on the source domain object sample characteristics and the reference basis function sample vector to obtain a first basis function sample vector;
and calculating a first model loss based on the first basis function sample vector, the source domain two-dimensional sample image, the source domain virtual three-dimensional image tag and the reference basis function sample vector, and carrying out model parameter adjustment on a basis function generation network in the initial source domain basis function generation model by adopting the first model loss to obtain a source domain basis function generation model.
In one embodiment, when performing the calculation of the first model loss based on the first basis function sample vector, the source domain two-dimensional sample image, the source domain virtual three-dimensional image label, and the reference basis function sample vector, the processor 1001 performs the following steps:
Inputting the source domain sample correction image into an object face reconstruction model, and controlling the object face reconstruction model to drive an virtual object image based on the first basis function sample vector to obtain a source domain virtual object three-dimensional image;
Acquiring a source domain virtual three-dimensional image tag corresponding to the source domain two-dimensional sample image;
obtaining corrected reconstruction loss by adopting a first loss calculation formula based on the source domain sample corrected image and the source domain virtual three-dimensional image tag, obtaining basis function expression loss by adopting a second loss calculation formula based on the first basis function sample vector and the reference basis function sample vector, and determining first model loss based on the corrected reconstruction loss and the basis function expression loss;
The first loss calculation satisfies the following formula:
Loss_recons = ||V_3DMM - V_GT||
wherein Loss_recons is the correction reconstruction loss, V_3DMM is the source domain virtual object three-dimensional image, and V_GT is the source domain virtual three-dimensional image label;
The second loss calculation satisfies the following formula:
Loss_pre = ||Base_3DMM - X · Base_3DMM-OLD||
wherein Loss_pre is the basis function expression loss, Base_3DMM is the first basis function sample vector, Base_3DMM-OLD is the reference basis function sample vector, and X is a conversion matrix between the first basis function sample vector and the reference basis function sample vector.
In one embodiment, after executing the acquiring of the first basis function vector for the object face reconstruction model in the source domain, the processor 1001 further executes the following steps:
acquiring at least one target domain two-dimensional image in a target domain from at least one terminal device;
performing self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
And sending the second basis function vector to the terminal equipment so that the terminal equipment controls the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and performing image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
In one or more embodiments of the present disclosure, the image quality of the target virtual object image is greatly improved, so that the quality of the virtual digital face can be improved, and the situation that the fitting capability and the model processing capability are reduced after the object face reconstruction model is applied to the end side can be avoided, so that the image generation mode is closer to the actual application scene and has good robustness.
Referring to fig. 10, a block diagram of an electronic device according to an exemplary embodiment of the present disclosure is shown. The electronic device in this specification may include one or more of the following: processor 110, memory 120, input device 130, output device 140, and bus 150. The processor 110, the memory 120, the input device 130, and the output device 140 may be connected by a bus 150.
Processor 110 may include one or more processing cores. The processor 110 uses various interfaces and lines to connect various parts of the overall electronic device, and performs various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and invoking data stored in the memory 120. Optionally, the processor 110 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). The processor 110 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 110 and may be implemented solely by a single communication chip.
The memory 120 may include a random access memory (RAM) or a read-only memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable storage medium. The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a stored program area and a stored data area, where the stored program area may store instructions for implementing an operating system (which may be the Android system, including systems deeply developed based on the Android system, the IOS system developed by Apple Inc., including systems deeply developed based on the IOS system, or other systems), instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described below, and the like. The stored data area may also store data created by the electronic device in use, such as phone books, audio and video data, and chat log data.
Referring to FIG. 11, the memory 120 may be divided into an operating system space in which the operating system runs and a user space in which native and third party applications run. In order to ensure that different third party application programs can achieve better operation effects, the operating system allocates corresponding system resources for the different third party application programs. However, the requirements of different application scenarios in the same third party application program on system resources are different, for example, under the local resource loading scenario, the third party application program has higher requirement on the disk reading speed; in the animation rendering scene, the third party application program has higher requirements on the GPU performance. The operating system and the third party application program are mutually independent, and the operating system often cannot timely sense the current application scene of the third party application program, so that the operating system cannot perform targeted system resource adaptation according to the specific application scene of the third party application program.
In order to enable the operating system to distinguish specific application scenes of the third-party application program, data communication between the third-party application program and the operating system needs to be communicated, so that the operating system can acquire current scene information of the third-party application program at any time, and targeted system resource adaptation is performed based on the current scene.
Taking the Android system as an example of the operating system, as shown in fig. 12, the programs and data stored in the memory 120 may be divided into a Linux kernel layer 320, a system runtime library layer 340, an application framework layer 360, and an application layer 380, where the Linux kernel layer 320, the system runtime library layer 340, and the application framework layer 360 belong to the operating system space, and the application layer 380 belongs to the user space. The Linux kernel layer 320 provides the underlying drivers for the various hardware of the electronic device, such as display drivers, audio drivers, camera drivers, Bluetooth drivers, Wi-Fi drivers, power management, and the like. The system runtime library layer 340 provides the main feature support for the Android system through C/C++ libraries; for example, the SQLite library provides database support, the OpenGL/ES library provides 3D graphics support, and the Webkit library provides browser kernel support. The system runtime library layer 340 also provides the Android runtime (ART) library, which mainly provides core libraries allowing developers to write Android applications in the Java language. The application framework layer 360 provides various APIs that may be used when building applications, such as activity management, window management, view management, notification management, content providers, package management, call management, resource management, and location management; developers can also use these APIs to build their own applications. At least one application program runs in the application layer 380; these applications may be native applications of the operating system, such as a contacts program, an SMS program, a clock program, or a camera application, or third-party applications developed by third-party developers, such as game applications, instant messaging programs, or photo beautification programs.
Taking the IOS system as an example of the operating system, the programs and data stored in the memory 120 are shown in fig. 13. The IOS system includes: a core operating system layer 420 (Core OS layer), a core services layer 440 (Core Services layer), a media layer 460 (Media layer), and a touchable layer 480 (Cocoa Touch layer). The core operating system layer 420 includes the operating system kernel, drivers, and underlying program frameworks that provide hardware-oriented functionality for use by the program frameworks in the core services layer 440. The core services layer 440 provides the system services and/or program frameworks required by applications, such as a Foundation framework, an account framework, an advertisement framework, a data storage framework, a network connection framework, a geographic location framework, a motion framework, and the like. The media layer 460 provides interfaces for applications related to audio-visual aspects, such as graphics and image related interfaces, audio technology related interfaces, video technology related interfaces, and the audio-video transmission technology wireless play (AirPlay) interface. The touchable layer 480 provides various commonly used interface-related frameworks for application development and is responsible for user touch interaction on the electronic device, such as a local notification service, a remote push service, an advertisement framework, a game tool framework, a message user interface (UI) framework, a UIKit framework, a map framework, and so on.
Among the frameworks illustrated in fig. 13, the frameworks related to most applications include, but are not limited to: the base (Foundation) framework in the core services layer 440 and the UIKit framework in the touchable layer 480. The Foundation framework provides many basic object classes and data types and the most basic system services for all applications, independent of the UI. The classes provided by the UIKit framework constitute a basic UI class library for creating touch-based user interfaces; iOS applications can provide UIs based on the UIKit framework, which therefore provides the application's infrastructure for building user interfaces, drawing, handling user interaction events, responding to gestures, and so on.
The manner and principle of implementing data communication between the third party application program and the operating system in the IOS system may refer to the Android system, and this description is not repeated here.
The input device 130 is configured to receive input instructions or data; the input device 130 includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 140 is used to output instructions or data; the output device 140 includes, but is not limited to, a display device, a speaker, and the like. In one example, the input device 130 and the output device 140 may be combined: the input device 130 and the output device 140 are a touch display screen for receiving a touch operation by the user on or near it using a finger, a touch pen, or any other suitable object, and for displaying the user interface of each application program. The touch display screen is typically provided on the front panel of the electronic device. The touch display screen may be designed as a full screen, a curved screen, or a contoured screen; it may also be designed as a combination of a full screen and a curved screen, or a combination of a contoured screen and a curved screen, which is not limited in this specification.
In addition, those skilled in the art will appreciate that the configuration of the electronic device shown in the above-described figures does not constitute a limitation of the electronic device, and the electronic device may include more or fewer components than illustrated, may combine certain components, or may adopt a different arrangement of components. For example, the electronic device may further include components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (WiFi) module, a power supply, and a Bluetooth module, which are not described herein.
In this specification, the execution subject of each step may be the electronic device described above. Optionally, the execution subject of each step is the operating system of the electronic device. The operating system may be the Android system, the iOS system, or another operating system, which is not limited in this specification.
The electronic device of the present specification may further have a display device mounted thereon, and the display device may be any device capable of realizing a display function, for example: a cathode ray tube (CRT) display, a light-emitting diode (LED) display, an electronic ink screen, a liquid crystal display (LCD), a plasma display panel (PDP), and the like. A user may utilize the display device on the electronic device 101 to view displayed text, images, video, and the like. The electronic device may be a smart phone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playing device, a video playing device, a notebook computer, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, or electronic clothing.
In the electronic device shown in fig. 10, where the electronic device may be a terminal device, the processor 110 may be configured to invoke an application program stored in the memory 120 and specifically perform the following operations:
acquiring a first basis function vector from a service platform, wherein the first basis function vector is a basis function vector of an object facial reconstruction model acquired from a source domain by the service platform;
Acquiring at least one target domain two-dimensional image in a target domain, and carrying out self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
and controlling the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and carrying out image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
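To make these three operations concrete, the following is a minimal sketch of the terminal-side flow, assuming PyTorch; the function name generate_target_avatar and the three model callables are hypothetical placeholders rather than names taken from this specification.

```python
# Minimal sketch of the terminal-side flow (an illustration under assumptions,
# not this specification's actual implementation). The three models are
# assumed to be callables over torch tensors.
import torch

def generate_target_avatar(first_basis: torch.Tensor,   # from the service platform
                           target_images: torch.Tensor, # target-domain 2D images
                           adjuster,       # basis function cross-domain adjustment model
                           reconstructor,  # object face reconstruction model
                           tuner):         # face tuning model (quality enhancement)
    # 1) Adaptive cross-domain adjustment of the first basis function vector.
    second_basis = adjuster(first_basis, target_images)
    # 2) Drive the virtual object image with the second basis function vector
    #    to obtain the three-dimensional virtual object image.
    avatar_3d = reconstructor(second_basis)
    # 3) Image quality enhancement yields the target virtual object image.
    return tuner(avatar_3d)
```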
In one embodiment, the processor 110 performs the adaptive cross-domain adjustment on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector, and performs the following steps:
inputting the first basis function vector and the target domain two-dimensional image into a basis function cross-domain adjustment model, extracting the object facial features of the target domain two-dimensional image through the basis function cross-domain adjustment model, carrying out self-adaptive cross-domain adjustment based on the object facial features and the first basis function vector to obtain a second basis function vector, and outputting the second basis function vector.
In one embodiment, the basis function cross-domain adjustment model executed by the processor 110 includes at least a second object feature extraction network, a basis function self-adaptive network, and a cross-domain countermeasure classification network, and the extracting of the object facial features of the target domain two-dimensional image through the basis function cross-domain adjustment model to perform self-adaptive cross-domain adjustment based on the object facial features and the first basis function vector to obtain a second basis function vector includes:
adopting the second object feature extraction network to perform feature extraction on the target domain two-dimensional image to obtain an object facial feature; adopting the basis function self-adaptive network to perform self-adaptive adjustment on the first basis function vector based on the object facial feature and the first basis function vector to obtain a second basis function vector; and adopting the cross-domain countermeasure classification network to perform domain type recognition on the second basis function vector to obtain a reference domain type for the second basis function vector, the reference domain type being used for performing domain type recognition supervision on the second basis function vector.
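As an illustration of how these three sub-networks could be wired together, here is a hedged PyTorch sketch; the layer sizes, the 199-dimensional basis vector (a common 3DMM choice), and the residual-style adjustment are assumptions for illustration, not details from this specification.

```python
# Hypothetical sketch of the basis function cross-domain adjustment model.
import torch
import torch.nn as nn

class BasisCrossDomainAdjuster(nn.Module):
    def __init__(self, feat_dim=256, basis_dim=199):
        super().__init__()
        # Second object feature extraction network (here: a tiny CNN encoder).
        self.feature_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Basis function self-adaptive network: predicts an adjustment to the
        # first basis function vector, conditioned on the facial features.
        self.adapt_net = nn.Sequential(
            nn.Linear(feat_dim + basis_dim, 512), nn.ReLU(),
            nn.Linear(512, basis_dim),
        )
        # Cross-domain countermeasure (adversarial) classification network:
        # predicts the domain type (source vs. target) of a basis vector.
        self.domain_cls = nn.Sequential(
            nn.Linear(basis_dim, 128), nn.ReLU(), nn.Linear(128, 2),
        )

    def forward(self, first_basis, target_image):
        feat = self.feature_net(target_image)              # object facial features
        delta = self.adapt_net(torch.cat([feat, first_basis], dim=1))
        second_basis = first_basis + delta                 # adjusted basis vector
        domain_logits = self.domain_cls(second_basis)      # reference domain type
        return second_basis, domain_logits
```

The residual update (first_basis + delta) is one design choice that keeps the second basis function vector close to the first; the specification does not prescribe it.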
In one embodiment, the processor 110, when executing the avatar generation method, further performs the steps of:
acquiring a first reference sample basis function vector under the source domain and a target domain two-dimensional sample image under the target domain, and performing model training on an initial basis function cross-domain adjustment model by adopting the target domain two-dimensional sample image and the first reference sample basis function vector;
In the model training process, extracting a sample object facial feature through the second object feature extraction network based on a target domain two-dimensional sample image, obtaining a second sample basis function vector through a basis function self-adaptive network based on the sample object facial feature and a first reference sample basis function vector, and obtaining a sample domain category through a cross-domain countermeasure classification network based on the sample basis function vector, wherein the sample basis function vector comprises a first sample basis function vector and/or a second sample basis function vector;
and calculating a second model loss based on the sample domain class, the sample domain class label and the sample basis function vector, and performing model parameter adjustment on a basis function adaptive network and the cross-domain countermeasure classification network in the initial basis function cross-domain adjustment model based on the second model loss.
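One training iteration consistent with this description might look as follows, assuming the BasisCrossDomainAdjuster sketch above; the optimizer choice and the single combined backward pass are assumptions.

```python
# Illustrative training step under the assumptions stated above.
import torch

def train_step(model, optimizer, first_ref_basis, target_sample_image,
               domain_label, second_model_loss):
    second_basis, domain_logits = model(first_ref_basis, target_sample_image)
    loss = second_model_loss(domain_logits, domain_label, second_basis)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Only the self-adaptive and countermeasure networks receive parameter
# updates, so the optimizer is built over those two groups alone (assumed: Adam).
# model = BasisCrossDomainAdjuster()
# optimizer = torch.optim.Adam(
#     list(model.adapt_net.parameters()) + list(model.domain_cls.parameters()),
#     lr=1e-4)
```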
In one embodiment, the processor 110, in performing the calculating of the second model loss based on the sample domain class, the sample domain class label, and the sample basis function vector, performs the following steps:
Obtaining classification loss by adopting a third loss calculation formula based on the sample domain type and the sample domain type label, determining a mean basis vector and a variance basis vector based on the sample basis function vector, obtaining distribution loss by adopting a fourth loss calculation formula based on the mean basis vector and the variance basis vector, and determining a second model loss based on the classification loss and the distribution loss;
The third loss calculation satisfies the following formula:

Loss_acc-cls = -SoftminLoss(pred, y)

wherein Loss_acc-cls is the classification loss, pred is the sample domain class, and y is the sample domain class label;
The fourth loss calculation satisfies a formula over the mean basis vector and the variance basis vector, wherein Loss_distribution is the distribution loss, avg(Base_3DMM) is the mean basis vector, and std(Base_3DMM) is the variance basis vector.
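Since the distribution-loss formula itself is not reproduced above, the sketch below fills it in with one plausible form, pulling the mean basis vector toward zero and the variance basis vector toward one; that choice, and the use of cross-entropy as a stand-in for SoftminLoss, are assumptions for illustration only.

```python
# Hedged instantiation of the second model loss (assumed forms; see lead-in).
import torch
import torch.nn.functional as F

def second_model_loss(domain_logits, domain_label, sample_basis):
    # Classification loss: negated cross-entropy over the sample domain class,
    # encouraging the adjusted basis vectors to confuse the domain classifier.
    loss_cls = -F.cross_entropy(domain_logits, domain_label)
    # Distribution loss over the mean and variance basis vectors; a batch of
    # at least two sample basis vectors is assumed so std() is defined.
    mean_basis = sample_basis.mean(dim=0)       # avg(Base_3DMM)
    std_basis = sample_basis.std(dim=0)         # std(Base_3DMM)
    loss_dist = mean_basis.pow(2).mean() + (std_basis - 1.0).pow(2).mean()
    return loss_cls + loss_dist
```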
In one embodiment, the processor 110 performs the following steps when performing the image quality enhancement process based on the three-dimensional avatar of the virtual object to obtain the target avatar:
And inputting the three-dimensional image of the virtual object into a face tuning model for image quality enhancement processing, and outputting the target virtual object image.
In one embodiment, the processor 110, when executing the avatar generation method, further performs the steps of:
acquiring at least one virtual object three-dimensional sample image generated by the object face reconstruction model based on the second basis function vector, and acquiring a virtual three-dimensional sample image tag corresponding to the virtual object three-dimensional sample image;
Inputting the three-dimensional sample image of the virtual object into an initial face tuning model to perform at least one round of model training;
In the model training process, performing three-dimensional space super-resolution processing on the three-dimensional sample image of the virtual object to obtain a target virtual object sample image, determining a two-dimensional projection image of the target virtual object sample image at least one preset angle in a two-dimensional space, and determining a two-dimensional projection image tag of the three-dimensional sample image of the virtual object at least one preset angle in the two-dimensional space;
Determining a third model loss by adopting a fifth loss calculation formula based on the two-dimensional projection image and the two-dimensional projection image label, and performing model parameter adjustment on the initial face tuning model by adopting the third model loss to obtain a face tuning model;
The fifth loss calculation satisfies the following formula:

Loss_sup = Σ_{i=1..N} ||2D_sup-i - 2D_GT-i||

wherein Loss_sup is the third model loss, 2D_sup-i is the two-dimensional projection image of the target virtual object sample image at a preset angle i in the two-dimensional space, 2D_GT-i is the two-dimensional projection image label of the virtual object three-dimensional sample image at the preset angle i in the two-dimensional space, and i and N are positive integers.
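A compact sketch of this projection-supervision loss follows, assuming the N per-angle projections have already been rendered; the mean absolute error per angle is an assumed choice of distance.

```python
# Illustrative fifth-loss computation over N preset projection angles.
import torch

def projection_supervision_loss(proj_preds, proj_labels):
    """proj_preds / proj_labels: lists of N tensors, one pair per angle i."""
    assert len(proj_preds) == len(proj_labels) and len(proj_preds) > 0
    # Sum of per-angle errors between each projection and its label (assumed form).
    return sum(torch.abs(p - g).mean() for p, g in zip(proj_preds, proj_labels))
```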
In one embodiment, the processor 110, when executing the avatar generation method, further performs the steps of:
acquiring at least one target domain two-dimensional image in a target domain, and sending the target domain two-dimensional image to a service platform so that the service platform carries out self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
And acquiring the second basis function vector from a service platform, controlling the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and performing image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
In one or more embodiments of the present disclosure, the image quality of the target virtual object image is greatly improved, so that the quality of the virtual digital face can be raised, and the degradation of fitting capability and model processing capability that may otherwise occur after the object face reconstruction model is deployed on the end side can be avoided.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
It should be noted that, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals according to the embodiments of the present disclosure are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, object features, interactive behavior features, user information, and the like referred to in this specification are all acquired with sufficient authorization.
The foregoing disclosure is only illustrative of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention; equivalent changes made according to the claims of the present invention therefore remain within its scope.

Claims (18)

1. An image generation method applied to a service platform, the method comprising:
Acquiring a first basis function vector for an object face reconstruction model in a source domain;
Transmitting the first basis function vector to at least one terminal device, so that the terminal device obtains at least one target domain two-dimensional image in a target domain, obtains a second basis function vector based on the first basis function vector and the target domain two-dimensional image, controls the object face reconstruction model to drive a virtual object image based on the second basis function vector to obtain a virtual object three-dimensional image, and carries out image quality enhancement processing based on the virtual object three-dimensional image to obtain a target virtual object image;
Wherein the obtaining a second basis function vector based on the first basis function vector and the target domain two-dimensional image includes:
The terminal equipment inputs the first basis function vector and the target domain two-dimensional image into a basis function cross-domain adjustment model, extracts the object facial features of the target domain two-dimensional image through the basis function cross-domain adjustment model, so as to obtain a second basis function vector by self-adaptive cross-domain adjustment based on the object facial features and the first basis function vector, and outputs the second basis function vector.
2. The method of claim 1, the obtaining a first basis function vector for a subject facial reconstruction model in a source domain, comprising:
acquiring a source domain two-dimensional image in a source domain, and acquiring a reference basis function vector corresponding to an object face reconstruction model determined based on the source domain two-dimensional image;
And determining a posture angle corresponding to the source domain two-dimensional image, and performing basis vector adjustment on the reference basis function vector based on the posture angle to obtain a first basis function vector aiming at the object face reconstruction model.
3. The method of claim 2, wherein determining the posture angle corresponding to the source domain two-dimensional image and performing basis vector adjustment on the reference basis function vector based on the posture angle to obtain a first basis function vector for the object face reconstruction model comprises:
Inputting the source domain two-dimensional image and the reference basis function vector into a source domain basis function generation model, determining a posture angle corresponding to the source domain two-dimensional image through the source domain basis function generation model, performing posture correction processing on the source domain two-dimensional image based on the posture angle to obtain a source domain two-dimensional correction image, extracting source domain object facial features of the source domain two-dimensional correction image, performing basis vector adjustment based on the source domain object facial features and the reference basis function vector to obtain a first basis function vector, and outputting the first basis function vector.
4. A method according to claim 3, the method further comprising:
Acquiring a source domain two-dimensional sample image under the source domain and acquiring a reference basis function sample vector corresponding to an object face reconstruction model determined based on the source domain two-dimensional sample image, and performing model training on an initial source domain basis function generation model by adopting the source domain two-dimensional sample image and the reference basis function sample vector, wherein the initial source domain basis function generation model at least comprises an object posture estimation network, a first object feature extraction network and a basis function generation network;
In the model training process, an object posture estimation network is adopted to obtain a sample posture angle based on the source domain two-dimensional sample image, posture correction processing is carried out on the source domain two-dimensional sample image based on the sample posture angle to obtain a source domain sample correction image, a first object characteristic extraction network is adopted to extract source domain object sample characteristics based on the source domain sample correction image, and a basis function generation network is adopted to carry out basis vector adjustment based on the source domain object sample characteristics and the reference basis function sample vector to obtain a first basis function sample vector;
And calculating a first model loss based on the first basis function sample vector, the source domain two-dimensional sample image, the source domain virtual three-dimensional image tag and the reference basis function sample vector, and performing model parameter adjustment on a basis function generation network in the initial source domain basis function generation model by adopting the first model loss to obtain a source domain basis function generation model.
5. The method of claim 4, the calculating a first model loss based on the first basis function sample vector, the source domain two-dimensional sample image, the source domain virtual three-dimensional image tag, and the reference basis function sample vector, comprising:
Inputting the source domain sample correction image into an object face reconstruction model, and controlling the object face reconstruction model to drive a virtual object image based on the first basis function sample vector to obtain a source domain virtual object three-dimensional image;
Acquiring a source domain virtual three-dimensional image tag corresponding to the source domain two-dimensional sample image;
obtaining corrected reconstruction loss by adopting a first loss calculation formula based on the source domain sample corrected image and the source domain virtual three-dimensional image tag, obtaining basis function expression loss by adopting a second loss calculation formula based on the first basis function sample vector and the reference basis function sample vector, and determining first model loss based on the corrected reconstruction loss and the basis function expression loss;
The first loss calculation satisfies a formula in which the corrected reconstruction loss is determined from the source domain virtual object three-dimensional image and the source domain virtual three-dimensional image tag; the second loss calculation satisfies a formula in which the basis function expression loss is determined from the first basis function sample vector, the reference basis function sample vector, and a transformation matrix between the first basis function sample vector and the reference basis function sample vector.
6. The method of claim 1, further comprising, after the obtaining the first basis function vector for the object facial reconstruction model in the source domain:
acquiring at least one target domain two-dimensional image in a target domain from at least one terminal device;
performing self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
And sending the second basis function vector to the terminal equipment so that the terminal equipment controls the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and performing image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
7. An image generation method applied to a terminal device, the method comprising:
acquiring a first basis function vector from a service platform, wherein the first basis function vector is a basis function vector of an object facial reconstruction model acquired from a source domain by the service platform;
Acquiring at least one target domain two-dimensional image in a target domain, and carrying out self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
Controlling the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and carrying out image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image;
Wherein the obtaining a second basis function vector based on the first basis function vector and the target domain two-dimensional image includes:
inputting the first basis function vector and the target domain two-dimensional image into a basis function cross-domain adjustment model, extracting the object facial features of the target domain two-dimensional image through the basis function cross-domain adjustment model, carrying out self-adaptive cross-domain adjustment based on the object facial features and the first basis function vector to obtain a second basis function vector, and outputting the second basis function vector.
8. The method of claim 7, wherein the basis function cross-domain adjustment model comprises at least a second object feature extraction network, a basis function self-adaptive network, and a cross-domain countermeasure classification network,
Extracting the object facial features of the target domain two-dimensional image through the basis function cross-domain adjustment model to perform self-adaptive cross-domain adjustment based on the object facial features and the first basis function vector to obtain a second basis function vector, including:
adopting the second object feature extraction network to perform feature extraction on the target domain two-dimensional image to obtain an object facial feature; adopting the basis function self-adaptive network to perform self-adaptive adjustment on the first basis function vector based on the object facial feature and the first basis function vector to obtain a second basis function vector; and adopting the cross-domain countermeasure classification network to perform domain type recognition on the second basis function vector to obtain a reference domain type for the second basis function vector, the reference domain type being used for performing domain type recognition supervision on the second basis function vector.
9. The method of claim 8, the method further comprising:
acquiring a first reference sample basis function vector under the source domain and a target domain two-dimensional sample image under the target domain, and performing model training on an initial basis function cross-domain adjustment model by adopting the target domain two-dimensional sample image and the first reference sample basis function vector;
In the model training process, extracting a sample object facial feature through the second object feature extraction network based on a target domain two-dimensional sample image, obtaining a second sample basis function vector through a basis function self-adaptive network based on the sample object facial feature and a first reference sample basis function vector, and obtaining a sample domain category through a cross-domain countermeasure classification network based on the sample basis function vector, wherein the sample basis function vector comprises a first sample basis function vector and/or a second sample basis function vector;
and calculating a second model loss based on the sample domain class, the sample domain class label and the sample basis function vector, and performing model parameter adjustment on a basis function adaptive network and the cross-domain countermeasure classification network in the initial basis function cross-domain adjustment model based on the second model loss.
10. The method of claim 9, the calculating a second model loss based on the sample domain class, sample domain class label, and the sample basis function vector, comprising:
Obtaining classification loss by adopting a third loss calculation formula based on the sample domain type and the sample domain type label, determining a mean basis vector and a variance basis vector based on the sample basis function vector, obtaining distribution loss by adopting a fourth loss calculation formula based on the mean basis vector and the variance basis vector, and determining a second model loss based on the classification loss and the distribution loss;
The third loss calculation satisfies the following formula:

Loss_acc-cls = -SoftminLoss(pred, y)

wherein Loss_acc-cls is the classification loss, pred is the sample domain class, and y is the sample domain class label;
The fourth loss calculation satisfies a formula over the mean basis vector and the variance basis vector, wherein Loss_distribution is the distribution loss, avg(Base_3DMM) is the mean basis vector, and std(Base_3DMM) is the variance basis vector.
11. The method of claim 7, wherein the performing the image quality enhancement process based on the three-dimensional avatar to obtain the target avatar comprises:
And inputting the three-dimensional image of the virtual object into a face tuning model for image quality enhancement processing, and outputting the target virtual object image.
12. The method of claim 11, the method further comprising:
acquiring at least one virtual object three-dimensional sample image generated by the object face reconstruction model based on the second basis function vector, and acquiring a virtual three-dimensional sample image tag corresponding to the virtual object three-dimensional sample image;
Inputting the three-dimensional sample image of the virtual object into an initial face tuning model to perform at least one round of model training;
In the model training process, performing three-dimensional space super-resolution processing on the three-dimensional sample image of the virtual object to obtain a target virtual object sample image, determining a two-dimensional projection image of the target virtual object sample image at least one preset angle in a two-dimensional space, and determining a two-dimensional projection image tag of the three-dimensional sample image of the virtual object at least one preset angle in the two-dimensional space;
Determining a third model loss by adopting a fifth loss calculation formula based on the two-dimensional projection image and the two-dimensional projection image label, and performing model parameter adjustment on the initial face tuning model by adopting the third model loss to obtain a face tuning model;
The fifth loss calculation satisfies the following formula:

Loss_sup = Σ_{i=1..N} ||2D_sup-i - 2D_GT-i||

wherein Loss_sup is the third model loss, 2D_sup-i is the two-dimensional projection image of the target virtual object sample image at a preset angle i in the two-dimensional space, 2D_GT-i is the two-dimensional projection image label of the virtual object three-dimensional sample image at the preset angle i in the two-dimensional space, and i and N are positive integers.
13. The method of claim 7, the method further comprising:
acquiring at least one target domain two-dimensional image in a target domain, and sending the target domain two-dimensional image to a service platform so that the service platform carries out self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
And acquiring the second basis function vector from a service platform, controlling the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and performing image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image.
14. An image generation apparatus, the apparatus comprising:
The data acquisition module is used for acquiring a first basis function vector aiming at the object facial reconstruction model in the source domain;
The data transmission module is used for transmitting the first basis function vector to at least one terminal device, so that the terminal device obtains at least one target domain two-dimensional image in a target domain, performs self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector, controls the object face reconstruction model to perform virtual object image driving based on the second basis function vector to obtain a virtual object three-dimensional image, and performs image quality enhancement processing based on the virtual object three-dimensional image to obtain a target virtual object image;
Wherein the obtaining a second basis function vector based on the first basis function vector and the target domain two-dimensional image includes:
The terminal equipment inputs the first basis function vector and the target domain two-dimensional image into a basis function cross-domain adjustment model, extracts the object facial features of the target domain two-dimensional image through the basis function cross-domain adjustment model, so as to obtain a second basis function vector by self-adaptive cross-domain adjustment based on the object facial features and the first basis function vector, and outputs the second basis function vector.
15. An image generation apparatus, the apparatus comprising:
the data acquisition module is used for acquiring a first basis function vector from a service platform, wherein the first basis function vector is a basis function vector of an object face reconstruction model acquired from a source domain by the service platform;
the object driving module is used for acquiring at least one target domain two-dimensional image in a target domain, and carrying out self-adaptive cross-domain adjustment processing on the first basis function vector based on the first basis function vector and the target domain two-dimensional image to obtain a second basis function vector;
The image processing module is used for controlling the object face reconstruction model to drive the virtual object image based on the second basis function vector to obtain a three-dimensional virtual object image, and carrying out image quality enhancement processing based on the three-dimensional virtual object image to obtain a target virtual object image;
Wherein the obtaining a second basis function vector based on the first basis function vector and the target domain two-dimensional image includes:
inputting the first basis function vector and the target domain two-dimensional image into a basis function cross-domain adjustment model, extracting the object facial features of the target domain two-dimensional image through the basis function cross-domain adjustment model, carrying out self-adaptive cross-domain adjustment based on the object facial features and the first basis function vector to obtain a second basis function vector, and outputting the second basis function vector.
16. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 6 or 7 to 13.
17. A computer program product storing at least one instruction for loading by a processor and performing the method steps of any of claims 1-6 or 7-13.
18. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-6 or 7-13.
CN202211693006.2A 2022-12-28 2022-12-28 Image generation method and device, storage medium and electronic equipment Active CN116246014B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211693006.2A CN116246014B (en) 2022-12-28 2022-12-28 Image generation method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211693006.2A CN116246014B (en) 2022-12-28 2022-12-28 Image generation method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN116246014A CN116246014A (en) 2023-06-09
CN116246014B true CN116246014B (en) 2024-05-14

Family

ID=86625185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211693006.2A Active CN116246014B (en) 2022-12-28 2022-12-28 Image generation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116246014B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11363416B2 (en) * 2019-10-04 2022-06-14 Samsung Electronics Co., Ltd. System and method for WiFi-based indoor localization via unsupervised domain adaptation

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106327571A (en) * 2016-08-23 2017-01-11 北京的卢深视科技有限公司 Three-dimensional face modeling method and three-dimensional face modeling device
CN111680697A (en) * 2020-03-17 2020-09-18 北京大学 Method, apparatus, electronic device, and medium for implementing domain adaptation
WO2022218396A1 (en) * 2021-04-16 2022-10-20 北京沃东天骏信息技术有限公司 Image processing method and apparatus, and computer readable storage medium
CN113628327A (en) * 2021-08-12 2021-11-09 聚好看科技股份有限公司 Head three-dimensional reconstruction method and equipment
CN114170379A (en) * 2021-11-30 2022-03-11 聚好看科技股份有限公司 Three-dimensional model reconstruction method, device and equipment
CN114494611A (en) * 2022-04-18 2022-05-13 清华大学 Intelligent three-dimensional reconstruction method, device, equipment and medium based on nerve basis function
CN115049016A (en) * 2022-07-20 2022-09-13 聚好看科技股份有限公司 Model driving method and device based on emotion recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An RBF-NN algorithm with transfer learning capability and its application; Xu Min et al.; CAAI Transactions on Intelligent Systems (智能系统学报); 2018-12-31 (No. 06); pp. 959-966 *

Also Published As

Publication number Publication date
CN116246014A (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN107578017B (en) Method and apparatus for generating image
CN111476871B (en) Method and device for generating video
CN107609506B (en) Method and apparatus for generating image
CN109993150B (en) Method and device for identifying age
CN109189544B (en) Method and device for generating dial plate
WO2020211573A1 (en) Method and device for processing image
CN112581635B (en) Universal quick face changing method and device, electronic equipment and storage medium
CN114863214A (en) Image generation model training method, image generation device, image generation medium, and image generation device
CN115131603A (en) Model processing method and device, storage medium and electronic equipment
CN115049068A (en) Model processing method and device, storage medium and electronic equipment
CN112381074B (en) Image recognition method and device, electronic equipment and computer readable medium
CN116798129A (en) Living body detection method and device, storage medium and electronic equipment
CN116246014B (en) Image generation method and device, storage medium and electronic equipment
CN116129534A (en) Image living body detection method and device, storage medium and electronic equipment
CN110717467A (en) Head pose estimation method, device, equipment and storage medium
CN116228391A (en) Risk identification method and device, storage medium and electronic equipment
CN111260756A (en) Method and apparatus for transmitting information
CN116152403B (en) Image generation method and device, storage medium and electronic equipment
WO2022178975A1 (en) Noise field-based image noise reduction method and apparatus, device, and storage medium
CN115620111A (en) Image identification method and device, storage medium and electronic equipment
CN112614049A (en) Image processing method, image processing device, storage medium and terminal
CN116778585A (en) Living body detection method and device, storage medium and electronic equipment
CN116071527B (en) Object processing method and device, storage medium and electronic equipment
CN113223128A (en) Method and apparatus for generating image
CN117056507A (en) Long text analysis method, long text analysis model training method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant