CN117649482A - Image processing method, device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN117649482A
CN117649482A
Authority
CN
China
Prior art keywords
image
face
sample
model
face image
Prior art date
Legal status
Pending
Application number
CN202211046131.4A
Other languages
Chinese (zh)
Inventor
鲍文杰
王乃洲
张玉兵
马雪浩
Current Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd and Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Priority to CN202211046131.4A
Publication of CN117649482A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/10 Geometric effects
    • G06T 15/20 Perspective computation
    • G06T 15/205 Image-based rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face


Abstract

The disclosure provides an image processing method, an image processing apparatus, an electronic device and a computer readable storage medium, and relates to the technical field of image processing. The method comprises the following steps: reconstructing a first face image to obtain a first face model; driving the first face model to perform motion transformation to obtain a second face model; rendering the second face model to obtain a second face image; and reconstructing detail information of the second face image to obtain a target face image with higher resolution than the second face image. By taking a low-precision 3D face model as input, the method simplifies the face modeling process, reduces the cost and time consumption of face modeling, and progressively improves rendering quality through a coarse-to-fine cascade.

Description

Image processing method, device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, an image processing device, an electronic device, and a computer readable storage medium.
Background
Three-dimensional (3D) face model rendering plays an important role in fields such as virtual digital human production, film production and game production. In traditional rendering based on rasterization or ray tracing, making the rendered face image lifelike often requires high-precision scanning equipment to scan a real face and build a 3D model, as well as high-compute equipment to render the face model into a 2D face image. However, high-precision scanning equipment and high-compute equipment are costly, and building a high-precision 3D face model also requires professional equipment to acquire a large amount of face point cloud data, a process that is complex and time-consuming. It therefore remains challenging to render high-quality digital face content (e.g., face video) within limited cost and time.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The disclosure aims to provide an image processing method, an image processing apparatus, an electronic device and a computer readable storage medium, which can solve the technical problems of the complex process and high cost of establishing a high-precision 3D face model.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to an aspect of the present disclosure, there is provided an image processing method including:
reconstructing a first face image to obtain a first face model, wherein the first face model has fewer face characteristic parameters than the first face image;
driving the first face model to perform motion transformation to obtain a second face model;
rendering the second face model to obtain a second face image;
reconstructing detail information of the second face image to obtain a target face image with higher resolution than the second face image, wherein the detail information at least comprises texture information and illumination information.
Optionally, the step of reconstructing the detail information of the second face image to obtain the target face image with a resolution higher than that of the second face image includes:
constructing a neural renderer based on an adversarial network;
reconstructing the detail information of the second face image by adopting the neural renderer to obtain a target face image with higher resolution than the second face image.
Optionally, the neural renderer includes a generator and a discriminator, the generator is connected with the discriminator, and the step of constructing the neural renderer based on the adversarial network includes:
acquiring a training data pair, wherein the training data pair comprises a first sample face image and a second sample face image, and the resolution of the first sample face image is lower than that of the second sample face image;
inputting the first sample face image into the generator to obtain a sample pseudo image, and fixing first model parameters of the discriminator;
determining a first loss function of the generator according to the second sample face image and the sample pseudo image, and updating a second model parameter of the generator according to the first loss function;
constructing positive sample data and negative sample data according to the training data pairs and the sample pseudo images;
inputting the positive sample data and the negative sample data into the discriminator to respectively obtain a positive sample image corresponding to the positive sample data and a negative sample image corresponding to the negative sample data, and fixing the second model parameters;
determining a second loss function of the discriminator according to the positive sample image and the negative sample image, and updating the first model parameters according to the second loss function;
determining a target loss function of the neural renderer according to the first loss function and the second loss function;
and stopping updating the first model parameter and the second model parameter under the condition that the curve of the target loss function tends to be stable and the positive sample image and the negative sample image reach preset expectations, so as to complete the construction of the neural renderer.
Optionally, the step of constructing positive sample data and negative sample data according to the training data pair and the sample pseudo image includes:
splicing the first sample face image and the second sample face image according to the image channel dimension to obtain the positive sample data; and
splicing the first sample face image and the sample pseudo image according to the image channel dimension to obtain the negative sample data.
Optionally, the step of determining the second loss function of the discriminator according to the positive sample image and the negative sample image includes:
acquiring a first probability result corresponding to the positive sample image and a second probability result corresponding to the negative sample image;
and determining a second loss function of the discriminator according to the first probability result and the second probability result.
Optionally, the step of determining the target loss function of the neural renderer according to the first loss function and the second loss function includes:
extracting a first high-dimensional feature of the second sample face image and a second high-dimensional feature of the sample pseudo image;
acquiring a third loss function according to the characteristic difference between the first high-dimensional characteristic and the second high-dimensional characteristic;
and determining the target loss function according to the sum of the first loss function, the second loss function and the third loss function.
Optionally, the generator comprises a neural network and the discriminator comprises a fully convolutional network.
Optionally, the step of reconstructing the first face image to obtain the first face model includes:
acquiring a trained face model reconstruction network;
and reconstructing the first face image by adopting the face model reconstruction network to obtain the first face model.
Optionally, the step of driving the first face model to perform motion transformation to obtain a second face model includes:
acquiring a trained face driving module;
and driving the first face model to perform motion transformation according to the target motion by adopting the face driving module to obtain the second face model.
According to another aspect of the present disclosure, there is provided an image processing apparatus including:
the model building module is used for reconstructing the first face image to obtain a first face model, wherein the first face model has fewer face characteristic parameters than the first face image;
the image driving module is used for driving the first face model to perform motion transformation to obtain a second face model;
the image rendering module is used for rendering the second face model to obtain a second face image;
and the image reconstruction module is used for reconstructing detail information of the second face image to obtain a target face image with resolution higher than that of the second face image, wherein the detail information at least comprises texture information and illumination information.
According to still another aspect of the present disclosure, there is provided an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the image processing method as described in the above embodiment when executing the computer program.
According to still another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method as described in the above embodiments.
The image processing method, the device, the electronic equipment and the computer readable storage medium provided by the embodiment of the disclosure have the following technical effects:
according to the technical scheme, a rough result can be provided for the method by using a 3D face model reconstruction network based on a single Zhang Ren face image, then the details of the face image are finished by adopting an image-to-image method, namely, firstly, the first face image is reconstructed to obtain a low-precision first face model, then the first face model is driven to perform motion transformation to obtain a second face model, then, the second face model is rendered to obtain a rough second face image, finally, the detail information of the second face image is reconstructed to obtain a technical scheme of a target face image with higher resolution than that of the second face image, and by using the low-precision 3D face model as input, the face modeling process is simplified, the face modeling cost and time consumption are reduced, and the gradual rendering quality improvement by using a rough-to-fine cascading method is realized.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 shows a flow diagram of an image processing method in an exemplary embodiment of the present disclosure;
FIG. 2 is a diagram illustrating an image corresponding to an image processing method according to an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic flow chart corresponding to step S140 in the image processing method of the present disclosure;
FIG. 4 illustrates an exemplary flow diagram corresponding to the presently disclosed construction of a neural renderer;
FIG. 5 shows a flow diagram for determining a target loss function for a neural renderer;
fig. 6 shows a schematic configuration diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure;
Fig. 7 shows a schematic structural diagram of an electronic device in an exemplary embodiment of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present disclosure more apparent, the embodiments of the present disclosure will be described in further detail below with reference to the accompanying drawings.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not meant to be all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the disclosure as detailed in the accompanying claims.
It is noted that the above-described figures are merely schematic illustrations of processes involved in a method according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Embodiments of an image processing method provided by the present disclosure are described below. Wherein fig. 1 shows a flow diagram of an image processing method in an exemplary embodiment of the present disclosure; fig. 2 shows an image schematic diagram corresponding to the image processing method in an exemplary embodiment of the present disclosure. As shown in fig. 1 and 2, an image processing method provided by an embodiment of the method of the present disclosure includes the following schemes:
Step S110: and reconstructing the first face image to obtain a first face model.
In an exemplary embodiment, the first face image may be understood as a real face image, i.e. R_i in fig. 2. The real face image includes all features and parts of the human head, such as hair, ears, eyebrows, eyes, nose, mouth, chin and neck. Reconstructing the first face image means modeling, from the first face image, a corresponding low-precision 3D face model. This 3D face model is in fact a BFM (Basel Face Model) face model, and is called the first face model, i.e. M_i in fig. 2. The first face model is a 3D model of the face only, excluding occluded parts such as hair, ears, chin and neck. Here M_i = {α_i, β_i, γ_i, δ_i, θ_i}, i = 1, …, m, where α_i denotes the expression base coefficients, β_i the identity coefficients, γ_i the albedo coefficients, δ_i the illumination coefficients and θ_i the pose coefficients; M_i denotes the i-th model and m the number of models. The first face model is low-precision relative to the first face image: it has fewer face characteristic parameters than the first face image, where the face characteristic parameters represent the parameters of the various parts of the face (for example, the parameters of the eye region). That the first face model is low-precision relative to the first face image can also be understood as meaning that the facial detail information of the first face model deviates from that of the first face image, i.e. the similarity between the first face model and the first face image is low.
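For illustration only, the coefficient set can be held in a simple container. A minimal Python sketch in which the field layout follows M_i above, while the dimensions are assumptions rather than values given in this disclosure:

```python
from dataclasses import dataclass
import torch

@dataclass
class BFMCoeffs:
    """One low-precision face model M_i = {alpha, beta, gamma, delta, theta}."""
    alpha: torch.Tensor  # expression base coefficients
    beta: torch.Tensor   # identity coefficients
    gamma: torch.Tensor  # albedo coefficients
    delta: torch.Tensor  # illumination coefficients
    theta: torch.Tensor  # pose coefficients

# Illustrative dimensions only; the disclosure does not fix them.
m_i = BFMCoeffs(alpha=torch.zeros(64), beta=torch.zeros(80),
                gamma=torch.zeros(80), delta=torch.zeros(27),
                theta=torch.zeros(6))
```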
Optionally, the first face model may be obtained through a face model reconstruction network, that is, step S110 includes the following schemes:
acquiring a trained face model reconstruction network;
and reconstructing the first face image by adopting the face model reconstruction network to obtain the first face model.
It should be appreciated that the face model reconstruction network, also referred to as a 3D face model reconstruction network and denoted C in fig. 2, is pre-trained for the construction of 3D face models, and the constructed 3D face models are low-precision. After the first face image is obtained, the trained face model reconstruction network is acquired, the first face image is input into the face model reconstruction network, and the face model reconstruction network performs three-dimensional reconstruction on the first face image to obtain the low-precision first face model.
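The disclosure does not fix the architecture of the face model reconstruction network C. A minimal PyTorch sketch of one common choice, a CNN backbone regressing a concatenated coefficient vector from a single image; the ResNet-50 backbone and the 257-dimensional output are assumptions, not specifics of this patent:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class FaceReconNet(nn.Module):
    """Regress the BFM coefficient vector of M_i from a single face image R_i."""
    def __init__(self, coeff_dim: int = 257):  # 257 is an assumed total dimension
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, coeff_dim)
        self.backbone = backbone

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (B, 3, H, W) first face image -> (B, coeff_dim) coefficients
        return self.backbone(img)

recon_net = FaceReconNet()
coeffs = recon_net(torch.randn(1, 3, 224, 224))  # one first face model per image
```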
Step S120: and driving the first face model to perform motion transformation to obtain a second face model.
After the first face model is obtained, the first face model is driven to perform motion transformation, i.e. the first face model is made to perform a corresponding target action, so as to obtain a second face model, i.e. M'_i. The second face model is also a low-precision 3D model of the face; it differs from the first face model in the action of the face. As shown in fig. 2, the mouth of the first face model M_i is slightly open and the target action is to open the mouth wide, i.e. to increase the opening of the mouth; the mouth opening of the second face model M'_i is therefore larger than that of the first face model M_i, which is the distinction between the second face model M'_i and the first face model M_i.
Optionally, the second face model may be obtained by the face driving module, that is, step S120 includes the following schemes:
acquiring a trained face driving module;
and driving the first face model to perform motion transformation according to the target motion by adopting the face driving module to obtain the second face model.
The face driving module, also referred to as a 3D face driving module and denoted D in fig. 2, is pre-trained. The face driving module may be a neural network capable of generating the corresponding BFM model coefficients, or the BFM model coefficients may be manually set parameters. The purpose of the face driving module is to drive the face model to make the desired target action, so as to complete tasks such as video production.
After the first face model is obtained, the trained face driving module is acquired, and the face driving module then drives the first face model to perform motion transformation according to the target action, yielding the motion-transformed first face model, i.e. the second face model M'_i. For example, if the mouth of the first face model is closed and the target action is to open the mouth, the face driving module drives the first face model to open its mouth, so that the mouth of the resulting second face model M'_i is open.
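As a sketch of how a driving step can act on the coefficient representation: since M_i is fully described by {α, β, γ, δ, θ}, one simple form of driving replaces the expression and pose coefficients with those of the target action while keeping identity and albedo. The helper below is hypothetical, not a component named in the disclosure:

```python
import torch

def drive_face_model(coeffs: dict, target_action: dict) -> dict:
    """Produce M'_i from M_i: overwrite the motion-related coefficients while
    keeping the identity (beta) and albedo (gamma) of the source face."""
    driven = dict(coeffs)
    driven["alpha"] = target_action["alpha"]  # target expression, e.g. open mouth
    driven["theta"] = target_action["theta"]  # target head pose
    return driven

m_i = {k: torch.zeros(8) for k in ("alpha", "beta", "gamma", "delta", "theta")}
target = {"alpha": torch.ones(8), "theta": torch.zeros(8)}
m_prime = drive_face_model(m_i, target)  # second face model coefficients
```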
Step S130: and rendering the second face model to obtain a second face image.
After the second face model is obtained, the second face model is rendered; rendering the second face model means performing raster rendering on it to obtain a coarse second face image. The second face image is a 2D image. It is broadly similar to the first face image, differing from it in the facial action, and its resolution is lower than that of the first face image, i.e. the second face image is coarse, or in other words not clear.
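The disclosure does not name a rasterizer. A minimal sketch using PyTorch3D (an assumed tool choice) with stand-in geometry; in practice the vertices, faces and vertex colors would come from the BFM coefficients of M'_i:

```python
import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (
    FoVPerspectiveCameras, MeshRasterizer, MeshRenderer,
    RasterizationSettings, SoftPhongShader, TexturesVertex,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
verts = torch.rand(500, 3, device=device)               # stand-in for BFM vertices
faces = torch.randint(0, 500, (900, 3), device=device)  # stand-in triangle indices
mesh = Meshes(
    verts=[verts], faces=[faces],
    textures=TexturesVertex(verts_features=torch.ones(1, 500, 3, device=device)),
)
cameras = FoVPerspectiveCameras(device=device)
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(
        cameras=cameras,
        raster_settings=RasterizationSettings(image_size=256),
    ),
    shader=SoftPhongShader(device=device, cameras=cameras),
)
coarse_image = renderer(mesh)  # (1, 256, 256, 4) RGBA coarse second face image
```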
Step S140: reconstructing detail information of the second face image to obtain a target face image with higher resolution than the second face image.
The detail information includes at least texture information and illumination information. After the second face image is obtained, the detail information of the second face image is reconstructed to obtain a target face image with higher resolution than the second face image, i.e. a high-resolution, high-precision face image: a realistic target face image. F_i in fig. 2 represents the target face image. Reconstructing the detail information of the second face image can be understood as completing the missing detail information of the second face image, so that the resulting target face image is high-resolution.
According to the technical scheme of this embodiment, a 3D face model reconstruction network based on a single face image provides a coarse result for the method, and an image-to-image approach then completes the details of the face image. That is, the first face image is first reconstructed to obtain a low-precision first face model; the first face model is driven to perform motion transformation to obtain a second face model; the second face model is rendered to obtain a coarse second face image; and finally the detail information of the second face image is reconstructed to obtain a target face image with higher resolution than the second face image. By taking a low-precision 3D face model as input, the scheme simplifies the face modeling process, reduces the cost and time consumption of face modeling, and progressively improves rendering quality through a coarse-to-fine cascade.
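Putting the four steps together, a hedged end-to-end sketch; all callables are hypothetical stand-ins for the modules in fig. 2, not interfaces defined by the disclosure:

```python
import torch

def render_target_face(real_image: torch.Tensor, target_action: dict,
                       recon_net, driver, rasterize, neural_renderer) -> torch.Tensor:
    """Coarse-to-fine cascade of the disclosure (names follow fig. 2)."""
    m_i = recon_net(real_image)            # S110: first face model, low precision
    m_prime = driver(m_i, target_action)   # S120: second face model, target action
    coarse = rasterize(m_prime)            # S130: coarse second face image
    with torch.no_grad():
        fine = neural_renderer(coarse)     # S140: high-resolution target face image
    return fine
```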
Fig. 3 is a schematic flow chart corresponding to step S140 in the image processing method of the present disclosure. Optionally, based on the above method embodiment, step S140 includes the following schemes:
step S141: constructing a neural renderer based on an adversarial network;
step S142: reconstructing the detail information of the second face image by adopting the neural renderer to obtain a target face image with higher resolution than the second face image.
In an exemplary embodiment, the target face image may be obtained by reconstructing the details of the second face image with a neural renderer based on an adversarial network; in other words, the present disclosure uses a conditional generative adversarial network as the neural renderer. The neural renderer needs to be constructed, i.e. trained, in advance. The coarse second face image is then input into the trained neural renderer, which reconstructs detail information such as texture and illumination of the second face image to obtain a target face image with higher resolution than the second face image, i.e. a realistic 2D target face image, which is then produced as the final output.
By way of example, fig. 4 illustrates an exemplary flow diagram corresponding to constructing the neural renderer of the present disclosure. Optionally, based on the above method embodiment, the neural renderer includes a generator and a discriminator, the output of the generator is connected with the input of the discriminator, the generator is used for generating pseudo images, and the discriminator is used for distinguishing true images from false images. Step S141 includes the following schemes:
acquiring training data pairs;
inputting the first sample face image into the generator to obtain a sample pseudo image, and fixing first model parameters of the discriminator;
determining a first loss function of the generator according to the second sample face image and the sample pseudo image, and updating a second model parameter of the generator according to the first loss function;
constructing positive sample data and negative sample data according to the training data pairs and the sample pseudo images;
inputting the positive sample data and the negative sample data into the discriminator to respectively obtain a positive sample image corresponding to the positive sample data and a negative sample image corresponding to the negative sample data, and fixing the second model parameters;
determining a second loss function of the discriminator according to the positive sample image and the negative sample image, and updating the first model parameters according to the second loss function;
determining a target loss function of the neural renderer according to the first loss function and the second loss function;
and stopping updating the first model parameter and the second model parameter under the condition that the curve of the target loss function tends to be stable and the positive sample image and the negative sample image reach preset expectations, so as to complete the construction of the neural renderer.
In an exemplary embodiment, the generator and the discriminator are trained alternately during the training of the neural renderer: when the model parameters of the generator are updated, the model parameters of the discriminator are fixed; and when the model parameters of the discriminator are updated, the model parameters of the generator are fixed.
The training data pair comprises a first sample face image and a second sample face image, where the resolution of the first sample face image is lower than that of the second sample face image; the first sample face image is coarse relative to the second sample face image. The second sample face image is a real face image, and the first sample face image is the coarse face image obtained after the second sample face image has passed, in sequence, through the face model reconstruction network, the face driving module and raster rendering.
There are multiple first sample face images and second sample face images, which can be understood as image sequences, i.e. a first sample face image sequence and a second sample face image sequence, with the same number of images in each. The first sample face image sequence is denoted I = {i_1, i_2, …, i_k}, where k is the number of first sample face images, and the second sample face image sequence is denoted I^r = {i^r_1, i^r_2, …, i^r_k}, where k is the number of second sample face images. The training data pairs are then denoted {(i_j, i^r_j)}, j = 1, …, n, where n is the number of training data pairs, i.e. n = k, and each first sample face image forms one data pair with its corresponding second sample face image.
Training of the generator: after the training data pairs are acquired, the generator is trained with the first sample face images in the training data pairs. The first sample face images in the first sample face image sequence are input into the generator in turn, and the generator generates sample pseudo images; there are as many generated sample pseudo images as first sample face images, and the sample pseudo image sequence is denoted I' = {i'_1, i'_2, …, i'_k}, where k is the number of sample pseudo images. After the sample pseudo image sequence is obtained, the first loss function of the generator is determined from the quality difference between the sample pseudo images in the sample pseudo image sequence and the second sample face images in the second sample face image sequence, and the second model parameters of the generator are then continually updated according to the first loss function, realizing continual training of the generator. The quality difference can be represented by a difference in resolution, and the first loss function is set to the L1 loss, denoted L_l1. The first model parameters of the discriminator are fixed during the training of the generator, i.e. they are held constant throughout the updating of the second model parameters of the generator. Through continual training of the generator, the quality of I' gets closer and closer to that of I^r, so that the rendered face becomes more and more realistic. Optionally, the generator includes a neural network; to obtain a better image generation effect, a U-shaped network (U-Net) may be used as the network structure of the generator. Since the neural renderer adopts a conditional GAN among generative adversarial networks (GAN, Generative Adversarial Networks), the generator takes the first sample face image i as a conditional input instead of taking a series of noise inputs, which greatly reduces the search space of the neural network, making the training of the generator converge faster while achieving better model performance.
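For illustration, a minimal PyTorch sketch of one such generator update: the discriminator's (first) model parameters are frozen and the generator is pulled toward the real images by the L1 loss, as described above. The helper name and optimizer handling are assumptions; note that many conditional GAN implementations also back-propagate an adversarial term to the generator, whereas this sketch follows the text's L1-only description:

```python
import torch
import torch.nn.functional as F

def generator_step(generator, discriminator, optimizer_g,
                   coarse: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
    """One update of the generator's second model parameters; the
    discriminator's first model parameters stay fixed throughout."""
    for p in discriminator.parameters():
        p.requires_grad_(False)        # fix the first model parameters

    fake = generator(coarse)           # sample pseudo image i'
    loss_l1 = F.l1_loss(fake, real)    # first loss function L_l1 against i^r

    optimizer_g.zero_grad()
    loss_l1.backward()
    optimizer_g.step()
    return fake.detach()               # reused when training the discriminator
```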
Training of the discriminator: the role of the discriminator is to distinguish real images from fake ones, feed this back to the generator, and play against the generator; in this continual game both can attain better model performance. In a conditional generative adversarial network, the input of the discriminator is divided into two parts: one part is the second sample face image sequence I^r = {i^r_1, …, i^r_k} together with the sample pseudo image sequence I' = {i'_1, …, i'_k}, and the other part is the conditional input, i.e. the first sample face image sequence I = {i_1, …, i_k}.
After the second model parameters of the generator have been updated, the discriminator is trained: an authenticity data pair is constructed from the training data pairs and the sample pseudo images. The authenticity data pair comprises positive sample data and negative sample data, which carry numeric labels; for example, the label of the positive sample data is set to 1 and that of the negative sample data to 0. Optionally, constructing the positive sample data and the negative sample data according to the training data pairs and the sample pseudo images includes: splicing the first sample face image and the second sample face image along the image channel dimension to obtain the positive sample data, and splicing the first sample face image and the sample pseudo image along the image channel dimension to obtain the negative sample data.
For example, the second sample face image sequence I^r and the sample pseudo image sequence I' serve as the first part of the input, and the first sample face image sequence I serves as the other part, where i_i, i^r_i, i'_i ∈ R^{N×N×3}, N denotes the resolution of the images, and 3 indicates that the input images all have three channels. The first sample face image i_i and the second sample face image i^r_i are stitched along the third (channel) dimension of the image to obtain the positive sample data, denoted (i_i, i^r_i). Likewise, the first sample face image i_i and the sample pseudo image i'_i are stitched along the third (channel) dimension of the image to obtain the negative sample data, denoted (i_i, i'_i).
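In the (B, C, H, W) tensor layout common in PyTorch, the "third image channel dimension" of the N×N×3 images corresponds to dim=1; a minimal sketch of the stitching, with a hypothetical helper name:

```python
import torch

def make_sample_pairs(coarse: torch.Tensor, real: torch.Tensor, fake: torch.Tensor):
    """coarse, real, fake: (B, 3, N, N) tensors for i_i, i^r_i, i'_i."""
    positive = torch.cat([coarse, real], dim=1)  # (i_i, i^r_i), label 1
    negative = torch.cat([coarse, fake], dim=1)  # (i_i, i'_i),  label 0
    return positive, negative

pos, neg = make_sample_pairs(torch.rand(2, 3, 256, 256),
                             torch.rand(2, 3, 256, 256),
                             torch.rand(2, 3, 256, 256))
assert pos.shape == (2, 6, 256, 256)  # three channels from each image
```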
After the positive sample data and the negative sample data have been constructed, they are input into the discriminator separately to obtain a positive sample image corresponding to the positive sample data and a negative sample image corresponding to the negative sample data. The second loss function of the discriminator is then determined from the positive sample image and the negative sample image, and the first model parameters of the discriminator are continually updated according to the second loss function, realizing continual training of the discriminator. The second loss function is set to the adversarial loss, denoted L_GAN. During training of the discriminator, the second model parameters of the generator are fixed, i.e. they are held constant throughout the updating of the first model parameters of the discriminator. To obtain better local detail, the discriminator includes a fully convolutional network; for example, a PatchGAN fully convolutional network with multi-scale output may be employed as the network structure of the discriminator.
Optionally, the determining the second loss function of the discriminator according to the positive sample image and the negative sample image includes: acquiring a first probability result corresponding to the positive sample image and a second probability result corresponding to the negative sample image; and determining the second loss function of the discriminator according to the first probability result and the second probability result.
It should be understood that after the positive sample data and the negative sample data are input into the discriminator separately, the positive sample image output by the discriminator carries a first probability result and the negative sample image carries a second probability result. Both results comprise probability values that the corresponding image is a true image and probability values that it is a false image; there are multiple probability values, each representing the probability that a piece of information in the image output by the discriminator is true or false. After the first probability result and the second probability result are obtained, the second loss function of the discriminator is computed from them by a cross-entropy calculation.
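A minimal sketch of this second loss function, assuming the discriminator outputs per-patch probabilities (with raw logits one would use binary_cross_entropy_with_logits instead):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_pos: torch.Tensor, d_neg: torch.Tensor) -> torch.Tensor:
    """d_pos / d_neg: per-patch probability maps (first / second probability
    results). Cross entropy pushes positives toward 1 and negatives toward 0."""
    loss_real = F.binary_cross_entropy(d_pos, torch.ones_like(d_pos))
    loss_fake = F.binary_cross_entropy(d_neg, torch.zeros_like(d_neg))
    return loss_real + loss_fake

l_gan = discriminator_loss(torch.rand(2, 1, 30, 30), torch.rand(2, 1, 30, 30))
```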
To make the generated face images clear and realistic, after the first loss function and the second loss function are obtained, the target loss function of the neural renderer is determined from them, denoted L_r, with L_r = L_GAN + L_l1, where L_GAN and L_l1 are the adversarial loss and the L1 loss respectively; joint training with the two combined yields higher-quality face images.
After the target loss function of the neural renderer is obtained, and while the generator and the discriminator continue to be trained in turn, it is checked whether the curve of the target loss function is stable and whether the positive sample image and the negative sample image output by the discriminator reach the preset expectations. If the curve of the target loss function tends to be stable and the positive sample image and the negative sample image reach the preset expectations, the model is considered to have converged: updating of the first model parameters of the discriminator and the second model parameters of the generator is stopped, and the training of the discriminator and the generator is finished, indicating that the construction of the neural renderer is complete. The constructed neural renderer can then be used to generate realistic 2D face images.
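A compact sketch of the alternating schedule and the stopping check; generator_step and discriminator_step are hypothetical closures over the models and optimizers (see the sketches above), and the relative-change plateau test is only one possible reading of "the curve of the target loss function tends to be stable":

```python
def train_neural_renderer(loader, generator_step, discriminator_step,
                          max_epochs: int = 100, patience: int = 5) -> None:
    """Alternate the two updates until the target loss curve flattens."""
    history, stale = [], 0
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for coarse, real in loader:
            fake = generator_step(coarse, real)                   # discriminator frozen
            epoch_loss += discriminator_step(coarse, real, fake)  # generator frozen
        history.append(epoch_loss)
        if len(history) > 1 and abs(history[-1] - history[-2]) < 1e-3 * abs(history[-2]):
            stale += 1   # curve of the target loss function is (nearly) stable
        else:
            stale = 0
        if stale >= patience:  # stop updating both parameter sets
            break
```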
By way of example, fig. 5 shows a flow diagram for determining the target loss function of the neural renderer. Optionally, based on the above method embodiment, the determining the target loss function of the neural renderer according to the first loss function and the second loss function includes:
Step 1411: extracting a first high-dimensional feature of the second sample face image and a second high-dimensional feature of the sample pseudo image;
step 1412: acquiring a third loss function according to the characteristic difference between the first high-dimensional characteristic and the second high-dimensional characteristic;
step 1413: and determining the target loss function according to the sum of the first loss function, the second loss function and the third loss function.
In an exemplary embodiment, the present disclosure also introduces a perceptual loss to better enhance the sharpness of image quality details. A pretrained VGG16 network is used to extract the first high-dimensional feature of the second sample face image and the second high-dimensional feature of the sample pseudo image respectively; the feature difference between the first and second high-dimensional features is then calculated, the loss over the high-dimensional image features is expressed by this feature difference, and the third loss function L_per is obtained. Constraining the model through the third loss function L_per makes the model generate higher-quality face images. The third loss function L_per is calculated as:
L_per = ‖VGG(i^r_i) − VGG(i'_i)‖, where VGG(i^r_i) denotes the first high-dimensional feature and VGG(i'_i) denotes the second high-dimensional feature.
After the third loss function is obtained, the sum of the first loss function, the second loss function and the third loss function is calculated to obtain the target loss function of the neural renderer, namely:
L_r = L_GAN + L_l1 + L_per
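A minimal PyTorch sketch of L_per and the combined L_r; the VGG16 cut point (relu3_3) and the L1 distance between feature maps are assumptions, as the text only specifies a pretrained VGG16 and a feature difference:

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen VGG16 feature extractor; features[:16] ends at relu3_3 (an assumption).
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    """L_per: distance between high-dimensional VGG features of i^r_i and i'_i.
    Inputs are assumed normalized with ImageNet statistics."""
    return F.l1_loss(vgg(real), vgg(fake))

def target_loss(l_gan: torch.Tensor, l_l1: torch.Tensor,
                real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    """L_r = L_GAN + L_l1 + L_per."""
    return l_gan + l_l1 + perceptual_loss(real, fake)
```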
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 6 shows a schematic configuration diagram of an image processing apparatus to which embodiments of the present disclosure can be applied. Referring to fig. 6, the image processing apparatus shown in the figure may be implemented as the whole or a part of a terminal through software, hardware or a combination of the two, or may be integrated in the terminal or on a server as a separate module.
The image processing apparatus 600 in the embodiment of the present disclosure, the image processing apparatus 600 includes:
the model building module 610 is configured to reconstruct a first face image to obtain a first face model, wherein the first face model has fewer face feature parameters than the first face image;
the image driving module 620 is configured to drive the first face model to perform motion transformation to obtain a second face model;
an image rendering module 630, configured to render the second face model to obtain a second face image;
the image reconstruction module 640 is configured to reconstruct detail information of the second face image, to obtain a target face image with a resolution higher than that of the second face image, where the detail information includes at least texture information and illumination information.
In an exemplary embodiment, based on the foregoing scheme, the image reconstruction module 640 includes:
a renderer construction unit for constructing a neural renderer based on the adversarial network;
and a detail reconstruction unit for reconstructing the detail information of the second face image by adopting the neural renderer to obtain a target face image with higher resolution than the second face image.
In an exemplary embodiment, based on the foregoing, the neural renderer includes a generator and a discriminator, and the renderer construction unit includes:
a data pair acquisition unit configured to acquire a training data pair including a first sample face image and a second sample face image, the first sample face image having a resolution lower than that of the second sample face image;
a pseudo-image generating unit, configured to input the first sample face image into the generator, obtain a sample pseudo-image, and fix a first model parameter of the discriminator;
a first loss function determining unit, configured to determine a first loss function of the generator according to the second sample face image and the sample pseudo image, and update a second model parameter of the generator according to the first loss function;
a positive and negative sample construction unit, configured to construct positive sample data and negative sample data according to the training data pairs and the sample pseudo images;
a sample image generating unit configured to input the positive sample data and the negative sample data into the discriminator, obtain a positive sample image corresponding to the positive sample data and a negative sample image corresponding to the negative sample data respectively, and fix the second model parameters;
a second loss function determining unit configured to determine a second loss function of the discriminator based on the positive sample image and the negative sample image, and update the first model parameters based on the second loss function;
a target loss function determining unit configured to determine a target loss function of the neural renderer based on the first loss function and the second loss function;
and a judging unit, configured to stop updating the first model parameter and the second model parameter in a case where the curve of the target loss function tends to be stable and the positive sample image and the negative sample image reach a preset expectation, so as to complete the construction of the neural renderer.
In an exemplary embodiment, based on the foregoing solution, the positive and negative sample building unit includes:
a first splicing subunit, configured to splice the first sample face image and the second sample face image according to the image channel dimension to obtain the positive sample data; and
a second splicing subunit, configured to splice the first sample face image and the sample pseudo image according to the image channel dimension to obtain the negative sample data.
In an exemplary embodiment, based on the foregoing aspect, the second loss function determining unit includes:
the probability result obtaining subunit is used for obtaining a first probability result corresponding to the positive sample image and a second probability result corresponding to the negative sample image;
and a second loss function determining subunit, configured to determine the second loss function of the discriminator according to the first probability result and the second probability result.
In an exemplary embodiment, based on the foregoing scheme, the objective loss function determining unit includes:
a feature extraction subunit, configured to extract a first high-dimensional feature of the second sample face image and a second high-dimensional feature of the sample pseudo image;
A third loss function determining subunit, configured to obtain a third loss function according to a feature difference between the first high-dimensional feature and the second high-dimensional feature;
and a target loss function determining subunit configured to determine the target loss function according to a sum of the first loss function, the second loss function, and the third loss function.
In an exemplary embodiment, based on the foregoing, the generator includes a neural network and the discriminator includes a fully convolutional network.
In an exemplary embodiment, based on the foregoing solution, the model building module 610 includes:
the reconstruction network acquisition unit is used for acquiring a trained face model reconstruction network;
the face model building unit is used for reconstructing the first face image by adopting the face model reconstruction network to obtain the first face model.
In an exemplary embodiment, based on the foregoing scheme, the image driving module 620 includes:
the driving module network acquisition unit is used for acquiring a trained face driving module;
and the image driving unit is used for driving the first face model to perform motion transformation according to the target motion by adopting the face driving module to obtain the second face model.
It should be noted that, in the image processing apparatus provided in the foregoing embodiment, when the image processing method is executed, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be performed by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the image processing apparatus and the image processing method provided in the foregoing embodiments belong to the same concept, so for details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the image processing method of the present disclosure, and details are not repeated here.
The foregoing embodiment numbers of the present disclosure are merely for description and do not represent advantages or disadvantages of the embodiments.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods of the previous embodiments. The computer readable storage medium may include, among other things, any type of disk including floppy disks, optical disks, DVDs, CD-ROMs, micro-drives, and magneto-optical disks, ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
The disclosed embodiments also provide an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods of the embodiments described above when the processor executes the program.
Fig. 7 schematically shows a structural schematic of the electronic device. Referring to fig. 7, an electronic device 700 includes: a processor 701 and a memory 702.
In the embodiment of the disclosure, the processor 701 is the control center of the computer system and may be the processor of a physical machine or of a virtual machine. The processor 701 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 701 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also referred to as a CPU (Central Processing Unit); a coprocessor is a low-power processor for processing data in the standby state.
In the embodiment of the present disclosure, the processor 701 is specifically configured to: reconstructing a first face image to obtain a first face model, wherein the first face model has fewer face characteristic parameters than the first face image; driving the first face model to perform motion transformation to obtain a second face model; rendering the second face model to obtain a second face image; and reconstructing detail information of the second face image to obtain a target face image with higher resolution than the second face image, wherein the detail information at least comprises texture information and illumination information.
Further, the processor 701 is further configured to: constructing a neural renderer based on an adversarial network; and reconstructing the detail information of the second face image by adopting the neural renderer to obtain a target face image with higher resolution than the second face image.
Further, the processor 701 is further configured to: acquiring a training data pair, wherein the training data pair comprises a first sample face image and a second sample face image, and the resolution of the first sample face image is lower than that of the second sample face image; inputting the first sample face image into the generator to obtain a sample pseudo image, and fixing first model parameters of the discriminator; determining a first loss function of the generator according to the second sample face image and the sample pseudo image, and updating second model parameters of the generator according to the first loss function; constructing positive sample data and negative sample data according to the training data pair and the sample pseudo image; inputting the positive sample data and the negative sample data into the discriminator to respectively obtain a positive sample image corresponding to the positive sample data and a negative sample image corresponding to the negative sample data, and fixing the second model parameters; determining a second loss function of the discriminator according to the positive sample image and the negative sample image, and updating the first model parameters according to the second loss function; determining a target loss function of the neural renderer according to the first loss function and the second loss function; and stopping updating the first model parameters and the second model parameters under the condition that the curve of the target loss function tends to be stable and the positive sample image and the negative sample image reach preset expectations, so as to complete the construction of the neural renderer.
Further, the processor 701 is further configured to: splicing the first sample face image and the second sample face image according to the dimension of the image channel to obtain the positive sample data; and splicing the first sample face image and the sample pseudo image according to the dimension of the image channel to obtain the negative sample data.
Further, the processor 701 is further configured to: acquiring a first probability result corresponding to the positive sample image and a second probability result corresponding to the negative sample image; and determining a second loss function of the discriminator according to the first probability result and the second probability result.
Further, the processor 701 is further configured to: extracting a first high-dimensional feature of the second sample face image and a second high-dimensional feature of the sample pseudo image; acquiring a third loss function according to the feature difference between the first high-dimensional feature and the second high-dimensional feature; and determining the target loss function according to the sum of the first loss function, the second loss function and the third loss function.
Further, the processor 701 is further configured to: acquiring a trained face model reconstruction network; and reconstructing the first face image by adopting the face model reconstruction network to obtain the first face model.
Further, the processor 701 is further configured to: acquiring a trained face driving module; and driving the first face model to perform motion transformation according to the target motion by adopting the face driving module to obtain the second face model.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments of the present disclosure, a non-transitory computer readable storage medium in memory 702 is used to store at least one instruction for execution by processor 701 to implement the methods in embodiments of the present disclosure.
In some embodiments, the electronic device 700 further includes: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 703 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of a display 704, a camera 705, and an audio circuit 706.
The peripheral interface 703 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 701 and the memory 702. In some embodiments of the present disclosure, the processor 701, the memory 702 and the peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments of the present disclosure, any one or two of the processor 701, the memory 702 and the peripheral interface 703 may be implemented on a separate chip or circuit board. The embodiments of the present disclosure are not particularly limited thereto.
The display screen 704 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 704 is a touch display, the display 704 also has the ability to collect touch signals at or above the surface of the display 704. The touch signal may be input to the processor 701 as a control signal for processing. At this point, the display 704 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments of the present disclosure, the display 704 may be one, providing a front panel of the electronic device 700; in other embodiments of the present disclosure, the display 704 may be at least two, respectively disposed on different surfaces of the electronic device 700 or in a folded design; in still other embodiments of the present disclosure, the display 704 may be a flexible display disposed on a curved surface or a folded surface of the electronic device 700. Even more, the display screen 704 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display 704 may be made of LCD (Liquid Crystal Display ), OLED (Organic Light-Emitting Diode) or other materials.
The camera 705 is used to capture images or video. Optionally, the camera 705 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the electronic device 700, and the rear camera is disposed on the back of the electronic device 700. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth camera, a wide-angle camera, or a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, or the main camera and the wide-angle camera can be fused to realize panoramic shooting, Virtual Reality (VR) shooting, or other fused shooting functions. In some embodiments of the present disclosure, the camera 705 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
The audio circuit 706 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert them into electrical signals, and input them to the processor 701 for processing. For stereo acquisition or noise reduction, there may be multiple microphones disposed at different locations of the electronic device 700. The microphone may also be an array microphone or an omnidirectional pickup microphone.
The power supply 707 is used to power the various components of the electronic device 700. The power supply 707 may use alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 707 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery, charged through a wired line, or a wireless rechargeable battery, charged through a wireless coil. The rechargeable battery may also support fast-charge technology.
The block diagram of the electronic device 700 shown in the embodiments of the present disclosure does not constitute a limitation of the electronic device 700; the electronic device 700 may include more or fewer components than illustrated, may combine certain components, or may employ a different arrangement of components.
In the description of the present disclosure, it is to be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meaning of these terms in this disclosure will be understood by those of ordinary skill in the art in the specific context. Furthermore, in the description of the present disclosure, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The foregoing is merely specific embodiments of the disclosure, but the protection scope of the disclosure is not limited thereto; any changes or substitutions readily conceivable by a person skilled in the art within the technical scope of the disclosure are intended to be covered by the protection scope of the disclosure. Accordingly, equivalent variations made according to the claims of the present disclosure shall remain within the scope of the disclosure.

Claims (12)

1. An image processing method, characterized in that the image processing method comprises:
reconstructing a first face image to obtain a first face model, wherein the face characteristic parameters of the first face model are fewer than those of the first face image;
driving the first face model to perform motion transformation to obtain a second face model;
rendering the second face model to obtain a second face image;
reconstructing detail information of the second face image to obtain a target face image with resolution higher than that of the second face image, wherein the detail information at least comprises texture information and illumination information.
2. The image processing method according to claim 1, wherein the step of reconstructing detail information of the second face image to obtain a target face image having a higher resolution than the second face image includes:
constructing a neural renderer based on an adversarial network;
and reconstructing the detail information of the second face image using the neural renderer to obtain a target face image with a higher resolution than the second face image.
3. The image processing method of claim 2, wherein the neural renderer includes a generator and a discriminator, the generator being connected to the discriminator, and the step of constructing a neural renderer based on an adversarial network comprises:
acquiring a training data pair, wherein the training data pair comprises a first sample face image and a second sample face image, and the resolution of the first sample face image is lower than that of the second sample face image;
inputting the first sample face image into the generator to obtain a sample pseudo image, and fixing the first model parameters of the discriminator;
determining a first loss function of the generator according to the second sample face image and the sample pseudo image, and updating the second model parameters of the generator according to the first loss function;
constructing positive sample data and negative sample data according to the training data pair and the sample pseudo image;
inputting the positive sample data and the negative sample data into the discriminator to obtain a positive sample image corresponding to the positive sample data and a negative sample image corresponding to the negative sample data, respectively, and fixing the second model parameters;
determining a second loss function of the discriminator according to the positive sample image and the negative sample image, and updating the first model parameters according to the second loss function;
determining a target loss function of the neural renderer according to the first loss function and the second loss function;
and stopping the updating of the first model parameters and the second model parameters under the condition that the curve of the target loss function stabilizes and the positive sample image and the negative sample image meet a preset expectation, so as to complete the construction of the neural renderer.
4. The image processing method of claim 3, wherein the step of constructing positive sample data and negative sample data from the training data pair and the sample pseudo image comprises:
splicing the first sample face image and the second sample face image along the image channel dimension to obtain the positive sample data; and
splicing the first sample face image and the sample pseudo image along the image channel dimension to obtain the negative sample data.
5. The image processing method according to claim 3, wherein the step of determining the second loss function of the discriminator from the positive sample image and the negative sample image includes:
acquiring a first probability result corresponding to the positive sample image and a second probability result corresponding to the negative sample image;
and determining a second loss function of the discriminator according to the first probability result and the second probability result.
6. The image processing method of claim 3, wherein the step of determining the target loss function of the neural renderer according to the first loss function and the second loss function comprises:
extracting a first high-dimensional feature of the second sample face image and a second high-dimensional feature of the sample pseudo image;
acquiring a third loss function according to the feature difference between the first high-dimensional feature and the second high-dimensional feature;
and determining the target loss function according to the sum of the first loss function, the second loss function and the third loss function.
7. The image processing method of claim 3, wherein the generator comprises a neural network and the discriminator comprises a fully convolutional network.
8. The image processing method according to any one of claims 1 to 7, wherein the step of reconstructing the first face image to obtain the first face model includes:
acquiring a trained face model reconstruction network;
and reconstructing the first face image using the face model reconstruction network to obtain the first face model.
9. The image processing method according to any one of claims 1 to 7, wherein the step of driving the first face model to perform motion transformation to obtain a second face model includes:
acquiring a trained face driving module;
and driving the first face model to perform motion transformation according to a target motion using the face driving module, to obtain the second face model.
10. An image processing apparatus, characterized in that the image processing apparatus comprises:
the model building module is used for reconstructing a first face image to obtain a first face model, wherein the face characteristic parameters of the first face model are fewer than those of the first face image;
the image driving module is used for driving the first face model to perform motion transformation to obtain a second face model;
the image rendering module is used for rendering the second face model to obtain a second face image;
the image reconstruction module is used for reconstructing detail information of the second face image to obtain a target face image with resolution higher than that of the second face image, and the detail information at least comprises texture information and illumination information.
11. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the image processing method according to any one of claims 1 to 9 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the image processing method according to any one of claims 1 to 9.
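For readers who want a concrete picture of the alternating training procedure recited in claims 3 to 7, the Python sketch below expresses it under stated assumptions: the tiny generator and fully convolutional (PatchGAN-style) discriminator are placeholder stand-ins, the losses are ordinary adversarial losses, and all names are hypothetical; the patent does not prescribe these architectures, optimizers, or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in networks. The claims only state that the generator
# comprises a neural network and the discriminator a fully convolutional one.
generator = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)
# The discriminator sees 6 channels: two 3-channel images spliced along
# the channel dimension (claim 4); it outputs a patch of scores.
discriminator = nn.Sequential(
    nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, padding=1),
)

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(first_sample, second_sample):
    """One alternating update on a training data pair (low-res, high-res);
    both images are assumed already resized to the same spatial size."""
    # --- Update the generator while the discriminator is held fixed. ---
    pseudo = generator(first_sample)                      # sample pseudo image
    fake_pair = torch.cat([first_sample, pseudo], dim=1)  # channel-dim splice
    fake_score = discriminator(fake_pair)
    # First loss: the generator tries to make the pseudo image pass as real.
    first_loss = F.binary_cross_entropy_with_logits(
        fake_score, torch.ones_like(fake_score))
    g_opt.zero_grad()
    first_loss.backward()
    g_opt.step()

    # --- Update the discriminator while the generator is held fixed. ---
    positive = torch.cat([first_sample, second_sample], dim=1)
    negative = torch.cat([first_sample, pseudo.detach()], dim=1)
    pos_score = discriminator(positive)  # first probability result
    neg_score = discriminator(negative)  # second probability result
    # Second loss: separate positive sample data from negative sample data.
    second_loss = (
        F.binary_cross_entropy_with_logits(pos_score, torch.ones_like(pos_score))
        + F.binary_cross_entropy_with_logits(neg_score, torch.zeros_like(neg_score)))
    d_opt.zero_grad()
    second_loss.backward()
    d_opt.step()
    return first_loss.item(), second_loss.item()
```

Under this reading, the first loss drives the generator, the second loss drives the discriminator, and the target loss of claim 6 would additionally sum in the feature-difference (third) loss illustrated earlier in the description.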
CN202211046131.4A 2022-08-30 2022-08-30 Image processing method, device, electronic equipment and computer readable storage medium Pending CN117649482A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211046131.4A CN117649482A (en) 2022-08-30 2022-08-30 Image processing method, device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211046131.4A CN117649482A (en) 2022-08-30 2022-08-30 Image processing method, device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN117649482A true CN117649482A (en) 2024-03-05

Family

ID=90046558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211046131.4A Pending CN117649482A (en) 2022-08-30 2022-08-30 Image processing method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN117649482A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination