CN112991152A - Image processing method and device, electronic equipment and storage medium


Info

Publication number
CN112991152A
CN112991152A (application CN202110241859.1A)
Authority
CN
China
Prior art keywords
latent space
face
image
feature vector
target
Prior art date
Legal status
Pending
Application number
CN202110241859.1A
Other languages
Chinese (zh)
Inventor
袁燚
宋祎瑶
胡志鹏
Current Assignee
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202110241859.1A
Publication of CN112991152A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/04 - Context-preserving transformations, e.g. by using an importance map
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method, an image processing apparatus, an electronic device and a storage medium. The method includes: after obtaining an input source face image and a target face image to be face-changed, extracting a first latent space feature vector of the source face image in a first specified dimension and a second latent space feature vector of the target face image in a second specified dimension; performing vector fusion on the first latent space feature vector and the second latent space feature vector to obtain a fused third latent space feature vector; and generating a first face-changed image according to the third latent space feature vector. A first face-changed image with a good display effect can be obtained in this way.

Description

Image processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of image technology, image face-changing technology has emerged, that is: the face in one image (the source image) can be transplanted onto the face in another image (the target image), so that the newly obtained target image carries the face of the source image.
In the related face-changing technology, different weights are assigned to the image feature vectors of the source image and the image feature vectors of the target image, the two sets of feature vectors are then fused together by weighted summation, and a face-changed image is obtained from the fused feature vectors.
Disclosure of Invention
In view of this, embodiments of the present application provide an image processing method, an image processing apparatus, an electronic device, and a storage medium, so as to obtain a face-changed image with a better effect.
In a first aspect, an embodiment of the present application provides an image processing method, including:
after an input source face image and a target face image of a face to be changed are obtained, extracting a first latent space feature vector of the source face image in a first specified dimension and extracting a second latent space feature vector of the target face image in a second specified dimension;
vector fusion is carried out on the first latent space feature vector and the second latent space feature vector to obtain a fused third latent space feature vector;
and generating a first face-changed image according to the third latent space feature vector.
Optionally, the first specified dimension includes a dimension for representing identity information of the face image, and the second specified dimension includes a dimension for representing attribute information of the face image.
Optionally, the extracting a first latent space feature vector of the source face image in a first specified dimension and extracting a second latent space feature vector of the target face image in a second specified dimension includes:
respectively inputting the source face image and the target face image into a trained face encoder to obtain a fourth latent space feature vector of the source face image in a third specified dimension and a fifth latent space feature vector of the target face image in the third specified dimension, wherein the third specified dimension comprises the first specified dimension and the second specified dimension;
and selecting a latent space feature vector corresponding to the first specified dimension from the fourth latent space feature vector to use the selected latent space feature vector as the first latent space feature vector, and selecting a latent space feature vector corresponding to the second specified dimension from the fifth latent space feature vector to use the selected latent space feature vector as the second latent space feature vector.
Optionally, the selecting a latent space feature vector corresponding to the first specified dimension from the fourth latent space feature vector to use the selected latent space feature vector as the first latent space feature vector, and selecting a latent space feature vector corresponding to the second specified dimension from the fifth latent space feature vector to use the selected latent space feature vector as the second latent space feature vector includes:
assigning a weight with a value of 1 to the latent space feature vector corresponding to the first specified dimension in the fourth latent space feature vector, assigning a weight with a value of 0 to the latent space feature vectors corresponding to the other dimensions in the fourth latent space feature vector, and taking the resulting latent space feature vector as the first latent space feature vector;
and assigning a weight with a value of 1 to the latent space feature vector corresponding to the second specified dimension in the fifth latent space feature vector, assigning a weight with a value of 0 to the latent space feature vectors corresponding to the other dimensions in the fifth latent space feature vector, and taking the resulting latent space feature vector as the second latent space feature vector.
Optionally, the generating a first face-changed image according to the third latent space feature vector includes:
and inputting the third latent space feature vector into a trained face decoder to obtain the first face-changed image.
Optionally, after obtaining the first face-changed image, the method further includes:
calculating a first identity loss value of the first face-changed image and the source face image in the first specified dimension according to the first face-changed image and the source face image;
calculating a first attribute loss value of the first face-changed image and the target face image in the second specified dimension according to the first face-changed image and the target face image;
judging whether the first identity loss value is smaller than a first threshold value or not, and judging whether the first attribute loss value is smaller than a second threshold value or not;
and if the first identity loss value is smaller than the first threshold value and the first attribute loss value is smaller than the second threshold value, saving the first face changing image.
Optionally, the method further comprises:
if the first identity loss value is greater than or equal to the first threshold value and/or the first attribute loss value is greater than or equal to the second threshold value, performing gradient back propagation training on the trained face decoder by using the first identity loss value and the first attribute loss value;
inputting the third latent space feature vector into a retrained face decoder to obtain a second face-changed image;
and continuously calculating identity loss values of the second face-changed image and the source face image in the first specified dimension, calculating attribute loss values of the second face-changed image and the target face image in the second specified dimension until the obtained identity loss value is smaller than the first threshold value and the attribute loss value is smaller than the second threshold value, and storing the second face-changed image.
Optionally, the target face image comprises a face image of a game virtual character.
Optionally, the method further comprises:
after a sample source face image and a sample target face image of a face to be changed are obtained, the sample source face image and the sample target face image are respectively input into a face encoder to be trained, and a sixth latent space feature vector of the sample source face image in the third specified dimension and a seventh latent space feature vector of the sample target face image in the third specified dimension are obtained;
assigning a weight with a value of 1 to the latent space feature vector corresponding to the first specified dimension in the sixth latent space feature vector, assigning a weight with a value of 0 to the latent space feature vectors corresponding to the other dimensions in the sixth latent space feature vector, and taking the resulting latent space feature vector as a first sample latent space feature vector; and assigning a weight with a value of 1 to the latent space feature vector corresponding to the second specified dimension in the seventh latent space feature vector, assigning a weight with a value of 0 to the latent space feature vectors corresponding to the other dimensions in the seventh latent space feature vector, and taking the resulting latent space feature vector as a second sample latent space feature vector;
vector fusion is carried out on the first sample latent space feature vector and the second sample latent space feature vector to obtain a fused third sample latent space feature vector;
inputting the third sample latent space feature vector into a face decoder to be trained to obtain a third face-changed image;
inputting a target latent space feature vector into the face decoder to be trained to obtain a target image, wherein the target latent space feature vector comprises the sixth latent space feature vector or the seventh latent space feature vector;
calculating a Mean Square Error (MSE) loss value between the target image and the face image corresponding to the target latent space feature vector and a perception loss value between a latent space feature vector of the target image and the target latent space feature vector according to the face image corresponding to the target latent space feature vector and the target image;
calculating, according to the third face-changed image, the sample source face image and the sample target face image, a second identity loss value of the third face-changed image and the sample source face image in the first specified dimension, and a second attribute loss value of the third face-changed image and the sample target face image in the second specified dimension;
and performing gradient back propagation training on the face encoder to be trained and the face decoder to be trained according to the MSE loss value, the perception loss value, the second identity loss value and the second attribute loss value to obtain the trained face encoder and the trained face decoder.
Optionally, the first identity loss value and the second identity loss value are both calculated by the following formula:
L_id = 1 - cos(z_id(Y), z_id(X_s));
the first attribute loss value and the second attribute loss value are both obtained by the following formula:
L_att = (1/2) * Σ_{k=1}^{n} ||z_att^k(Y) - z_att^k(X_t)||_2^2;
the MSE loss value is obtained by the following formula:
L_MSE = (1/(C*H*W)) * ||x - x'||_2^2;
the perceptual loss value is obtained by the following formula:
L_perc = (1/(C'*H'*W')) * ||F(x) - F(x')||_2^2;
wherein, when calculating the first identity loss value and the first attribute loss value, X_s is the source face image, X_t is the target face image, and Y is the first face-changed image; when calculating the second identity loss value and the second attribute loss value, X_s is the sample source face image, X_t is the sample target face image, and Y is the third face-changed image; z_id is the feature extractor for the first latent space feature vector, z_att^k is the feature extractor for the second latent space feature vector at the k-th layer, n is the number of layers of the latent space feature vector, x is the face image corresponding to the target latent space feature vector, x' is the target image, C, H and W are respectively the number of channels, the length and the width of the face image corresponding to the target latent space feature vector, C', H' and W' are respectively the number of channels, the length and the width of the target image, and F is the feature extractor in the third specified dimension.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
the face conversion device comprises an extraction unit, a face conversion unit and a face conversion unit, wherein the extraction unit is used for extracting a first latent space feature vector of a source face image on a first specified dimension and extracting a second latent space feature vector of a target face image on a second specified dimension after the input source face image and the target face image of a face to be converted are obtained;
the fusion unit is used for carrying out vector fusion on the first latent space feature vector and the second latent space feature vector to obtain a fused third latent space feature vector;
and a generating unit, configured to generate a first face-changed image according to the third latent space feature vector.
Optionally, the first specified dimension includes a dimension for representing identity information of the face image, and the second specified dimension includes a dimension for representing attribute information of the face image.
Optionally, when the extracting unit is configured to extract a first latent space feature vector of the source face image in a first specified dimension, and extract a second latent space feature vector of the target face image in a second specified dimension, the extracting unit includes:
respectively inputting the source face image and the target face image into a trained face encoder to obtain a fourth latent space feature vector of the source face image in a third specified dimension and a fifth latent space feature vector of the target face image in the third specified dimension, wherein the third specified dimension comprises the first specified dimension and the second specified dimension;
and selecting a latent space feature vector corresponding to the first specified dimension from the fourth latent space feature vector to use the selected latent space feature vector as the first latent space feature vector, and selecting a latent space feature vector corresponding to the second specified dimension from the fifth latent space feature vector to use the selected latent space feature vector as the second latent space feature vector.
Optionally, when the extracting unit is configured to select a latent space feature vector corresponding to the first specified dimension from the fourth latent space feature vector to use the selected latent space feature vector as the first latent space feature vector, and select a latent space feature vector corresponding to the second specified dimension from the fifth latent space feature vector to use the selected latent space feature vector as the second latent space feature vector, the extracting unit is configured to:
assign a weight with a value of 1 to the latent space feature vector corresponding to the first specified dimension in the fourth latent space feature vector, assign a weight with a value of 0 to the latent space feature vectors corresponding to the other dimensions in the fourth latent space feature vector, and take the resulting latent space feature vector as the first latent space feature vector;
and assign a weight with a value of 1 to the latent space feature vector corresponding to the second specified dimension in the fifth latent space feature vector, assign a weight with a value of 0 to the latent space feature vectors corresponding to the other dimensions in the fifth latent space feature vector, and take the resulting latent space feature vector as the second latent space feature vector.
Optionally, when the generating unit is configured to generate the first face-changed image according to the third latent space feature vector, the generating unit is configured to:
and inputting the third latent space feature vector into a trained face decoder to obtain the first face-changing image.
Optionally, the apparatus further comprises:
a fine-tuning unit, configured to: after the first face-changed image is obtained, calculate a first identity loss value of the first face-changed image and the source face image in the first specified dimension according to the first face-changed image and the source face image; calculate a first attribute loss value of the first face-changed image and the target face image in the second specified dimension according to the first face-changed image and the target face image; determine whether the first identity loss value is smaller than a first threshold and whether the first attribute loss value is smaller than a second threshold; and save the first face-changed image if the first identity loss value is smaller than the first threshold and the first attribute loss value is smaller than the second threshold.
Optionally, the fine tuning unit is further configured to:
if the first identity loss value is greater than or equal to the first threshold value and/or the first attribute loss value is greater than or equal to the second threshold value, performing gradient back propagation training on the trained face decoder by using the first identity loss value and the first attribute loss value;
inputting the third latent space feature vector into a retrained face decoder to obtain a second face-changed image;
and continuously calculating identity loss values of the second face-changed image and the source face image in the first specified dimension, calculating attribute loss values of the second face-changed image and the target face image in the second specified dimension until the obtained identity loss value is smaller than the first threshold value and the attribute loss value is smaller than the second threshold value, and storing the second face-changed image.
Optionally, the target face image comprises a face image of a game virtual character.
Optionally, the apparatus further comprises:
a training unit, configured to: after obtaining a sample source face image and a sample target face image to be face-changed, input the sample source face image and the sample target face image into a face encoder to be trained, respectively, to obtain a sixth latent space feature vector of the sample source face image in the third specified dimension and a seventh latent space feature vector of the sample target face image in the third specified dimension; assign a weight with a value of 1 to the latent space feature vector corresponding to the first specified dimension in the sixth latent space feature vector, assign a weight with a value of 0 to the latent space feature vectors corresponding to the other dimensions in the sixth latent space feature vector, and take the resulting latent space feature vector as a first sample latent space feature vector; assign a weight with a value of 1 to the latent space feature vector corresponding to the second specified dimension in the seventh latent space feature vector, assign a weight with a value of 0 to the latent space feature vectors corresponding to the other dimensions in the seventh latent space feature vector, and take the resulting latent space feature vector as a second sample latent space feature vector; perform vector fusion on the first sample latent space feature vector and the second sample latent space feature vector to obtain a fused third sample latent space feature vector; input the third sample latent space feature vector into a face decoder to be trained to obtain a third face-changed image; input a target latent space feature vector into the face decoder to be trained to obtain a target image, wherein the target latent space feature vector comprises the sixth latent space feature vector or the seventh latent space feature vector; calculate, according to the face image corresponding to the target latent space feature vector and the target image, an MSE loss value between the target image and the face image corresponding to the target latent space feature vector, and a perceptual loss value between a latent space feature vector of the target image and the target latent space feature vector; calculate, according to the third face-changed image, the sample source face image and the sample target face image, a second identity loss value of the third face-changed image and the sample source face image in the first specified dimension, and a second attribute loss value of the third face-changed image and the sample target face image in the second specified dimension; and perform gradient back propagation training on the face encoder to be trained and the face decoder to be trained according to the MSE loss value, the perceptual loss value, the second identity loss value and the second attribute loss value, to obtain the trained face encoder and the trained face decoder.
Optionally, the first identity loss value and the second identity loss value are both calculated by the following formula:
L_id = 1 - cos(z_id(Y), z_id(X_s));
the first attribute loss value and the second attribute loss value are both obtained by the following formula:
L_att = (1/2) * Σ_{k=1}^{n} ||z_att^k(Y) - z_att^k(X_t)||_2^2;
the MSE loss value is obtained by the following formula:
L_MSE = (1/(C*H*W)) * ||x - x'||_2^2;
the perceptual loss value is obtained by the following formula:
L_perc = (1/(C'*H'*W')) * ||F(x) - F(x')||_2^2;
wherein, when calculating the first identity loss value and the first attribute loss value, X_s is the source face image, X_t is the target face image, and Y is the first face-changed image; when calculating the second identity loss value and the second attribute loss value, X_s is the sample source face image, X_t is the sample target face image, and Y is the third face-changed image; z_id is the feature extractor for the first latent space feature vector, z_att^k is the feature extractor for the second latent space feature vector at the k-th layer, n is the number of layers of the latent space feature vector, x is the face image corresponding to the target latent space feature vector, x' is the target image, C, H and W are respectively the number of channels, the length and the width of the face image corresponding to the target latent space feature vector, C', H' and W' are respectively the number of channels, the length and the width of the target image, and F is the feature extractor in the third specified dimension.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the image processing method according to any one of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the image processing method according to any one of the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
In the application, in order to prevent the face-changed image from simultaneously carrying features of both the source image and the target image in any single dimension, after an input source face image and a target face image to be face-changed are obtained, a first latent space feature vector of the source face image in a first specified dimension and a second latent space feature vector of the target face image in a second specified dimension are extracted. The two vectors are then fused into a third latent space feature vector, from which a first face-changed image is generated. Because different latent space feature vectors influence different image contents, and the image contents corresponding to the first specified dimension and the second specified dimension are different, the content of the first face-changed image in the first specified dimension is influenced only by the first latent space feature vector, and its content in the second specified dimension is influenced only by the second latent space feature vector. As a result, the generated first face-changed image matches the source face image in the first specified dimension and matches the target face image in the second specified dimension, so that the features of the source face image in the first specified dimension are transplanted onto the target face image and the obtained first face-changed image has a good display effect.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an image processing apparatus according to a second embodiment of the present application;
fig. 6 is a schematic structural diagram of another image processing apparatus according to a second embodiment of the present application;
fig. 7 is a schematic structural diagram of another image processing apparatus according to a second embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted in advance that all face images referred to in this application are two-dimensional images, and feature vectors in multiple dimensions can be obtained through a certain face image, including: feature vectors for representing the dimensions of the five sense organs, feature vectors for representing the dimensions of the expressions, feature vectors for representing the dimensions of the background, and the like. The latent space feature vector in the present application refers to a feature vector obtained by projecting a face image onto a latent space, and taking a first latent space feature vector as an example, the first latent space feature vector refers to a latent space feature vector on a first specified dimension obtained by projecting the face image onto the latent space, and if the first specified dimension is a dimension for representing five sense organs, the first latent space feature vector is a latent space feature vector corresponding to the five sense organs in the face image.
Example one
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application, as shown in fig. 1, the method includes the following steps:
step 101, after an input source face image and a target face image of a face to be changed are obtained, extracting a first latent space feature vector of the source face image in a first specified dimension, and extracting a second latent space feature vector of the target face image in a second specified dimension.
And 102, carrying out vector fusion on the first latent space feature vector and the second latent space feature vector to obtain a fused third latent space feature vector.
And 103, generating a first face changing image according to the third latent space feature vector.
Specifically, the dimensions of the face image are set to include a first specified dimension and a second specified dimension. For example, to transplant the five sense organs of one face image into another face image to obtain a face-changed image, a first latent space feature vector in the first specified dimension is extracted from the source face image and a second latent space feature vector in the second specified dimension is extracted from the target face image. Vector fusion is performed on the two to obtain a fused third latent space feature vector, and the first face-changed image is generated from the third latent space feature vector. Different latent space feature vectors affect image content in different dimensions, and the image contents corresponding to the first specified dimension and the second specified dimension are different. Therefore, when the first face-changed image is generated, its content in the first specified dimension is influenced only by the first latent space feature vector, and its content in the second specified dimension is influenced only by the second latent space feature vector, that is: each dimension of the first face-changed image is affected by only one of the face images. As a result, the image content of the generated first face-changed image in the first specified dimension is the same as that of the source face image, and its image content in the second specified dimension is the same as that of the target face image, so that the image content of the source face image in the first specified dimension (such as the five sense organs) can be transplanted onto the target face image.
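For illustration only, the following PyTorch sketch outlines steps 101-103 under assumed interfaces: encoder and decoder stand in for the trained face encoder and face decoder described later, and the per-dimension latent layout and the index lists id_dims/attr_dims are hypothetical rather than the concrete implementation of the application.

```python
import torch

def swap_face(encoder, decoder, source_img, target_img, id_dims, attr_dims):
    """Steps 101-103 under assumed interfaces: extract, fuse, decode."""
    # Step 101: project both face images onto the latent space;
    # assume each latent code has shape (num_dims, dim_size).
    w_src = encoder(source_img)
    w_tgt = encoder(target_img)
    # Step 102: vector fusion - identity dimensions come from the source,
    # attribute dimensions come from the target.
    fused = torch.zeros_like(w_src)
    fused[id_dims] = w_src[id_dims]
    fused[attr_dims] = w_tgt[attr_dims]
    # Step 103: decode the fused latent into the first face-changed image.
    return decoder(fused)
```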
Moreover, the image content of the source face image in the first specified dimension changes adaptively with the image content of the target face image in the second specified dimension, so that the two match better. For example, when the first specified dimension represents the five sense organs and the second specified dimension includes a dimension representing the expression, the five sense organs of the source face image can be changed according to the expression of the target face image during the vector fusion of the first latent space feature vector and the second latent space feature vector, so that the five sense organs of the source face image carry the expression of the target face image. The five sense organs of the obtained first face-changed image are those of the source face image, while their expression is that of the target face image, so the first face-changed image as a whole is more natural and harmonious, and the first face-changed image obtained in this way has a better display effect.
For example, if the five sense organs of the source face image are standard five sense organs, the expression of the source face image is crying, and the expression of the target face image is laughing, then the five sense organs in the first face-changed image obtained in the manner shown in fig. 1 are the five sense organs of the source face image, that is, standard five sense organs, while the expression on them is laughing, with the same amplitude and form as in the target face image. In other words, the five sense organs of the source face image are adaptively adjusted according to the expression of the target face image. The pose in the target face image may also affect the five sense organs. For example, if the source face image is a frontal photograph of a face, the target face image is a profile photograph, and the left eye cannot be seen in the target face image, then the five sense organs in the first face-changed image obtained in the manner shown in fig. 1 are those of the source face image but do not include the left eye, and their pose is adaptively adjusted according to the pose of the target face image. Other dimensions of the target face image, such as the image background, may likewise affect the five sense organs; the specific manner of adaptive adjustment is not described in detail here. Because the image content of the source face image in the first specified dimension changes adaptively with the image content of the target face image in the second specified dimension, the obtained first face-changed image has a better display effect.
Optionally, the first specified dimension includes a dimension for representing image identity information, and the second specified dimension includes a dimension for representing image attribute information.
Specifically, any face image includes two kinds of information: face image identity information and face image attribute information. The face image identity information includes information on the five sense organs, and the face image attribute information includes information such as the expression, hair color, image background, dressing and pose. When the first specified dimension includes a dimension representing the face image identity information and the second specified dimension includes a dimension representing the face image attribute information, the face image identity information of the first face-changed image obtained by the method shown in fig. 1 is that of the source face image, and the face image attribute information of the first face-changed image is that of the target face image. That is, the first face-changed image has the five sense organs of the source face image and the expression, hair color, image background, dressing and pose of the target face image; its five sense organs are not influenced by those of the target face image, and its attribute information is not influenced by that of the source face image. Meanwhile, the five sense organs of the source face image in the first face-changed image can be adaptively adjusted according to the expression in the target face image and, where the pose in the target face image also affects the five sense organs, according to that pose as well, so that the obtained first face-changed image is more natural as a whole and has a better display effect.
In a possible implementation, fig. 2 is a schematic flowchart of another image processing method provided in the first embodiment of the present application, and as shown in fig. 2, when step 101 is executed, the following steps may be implemented:
step 201, inputting the source face image and the target face image into a trained face encoder respectively, to obtain a fourth latent space feature vector of the source face image in a third specified dimension and a fifth latent space feature vector of the target face image in the third specified dimension, where the third specified dimension includes the first specified dimension and the second specified dimension.
Step 202, selecting a latent space feature vector corresponding to the first specified dimension from the fourth latent space feature vector to use the selected latent space feature vector as the first latent space feature vector, and selecting a latent space feature vector corresponding to the second specified dimension from the fifth latent space feature vector to use the selected latent space feature vector as the second latent space feature vector.
Specifically, the face encoder is a series of convolutional networks, the convolutional networks mainly comprise convolutional layers, pooling layers and normalization layers, and images can be projected onto a latent space through the face encoder so as to obtain latent space feature vectors.
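As a rough sketch only, an encoder of the kind just described (convolutional, pooling and normalization layers projecting an image onto a latent code) might look as follows; every layer size, including the 18 x 512 latent layout, is an assumption for illustration:

```python
import torch
import torch.nn as nn

class FaceEncoder(nn.Module):
    """Toy stand-in for the face encoder: stacked convolution +
    normalization + pooling blocks projecting a face image onto an
    assumed 18 x 512 latent code."""
    def __init__(self, num_dims=18, dim_size=512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        self.to_latent = nn.Linear(128 * 4 * 4, num_dims * dim_size)
        self.num_dims, self.dim_size = num_dims, dim_size

    def forward(self, x):                       # x: (B, 3, H, W)
        h = self.features(x).flatten(1)
        w = self.to_latent(h)                   # (B, num_dims * dim_size)
        return w.view(-1, self.num_dims, self.dim_size)
```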
After the source face image and the target face image are obtained, they are input into the trained face encoder, which projects each of them onto the latent space. This yields a fourth latent space feature vector of the source face image in the third specified dimension and a fifth latent space feature vector of the target face image in the third specified dimension. Since the third specified dimension includes both the dimension representing face image identity information and the dimension representing face image attribute information, the fourth latent space feature vector contains feature vectors of the identity information and attribute information of the source face image, and the fifth latent space feature vector contains feature vectors of the identity information and attribute information of the target face image. The latent space feature vector corresponding to the dimension representing the face image identity information can therefore be selected from the fourth latent space feature vector as the first latent space feature vector corresponding to the first specified dimension, and the latent space feature vector corresponding to the dimension representing the face image attribute information can be selected from the fifth latent space feature vector as the second latent space feature vector corresponding to the second specified dimension.
For example, suppose the dimensions of the face image include dimension 0 to dimension 17, where, say, dimension 4 represents the mouth expression, dimension 5 represents the five sense organs, and dimension 8 represents the hair color. In this case, the third specified dimension includes dimensions 0 to 17, the first specified dimension includes dimension 5 (the dimension representing the face image identity information), and the second specified dimension includes the dimensions other than dimension 5 among dimensions 0 to 17 (the dimensions representing the face image attribute information). When the source face image and the target face image are respectively input into the trained face encoder, a fourth latent space feature vector of the source face image in the 18 dimensions and a fifth latent space feature vector of the target face image in the 18 dimensions can be obtained, where one dimension corresponds to one latent space feature vector, that is: the fourth and fifth latent space feature vectors each comprise 18 latent space feature vectors. The latent space feature vector corresponding to dimension 5 is then selected from the fourth latent space feature vector as the first latent space feature vector corresponding to the first specified dimension, and the latent space feature vectors corresponding to the dimensions other than dimension 5 are selected from the fifth latent space feature vector as the second latent space feature vector corresponding to the second specified dimension. The selected latent space feature vectors can thus reconstruct the latent space feature vectors in all 18 dimensions needed for one face image.
It should be noted that the above examples are only illustrative and do not limit the present application, and for example: the dimension for representing the identity information of the face image can be further split into a plurality of sub-dimensions, such as: the dimension for representing the eyes, the dimension for representing the nose, the dimension for representing the eyebrows, and the dimension for representing the lips, and the specific representation part with respect to a certain dimension may be set according to actual needs, and is not particularly limited herein.
In one possible embodiment, when step 201 is performed, this can be achieved by:
and assigning a weight with a numerical value of 1 to the latent space eigenvector corresponding to the first designated dimension in the fourth latent space eigenvector, assigning a weight with a numerical value of 0 to the latent space eigenvector corresponding to the other dimension in the fourth latent space eigenvector, and taking the obtained latent space eigenvector as the first latent space eigenvector.
And assigning a weight with a numerical value of 1 to the latent space eigenvector corresponding to the second specified dimension in the fifth latent space eigenvector, assigning a weight with a numerical value of 0 to the latent space eigenvector corresponding to the other dimension in the fifth latent space eigenvector, and taking the obtained latent space eigenvector as the second latent space eigenvector.
Specifically, the fourth latent space feature vector and the fifth latent space feature vector each include latent space feature vectors of multiple dimensions. When generating the first face-changed image, only the latent space feature vector corresponding to the dimension representing the face image identity information in the fourth latent space feature vector and the latent space feature vector corresponding to the dimension representing the face image attribute information in the fifth latent space feature vector are needed. The latent space feature vector corresponding to the attribute dimension in the fourth latent space feature vector and the one corresponding to the identity dimension in the fifth latent space feature vector must therefore be removed. To do this, a weight of 1 is assigned to the latent space feature vector corresponding to the dimension representing the face image identity information in the fourth latent space feature vector, and a weight of 0 to the one corresponding to the dimension representing the face image attribute information; conversely, a weight of 0 is assigned to the latent space feature vector corresponding to the dimension representing the face image identity information in the fifth latent space feature vector, and a weight of 1 to the one corresponding to the dimension representing the face image attribute information. In this way, the first latent space feature vector corresponding to the dimension representing the face image identity information in the source face image (the first specified dimension) and the second latent space feature vector corresponding to the dimension representing the face image attribute information in the target face image (the second specified dimension) are extracted.
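A minimal sketch of this 0/1 weighting, assuming the 18-dimension latent layout of the earlier example (identity in dimension 5); the function name and tensor shapes are illustrative:

```python
import torch

def select_by_weights(w_plus, kept_dims, num_dims=18):
    """Assign weight 1 to the kept dimensions and weight 0 to all
    the others, as described above."""
    weights = torch.zeros(num_dims)         # weight 0 everywhere ...
    weights[kept_dims] = 1.0                # ... except the kept dimensions
    return w_plus * weights.view(1, -1, 1)  # broadcast over (B, 18, 512)

# e.g. identity lives in dimension 5, attributes in the remaining ones:
# w1 = select_by_weights(w4, kept_dims=[5])
# w2 = select_by_weights(w5, kept_dims=[d for d in range(18) if d != 5])
# w3 = w1 + w2   # vector fusion of the two masked latents
```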
In a possible embodiment, in step 103, the third latent spatial feature vector may be input into a trained face decoder to obtain the first face-changed image.
Specifically, the third latent space feature vector includes the latent space feature vectors corresponding to all the dimensions needed for one face image. The trained face decoder can decode the latent space feature vectors of all these dimensions into the corresponding image information and then generate the first face-changed image from that image information. Because the latent space feature vector corresponding to the dimension representing the face image identity information in the third latent space feature vector belongs to the source face image, and the latent space feature vector corresponding to the dimension representing the face image attribute information belongs to the target face image, the obtained first face-changed image has the same five sense organs as the source face image and is otherwise the same as the target face image, achieving the purpose of transplanting the five sense organs of the source face image onto the target face image.
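For symmetry with the encoder sketch above, a toy decoder mapping the fused latent back to image space could look like the following; the upsampling architecture and the output resolution are assumptions, not the decoder of the application:

```python
import torch
import torch.nn as nn

class FaceDecoder(nn.Module):
    """Toy stand-in for the face decoder: maps an assumed 18 x 512
    fused latent back to image space via upsampling convolutions."""
    def __init__(self, num_dims=18, dim_size=512):
        super().__init__()
        self.from_latent = nn.Linear(num_dims * dim_size, 128 * 8 * 8)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, w_plus):                  # w_plus: (B, 18, 512)
        h = self.from_latent(w_plus.flatten(1)).view(-1, 128, 8, 8)
        return self.up(h)                       # (B, 3, 32, 32) toy image
```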
In a possible implementation, fig. 3 is a schematic flowchart of another image processing method provided in the first embodiment of the present application, and as shown in fig. 3, after obtaining the first face-changed image, the method further includes the following steps:
step 301, calculating a first identity loss value of the first face-changed image and the source face image in the first designated dimension according to the first face-changed image and the source face image.
Step 302, calculating a first attribute loss value of the first face-changed image and the target face image in the second designated dimension according to the first face-changed image and the target face image.
Step 303, determining whether the first loss of identity value is smaller than a first threshold, and determining whether the first attribute loss value is smaller than a second threshold, if the first loss of identity value is smaller than the first threshold, and the first attribute loss value is smaller than the second threshold, step 304 is executed, and if the first loss of identity value is greater than or equal to the first threshold, and/or the first attribute loss value is greater than or equal to the second threshold, steps 305-307 are executed.
And step 304, storing the first face changing image.
And 305, performing gradient back propagation training on the trained face decoder by using the first identity loss value and the first attribute loss value.
And step 306, inputting the third latent space feature vector into a retrained face decoder to obtain a second face-changed image.
Step 307, continuing to calculate identity loss values of the second face-changed image and the source face image in the first specified dimension, and calculating attribute loss values of the second face-changed image and the target face image in the second specified dimension until the obtained identity loss value is smaller than the first threshold value and the attribute loss value is smaller than the second threshold value, and saving the second face-changed image.
Specifically, the trained face decoder can generate a good face-changed image for the face images used in training, but the face-changed image generated for a newly input face image may be relatively poor. To determine whether this is the case, after the first face-changed image is obtained, a first identity loss value of the first face-changed image and the source face image in the first specified dimension is calculated, that is: the difference between the first face-changed image and the source face image in the dimension representing the face image identity information; and a first attribute loss value of the first face-changed image and the target face image in the second specified dimension is calculated, that is: the difference between the first face-changed image and the target face image in the dimension representing the face image attribute information. When the first identity loss value is greater than or equal to the first threshold, the face image identity information of the first face-changed image is insufficiently similar to that of the source face image; when the first attribute loss value is greater than or equal to the second threshold, the face image attribute information of the first face-changed image is insufficiently similar to that of the target face image. If at least one of the two loss values is greater than or equal to its corresponding threshold, gradient back propagation training is performed on the face decoder with the first identity loss value and the first attribute loss value to fine-tune the face decoder, and the third latent space feature vector is input into the retrained face decoder again. After the second face-changed image is obtained, the identity loss value of the second face-changed image and the source face image in the first specified dimension and the attribute loss value of the second face-changed image and the target face image in the second specified dimension are calculated, and it is again determined whether the newly obtained identity loss value is smaller than the first threshold and the newly obtained attribute loss value is smaller than the second threshold. If both are smaller than their thresholds, the second face-changed image is saved; if at least one is greater than or equal to its corresponding threshold, processing continues with steps 305-307 until the obtained identity loss value and attribute loss value are both smaller than their respective thresholds, and the finally obtained face-changed image is saved. By readjusting the in-use face decoder through this fine-tuning, the generated face-changed image achieves a better effect.
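The fine-tuning loop of steps 301-307 can be sketched as follows; the loss callables, thresholds, learning rate and step cap are placeholders chosen for illustration:

```python
import torch

def finetune_decoder(decoder, w3, x_s, x_t, id_loss_fn, attr_loss_fn,
                     t_id=0.1, t_attr=0.1, lr=1e-4, max_steps=100):
    """Keep adjusting the trained decoder until both loss values drop
    below their thresholds (steps 301-307); hyperparameters assumed."""
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    for _ in range(max_steps):
        y = decoder(w3)                  # candidate face-changed image
        l_id = id_loss_fn(y, x_s)       # loss in the first specified dimension
        l_attr = attr_loss_fn(y, x_t)   # loss in the second specified dimension
        if l_id.item() < t_id and l_attr.item() < t_attr:
            return y                     # step 304: save this image
        opt.zero_grad()
        (l_id + l_attr).backward()       # step 305: gradient back propagation
        opt.step()                       # steps 306-307: regenerate and re-check
    return y
```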
In one possible embodiment, the target face image comprises a face image of a game avatar.
Specifically, the source face image may be the face image of a real person or the face image of a cartoon character, and the character avatar in the target face image may be the face image of a game virtual character, so that the five sense organs of the real person or of the cartoon character can be transplanted onto the game virtual character, giving the game virtual character more facial appearances.
In a possible implementation, fig. 4 is a schematic flowchart of another image processing method provided in the first embodiment of the present application, and as shown in fig. 4, the method further includes the following steps:
step 401, after obtaining a sample source face image and a sample target face image of a face to be changed, respectively inputting the sample source face image and the sample target face image into a face encoder to be trained, so as to obtain a sixth latent spatial feature vector of the sample source face image in the third specified dimension and a seventh latent spatial feature vector of the sample target face image in the third specified dimension.
Step 402, assigning a weight with a value of 1 to the latent space feature vector corresponding to the first specified dimension in the sixth latent space feature vector, assigning a weight with a value of 0 to the latent space feature vectors corresponding to the other dimensions in the sixth latent space feature vector, and taking the resulting latent space feature vector as a first sample latent space feature vector; and assigning a weight with a value of 1 to the latent space feature vector corresponding to the second specified dimension in the seventh latent space feature vector, assigning a weight with a value of 0 to the latent space feature vectors corresponding to the other dimensions in the seventh latent space feature vector, and taking the resulting latent space feature vector as a second sample latent space feature vector.
Step 403, performing vector fusion on the first sample latent space feature vector and the second sample latent space feature vector to obtain a fused third sample latent space feature vector.
Step 404, inputting the third sample latent space feature vector into a face decoder to be trained to obtain a third face-changed image.
Step 405, inputting a target latent space feature vector into the face decoder to be trained to obtain a target image, wherein the target latent space feature vector includes the sixth latent space feature vector or the seventh latent space feature vector.
For the explanation of steps 401-405, reference can be made to the corresponding description in the above embodiment, and details are not repeated here; a minimal sketch of these steps follows.
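As an informal illustration, steps 401-405 could be sketched in Python as follows. The encoder and decoder call signatures and the 0/1 mask id_mask (with value 1 on the first specified dimension and 0 elsewhere) are assumptions made for this sketch; it also assumes the first and second specified dimensions together make up the third specified dimension, so the complementary mask 1 - id_mask selects the second specified dimension.

import torch

def forward_training_pass(encoder, decoder, x_src, x_tgt, id_mask):
    z6 = encoder(x_src)             # step 401: sixth latent space feature vector
    z7 = encoder(x_tgt)             #           seventh latent space feature vector
    # Step 402: weight 1 on the specified dimensions, weight 0 on the others.
    z_sample1 = z6 * id_mask        # first sample latent space feature vector
    z_sample2 = z7 * (1 - id_mask)  # second sample latent space feature vector
    # Step 403: vector fusion; with complementary 0/1 masks, addition simply
    # combines the selected dimensions into the third sample latent vector.
    z_sample3 = z_sample1 + z_sample2
    y_swap = decoder(z_sample3)     # step 404: third face-changed image
    # Step 405: decode an unmixed latent vector (here the sixth) into the
    # target image used later for the MSE and perceptual losses.
    x_rec = decoder(z6)
    return y_swap, x_rec, z6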
Step 406, according to the face image corresponding to the target latent space feature vector and the target image, calculating a Mean Square Error (MSE) loss value between the target image and the face image corresponding to the target latent space feature vector, and a perceptual loss value between the latent space feature vector of the target image and the target latent space feature vector.
Step 407, calculating, according to the third face-changed image, the sample source face image and the sample target face image, a second identity loss value of the third face-changed image and the sample source face image in the first specified dimension, and a second attribute loss value of the third face-changed image and the sample target face image in the second specified dimension.
Step 408, performing gradient back propagation training on the face encoder to be trained and the face decoder to be trained according to the MSE loss value, the perceptual loss value, the second identity loss value and the second attribute loss value, to obtain the trained face encoder and the trained face decoder.
Specifically, the MSE loss value is calculated from the Euclidean distance between the target image and the face image corresponding to the target latent space feature vector, in order to determine the difference between the target image and that face image; the perceptual loss value is calculated from the Euclidean distance between the latent space feature vector of the target image and the target latent space feature vector, in order to determine the difference between those two latent space feature vectors. The second identity loss value is calculated between the third face-changed image and the sample source face image in the dimension representing face image identity information, and the second attribute loss value is calculated between the third face-changed image and the sample target face image in the dimension representing face image attribute information. After these four loss values are determined, the face encoder and the face decoder can be trained so that the face encoder extracts latent space feature vectors with little interference and the face decoder generates face-changed images with a good effect.
After the four loss values are obtained, it is determined whether each of them is smaller than its corresponding threshold. If all four are, the training is finished. If at least one loss value is not smaller than its corresponding threshold, the face encoder and the face decoder continue to be trained; after each round of training, steps 401-407 are repeated to obtain four new loss values, which are again compared with their corresponding thresholds. This process is repeated until all four loss values are smaller than their corresponding thresholds, at which point the training is finished.
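Continuing the illustration above, the threshold-gated training loop of steps 401-408 could be sketched as follows, reusing forward_training_pass from the previous sketch. The loss_fns and thresholds dictionaries, the equal weighting of the four losses, and the choice of optimizer are assumptions of this sketch only; the application does not specify them.

import torch

def train_until_converged(encoder, decoder, x_src, x_tgt, id_mask,
                          loss_fns, thresholds, lr=1e-4):
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    while True:
        y_swap, x_rec, z_target = forward_training_pass(
            encoder, decoder, x_src, x_tgt, id_mask)   # steps 401-405
        values = {
            # Step 406: MSE vs. the face image behind the target latent vector
            # (x_src here, since z_target = encoder(x_src)), and perceptual loss
            # between the target image's latent vector and the target latent vector.
            'mse':  loss_fns['mse'](x_rec, x_src),
            'perc': loss_fns['perc'](encoder(x_rec), z_target),
            # Step 407: second identity and second attribute loss values.
            'id':   loss_fns['id'](y_swap, x_src),
            'att':  loss_fns['att'](y_swap, x_tgt),
        }
        if all(values[k].item() < thresholds[k] for k in values):
            return encoder, decoder                    # all thresholds met
        optimizer.zero_grad()
        sum(values.values()).backward()                # step 408: back-propagation
        optimizer.step()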
It should be noted that the sixth latent space feature vector and the seventh latent space feature vector each include a latent space feature vector in the first specified dimension and a latent space feature vector in the second specified dimension. In other words, the sixth latent space feature vector includes latent space feature vectors of the sample source face image in the dimension representing face image identity information and in the dimension representing face image attribute information, and the seventh latent space feature vector includes latent space feature vectors of the sample target face image in those same two dimensions.
In one possible embodiment, the first identity loss value and the second identity loss value are each calculated by the following formula:
L_{id} = 1 - \cos(z_{id}(Y), z_{id}(X_s));
the first attribute loss value and the second attribute loss value are both obtained by the following formula:
L_{att} = \frac{1}{2} \sum_{k=1}^{n} \left\| z_{att}^{k}(Y) - z_{att}^{k}(X_t) \right\|_2^2;
the MSE loss value is obtained by the following formula:
L_{MSE} = \frac{1}{C \times H \times W} \left\| x - x' \right\|_2^2;
the perceptual loss value is obtained by the following formula:
L_{per} = \frac{1}{C' \times H' \times W'} \left\| F(x) - F(x') \right\|_2^2;
wherein, when calculating the first identity loss value and the first attribute loss value, X_s is the source face image, X_t is the target face image, and Y is the first face-changed image; when calculating the second identity loss value and the second attribute loss value, X_s is the sample source face image, X_t is the sample target face image, and Y is the third face-changed image; z_id is the feature extractor for the first latent space feature vector, z_att is the feature extractor for the second latent space feature vector, n is the number of layers of the latent space feature vector, x is the face image corresponding to the target latent space feature vector, x' is the target image, C, H and W are respectively the number of channels, the length and the width of the face image corresponding to the target latent space feature vector, C', H' and W' are respectively the number of channels, the length and the width of the target image, and F is the feature extractor in the third specified dimension.
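Purely as an illustration of the four formulas above, the following Python sketch shows one way they could be written with PyTorch. The callables z_id, z_att_layers and F_extractor stand in for the feature extractors named in the text (z_att_layers is assumed to return the n per-layer feature maps); the one-half factor in the attribute loss and all call signatures are assumptions of this sketch, not details given by the application.

import torch
import torch.nn.functional as F_nn  # aliased so it does not clash with the extractor F

def identity_loss(z_id, y, x_s):
    # L_id = 1 - cos(z_id(Y), z_id(X_s))
    return 1.0 - F_nn.cosine_similarity(
        z_id(y).flatten(1), z_id(x_s).flatten(1)).mean()

def attribute_loss(z_att_layers, y, x_t):
    # L_att: assumed half the squared L2 distance summed over the n layers
    return 0.5 * sum((a - b).pow(2).sum()
                     for a, b in zip(z_att_layers(y), z_att_layers(x_t)))

def mse_loss(x, x_prime):
    # L_MSE: squared L2 distance normalised by channels x length x width
    c, h, w = x.shape[-3:]
    return (x - x_prime).pow(2).sum() / (c * h * w)

def perceptual_loss(F_extractor, x, x_prime):
    # L_per: squared L2 distance between extracted features, normalised by the
    # target image's channels x length x width, following the variable list above
    c, h, w = x_prime.shape[-3:]
    return (F_extractor(x) - F_extractor(x_prime)).pow(2).sum() / (c * h * w)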
Example two
Fig. 5 is a schematic structural diagram of an image processing apparatus according to a second embodiment of the present application, and as shown in fig. 5, the apparatus includes:
an extracting unit 51, configured to, after an input source face image and a target face image of a face to be changed are obtained, extract a first latent space feature vector of the source face image in a first specified dimension and extract a second latent space feature vector of the target face image in a second specified dimension;
a fusion unit 52, configured to perform vector fusion on the first latent space feature vector and the second latent space feature vector to obtain a fused third latent space feature vector;
and a generating unit 53, configured to generate a first face-changed image according to the third latent space feature vector.
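For orientation only, the cooperation of the three units can be pictured as the following minimal Python sketch; the class and the callables passed to it are illustrative stand-ins, not part of the application.

class ImageProcessingApparatus:
    def __init__(self, extract_fn, fuse_fn, decode_fn):
        self.extract = extract_fn   # extracting unit 51
        self.fuse = fuse_fn         # fusion unit 52
        self.decode = decode_fn     # generating unit 53

    def swap(self, x_source, x_target):
        # first/second latent space feature vectors in the specified dimensions
        z1, z2 = self.extract(x_source, x_target)
        z3 = self.fuse(z1, z2)      # fused third latent space feature vector
        return self.decode(z3)      # the first face-changed image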
In one possible embodiment, the first specified dimension includes a dimension for representing face image identity information, and the second specified dimension includes a dimension for representing face image attribute information.
In a possible embodiment, the extracting unit 51, when extracting a first latent space feature vector of the source face image in a first specified dimension and extracting a second latent space feature vector of the target face image in a second specified dimension, is configured to:
respectively inputting the source face image and the target face image into a trained face encoder to obtain a fourth latent space feature vector of the source face image in a third specified dimension and a fifth latent space feature vector of the target face image in the third specified dimension, wherein the third specified dimension comprises the first specified dimension and the second specified dimension;
and selecting a latent space feature vector corresponding to the first specified dimension from the fourth latent space feature vector to use the selected latent space feature vector as the first latent space feature vector, and selecting a latent space feature vector corresponding to the second specified dimension from the fifth latent space feature vector to use the selected latent space feature vector as the second latent space feature vector.
In a possible implementation, the extracting unit 51, when selecting a latent space feature vector corresponding to the first specified dimension from the fourth latent space feature vector as the first latent space feature vector and selecting a latent space feature vector corresponding to the second specified dimension from the fifth latent space feature vector as the second latent space feature vector, is configured to:
assigning a weight with a numerical value of 1 to the latent space feature vector corresponding to the first specified dimension in the fourth latent space feature vector, assigning a weight with a numerical value of 0 to the latent space feature vectors corresponding to the other dimensions in the fourth latent space feature vector, and taking the obtained latent space feature vector as the first latent space feature vector;
and assigning a weight with a numerical value of 1 to the latent space feature vector corresponding to the second specified dimension in the fifth latent space feature vector, assigning a weight with a numerical value of 0 to the latent space feature vectors corresponding to the other dimensions in the fifth latent space feature vector, and taking the obtained latent space feature vector as the second latent space feature vector.
In a possible embodiment, the generating unit 53, when generating the first face-changed image according to the third latent space feature vector, is configured to:
input the third latent space feature vector into a trained face decoder to obtain the first face-changed image.
In a possible implementation, fig. 6 is a schematic structural diagram of another image processing apparatus provided in example two of the present application, and as shown in fig. 6, the apparatus further includes:
a fine-tuning unit 54, configured to: after the first face-changed image is obtained, calculate a first identity loss value of the first face-changed image and the source face image in the first specified dimension according to the first face-changed image and the source face image; calculate a first attribute loss value of the first face-changed image and the target face image in the second specified dimension according to the first face-changed image and the target face image; determine whether the first identity loss value is smaller than a first threshold and whether the first attribute loss value is smaller than a second threshold; and save the first face-changed image if the first identity loss value is smaller than the first threshold and the first attribute loss value is smaller than the second threshold.
In a possible embodiment, the fine-tuning unit 54 is further configured to:
if the first identity loss value is greater than or equal to the first threshold value and/or the first attribute loss value is greater than or equal to the second threshold value, performing gradient back propagation training on the trained face decoder by using the first identity loss value and the first attribute loss value;
inputting the third latent space feature vector into a retrained face decoder to obtain a second face-changed image;
and continuously calculating identity loss values of the second face-changed image and the source face image in the first specified dimension, calculating attribute loss values of the second face-changed image and the target face image in the second specified dimension until the obtained identity loss value is smaller than the first threshold value and the attribute loss value is smaller than the second threshold value, and storing the second face-changed image.
In one possible embodiment, the target face image comprises a face image of a game avatar.
In a possible implementation, fig. 7 is a schematic structural diagram of another image processing apparatus provided in example two of the present application, and as shown in fig. 7, the apparatus further includes:
a training unit 55, configured to: after a sample source face image and a sample target face image of a face to be changed are obtained, input the sample source face image and the sample target face image into a face encoder to be trained respectively, to obtain a sixth latent space feature vector of the sample source face image in the third specified dimension and a seventh latent space feature vector of the sample target face image in the third specified dimension; assign a weight with a numerical value of 1 to the latent space feature vector corresponding to the first specified dimension in the sixth latent space feature vector and a weight with a numerical value of 0 to the latent space feature vectors corresponding to the other dimensions, and take the obtained latent space feature vector as the first sample latent space feature vector; assign a weight with a numerical value of 1 to the latent space feature vector corresponding to the second specified dimension in the seventh latent space feature vector and a weight with a numerical value of 0 to the latent space feature vectors corresponding to the other dimensions, and take the obtained latent space feature vector as the second sample latent space feature vector; perform vector fusion on the first sample latent space feature vector and the second sample latent space feature vector to obtain a fused third sample latent space feature vector; input the third sample latent space feature vector into the face decoder to be trained to obtain a third face-changed image; input a target latent space feature vector into the face decoder to be trained to obtain a target image, wherein the target latent space feature vector comprises the sixth latent space feature vector or the seventh latent space feature vector; calculate, according to the face image corresponding to the target latent space feature vector and the target image, an MSE loss value between the target image and the face image corresponding to the target latent space feature vector and a perceptual loss value between the latent space feature vector of the target image and the target latent space feature vector; calculate, according to the third face-changed image, the sample source face image and the sample target face image, a second identity loss value of the third face-changed image and the sample source face image in the first specified dimension and a second attribute loss value of the third face-changed image and the sample target face image in the second specified dimension; and perform gradient back propagation training on the face encoder to be trained and the face decoder to be trained according to the MSE loss value, the perceptual loss value, the second identity loss value and the second attribute loss value, to obtain the trained face encoder and the trained face decoder.
In one possible embodiment, the first identity loss value and the second identity loss value are each calculated by the following formula:
L_{id} = 1 - \cos(z_{id}(Y), z_{id}(X_s));
the first attribute loss value and the second attribute loss value are both obtained by the following formula:
L_{att} = \frac{1}{2} \sum_{k=1}^{n} \left\| z_{att}^{k}(Y) - z_{att}^{k}(X_t) \right\|_2^2;
the MSE loss value is obtained by the following formula:
L_{MSE} = \frac{1}{C \times H \times W} \left\| x - x' \right\|_2^2;
the perceptual loss value is obtained by the following formula:
L_{per} = \frac{1}{C' \times H' \times W'} \left\| F(x) - F(x') \right\|_2^2;
wherein, when calculating the first identity loss value and the first attribute loss value, X_s is the source face image, X_t is the target face image, and Y is the first face-changed image; when calculating the second identity loss value and the second attribute loss value, X_s is the sample source face image, X_t is the sample target face image, and Y is the third face-changed image; z_id is the feature extractor for the first latent space feature vector, z_att is the feature extractor for the second latent space feature vector, n is the number of layers of the latent space feature vector, x is the face image corresponding to the target latent space feature vector, x' is the target image, C, H and W are respectively the number of channels, the length and the width of the face image corresponding to the target latent space feature vector, C', H' and W' are respectively the number of channels, the length and the width of the target image, and F is the feature extractor in the third specified dimension.
For the explanation of the second embodiment, reference is made to the detailed description of the first embodiment, and the detailed description is omitted here.
Example three
Fig. 8 is a schematic structural diagram of an electronic device according to a third embodiment of the present application, including: a processor 801, a storage medium 802 and a bus 803, wherein the storage medium 802 stores machine-readable instructions executable by the processor 801, when the electronic device executes the image processing method, the processor 801 communicates with the storage medium 802 through the bus 803, and the processor 801 executes the machine-readable instructions to execute the method steps described in the first embodiment.
Example four
A fourth embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the method steps described in the first embodiment.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are merely specific implementations of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, or easily conceive of changes, or make equivalent substitutions of some technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. An image processing method, comprising:
after an input source face image and a target face image of a face to be changed are obtained, extracting a first latent space feature vector of the source face image in a first specified dimension and extracting a second latent space feature vector of the target face image in a second specified dimension;
vector fusion is carried out on the first latent space feature vector and the second latent space feature vector to obtain a fused third latent space feature vector;
and generating a first face-changed image according to the third latent space feature vector.
2. The method of claim 1, wherein the first specified dimension comprises a dimension for representing face image identity information and the second specified dimension comprises a dimension for representing face image attribute information.
3. The method of claim 1, wherein the extracting a first latent space feature vector of the source face image in a first specified dimension and extracting a second latent space feature vector of the target face image in a second specified dimension comprises:
respectively inputting the source face image and the target face image into a trained face encoder to obtain a fourth latent space feature vector of the source face image in a third specified dimension and a fifth latent space feature vector of the target face image in the third specified dimension, wherein the third specified dimension comprises the first specified dimension and the second specified dimension;
and selecting a latent space feature vector corresponding to the first specified dimension from the fourth latent space feature vector to use the selected latent space feature vector as the first latent space feature vector, and selecting a latent space feature vector corresponding to the second specified dimension from the fifth latent space feature vector to use the selected latent space feature vector as the second latent space feature vector.
4. The method according to claim 3, wherein the selecting a latent space feature vector corresponding to the first specified dimension from the fourth latent space feature vector as the first latent space feature vector, and selecting a latent space feature vector corresponding to the second specified dimension from the fifth latent space feature vector as the second latent space feature vector comprises:
assigning a weight with a numerical value of 1 to the latent space feature vector corresponding to the first specified dimension in the fourth latent space feature vector, assigning a weight with a numerical value of 0 to the latent space feature vectors corresponding to the other dimensions in the fourth latent space feature vector, and taking the obtained latent space feature vector as the first latent space feature vector;
and assigning a weight with a numerical value of 1 to the latent space feature vector corresponding to the second specified dimension in the fifth latent space feature vector, assigning a weight with a numerical value of 0 to the latent space feature vectors corresponding to the other dimensions in the fifth latent space feature vector, and taking the obtained latent space feature vector as the second latent space feature vector.
5. The method of claim 3, wherein generating a first face-changed image from the third latent space feature vector comprises:
inputting the third latent space feature vector into a trained face decoder to obtain the first face-changed image.
6. The method of claim 5, wherein after obtaining the first face-changed image, the method further comprises:
calculating a first identity loss value of the first face-changed image and the source face image in the first specified dimension according to the first face-changed image and the source face image;
calculating a first attribute loss value of the first face-changed image and the target face image in the second specified dimension according to the first face-changed image and the target face image;
judging whether the first identity loss value is smaller than a first threshold value or not, and judging whether the first attribute loss value is smaller than a second threshold value or not;
and if the first identity loss value is smaller than the first threshold value and the first attribute loss value is smaller than the second threshold value, saving the first face changing image.
7. The method of claim 6, wherein the method further comprises:
if the first identity loss value is greater than or equal to the first threshold value and/or the first attribute loss value is greater than or equal to the second threshold value, performing gradient back propagation training on the trained face decoder by using the first identity loss value and the first attribute loss value;
inputting the third latent space feature vector into a retrained face decoder to obtain a second face-changed image;
and continuously calculating identity loss values of the second face-changed image and the source face image in the first specified dimension, calculating attribute loss values of the second face-changed image and the target face image in the second specified dimension until the obtained identity loss value is smaller than the first threshold value and the attribute loss value is smaller than the second threshold value, and storing the second face-changed image.
8. The method of claim 1, wherein the target face image comprises a face image of a game avatar.
9. The method of claim 6, wherein the method further comprises:
after a sample source face image and a sample target face image of a face to be changed are obtained, the sample source face image and the sample target face image are respectively input into a face encoder to be trained, and a sixth latent space feature vector of the sample source face image in the third specified dimension and a seventh latent space feature vector of the sample target face image in the third specified dimension are obtained;
assigning a weight with a numerical value of 1 to the latent space feature vector corresponding to the first specified dimension in the sixth latent space feature vector, assigning a weight with a numerical value of 0 to the latent space feature vectors corresponding to the other dimensions in the sixth latent space feature vector, and taking the obtained latent space feature vector as the first sample latent space feature vector; assigning a weight with a numerical value of 1 to the latent space feature vector corresponding to the second specified dimension in the seventh latent space feature vector, assigning a weight with a numerical value of 0 to the latent space feature vectors corresponding to the other dimensions in the seventh latent space feature vector, and taking the obtained latent space feature vector as the second sample latent space feature vector;
vector fusion is carried out on the first sample latent space feature vector and the second sample latent space feature vector to obtain a fused third sample latent space feature vector;
inputting the third sample latent space feature vector into a face decoder to be trained to obtain a third face-changed image;
inputting a target latent space feature vector into the face decoder to be trained to obtain a target image, wherein the target latent space feature vector comprises the sixth latent space feature vector or the seventh latent space feature vector;
calculating a Mean Square Error (MSE) loss value between the target image and the face image corresponding to the target latent space feature vector and a perceptual loss value between a latent space feature vector of the target image and the target latent space feature vector according to the face image corresponding to the target latent space feature vector and the target image;
calculating, from the third face-changed image and the sample-source facial image, a second loss of identity value in the first specified dimension for the third face-changed image and the sample-source facial image, and a second loss of attribute value in the second specified dimension for the third face-changed image and the sample-target facial image;
and performing gradient back propagation training on the face encoder to be trained and the face decoder to be trained according to the MSE loss value, the perceptual loss value, the second identity loss value and the second attribute loss value to obtain the trained face encoder and the trained face decoder.
10. The method of claim 9, wherein the first identity loss value and the second identity loss value are each calculated by the following equation:
L_{id} = 1 - \cos(z_{id}(Y), z_{id}(X_s));
the first attribute loss value and the second attribute loss value are both obtained by the following formula:
L_{att} = \frac{1}{2} \sum_{k=1}^{n} \left\| z_{att}^{k}(Y) - z_{att}^{k}(X_t) \right\|_2^2;
the MSE loss value is obtained by the following formula:
L_{MSE} = \frac{1}{C \times H \times W} \left\| x - x' \right\|_2^2;
the perceptual loss value is obtained by the following formula:
L_{per} = \frac{1}{C' \times H' \times W'} \left\| F(x) - F(x') \right\|_2^2;
wherein, when calculating the first identity loss value and the first attribute loss value, X_s is the source face image, X_t is the target face image, and Y is the first face-changed image; when calculating the second identity loss value and the second attribute loss value, X_s is the sample source face image, X_t is the sample target face image, and Y is the third face-changed image; z_id is the feature extractor for the first latent space feature vector, z_att is the feature extractor for the second latent space feature vector, n is the number of layers of the latent space feature vector, x is the face image corresponding to the target latent space feature vector, x' is the target image, C, H and W are respectively the number of channels, the length and the width of the face image corresponding to the target latent space feature vector, C', H' and W' are respectively the number of channels, the length and the width of the target image, and F is the feature extractor in the third specified dimension.
11. An image processing apparatus characterized by comprising:
an extraction unit, configured to, after an input source face image and a target face image of a face to be changed are obtained, extract a first latent space feature vector of the source face image in a first specified dimension and extract a second latent space feature vector of the target face image in a second specified dimension;
a fusion unit, configured to perform vector fusion on the first latent space feature vector and the second latent space feature vector to obtain a fused third latent space feature vector;
and a generating unit, configured to generate a first face-changed image according to the third latent space feature vector.
12. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the image processing method according to any one of claims 1 to 10.
13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the image processing method according to any one of claims 1 to 10.
CN202110241859.1A 2021-03-04 2021-03-04 Image processing method and device, electronic equipment and storage medium Pending CN112991152A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110241859.1A CN112991152A (en) 2021-03-04 2021-03-04 Image processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112991152A 2021-06-18

Family

ID=76352810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110241859.1A Pending CN112991152A (en) 2021-03-04 2021-03-04 Image processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112991152A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361490A (en) * 2021-07-14 2021-09-07 网易(杭州)网络有限公司 Image generation method, network training method, image generation device, network training device, computer equipment and storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10552977B1 (en) * 2017-04-18 2020-02-04 Twitter, Inc. Fast face-morphing using neural networks
CN108765261A (en) * 2018-04-13 2018-11-06 北京市商汤科技开发有限公司 Image conversion method and device, electronic equipment, computer storage media, program
US20200294294A1 (en) * 2019-03-15 2020-09-17 NeoCortext Inc. Face-swapping apparatus and method
GB202007405D0 (en) * 2019-05-20 2020-07-01 Disney Entpr Inc Automated image synthesis using a comb neural network architecture
CN110517185A (en) * 2019-07-23 2019-11-29 北京达佳互联信息技术有限公司 Image processing method, device, electronic equipment and storage medium
US10552667B1 (en) * 2019-08-19 2020-02-04 Neon Evolution Inc. Methods and systems for image processing
CN110717977A (en) * 2019-10-23 2020-01-21 网易(杭州)网络有限公司 Method and device for processing face of game character, computer equipment and storage medium
CN111783603A (en) * 2020-06-24 2020-10-16 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, image face changing method and video face changing method and device
CN112116684A (en) * 2020-08-05 2020-12-22 中国科学院信息工程研究所 Image processing method, device, equipment and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IRYNA KORSHUNOVA et al.: "Fast Face-Swap Using Convolutional Neural Networks", Proceedings of the IEEE International Conference on Computer Vision *
CHEN Peng et al.: "Forged Face Video Detection Method Fusing Global Temporal and Local Spatial Features", Journal of Cyber Security *

Similar Documents

Publication Publication Date Title
CN110717977B (en) Method, device, computer equipment and storage medium for processing game character face
Kolotouros et al. Dreamhuman: Animatable 3d avatars from text
US8624901B2 (en) Apparatus and method for generating facial animation
KR101615719B1 (en) Apparatus and method for extracting user's third dimension facial expression
CN105144247B (en) The generation of the three dimensional representation of user
CN113838176B (en) Model training method, three-dimensional face image generation method and three-dimensional face image generation equipment
EP4293567A1 (en) Three-dimensional face reconstruction method and apparatus, device, and storage medium
Yadav et al. Bacteria foraging fusion for face recognition across age progression
CN110084193B (en) Data processing method, apparatus, and medium for face image generation
Rhee et al. Cartoon-like avatar generation using facial component matching
CN111950430B (en) Multi-scale dressing style difference measurement and migration method and system based on color textures
Liu et al. Humangaussian: Text-driven 3d human generation with gaussian splatting
Zhang et al. Avatarverse: High-quality & stable 3d avatar creation from text and pose
Lin et al. Meingame: Create a game character face from a single portrait
CN112862807B (en) Hair image-based data processing method and device
KR102326902B1 (en) Image-based Posture Preservation Virtual Fitting System Supporting Multi-Poses
CN111815768B (en) Three-dimensional face reconstruction method and device
CN112396693A (en) Face information processing method and device, electronic equipment and storage medium
Zhou et al. Personalized and occupational-aware age progression by generative adversarial networks
CN112991152A (en) Image processing method and device, electronic equipment and storage medium
Islam et al. SVTON: Simplified virtual try-on
CN100474341C (en) Adaptive closed group caricaturing
CN116402676A (en) Modeling method, device, equipment and storage medium for game character skin
CN114373034A (en) Image processing method, image processing apparatus, image processing device, storage medium, and computer program
CN115272057A (en) Training of cartoon sketch image reconstruction network and reconstruction method and equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210618)