CN113610989B - Method and device for training style migration model and method and device for style migration


Info

Publication number: CN113610989B
Authority: CN (China)
Prior art keywords: dimensional, stylized, predicted, convolutional neural network
Legal status: Active (granted)
Application number: CN202110891320.0A
Other languages: Chinese (zh)
Other versions: CN113610989A
Inventor: 王迪 (Wang Di)
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110891320.0A
Publication of CN113610989A
Application granted
Publication of CN113610989B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures


Abstract

The disclosure provides a method and device for training a style migration model, relating to the technical fields of computer vision, deep learning, and augmented/virtual reality. The specific implementation scheme is as follows: acquiring a preset sample set, wherein the sample set includes at least one sample and each sample includes a two-dimensional original image and a stylized image corresponding to it; acquiring a pre-established style migration network that includes a convolutional neural network, wherein the convolutional neural network characterizes the relationship between two-dimensional images and stylized three-dimensional image parameters; inputting the two-dimensional original image of a sample selected from the sample set into the style migration network to obtain the three-dimensional image parameters predicted by the convolutional neural network; training the convolutional neural network based on the predicted three-dimensional image parameters and the stylized image of the selected sample; and taking the style migration network as the style migration model in response to determining that the convolutional neural network satisfies the training completion condition. This embodiment improves the accuracy of stylized transformation.

Description

Method and device for training style migration model and method and device for style migration
Technical Field
The present disclosure relates to the field of computer technologies, specifically to the technical fields of computer vision, deep learning, and augmented/virtual reality, and in particular to a style migration model training method and apparatus, a style migration method and apparatus, an electronic device, a computer-readable medium, and a computer program product.
Background
Stylized reconstruction performs a stylized three-dimensional migration after real three-dimensional reconstruction: the three-dimensional reconstruction bases are adjusted and the blend shape coefficients are modified to achieve the stylized reconstruction effect. Existing methods depend entirely on the aesthetic and subjective judgment of an artist, who adjusts the real reconstruction bases and applies some linear or nonlinear transformation to the real reconstruction blend shape coefficients to obtain stylized blend shape coefficients. The transformed stylized blend shape coefficients lack objective constraints; the correspondence between real reconstruction and stylized reconstruction is one-sided and subjective, and its accuracy is low.
Disclosure of Invention
A style migration model training method and apparatus, a style migration method and apparatus, an electronic device, a computer readable medium, and a computer program product are provided.
According to a first aspect, there is provided a style migration model training method, the method comprising: acquiring a preset sample set, wherein the sample set includes at least one sample and each sample includes a two-dimensional original image and a stylized image corresponding to the two-dimensional original image; acquiring a pre-established style migration network including a convolutional neural network, wherein the convolutional neural network is used to characterize the relationship between two-dimensional images and stylized three-dimensional image parameters; inputting the two-dimensional original image of a sample selected from the sample set into the style migration network to obtain the three-dimensional image parameters predicted by the convolutional neural network; training the convolutional neural network based on the predicted three-dimensional image parameters and the stylized image of the selected sample; and taking the style migration network as a style migration model in response to determining that the convolutional neural network satisfies the training completion condition.
According to a second aspect, there is provided a style migration method, the method comprising: acquiring a face image to be stylized; and inputting the face image into a style migration model generated by the method described in any implementation of the first aspect, and outputting the stylized result of the face image.
According to a third aspect, there is provided a style migration model training apparatus, the apparatus comprising: a sample acquisition unit configured to acquire a preset sample set, wherein the sample set includes at least one sample and each sample includes a two-dimensional original image and a stylized image corresponding to the two-dimensional original image; a network acquisition unit configured to acquire a pre-established style migration network including a convolutional neural network, wherein the convolutional neural network is used to characterize the relationship between two-dimensional images and stylized three-dimensional image parameters; a selecting unit configured to input the two-dimensional original image of a sample selected from the sample set into the style migration network to obtain the three-dimensional image parameters predicted by the convolutional neural network; a training unit configured to train the convolutional neural network based on the predicted three-dimensional image parameters and the stylized image of the selected sample; and an output unit configured to take the style migration network as a style migration model in response to determining that the convolutional neural network satisfies the training completion condition.
According to a fourth aspect, there is also provided a style migration apparatus, the apparatus comprising: an acquisition unit configured to acquire a face image to be stylized; and a conversion unit configured to input the face image into a style migration model generated by the method described in any implementation of the first aspect, and output the stylized result of the face image.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described in any one of the implementations of the first aspect or the second aspect.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method as described in any implementation of the first or second aspects.
According to a seventh aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first or second aspect.
The method and device for training a style migration model provided by the embodiments of the disclosure comprise the following steps: first, acquiring a preset sample set, wherein the sample set includes at least one sample and each sample includes a two-dimensional original image and a stylized image corresponding to it; second, acquiring a pre-established style migration network including a convolutional neural network, wherein the convolutional neural network characterizes the relationship between two-dimensional images and stylized three-dimensional image parameters; third, inputting the two-dimensional original image of a sample selected from the sample set into the style migration network to obtain the three-dimensional image parameters predicted by the convolutional neural network; then, training the convolutional neural network based on the predicted three-dimensional image parameters and the stylized image of the selected sample; and finally, taking the style migration network as the style migration model in response to determining that the convolutional neural network satisfies the training completion condition. By constructing a convolutional neural network that predicts the three-dimensional image parameters, the quantification of the stylized three-dimensional image parameters can be regulated effectively and the stylized image corresponding to a two-dimensional original image obtained objectively, yielding a high-precision style migration model; and because pairs of two-dimensional original images and stylized images are used as samples, the reliability and accuracy of the style migration model training are improved.
The style migration method and device provided by the embodiments of the disclosure acquire a face image to be stylized, input it into a style migration model generated by the style migration model training method of the above embodiment, and output the stylized result of the face image. Because the style migration model is generated with a convolutional neural network, the accuracy of the stylized result of face style migration is improved.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow diagram of one embodiment of a style migration model training method according to the present disclosure;
FIG. 2 is a schematic diagram of an architecture for training a style migration network according to an embodiment of the present disclosure;
FIG. 3 is a flow diagram of one embodiment of a style migration method according to the present disclosure;
FIG. 4 is a schematic diagram illustrating the structure of one embodiment of a style migration model training apparatus according to the present disclosure;
FIG. 5 is a schematic block diagram of one embodiment of a style migration apparatus according to the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing a style migration model training method or a style migration method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Because a three-dimensional reconstruction is displayed through its rendered result, while a stylization is usually designed up front against a 2D style-graph effect, the rendering obtained after manually adjusting the three-dimensional reconstruction bases differs from the expected 2D style-graph effect. The style migration model training method provided by the present disclosure solves this problem of the prior art, in which the result rendered after an artist manually adjusts the three-dimensional reconstruction bases differs from the expected effect. Its stylized reconstruction logic is: take the selected 2D effect graph as the reconstruction target, so that if the bases are adjusted and rendering is performed, the generated rendering matches the initially intended 2D effect graph.
FIG. 1 shows a process 100 according to one embodiment of the disclosed style migration model training method, which includes the steps of:
Step 101, a preset sample set is obtained.
In this embodiment, the execution body of the style migration model training method may obtain the sample set in a variety of ways. For example, it may obtain a sample set stored on a database server via a wired or wireless connection. As another example, users may collect samples via terminals; the execution body then receives the samples collected by the terminals and stores them locally, thereby generating the sample set.
Here, the sample set may include at least one sample, and each sample includes a two-dimensional original image and a stylized image corresponding to the two-dimensional original image.
In this embodiment, the stylized image corresponding to the two-dimensional original image is itself a two-dimensional image: it serves as the stylized label of the two-dimensional original image and determines the style into which the two-dimensional original image is to be converted.
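For concreteness, a minimal sketch of such a sample set as a PyTorch dataset follows; the directory layout, matching file names, and image size are illustrative assumptions, not details from the disclosure.

```python
# A minimal sketch of the sample set: each sample pairs a two-dimensional
# original image with its stylized label image.
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class StylePairDataset(Dataset):
    """Pairs of (two-dimensional original image, stylized label image)."""

    def __init__(self, original_dir: str, stylized_dir: str, size: int = 256):
        # Assumes matching file names in the two directories.
        self.names = sorted(os.listdir(original_dir))
        self.original_dir = original_dir
        self.stylized_dir = stylized_dir
        self.to_tensor = transforms.Compose(
            [transforms.Resize((size, size)), transforms.ToTensor()]
        )

    def __len__(self) -> int:
        return len(self.names)

    def __getitem__(self, idx: int):
        name = self.names[idx]
        original = Image.open(os.path.join(self.original_dir, name)).convert("RGB")
        stylized = Image.open(os.path.join(self.stylized_dir, name)).convert("RGB")
        return self.to_tensor(original), self.to_tensor(stylized)
```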
Step 102, a pre-established style migration network including a convolutional neural network is obtained.
In this embodiment, the style migration network is configured to perform style migration on a two-dimensional image to obtain a stylized result with the target style effect; the stylized result may be either a two-dimensional image or a three-dimensional image.
The convolutional neural network is the main network structure of the style migration network and is composed of neurons with learnable weights and bias constants. Each neuron receives inputs and performs a dot-product calculation, and the output is a score for each class. Once the parameters of the convolutional neural network in the style migration network have been adjusted until the convolutional neural network satisfies the training completion condition, the style migration network is the trained style migration model.
In this embodiment, since the convolutional neural network outputs only three-dimensional image parameters while the two-dimensional original image in a sample is a two-dimensional image, the style migration network may further include an image conversion module (as shown in fig. 2) in addition to the convolutional neural network. The image conversion module includes a first module (not shown) and a second module (not shown): the first module converts the three-dimensional image parameters into a stylized three-dimensional image, such as the three-dimensional grid map in fig. 2, and the second module converts the stylized three-dimensional image into a predicted style map. The parameters of the first and second modules may either change with the convolutional neural network or remain fixed. In this embodiment, the first and second modules do not participate in the calculation of the loss value of the CNN (convolutional neural network) during training of the style migration network; they only pass the back-propagated gradients through to the convolutional network.
Here, the convolutional neural network of this embodiment is used to characterize the relationship between two-dimensional images and stylized three-dimensional image parameters. In this embodiment, a three-dimensional image parameter is a key parameter for generating a three-dimensional image; for example, the three-dimensional image parameters include at least one of a blend shape coefficient and a stylized base. The blend shape coefficient, also called the bs coefficient, characterizes the feature values of the blend shapes of the predicted style map; the stylized base characterizes the feature vectors of the stylized image in the feature vector space. Multiplying the stylized blend shape coefficients by the stylized bases in the feature vector space yields the three-dimensional grid map corresponding to the predicted style map.
Optionally, the three-dimensional image parameters may further include three-dimensional mesh parameters. Three-dimensional mesh parameters are the key parameters for generating a three-dimensional model and can be used to draw a three-dimensional mesh surface: a point in three-dimensional space is represented by (x, y, z), and points are connected by the mesh. For example, when the sample is a face image, the three-dimensional mesh parameters may be a 3D point cloud and patch information, where the 3D point cloud gives the three-dimensional coordinates of the pixels on the face and the patch information indicates the key 3D points that make up the face.
In this embodiment, the three-dimensional mesh parameters can be obtained from the blend shape coefficients and the stylized bases by a conventional linear calculation, which yields the three-dimensional model representation of the two-dimensional original image in the sample.
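A minimal sketch of this conventional calculation follows: the mesh is a linear combination of the stylized base vectors weighted by the blend shape coefficients. The tensor shapes are illustrative assumptions.

```python
# Linear blend shape computation: mesh = mean shape + sum_k coeff_k * base_k.
import torch

def blend_mesh(mean_shape: torch.Tensor,      # (V, 3) neutral mesh vertices
               stylized_bases: torch.Tensor,  # (K, V, 3) stylized base offsets
               bs_coeffs: torch.Tensor        # (K,) blend shape coefficients
               ) -> torch.Tensor:
    """Return the (V, 3) vertex positions of the stylized three-dimensional mesh."""
    # einsum scales each base by its coefficient and sums over the K bases.
    return mean_shape + torch.einsum("k,kvc->vc", bs_coeffs, stylized_bases)
```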
Step 103, inputting the two-dimensional original image of a sample selected from the sample set into the style migration network to obtain the three-dimensional image parameters predicted by the convolutional neural network.
In this embodiment, the execution body may select samples from the sample set obtained in step 101, input the two-dimensional original image of each selected sample into the style migration network, and perform the training steps 104 to 105. The manner of selection and the number of samples are not limited in the present application. For example, one sample may be selected at random per training iteration, or several samples may be selected per iteration. The stylized image of each selected sample serves as the supervision label and, together with the predicted style map (for example, in fig. 2 the predicted style map is obtained by rendering with a renderer), is used to calculate the loss value of the CNN and adjust the convolutional network, where the predicted style map is the image generated from the three-dimensional image parameters that the convolutional network outputs for the two-dimensional original image.
In this embodiment, the style migration network may include a convolutional neural network and an image conversion module, where the image conversion module generates the predicted style map from the predicted three-dimensional image parameters. The predicted style map is computed from the predicted three-dimensional image parameters and may be a three-dimensional image or a two-dimensional image.
When the predicted style map to be produced by the image conversion module is a three-dimensional image, the module may convert the three-dimensional image parameters into a three-dimensional predicted style map using a conventional image conversion method. When the predicted style map to be produced is a two-dimensional image, the module may first convert the three-dimensional image parameters into a three-dimensional predicted style map and then convert it into a two-dimensional predicted style map through the renderer.
In this embodiment, the predicted style map produced by the image conversion module may be used as the output of the style migration network, i.e., the output may be a two-dimensional or three-dimensional image; alternatively, the output of the convolutional neural network may be used as the output of the style migration network, i.e., the output is the three-dimensional image parameters.
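A hedged sketch of such a style migration network follows: a CNN backbone predicts blend shape coefficients as the three-dimensional image parameters, and a parameter-free conversion step turns them into a mesh and then a rendered predicted style map. The backbone choice (`resnet18`, torchvision 0.13 or later), the coefficient count, and the `render_fn` hook are assumptions, not elements named in the disclosure.

```python
# CNN predicts parameters; fixed buffers and a differentiable render_fn form
# the image conversion module, which has no trainable parameters of its own.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class StyleMigrationNetwork(nn.Module):
    def __init__(self, mean_shape, stylized_bases, render_fn, num_coeffs=64):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, num_coeffs)
        self.cnn = backbone                      # the trainable part
        # Buffers are fixed: they are not updated by backpropagation.
        self.register_buffer("mean_shape", mean_shape)          # (V, 3)
        self.register_buffer("stylized_bases", stylized_bases)  # (K, V, 3)
        self.render_fn = render_fn  # differentiable, parameter-free renderer

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        coeffs = self.cnn(images)                                # (B, K)
        verts = self.mean_shape + torch.einsum(
            "bk,kvc->bvc", coeffs, self.stylized_bases)          # (B, V, 3)
        return self.render_fn(verts)      # (B, 3, H, W) predicted style map
```

Keeping the conversion step parameter-free matches the description above: gradients flow through it, but only the CNN weights are updated.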
And 104, training the convolutional neural network based on the predicted three-dimensional image parameters and the selected stylized graph in the sample.
In this embodiment, the predicted style map output by the image conversion module may be a three-dimensional image or a two-dimensional image. When the predicted style map is a three-dimensional image, a fixed renderer (whose parameters do not change when the parameters of the convolutional neural network change) must render it within the style migration network to obtain a two-dimensional predicted style map. When the predicted style map is a two-dimensional image, the difference between it and the stylized image of the selected sample can be computed directly within the style migration network to obtain the loss value of the convolutional neural network, thereby realizing iterative training of the convolutional neural network.
In this embodiment, the convolutional neural network is the main network structure of the style migration network: each training iteration produces a loss value for the convolutional neural network, which is also the loss value of the style migration network, and the method for calculating it is also the method for evaluating the training of the style migration network. As shown in fig. 2, the training process of the style migration network may be as follows: input the two-dimensional original image of a sample into the style migration network to obtain the three-dimensional image parameters output by the CNN; have the image conversion module generate a three-dimensional image from these parameters and convert it into a two-dimensional image, yielding the predicted style map as the stylized result; determine the loss value of the CNN from the difference between the stylized result and the stylized image (supervision label) of the selected sample; and iteratively adjust the parameters of the convolutional neural network by gradient back-propagation so that the loss value gradually decreases. When the loss value of the convolutional neural network converges to a certain range, or the number of iterations reaches a preset threshold, adjustment of the CNN parameters stops, and the style migration network containing the trained CNN is the trained style migration model.
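A minimal training-loop sketch matching this process, reusing the `StyleMigrationNetwork` sketched above. The pixel L1 loss and the optimizer settings are illustrative choices; the disclosure does not fix a loss function.

```python
# Render the predicted style map, compare it with the stylized label image,
# and backpropagate; only the CNN parameters are optimized.
import torch

def train(network, loader, epochs=10, lr=1e-4):
    optimizer = torch.optim.Adam(network.cnn.parameters(), lr=lr)
    for _ in range(epochs):
        for originals, stylized_labels in loader:
            predicted = network(originals)            # predicted style map
            loss = torch.nn.functional.l1_loss(predicted, stylized_labels)
            optimizer.zero_grad()
            loss.backward()       # gradients flow back through the renderer
            optimizer.step()
    return network
```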
It should be noted that during gradient back-propagation, the image conversion module only passes the gradients through to the convolutional neural network; it does not alter them.
As shown in fig. 2, where the predicted style map is a two-dimensional image, the convolutional neural network is iteratively trained on the two-dimensional original images selected from the sample set and their corresponding stylized images. Each iteration produces an output, and the loss value of the convolutional neural network is calculated from its loss function during each iteration; adjusting the parameters of the convolutional neural network reduces this loss value.
Step 105, taking the style migration network as the style migration model in response to determining that the convolutional neural network satisfies the training completion condition.
In this embodiment, the training completion condition includes at least one of: the number of training iterations of the convolutional neural network reaches a preset iteration threshold, or the loss value of the convolutional neural network is smaller than a preset loss threshold. For example, training completes when the iterations reach 5,000 or the loss value falls below 0.05.
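The completion condition, with the example thresholds given above, can be expressed directly:

```python
def training_complete(iteration: int, loss: float,
                      max_iterations: int = 5_000,
                      loss_threshold: float = 0.05) -> bool:
    # Either condition from the text suffices to end training.
    return iteration >= max_iterations or loss < loss_threshold
```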
In some optional implementations of this embodiment, in response to determining that the convolutional neural network does not satisfy the training completion condition, the relevant parameters of the convolutional neural network are adjusted so that its loss value converges, and the training steps 103 to 105 are continued with the adjusted convolutional neural network.
In this optional implementation, adjusting the relevant parameters when the convolutional neural network does not satisfy the training completion condition helps the loss value of the convolutional neural network converge.
In this embodiment, if training is not complete, the parameters of the convolutional neural network are adjusted so that its loss value converges. Specifically, steps 103 to 105 are repeated and the parameters of the convolutional neural network are adjusted so that the loss value gradually decreases until convergence.
The style migration model training method provided by the embodiments of the disclosure comprises: first, acquiring a preset sample set, wherein the sample set includes at least one sample and each sample includes a two-dimensional original image and a stylized image corresponding to it; second, acquiring a pre-established style migration network including a convolutional neural network, wherein the convolutional neural network characterizes the relationship between two-dimensional images and stylized three-dimensional image parameters; third, inputting the two-dimensional original image of a sample selected from the sample set into the style migration network to obtain the three-dimensional image parameters predicted by the convolutional neural network; then, training the convolutional neural network based on the predicted three-dimensional image parameters and the stylized image of the selected sample; and finally, taking the style migration network as the style migration model in response to determining that the convolutional neural network satisfies the training completion condition. By constructing a convolutional neural network that predicts the three-dimensional image parameters, the quantification of the stylized three-dimensional image parameters can be regulated effectively and the stylized image corresponding to a two-dimensional original image obtained objectively, yielding a high-precision style migration model; and because pairs of two-dimensional original images and stylized images are used as samples, the reliability and accuracy of the style migration model training are improved.
In some optional implementations of this embodiment, the stylized image in a sample is obtained as follows: input the two-dimensional original image of the sample into a trained style conversion model to obtain the stylized image corresponding to the two-dimensional original image, where the style conversion model is used to characterize the relationship between two-dimensional images and their stylized images.
In this optional implementation, the style conversion model may be a GAN (Generative Adversarial Network). A GAN consists of two parts, a generator and a discriminator, whose relationship can be described as a competition or adversarial relationship.
The discriminator penalizes the false targets generated by the generator and rewards true targets, thereby establishing what distinguishes bad false targets from good true targets. The generator then evolves to produce better false targets than before, so that the discriminator penalizes it less. This is one cycle; in the next cycle, the discriminator learns from the evolved false targets and the true targets and again evolves its penalties, while the generator evolves again, until the false targets are indistinguishable from the true targets and the evolution ends.
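For illustration, one adversarial training step of such a GAN might look like the following sketch; the generator and discriminator modules are placeholders, and the binary cross-entropy formulation is one common choice rather than something the disclosure specifies.

```python
# One GAN step: the discriminator penalizes generated ("false") stylized
# images and rewards real ones; the generator learns to fool it.
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, g_opt, d_opt, originals, real_stylized):
    # Discriminator update: real -> 1, fake -> 0 (fakes are detached so the
    # generator is not updated here).
    fake = generator(originals).detach()
    real_logits = discriminator(real_stylized)
    fake_logits = discriminator(fake)
    d_loss = (F.binary_cross_entropy_with_logits(
                  real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(
                  fake_logits, torch.zeros_like(fake_logits)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator update: make the discriminator score fakes as real.
    logits = discriminator(generator(originals))
    g_loss = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```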
In this embodiment, inputting the two-dimensional original image of a sample into the trained style conversion model improves the accuracy of the obtained stylized image, which in turn improves the training effect of the style migration model.
In some optional implementations of this embodiment, when the predicted three-dimensional image parameter is a predicted blend shape coefficient, generating the predicted style map based on the predicted three-dimensional image parameter includes: generating a three-dimensional grid map based on the predicted blend shape coefficient and a preset stylized base; and inputting the three-dimensional grid map into a differentiable renderer to generate the predicted style map.
In this optional implementation, the image conversion module in the style migration network holds a preset stylized base. When the predicted three-dimensional image parameter is a blend shape coefficient predicted by the convolutional neural network, the image conversion module multiplies the blend shape coefficient by the preset stylized base to generate a three-dimensional grid map, which is then input into the differentiable renderer to generate a two-dimensional predicted style map.
In this optional implementation, when the predicted three-dimensional image parameter is a blend shape coefficient predicted by the convolutional neural network, the image conversion module can effectively convert it into a two-dimensional predicted style map; this provides an optional way of obtaining the predicted style map and ensures its reliability.
Optionally, when the predicted three-dimensional image parameter is a predicted blend shape coefficient, generating the predicted style map based on it includes: generating a three-dimensional grid map based on the predicted blend shape coefficient and a preset stylized base; and taking the three-dimensional grid map as the predicted style map.
In some optional implementations of this embodiment, when the predicted three-dimensional image parameter is a predicted stylized base, generating the predicted style map includes: generating a three-dimensional grid map based on a preset blend shape coefficient and the predicted stylized base; and inputting the three-dimensional grid map into a differentiable renderer to generate the predicted style map.
In this optional implementation, the image conversion module in the style migration network holds a preset blend shape coefficient. When the predicted three-dimensional image parameter is a stylized base output by the convolutional neural network, the image conversion module linearly multiplies the preset blend shape coefficient by the predicted stylized base to generate a three-dimensional grid map, which is then input into the differentiable renderer to generate a two-dimensional predicted style map. A sketch of this variant is given below.
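Under the same assumptions as the earlier network sketch, this variant swaps what is learned: a prediction head outputs the stylized base, and the blend shape coefficients are a fixed preset, so only the predicted base carries gradients back into the CNN. Shapes and layer sizes are illustrative.

```python
# Head that predicts the stylized base; coefficients stay preset and fixed.
import torch
import torch.nn as nn

class BasePredictingHead(nn.Module):
    def __init__(self, feat_dim: int, num_bases: int, num_verts: int):
        super().__init__()
        self.num_bases, self.num_verts = num_bases, num_verts
        self.fc = nn.Linear(feat_dim, num_bases * num_verts * 3)

    def forward(self, features, mean_shape, preset_coeffs):
        # (B, K, V, 3) predicted stylized bases from the CNN features.
        bases = self.fc(features).view(-1, self.num_bases, self.num_verts, 3)
        # Linear combination with the *preset* coefficients.
        return mean_shape + torch.einsum("k,bkvc->bvc", preset_coeffs, bases)
```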
In this embodiment, both the renderer and the differentiable renderer are modules that convert three-dimensional light-transport processing into a two-dimensional image. Unlike a conventional renderer, a differentiable renderer computes derivatives of the rendering process, which makes learning three-dimensional structure from a single image increasingly practical. Differentiable rendering is now widely used in three-dimensional reconstruction, in particular human body reconstruction, face reconstruction, and three-dimensional attribute estimation.
In this optional implementation, when the predicted three-dimensional image parameter is a stylized base predicted by the convolutional neural network, the image conversion module can effectively convert it into a two-dimensional predicted style map; this provides an optional way of obtaining the predicted style map and ensures its accuracy.
Optionally, when the predicted three-dimensional image parameter is a predicted stylized base, generating the predicted style map based on it includes: generating a three-dimensional grid map based on a preset blend shape coefficient and the predicted stylized base; and taking the three-dimensional grid map as the predicted style map.
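As an illustration only, the differentiable rendering step might be realized with PyTorch3D, one widely used differentiable renderer; the disclosure does not name a library, and the camera, lighting, and texture choices below are assumptions.

```python
# Differentiably render (V, 3) vertices and (F, 3) faces to an image.
import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (
    FoVPerspectiveCameras, RasterizationSettings, MeshRasterizer,
    MeshRenderer, SoftPhongShader, PointLights, TexturesVertex,
)

def render_mesh(verts: torch.Tensor, faces: torch.Tensor) -> torch.Tensor:
    device = verts.device
    cameras = FoVPerspectiveCameras(device=device)
    renderer = MeshRenderer(
        rasterizer=MeshRasterizer(
            cameras=cameras,
            raster_settings=RasterizationSettings(image_size=256),
        ),
        shader=SoftPhongShader(
            device=device, cameras=cameras, lights=PointLights(device=device)
        ),
    )
    # Plain gray vertex colors; gradients flow back to `verts`.
    textures = TexturesVertex(verts_features=torch.full_like(verts, 0.5)[None])
    mesh = Meshes(verts=[verts], faces=[faces], textures=textures)
    # Output is (1, 256, 256, 4) RGBA; slice off alpha and permute to
    # (1, 3, 256, 256) before comparing with image-tensor labels.
    return renderer(mesh)[..., :3].permute(0, 3, 1, 2)
```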
In some optional implementations of this embodiment, when the predicted three-dimensional image parameter is a predicted three-dimensional mesh parameter, the predicted style map is generated by: generating a three-dimensional grid map based on the predicted three-dimensional mesh parameters; and inputting the three-dimensional grid map into a differentiable renderer to generate the predicted style map.
In this optional implementation, the image conversion module in the style migration network holds preset 3D point cloud or patch information. When the predicted three-dimensional image parameter is a three-dimensional mesh parameter output by the convolutional neural network, and that parameter is a predicted 3D point cloud, the image conversion module selects from the predicted 3D point cloud the points corresponding to the preset patch information, generates a three-dimensional grid map from the selected points, and inputs the grid map into the differentiable renderer to generate a two-dimensional predicted style map.
Optionally, when the predicted three-dimensional image parameter is a three-dimensional mesh parameter output by the convolutional neural network and that parameter is predicted patch information, the image conversion module selects from a preset 3D point cloud the points corresponding to the predicted patch information, generates a three-dimensional grid map from the selected points, and inputs the grid map into the differentiable renderer to generate a two-dimensional predicted style map.
In this optional implementation, when the predicted three-dimensional image parameters are three-dimensional mesh parameters predicted by the convolutional neural network, the image conversion module can effectively convert them into a two-dimensional predicted style map; this provides an optional way of obtaining the predicted style map and ensures its reliability. A sketch of this variant follows.
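In this sketch, the CNN head predicts the 3D point cloud (vertex positions) directly, and preset patch (face) information determines how the points are connected into the three-dimensional grid map. Shapes are illustrative assumptions.

```python
# Head that predicts vertex positions directly; faces stay preset and fixed.
import torch
import torch.nn as nn

class PointCloudHead(nn.Module):
    def __init__(self, feat_dim: int, num_verts: int):
        super().__init__()
        self.num_verts = num_verts
        self.fc = nn.Linear(feat_dim, num_verts * 3)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # (B, V, 3) predicted vertex positions; combined with a preset (F, 3)
        # face index tensor, this defines the three-dimensional grid map.
        return self.fc(features).view(-1, self.num_verts, 3)
```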
Optionally, generating the predicted style map based on the predicted three-dimensional image parameters includes: generating a three-dimensional grid map based on the predicted three-dimensional mesh parameters; and taking the three-dimensional grid map as the predicted style map.
Further, based on the style migration model training method provided by the above embodiment, the present disclosure also provides an embodiment of a style migration method, which combines artificial intelligence fields such as computer vision and deep learning.
Referring to fig. 3, a flow 300 of a style migration method according to an embodiment of the present disclosure is shown, and the style migration method provided in this embodiment includes the following steps:
Step 301, acquiring a face image to be stylized.
For example, the execution body can acquire the face image to be stylized stored on a database server via a wired or wireless connection.
In this embodiment, the acquired face image to be stylized may be a color image and/or a grayscale image; the present disclosure does not limit its format.
Step 302, inputting the face image into the style migration model and outputting the stylized result of the face image.
In this embodiment, the execution body may input the face image to be stylized acquired in step 301 into the style migration model to obtain the stylized result of the face image to be stylized. The stylized result is the result of performing style migration on the face image to be stylized; depending on the structure of the style migration model, it may be a two-dimensional stylized image corresponding to the face image, a three-dimensional stylized image corresponding to the face image, or the three-dimensional image parameters corresponding to the face image. The image content of the two-dimensional and three-dimensional stylized images is the same as that of the face image; only the presentation style changes. For example, if the face image is in a cartoon style, the stylized result may be a face image in an oil-painting style.
In this embodiment, the style migration model may be obtained by training with the method described in the embodiment of fig. 1; for the specific training process, reference may be made to the description of that embodiment, which is not repeated here.
In this embodiment, the style migration model may be used to perform style migration on the face image to be stylized to obtain the stylized result. Style migration means converting the face image from its original style into another style by some means while ensuring that the content of the face image is unchanged: the style of the face image to be stylized before migration and the style of the stylized result differ, and this embodiment does not limit what those two styles are. For example, the style types of the face image to be stylized and of the stylized result may each be one of cartoon, comic, drama, oil painting, and the like; inputting a real two-dimensional face image into the style migration model, the model outputs a two-dimensional cartoon face image or a three-dimensional cartoon face image.
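A hedged usage sketch of the trained style migration model on a single face image follows, reusing the network sketched earlier; the preprocessing and file path are illustrative assumptions.

```python
# Run the trained style migration model on one face image.
import torch
from PIL import Image
from torchvision import transforms

def stylize_face(model: torch.nn.Module, path: str) -> torch.Tensor:
    preprocess = transforms.Compose(
        [transforms.Resize((256, 256)), transforms.ToTensor()]
    )
    face = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)  # (1, 3, 256, 256)
    model.eval()
    with torch.no_grad():      # inference only: no gradients needed
        return model(face)     # stylized result for the face image
```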
It should be noted that the style migration method of this embodiment may be used to test the style migration models generated by the above embodiments, and the style migration model can then be further optimized according to the stylized result. The method may also be a practical application of the style migration models generated by the above embodiments: using these models to perform style migration on face images helps improve the accuracy of style migration.
The style migration method provided by the embodiments of the disclosure acquires a face image to be stylized, inputs it into a style migration model generated by the style migration model training method of the above embodiment, and outputs the stylized result of the face image. Because the style migration model is generated with a convolutional neural network, the accuracy of the stylized result of face style migration is improved.
With further reference to fig. 4, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of a style migration model training apparatus, which corresponds to the embodiment of the method shown in fig. 1 and is specifically applicable to various electronic devices.
As shown in fig. 4, the style migration model training apparatus 400 provided in this embodiment includes: a sample acquisition unit 401, a network acquisition unit 402, a selecting unit 403, a training unit 404, and an output unit 405. The sample acquisition unit 401 may be configured to acquire a preset sample set, where the sample set includes at least one sample and each sample includes a two-dimensional original image and a stylized image corresponding to it. The network acquisition unit 402 may be configured to acquire a pre-established style migration network including a convolutional neural network used to characterize the relationship between two-dimensional images and stylized three-dimensional image parameters. The selecting unit 403 may be configured to input the two-dimensional original image of a selected sample in the sample set into the style migration network to obtain the three-dimensional image parameters predicted by the convolutional neural network. The training unit 404 may be configured to train the convolutional neural network based on the predicted three-dimensional image parameters and the stylized image of the selected sample. The output unit 405 may be configured to take the style migration network as the style migration model in response to determining that the convolutional neural network satisfies the training completion condition.
In this embodiment, in the style migration model training apparatus 400: for the detailed processing and technical effects of the sample acquisition unit 401, the network acquisition unit 402, the selecting unit 403, the training unit 404, and the output unit 405, reference may be made to the descriptions of steps 101, 102, 103, 104, and 105 in the embodiment corresponding to fig. 1, which are not repeated here.
In some optional implementations of this embodiment, the apparatus 400 further includes an adjustment unit (not shown in the figure). The adjustment unit may be configured to, in response to determining that the convolutional neural network does not satisfy the training completion condition, adjust the relevant parameters of the convolutional neural network so that its loss value converges; the selecting unit 403, the training unit 404, and the output unit 405 then continue training based on the adjusted convolutional neural network.
In some optional implementations of this embodiment, the stylized image in a sample is obtained through a style conversion unit (not shown in the figure), which may be configured to input the two-dimensional original image of the sample into a trained style conversion model to obtain the stylized image corresponding to the two-dimensional original image.
In some optional implementations of this embodiment, the predicted three-dimensional image parameter is a predicted blend shape coefficient, and the training unit 404 includes a first mesh generation module (not shown) and a first style generation module (not shown). The first mesh generation module may be configured to generate a three-dimensional grid map based on the predicted blend shape coefficient and a preset stylized base. The first style generation module may be configured to input the three-dimensional grid map into the differentiable renderer to generate the predicted style map.
In some optional implementations of this embodiment, the predicted three-dimensional image parameter is a predicted stylized base, and the training unit 404 includes a second mesh generation module (not shown) and a second style generation module (not shown). The second mesh generation module may be configured to generate a three-dimensional grid map based on a preset blend shape coefficient and the predicted stylized base. The second style generation module may be configured to input the three-dimensional grid map into the differentiable renderer to generate the predicted style map.
In some optional implementations of this embodiment, the predicted three-dimensional image parameter is a predicted three-dimensional mesh parameter, and the training unit 404 includes a third mesh generation module (not shown) and a third style generation module (not shown). The third mesh generation module may be configured to generate a three-dimensional grid map based on the predicted three-dimensional mesh parameters. The third style generation module may be configured to input the three-dimensional grid map into the differentiable renderer to generate the predicted style map.
With the style migration model training apparatus provided by this embodiment: first, the sample acquisition unit 401 acquires a preset sample set, where the sample set includes at least one sample and each sample includes a two-dimensional original image and a stylized image corresponding to it; second, the network acquisition unit 402 acquires a pre-established style migration network including a convolutional neural network, where the convolutional neural network characterizes the relationship between two-dimensional images and stylized three-dimensional image parameters; third, the selecting unit 403 inputs the two-dimensional original image of a sample selected in the sample set into the style migration network to obtain the three-dimensional image parameters predicted by the convolutional neural network; then, the training unit 404 trains the convolutional neural network based on the predicted three-dimensional image parameters and the stylized image of the selected sample; finally, the output unit 405 takes the style migration network as the style migration model in response to determining that the convolutional neural network satisfies the training completion condition. By constructing a convolutional neural network that predicts the three-dimensional image parameters, the quantification of the stylized three-dimensional image parameters can be regulated effectively and the stylized image corresponding to a two-dimensional original image obtained objectively, yielding a high-precision style migration model; and because pairs of two-dimensional original images and stylized images are used as samples, the reliability and accuracy of the style migration model training are improved.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of a style migration apparatus, which corresponds to the embodiment of the method shown in fig. 3, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the style migration apparatus 500 provided in this embodiment includes an acquisition unit 501 and a conversion unit 502. The acquisition unit 501 may be configured to acquire a face image to be stylized. The conversion unit 502 may be configured to input the face image into the style migration model generated by the method described in the embodiment of fig. 1 and output the stylized result of the face image.
In this embodiment, in the style migration apparatus 500: for the detailed processing and technical effects of the acquisition unit 501 and the conversion unit 502, reference may be made to the descriptions of steps 301 and 302 in the embodiment corresponding to fig. 3, which are not repeated here.
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the various methods and processes described above, such as the style migration model training method or the style migration method. For example, in some embodiments, the style migration model training method or style migration method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 600 via ROM 602 and/or communications unit 609. When the computer program is loaded into RAM603 and executed by computing unit 601, one or more steps of the style migration model training method or style migration method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the style migration model training method or the style migration method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable style migration model training apparatus, style migration apparatus, such that the program codes, when executed by the processor or controller, cause the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (17)

1. A style migration model training method, the method comprising:
acquiring a preset sample set, wherein the sample set comprises at least one sample, and the sample comprises a two-dimensional original image and a stylized map corresponding to the two-dimensional original image;
obtaining a pre-established style migration network comprising a convolutional neural network and an image conversion module, wherein the convolutional neural network is used for characterizing the relationship between a two-dimensional image and stylized three-dimensional image parameters, the image conversion module is used for transferring the back-propagation gradient of the convolutional neural network and does not participate in the calculation of the loss value of the convolutional neural network, and the image conversion module comprises: a first module for converting the three-dimensional image parameters into a stylized three-dimensional image and a second module for converting the stylized three-dimensional image into a predicted stylized map;
inputting the two-dimensional original image of a sample selected from the sample set into the style migration network to obtain the three-dimensional image parameters predicted by the convolutional neural network;
training the convolutional neural network based on the predicted three-dimensional image parameters and the stylized map in the selected sample; wherein the training the convolutional neural network based on the predicted three-dimensional image parameters and the stylized map in the selected sample comprises: generating a predicted stylized map based on the predicted three-dimensional image parameters; and training the convolutional neural network based on the predicted stylized map and the stylized map in the selected sample;
and taking the style migration network as a style migration model in response to determining that the convolutional neural network meets a training completion condition.
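For concreteness, the training procedure of claim 1 can be sketched as follows. This is an illustrative reading only, not the patented implementation: the names StyleMigrationNet, train_step, and image_conversion are assumptions, the network architecture is a placeholder, and PyTorch is used purely as an example framework. The property the sketch preserves from the claim is that the image conversion module is differentiable, so gradients flow through it back to the convolutional neural network, while it contributes no loss term of its own.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class StyleMigrationNet(nn.Module):
        # CNN mapping a 2-D image to stylized 3-D image parameters
        # (illustrative architecture, not taken from the patent).
        def __init__(self, num_params: int = 64):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, num_params),
            )

        def forward(self, image: torch.Tensor) -> torch.Tensor:
            return self.backbone(image)

    def train_step(cnn, image_conversion, optimizer, original, stylized_gt):
        # image_conversion stands in for the claimed image conversion
        # module: parameters -> stylized 3-D image -> predicted stylized
        # map. It is differentiable, so the loss gradient passes through
        # it to the CNN, but it adds no loss term of its own.
        params = cnn(original)
        predicted_map = image_conversion(params)
        loss = F.l1_loss(predicted_map, stylized_gt)  # loss on the CNN only
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return float(loss)

A natural stopping rule for the "training completion condition" is convergence of this loss, consistent with the convergence language of claim 2 below.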
2. The method of claim 1, further comprising:
and in response to determining that the convolutional neural network does not meet the training completion condition, adjusting relevant parameters of the convolutional neural network so that the loss value of the convolutional neural network converges, and continuing to train based on the adjusted convolutional neural network.
3. The method of claim 1, wherein the stylized map in the sample is obtained by:
and inputting the two-dimensional original image in the sample into a trained style conversion model to obtain the stylized map corresponding to the two-dimensional original image, wherein the style conversion model is used for characterizing the relationship between a two-dimensional image and its stylized map.
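Claim 3 amounts to a label-generation step: a separately trained 2-D style conversion model supplies the ground-truth stylized map for each original image. A minimal sketch, assuming such a model is available as a callable (all names here are hypothetical):

    import torch

    def build_sample_set(originals, style_conversion_model):
        # originals: iterable of (3, H, W) image tensors. The pretrained
        # 2-D style conversion model supplies the ground-truth stylized
        # map paired with each two-dimensional original image.
        samples = []
        with torch.no_grad():
            for original in originals:
                stylized = style_conversion_model(original.unsqueeze(0))
                samples.append((original, stylized.squeeze(0)))
        return samples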
4. The method according to one of claims 1-3, wherein the predicted three-dimensional image parameters are predicted blend shape coefficients, and the generating of the predicted stylized map based on the predicted three-dimensional image parameters comprises:
generating a three-dimensional mesh map based on the predicted blend shape coefficients and a preset stylized basis;
and inputting the three-dimensional mesh map into a differentiable renderer to generate the predicted stylized map.
5. The method according to one of claims 1-3, wherein the predicted three-dimensional image parameter is a predicted stylized basis, and the generating of the predicted stylized map based on the predicted three-dimensional image parameters comprises:
generating a three-dimensional mesh map based on preset blend shape coefficients and the predicted stylized basis;
and inputting the three-dimensional mesh map into a differentiable renderer to generate the predicted stylized map.
6. The method according to one of claims 1-3, wherein the predicted three-dimensional image parameters are predicted three-dimensional mesh parameters, and the generating of the predicted stylized map based on the predicted three-dimensional image parameters comprises:
generating a three-dimensional mesh map based on the predicted three-dimensional mesh parameters;
and inputting the three-dimensional mesh map into a differentiable renderer to generate the predicted stylized map.
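Claims 4-6 differ only in which factor of the three-dimensional representation the network predicts: the blend shape coefficients (claim 4), the stylized basis (claim 5), or the mesh itself (claim 6). One common reading of claims 4 and 5 is the linear blendshape model, vertices = base + sum_k c_k * B_k; the sketch below assumes that formulation, with the renderer left abstract (libraries such as PyTorch3D offer differentiable rendering, but the claims do not name any particular renderer):

    import torch

    def blendshape_mesh(base, basis, coeffs):
        # base:   (V, 3) neutral stylized mesh vertices
        # basis:  (K, V, 3) stylized basis, one vertex-offset set per shape
        # coeffs: (K,) blend shape coefficients
        # Linear blendshape model: vertices = base + sum_k c_k * B_k
        return base + torch.einsum('k,kvc->vc', coeffs, basis)

    # Claim 4: coeffs are predicted, the basis is preset.
    # Claim 5: the basis is predicted, the coeffs are preset.
    # Claim 6: the mesh vertices are predicted directly, skipping this step.
    # In all three variants the mesh then passes through a differentiable
    # renderer (an abstract callable here) to produce the predicted
    # stylized map:
    #   predicted_map = differentiable_render(vertices, faces, camera)

The shared differentiable rendering step is what lets the pixel-level loss on the predicted stylized map propagate back to whichever parameters the CNN predicts.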
7. A style migration method, the method comprising:
acquiring a face image to be stylized;
inputting the face image into a style migration model generated by the method of any one of claims 1-6, and outputting a stylized result of the face image.
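At inference time (claim 7) the trained model maps a face image directly to its stylized result. A hypothetical usage, in which the file name, saved-model format, and input shape are all assumptions rather than details from the patent:

    import torch

    # Hypothetical inference per claim 7.
    model = torch.load('style_migration_model.pt')  # trained per claims 1-6
    model.eval()
    face_image = torch.rand(1, 3, 256, 256)  # stand-in for a real face photo
    with torch.no_grad():
        stylized_result = model(face_image)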
8. A style migration model training apparatus, the apparatus comprising:
a sample acquisition unit configured to acquire a preset sample set, wherein the sample set comprises at least one sample, and the sample comprises a two-dimensional original image and a stylized map corresponding to the two-dimensional original image;
a network obtaining unit configured to obtain a pre-established style migration network comprising a convolutional neural network and an image conversion module, wherein the convolutional neural network is used for characterizing the relationship between a two-dimensional image and stylized three-dimensional image parameters, and the image conversion module is used for transferring the back-propagation gradient of the convolutional neural network without participating in the calculation of the loss value of the convolutional neural network, the image conversion module comprising: a first module for converting the three-dimensional image parameters into a stylized three-dimensional image and a second module for converting the stylized three-dimensional image into a predicted stylized map;
a selecting unit configured to input the two-dimensional original image of a sample selected from the sample set into the style migration network to obtain the three-dimensional image parameters predicted by the convolutional neural network;
a training unit configured to train the convolutional neural network based on the predicted three-dimensional image parameters and the stylized map in the selected sample; the training unit is further configured to: generate a predicted stylized map based on the predicted three-dimensional image parameters; and train the convolutional neural network based on the predicted stylized map and the stylized map in the selected sample;
an output unit configured to take the style migration network as a style migration model in response to determining that the convolutional neural network satisfies a training completion condition.
9. The apparatus of claim 8, the apparatus further comprising:
an adjusting unit configured to, in response to determining that the convolutional neural network does not satisfy the training completion condition, adjust relevant parameters of the convolutional neural network so that the loss value of the convolutional neural network converges, wherein the selecting unit, the training unit, and the output unit continue training based on the adjusted convolutional neural network.
10. The apparatus of claim 8, wherein the stylized map in the sample is obtained by a style conversion unit:
and the style conversion unit is configured to input the two-dimensional original image in the sample into the trained style conversion model to obtain the stylized map corresponding to the two-dimensional original image.
11. Apparatus according to one of claims 8 to 10, wherein the predicted three-dimensional image parameters are predicted blend shape coefficients, the training unit comprising:
a first mesh generation module configured to generate a three-dimensional mesh map based on the predicted blend shape coefficients and a preset stylized basis;
a first style generation module configured to input the three-dimensional mesh map into a differentiable renderer to generate the predicted stylized map.
12. Apparatus according to one of claims 8 to 10, wherein the predicted three-dimensional image parameter is a predicted stylized basis, the training unit comprising:
a second mesh generation module configured to generate a three-dimensional mesh map based on preset blend shape coefficients and the predicted stylized basis;
a second style generation module configured to input the three-dimensional mesh map into a differentiable renderer to generate the predicted stylized map.
13. Apparatus according to one of claims 8 to 10, wherein the predicted three-dimensional image parameter is a predicted three-dimensional mesh parameter, the training unit comprising:
a third mesh generation module configured to generate a three-dimensional mesh map based on the predicted three-dimensional mesh parameters;
a third style generation module configured to input the three-dimensional mesh map into a differentiable renderer to generate the predicted stylized map.
14. A style migration apparatus, the apparatus comprising:
an acquisition unit configured to acquire a face image to be stylized;
a conversion unit configured to input the face image into a style migration model generated by the method according to any one of claims 1 to 6, and output a stylized result of the face image.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-7.
CN202110891320.0A (priority/filing date 2021-08-04): Method and device for training style migration model and method and device for style migration. Status: Active; granted as CN113610989B (en).

Priority Applications (1)

CN202110891320.0A (granted as CN113610989B), priority/filing date 2021-08-04: Method and device for training style migration model and method and device for style migration


Publications (2)

CN113610989A (en), published 2021-11-05
CN113610989B (en), published 2022-12-27

Family

ID=78339451

Family Applications (1)

CN202110891320.0A (Active; granted as CN113610989B), priority/filing date 2021-08-04: Method and device for training style migration model and method and device for style migration

Country Status (1)

CN: CN113610989B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
CN113205449A * (priority 2021-05-21, published 2021-08-03), 珠海金山网络游戏科技有限公司: Expression migration model training method and device and expression migration method and device
CN116188640B * (priority 2022-12-09, published 2023-09-08), 北京百度网讯科技有限公司: Three-dimensional virtual image generation method, device, equipment and medium
CN116501217B * (priority 2023-06-26, published 2023-09-05), 瀚博半导体(上海)有限公司: Visual data processing method, visual data processing device, computer equipment and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
CN109816589B * (priority 2019-01-30, published 2020-07-17), 北京字节跳动网络技术有限公司: Method and apparatus for generating cartoon style conversion model
US11636639B2 * (priority 2019-10-11, published 2023-04-25), Robert G. Adamson, III: Mobile application for object recognition, style transfer and image synthesis, and related systems, methods, and apparatuses
CN111354079B * (priority 2020-03-11, published 2023-05-02), 腾讯科技(深圳)有限公司: Three-dimensional face reconstruction network training and virtual face image generation method and device
CN112581361B * (priority 2020-12-30, published 2023-07-11), 北京达佳互联信息技术有限公司: Training method of image style migration model, image style migration method and device



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant