CN110197226B - Unsupervised image translation method and system - Google Patents

Unsupervised image translation method and system

Info

Publication number
CN110197226B
CN110197226B · CN201910461740A · CN201910461740.8A
Authority
CN
China
Prior art keywords
image
loss
discriminator
calculating
generated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910461740.8A
Other languages
Chinese (zh)
Other versions
CN110197226A (en)
Inventor
邵桂芳
刘暾东
李铁军
黄梦
高凤强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN201910461740.8A priority Critical patent/CN110197226B/en
Publication of CN110197226A publication Critical patent/CN110197226A/en
Application granted granted Critical
Publication of CN110197226B publication Critical patent/CN110197226B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/088 - Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unsupervised image translation method and system. Taking two different image-set domains of the same object as the research object, and building on a dual-capsule competition network and multi-agent adversarial generation, the method and system improve the discrimination and generation capability of the model, generate images with richer global and local features, capture the distribution of the image domains more accurately, and learn the mapping relations between different domains.

Description

Unsupervised image translation method and system
Technical Field
The invention relates to the field of image translation, in particular to an unsupervised image translation method and an unsupervised image translation system.
Background
With the rise of information and multimedia technology, techniques that use images as the main propagation medium have developed rapidly, and image processing technology has become increasingly important. Thanks to breakthroughs in artificial intelligence, and in deep learning in particular, computer vision technology is now widely used. Many computer vision tasks require synthesized images, for example texture synthesis, image analogy, image super-resolution, image segmentation, style transfer, seasonal transfer and image understanding. Image translation, which fuses the features of different domains, promises to address these problems within a unified framework. For example, images of different street scenes can be synthesized with this technique to augment an autonomous-driving dataset and improve the learning ability of an unmanned vehicle. To facilitate traffic management, day and night images can be converted into each other. The technique is also clearly advantageous for translating between the image domain and the label domain in semantic segmentation.
For the above problems, the approaches that have appeared in recent years can be broadly classified into three categories. The first category comprises non-learning methods: approaches based on geometric images and image gap filling, as well as example-based methods, have been proposed to synthesize image textures and render the styles of different images. The second category comprises deep-learning-based methods, which use different forms of deep networks, including convolutional neural networks (CNN), recurrent neural networks (RNN) and deep recurrent fusion networks (DRFN), to effectively solve problems such as image segmentation, image reconstruction, depth estimation and super-resolution. The third category comprises generative adversarial methods: relying on the important role of adversarial learning in computer vision tasks, a series of methods based on generative adversarial networks (GAN) have been proposed as a unified framework for the image translation problem. For example, the Pix2Pix model uses conditional information and performs well on supervised image translation, but it requires corresponding label data to form paired training examples, which limits the scene tasks it can solve. The problem of unsupervised image translation on unpaired data has therefore become increasingly important, and the DualGAN, DiscoGAN and cycle-consistent adversarial network (CycleGAN) models were successively proposed to handle unpaired data.
Although good results have been obtained in studies of unsupervised image translation, some problems remain. For example, even the relatively powerful CycleGAN model is still inadequate in the accuracy of the mapping it learns between different set domains and in the realism of the translated images when capturing geometric and global features.
Disclosure of Invention
The invention aims to provide an unsupervised image translation method and system which, based on a dual-capsule competition network and multi-agent adversarial generation, improve the discrimination and generation capability of the model, generate images with richer global and local features, capture the distribution of image domains more accurately, and learn the mapping relations between different domains.
In order to achieve the purpose, the invention provides the following scheme:
an unsupervised image translation method comprising:
dividing original image data into source domain data and target domain data;
designing a generative adversarial network, and initializing the weights and hyper-parameters of the generative adversarial network; the generative adversarial network includes: a generator G_t2s, a discriminator D_s2t_1, a discriminator D_s2t_2, a generator G_s2t, a discriminator D_t2s_1, and a discriminator D_t2s_2;
Performing a first conversion task from the source domain data to the target domain data;
calculating a first discriminant loss in the first conversion task;
calculating a first generation loss in the first conversion task;
judging the image generated in the first conversion task;
calculating a first discriminant loss of the image generated in the first conversion task;
calculating a first reconstruction error in the first conversion task;
performing a second conversion task from the target domain data to the source domain data;
calculating a second discrimination loss in the second conversion task;
calculating a second generation penalty in the second conversion task;
judging the image generated in the second conversion task;
calculating a second discrimination loss of the image generated in the second conversion task;
calculating a second reconstruction error in the second conversion task;
updating the weights of the generative adversarial network according to the first discrimination loss, the first generation loss, the first discrimination loss of the generated image, the first reconstruction error, the second discrimination loss, the second generation loss, the second discrimination loss of the generated image, and the second reconstruction error;
and translating the image according to the updated weight.
Optionally, the performing the first conversion task from the source domain data to the target domain data specifically includes:
performing a task of converting the source domain data into the target domain data, extracting image data X_s from the source domain data according to the batch size, and inputting the image data X_s to the discriminator D_s2t_1 and the discriminator D_s2t_2 respectively to judge whether it is real or fake.
Optionally, the calculating the first generation loss in the first conversion task specifically includes:
extracting image data X_s from the source domain data according to the batch size, inputting it to the generator G_s2t to generate an image X_s2t in which the source domain is converted into the target domain, and calculating the first generator loss g_t2s.
Optionally, the determining the image generated in the first conversion task specifically includes:
inputting the generated image X_s2t to the discriminator D_t2s_1 and the discriminator D_t2s_2 respectively to judge whether it is real or fake.
Optionally, the calculating a first reconstruction error in the first conversion task specifically includes:
inputting the image X_s2t to the generator G_t2s to generate an image X_s2t2s, and calculating the reconstruction error L_rec between the image X_s in the source domain data and the image X_s2t2s generated by fusing features through the two different generators.
Optionally, the performing of the second conversion task from the target domain data to the source domain data specifically includes:
performing a task of converting the target domain data into the source domain data, extracting image data X_t from the target domain data according to the batch size, and inputting the image data X_t to the discriminator D_t2s_1 and the discriminator D_t2s_2 respectively to judge whether it is real or fake.
Optionally, the calculating a second generation loss in the second conversion task specifically includes:
extracting image data X_t from the target domain data according to the batch size, inputting it to the generator G_t2s to generate an image X_t2s in which the target domain is converted into the source domain, and calculating the second generator loss g_s2t.
Optionally, the determining the image generated in the second conversion task specifically includes:
inputting the generated image X_t2s to the discriminator D_s2t_1 and the discriminator D_s2t_2 respectively to judge whether it is real or fake.
Optionally, the calculating a second reconstruction error in the second conversion task specifically includes:
inputting the generated image X_t2s to the generator G_s2t to generate an image X_t2s2t, and calculating the reconstruction error L_rec between the original target-domain image X_t and the image X_t2s2t generated by fusing features through the two different generators.
An unsupervised image translation system comprising:
the original image dividing module is used for dividing original image data into source domain data and target domain data;
the system comprises a generation countermeasure network initialization module, a generation countermeasure network selection module and a generation countermeasure network selection module, wherein the generation countermeasure network initialization module is used for designing a generation countermeasure network and initializing the weight and the hyperparameter of the generation countermeasure network; the generating a countermeasure network includes: generator Gt2sAnd a discriminator Ds2t_1And a discriminator Ds2t_2Generator Gs2tAnd a discriminator Dt2s_1And a discriminator Dt2s_2
The first conversion module is used for performing a first conversion task from the source domain data to the target domain data;
a first discriminant loss calculation module, configured to calculate a first discriminant loss in the first conversion task;
a first generation loss calculation module for calculating a first generation loss in the first conversion task;
the first judging module is used for judging the image generated in the first conversion task;
a first discriminant loss calculation module for generating an image, configured to calculate a first discriminant loss of the image generated in the first conversion task;
a first reconstruction error calculation module for calculating a first reconstruction error in the first conversion task;
the second conversion module is used for performing a second conversion task from the target domain data to the source domain data;
a second discrimination loss calculation module, configured to calculate a second discrimination loss in the second conversion task;
a second generation loss calculation module, configured to calculate a second generation loss in the second conversion task;
the second judging module is used for judging the image generated in the second conversion task;
a second discrimination loss calculation module for generating an image, configured to calculate a second discrimination loss of the image generated in the second conversion task;
a second reconstruction error calculation module for calculating a second reconstruction error in the second conversion task;
a weight updating module for updating the weights of the generative adversarial network according to the first discrimination loss, the first generation loss, the first discrimination loss of the generated image, the first reconstruction error, the second discrimination loss, the second generation loss, the second discrimination loss of the generated image, and the second reconstruction error;
and the image translation module is used for translating the image according to the updated weight.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention takes two different image set domains of the same object as research objects, generates countermeasures based on a double-capsule competition network and a multi-subject, provides an unsupervised image translation method, improves the discrimination and generation capability of a model, is used for generating images with richer global and local characteristics, and can more accurately capture the distribution of the image domains and learn the mapping relation among the different domains.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
FIG. 1 is a flowchart of an unsupervised image translation method according to embodiment 1 of the present invention;
FIG. 2 is a block diagram of an unsupervised image translation system according to embodiment 2 of the present invention;
FIG. 3 is a network framework diagram of an unsupervised image translation method according to embodiment 3 of the present invention;
FIG. 4 shows the translation results of different methods on the paired Cityscapes dataset;
FIG. 5 shows detail views of the first and third rows of the FIG. 4 Cityscapes translation results;
FIG. 6 is a generated image for different methods of the night → day translation at different iteration steps;
FIG. 7 is a detailed result of the generated image corresponding to each method at step 100,000 in FIG. 6;
FIG. 8 shows the translation results of different methods on the sketch2photo dataset;
FIG. 9 shows the translation results of different methods on the summer2winter Yosemite dataset;
FIG. 10 shows the translation results of different methods on the Oil2chip dataset;
FIG. 11 shows the translation results of different methods on the Ukiyoe2photo dataset;
FIG. 12 shows the translation results of different methods on the vangogh2photo dataset;
FIG. 13 shows an input image X and the reconstructed images obtained by different methods;
FIG. 14 shows the translation results of different methods on the Day2night dataset.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide an unsupervised image translation method and system which, based on a dual-capsule competition network and multi-agent adversarial generation, improve the discrimination and generation capability of the model, generate images with richer global and local features, capture the distribution of image domains more accurately, and learn the mapping relations between different domains.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Example 1:
fig. 1 is a flowchart of an unsupervised image translation method according to embodiment 1 of the present invention. As shown in fig. 1, an unsupervised image translation method includes:
step 101: the original image data is divided into source domain data and target domain data.
Step 102: designing a generative adversarial network, and initializing the weights and hyper-parameters of the generative adversarial network; the generative adversarial network includes: a generator G_t2s, a discriminator D_s2t_1, a discriminator D_s2t_2, a generator G_s2t, a discriminator D_t2s_1, and a discriminator D_t2s_2.
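A minimal sketch of the weight initialization referred to in this step is given below, assuming a PyTorch implementation; the zero-mean normal initialization with standard deviation 0.02 is a common GAN choice and an assumption here, since the patent does not fix an initialization scheme.

```python
import torch.nn as nn

def init_gan_weights(module: nn.Module, std: float = 0.02) -> None:
    """Initialize conv and deconv weights from N(0, std^2); a common GAN choice (assumed)."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Each of the six sub-networks (G_s2t, G_t2s, D_s2t_1, D_s2t_2, D_t2s_1, D_t2s_2) would be
# built as an nn.Module and then initialized with net.apply(init_gan_weights).
```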
Step 103: performing a first conversion task from source domain data to target domain data, specifically comprising:
performing the task of converting the source domain data into the target domain data, extracting image data X_s from the source domain data according to the batch size, and inputting it to the discriminator D_s2t_1 and the discriminator D_s2t_2 respectively to judge whether it is real or fake.
Step 104: calculating the first discrimination loss in the first conversion task; specifically, the losses of the discriminators D_s2t_1 and D_s2t_2 in judging whether the source-domain images are real or fake are linearly fused into the discrimination loss d_s2t.
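The linear fusion of the two discriminators' losses can be sketched as follows, assuming a PyTorch implementation; the least-squares form of the adversarial loss and the fusion weight alpha are assumptions, since the patent only states that the two judgments are fused linearly. For step 104, d_s2t would be obtained by calling this function on the outputs of D_s2t_1 and D_s2t_2 for the real source-domain batch X_s.

```python
import torch
import torch.nn.functional as F

def fused_discrimination_loss(d_conv_out: torch.Tensor, d_caps_out: torch.Tensor,
                              is_real: bool, alpha: float = 0.5) -> torch.Tensor:
    """Linearly fuse the losses of the convolutional and capsule discriminators on one batch.

    d_conv_out / d_caps_out: discriminator scores for the same batch of images.
    is_real: whether the batch consists of real images (True) or generated images (False).
    alpha:   fusion weight; not specified in the source text, assumed here.
    """
    target = torch.ones_like if is_real else torch.zeros_like
    loss_conv = F.mse_loss(d_conv_out, target(d_conv_out))   # least-squares GAN loss, assumed
    loss_caps = F.mse_loss(d_caps_out, target(d_caps_out))
    return alpha * loss_conv + (1.0 - alpha) * loss_caps
```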
Step 105: calculating a first generation loss in the first conversion task, specifically comprising:
extracting image data X_s from the source domain data according to the batch size, inputting it to the generator G_s2t to generate an image X_s2t in which the source domain is converted into the target domain, and calculating the first generator loss g_t2s.
Step 106: the method for judging the image generated in the first conversion task specifically comprises the following steps:
inputting the generated image X_s2t to the discriminator D_t2s_1 and the discriminator D_t2s_2 respectively to judge whether it is real or fake.
Step 107: calculating the first discrimination loss of the image generated in the first conversion task; specifically, the losses of the discriminators D_t2s_1 and D_t2s_2 in judging whether the generated images are real or fake are linearly fused into the discrimination loss d_t2s.
Step 108: calculating a first reconstruction error in the first conversion task, specifically comprising:
inputting the image X_s2t to the generator G_t2s to generate an image X_s2t2s, and calculating the reconstruction error L_rec between the image X_s in the source domain data and the image X_s2t2s generated by fusing features through the two different generators.
Step 109: performing a second conversion task from the target domain data to the source domain data, specifically including:
performing the task of converting the target domain data into the source domain data, extracting image data X_t from the target domain data according to the batch size, and inputting it to the discriminator D_t2s_1 and the discriminator D_t2s_2 respectively to judge whether it is real or fake.
Step 110: calculating the second discrimination loss in the second conversion task; specifically, the losses of the discriminators D_t2s_1 and D_t2s_2 in judging whether the target-domain images are real or fake are linearly fused into the discrimination loss d_t2s.
Step 111: calculating a second generation loss in the second conversion task, specifically including:
extracting image data X_t from the target domain data according to the batch size, inputting it to the generator G_t2s to generate an image X_t2s in which the target domain is converted into the source domain, and calculating the second generator loss g_s2t.
Step 112: the determining the image generated in the second conversion task specifically includes:
inputting the generated image X_t2s to the discriminator D_s2t_1 and the discriminator D_s2t_2 respectively to judge whether it is real or fake.
Step 113: calculating the second discrimination loss of the image generated in the second conversion task; specifically, the losses of the discriminators D_s2t_1 and D_s2t_2 in judging whether the generated images are real or fake are linearly fused into the discrimination loss d_s2t.
Step 114: calculating a second reconstruction error in the second conversion task, specifically comprising:
inputting the generated image X_t2s to the generator G_s2t to generate an image X_t2s2t, and calculating the reconstruction error L_rec between the original target-domain image X_t and the image X_t2s2t generated by fusing features through the two different generators.
Step 115: updating the weights of the generative adversarial network according to the first discrimination loss, the first generation loss, the first discrimination loss of the generated image, the first reconstruction error, the second discrimination loss, the second generation loss, the second discrimination loss of the generated image, and the second reconstruction error.
Step 116: and translating the image according to the updated weight.
The invention takes two different image-set domains of the same object as the research object and provides an implementation of unsupervised image translation based on a dual-capsule competition network and multi-agent adversarial generation. It improves the discrimination and generation capability of the model, generates images with richer global and local features, captures the distribution of the image domains more accurately, and learns the mapping relations between different domains. In the mutual conversion between the source domain and the target domain, each subtask (converting the source domain into the target domain, or the target domain into the source domain) uses two discriminators, D1 (a convolutional network) and D2 (an improved capsule network), to judge whether the original data and the data produced by the generator G (a residual network) are real or fake; the three agents then reach a Nash equilibrium point through mutual competition, so that the features of the target domain are learned.
The innovation of the unsupervised image translation method based on the dual-capsule competition network and multi-agent adversarial generation is mainly embodied in three aspects. First, the invention develops a new generative adversarial model in order to generate more detailed and structural features during image translation. Second, to solve the unsupervised image translation problem, the capsule network is introduced for the first time as a discriminator of the multi-agent adversarial model, so as to improve the overall discrimination and generation capability of the model. Third, the invention optimizes the routing algorithm in the capsule network and empirically demonstrates its effectiveness.
Example 2:
fig. 2 is a structural diagram of an unsupervised image translation system according to embodiment 2 of the present invention. As shown in fig. 2, an unsupervised image translation system includes:
an original image dividing module 201, configured to divide original image data into source domain data and target domain data.
A generative adversarial network initialization module 202, configured to design a generative adversarial network and initialize the weights and hyper-parameters of the generative adversarial network; the generative adversarial network includes: a generator G_t2s, a discriminator D_s2t_1, a discriminator D_s2t_2, a generator G_s2t, a discriminator D_t2s_1, and a discriminator D_t2s_2.
A first conversion module 203, configured to perform a first conversion task from the source domain data to the target domain data.
A first discriminant loss computation module 204 is configured to compute a first discriminant loss in the first conversion task.
A first generation loss calculation module 205 for calculating a first generation loss in the first conversion task.
A first determination module 206, configured to determine an image generated in the first conversion task.
A first discriminant loss calculation module 207 for generating an image, for calculating a first discriminant loss of the image generated in the first conversion task.
A first reconstruction error calculation module 208 is configured to calculate a first reconstruction error in the first conversion task.
And a second conversion module 209, configured to perform a second conversion task from the target domain data to the source domain data.
A second judgment loss calculating module 210, configured to calculate a second judgment loss in the second conversion task.
A second generation loss calculation module 211, configured to calculate a second generation loss in the second conversion task.
And a second judging module 212, configured to judge the image generated in the second conversion task.
A second discrimination loss calculation module 213 for generating an image, for calculating a second discrimination loss of the image generated in the second conversion task.
A second reconstruction error calculation module 214 for calculating a second reconstruction error in the second conversion task.
A weight updating module 215, configured to update the weights of the generative adversarial network according to the first discrimination loss, the first generation loss, the first discrimination loss of the generated image, the first reconstruction error, the second discrimination loss, the second generation loss, the second discrimination loss of the generated image, and the second reconstruction error.
And the image translation module 216 is configured to perform image translation according to the updated weight.
Example 3:
fig. 3 is a network framework diagram of an unsupervised image translation method according to embodiment 3 of the present invention. The unsupervised image translation method in the embodiment 3 of the invention comprises the following steps:
s1: the original image data is divided into source domain data and target domain data to form a training data set and a testing data set.
S2: all the sub-network weights and hyper-parameters in the framework are initialized, and the model is built.
S3: in the conversion task from the Source Domain to the Target Domain, image data X_s is fetched from the source domain according to the batch size and input to the discriminator D_s2t_1 (a convolutional network) and the discriminator D_s2t_2 (an improved capsule network) respectively to discriminate real from fake (Real or Fake).
S4: the losses of the above discriminators D_s2t_1 and D_s2t_2 in judging whether the source-domain images are real or fake are linearly fused into the discrimination loss d_s2t.
S5: as in S3, image data X_s is fetched from the source domain according to the batch size and input to the generator G_s2t (a residual network) to generate an image X_s2t in which the source domain is converted into the target domain, and the generator loss g_t2s is calculated.
S6: the generated image X_s2t is input to the discriminator D_t2s_1 (a convolutional network) and the discriminator D_t2s_2 (an improved capsule network) respectively to discriminate real from fake (Real or Fake).
S7: the losses of the above discriminators D_t2s_1 and D_t2s_2 in judging whether the generated images are real or fake are linearly fused into the discrimination loss d_t2s.
S8: the generated image X_s2t is further input to the generator G_t2s (a residual network) to generate an image X_s2t2s; that is, the source-domain image flows to the target domain and back to the source domain through fused generation features, and the reconstruction error L_rec between the original source-domain image X_s and the image X_s2t2s generated by fusing features through the two different generators is calculated.
S9: in the conversion task from the Target Domain to the Source Domain, image data X_t is fetched from the target domain according to the batch size and input to the discriminator D_t2s_1 (a convolutional network) and the discriminator D_t2s_2 (an improved capsule network) respectively to discriminate real from fake (Real or Fake).
S10: the losses of the above discriminators D_t2s_1 and D_t2s_2 in judging whether the target-domain images are real or fake are linearly fused into the discrimination loss d_t2s.
S11: as in S9, image data X_t is fetched from the target domain according to the batch size and input to the generator G_t2s (a residual network) to generate an image X_t2s in which the target domain is converted into the source domain, and the generator loss g_s2t is calculated.
S12: the generated image X_t2s is input to the discriminator D_s2t_1 (a convolutional network) and the discriminator D_s2t_2 (an improved capsule network) respectively to discriminate real from fake (Real or Fake).
S13: the losses of the above discriminators D_s2t_1 and D_s2t_2 in judging whether the generated images are real or fake are linearly fused into the discrimination loss d_s2t.
S14: the generated image X_t2s is further input to the generator G_s2t (a residual network) to generate an image X_t2s2t; that is, the target-domain image flows to the source domain and back to the target domain through fused generation features, and the reconstruction error L_rec between the original target-domain image X_t and the image X_t2s2t generated by fusing features through the two different generators is calculated.
S15: the above losses g_s2t, g_t2s, d_s2t, d_t2s and the reconstruction error L_rec are minimized respectively, so as to update in turn the network weights of the generator G_t2s (residual network), the generator G_s2t (residual network), the discriminator D_s2t_1 (convolutional network), the discriminator D_s2t_2 (improved capsule network), the discriminator D_t2s_1 (convolutional network) and the discriminator D_t2s_2 (improved capsule network).
Example 4:
the embodiment of the invention provides an unsupervised image translation method based on a double-capsule competition network and multi-subject generation countermeasuret2s(residual error network), discriminator Ds2t_1(convolutional network), arbiter Ds2t_2(improved Capsule network), Generator Gs2t(residual error network), discriminator Dt2s_1(convolutional network) and discriminator Dt2s_2(improved capsule network). The method may comprise the steps of:
step 1: experimental data preparation phase.
The data set used in the present invention includes three paired data sets and four unpaired data sets, and as shown in table 1, the original image data is divided into source domain data and target domain data to form a training data set and a testing data set.
TABLE 1 data set, examples number, size and description
Step 2: model building and parameter initialization phases.
All sub-network weights and hyper-parameters in the framework are initialized and the model is built; the learning rate is 0.00002, kept constant for the first 100 epochs and linearly decayed over the latter 100 epochs.
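The learning-rate schedule can be sketched as follows in PyTorch, assuming the common convention of holding the rate constant and then decaying it linearly to zero; the exact split between the two phases and the Adam betas are assumptions, since the description only gives the base rate of 0.00002 and the 100 + 100 epochs.

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

def make_optimizer_and_scheduler(params, base_lr: float = 2e-5,
                                 n_const: int = 100, n_decay: int = 100):
    """Adam optimizer with a constant-then-linear-decay learning-rate schedule (assumed)."""
    opt = Adam(params, lr=base_lr, betas=(0.5, 0.999))       # betas are an assumption
    def lr_lambda(epoch: int) -> float:
        if epoch < n_const:
            return 1.0
        return max(0.0, 1.0 - (epoch - n_const) / float(n_decay))
    return opt, LambdaLR(opt, lr_lambda=lr_lambda)

# scheduler.step() would be called once per epoch, after the optimizer updates.
```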
Step 3: the subtask of converting the Source Domain into the Target Domain.
In the conversion task from the Source Domain to the Target Domain, image data X_s is fetched from the source domain according to the batch size and input to the discriminator D_s2t_1 (a convolutional network) and the discriminator D_s2t_2 (an improved capsule network) respectively to discriminate real from fake (Real or Fake). The size of the input image is 256 × 256 × 3. The discriminator D_s2t_1 (convolutional network) consists of three convolutional layers with a stride of 2 and 4 × 4 convolution kernels, followed by two convolutional layers with a stride of 1 and 4 × 4 convolution kernels. The discriminator D_s2t_2 (improved capsule network) is modified from the original capsule network: three convolutional layers with a stride of 2 and 5 × 5 convolution kernels first extract feature sub-maps in sequence, which are then fed through one convolutional layer with a stride of 1 and 9 × 9 convolution kernels into the primary capsule layer; the primary capsule layer is routed to a final 16-dimensional capsule layer by a dynamic routing algorithm, and this dynamic routing algorithm is improved.
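A sketch of the convolutional discriminator described above (three stride-2, 4 × 4 convolutions followed by two stride-1, 4 × 4 convolutions) is given below in PyTorch. The channel widths, the instance normalization and the LeakyReLU activations are assumptions in the style of a PatchGAN discriminator; only the kernel sizes and strides come from the description.

```python
import torch
import torch.nn as nn

class ConvDiscriminator(nn.Module):
    """Convolutional discriminator: three stride-2, 4x4 convs, then two stride-1, 4x4 convs."""
    def __init__(self, in_ch: int = 3, base: int = 64):
        super().__init__()
        layers = []
        prev, ch = in_ch, base
        for _ in range(3):                       # three stride-2 down-sampling convolutions
            layers += [nn.Conv2d(prev, ch, 4, stride=2, padding=1),
                       nn.InstanceNorm2d(ch),
                       nn.LeakyReLU(0.2, inplace=True)]
            prev, ch = ch, ch * 2
        layers += [nn.Conv2d(prev, ch, 4, stride=1, padding=1),   # two stride-1 convolutions
                   nn.LeakyReLU(0.2, inplace=True),
                   nn.Conv2d(ch, 1, 4, stride=1, padding=1)]      # 1-channel real/fake score map
        self.model = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)

# For a 256 x 256 x 3 input, ConvDiscriminator()(torch.randn(1, 3, 256, 256))
# returns a patch-wise map of real/fake scores.
```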
In order to avoid large value changes masking small value changes, the compression (squashing) function in the dynamic routing algorithm is improved: c_ij is a coupling coefficient determined during the iterative dynamic routing process, indicating the tendency of a low-dimensional capsule i to route to a high-dimensional capsule j (the higher the value, the more pronounced the tendency), and û_{j|i} is the prediction vector of the capsule network.
[Equation: the improved squashing function]
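For context, the following PyTorch sketch shows the standard dynamic routing procedure in which the coupling coefficients c_ij and the prediction vectors û_{j|i} appear. The squash function shown here is the standard one; the improved compression function is given only as an equation image in the original, so this sketch does not reproduce the patent's modification.

```python
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Standard capsule squashing v = (|s|^2 / (1 + |s|^2)) * s / |s| (not the improved form)."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def dynamic_routing(u_hat: torch.Tensor, n_iter: int = 3) -> torch.Tensor:
    """Route prediction vectors u_hat of shape (batch, n_low, n_high, dim_high).

    The coupling coefficients c_ij are recomputed at each iteration from agreement logits b_ij.
    """
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)       # routing logits b_ij
    for _ in range(n_iter):
        c = torch.softmax(b, dim=2)                              # coupling coefficients c_ij
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)                 # weighted sum over low-level capsules
        v = squash(s)                                            # high-level capsule outputs
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)             # agreement update
    return v
```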
Step 4: calculating the discrimination loss of the source-domain image in the source-domain-to-target-domain conversion.
The losses of the above discriminators D_s2t_1 and D_s2t_2 in judging whether the source-domain images are real or fake are linearly fused into the discrimination loss d_s2t. Since multi-agent adversarial generation is adopted, i.e. the capsule network serves as an additional discriminator to improve the discrimination and generation capability of the model, the improved complete objective function is
L = L_DuCaGAN(G_t2s, D_s2t_1, D_s2t_2, X_s, X_t) + L_DuCaGAN(G_s2t, D_t2s_1, D_t2s_2, X_t, X_s) + λ_rec · L_rec(G_t2s, G_s2t)
where L_DuCaGAN(G_t2s, D_s2t_1, D_s2t_2, X_s, X_t) denotes the loss function of the source-domain-to-target-domain task, L_DuCaGAN(G_s2t, D_t2s_1, D_t2s_2, X_t, X_s) denotes the loss function of the target-domain-to-source-domain task, and L_rec(G_t2s, G_s2t) denotes the reconstruction error of the images in the two tasks. The hyper-parameter λ_rec represents the importance of the reconstruction error in the overall objective function and can also be understood as the weight of the consistency loss.
For the source-domain-to-target-domain subtask, the objective function it depends on is as follows:
[Equation (3): the subtask objective combining the adversarial loss with the margin loss]
where L_M is the margin loss characteristic of the capsule network, and λ1 and λ2 are the weights of the margin loss.
To avoid model collapse and training instability, a margin loss is introduced, computed as follows:
v_k = Capsule_D(x_k)    (4)
[Equation (5): the margin loss L_M computed from the capsule output vectors v_k]
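A sketch of the margin loss in its standard capsule-network form is shown below; since Equation (5) is reproduced only as an image, the margin thresholds (0.9 and 0.1) and the exact form used in the patent may differ. The down-weighting factor λ = 0.5 follows the value stated later in this embodiment.

```python
import torch

def margin_loss(v: torch.Tensor, targets: torch.Tensor,
                m_pos: float = 0.9, m_neg: float = 0.1, lam: float = 0.5) -> torch.Tensor:
    """Standard capsule-network margin loss (assumed form).

    v:       capsule output vectors of shape (batch, n_classes, dim); v_k = Capsule_D(x_k)
    targets: one-hot labels of shape (batch, n_classes), e.g. real vs. generated
    lam:     down-weighting factor for absent classes (0.5 in this embodiment)
    """
    lengths = v.norm(dim=-1)                                        # ||v_k||
    pos = targets * torch.clamp(m_pos - lengths, min=0.0) ** 2
    neg = lam * (1 - targets) * torch.clamp(lengths - m_neg, min=0.0) ** 2
    return (pos + neg).sum(dim=-1).mean()
```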
for the subtask of converting the target domain image from the source domain, the generator Gt2sDiscriminator Ds2t-1Sum discriminator
,Ds2t-2Image X for a target fieldtAnd the generated image XsAnd (3) mutually competing to reach a Nash equilibrium point, and then completing the optimization problem:
Figure GDA0002143020770000143
and 5: the generation loss of the source domain to the destination domain.
Image data X_s is fetched from the source domain according to the batch size and input to the generator G_s2t (a residual network) to generate an image X_s2t in which the source domain is converted into the target domain, and the generator loss g_t2s is calculated. The generator G_s2t (residual network) consists of two convolutional layers with a stride of 2 and 3 × 3 convolution kernels for down-sampling, two deconvolution layers with a stride of 2 and 3 × 3 convolution kernels for training on 256 × 256 images, and one convolutional layer with a stride of 1 and 7 × 7 convolution kernels.
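A sketch of the residual generator described above is given below in PyTorch. The two stride-2, 3 × 3 down-sampling convolutions, the two stride-2, 3 × 3 deconvolutions and the final stride-1, 7 × 7 convolution follow the description; the channel widths and the number of residual blocks (9, as in the CycleGAN generator) are assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)

class ResNetGenerator(nn.Module):
    """Residual generator: 2 down-sampling convs, residual blocks, 2 deconvs, final 7x7 conv."""
    def __init__(self, in_ch: int = 3, base: int = 64, n_blocks: int = 9):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_ch, base, 3, stride=2, padding=1),        # 256 -> 128
            nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1),     # 128 -> 64
            nn.InstanceNorm2d(base * 2), nn.ReLU(inplace=True),
            *[ResidualBlock(base * 2) for _ in range(n_blocks)],
            nn.ConvTranspose2d(base * 2, base, 3, stride=2, padding=1, output_padding=1),  # 64 -> 128
            nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, base, 3, stride=2, padding=1, output_padding=1),      # 128 -> 256
            nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
            nn.Conv2d(base, in_ch, 7, stride=1, padding=3),        # final 7x7 convolution
            nn.Tanh())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)
```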
Step 6: judging the generated image X_s2t in the source-domain-to-target-domain conversion.
The generated image X_s2t is input to the discriminator D_t2s_1 (a convolutional network) and the discriminator D_t2s_2 (an improved capsule network) respectively to discriminate real from fake (Real or Fake).
and 7: and (4) calculating the discrimination loss of the generated image in the source domain conversion target domain.
The losses of the above discriminators D_t2s_1 and D_t2s_2 in judging whether the generated images are real or fake are linearly fused into the discrimination loss d_t2s.
and 8: and (4) performing cycle-consistent reconstruction calculation on the image in the source domain and the conversion target domain.
The generated image X_s2t is further input to the generator G_t2s (a residual network) to generate an image X_s2t2s; that is, the source-domain image flows to the target domain and back to the source domain through fused generation features, and the reconstruction error L_rec between the original source-domain image X_s and the image X_s2t2s generated by fusing features through the two different generators is calculated.
To ensure that the generators can synthesize realistic images when converting between the two domains, not only the adversarial loss but also the cycle-consistency loss characterizing the reconstruction error must be optimized at the same time:
[Equation: the cycle-consistency (reconstruction) loss L_rec(G_t2s, G_s2t)]
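The cycle-consistency loss can be sketched as follows; an L1 penalty between each input and its double translation is assumed (as in CycleGAN), since the exact form of the equation is reproduced only as an image in the original.

```python
import torch
import torch.nn.functional as F

def cycle_reconstruction_loss(x_s: torch.Tensor, x_t: torch.Tensor,
                              G_s2t, G_t2s) -> torch.Tensor:
    """Reconstruction error L_rec between the inputs and their reconstructions through both generators."""
    x_s2t2s = G_t2s(G_s2t(x_s))        # source -> target -> source
    x_t2s2t = G_s2t(G_t2s(x_t))        # target -> source -> target
    return F.l1_loss(x_s2t2s, x_s) + F.l1_loss(x_t2s2t, x_t)   # L1 penalty assumed
```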
and step 9: the Target Domain (Target Domain) converts a subtask of the Source Domain (Source Domain).
In the conversion task from the Target Domain to the Source Domain, image data X_t is fetched from the target domain according to the batch size and input to the discriminator D_t2s_1 (a convolutional network) and the discriminator D_t2s_2 (an improved capsule network) respectively to discriminate real from fake (Real or Fake). Here, the network structures of the discriminator D_t2s_1 (convolutional network) and the discriminator D_t2s_2 (improved capsule network) are identical to those described in step 3, as is the improvement to the capsule network.
Step 10: calculating the discrimination loss of the target-domain image in the target-domain-to-source-domain conversion.
The losses of the above discriminators D_t2s_1 and D_t2s_2 in judging whether the target-domain images are real or fake are linearly fused into the discrimination loss d_t2s.
For the target-domain-to-source-domain subtask, the objective function it depends on is as follows:
[Equation (8): the subtask objective combining the adversarial loss with the margin loss]
where L_M is the margin loss characteristic of the capsule network, and λ1 and λ2 are the weights of the margin loss.
For the subtask of converting the target domain into the source domain, the generator G_s2t and the discriminators D_t2s_1 and D_t2s_2 compete with each other over the source-domain image X_s and the generated image until a Nash equilibrium point is reached, which amounts to the optimization problem:
[Equation: the min-max optimization of L_DuCaGAN(G_s2t, D_t2s_1, D_t2s_2, X_t, X_s)]
step 11: the destination domain translates the generation loss of the source domain.
Image data X_t is fetched from the target domain according to the batch size and input to the generator G_t2s (a residual network) to generate an image X_t2s in which the target domain is converted into the source domain, and the generator loss g_s2t is calculated. The generator G_t2s (residual network) has the same structure as described in step 5.
Step 12: judging the generated image X_t2s in the target-domain-to-source-domain conversion.
The generated image X_t2s is input to the discriminator D_s2t_1 (a convolutional network) and the discriminator D_s2t_2 (an improved capsule network) respectively to discriminate real from fake (Real or Fake).
Step 13: calculating the discrimination loss of the generated image in the target-domain-to-source-domain conversion.
The losses of the above discriminators D_s2t_1 and D_s2t_2 in judging whether the generated images are real or fake are linearly fused into the discrimination loss d_s2t.
Step 14: cycle-consistent reconstruction calculation of the image in the target-domain-to-source-domain conversion.
The generated image X_t2s is further input to the generator G_s2t (a residual network) to generate an image X_t2s2t; that is, the target-domain image flows to the source domain and back to the target domain through fused generation features, and the reconstruction error L_rec between the original target-domain image X_t and the image X_t2s2t generated by fusing features through the two different generators is calculated.
In Equations (3) and (8), the parameters λ1 and λ2 were set among 0, 0.5 and 1, and suitable values were obtained through comparison of the different experiments. In Equation (5), λ is set to 0.5 for the calculation of the margin loss. λ_rec is set to an appropriate value to avoid drastic changes of large values. The parameter values are shown in Table 2:
TABLE 2 parameter values
Step 15: the weight updating process.
The above losses g_s2t, g_t2s, d_s2t, d_t2s and the reconstruction error L_rec are minimized respectively, so as to update in turn the network weights of the generator G_t2s (residual network), the generator G_s2t (residual network), the discriminator D_s2t_1 (convolutional network), the discriminator D_s2t_2 (improved capsule network), the discriminator D_t2s_1 (convolutional network) and the discriminator D_t2s_2 (improved capsule network).
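The alternating weight update of this step can be sketched as follows in PyTorch. The least-squares adversarial terms, the fusion weight inside adv_loss, the value of lambda_rec and the grouping of the six sub-networks into two optimizers are assumptions; the patent only specifies that the losses g_s2t, g_t2s, d_s2t, d_t2s and L_rec are minimized to update the corresponding sub-networks in turn, and the capsule discriminators are additionally trained with the margin loss sketched earlier.

```python
import torch
import torch.nn.functional as F

def adv_loss(d1_out, d2_out, is_real: bool, alpha: float = 0.5):
    """Linear fusion of the convolutional and capsule discriminator losses (forms assumed)."""
    target = torch.ones_like if is_real else torch.zeros_like
    return alpha * F.mse_loss(d1_out, target(d1_out)) + \
           (1 - alpha) * F.mse_loss(d2_out, target(d2_out))

def train_step(x_s, x_t, G_s2t, G_t2s, D_s2t_1, D_s2t_2, D_t2s_1, D_t2s_2,
               opt_G, opt_D, lambda_rec: float = 10.0):
    """One alternating update: generators first, then the two pairs of discriminators."""
    # --- Generator update: g_t2s, g_s2t and the reconstruction error L_rec ---
    x_s2t, x_t2s = G_s2t(x_s), G_t2s(x_t)
    g_t2s = adv_loss(D_t2s_1(x_s2t), D_t2s_2(x_s2t), is_real=True)   # fool the target-side discriminators
    g_s2t = adv_loss(D_s2t_1(x_t2s), D_s2t_2(x_t2s), is_real=True)   # fool the source-side discriminators
    L_rec = F.l1_loss(G_t2s(x_s2t), x_s) + F.l1_loss(G_s2t(x_t2s), x_t)
    opt_G.zero_grad()
    (g_t2s + g_s2t + lambda_rec * L_rec).backward()
    opt_G.step()

    # --- Discriminator update: fused losses d_s2t and d_t2s, generated images detached ---
    d_s2t = adv_loss(D_s2t_1(x_s), D_s2t_2(x_s), is_real=True) + \
            adv_loss(D_s2t_1(x_t2s.detach()), D_s2t_2(x_t2s.detach()), is_real=False)
    d_t2s = adv_loss(D_t2s_1(x_t), D_t2s_2(x_t), is_real=True) + \
            adv_loss(D_t2s_1(x_s2t.detach()), D_t2s_2(x_s2t.detach()), is_real=False)
    opt_D.zero_grad()
    (d_s2t + d_t2s).backward()
    opt_D.step()
```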
Example 5:
in order to verify the effectiveness of the unsupervised image translation method, the method of the invention was compared with the existing DCGAN, Pix2Pix and CycleGA methods. Different evaluation indexes and comparison methods exist for different data set tasks.
Evaluation indices for image segmentation: the evaluation of the semantic labelling task on the Cityscapes dataset in this embodiment comprises the frequency-weighted IoU (fwIoU), the per-pixel accuracy, the per-class accuracy and the class IoU.
The FCN-8s score uses an existing classifier to automatically quantify the generated results: if the generated images are sufficiently realistic, a classifier trained on real images will also classify the synthesized images correctly. This embodiment uses the FCN-8s score for the quantitative evaluation of the tasks on the Cityscapes dataset.
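The segmentation indices listed above can be computed from a confusion matrix as sketched below; this is a common formulation of these metrics and an assumption here, since the embodiment does not give explicit formulas.

```python
import numpy as np

def segmentation_scores(conf: np.ndarray) -> dict:
    """Compute per-pixel acc, per-class acc, class IoU and frequency-weighted IoU.

    conf[i, j] counts pixels whose true class is i and predicted class is j.
    """
    tp = np.diag(conf).astype(float)
    per_class_total = conf.sum(axis=1).astype(float)           # pixels per true class
    union = per_class_total + conf.sum(axis=0) - tp             # |pred union true| per class
    iou = tp / np.maximum(union, 1)
    freq = per_class_total / conf.sum()
    return {
        "per_pixel_acc": tp.sum() / conf.sum(),
        "per_class_acc": np.mean(tp / np.maximum(per_class_total, 1)),
        "class_iou": np.mean(iou),
        "freq_weighted_iou": (freq * iou).sum(),
    }
```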
FIG. 4 shows the translation results of different methods on the paired Cityscapes dataset. FIG. 5 shows detail views of the first and third rows of the FIG. 4 Cityscapes translation results.
The experimental results are shown in FIG. 4, FIG. 5, Table 3 and Table 4. It can be seen that, on the Cityscapes task, the method not only learns the features of the target domain while maintaining the structural features of the source domain, but also generates images with more reasonable and accurate detail information. This relies mainly on the powerful learning capability of the introduced capsule network.
FIG. 6 shows the images generated by different methods for the night → day translation at different iteration steps; FIG. 7 shows detail results of the image generated by each method at step 100,000 in FIG. 6; FIG. 8 shows the translation results of different methods on the sketch2photo dataset; FIG. 9 shows the translation results of different methods on the summer2winter Yosemite dataset; FIG. 10 shows the translation results of different methods on the Oil2chip dataset; FIG. 11 shows the translation results of different methods on the Ukiyoe2photo dataset; FIG. 12 shows the translation results of different methods on the vangogh2photo dataset; FIG. 13 shows an input image X and the reconstructed images obtained by the different methods; FIG. 14 shows the results of different methods for the Day → night translation on the Day2night dataset.
For the conversion task on the Day2night dataset, as shown in FIG. 6, FIG. 7 and FIG. 14, the method of the present invention generates image distributions closer to the target domain and quickly captures characteristics of the target scene such as brightness and structure. For the face-structure conversion task on the Sketch2photo dataset, as shown in FIG. 8, the method learns the brightness, color features and local structural features of face images in different domains, although a certain gap remains compared with the real labels. As shown in FIGS. 9, 10, 11 and 12, which cover conversion tasks between different scenes and styles, the method not only reflects the knowledge transfer of features across different image domains but also approximates the real distribution of the target domain; in addition to learning structural, texture, color and style features, the generated images are also more realistic. In addition, the effectiveness of the method can be analysed from the reconstructed image G_t2s(G_s2t(x)): if the model performs well, the reconstructed image should be close to the input image. As shown in FIG. 13, comparing randomly selected images from different dataset tasks with the input image, the method of the present invention produces images closer to the input image, which also illustrates its effectiveness in terms of reconstruction. FIG. 14 illustrates the discrimination loss and generated images of the different methods at the 5,000, 40,000 and 100,000 step iterations; from top to bottom: DCGAN, CycleGAN and the method of the present invention (DuCaGAN).
TABLE 3 FCN scores for different methods of labels → photos translation in the Cityscapes dataset.
Table 4 Performance of the labels → photos translation in the Cityscapes dataset.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (2)

1. An unsupervised image translation method, comprising:
dividing original image data into source domain data and target domain data;
designing a generative adversarial network, and initializing the weights and hyper-parameters of the generative adversarial network; the generative adversarial network includes: a generator G_t2s, a discriminator D_s2t_1, a discriminator D_s2t_2, a generator G_s2t, a discriminator D_t2s_1, and a discriminator D_t2s_2;
Performing a first conversion task from the source domain data to the target domain data;
calculating a first discriminant loss in the first conversion task;
calculating a first generation loss in the first conversion task;
judging the image generated in the first conversion task;
calculating a first discriminant loss of the image generated in the first conversion task;
calculating a first reconstruction error in the first conversion task;
performing a second conversion task from the target domain data to the source domain data;
calculating a second discrimination loss in the second conversion task;
calculating a second generation penalty in the second conversion task;
judging the image generated in the second conversion task;
calculating a second discrimination loss of the image generated in the second conversion task;
calculating a second reconstruction error in the second conversion task;
updating the weights of the generative adversarial network according to the first discrimination loss, the first generation loss, the first discrimination loss of the generated image, the first reconstruction error, the second discrimination loss, the second generation loss, the second discrimination loss of the generated image, and the second reconstruction error;
performing image translation according to the updated weight;
the performing of the first conversion task from the source domain data to the target domain data specifically includes:
performing a task of converting the source domain data into the target domain data, extracting image data X_s from the source domain data according to the batch size, and inputting the image data X_s to the discriminator D_s2t_1 and the discriminator D_s2t_2 respectively to judge whether it is real or fake;
the calculating a first discriminant loss in the first conversion task specifically includes:
for the discriminator D_s2t_1 and the discriminator D_s2t_2, linearly fusing the losses of judging whether the source domain image is real or fake into the discrimination loss d_s2t;
the calculating the first generation loss in the first conversion task specifically includes:
extracting image data X_s from the source domain data according to the batch size, inputting it to the generator G_s2t to generate an image X_s2t in which the source domain is converted into the target domain, and calculating the first generator loss g_t2s;
The determining the image generated in the first conversion task specifically includes:
inputting the generated image X_s2t to the discriminator D_t2s_1 and the discriminator D_t2s_2 respectively to judge whether it is real or fake;
the calculating a first discriminant loss of the image generated in the first conversion task specifically includes:
for the discriminator D_t2s_1 and the discriminator D_t2s_2, linearly fusing the losses of judging whether the generated image is real or fake into the discrimination loss d_t2s;
the calculating a first reconstruction error in the first conversion task specifically includes:
inputting the image X_s2t to the generator G_t2s to generate an image X_s2t2s, and calculating the reconstruction error L_rec between the image X_s in the source domain data and the image X_s2t2s generated by fusing features through the two different generators;
The performing of the second conversion task from the target domain data to the source domain data specifically includes:
performing a task of converting the target domain data into the source domain data, extracting image data X_t from the target domain data according to the batch size, and inputting the image data X_t to the discriminator D_t2s_1 and the discriminator D_t2s_2 respectively to judge whether it is real or fake;
the calculating a second discrimination loss in the second conversion task specifically includes:
for the discriminator D_t2s_1 and the discriminator D_t2s_2, linearly fusing the losses of judging whether the target domain image is real or fake into the discrimination loss d_t2s;
the calculating a second generation loss in the second conversion task specifically includes:
extracting image data X_t from the target domain data according to the batch size, inputting it to the generator G_t2s to generate an image X_t2s in which the target domain is converted into the source domain, and calculating the second generator loss g_s2t;
The determining the image generated in the second conversion task specifically includes:
inputting the generated image X_t2s to the discriminator D_s2t_1 and the discriminator D_s2t_2 respectively to judge whether it is real or fake;
the calculating a second reconstruction error in the second conversion task specifically includes:
inputting the generated image X_t2s to the generator G_s2t to generate an image X_t2s2t, and calculating the reconstruction error L_rec between the original target domain image X_t and the image X_t2s2t generated by fusing features through the two different generators.
2. An unsupervised image translation system, comprising:
the original image dividing module is used for dividing original image data into source domain data and target domain data;
the generative adversarial network initialization module is used for designing a generative adversarial network and initializing the weights and hyper-parameters of the generative adversarial network; the generative adversarial network includes: a generator G_t2s, a discriminator D_s2t_1, a discriminator D_s2t_2, a generator G_s2t, a discriminator D_t2s_1, and a discriminator D_t2s_2;
A first conversion module, configured to perform a first conversion task from the source domain data to the target domain data, and specifically configured to perform a task of converting the source domain data into the target domain data, extract image data X_s from the source domain data according to the batch size, and input the image data X_s to the discriminator D_s2t_1 and the discriminator D_s2t_2 respectively to judge whether it is real or fake;
a first discriminant loss calculation module for calculating a first discriminant loss in the first conversion task, specifically for the discriminator Ds2t_1And the discriminator Ds2t_2Judging loss d by judging whether the source domain image is true or falses2tLinear fusion calculation;
a first generation loss calculation module, configured to calculate a first generation loss in the first conversion task, specifically, to extract image data X from the source domain data in a batch sizesIs input to the generator Gs2tIn (1), an image X is generated with the source domain converted into the target domains2tCalculating the first generator loss gt2s
a first judging module, configured to judge the image generated in the first conversion task, specifically configured to input the generated image Xs2t to the discriminator Dt2s_1 and the discriminator Dt2s_2 respectively, to judge whether it is real or fake;
a first discrimination loss calculation module, configured to calculate a first discrimination loss of the image generated in the first conversion task, specifically configured to perform a linear fusion calculation on the discrimination losses dt2s obtained by the discriminator Dt2s_1 and the discriminator Dt2s_2 judging whether the generated image is real or fake;
a first reconstruction error calculation module, configured to calculate a first reconstruction error in the first conversion task, specifically configured to input the generated image Xs2t to the generator Gt2s to generate an image Xs2t2s, and to calculate a reconstruction error Lrec between the image Xs in the source domain data and the generated image Xs2t2s, which fuses features from the two different generators;
a second conversion module, configured to perform a second conversion task from the target domain data to the source domain data, specifically configured to perform the task of converting the target domain data into the source domain data by extracting image data Xt from the target domain data according to the batch size and inputting it to the discriminator Dt2s_1 and the discriminator Dt2s_2 respectively, to judge whether it is real or fake;
a second discrimination loss calculation module, configured to calculate a second discrimination loss in the second conversion task, specifically configured to perform a linear fusion calculation on the discrimination losses dt2s obtained by the discriminator Dt2s_1 and the discriminator Dt2s_2 judging whether the target domain image is real or fake;
a second generation loss calculation module, configured to calculate a second generation loss in the second conversion task, specifically configured to extract image data Xt from the target domain data according to the batch size, input it to the generator Gt2s to generate an image Xt2s in which the target domain is converted into the source domain, and calculate the second generator loss gs2t;
a second judging module, configured to judge the image generated in the second conversion task, specifically configured to input the generated image Xt2s to the discriminator Ds2t_1 and the discriminator Ds2t_2 respectively, to judge whether it is real or fake;
a second discrimination loss calculation module, configured to calculate a second discrimination loss of the image generated in the second conversion task, specifically configured to perform a linear fusion calculation on the discrimination losses ds2t obtained by the discriminator Ds2t_1 and the discriminator Ds2t_2 judging whether the generated image is real or fake;
a second reconstruction error calculation module, configured to calculate a second reconstruction error in the second conversion task, specifically configured to input the generated image Xt2s to the generator Gs2t to generate an image Xt2s2t, and to calculate a reconstruction error Lrec between the original target domain image Xt and the generated image Xt2s2t, which fuses features from the two different generators;
a weight updating module, configured to update the weights of the generative adversarial network based on the first discrimination loss, the first generation loss, the first discrimination loss of the generated image, the first reconstruction error, the second discrimination loss, the second generation loss, the second discrimination loss of the generated image, and the second reconstruction error;
and an image translation module, configured to translate images according to the updated weights (a minimal sketch of how the eight loss terms might drive the weight update follows this claim).
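Claim 2 leaves open how the weight updating module combines the eight loss terms. A common choice in cycle-consistent GAN training is to update the four discriminators on the discrimination losses and the two generators on the generator losses plus weighted reconstruction errors; the sketch below assumes exactly that split, Adam optimizers, and a reconstruction weight lambda_rec, none of which are specified by the patent. All names and dictionary keys are illustrative.

```python
import itertools
import torch

def build_optimizers(generators, discriminators, lr=2e-4):
    """Illustrative Adam optimizers over the two generators and the four
    discriminators (all assumed to be torch.nn.Module instances)."""
    opt_G = torch.optim.Adam(
        itertools.chain(*(g.parameters() for g in generators)), lr=lr)
    opt_D = torch.optim.Adam(
        itertools.chain(*(d.parameters() for d in discriminators)), lr=lr)
    return opt_G, opt_D

def update_weights(opt_G, opt_D, losses, lambda_rec=10.0):
    """One update over the eight loss terms listed for the weight updating
    module; the discriminator/generator split and lambda_rec are assumptions.

    `losses` is a dict of already-computed scalar tensors, e.g. produced by
    running both conversion tasks on one batch.
    """
    # Discriminator step: real-image and generated-image discrimination losses
    # from both conversion tasks (generated-image terms are assumed to be
    # computed on detached generator outputs).
    loss_D = (losses["d_s2t"] + losses["d_gen_first"]
              + losses["d_t2s"] + losses["d_gen_second"])
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator step: the two generator losses plus the weighted
    # reconstruction errors of the two conversion tasks.
    loss_G = (losses["g_first"] + losses["g_second"]
              + lambda_rec * (losses["L_rec_first"] + losses["L_rec_second"]))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```

Keeping the discriminator and generator objectives as two alternating steps, rather than one combined scalar, is the usual way to preserve the adversarial nature of the eight terms.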
CN201910461740.8A 2019-05-30 2019-05-30 Unsupervised image translation method and system Active CN110197226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910461740.8A CN110197226B (en) 2019-05-30 2019-05-30 Unsupervised image translation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910461740.8A CN110197226B (en) 2019-05-30 2019-05-30 Unsupervised image translation method and system

Publications (2)

Publication Number Publication Date
CN110197226A (en) 2019-09-03
CN110197226B (en) 2021-02-09

Family

ID=67753402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910461740.8A Active CN110197226B (en) 2019-05-30 2019-05-30 Unsupervised image translation method and system

Country Status (1)

Country Link
CN (1) CN110197226B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001398B (en) * 2020-08-26 2024-04-12 科大讯飞股份有限公司 Domain adaptation method, device, apparatus, image processing method, and storage medium
CN112102303B (en) * 2020-09-22 2022-09-06 中国科学技术大学 Semantic image analogy method for generating antagonistic network based on single image

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830243A (en) * 2018-06-22 2018-11-16 Xidian University Hyperspectral image classification method based on capsule network
CN108875935A (en) * 2018-06-11 2018-11-23 Lanzhou University of Technology Natural image target material visual feature mapping method based on a generative adversarial network
CN109063724A (en) * 2018-06-12 2018-12-21 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Enhanced generative adversarial network and target sample recognition method
CN109064423A (en) * 2018-07-23 2018-12-21 Fujian Imperial Vision Information Technology Co., Ltd. Intelligent image inpainting method based on asymmetric cycle generative adversarial loss
CN109741247A (en) * 2018-12-29 2019-05-10 Sichuan University Neural-network-based portrait cartoon generation method
CN109753992A (en) * 2018-12-10 2019-05-14 Nanjing Normal University Unsupervised domain-adaptive image classification method based on a conditional generative adversarial network
CN109815893A (en) * 2019-01-23 2019-05-28 Sun Yat-sen University Color face image illumination-domain normalization method based on a cycle generative adversarial network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573479A (en) * 2018-04-16 2018-09-25 Xidian University Face image deblurring and restoration method based on a dual generative adversarial network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CBIR system using Capsule Networks and 3D CNN for Alzheimer's disease diagnosis; K.R. Kruthika et al.; Elsevier; 2018-12-08; full text *
Optimized selection method of the CycleGAN cycle-consistency loss coefficient for generating images of different texture complexity; Xu Qiang et al.; Computer Science; 2019-01-31; Vol. 46, No. 1; full text *

Also Published As

Publication number Publication date
CN110197226A (en) 2019-09-03

Similar Documents

Publication Publication Date Title
Zeng et al. Srnet: Improving generalization in 3d human pose estimation with a split-and-recombine approach
Zhang et al. Stackgan++: Realistic image synthesis with stacked generative adversarial networks
Wang et al. SaliencyGAN: Deep learning semisupervised salient object detection in the fog of IoT
CN109410307B (en) Scene point cloud semantic segmentation method
He et al. Towards fast and accurate real-world depth super-resolution: Benchmark dataset and baseline
US20210012093A1 (en) Method and apparatus for generating face rotation image
CN113408455B (en) Action identification method, system and storage medium based on multi-stream information enhanced graph convolution network
Wang et al. Self-supervised multiscale adversarial regression network for stereo disparity estimation
CN111460928B (en) Human body action recognition system and method
Martínez-González et al. Efficient convolutional neural networks for depth-based multi-person pose estimation
CN112837215B (en) Image shape transformation method based on generation countermeasure network
CN113221663B (en) Real-time sign language intelligent identification method, device and system
CN112036260B (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN112132739A (en) 3D reconstruction and human face posture normalization method, device, storage medium and equipment
Li et al. Detailed 3D human body reconstruction from multi-view images combining voxel super-resolution and learned implicit representation
CN110197226B (en) Unsupervised image translation method and system
CN112819951A (en) Three-dimensional human body reconstruction method with shielding function based on depth map restoration
JP2023503732A (en) Point cloud interpolation method, network training method, device, equipment and storage medium
CN112906520A (en) Gesture coding-based action recognition method and device
CN115222998A (en) Image classification method
CN116681960A (en) Intelligent mesoscale vortex identification method and system based on K8s
CN109658508B (en) Multi-scale detail fusion terrain synthesis method
Liu et al. Adapted human pose: monocular 3D human pose estimation with zero real 3D pose data
CN114663880A (en) Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism
CN116152926A (en) Sign language identification method, device and system based on vision and skeleton information fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant