CN111626917B - Bidirectional image conversion system and method based on deep learning - Google Patents


Info

Publication number
CN111626917B
Authority
CN
China
Prior art keywords: image, generator, conversion, bidirectional, discriminator
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN202010284081.8A
Other languages: Chinese (zh)
Other versions: CN111626917A
Inventor
杨浩特
涂仕奎
Current Assignee: Shanghai Jiaotong University
Original Assignee: Shanghai Jiaotong University
Application filed by Shanghai Jiaotong University
Priority to CN202010284081.8A
Publication of CN111626917A
Application granted
Publication of CN111626917B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems


Abstract

The invention provides a bidirectional image conversion system based on deep learning, comprising: a bidirectional generator, whose forward and reverse directions each carry out an image conversion task between a pair of image domains; in either direction, the model computes the target converted image from the input multi-channel image data using a deep parallel computing framework; and a discriminator, which evaluates the quality of images produced by the bidirectional generator against real images, the evaluation result being used to train both the bidirectional generator and the discriminator. A bidirectional image conversion method implemented on the system is also provided. The proposed bidirectional generator structure greatly reduces the number of parameters of the deep learning model without reducing image generation quality, and can carry out two pairs of image conversion tasks in one model under supervision.

Description

Bidirectional image conversion system and method based on deep learning
Technical Field
The invention relates to the technical field of image conversion, and in particular to a bidirectional image conversion system and method based on deep learning.
Background
Image conversion refers to the task of converting an image between two image domains: an image from image domain A is converted, according to certain rules or requirements, into an image belonging to image domain B. The conversion may keep the content of image domain A unchanged while introducing features of image domain B (e.g., a style transfer task); it may use a neural network to generate an image of image domain B containing more information (e.g., color information) from the lesser information of image domain A (e.g., a colorization task); or it may change the content of image domain A to match that of image domain B (e.g., conversion between horses and zebras).
There are many successful models for image conversion between paired data sets, the Pix2Pix model being one of the most successful. The Pix2Pix model was proposed to handle such tasks within the generative adversarial network (GAN) framework. It can be seen as a conditional GAN (CGAN) whose condition is the input image of the generator (a sample of image domain A).
The Pix2Pix model fully combines the advantages of DCGAN and CGAN and achieves high-quality generation results. However, its training process requires paired data, because one term in the Pix2Pix loss function is the L1 distance between the generated image and the label image. If the two data sets used by the model are unpaired, this loss cannot be used for training.
The CycleGAN model was proposed to solve the image conversion problem between unpaired data sets by introducing a cycle consistency loss in place of the L1 loss. Suppose two translators f and g: f translates the Chinese sentence "I love science" into English, and g translates the English sentence "I love science" back into Chinese. The two translators can then be considered inverses of each other. If both translators perform well enough, then in theory f and g should satisfy g(f("I love science")) = "I love science"; this is cycle consistency. The CycleGAN model uses a pair of mutually inverse generators to carry out the image conversion tasks between two image domains (two unpaired data sets), so that a generated image, after passing through the two mutually inverse generators, can be reconstructed into the original image, and an L1 loss can be applied between the reconstructed image and the input image. The model accordingly requires two discriminators to separately judge the generation quality of each generator.
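The cycle consistency described above can be sketched in a few lines. This is an illustrative PyTorch snippet (the patent does not name a framework), with G_f and G_b standing in for the two mutually inverse generators:

```python
import torch

def cycle_consistency_loss(G_f, G_b, x, y):
    # ||G_b(G_f(x)) - x||_1 + ||G_f(G_b(y)) - y||_1:
    # each input should be reconstructed after a round trip through both generators.
    loss_a = torch.mean(torch.abs(G_b(G_f(x)) - x))
    loss_b = torch.mean(torch.abs(G_f(G_b(y)) - y))
    return loss_a + loss_b

# With perfect mutually inverse "translators" (here: identities) the loss vanishes.
identity = lambda t: t
x = torch.randn(1, 3, 8, 8)
y = torch.randn(1, 3, 8, 8)
assert cycle_consistency_loss(identity, identity, x, y).item() == 0.0
```

No paired labels appear anywhere in the function, which is exactly why this loss can replace the L1 term on unpaired data sets.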
The least mean square error reconstruction (Lmser) network is a classical network structure. It is obtained by folding a traditional autoencoder (AE) along its central hidden layer; because of the symmetric structure of the AE's encoder and decoder, the neurons of the encoder coincide with those of the decoder, as do the connection weights. In Lmser, the connection between neurons in adjacent layers is a distributed cascading relationship: during training, information is continuously transferred bidirectionally between adjacent layers. Among the many properties of Lmser, three dualities are the most important: duality of the bidirectional architecture (DBA), duality of the connection weights (DCW), and duality of the paired neurons (DPN). In recent years, Lmser has achieved good results in tasks such as image recognition, medical image segmentation, and image super-resolution.
However, existing image conversion technology still suffers from large model parameter counts and single-task conversion, and cannot truly meet the needs of image conversion. No description or report of a similar technology has been found, and no similar data have been collected at home or abroad.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a bidirectional image conversion system and method based on deep learning, built on least mean square error reconstruction, generative adversarial networks, and a deep parallel computing framework. One bidirectional generator can carry out the image conversion tasks of the two unidirectional generators in a CycleGAN model; furthermore, two pairs of independent image conversion tasks can be carried out in the two directions of one bidirectional generator.
The invention is realized by the following technical scheme.
According to one aspect of the present invention, there is provided a deep learning-based bidirectional image conversion system, comprising:
a bidirectional generator: carrying out an image conversion task between a pair of image domains in each of the forward and reverse directions of the bidirectional generator; in either direction, the target converted image is computed from the input multi-channel image data using a deep parallel computing framework;
a discriminator: the discriminator evaluates the quality of the image obtained by the bidirectional generator against the real image, and the quality evaluation result is fed back to the bidirectional generator to train it.
Preferably, the bidirectional generator includes a forward conversion direction and a reverse conversion direction, each comprising: a convolution layer, a residual network, and a deconvolution layer. The convolution kernels used for the convolution layer and the residual network in the forward direction are shared with the deconvolution layer and the residual network in the reverse direction; the first and last layers in the two conversion directions do not share convolution kernels.
Preferably, in the training process of the bidirectional generator, the loss function adopted by the update mechanism is an adversarial loss function, where the adversarial losses in the forward and reverse directions are:

L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))

L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))

where x and y denote images from two different data sets A and B respectively, D_A and D_B denote the two discriminators, and G_f and G_b denote the same bidirectional generation module running in the forward and backward directions respectively.
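A minimal sketch of these two adversarial losses, assuming PyTorch and discriminators that output probabilities in (0, 1) (the patent fixes neither a framework nor an output activation):

```python
import torch

def adversarial_losses(D_A, D_B, G_f, G_b, x, y):
    # Forward:  L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))
    # Backward: L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))
    l_fwd = torch.log(D_B(y)) + torch.log(1 - D_B(G_f(x)))
    l_bwd = torch.log(D_A(x)) + torch.log(1 - D_A(G_b(y)))
    return l_fwd.mean(), l_bwd.mean()

# Toy check: a maximally uncertain discriminator (always 0.5) yields 2*log(0.5) per loss.
half = lambda t: torch.full((t.shape[0], 1), 0.5)
l_fwd, l_bwd = adversarial_losses(half, half, lambda t: t, lambda t: t,
                                  torch.randn(2, 3, 4, 4), torch.randn(2, 3, 4, 4))
```

The discriminators try to maximize these quantities while the generator tries to minimize them, the usual GAN min-max game.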
Preferably, the update mechanism adopts a stochastic gradient descent optimization method:

θ ← θ - η ∂L/∂θ

where η is the update rate controlling the magnitude of model updates, and ∂L/∂θ is the information fed back to the deep parallel computing framework after the update mechanism is computed.
Preferably, the target converted image is obtained as follows: using the deep parallel computing framework, the input multi-channel image data pass through a 15-layer computing module in which the spatial size is reduced to 64 and then restored to 256, while the number of channels is first increased to 128 and then reduced to the original number; the final output image conversion result is the target converted image.
Preferably, the discriminator evaluates quality as follows: using the deep parallel computing framework, the size of the input image is gradually compressed to 32 through a 5-layer computing module, while the number of channels is increased to 512 and finally compressed to 1; the final output is the discriminator's quality evaluation of the input image.
According to another aspect of the present invention, there is provided a deep learning-based bidirectional image conversion method, comprising:
carrying out an image conversion task between a pair of image domains in each of the forward and reverse directions of the bidirectional generator; in either direction, the target converted image is computed from the input multi-channel image data using a deep parallel computing framework;
and the discriminator evaluates the quality of the image obtained by the bidirectional generator against the real image.
Preferably, the method further comprises:
and feeding the quality evaluation result of the discriminator back to the bidirectional generator to train the bidirectional generator.
Preferably, in the training process, the loss function adopted by the update mechanism is an adversarial loss function, where the adversarial losses in the forward and reverse directions are:

L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))

L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))

where x and y denote images from two different data sets A and B respectively, D_A and D_B denote the two discriminators, and G_f and G_b denote the same bidirectional generation module running in the forward and backward directions respectively.
Preferably, the update mechanism adopts a stochastic gradient descent optimization method:

θ ← θ - η ∂L/∂θ

where η is the update rate controlling the magnitude of model updates, and ∂L/∂θ is the information fed back to the deep parallel computing framework after the update mechanism is computed.
Preferably, the target converted image is obtained as follows: using the deep parallel computing framework, the input multi-channel image data pass through a 15-layer computing module in which the spatial size is reduced to 64 and then restored to 256, while the number of channels is first increased to 128 and then reduced to the original number; the final output image conversion result is the target converted image.
Preferably, the discriminator evaluates quality as follows: using the deep parallel computing framework, the size of the input image is gradually compressed to 32 through a 5-layer computing module, while the number of channels is increased to 512 and finally compressed to 1; the final output is the discriminator's quality evaluation of the input image.
Compared with the prior art, the invention has the following beneficial effects:
the invention greatly reduces the number of model parameters without degrading the image conversion effect;
the invention can realize two pairs of image conversion tasks in one model.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings, in which:
FIG. 1 is a diagram of a general architecture of a deep learning based bi-directional image conversion system in accordance with one embodiment of the present invention;
FIG. 2 is a schematic diagram of a bi-directional image conversion system based on deep learning according to another embodiment of the present invention;
fig. 3 is a diagram of a bi-directional generator architecture in two embodiments of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail. The embodiments are implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific operation processes are given. It should be noted that those skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.
A first embodiment of the present invention provides a deep learning-based bidirectional image conversion system, comprising:
a bidirectional generator: carrying out an image conversion task between a pair of unpaired image domains in the forward and reverse directions of the bidirectional generator; in the forward direction, computing the target converted image from the input multi-channel image data using the deep parallel computing framework; then taking this target converted image as input in the other direction, computing a reconstructed image of the forward input image using the deep parallel computing framework, this reconstructed image being used to compute the cycle consistency loss.
a discriminator: the discriminator evaluates the quality of the image obtained by the bidirectional generator against the real image, and the quality evaluation result is fed back to the bidirectional generator to train it.
As a preferred embodiment, the system requires only one generator module to accomplish the two image conversion tasks, and the parallel computing framework used is end-to-end. The bidirectional generator carries out two conversion tasks in one module: in its forward process, the image from image domain A is converted into an image belonging to image domain B; in its backward process, the image from image domain B is converted into an image belonging to image domain A.
To achieve this, the bidirectional generator shares convolution kernels between its two generation directions. Each direction comprises a convolution layer, a residual module, and a deconvolution layer, where the convolution kernels used for the convolution layer and residual network in the forward direction are reused for the deconvolution layer and residual network in the reverse direction. In this way, sharing of convolution kernels is achieved. This approach traces back to the duality of connection weights (DCW) of Lmser, extended here to convolutional neural network structures.
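The kernel-sharing idea can be illustrated with a single weight tensor. In PyTorch (used here only for illustration; the patent does not specify a framework), the same tensor that acts as a forward convolution also parameterizes the corresponding reverse deconvolution, because `conv2d` and `conv_transpose2d` read the channel axes of the weight in opposite directions:

```python
import torch
import torch.nn.functional as F

# One shared weight tensor, shaped (out_ch, in_ch, kH, kW) for the forward convolution.
weight = torch.randn(128, 64, 3, 3)

# Forward direction: convolution, 64 -> 128 channels, spatially downsampled.
x_fwd = torch.randn(1, 64, 32, 32)
y_fwd = F.conv2d(x_fwd, weight, stride=2, padding=1)             # -> (1, 128, 16, 16)

# Reverse direction: the SAME tensor used as a deconvolution, 128 -> 64, upsampled.
x_bwd = torch.randn(1, 128, 16, 16)
y_bwd = F.conv_transpose2d(x_bwd, weight, stride=2, padding=1,
                           output_padding=1)                      # -> (1, 64, 32, 32)
```

Updating `weight` from either direction therefore affects both conversion paths, which is the convolutional analogue of Lmser's duality of connection weights (DCW).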
As a preferred embodiment, in the bidirectional generator: multi-channel image data are input; using the deep parallel computing framework, the spatial size is reduced to 64 and then restored to 256 through a 15-layer computing module, while the number of channels is first increased to 128 and then reduced to the original number; the final output is the image conversion result.
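A shape-level sketch of one direction of such a generator, assuming PyTorch, a 256x256 3-channel input, and a 9-residual-block middle stage (the residual-block count, kernel sizes and intermediate widths are assumptions; the text fixes only the 15-layer total, the 256 -> 64 -> 256 size path and the 128-channel peak):

```python
import torch
import torch.nn as nn

def down(i, o, s):
    return nn.Sequential(nn.Conv2d(i, o, 3, s, 1), nn.ReLU())

class ResBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, 1, 1), nn.ReLU(),
                                  nn.Conv2d(c, c, 3, 1, 1))
    def forward(self, x):
        return x + self.body(x)

# 3 downsampling modules + 9 residual modules + 2 upsampling modules + 1 output module = 15.
gen = nn.Sequential(
    down(3, 64, 1), down(64, 128, 2), down(128, 128, 2),         # 256 -> 64, channels -> 128
    *[ResBlock(128) for _ in range(9)],                          # residual stage at 64x64
    nn.ConvTranspose2d(128, 64, 3, 2, 1, output_padding=1), nn.ReLU(),  # 64 -> 128
    nn.ConvTranspose2d(64, 64, 3, 2, 1, output_padding=1), nn.ReLU(),   # 128 -> 256
    nn.Conv2d(64, 3, 3, 1, 1), nn.Tanh(),                        # back to the original channels
)
out = gen(torch.randn(1, 3, 256, 256))   # output has the same size as the input
```

In the patented bidirectional generator the convolution weights of this path (apart from the first and last layers) would be reused by the reverse path, rather than duplicated as here.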
As a preferred embodiment, in the discriminator: multi-channel image data, either a real image or a generated image, are input; using the deep parallel computing framework, the size is gradually compressed to 32 through a 5-layer computing module, the number of channels is increased to 512 and then compressed to 1, and the final output is the discriminator's evaluation of the input image quality.
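A sketch of a discriminator matching the described shapes (size compressed to 32, channels up to 512 and then to 1), assuming PyTorch and a 256x256 input; kernel sizes and intermediate widths are assumptions:

```python
import torch
import torch.nn as nn

# Five computing modules: 256 -> 128 -> 64 -> 32 spatially; 3 -> 64 -> 128 -> 256 -> 512 -> 1 channels.
disc = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1),    nn.LeakyReLU(0.2),   # 256 -> 128
    nn.Conv2d(64, 128, 4, 2, 1),  nn.LeakyReLU(0.2),   # 128 -> 64
    nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),   # 64 -> 32
    nn.Conv2d(256, 512, 3, 1, 1), nn.LeakyReLU(0.2),   # keep 32, widen to 512
    nn.Conv2d(512, 1, 3, 1, 1),                        # compress channels to 1
)
score_map = disc(torch.randn(1, 3, 256, 256))  # per-patch quality scores
```

Each spatial position of the single-channel output scores one patch of the input, so the evaluation is a map of local quality judgments rather than a single scalar.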
In this first embodiment, the bidirectional generator can receive multi-channel image data in both the forward and reverse directions and perform an image conversion task on the input. Previous deep learning models all use unidirectional generation modules, with large parameter counts and a single task per model. A specific way to implement the bidirectional generator is to share convolution kernels between the two generation directions of the bidirectional generation module. Each direction of the module comprises a convolution layer, a residual module and a deconvolution layer, where the convolution kernels used for the convolution layer and residual network in the forward direction are reused for the deconvolution layer and residual network in the reverse direction. In this way, sharing of convolution kernels is achieved.
The deep learning-based bi-directional image conversion system provided in this first embodiment is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, a schematic structural diagram of an implementation of the deep learning-based bidirectional image conversion system according to the first embodiment is provided. This embodiment uses two data sets A and B. In the figure there is a bidirectional generator G and two discriminators D_A and D_B. The input image passes through the forward direction of the bidirectional generator to obtain a conversion target image, which is then fed into the reverse direction of the bidirectional generator to obtain a reconstructed image of the input image. The conversion target image is the required output result. The conversion target image and the real image are sent to a discriminator for evaluation, and the evaluation feedback guides the learning process of the bidirectional generator. The reconstructed image is used to compute the cycle consistency loss.
As shown in fig. 3, a schematic diagram of the bidirectional generator used in the first embodiment is provided. The two unidirectional generators of the earlier CycleGAN model each use an independent set of convolution kernels; the two sets have no direct relation and cannot directly influence each other during training. In the bidirectional generator of the first embodiment, by contrast, the convolution kernels are shared between the network layers of the two conversion directions. Each direction of the bidirectional generator includes a convolution layer, a residual module, and a deconvolution layer, where the convolution kernels used for the convolution layer and residual network in the forward direction are reused for the deconvolution layer and residual network in the reverse direction. Note that the first and last layers of the generator (the two layers marked by thin black lines in fig. 3) do not share convolution kernels. These two layers are left unshared so that the network structure is the same in both directions of the generator, i.e., the structure and number of convolution and deconvolution layers traversed by the image are the same as seen from either input end, which ensures the same generation quality in both directions.
In this first embodiment, the training step comprises:
the penalty functions used for training are classified into a cyclic consistency penalty function and an antagonistic penalty function. The loss of antagonism in the forward and reverse directions are respectively as follows:
L GAN (G f ,D B ,A,B)=logD B (y)+log(1-D B (G f (x)))
L GAN (G b ,D A ,A,B)=logD A (x)+log(1-D A (G b (y)))
wherein x and y represent images from two different data sets A and B, respectively, D A And D B Representing two discriminators, G f And G b Refers to the sameThe bi-directional generator is operable in forward and backward directions, respectively.
The cycle consistency loss function is:

L_cyc(G_b, G_f) = ||G_b(G_f(x)) - x||_1 + ||G_f(G_b(y)) - y||_1
the optimization method used to optimize the loss function is a random gradient descent:
where η is the update rate used to control the magnitude of model updates. L is a weighted sum of the cyclic consistency loss function and the counterloss function.Is the information fed back to the parallel framework after the update mechanism calculates.
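The update rule above, θ ← θ - η ∂L/∂θ, can be illustrated on a one-parameter toy loss (PyTorch assumed; the patent's L, a weighted loss sum, is replaced here by θ² purely for clarity):

```python
import torch

eta = 0.1                                        # update rate (eta)
theta = torch.tensor([2.0], requires_grad=True)  # a single model parameter

L = (theta ** 2).sum()   # toy stand-in for the weighted sum of losses
L.backward()             # theta.grad now holds dL/dtheta, the information fed back

with torch.no_grad():
    theta -= eta * theta.grad   # 2.0 - 0.1 * 4.0 = 1.6
```

In the full system, every shared convolution kernel receives gradient contributions from both conversion directions in this same update.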
A second embodiment of the present invention provides another deep learning-based bi-directional image conversion system, comprising:
a bidirectional generator: performing image conversion tasks between two pairs of paired images in the forward and reverse directions of the bidirectional generator; the target conversion image of the image is calculated using a depth parallel computing framework for the input multi-channel image data in both directions.
A discriminator: the discriminator carries out quality evaluation on the image and the real image obtained by the bidirectional generator, and the quality evaluation result is fed back to the bidirectional generator to train the bidirectional generator.
As a preferred embodiment, the system requires only one generator module to carry out the conversion tasks between two pairs of paired image domains, and the parallel computing framework used is end-to-end. The bidirectional generator carries out two conversion tasks in one module: in its forward process, the image from image domain A is converted into an image belonging to image domain B; in its backward process, the image from image domain C is converted into an image belonging to image domain D.
The deep learning-based bi-directional image conversion system provided in this second embodiment is described in further detail below with reference to the accompanying drawings.
As shown in fig. 2, a schematic structural diagram of an implementation of the deep learning-based bidirectional image conversion system according to the second embodiment is provided. This embodiment uses four data sets A, B, C and D. In the figure there is a bidirectional generator G and two discriminators D_B and D_D. An input image from data set A passes through the forward direction of the bidirectional generator to obtain a conversion target image belonging to data set B, and an input image from data set C passes through the reverse direction of the bidirectional generator to obtain a conversion target image belonging to data set D. The conversion target image and the real image are sent to a discriminator for evaluation, and the evaluation feedback guides the learning process of the bidirectional generator.
As shown in fig. 3, a schematic diagram of the bidirectional generator used in the second embodiment is provided. Its specific structure is identical to that of the bidirectional generator in the first embodiment.
In this second embodiment, the training step comprises:
the loss functions used for training are the L1 loss function and the counterloss function. The loss of antagonism in the forward and reverse directions are respectively as follows:
L GAN (G f ,D B ,A,B)=logD B (y 1 )+log(1-D B (G f (x 1 )))
L GAN (G b ,D D ,C,D)=logD D (y 2 )+log(1-D D (G f (x 2 )))
wherein x is 1 ,y 1 ,x 2 ,y 2 Multi-channel image data from data sets a, B, C and D, respectively. D (D) B And D D Representing two discriminators, G f And G b Refers to the same bi-directional generator that can operate in forward and backward directions, respectively.
The L1 loss function is:

L_1(G_b, G_f) = ||G_f(x_1) - y_1||_1 + ||G_b(x_2) - y_2||_1
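Under the paired setting of this second embodiment, the L1 loss compares each generator output directly with its label; a minimal PyTorch sketch (framework assumed):

```python
import torch

def paired_l1_loss(G_f, G_b, x1, y1, x2, y2):
    # ||G_f(x1) - y1||_1 + ||G_b(x2) - y2||_1 : each direction is supervised by its
    # own paired label, unlike the cycle consistency loss of the unpaired setting.
    return (torch.mean(torch.abs(G_f(x1) - y1)) +
            torch.mean(torch.abs(G_b(x2) - y2)))

# Perfect generators on their own pairs give zero loss.
x1 = torch.randn(1, 3, 8, 8); x2 = torch.randn(1, 3, 8, 8)
assert paired_l1_loss(lambda t: t, lambda t: t, x1, x1, x2, x2).item() == 0.0
```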
the optimization method used to optimize the loss function is a random gradient descent:
where η is the update rate used to control the magnitude of model updates. L is a weighted sum of the L1 penalty function and the counterpenalty function.Is the information fed back to the parallel framework after the update mechanism calculates.
Based on the deep learning-based bidirectional image conversion systems provided by the two embodiments above, an embodiment of the present invention also provides a deep learning-based bidirectional image conversion method, comprising:
carrying out an image conversion task between a pair of image domains in each of the forward and reverse directions of the bidirectional generator; in either direction, the target converted image is computed from the input multi-channel image data using a deep parallel computing framework;
and the discriminator evaluates the quality of the image obtained by the bidirectional generator against the real image.
Further, the method further comprises the following steps:
feeding the quality evaluation result of the discriminator back to the bidirectional generator to train the bidirectional generator;
in the training process, the loss function adopted by the update mechanism is an adversarial loss function, where the adversarial losses in the forward and reverse directions are:

L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))

L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))

where x and y denote images from two different data sets A and B respectively, D_A and D_B denote the two discriminators, and G_f and G_b are the forward and backward directions of one bidirectional generation module respectively;
the update mechanism adopts a stochastic gradient descent optimization method:

θ ← θ - η ∂L/∂θ

where η is the update rate controlling the magnitude of model updates, and ∂L/∂θ is the information fed back to the deep parallel computing framework after the update mechanism is computed.
Further, the target converted image is obtained as follows: using the deep parallel computing framework, the input multi-channel image data pass through a 15-layer computing module in which the spatial size is reduced to 64 and then restored to 256, while the number of channels is first increased to 128 and then reduced to the original number; the final output image conversion result is the target converted image.
Further, the discriminator evaluates quality as follows: using the deep parallel computing framework, the size of the input image is gradually compressed to 32 through a 5-layer computing module, while the number of channels is increased to 512 and finally compressed to 1; the final output is the discriminator's quality evaluation of the input image.
According to the deep learning-based bidirectional image conversion system and method described above, an image conversion task between a pair of image domains can be carried out in each of the forward and reverse directions of the bidirectional generator, and in either direction the model computes the target converted image from the input multi-channel image data using the deep parallel computing framework; the discriminator evaluates the quality of the image obtained by the bidirectional generator against the real image, and in both the supervised and unsupervised settings the quality evaluation result is used to train the bidirectional generator and the discriminator. Compared with previous deep learning models for image conversion tasks, the invention provides a bidirectional generator structure that greatly reduces the parameters of the deep learning model without reducing image generation quality, and can realize two pairs of image conversion tasks in one model under supervision.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.

Claims (2)

1. A deep learning-based bi-directional image conversion system, comprising:
a bidirectional generator: performing image conversion tasks between a pair of image domains in the forward and reverse directions of the bidirectional generator; in either direction, the target conversion image can be computed from the input multi-channel image data using a depth parallel computing framework;
a discriminator: the discriminator evaluates the quality of the image produced by the bidirectional generator against the real image, and the quality evaluation result is fed back to the bidirectional generator to train it;
the bidirectional generator comprises a forward conversion direction and a reverse conversion direction, each comprising: a convolution layer, a residual network and a deconvolution layer; the convolution layers and residual network of the forward direction share convolution kernels with the deconvolution layers and residual network of the reverse direction; the first and last layers of the two conversion directions do not share convolution kernels;
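A minimal bookkeeping sketch of this sharing scheme follows; the class name, parameter names, and placeholder lists (standing in for real kernel tensors) are invented for illustration, since the patent does not specify a parameter layout:

```python
class BidirectionalGeneratorParams:
    """Parameter bookkeeping for the weight-sharing scheme: the inner
    convolution/residual kernels are the same objects in both conversion
    directions, while each direction owns a private first and last layer."""

    def __init__(self):
        # shared core: kernels used by the forward convolution/residual
        # layers and reused by the reverse deconvolution/residual layers
        self.shared = {"core_conv_kernels": [0.0] * 8,
                       "residual_kernels": [0.0] * 8}
        # unshared first/last layers, one private pair per direction
        self.forward_io = {"first": [0.0] * 4, "last": [0.0] * 4}
        self.backward_io = {"first": [0.0] * 4, "last": [0.0] * 4}

    def parameters(self, direction):
        """Return the parameter set seen by one conversion direction."""
        io = self.forward_io if direction == "forward" else self.backward_io
        return {**self.shared, **io}
```

Because the shared core appears in both directions' parameter sets as the same underlying objects, a gradient update applied through either direction updates the common kernels, which is what reduces the model's parameter count.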
in the process of training the bidirectional generator, the loss function adopted by the update mechanism is an adversarial loss function, wherein the adversarial loss functions in the forward and reverse directions are:
L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))
L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))
where x and y denote images from the two different data sets A and B respectively, D_A and D_B denote the two discriminators, and G_f and G_b denote the forward and backward directions of the bidirectional generator module, respectively;
the cycle-consistency loss function is:
L_cyc(G_b, G_f) = ||G_b(G_f(x)) - x||_1 + ||G_f(G_b(y)) - y||_1
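The adversarial and cycle-consistency losses can be sanity-checked numerically with stand-in generators and discriminators; only the loss formulas below follow the text, and everything else (the toy functions, the scalar discriminator outputs) is illustrative:

```python
import numpy as np

def gan_loss(d_real, d_fake):
    # L_GAN = log D(real) + log(1 - D(fake)), with D outputs in (0, 1)
    return np.log(d_real) + np.log(1.0 - d_fake)

def cycle_loss(g_f, g_b, x, y):
    # L_cyc = ||G_b(G_f(x)) - x||_1 + ||G_f(G_b(y)) - y||_1
    return (np.abs(g_b(g_f(x)) - x).sum()
            + np.abs(g_f(g_b(y)) - y).sum())
```

When G_f and G_b invert each other exactly, the cycle-consistency term vanishes, which is precisely the round-trip behaviour the loss is designed to enforce.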
the update mechanism adopts stochastic gradient descent as its optimization method:
θ ← θ - η·∂L/∂θ
where η is the update rate used to control the magnitude of model updates, and the gradient ∂L/∂θ is the information fed back to the depth parallel computing framework after the update mechanism is computed;
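The update rule described above is ordinary stochastic gradient descent; a minimal numeric illustration follows (the quadratic loss L(θ) = θ², with gradient 2θ, is an invented example, not part of the patent):

```python
import numpy as np

def sgd_step(theta, grad, eta=0.01):
    """One stochastic-gradient-descent update: move each parameter
    against its gradient, scaled by the update rate eta."""
    return theta - eta * grad

# repeated updates on L(theta) = theta**2 drive theta toward the minimum at 0
theta = np.array([1.0])
for _ in range(100):
    theta = sgd_step(theta, 2.0 * theta, eta=0.1)
```

Each step multiplies θ by (1 - 2η), so with η = 0.1 the parameter shrinks geometrically toward the minimizer; too large an η would instead make the updates overshoot, which is why the text describes η as controlling the magnitude of model updates.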
the target conversion image of the image is obtained by the following steps:
utilizing the depth parallel computing framework, the input multi-channel image data is passed through a 15-layer computing module in which the spatial size is reduced to 64 and then restored to 256, while the number of channels is increased to 128 and then reduced back to the original number of channels; the final output image conversion result is the target conversion image;
the quality evaluation method of the discriminator comprises the following steps:
utilizing the depth parallel computing framework, the size of the input image is gradually compressed to 32 through a 5-layer computing module, the number of channels is increased to 512 and finally compressed to 1, and the output result serves as the discriminator's quality evaluation of the input image;
the system requires only one generator module to accomplish the conversion tasks between two pairs of paired images; the parallel computing framework is end-to-end, and the two conversion tasks are performed simultaneously in the single bidirectional generator module, wherein: in the forward process of the bidirectional generator, images from image domain A are converted into images belonging to image domain B; in the backward process of the bidirectional generator, images from image domain C are converted into images belonging to image domain D.
2. A bi-directional image conversion method based on deep learning, comprising:
performing image conversion tasks between a pair of image domains in the forward and reverse directions of the bidirectional generator, where in either direction the target conversion image can be computed from the input multi-channel image data using a depth parallel computing framework;
the discriminator evaluates the quality of the image produced by the bidirectional generator against the real image;
further comprises:
feeding the quality evaluation result of the discriminator back to the bidirectional generator to train the bidirectional generator;
in the training process, the loss function adopted by the update mechanism is an adversarial loss function, wherein the adversarial loss functions in the forward and reverse directions are:
L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))
L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))
where x and y denote images from the two different data sets A and B respectively, D_A and D_B denote the two discriminators, and G_f and G_b denote the forward and backward directions of the bidirectional generator module, respectively;
the cycle-consistency loss function L_cyc(G_b, G_f) is:
L_cyc(G_b, G_f) = ||G_b(G_f(x)) - x||_1 + ||G_f(G_b(y)) - y||_1
the update mechanism adopts stochastic gradient descent as its optimization method:
θ ← θ - η·∂L/∂θ
where η is the update rate used to control the magnitude of model updates, and the gradient ∂L/∂θ is the information fed back to the depth parallel computing framework after the update mechanism is computed;
the target conversion image of the image is obtained by the following steps:
utilizing the depth parallel computing framework, the input multi-channel image data is passed through a 15-layer computing module in which the spatial size is reduced to 64 and then restored to 256, while the number of channels is increased to 128 and then reduced back to the original number of channels; the final output image conversion result is the target conversion image;
the quality evaluation method of the discriminator comprises the following steps:
utilizing the depth parallel computing framework, the size of the input image is gradually compressed to 32 through a 5-layer computing module, the number of channels is increased to 512 and finally compressed to 1, and the output result serves as the discriminator's quality evaluation of the input image;
the method requires only one generator module to accomplish the conversion tasks between two pairs of paired images; the parallel computing framework is end-to-end, and the two conversion tasks are performed simultaneously in the single bidirectional generator module, wherein: in the forward process of the bidirectional generator, images from image domain A are converted into images belonging to image domain B; in the backward process of the bidirectional generator, images from image domain C are converted into images belonging to image domain D.
CN202010284081.8A 2020-04-13 2020-04-13 Bidirectional image conversion system and method based on deep learning Active CN111626917B (en)

Publications (2)

Publication Number Publication Date
CN111626917A CN111626917A (en) 2020-09-04
CN111626917B true CN111626917B (en) 2024-02-20

Family

ID=72258834


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240605A (en) * 2021-05-21 2021-08-10 南开大学 Image enhancement method for forward and backward bidirectional learning based on symmetric neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584325A (en) * 2018-10-30 2019-04-05 河北科技大学 A kind of two-way coloration method for the animation image unanimously fighting network based on the U-shaped period
CN110910351A (en) * 2019-10-31 2020-03-24 上海交通大学 Ultrasound image modality migration and classification method and terminal based on generation countermeasure network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10636141B2 (en) * 2017-02-09 2020-04-28 Siemens Healthcare Gmbh Adversarial and dual inverse deep learning networks for medical image analysis
US11232541B2 (en) * 2018-10-08 2022-01-25 Rensselaer Polytechnic Institute CT super-resolution GAN constrained by the identical, residual and cycle learning ensemble (GAN-circle)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant