CN111626917B - Bidirectional image conversion system and method based on deep learning - Google Patents


Info

Publication number
CN111626917B
Authority
CN
China
Prior art keywords: image, generator, conversion, bidirectional, discriminator
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN202010284081.8A
Other languages: Chinese (zh)
Other versions: CN111626917A
Inventor
杨浩特
涂仕奎
Current Assignee: Shanghai Jiaotong University
Original Assignee: Shanghai Jiaotong University
Application filed by Shanghai Jiaotong University
Priority to CN202010284081.8A
Publication of CN111626917A
Application granted
Publication of CN111626917B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems


Abstract

The invention provides a bidirectional image conversion system based on deep learning, comprising: a bidirectional generator, whose forward and reverse directions each carry out an image conversion task between a pair of image domains; in either direction, the model computes the target converted image from the input multi-channel image data using a deep parallel computing framework; and a discriminator, which evaluates the quality of images produced by the bidirectional generator against real images, the evaluation result being used to train both the bidirectional generator and the discriminator. A bidirectional image conversion method implemented on the system is also provided. The proposed bidirectional generator structure greatly reduces the number of parameters of the deep learning model without reducing image generation quality, and can carry out two pairs of image conversion tasks in one model under supervision.

Description

Bidirectional image conversion system and method based on deep learning
Technical Field
The invention relates to the technical field of image conversion, and in particular to a bidirectional image conversion system and method based on deep learning.
Background
Image conversion refers to the task of converting an image between two image domains: an image from image domain A is converted, according to certain rules or requirements, into an image belonging to image domain B. The conversion may keep the content of image domain A unchanged while introducing features of image domain B (e.g., a style transfer task); it may use a neural network to generate an image of image domain B containing more information (e.g., color information) from the lesser information of image domain A (e.g., a colorization task); or it may change the content of image domain A to match that of image domain B (e.g., conversion between horses and zebras).
There are many successful models for image conversion between paired data sets, the Pix2Pix model being one of the most successful. The Pix2Pix model was proposed to handle such tasks within the generative adversarial network (GAN) framework. It can be seen as a conditional GAN (CGAN) whose condition is the input image of the generator (a sample of image domain A).
The Pix2Pix model fully combines the advantages of DCGAN and CGAN and achieves high-quality generation results. However, its training process requires paired data, because one term in the Pix2Pix loss function is the L1 distance between the generated image and the label image. If the two data sets used by the model are unpaired, this loss cannot be used for training.
The CycleGAN model was proposed to solve the image conversion problem between unpaired data sets by introducing a cycle consistency loss in place of the L1 loss. Suppose two translators f and g: f translates the Chinese sentence "I love science" into English, and g translates the English sentence "I love science" back into Chinese. The two translators can then be considered inverses of each other. If both translators perform well enough, then in theory f and g should satisfy g(f("I love science")) = "I love science"; this is cycle consistency. The CycleGAN model uses a pair of mutually inverse generators to carry out the image conversion tasks between two image domains (two unpaired data sets), so that a generated image, after passing through the two mutually inverse generators, can be reconstructed into the original image, and an L1 loss can be applied between the reconstructed image and the input image. The model accordingly requires two discriminators to separately judge the generation quality of each generator.
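The cycle consistency described above can be sketched in a few lines. This is an illustrative PyTorch snippet (the patent does not name a framework), with G_f and G_b standing in for the two mutually inverse generators:

```python
import torch

def cycle_consistency_loss(G_f, G_b, x, y):
    # ||G_b(G_f(x)) - x||_1 + ||G_f(G_b(y)) - y||_1:
    # each input should be reconstructed after a round trip through both generators.
    loss_a = torch.mean(torch.abs(G_b(G_f(x)) - x))
    loss_b = torch.mean(torch.abs(G_f(G_b(y)) - y))
    return loss_a + loss_b

# With perfect mutually inverse "translators" (here: identities) the loss vanishes.
identity = lambda t: t
x = torch.randn(1, 3, 8, 8)
y = torch.randn(1, 3, 8, 8)
assert cycle_consistency_loss(identity, identity, x, y).item() == 0.0
```

No paired labels appear anywhere in the function, which is exactly why this loss can replace the L1 term on unpaired data sets.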
The least mean square error reconstruction (Lmser) network is a classical network structure. It is obtained by folding a traditional autoencoder (AE) along its central hidden layer; because of the symmetric structure of the AE's encoder and decoder, the neurons of the encoder coincide with those of the decoder, as do the connection weights. In Lmser, the connection between neurons in adjacent layers is a distributed cascading relationship: during training, information is continuously transferred bidirectionally between adjacent layers. Among the many properties of Lmser, three dualities are the most important: duality of the bidirectional architecture (DBA), duality of the connection weights (DCW), and duality of the paired neurons (DPN). In recent years, Lmser has achieved good results in tasks such as image recognition, medical image segmentation, and image super-resolution.
However, existing image conversion technology still suffers from large model parameter counts and single-task conversion, and cannot truly meet the needs of image conversion. No description or report of a similar technology has been found, and no similar data have been collected at home or abroad.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a bidirectional image conversion system and method based on deep learning, built on least mean square error reconstruction, generative adversarial networks, and a deep parallel computing framework. One bidirectional generator can carry out the image conversion tasks of the two unidirectional generators in a CycleGAN model; furthermore, two pairs of independent image conversion tasks can be carried out in the two directions of one bidirectional generator.
The invention is realized by the following technical scheme.
According to one aspect of the present invention, there is provided a deep learning-based bidirectional image conversion system, comprising:
a bidirectional generator: carrying out an image conversion task between a pair of image domains in each of the forward and reverse directions of the bidirectional generator; in either direction, the target converted image is computed from the input multi-channel image data using a deep parallel computing framework;
a discriminator: the discriminator evaluates the quality of the image obtained by the bidirectional generator against the real image, and the quality evaluation result is fed back to the bidirectional generator to train it.
Preferably, the bidirectional generator includes a forward conversion direction and a reverse conversion direction, each comprising: a convolution layer, a residual network, and a deconvolution layer. The convolution kernels used for the convolution layer and the residual network in the forward direction are shared with the deconvolution layer and the residual network in the reverse direction; the first and last layers in the two conversion directions do not share convolution kernels.
Preferably, in the training process of the bidirectional generator, the loss function adopted by the update mechanism is an adversarial loss function, where the adversarial losses in the forward and reverse directions are:

L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))

L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))

where x and y denote images from two different data sets A and B respectively, D_A and D_B denote the two discriminators, and G_f and G_b denote the same bidirectional generation module running in the forward and backward directions respectively.
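A minimal sketch of these two adversarial losses, assuming PyTorch and discriminators that output probabilities in (0, 1) (the patent fixes neither a framework nor an output activation):

```python
import torch

def adversarial_losses(D_A, D_B, G_f, G_b, x, y):
    # Forward:  L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))
    # Backward: L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))
    l_fwd = torch.log(D_B(y)) + torch.log(1 - D_B(G_f(x)))
    l_bwd = torch.log(D_A(x)) + torch.log(1 - D_A(G_b(y)))
    return l_fwd.mean(), l_bwd.mean()

# Toy check: a maximally uncertain discriminator (always 0.5) yields 2*log(0.5) per loss.
half = lambda t: torch.full((t.shape[0], 1), 0.5)
l_fwd, l_bwd = adversarial_losses(half, half, lambda t: t, lambda t: t,
                                  torch.randn(2, 3, 4, 4), torch.randn(2, 3, 4, 4))
```

The discriminators try to maximize these quantities while the generator tries to minimize them, the usual GAN min-max game.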
Preferably, the update mechanism adopts a stochastic gradient descent optimization method:

θ ← θ - η ∂L/∂θ

where η is the update rate controlling the magnitude of model updates, and ∂L/∂θ is the information fed back to the deep parallel computing framework after the update mechanism is computed.
Preferably, the target converted image is obtained as follows: using the deep parallel computing framework, the input multi-channel image data pass through a 15-layer computing module in which the spatial size is reduced to 64 and then restored to 256, while the number of channels is first increased to 128 and then reduced to the original number; the final output image conversion result is the target converted image.
Preferably, the discriminator evaluates quality as follows: using the deep parallel computing framework, the size of the input image is gradually compressed to 32 through a 5-layer computing module, while the number of channels is increased to 512 and finally compressed to 1; the final output is the discriminator's quality evaluation of the input image.
According to another aspect of the present invention, there is provided a deep learning-based bidirectional image conversion method, comprising:
carrying out an image conversion task between a pair of image domains in each of the forward and reverse directions of the bidirectional generator; in either direction, the target converted image is computed from the input multi-channel image data using a deep parallel computing framework;
and the discriminator evaluates the quality of the image obtained by the bidirectional generator against the real image.
Preferably, the method further comprises:
and feeding the quality evaluation result of the discriminator back to the bidirectional generator to train the bidirectional generator.
Preferably, in the training process, the loss function adopted by the update mechanism is an adversarial loss function, where the adversarial losses in the forward and reverse directions are:

L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))

L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))

where x and y denote images from two different data sets A and B respectively, D_A and D_B denote the two discriminators, and G_f and G_b denote the same bidirectional generation module running in the forward and backward directions respectively.
Preferably, the update mechanism adopts a stochastic gradient descent optimization method:

θ ← θ - η ∂L/∂θ

where η is the update rate controlling the magnitude of model updates, and ∂L/∂θ is the information fed back to the deep parallel computing framework after the update mechanism is computed.
Preferably, the target converted image is obtained as follows: using the deep parallel computing framework, the input multi-channel image data pass through a 15-layer computing module in which the spatial size is reduced to 64 and then restored to 256, while the number of channels is first increased to 128 and then reduced to the original number; the final output image conversion result is the target converted image.
Preferably, the discriminator evaluates quality as follows: using the deep parallel computing framework, the size of the input image is gradually compressed to 32 through a 5-layer computing module, while the number of channels is increased to 512 and finally compressed to 1; the final output is the discriminator's quality evaluation of the input image.
Compared with the prior art, the invention has the following beneficial effects:
the invention greatly reduces the number of model parameters without degrading the image conversion effect;
the invention can realize two pairs of image conversion tasks in one model.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings, in which:
FIG. 1 is a diagram of a general architecture of a deep learning based bi-directional image conversion system in accordance with one embodiment of the present invention;
FIG. 2 is a schematic diagram of a bi-directional image conversion system based on deep learning according to another embodiment of the present invention;
fig. 3 is a diagram of a bi-directional generator architecture in two embodiments of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail. The embodiments are implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific operation processes are given. It should be noted that those skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.
A first embodiment of the present invention provides a deep learning-based bidirectional image conversion system, comprising:
a bidirectional generator: carrying out an image conversion task between a pair of unpaired image domains in the forward and reverse directions of the bidirectional generator; in the forward direction, computing the target converted image from the input multi-channel image data using the deep parallel computing framework; then taking this target converted image as input in the other direction, computing a reconstructed image of the forward input image using the deep parallel computing framework, this reconstructed image being used to compute the cycle consistency loss.
a discriminator: the discriminator evaluates the quality of the image obtained by the bidirectional generator against the real image, and the quality evaluation result is fed back to the bidirectional generator to train it.
As a preferred embodiment, the system requires only one generator module to accomplish the two image conversion tasks, and the parallel computing framework used is end-to-end. The bidirectional generator carries out two conversion tasks in one module: in its forward process, the image from image domain A is converted into an image belonging to image domain B; in its backward process, the image from image domain B is converted into an image belonging to image domain A.
To achieve this, the bidirectional generator shares convolution kernels between its two generation directions. Each direction comprises a convolution layer, a residual module, and a deconvolution layer, where the convolution kernels used for the convolution layer and residual network in the forward direction are reused for the deconvolution layer and residual network in the reverse direction. In this way, sharing of convolution kernels is achieved. This approach traces back to the duality of connection weights (DCW) of Lmser, extended here to convolutional neural network structures.
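The kernel-sharing idea can be illustrated with a single weight tensor. In PyTorch (used here only for illustration; the patent does not specify a framework), the same tensor that acts as a forward convolution also parameterizes the corresponding reverse deconvolution, because `conv2d` and `conv_transpose2d` read the channel axes of the weight in opposite directions:

```python
import torch
import torch.nn.functional as F

# One shared weight tensor, shaped (out_ch, in_ch, kH, kW) for the forward convolution.
weight = torch.randn(128, 64, 3, 3)

# Forward direction: convolution, 64 -> 128 channels, spatially downsampled.
x_fwd = torch.randn(1, 64, 32, 32)
y_fwd = F.conv2d(x_fwd, weight, stride=2, padding=1)             # -> (1, 128, 16, 16)

# Reverse direction: the SAME tensor used as a deconvolution, 128 -> 64, upsampled.
x_bwd = torch.randn(1, 128, 16, 16)
y_bwd = F.conv_transpose2d(x_bwd, weight, stride=2, padding=1,
                           output_padding=1)                      # -> (1, 64, 32, 32)
```

Updating `weight` from either direction therefore affects both conversion paths, which is the convolutional analogue of Lmser's duality of connection weights (DCW).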
As a preferred embodiment, in the bidirectional generator: multi-channel image data are input; using the deep parallel computing framework, the spatial size is reduced to 64 and then restored to 256 through a 15-layer computing module, while the number of channels is first increased to 128 and then reduced to the original number; the final output is the image conversion result.
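A shape-level sketch of one direction of such a generator, assuming PyTorch, a 256x256 3-channel input, and a 9-residual-block middle stage (the residual-block count, kernel sizes and intermediate widths are assumptions; the text fixes only the 15-layer total, the 256 -> 64 -> 256 size path and the 128-channel peak):

```python
import torch
import torch.nn as nn

def down(i, o, s):
    return nn.Sequential(nn.Conv2d(i, o, 3, s, 1), nn.ReLU())

class ResBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, 1, 1), nn.ReLU(),
                                  nn.Conv2d(c, c, 3, 1, 1))
    def forward(self, x):
        return x + self.body(x)

# 3 downsampling modules + 9 residual modules + 2 upsampling modules + 1 output module = 15.
gen = nn.Sequential(
    down(3, 64, 1), down(64, 128, 2), down(128, 128, 2),         # 256 -> 64, channels -> 128
    *[ResBlock(128) for _ in range(9)],                          # residual stage at 64x64
    nn.ConvTranspose2d(128, 64, 3, 2, 1, output_padding=1), nn.ReLU(),  # 64 -> 128
    nn.ConvTranspose2d(64, 64, 3, 2, 1, output_padding=1), nn.ReLU(),   # 128 -> 256
    nn.Conv2d(64, 3, 3, 1, 1), nn.Tanh(),                        # back to the original channels
)
out = gen(torch.randn(1, 3, 256, 256))   # output has the same size as the input
```

In the patented bidirectional generator the convolution weights of this path (apart from the first and last layers) would be reused by the reverse path, rather than duplicated as here.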
As a preferred embodiment, in the discriminator: multi-channel image data, either a real image or a generated image, are input; using the deep parallel computing framework, the size is gradually compressed to 32 through a 5-layer computing module, the number of channels is increased to 512 and then compressed to 1, and the final output is the discriminator's evaluation of the input image quality.
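A sketch of a discriminator matching the described shapes (size compressed to 32, channels up to 512 and then to 1), assuming PyTorch and a 256x256 input; kernel sizes and intermediate widths are assumptions:

```python
import torch
import torch.nn as nn

# Five computing modules: 256 -> 128 -> 64 -> 32 spatially; 3 -> 64 -> 128 -> 256 -> 512 -> 1 channels.
disc = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1),    nn.LeakyReLU(0.2),   # 256 -> 128
    nn.Conv2d(64, 128, 4, 2, 1),  nn.LeakyReLU(0.2),   # 128 -> 64
    nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),   # 64 -> 32
    nn.Conv2d(256, 512, 3, 1, 1), nn.LeakyReLU(0.2),   # keep 32, widen to 512
    nn.Conv2d(512, 1, 3, 1, 1),                        # compress channels to 1
)
score_map = disc(torch.randn(1, 3, 256, 256))  # per-patch quality scores
```

Each spatial position of the single-channel output scores one patch of the input, so the evaluation is a map of local quality judgments rather than a single scalar.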
In this first embodiment, the bidirectional generator can receive multi-channel image data in both the forward and reverse directions and perform an image conversion task on the input. Previous deep learning models all use unidirectional generation modules, with large parameter counts and a single task per model. A specific way to implement the bidirectional generator is to share convolution kernels between the two generation directions of the bidirectional generation module. Each direction of the module comprises a convolution layer, a residual module and a deconvolution layer, where the convolution kernels used for the convolution layer and residual network in the forward direction are reused for the deconvolution layer and residual network in the reverse direction. In this way, sharing of convolution kernels is achieved.
The deep learning-based bi-directional image conversion system provided in this first embodiment is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, a schematic structural diagram of an implementation of the deep learning-based bidirectional image conversion system according to the first embodiment is provided. This embodiment uses two data sets A and B. In the figure there is a bidirectional generator G and two discriminators D_A and D_B. The input image passes through the forward direction of the bidirectional generator to obtain a conversion target image, which is then fed into the reverse direction of the bidirectional generator to obtain a reconstructed image of the input image. The conversion target image is the required output result. The conversion target image and the real image are sent to a discriminator for evaluation, and the evaluation feedback guides the learning process of the bidirectional generator. The reconstructed image is used to compute the cycle consistency loss.
As shown in fig. 3, a schematic diagram of the bidirectional generator used in the first embodiment is provided. The two unidirectional generators of the earlier CycleGAN model each use an independent set of convolution kernels; the two sets have no direct relation and cannot directly influence each other during training. In the bidirectional generator of the first embodiment, by contrast, the convolution kernels are shared between the network layers of the two conversion directions. Each direction of the bidirectional generator includes a convolution layer, a residual module, and a deconvolution layer, where the convolution kernels used for the convolution layer and residual network in the forward direction are reused for the deconvolution layer and residual network in the reverse direction. Note that the first and last layers of the generator (the two layers marked by thin black lines in fig. 3) do not share convolution kernels. These two layers are left unshared so that the network structure is the same in both directions of the generator, i.e., the structure and number of convolution and deconvolution layers traversed by the image are the same as seen from either input end, which ensures the same generation quality in both directions.
In this first embodiment, the training step comprises:
the penalty functions used for training are classified into a cyclic consistency penalty function and an antagonistic penalty function. The loss of antagonism in the forward and reverse directions are respectively as follows:
L GAN (G f ,D B ,A,B)=logD B (y)+log(1-D B (G f (x)))
L GAN (G b ,D A ,A,B)=logD A (x)+log(1-D A (G b (y)))
wherein x and y represent images from two different data sets A and B, respectively, D A And D B Representing two discriminators, G f And G b Refers to the sameThe bi-directional generator is operable in forward and backward directions, respectively.
The cycle consistency loss function is:

L_cyc(G_b, G_f) = ||G_b(G_f(x)) - x||_1 + ||G_f(G_b(y)) - y||_1
the optimization method used to optimize the loss function is a random gradient descent:
where η is the update rate used to control the magnitude of model updates. L is a weighted sum of the cyclic consistency loss function and the counterloss function.Is the information fed back to the parallel framework after the update mechanism calculates.
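The update rule above, θ ← θ - η ∂L/∂θ, can be illustrated on a one-parameter toy loss (PyTorch assumed; the patent's L, a weighted loss sum, is replaced here by θ² purely for clarity):

```python
import torch

eta = 0.1                                        # update rate (eta)
theta = torch.tensor([2.0], requires_grad=True)  # a single model parameter

L = (theta ** 2).sum()   # toy stand-in for the weighted sum of losses
L.backward()             # theta.grad now holds dL/dtheta, the information fed back

with torch.no_grad():
    theta -= eta * theta.grad   # 2.0 - 0.1 * 4.0 = 1.6
```

In the full system, every shared convolution kernel receives gradient contributions from both conversion directions in this same update.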
A second embodiment of the present invention provides another deep learning-based bi-directional image conversion system, comprising:
a bidirectional generator: performing image conversion tasks between two pairs of paired images in the forward and reverse directions of the bidirectional generator; the target conversion image of the image is calculated using a depth parallel computing framework for the input multi-channel image data in both directions.
A discriminator: the discriminator carries out quality evaluation on the image and the real image obtained by the bidirectional generator, and the quality evaluation result is fed back to the bidirectional generator to train the bidirectional generator.
As a preferred embodiment, the system requires only one generator module to carry out the conversion tasks between two pairs of paired image domains, and the parallel computing framework used is end-to-end. The bidirectional generator carries out two conversion tasks in one module: in its forward process, the image from image domain A is converted into an image belonging to image domain B; in its backward process, the image from image domain C is converted into an image belonging to image domain D.
The deep learning-based bi-directional image conversion system provided in this second embodiment is described in further detail below with reference to the accompanying drawings.
As shown in fig. 2, a schematic structural diagram of an implementation of the deep learning-based bidirectional image conversion system according to the second embodiment is provided. This embodiment uses four data sets A, B, C and D. In the figure there is a bidirectional generator G and two discriminators D_B and D_D. An input image from data set A passes through the forward direction of the bidirectional generator to obtain a conversion target image belonging to data set B, and an input image from data set C passes through the reverse direction of the bidirectional generator to obtain a conversion target image belonging to data set D. The conversion target image and the real image are sent to a discriminator for evaluation, and the evaluation feedback guides the learning process of the bidirectional generator.
As shown in fig. 3, a schematic diagram of the bidirectional generator used in the second embodiment is provided. Its specific structure is identical to that of the bidirectional generator in the first embodiment.
In this second embodiment, the training step comprises:
the loss functions used for training are the L1 loss function and the counterloss function. The loss of antagonism in the forward and reverse directions are respectively as follows:
L GAN (G f ,D B ,A,B)=logD B (y 1 )+log(1-D B (G f (x 1 )))
L GAN (G b ,D D ,C,D)=logD D (y 2 )+log(1-D D (G f (x 2 )))
wherein x is 1 ,y 1 ,x 2 ,y 2 Multi-channel image data from data sets a, B, C and D, respectively. D (D) B And D D Representing two discriminators, G f And G b Refers to the same bi-directional generator that can operate in forward and backward directions, respectively.
The L1 loss function is:

L_1(G_b, G_f) = ||G_f(x_1) - y_1||_1 + ||G_b(x_2) - y_2||_1
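Under the paired setting of this second embodiment, the L1 loss compares each generator output directly with its label; a minimal PyTorch sketch (framework assumed):

```python
import torch

def paired_l1_loss(G_f, G_b, x1, y1, x2, y2):
    # ||G_f(x1) - y1||_1 + ||G_b(x2) - y2||_1 : each direction is supervised by its
    # own paired label, unlike the cycle consistency loss of the unpaired setting.
    return (torch.mean(torch.abs(G_f(x1) - y1)) +
            torch.mean(torch.abs(G_b(x2) - y2)))

# Perfect generators on their own pairs give zero loss.
x1 = torch.randn(1, 3, 8, 8); x2 = torch.randn(1, 3, 8, 8)
assert paired_l1_loss(lambda t: t, lambda t: t, x1, x1, x2, x2).item() == 0.0
```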
the optimization method used to optimize the loss function is a random gradient descent:
where η is the update rate used to control the magnitude of model updates. L is a weighted sum of the L1 penalty function and the counterpenalty function.Is the information fed back to the parallel framework after the update mechanism calculates.
Based on the deep learning-based bidirectional image conversion systems provided by the two embodiments above, an embodiment of the present invention also provides a deep learning-based bidirectional image conversion method, comprising:
carrying out an image conversion task between a pair of image domains in each of the forward and reverse directions of the bidirectional generator; in either direction, the target converted image is computed from the input multi-channel image data using a deep parallel computing framework;
and the discriminator evaluates the quality of the image obtained by the bidirectional generator against the real image.
Further, the method further comprises the following steps:
feeding the quality evaluation result of the discriminator back to the bidirectional generator to train the bidirectional generator;
in the training process, the loss function adopted by the update mechanism is an adversarial loss function, where the adversarial losses in the forward and reverse directions are:

L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))

L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))

where x and y denote images from two different data sets A and B respectively, D_A and D_B denote the two discriminators, and G_f and G_b are the forward and backward directions of one bidirectional generation module respectively;
the update mechanism adopts a stochastic gradient descent optimization method:

θ ← θ - η ∂L/∂θ

where η is the update rate controlling the magnitude of model updates, and ∂L/∂θ is the information fed back to the deep parallel computing framework after the update mechanism is computed.
Further, the target converted image is obtained as follows: using the deep parallel computing framework, the input multi-channel image data pass through a 15-layer computing module in which the spatial size is reduced to 64 and then restored to 256, while the number of channels is first increased to 128 and then reduced to the original number; the final output image conversion result is the target converted image.
Further, the discriminator evaluates quality as follows: using the deep parallel computing framework, the size of the input image is gradually compressed to 32 through a 5-layer computing module, while the number of channels is increased to 512 and finally compressed to 1; the final output is the discriminator's quality evaluation of the input image.
According to the deep learning-based bidirectional image conversion system and method described above, an image conversion task between a pair of image domains can be carried out in each of the forward and reverse directions of the bidirectional generator, and in either direction the model computes the target converted image from the input multi-channel image data using the deep parallel computing framework; the discriminator evaluates the quality of the image obtained by the bidirectional generator against the real image, and in both the supervised and unsupervised settings the quality evaluation result is used to train the bidirectional generator and the discriminator. Compared with previous deep learning models for image conversion tasks, the invention provides a bidirectional generator structure that greatly reduces the parameters of the deep learning model without reducing image generation quality, and can realize two pairs of image conversion tasks in one model under supervision.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.

Claims (2)

1. A deep learning-based bi-directional image conversion system, comprising:
a bidirectional generator: performing image conversion tasks between a pair of image domains in the forward and reverse directions of the bidirectional generator; in either direction, the target conversion image can be computed from the input multi-channel image data using a depth parallel computing framework;
a discriminator: the discriminator evaluates the quality of the image produced by the bidirectional generator against the real image, and the quality evaluation result is fed back to the bidirectional generator to train it;
the bidirectional generator comprises a forward conversion direction and a reverse conversion direction, each comprising: a convolution layer, a residual network and a deconvolution layer; the convolution layers and residual network of the forward direction share convolution kernels with the deconvolution layers and residual network of the reverse direction; the first and last layers of the two conversion directions do not share convolution kernels;
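A minimal bookkeeping sketch of this sharing scheme follows; the class name, parameter names, and placeholder lists (standing in for real kernel tensors) are invented for illustration, since the patent does not specify a parameter layout:

```python
class BidirectionalGeneratorParams:
    """Parameter bookkeeping for the weight-sharing scheme: the inner
    convolution/residual kernels are the same objects in both conversion
    directions, while each direction owns a private first and last layer."""

    def __init__(self):
        # shared core: kernels used by the forward convolution/residual
        # layers and reused by the reverse deconvolution/residual layers
        self.shared = {"core_conv_kernels": [0.0] * 8,
                       "residual_kernels": [0.0] * 8}
        # unshared first/last layers, one private pair per direction
        self.forward_io = {"first": [0.0] * 4, "last": [0.0] * 4}
        self.backward_io = {"first": [0.0] * 4, "last": [0.0] * 4}

    def parameters(self, direction):
        """Return the parameter set seen by one conversion direction."""
        io = self.forward_io if direction == "forward" else self.backward_io
        return {**self.shared, **io}
```

Because the shared core appears in both directions' parameter sets as the same underlying objects, a gradient update applied through either direction updates the common kernels, which is what reduces the model's parameter count.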
in the process of training the bidirectional generator, the loss function adopted by the update mechanism is an adversarial loss function, wherein the adversarial loss functions in the forward and reverse directions are:
L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))
L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))
where x and y denote images from the two different data sets A and B respectively, D_A and D_B denote the two discriminators, and G_f and G_b denote the forward and backward directions of the bidirectional generator module, respectively;
the cycle-consistency loss function is:
L_cyc(G_b, G_f) = ||G_b(G_f(x)) - x||_1 + ||G_f(G_b(y)) - y||_1
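The adversarial and cycle-consistency losses can be sanity-checked numerically with stand-in generators and discriminators; only the loss formulas below follow the text, and everything else (the toy functions, the scalar discriminator outputs) is illustrative:

```python
import numpy as np

def gan_loss(d_real, d_fake):
    # L_GAN = log D(real) + log(1 - D(fake)), with D outputs in (0, 1)
    return np.log(d_real) + np.log(1.0 - d_fake)

def cycle_loss(g_f, g_b, x, y):
    # L_cyc = ||G_b(G_f(x)) - x||_1 + ||G_f(G_b(y)) - y||_1
    return (np.abs(g_b(g_f(x)) - x).sum()
            + np.abs(g_f(g_b(y)) - y).sum())
```

When G_f and G_b invert each other exactly, the cycle-consistency term vanishes, which is precisely the round-trip behaviour the loss is designed to enforce.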
the update mechanism adopts stochastic gradient descent as its optimization method:
θ ← θ - η·∂L/∂θ
where η is the update rate used to control the magnitude of model updates, and the gradient ∂L/∂θ is the information fed back to the depth parallel computing framework after the update mechanism is computed;
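The update rule described above is ordinary stochastic gradient descent; a minimal numeric illustration follows (the quadratic loss L(θ) = θ², with gradient 2θ, is an invented example, not part of the patent):

```python
import numpy as np

def sgd_step(theta, grad, eta=0.01):
    """One stochastic-gradient-descent update: move each parameter
    against its gradient, scaled by the update rate eta."""
    return theta - eta * grad

# repeated updates on L(theta) = theta**2 drive theta toward the minimum at 0
theta = np.array([1.0])
for _ in range(100):
    theta = sgd_step(theta, 2.0 * theta, eta=0.1)
```

Each step multiplies θ by (1 - 2η), so with η = 0.1 the parameter shrinks geometrically toward the minimizer; too large an η would instead make the updates overshoot, which is why the text describes η as controlling the magnitude of model updates.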
the target conversion image of the image is obtained by the following steps:
utilizing the depth parallel computing framework, the input multi-channel image data is passed through a 15-layer computing module in which the spatial size is reduced to 64 and then restored to 256, while the number of channels is increased to 128 and then reduced back to the original number of channels; the final output image conversion result is the target conversion image;
the quality evaluation method of the discriminator comprises the following steps:
utilizing the depth parallel computing framework, the size of the input image is gradually compressed to 32 through a 5-layer computing module, the number of channels is increased to 512 and finally compressed to 1, and the output result serves as the discriminator's quality evaluation of the input image;
the system requires only one generator module to accomplish the conversion tasks between two pairs of paired images; the parallel computing framework is end-to-end, and the two conversion tasks are performed simultaneously in the single bidirectional generator module, wherein: in the forward process of the bidirectional generator, images from image domain A are converted into images belonging to image domain B; in the backward process of the bidirectional generator, images from image domain C are converted into images belonging to image domain D.
2. A bi-directional image conversion method based on deep learning, comprising:
performing image conversion tasks between a pair of image domains in the forward and reverse directions of the bidirectional generator, where in either direction the target conversion image can be computed from the input multi-channel image data using a depth parallel computing framework;
the discriminator evaluates the quality of the image produced by the bidirectional generator against the real image;
further comprises:
feeding the quality evaluation result of the discriminator back to the bidirectional generator to train the bidirectional generator;
in the training process, the loss function adopted by the update mechanism is an adversarial loss function, wherein the adversarial loss functions in the forward and reverse directions are:
L_GAN(G_f, D_B, A, B) = log D_B(y) + log(1 - D_B(G_f(x)))
L_GAN(G_b, D_A, A, B) = log D_A(x) + log(1 - D_A(G_b(y)))
where x and y denote images from the two different data sets A and B respectively, D_A and D_B denote the two discriminators, and G_f and G_b denote the forward and backward directions of the bidirectional generator module, respectively;
the cycle-consistency loss function L_cyc(G_b, G_f) is:
L_cyc(G_b, G_f) = ||G_b(G_f(x)) - x||_1 + ||G_f(G_b(y)) - y||_1
the update mechanism adopts stochastic gradient descent as its optimization method:
θ ← θ - η·∂L/∂θ
where η is the update rate used to control the magnitude of model updates, and the gradient ∂L/∂θ is the information fed back to the depth parallel computing framework after the update mechanism is computed;
the target conversion image of the image is obtained by the following steps:
utilizing the depth parallel computing framework, the input multi-channel image data is passed through a 15-layer computing module in which the spatial size is reduced to 64 and then restored to 256, while the number of channels is increased to 128 and then reduced back to the original number of channels; the final output image conversion result is the target conversion image;
the quality evaluation method of the discriminator comprises the following steps:
utilizing the depth parallel computing framework, the size of the input image is gradually compressed to 32 through a 5-layer computing module, the number of channels is increased to 512 and finally compressed to 1, and the output result serves as the discriminator's quality evaluation of the input image;
the method requires only one generator module to accomplish the conversion tasks between two pairs of paired images; the parallel computing framework is end-to-end, and the two conversion tasks are performed simultaneously in the single bidirectional generator module, wherein: in the forward process of the bidirectional generator, images from image domain A are converted into images belonging to image domain B; in the backward process of the bidirectional generator, images from image domain C are converted into images belonging to image domain D.
CN202010284081.8A 2020-04-13 2020-04-13 Bidirectional image conversion system and method based on deep learning Active CN111626917B (en)

Publications (2)

Publication Number Publication Date
CN111626917A CN111626917A (en) 2020-09-04
CN111626917B true CN111626917B (en) 2024-02-20

Family

ID=72258834


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240605A (en) * 2021-05-21 2021-08-10 南开大学 Image enhancement method for forward and backward bidirectional learning based on symmetric neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584325A (en) * 2018-10-30 2019-04-05 河北科技大学 A kind of two-way coloration method for the animation image unanimously fighting network based on the U-shaped period
CN110910351A (en) * 2019-10-31 2020-03-24 上海交通大学 Ultrasound image modality migration and classification method and terminal based on generation countermeasure network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10636141B2 (en) * 2017-02-09 2020-04-28 Siemens Healthcare Gmbh Adversarial and dual inverse deep learning networks for medical image analysis
US11232541B2 (en) * 2018-10-08 2022-01-25 Rensselaer Polytechnic Institute CT super-resolution GAN constrained by the identical, residual and cycle learning ensemble (GAN-circle)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant