CN108171320B - Image domain conversion network and conversion method based on generative adversarial network


Info

Publication number: CN108171320B (application CN201711273921.5A)
Authority: CN (China)
Prior art keywords: network, image, input, loss, true
Legal status: Expired - Fee Related
Application number: CN201711273921.5A
Other versions: CN108171320A (in Chinese)
Inventors: 肖锋 (Xiao Feng), 白猛猛 (Bai Mengmeng), 冯飞 (Feng Fei)
Assignee (original and current): Xian Technological University
Application filed by Xian Technological University; priority to CN201711273921.5A
Publication of application CN108171320A; application granted; publication of grant CN108171320B


Classifications

    • G06N 3/045: Combinations of networks (neural network architectures)
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06N 3/048: Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image domain conversion network and a conversion method based on a generative adversarial network, comprising a U-shaped generation network, a true/false discrimination network and a pairing discrimination network. The image domain conversion process mainly comprises the following steps: 1) training the U-shaped generation network and establishing a network model of the U-shaped generation network; 2) normalizing the image to be converted and inputting it into the network model established in step 1) to complete the image domain conversion of the image to be converted. The invention can realize the image domain conversion task for a local region within an image, with high conversion quality in the local region, strong network discrimination ability and strong conversion stability, thereby greatly improving the authenticity of the generated image.

Description

Image domain conversion network and conversion method based on generative adversarial network
Technical Field
The invention relates to the technical field of image domain conversion, in particular to an image domain conversion network and a conversion method based on a generative adversarial network.
Background
Image domain conversion is an important research direction in computer vision with wide application prospects. The emergence of the generative adversarial network (GAN) has produced remarkable results in the field of image generation, which also provides a new approach to image domain conversion: an image is input, and the generation network outputs an image in the target domain, with training driven by the game between the generation network and the discrimination network. The GAN as originally proposed was an unsupervised learning method: through the game between the generation network and the discrimination network it gradually learns the data distribution of the training set, so that the generation network can take a random value as input and generate data according to the learned distribution; its earliest application was image generation. Subsequently, the Conditional GAN added artificial conditions to the GAN input, so that the generated data is no longer random but varies according to the input condition.
The Conditional GAN, an improvement on the original GAN that is still actively studied, can generate specific image data for a specific input rather than generating random image data from random input. It makes image domain conversion with a generative adversarial network possible: within its framework an original-domain image is taken as input, and after training the target-domain image can be output. Image domain conversion GANs implemented under this framework include: (1) pix2pix GAN, a supervised method built on a generation network and an adversarial discrimination network, which addresses the conversion task for the whole image; (2) Cycle GAN, an unsupervised method that uses two generation networks and two adversarial discrimination networks and trains them cyclically with a cycle-consistency loss. Although the unsupervised method does not require one-to-one paired training data, its conversion quality is inferior to the supervised pix2pix network, and it still targets whole-image conversion; among existing image domain conversion methods there is no dedicated GAN for the domain conversion task of a local region within an image.
Disclosure of Invention
The invention aims to provide an image domain conversion network and a conversion method based on a generative adversarial network, which can realize the image domain conversion task for a local region within an image, with high conversion quality in the local region, strong network discrimination ability and strong conversion stability, thereby greatly improving the authenticity of the generated image.
The technical scheme adopted by the invention is as follows:
an image domain conversion network based on a generative adversarial network comprises a U-shaped generation network, a true/false discrimination network and a pairing discrimination network. The U-shaped generation network comprises an encoding network and a decoding network; the input end of the encoding network receives the Input image Input, the output end of the encoding network is connected to the input end of the decoding network, and the output end of the decoding network outputs the network generated image Output. Real target domain images paired one-to-one with the Input images are set as target domain images target. The network generated image Output is fed, as a training negative sample of the true/false discrimination network, into its negative sample input end, and the target domain image target is fed, as a training positive sample, into its positive sample input end; the value output by the true/false discrimination network is fed back, as a true/false loss value, to the true/false loss input end of the decoding network. The network generated image Output together with the corresponding Input image Input is fed, as a training negative sample, into the negative sample input end of the pairing discrimination network, and the target domain image target together with the corresponding Input image Input is fed, as a training positive sample, into its positive sample input end; the value output by the pairing discrimination network is fed back, as a pairing loss value, to the pairing loss input end of the decoding network. The structural similarity value between the network generated image Output and the target domain image target is fed back, as a compensation loss value, to the compensation loss input end of the decoding network.
The encoding network comprises an eight-layer convolutional network; each layer has a 3 × 3 convolution kernel with stride 2 × 2 and comprises a convolution layer, a Batch Normalization layer and a Leaky ReLU activation layer, the α parameter of the Leaky ReLU activation layer being 0.2. The decoding network comprises an eight-layer deconvolution network; each layer has a 3 × 3 deconvolution kernel with stride 2 × 2 and comprises a deconvolution layer, a Batch Normalization layer and an activation layer; the activation layers of the first to seventh deconvolution layers are ReLU layers, and the activation layer of the eighth deconvolution layer is a tanh layer.
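For illustration only, the eight-layer encoding and decoding networks could be assembled as in the following minimal PyTorch sketch; it is not part of the patent, and the intermediate channel widths are assumptions (only the 3 × 3 kernels, 2 × 2 strides, activation choices and the final 1 × 1 × 1024 feature size come from the text):

```python
import torch.nn as nn

def enc_layer(in_ch, out_ch):
    # Conv 3x3, stride 2, then Batch Normalization and Leaky ReLU (alpha = 0.2).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2),
    )

def dec_layer(in_ch, out_ch, last=False):
    # Deconv 3x3, stride 2, Batch Normalization; ReLU for layers 1-7, tanh for layer 8.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, stride=2,
                           padding=1, output_padding=1),
        nn.BatchNorm2d(out_ch),
        nn.Tanh() if last else nn.ReLU(),
    )

# Assumed channel progression: a 256x256x3 input is halved eight times to 1x1x1024.
widths = [3, 64, 128, 256, 512, 512, 512, 512, 1024]
encoder = nn.ModuleList([enc_layer(a, b) for a, b in zip(widths, widths[1:])])
```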
The true/false discrimination network comprises multiple sequentially cascaded true/false discrimination convolutional layers; each layer comprises a convolution layer, a Batch Normalization layer and an activation layer; the activation layer of the last true/false discrimination convolutional layer uses a Sigmoid activation function, and the activation layers of the remaining true/false discrimination convolutional layers use the ReLU function.
The pairing discrimination network comprises a Concat layer and multiple sequentially cascaded pairing discrimination convolutional layers; each layer comprises a convolution layer, a Batch Normalization layer and an activation layer; the activation layer of the last pairing discrimination convolutional layer uses a Sigmoid activation function, and the activation layers of the remaining pairing discrimination convolutional layers use the ReLU function.
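As an illustration of the two discriminators, the following sketch (PyTorch; layer count and channel widths are assumptions, while the Concat input, the Batch Normalization layers, the ReLU/Sigmoid activation choices and the 30 × 30 × 1 patch output described in this document come from the text) shows how the pairing discriminator concatenates its two 256 × 256 × 3 inputs into a 256 × 256 × 6 tensor:

```python
import torch
import torch.nn as nn

def d_layer(in_ch, out_ch, stride=2):
    # Conv 3x3 + Batch Normalization + ReLU; stride 2 except where noted.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )

class PairingDiscriminator(nn.Module):
    """D2-net: judges whether two images form a genuine (Input, target) pair."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            d_layer(6, 64),            # 256 -> 128
            d_layer(64, 128),          # 128 -> 64
            d_layer(128, 256),         #  64 -> 32  (a 32x32x256 feature map)
            nn.Conv2d(256, 1, 3, stride=1, padding=0),  # 32 -> 30, stride 1
            nn.Sigmoid(),              # per-patch score in (0, 1)
        )

    def forward(self, a, b):
        return self.body(torch.cat([a, b], dim=1))  # the Concat layer

# The true/false discriminator D1-net is identical in spirit, except that its
# first layer takes a single 3-channel image instead of the concatenated pair.
```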
An image domain conversion method based on a generative adversarial network comprises the following steps:
1) training the U-shaped generation network, and establishing a network model of the U-shaped generation network; the method specifically comprises the following steps:
A. collecting a training image set of the domain to be converted, the set comprising one-to-one paired original domain images and target domain images; normalizing the original domain images in the training image set, the normalized images being the Input images during network training, and the target domain images in the training image set being the target domain images corresponding to the Input images;
B. converting the Input image Input obtained in step A, through the U-shaped generation network, into the network generated image Output of the training network;
C. training the multi-adversarial discrimination networks using the Input image Input, the target domain image target and the network generated image Output obtained in steps A and B: this comprises the training of the true/false discrimination network and the training of the pairing discrimination network, wherein the training of the true/false discrimination network comprises the following steps:
c11: initializing the network weights of the true/false discrimination network by a random initialization method;
c12: taking the network generated image Output as a negative sample and the target domain image target corresponding to the Input image Input as a positive sample, training the true/false discrimination network, and updating its network weights using the cross-entropy loss function and the Adam optimization algorithm;
the training of the pairing discrimination network comprises the following steps:
c21: initializing the network weights of the pairing discrimination network by a random initialization method;
c22: taking the network generated image Output with the corresponding Input image Input as a negative sample and the Input image Input with the corresponding target domain image target as a positive sample, training the pairing discrimination network, and updating its network weights using the cross-entropy loss function and the Adam optimization algorithm;
D. repeating step C; after the multi-adversarial discrimination networks have been trained twice, fixing the network weights of the true/false discrimination network and the pairing discrimination network;
E. training the U-shaped generation network by using the multi-adversarial discrimination networks obtained after the training in step D, specifically comprising the following steps:
e1: initializing the network weights of the U-shaped generation network by the Xavier random initialization method;
e2: inputting the network generated image Output into the true/false discrimination network, which outputs a true/false loss value; feeding the output true/false loss value back to the decoding network in the U-shaped generation network for updating the network weights. The true/false discrimination network outputs a 30 × 30 × 1 image used to return the loss value measuring how close the network generated image Output is to the real image; the value of each pixel of the output image ranges from 0 to 1, where a value closer to 1 indicates that, within the receptive field of that pixel, the network generated image Output is closer to the real image, and a value closer to 0 indicates that it is farther from the real image;
e3: inputting the Input image Input and the corresponding network generated image Output into the pairing discrimination network, which outputs a pairing loss value; feeding the output pairing loss value back to the decoding network in the U-shaped generation network for updating the network weights. The pairing discrimination network outputs a 30 × 30 × 1 image used to return the loss value measuring whether the pair (Input image Input, network generated image Output) matches like the pair (Input image Input, target domain image target); the value of each pixel of the output image ranges from 0 to 1, where a value closer to 1 indicates that the Input image Input and the network generated image Output are better matched, and a value closer to 0 indicates that they are not matched;
e4: calculating the structural similarity value between the network generated image Output and the target domain image target, and feeding the calculated structural similarity value back as a loss to the decoding network in the U-shaped generation network for updating the network weights. The structural similarity value comprises the result of an SSIM loss function and the result of L1 regularization; the SSIM loss function is derived from the SSIM algorithm, whose output value SSIM(x, y) represents the structural similarity between an input image x and a target domain image y; SSIM(x, y) ranges from -1 to 1, a value closer to 1 indicating higher similarity between the two images, and SSIM(x, y) equals 1 when the input image x and the target domain image y are identical;
the calculation formula of the output value of the SSIM algorithm is as follows:
$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}\qquad(1)$$

in formula (1), x is the Input image Input and y is the target domain image target corresponding to the Input image Input; $\mu_x$ is the mean of x, $\mu_y$ is the mean of y, $\sigma_x^2$ is the variance of x, $\sigma_y^2$ is the variance of y, and $\sigma_{xy}$ is the covariance of x and y; $c_1=(k_1 L)^2$ and $c_2=(k_2 L)^2$ are constants for maintaining stability, L is the dynamic range of the pixel values, $k_1=0.01$ and $k_2=0.03$;
F. Steps C to E constitute one round of weight training for the U-shaped generation network; repeating steps C to E, and after two rounds of weight training the training of the U-shaped generation network is complete, the resulting generation network being the network model of the U-shaped generation network;
2) normalizing the image to be converted and inputting it into the network model established in step 1) to complete the image domain conversion of the image to be converted: the normalized image is input as the Input image into the network model established in step 1), the encoding network extracts the high-dimensional features of the Input image Input, and the decoding network outputs the network generated image Output, which is the target domain image after image domain conversion.
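A minimal sketch of inference step 2) might look as follows (PyTorch; the scaling to [-1, 1] to match the tanh output layer and the function names are assumptions, not from the patent):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def convert_image(generator, image):
    # image: float tensor of shape (3, H, W) with values in [0, 255].
    x = image.unsqueeze(0) / 127.5 - 1.0          # normalize to [-1, 1]
    x = F.interpolate(x, size=(256, 256), mode="bilinear", align_corners=False)
    y = generator(x)                              # encode features, decode Output
    return ((y.squeeze(0) + 1.0) * 127.5).clamp(0, 255)  # back to [0, 255]
```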
The overall loss function of the U-shaped generation network in the step E is as follows:
$$L_{GAN}(G,D_1,D_2)=L_{D_1}+\lambda_1 L_{D_2}+\lambda_2 L_{ssim}+\lambda_3 L_1\qquad(2)$$
the overall loss function to be optimized in the overall generative countermeasure network is:
$$G^{*}=\arg\min_{G}\max_{D_1}\max_{D_2}\left(L_{GAN}(G,D_1,D_2)+L_{D_1}+L_{D_2}\right)\qquad(3)$$
in formulas (2) and (3),

$$L_{D_1}=E_{y}\left[\log D_1(y)\right]+E_{x}\left[\log\left(1-D_1(G(x))\right)\right]$$

represents the true/false loss output by the true/false discrimination network,

$$L_{D_2}=E_{x,y}\left[\log D_2(x,y)\right]+E_{x}\left[\log\left(1-D_2(x,G(x))\right)\right]$$

represents the pairing loss output by the pairing discrimination network,

$$L_{ssim}=1-\mathrm{SSIM}(G(x),y)$$

represents the SSIM loss calculated by the SSIM loss function, and

$$L_1=E_{x,y}\left[\left\lVert y-G(x)\right\rVert_1\right]$$

represents the L1 regular-term loss, where x represents the Input image Input and y represents the target domain image target corresponding to the Input image Input; $\lambda_1$ is the weight parameter of the pairing loss in the overall loss of the generation network, $\lambda_2$ is the weight parameter of the SSIM loss in the overall loss of the U-shaped generation network, and $\lambda_3$ is the weight parameter of the L1 regular term in the overall loss of the U-shaped generation network;
in the initial training stage of the U-shaped generation network, the ratio of the true/false loss, the pairing loss, the SSIM loss and the L1 regular-term loss is 1:1:4:1; as the number of training iterations increases, the ratio gradually changes to 1:1:0.5:1, i.e. the weight parameter of the SSIM loss in the overall loss of the U-shaped generation network gradually decreases according to the set total number of training iterations.
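As a sketch of how this schedule could be realized (the linear decay from 4 to 0.5 is an assumption; the patent specifies only the start and end ratios):

```python
def generator_loss(l_d1, l_d2, l_ssim, l_l1, step, total_steps):
    # Overall loss of formula (2): L_D1 + lam1*L_D2 + lam2*L_ssim + lam3*L1.
    lam1, lam3 = 1.0, 1.0
    frac = min(step / total_steps, 1.0)
    lam2 = 4.0 + (0.5 - 4.0) * frac   # SSIM weight decays from 4 to 0.5
    return l_d1 + lam1 * l_d2 + lam2 * l_ssim + lam3 * l_l1
```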
The cross-entropy loss function in step C is a cross-entropy loss function with a smoothing term, whose formula is:

$$loss=-\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log(t_i+\varepsilon)+(1-y_i)\log(1-t_i+\varepsilon)\right]\qquad(4)$$

in formula (4), N is the batch size, $t_i$ is the predicted value for sample i, $y_i$ is the true sample value, and $\varepsilon$ is the added smoothing term, chosen as 0.005.
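A minimal sketch of this loss (assuming, as formula (4) is read here, that the smoothing term ε is added inside each logarithm to keep it finite):

```python
import torch

def smoothed_cross_entropy(t, y, eps=0.005):
    # t: predicted probabilities from the final Sigmoid layer; y: 0/1 labels.
    return -(y * torch.log(t + eps)
             + (1.0 - y) * torch.log(1.0 - t + eps)).mean()
```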
The generation process of the network generated image Output comprises the following steps:
a) normalizing the image to be converted into a 256 × 256 × 3 pixel image and inputting the normalized image as the Input image Input into the encoding network; the Input image Input passes sequentially through the 8 convolutional layers of the encoding network, and the final output is a 1 × 1 × 1024 feature image; the convolution kernel of each convolutional layer in the encoding network is 3 × 3 with stride 2 × 2;
b) inputting the 1 × 1 × 1024 feature image generated in step a) into the decoding network and passing it sequentially through the 8 deconvolution layers of the decoding network, while also feeding the feature image produced by each convolutional layer in step a) into the deconvolution layer of matching data tensor size for computation, finally generating the complete network generated image Output; thus the input of a deconvolution layer contains not only the feature image from the previous deconvolution operation but also the convolution feature image of matching tensor size; the deconvolution kernel of each deconvolution layer is 3 × 3 with stride 2 × 2.
In step b), for the feature images fed into the first three deconvolution layers, a Dropout operation is added when the feature image produced by the corresponding convolutional layer in step a) is fed into the deconvolution layer of matching data tensor size; the Dropout parameter is 0.2, i.e. 20% of the connections between the two layers are randomly dropped.
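The skip connections and Dropout of steps a) and b) could be wired as in the following sketch (PyTorch; the module layout and the exact placement of Dropout on the three deepest skip connections are assumptions consistent with the text, and the channel widths of the decoder layers must account for the concatenated skips):

```python
import torch
import torch.nn as nn

class UGenerator(nn.Module):
    def __init__(self, enc_layers, dec_layers):
        super().__init__()
        self.enc = nn.ModuleList(enc_layers)  # the 8 convolutional layers
        self.dec = nn.ModuleList(dec_layers)  # the 8 deconvolution layers
        self.drop = nn.Dropout(0.2)           # randomly closes 20% of connections

    def forward(self, x):
        feats = []
        for e in self.enc:            # encode, keeping every feature map
            x = e(x)
            feats.append(x)
        x = self.dec[0](feats[-1])    # deepest feature (1x1x1024) starts decoding
        for i, d in enumerate(self.dec[1:], start=1):
            skip = feats[-1 - i]      # encoder feature map of matching size
            if i <= 3:                # Dropout on the three deepest skip inputs
                skip = self.drop(skip)
            x = d(torch.cat([x, skip], dim=1))  # deconv input: previous + skip
        return x
```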
The SSIM algorithm in step E4 is computed in the form of a sliding window implemented as a convolution kernel, with a window size of 7 × 7.
The invention has the following advantages:
(1) a network model of the U-shaped generation network is established through an adversarial generation network comprising the U-shaped generation network, the pairing discrimination network and the true/false discrimination network, and image domain conversion of local regions is realized through the established model; this fills the current gap of adversarial generation networks dedicated to local image conversion, extends the range of application of adversarial generation networks in the image domain conversion field, and improves the effect and reliability of image domain conversion;
(2) in training the network model of the generation network, a multi-adversarial scheme combining the pairing discrimination network and the true/false discrimination network is adopted; to address the weak discrimination ability of the discrimination networks at the initial stage of training, an SSIM loss function is added: the similarity of the images is calculated by the SSIM algorithm and the result is used as a loss to update the weights of the generation network, compensating for the low initial adversarial ability of the network, so that the generation network converges better and a better image domain conversion effect is obtained;
(3) the cross-entropy loss function used in training the multi-adversarial discrimination networks makes their training more stable: the conventional cross-entropy loss function contains a log operation, so the loss fluctuates strongly at the initial stage, and a loss of 0 may occur during training and cause the training to fail; adding a smoothing term reduces the fluctuation during training, prevents training failure, and improves the stability of deep adversarial network training;
(4) in the generation network, the input of each deconvolution layer contains not only the feature image from the previous deconvolution operation but also the convolution feature image of matching tensor size, so the information of the image is retained to the greatest extent and the feature information of the original image is preserved more completely, improving the effect and authenticity of the image domain conversion; moreover, a Dropout operation is added where the first three convolutional layers of the encoding network feed their feature images to the corresponding deconvolution layers, which effectively prevents the decoded images from becoming overly uniform and further improves the image domain conversion quality;
(5) by adopting the Leaky ReLU activation layer with its parameter set to 0.2, the generation network better retains the information of the original image domain and preserves as much residual information as possible during back-propagation, improving the integrity of the converted image and ensuring the conversion effect.
drawings
FIG. 1 is a diagram of a network architecture of the present invention;
FIG. 2 is a diagram of the U-shaped generation network of FIG. 1;
FIG. 3 is a network architecture diagram of the true and false authentication network of FIG. 1;
FIG. 4 is a network architecture diagram of the paired authentication network of FIG. 1;
FIG. 5 is a U-shaped generated network training diagram of FIG. 1;
FIG. 6 is a network training diagram of the true and false authentication network of FIG. 1;
fig. 7 is a network training diagram of the pair authentication network of fig. 1.
Detailed Description
For a better understanding of the present invention, the technical solutions of the present invention are further described below with reference to the accompanying drawings.
As shown in FIG. 1, the invention comprises a U-shaped generation network U-net, a true/false discrimination network D1-net and a pairing discrimination network D2-net, and further comprises a structural similarity calculation part, which consists of an SSIM loss function calculation part and an L1 regularization part; the U-shaped generation network U-net performs the image domain conversion, the true/false discrimination network D1-net is a discriminator for judging whether the network generated image Output is real, and the pairing discrimination network D2-net is a discriminator for judging whether the network generated image Output is paired with the original image;
as shown in fig. 2, the U-shaped generation network U-net includes a coding network F-net and a decoding network G-net, the coding network F-net performs convolution operation on the image to output a high-dimensional feature map thereof, and the decoding network G-net performs generation of the image by performing deconvolution on the feature map using a deconvolution network.
The Input end of the coding network F-net is connected with the Input image Input, the Output end of the coding network F-net is connected with the Input end of the decoding network G-net, and the Output end of the decoding network G-net generates a network generated image Output.
The encoding network F-net comprises an eight-layer convolutional network; the convolution kernel of each layer is 3 × 3 with stride 2 × 2, and each layer comprises a convolution layer, a Batch Normalization layer and a Leaky ReLU activation layer. Because the generation network needs to retain as much original image domain information as possible and to preserve as much residual information as possible during back-propagation, the invention selects the Leaky ReLU activation function for every layer of the encoding network F-net in the generation network, with the α parameter of the Leaky ReLU activation layer set to 0.2.
The decoding network G-net comprises an eight-layer deconvolution network; the deconvolution kernel of each layer is 3 × 3 with stride 2 × 2, and each layer comprises a deconvolution layer, a Batch Normalization layer and an activation layer; the activation layers of the first to seventh deconvolution layers are ReLU layers, and the activation layer of the eighth deconvolution layer is a tanh layer.
As shown in fig. 3, the true/false discrimination network D1-net discriminates whether the generated image is a real image; its input is therefore a single image and its output a true/false judgment. The true/false discrimination network D1-net comprises multiple sequentially cascaded true/false discrimination convolutional layers, each comprising a convolution layer, a Batch Normalization layer and an activation layer.
The ReLU activation function transfers residuals effectively while retaining nonlinear fitting ability; the output of the tanh activation function lies between -1 and 1, while the output of the Sigmoid activation function lies between 0 and 1, which is convenient for label computation. Therefore every layer of the true/false discrimination network D1-net except the last uses the ReLU function, and the last output layer uses the Sigmoid activation function; that is, the activation layer of the last true/false discrimination convolutional layer uses a Sigmoid activation function and the activation layers of the remaining layers use the ReLU function.
The convolution kernels of every true/false discrimination convolutional layer follow the small-kernel principle and are all 3 × 3; except for the convolution between the 32 × 32 × 256 feature map and the 30 × 30 × 1 output, which uses stride 1, every layer uses stride 2. To prevent the gradient diffusion phenomenon, a Batch Normalization layer is added to every layer; meanwhile, since the stride-2 convolutions already provide a pooling effect, no pooling layer is added to the network.
The pairing discrimination network D2-net discriminates whether the generated image is paired with the input image; its input is therefore two images. The pairing discrimination network D2-net comprises a Concat layer and multiple sequentially cascaded pairing discrimination convolutional layers, each comprising a convolution layer, a Batch Normalization layer and an activation layer; the activation layer of the last pairing discrimination convolutional layer uses a Sigmoid activation function, and the activation layers of the remaining layers use the ReLU function.
As shown in fig. 4, the pairing discrimination network D2-net is similar in structure to the true/false discrimination network D1-net, except that one more image is supplied at the input of the first layer, i.e. the input of the first layer is a 256 × 256 × 6 tensor; like D1-net, it uses the ReLU and Sigmoid activation functions, Batch Normalization layers, and the loss function with the smoothing term.
Real target domain images paired one-to-one with the Input images are set as target domain images target. The network generated image Output is fed, as a training negative sample of the true/false discrimination network D1-net, into its negative sample input end, and the target domain image target is fed, as a training positive sample, into its positive sample input end; the value output by the true/false discrimination network D1-net is fed back, as a true/false loss value, to the true/false loss input end of the decoding network G-net. The network generated image Output together with the corresponding Input image Input is fed, as a training negative sample, into the negative sample input end of the pairing discrimination network D2-net, and the target domain image target together with the corresponding Input image Input is fed, as a training positive sample, into its positive sample input end; the value output by the pairing discrimination network D2-net is fed back, as a pairing loss value, to the pairing loss input end of the decoding network G-net. The structural similarity value between the network generated image Output and the target domain image target is fed back, as a compensation loss value, to the compensation loss input end of the decoding network G-net, the calculation of the structural similarity value comprising the SSIM loss function and L1 regularization.
The invention also provides an image domain conversion method based on the generative adversarial network, comprising the following steps:
1) training the U-shaped generation network U-net, and establishing a network model of the U-shaped generation network U-net; the method specifically comprises the following steps:
A. collecting a training image set of the domain to be converted, the set comprising one-to-one paired original domain images and target domain images; normalizing the original domain images in the training image set into 256 × 256 × 3 pixel images, the normalized images being the Input images during network training, and the target domain images in the training image set being the target domain images corresponding to the Input images;
B. converting the Input image Input obtained in step A, through the U-shaped generation network U-net, into the network generated image Output of the training network;
C. training the multi-adversarial discrimination networks using the Input image Input, the target domain image target and the network generated image Output obtained in steps A and B: this comprises the training of the true/false discrimination network D1-net and the training of the pairing discrimination network D2-net;
as shown in fig. 6, the training of the true/false discrimination network D1-net comprises the following steps:
c11: initializing the network weights of the true/false discrimination network D1-net by a random initialization method;
c12: taking the network generated image Output as a negative sample and the target domain image target corresponding to the Input image Input as a positive sample, performing binary classification training in the true/false discrimination network D1-net, and updating its network weights using the cross-entropy loss function with a smoothing term and the Adam optimization algorithm;
the cross-entropy loss is large at the initial stage of training, and a value of 0 may occur during training and cause the training to fail; adding a smoothing term reduces the fluctuation during training and prevents training failure, and the improved cross-entropy function with the smoothing term increases the stability of deep adversarial network training;
in the binary classification training, one-hot labels are first generated for the positive and negative samples, the cross-entropy loss is then calculated with a sigmoid cross-entropy augmented by the smoothing term from the 0-to-1 values output by the final Sigmoid activation layer, and finally the weights of the true/false discrimination network D1-net are updated according to the fed-back loss;
wherein the formula of the cross-entropy loss function with the smoothing term is

$$loss=-\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log(t_i+\varepsilon)+(1-y_i)\log(1-t_i+\varepsilon)\right]\qquad(1)$$

in formula (1), N is the batch size, $t_i$ is the predicted value for sample i, $y_i$ is the true sample value, and $\varepsilon$ is the added smoothing term, chosen as 0.005;
as shown in fig. 7, the training of the pairing discrimination network D2-net comprises the following steps:
c21: initializing the network weights of the pairing discrimination network D2-net by a random initialization method;
c22: taking the network generated image Output with the corresponding Input image Input as a negative sample and the Input image Input with the corresponding target domain image target as a positive sample, performing binary classification training in the pairing discrimination network D2-net, and updating its network weights using the cross-entropy loss function with a smoothing term and the Adam optimization algorithm;
D. repeating step C; after the multi-adversarial discrimination networks have been trained twice, fixing the network weights of the true/false discrimination network D1-net and the pairing discrimination network D2-net: to make the training of the U-shaped generation network U-net more stable, the strategy of training the true/false discrimination network D1-net and the pairing discrimination network D2-net several times before training the U-shaped generation network U-net is adopted;
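The alternating schedule of steps C and D might be coded as follows (PyTorch; the optimizer hyperparameters and label tensors are assumptions, smoothed_cross_entropy and generator_loss refer to the earlier sketches, and ssim_loss to the windowed-SSIM sketch given after the sliding-window discussion below):

```python
import itertools
import torch

opt_d = torch.optim.Adam(itertools.chain(d1.parameters(), d2.parameters()))
opt_g = torch.optim.Adam(g.parameters())

for step, (x, y) in enumerate(loader):            # x: Input image, y: target
    fake = g(x)
    real_lbl = torch.ones(x.size(0), 1, 30, 30)   # 30x30x1 patch labels
    fake_lbl = torch.zeros_like(real_lbl)
    for _ in range(2):                            # step D: train discriminators twice
        loss_d = (smoothed_cross_entropy(d1(y), real_lbl)
                  + smoothed_cross_entropy(d1(fake.detach()), fake_lbl)
                  + smoothed_cross_entropy(d2(x, y), real_lbl)
                  + smoothed_cross_entropy(d2(x, fake.detach()), fake_lbl))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # step E: update the generator with the discriminator weights fixed
    loss_g = generator_loss(smoothed_cross_entropy(d1(fake), real_lbl),
                            smoothed_cross_entropy(d2(x, fake), real_lbl),
                            ssim_loss(fake, y),
                            (fake - y).abs().mean(),
                            step, total_steps)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```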
E. training the U-shaped generation network U-net using the multi-adversarial discrimination networks obtained after the training in step D; as shown in fig. 5, the training of the U-net comprises the following steps:
e1: initializing the network weights of the U-shaped generation network U-net by the Xavier random initialization method;
e2: inputting the network generated image Output into the true/false discrimination network D1-net, which outputs a true/false loss value; feeding the output true/false loss value back to the decoding network G-net in the U-shaped generation network U-net for updating the network weights. The true/false discrimination network D1-net outputs a 30 × 30 × 1 image used to return the loss value measuring how close the network generated image Output is to the real image; the value of each pixel of the output image ranges from 0 to 1, where a value closer to 1 indicates that, within the receptive field of that pixel, the network generated image Output is closer to the real image, and a value closer to 0 indicates that it is farther from the real image;
e3: inputting the Input image Input and the corresponding network generated image Output into the pairing discrimination network D2-net, which outputs a pairing loss value; feeding the output pairing loss value back to the decoding network G-net in the U-shaped generation network U-net for updating the network weights. The pairing discrimination network D2-net outputs a 30 × 30 × 1 image used to return the loss value measuring whether the pair (Input image Input, network generated image Output) matches like the pair (Input image Input, target domain image target); the value of each pixel of the output image ranges from 0 to 1, where a value closer to 1 indicates that the Input image Input and the network generated image Output are better matched, and a value closer to 0 indicates that they are not matched;
e4: calculating the structural similarity value between the network generated image Output and the target domain image target, and feeding the calculated structural similarity value back as a loss to the decoding network G-net in the U-shaped generation network U-net for updating the network weights. The calculation of the structural similarity value comprises the SSIM loss function and L1 regularization; the SSIM loss function is derived from the SSIM algorithm, an index for measuring the similarity of two images, whose output value SSIM(x, y) represents the structural similarity between an input image x and a target domain image y; SSIM(x, y) ranges from -1 to 1, a value closer to 1 indicating higher similarity, and SSIM(x, y) equals 1 when the input image x and the target domain image y are identical. Using SSIM as a loss function lets the generation network converge better, thereby obtaining a better image domain conversion effect;
the calculation formula of the output value of the SSIM algorithm is as follows:
$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}\qquad(2)$$

in formula (2), x is the Input image Input and y is the target domain image target corresponding to the Input image Input; $\mu_x$ is the mean of x, $\mu_y$ is the mean of y, $\sigma_x^2$ is the variance of x, $\sigma_y^2$ is the variance of y, and $\sigma_{xy}$ is the covariance of x and y; $c_1=(k_1 L)^2$ and $c_2=(k_2 L)^2$ are constants for maintaining stability, L is the dynamic range of the pixel values, $k_1=0.01$ and $k_2=0.03$;
F. Steps C to E constitute one round of weight training for the U-shaped generation network U-net; repeating steps C to E, and after two rounds of weight training the training of the U-shaped generation network U-net is complete, the resulting generation network being the network model of the U-shaped generation network U-net;
However, computing the SSIM value of two images requires converting the computation into sliding-window form, and different window sizes and different parameters give different results. The SSIM algorithm originally proposed by Wang et al. uses an 11 × 11 sliding window; however, since the image generated by the generation network at the initial stage differs greatly from the target image, the SSIM value computed with a large window is very close to 0 at the initial stage, so the loss cannot be effectively propagated back to the generation network and the training of the adversarial generation network GAN fails. In view of this problem, and considering that the network input is a 256 × 256 pixel photograph, the SSIM computation in the invention finally takes the form of a sliding window implemented as a convolution kernel, with a window size of 7 × 7.
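A sketch of the windowed SSIM computed with a 7 × 7 convolution kernel (PyTorch; a uniform window is assumed here, whereas Wang et al.'s original formulation uses a Gaussian window, and L = 2 assumes images scaled to [-1, 1]):

```python
import torch
import torch.nn.functional as F

def ssim_map(x, y, win=7, L=2.0, k1=0.01, k2=0.03):
    # Per-window SSIM of formula (2), computed by sliding a win x win kernel.
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    ch = x.size(1)
    w = torch.ones(ch, 1, win, win, device=x.device) / win ** 2
    mu_x = F.conv2d(x, w, groups=ch)                    # windowed means
    mu_y = F.conv2d(y, w, groups=ch)
    var_x = F.conv2d(x * x, w, groups=ch) - mu_x ** 2   # windowed variances
    var_y = F.conv2d(y * y, w, groups=ch) - mu_y ** 2
    cov = F.conv2d(x * y, w, groups=ch) - mu_x * mu_y   # windowed covariance
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)
            / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

def ssim_loss(x, y):
    # Loss fed back to the generation network: 1 - mean SSIM over all windows.
    return 1.0 - ssim_map(x, y).mean()
```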
In the step E of training the U-shaped generation network U-net, the overall loss function of the U-shaped generation network U-net is:
$$L_{GAN}(G,D_1,D_2)=L_{D_1}+\lambda_1 L_{D_2}+\lambda_2 L_{ssim}+\lambda_3 L_1\qquad(3)$$
the overall loss function to be optimized in the overall generative countermeasure network is:
$$G^{*}=\arg\min_{G}\max_{D_1}\max_{D_2}\left(L_{GAN}(G,D_1,D_2)+L_{D_1}+L_{D_2}\right)\qquad(4)$$
in formulas (3) and (4),

$$L_{D_1}=E_{y}\left[\log D_1(y)\right]+E_{x}\left[\log\left(1-D_1(G(x))\right)\right]$$

represents the true/false loss output by the true/false discrimination network D1-net,

$$L_{D_2}=E_{x,y}\left[\log D_2(x,y)\right]+E_{x}\left[\log\left(1-D_2(x,G(x))\right)\right]$$

represents the pairing loss output by the pairing discrimination network D2-net,

$$L_{ssim}=1-\mathrm{SSIM}(G(x),y)$$

represents the SSIM loss calculated by the SSIM loss function, and

$$L_1=E_{x,y}\left[\left\lVert y-G(x)\right\rVert_1\right]$$

represents the L1 regular-term loss, where x represents the Input image Input and y represents the target domain image target corresponding to the Input image Input; $\lambda_1$ is the weight parameter of the pairing loss in the overall loss fed back to the decoding network G-net, $\lambda_2$ is the weight parameter of the SSIM loss, and $\lambda_3$ is the weight parameter of the L1 regular term.
In the initial training stage of the U-shaped generation network U-net, the ratio of the true/false loss, the pairing loss, the SSIM loss and the L1 regular-term loss is 1:1:4:1; as the number of training iterations increases, the ratio gradually changes to 1:1:0.5:1, i.e. the weight parameter of the SSIM loss in the overall loss of the U-shaped generation network U-net gradually decreases according to the set total number of training iterations.
At the initial stage of network training, when the discrimination ability of the true/false discrimination network D1-net and the pairing discrimination network D2-net is still low, the SSIM loss function can feed residuals back to the U-shaped generation network U-net so that target domain images are still generated effectively; as the discrimination ability of D1-net and D2-net keeps improving during training, the weight of the SSIM loss function in the fed-back generator residual decreases, so that the larger part of the generator residual comes from the losses fed back by D1-net and D2-net. The effect is better than that of existing image domain conversion methods, and the generated image is more realistic.
2) Normalizing the image to be converted to 256 × 256 pixels and inputting the normalized image into the network model established in step 1) completes the image domain conversion of the image to be converted: the normalized image is input as the Input image into the network model established in step 1), the encoding network F-net extracts the high-dimensional features of the Input image Input, and the decoding network G-net outputs the network generated image Output, which is the target domain image after image domain conversion.
In the image domain conversion process of the U-shaped generation network, the image must first be input into the encoding network F-net for convolution operations, followed by deconvolution operations to realize the conversion of the image domain. However, in a conventional U-shaped generation network part of the information of the original image is hard to retain through the convolution process; therefore, in order to preserve the feature information of the original image better and more completely, the generation process of the network generated image Output of the invention comprises the following steps:
a) normalizing the image to be converted into a 256 × 256 × 3 pixel image and inputting the normalized image as the Input image Input into the encoding network F-net; the Input image Input passes sequentially through the 8 convolutional layers of the encoding network F-net, and the final output is a 1 × 1 × 1024 feature image; the convolution kernel of each convolutional layer in the encoding network F-net is 3 × 3 with stride 2 × 2;
b) inputting the 1 × 1 × 1024 feature image generated in step a) into the decoding network G-net and passing it sequentially through the 8 deconvolution layers of the decoding network G-net, while also feeding the feature image produced by each convolutional layer in step a) into the deconvolution layer of matching data tensor size for computation, finally generating the complete network generated image Output; thus the input of a deconvolution layer contains not only the feature image from the previous deconvolution operation but also the convolution feature image of matching tensor size; the deconvolution kernel of each deconvolution layer is 3 × 3 with stride 2 × 2.
For the feature images fed into the first three deconvolution layers, a Dropout operation is added when the feature image produced by the corresponding convolutional layer in step a) is fed into the deconvolution layer of matching data tensor size; the Dropout parameter is 0.2, i.e. 20% of the connections between the two layers are randomly dropped.
Because the input of a deconvolution layer contains not only the feature image from the previous deconvolution operation but also the convolution feature image of matching tensor size, the information of the image is retained to the greatest extent and the feature information of the original image is preserved better and more completely, improving the effect and authenticity of the image domain conversion; and to prevent the images produced by the decoding network G-net from becoming overly uniform, a Dropout operation is added where the first three convolutional layers of the encoding network F-net feed their feature images to the corresponding deconvolution layers, which effectively prevents the decoded images from becoming overly uniform and further improves the image domain conversion quality.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that various changes, modifications and substitutions can be made therein without departing from the spirit and scope of the embodiments of the present invention.

Claims (10)

1. An image domain conversion network system based on a generative adversarial network, characterized in that: it comprises a U-shaped generation network, a true/false discrimination network and a pairing discrimination network, wherein the U-shaped generation network comprises an encoding network and a decoding network, the input end of the encoding network receives the Input image Input, the output end of the encoding network is connected to the input end of the decoding network, and the output end of the decoding network outputs the network generated image Output; real target domain images paired one-to-one with the Input images are set as target domain images target; the network generated image Output is input, as a training negative sample of the true/false discrimination network, into the negative sample input end of the true/false discrimination network, the target domain image target is input, as a training positive sample of the true/false discrimination network, into the positive sample input end of the true/false discrimination network, and the value output by the true/false discrimination network is fed back, as a true/false loss value, to the true/false loss input end of the decoding network; the network generated image Output and the corresponding Input image Input are input, as training negative samples, into the negative sample input end of the pairing discrimination network, the target domain image target and the corresponding Input image Input are input, as training positive samples, into the positive sample input end of the pairing discrimination network, and the value output by the pairing discrimination network is fed back, as a pairing loss value, to the pairing loss input end of the decoding network; and the structural similarity value between the network generated image Output and the target domain image target is fed back, as a compensation loss value, to the compensation loss input end of the decoding network.
2. The image domain conversion network system based on the generative adversarial network as claimed in claim 1, characterized in that: the encoding network comprises an eight-layer convolutional network, each layer having a 3 × 3 convolution kernel with stride 2 × 2 and comprising a convolution layer, a Batch Normalization layer and a Leaky ReLU activation layer, the α parameter of the Leaky ReLU activation layer being 0.2; the decoding network comprises an eight-layer deconvolution network, each layer having a 3 × 3 deconvolution kernel with stride 2 × 2 and comprising a deconvolution layer, a Batch Normalization layer and an activation layer, the activation layers of the first to seventh deconvolution layers being ReLU layers and the activation layer of the eighth deconvolution layer being a tanh layer.
3. The image domain conversion network system based on the generative adversarial network as claimed in claim 2, characterized in that: the true/false discrimination network comprises multiple sequentially cascaded true/false discrimination convolutional layers, each comprising a convolution layer, a Batch Normalization layer and an activation layer; the activation layer of the last true/false discrimination convolutional layer uses a Sigmoid activation function, and the activation layers of the remaining true/false discrimination convolutional layers use the ReLU function.
4. The image domain conversion network system based on the generative adversarial network as claimed in claim 3, characterized in that: the pairing discrimination network comprises a Concat layer and multiple sequentially cascaded pairing discrimination convolutional layers, each comprising a convolution layer, a Batch Normalization layer and an activation layer; the activation layer of the last pairing discrimination convolutional layer uses a Sigmoid activation function, and the activation layers of the remaining pairing discrimination convolutional layers use the ReLU function.
5. An image domain conversion method using the image domain conversion network system based on the generative adversarial network as claimed in claim 4, characterized in that the method comprises the following steps:
1) training the U-shaped generation network, and establishing a network model of the U-shaped generation network; the method specifically comprises the following steps:
A. collecting a training image set of a domain to be converted, wherein the training image set comprises original domain images and target domain images which are matched one by one, normalizing the original domain images in the training image set, the normalized images are Input images during network training, and the target domain images in the training image set are target domain images corresponding to the Input images;
B. passing the Input image Input obtained in step A through the U-shaped generation network to obtain the network-generated image Output;
C. training the multi-adversarial discrimination networks with the Input image Input, the target domain image target and the network-generated image Output obtained in steps A and B (one full round of steps C to E is sketched in code after this claim): this comprises training the true/false discrimination network and training the pairing discrimination network, where training the true/false discrimination network comprises the following steps:
c11: initializing the network weights of the true/false discrimination network by random initialization;
c12: taking the network-generated image Output as the negative sample and the target domain image target corresponding to the Input image Input as the positive sample, training the true/false discrimination network, and updating its network weights with the cross-entropy loss function and the Adam optimization algorithm;
training the pairing discrimination network comprises the following steps:
c21: initializing the network weights of the pairing discrimination network by random initialization;
c22: taking the network-generated image Output with its corresponding Input image Input as the negative sample and the Input image Input with its corresponding target domain image target as the positive sample, training the pairing discrimination network, and updating its network weights with the cross-entropy loss function and the Adam optimization algorithm;
D. repeating step C; after two rounds of training the multi-adversarial discrimination networks, fixing the network weights of the true/false discrimination network and the pairing discrimination network;
E. training the U-shaped generation network against the multi-adversarial discrimination networks obtained after the training of step D, specifically:
e1: initializing the network weights of the U-shaped generation network with the Xavier random initialization method;
e2: inputting the network-generated image Output into the true/false discrimination network, which outputs a true/false loss value that is fed back to the decoding network of the U-shaped generation network to update its weights: the true/false discrimination network outputs a 30 × 30 × 1 image used to return the loss measuring how close the network-generated image Output is to the real image; each pixel value of this output ranges from 0 to 1, where a value closer to 1 means the image input to the discrimination network resembles the real image within that pixel's receptive field, and a value closer to 0 means it does not;
e3: inputting the Input image Input and the corresponding network-generated image Output into the pairing discrimination network, which outputs a pairing loss value that is fed back to the decoding network of the U-shaped generation network to update its weights: the pairing discrimination network outputs a 30 × 30 × 1 image used to return the loss measuring whether the Input image Input and the network-generated image Output form a pair like the Input image Input and the target domain image target; each pixel value of this output ranges from 0 to 1, where a value closer to 1 means the Input image Input and the network-generated image Output are matched, and a value closer to 0 means they are not;
e4: calculating the structural similarity value between the network-generated image Output and the target domain image target and feeding it back as a loss to the decoding network of the U-shaped generation network to update its weights; the structural similarity value comprises the result of the SSIM loss function and the result of the L1 regularization term; the SSIM loss function derives from the SSIM algorithm, whose output SSIM(x, y) expresses the structural similarity between an input image x and a target domain image y; SSIM(x, y) ranges from −1 to 1, values closer to 1 indicate more similar images, and SSIM(x, y) equals 1 when x and y are identical (a sliding-window implementation is sketched after claim 10);
the output value of the SSIM algorithm is calculated as:

SSIM(x, y) = ((2 μ_x μ_y + c_1)(2 σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2))   (1)

in formula (1), x is the Input image Input, y is the target domain image target corresponding to the Input image Input, μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, σ_xy is the covariance of x and y, c_1 = (k_1 L)² and c_2 = (k_2 L)² are constants that maintain numerical stability, L is the dynamic range of the pixel values, k_1 = 0.01, and k_2 = 0.03;
F. steps C to E constitute one round of weight training of the U-shaped generation network; repeating steps C to E, and after two rounds of weight training the U-shaped generation network is fully trained; the resulting generation network is the network model of the U-shaped generation network;
2) after normalization, inputting the image to be converted into the network model established in step 1) to complete its image domain conversion: the normalized image is input as the Input image into the network model established in step 1); the encoding network extracts high-dimensional features of the Input image Input, and the decoding network outputs the network-generated image Output, which is the target domain image after image domain conversion.
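By way of illustration, one full round of steps C to E might be sketched as follows, assuming PyTorch, the generator and discriminator sketches given under claims 2, 4 and 9, Adam optimizers created by the caller, plain binary cross-entropy in place of the smoothed variant of claim 7, the `ssim` helper sketched under claim 10, and the initial 1:1:4:1 loss ratio of claim 6:

```python
import torch
import torch.nn.functional as F

def train_round(G, D_tf, D_pair, opt_g, opt_tf, opt_pair, x, y,
                lam=(1.0, 4.0, 1.0)):   # (lambda1, lambda2, lambda3) of claim 6
    """One round of steps C-E; x = Input batch, y = matched target batch."""
    # Step C: train both discriminators, Output as negative, target as positive.
    with torch.no_grad():
        fake = G(x)                                   # network-generated image Output
    for D, opt, cond in ((D_tf, opt_tf, None), (D_pair, opt_pair, x)):
        p_fake, p_real = D(fake, cond), D(y, cond)
        loss_d = (F.binary_cross_entropy(p_fake, torch.zeros_like(p_fake))
                  + F.binary_cross_entropy(p_real, torch.ones_like(p_real)))
        opt.zero_grad(); loss_d.backward(); opt.step()

    # Steps D-E: the discriminator weights are now held fixed; the generator is
    # updated from the true/false loss (e2), pairing loss (e3) and SSIM + L1 (e4).
    fake = G(x)
    p_tf, p_pair = D_tf(fake), D_pair(fake, x)
    loss_g = (F.binary_cross_entropy(p_tf, torch.ones_like(p_tf))
              + lam[0] * F.binary_cross_entropy(p_pair, torch.ones_like(p_pair))
              + lam[1] * (1 - ssim((fake + 1) / 2, (y + 1) / 2))  # tanh range -> [0, 1]
              + lam[2] * F.l1_loss(fake, y))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```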
6. The image domain conversion method of the image domain conversion network system based on the generative adversarial network as claimed in claim 5, wherein: the overall loss function of the U-shaped generation network in step E is:
L_GAN(G, D_1, D_2) = L_D1 + λ_1 L_D2 + λ_2 L_ssim + λ_3 L_1   (2)

the overall loss function to be optimized in the whole generative adversarial network is:

G* = arg min_G max_D1 max_D2 ( L_GAN(G, D_1, D_2) + L_D1 + L_D2 )   (3)

in formulas (2) and (3),

L_D1 = E_y[log D_1(y)] + E_x[log(1 − D_1(G(x)))]

represents the true/false loss output by the true/false discrimination network,

L_D2 = E_(x,y)[log D_2(x, y)] + E_x[log(1 − D_2(x, G(x)))]

represents the pairing loss output by the pairing discrimination network,

L_ssim = 1 − SSIM(G(x), y)

represents the loss calculated by the SSIM loss function, and

L_1 = E_(x,y)[ ‖y − G(x)‖_1 ]

represents the L1 regular-term loss; x denotes the Input image Input, y denotes the target domain image target corresponding to the Input image Input, λ_1 is the weight parameter of the pairing loss in the overall loss of the U-shaped generation network, λ_2 is the weight parameter of the SSIM loss in the overall loss of the U-shaped generation network, and λ_3 is the weight parameter of the L1 regular term in the overall loss of the U-shaped generation network;
in the initial stage of training the U-shaped generation network, the ratio of the true/false loss, pairing loss, SSIM loss and L1 regular-term loss is 1:1:4:1; as the number of training iterations increases, the ratio gradually becomes 1:1:0.5:1, i.e. the weight parameter λ_2 of the SSIM loss in the overall loss of the U-shaped generation network is gradually reduced according to the set overall number of training iterations.
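A sketch of this weighting, assuming a linear decay of the SSIM weight λ_2 from 4 to 0.5 (the claim requires only a gradual reduction over the set number of training iterations, not a particular schedule):

```python
def lambda2(step, total_steps, start=4.0, end=0.5):
    # SSIM weight in formula (2): 4 at the start of training, 0.5 at the end.
    t = min(step / float(total_steps), 1.0)
    return start + (end - start) * t

def generator_loss(l_d1, l_d2, l_ssim, l_l1, step, total_steps, lam1=1.0, lam3=1.0):
    # Formula (2): L_GAN = L_D1 + lam1*L_D2 + lam2*L_ssim + lam3*L_1, with the
    # true/false : pairing : SSIM : L1 ratio moving from 1:1:4:1 toward 1:1:0.5:1.
    return l_d1 + lam1 * l_d2 + lambda2(step, total_steps) * l_ssim + lam3 * l_l1
```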
7. The image domain conversion method of the image domain conversion network system based on the generative adversarial network as claimed in claim 5, wherein: the cross-entropy loss function in step C is a cross-entropy loss function with a smoothing term;
the cross-entropy loss function with the smoothing term is:

loss = −(1/i) Σ_(n=1..i) [ y_n log(t_n + EPS) + (1 − y_n) log(1 − t_n + EPS) ]   (4)

in formula (4), i is the batch size, t_n is the predicted sample value, y_n is the true sample value, and EPS is the added smoothing term, whose value is chosen as 0.005.
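Formula (4) translates directly into a few lines of PyTorch; the tensor layout is an assumption, with t holding predictions in (0, 1) and y the 0/1 labels:

```python
import torch

def smoothed_cross_entropy(t, y, eps=0.005):
    # Formula (4): EPS keeps log() finite when a prediction saturates at 0 or 1.
    return -(y * torch.log(t + eps) + (1 - y) * torch.log(1 - t + eps)).mean()
```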
8. The image domain conversion method of the image domain conversion network system based on the generative adversarial network as claimed in claim 5, wherein: the generation process of the network-generated image Output comprises the following steps:
a) the image to be converted is normalized to 256 × 256 × 3 pixels; the normalized image is input as the Input image Input into the encoding network and passes sequentially through its eight convolutional layers, the final output being a 1 × 1 × 1024 feature image; each convolutional layer of the encoding network has a 3 × 3 kernel and a 2 × 2 stride;
b) the 1 × 1 × 1024 feature image generated in step a) is input into the decoding network and passes sequentially through its eight deconvolution layers; at the same time, the feature image produced by each convolutional layer in step a) is fed into the deconvolution layer whose data tensor has the same size, so each deconvolution layer receives both the feature image from the previous deconvolution operation and the convolution feature image of matching tensor size; this finally produces the complete network-generated image Output; each deconvolution layer has a 3 × 3 kernel and a 2 × 2 stride.
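An inference-time sketch of steps a) and b), assuming a trained generator G (for example the UGenerator sketched under claim 9 below), PIL-based image loading, and normalization to [−1, 1] to match the tanh output layer:

```python
import numpy as np
import torch
from PIL import Image

# Step a): normalize the image to be converted to 256 x 256 x 3 pixels.
img = np.asarray(Image.open("input.png").convert("RGB").resize((256, 256)))
x = torch.from_numpy(img.copy()).permute(2, 0, 1).float().unsqueeze(0) / 127.5 - 1.0

# Step b): the encoder contracts the Input to the 1 x 1 x 1024 feature image;
# the decoder, fed the size-matched skip features, emits the Output image.
G.eval()
with torch.no_grad():
    out = G(x)                                    # network-generated image Output
out_img = ((out[0].permute(1, 2, 0) + 1) * 127.5).clamp(0, 255).byte().numpy()
Image.fromarray(out_img).save("converted.png")
```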
9. The image domain conversion method of the image domain conversion network system based on the generative adversarial network as claimed in claim 8, wherein: in step b), a Dropout operation is added to the feature images input into the first three deconvolution layers, i.e. where the feature image produced by a convolutional layer in step a) is fed into the deconvolution layer of matching tensor size; the Dropout rate is 0.2, meaning 20% of the connections between the two layers are randomly dropped.
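Claims 8 and 9 together describe a U-Net-style generator; the following hedged sketch uses the enc_block/dec_block helpers from claim 2, with channel widths as illustrative assumptions (a batch size greater than 1 is assumed during training, since Batch Normalization is applied to the 1 × 1 bottleneck):

```python
import torch
import torch.nn as nn

class UGenerator(nn.Module):
    """U-shaped generator of claims 8-9, built from the enc_block/dec_block
    sketches under claim 2. The claims fix the layer counts, kernel/stride
    sizes, skip connections and Dropout rate; widths are illustrative."""

    def __init__(self, widths=(64, 128, 256, 512, 512, 512, 512, 1024)):
        super().__init__()
        chans = (3,) + widths
        self.enc = nn.ModuleList(enc_block(chans[i], chans[i + 1]) for i in range(8))
        dec_in = [widths[-1]] + [2 * w for w in reversed(widths[:-1])]
        dec_out = list(reversed(widths[:-1])) + [3]
        self.dec = nn.ModuleList(
            dec_block(dec_in[i], dec_out[i], last=(i == 7)) for i in range(8))
        self.drop = nn.Dropout(0.2)   # claim 9: 20% of connections closed at random

    def forward(self, x):
        skips = []
        for e in self.enc:                 # 256x256x3 -> ... -> 1x1x1024
            x = e(x)
            skips.append(x)
        for i, d in enumerate(self.dec):
            s = skips[7 - i]               # encoder feature of matching tensor size
            if i < 3:
                s = self.drop(s)           # Dropout on the first three deconv inputs
            x = d(s if i == 0 else torch.cat([s, x], dim=1))
        return x
```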
10. The image domain conversion method of the image domain conversion network system based on the generative adversarial network as claimed in claim 5, wherein: the SSIM algorithm in step e4 is computed with a convolution-kernel sliding window of size 7 × 7.
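A sketch of the SSIM computation of formula (1) evaluated, as this claim specifies, with a 7 × 7 sliding window implemented as a depthwise convolution; the uniform window weighting and NCHW tensor layout are assumptions:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, window=7, L=1.0, k1=0.01, k2=0.03):
    # Formula (1), computed per 7x7 window position and averaged over all
    # positions; x and y are (N, C, H, W) tensors with values in [0, L].
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    ch = x.shape[1]
    w = torch.ones(ch, 1, window, window, device=x.device) / window ** 2
    mu_x, mu_y = F.conv2d(x, w, groups=ch), F.conv2d(y, w, groups=ch)
    var_x = F.conv2d(x * x, w, groups=ch) - mu_x ** 2
    var_y = F.conv2d(y * y, w, groups=ch) - mu_y ** 2
    cov = F.conv2d(x * y, w, groups=ch) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()
```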
CN201711273921.5A 2017-12-06 2017-12-06 Image domain conversion network and conversion method based on generative countermeasure network Expired - Fee Related CN108171320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711273921.5A CN108171320B (en) 2017-12-06 2017-12-06 Image domain conversion network and conversion method based on generative countermeasure network

Publications (2)

Publication Number Publication Date
CN108171320A (en) 2018-06-15
CN108171320B (en) 2021-10-19

Family

ID=62525151

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211019