CN115841589A - Unsupervised image translation method based on generation type self-attention mechanism - Google Patents

Unsupervised image translation method based on generation type self-attention mechanism

Info

Publication number
CN115841589A
Authority
CN
China
Prior art keywords
image
attention mechanism
translated
self
translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211394182.6A
Other languages
Chinese (zh)
Other versions
CN115841589B (en)
Inventor
付春玲
胡崇豪
周林
李军伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University filed Critical Henan University
Priority to CN202211394182.6A priority Critical patent/CN115841589B/en
Priority claimed from CN202211394182.6A external-priority patent/CN115841589B/en
Publication of CN115841589A publication Critical patent/CN115841589A/en
Application granted granted Critical
Publication of CN115841589B publication Critical patent/CN115841589B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, in particular to an unsupervised image translation method based on a generative self-attention mechanism, which comprises the following steps: inputting the obtained image to be translated into an independent encoder to obtain the DSI depth information space of the image to be translated, and further obtain a combined and superposed DSI depth information space; inputting the combined and superposed DSI depth information space into a pre-constructed and trained generator containing a generative self-attention mechanism to obtain a target translation image; and acquiring a target domain image, inputting the target domain image and the target translation image into a pre-constructed and trained multi-scale discriminator containing a generative self-attention mechanism, and judging whether the target translation image is a real image. The method is mainly applied to the field of unsupervised image translation, solves the problem of low accuracy in existing unsupervised image translation methods, and effectively improves the image quality of the translated image.

Description

Unsupervised image translation method based on generation type self-attention mechanism
Technical Field
The invention relates to the technical field of image processing, in particular to an unsupervised image translation method based on a generating type self-attention mechanism.
Background
Image-to-image translation is an important branch of computer vision and graphics research. It refers to the process of converting an image from a source domain to a target domain, with applications including super-resolution, image rendering, image inpainting, image generation and style transfer. While keeping the image content unchanged, a translated image is generated by exploiting the mapping relationship between the source domain and the target domain. Image-to-image translation can be divided into supervised and unsupervised image translation, and further into paired and unpaired image translation according to whether the image data sets are paired. In practice, paired data sets are extremely difficult to acquire and very expensive to collect, so unpaired images are translated with unsupervised methods; unsupervised translation of unpaired images has therefore become the mainstream trend of research.
A GAN (Generative Adversarial Network) is a powerful generative model with significant room for further improvement. In a conventional GAN-based image-to-image translation method, a real image from the source domain is passed through a U-shaped encoder network, and the generator produces a synthetic image corresponding to the target domain. The network used for classification is called the discriminator; it decides whether the image synthesized by the generator can be recognized as a real image. In many image translation tasks, however, the resulting synthetic image has low quality, even though the source domain image and the synthetic image should differ only in a small region: when a cat is changed into a puppy, for example, the background content must be kept while the characteristics of the head region are changed. The cycle-consistent generative adversarial network (CycleGAN) is a common method for addressing this problem; it is the most representative unsupervised image translation method and is particularly suitable for style conversion.
Disclosure of Invention
In order to solve the technical problem of relatively low accuracy of the unsupervised image translation result of the existing method, the invention aims to provide an unsupervised image translation method based on a generating type self-attention mechanism, and the adopted technical scheme is as follows:
one embodiment of the invention provides an unsupervised image translation method based on a generating type self-attention mechanism, which comprises the following steps:
acquiring an image to be translated, inputting the image to be translated into an independent encoder, and performing image preprocessing on the image to be translated to obtain a DSI depth information space of the image to be translated, and further obtain a combined and superposed DSI depth information space;
inputting the combined and superposed DSI depth information space into a pre-constructed and trained generator containing a generation type self-attention mechanism to obtain a target translation image corresponding to the image to be translated;
acquiring a target domain image, inputting the target domain image and the target translation image into a pre-constructed and trained multi-scale discriminator containing a generating type self-attention mechanism, judging whether the target translation image is a real image or not, and enabling a generator to generate an image close to the target domain.
Further, the independent encoder is configured to perform image preprocessing on the image to be translated; the generator is configured to output a target translation image corresponding to the image to be translated; and the multi-scale discriminator is configured to discriminate whether the target translation image generated by the generator is real or false, the target translation image being a false image, while the image to be translated and the real image are both source domain images.
Further, the step of constructing a generator with a generative self-attention mechanism comprises:
and constructing a generating self-attention mechanism module, and embedding the generating self-attention mechanism module between a residual error layer module and an upper sampling layer module in the generator to obtain the generator containing the generating self-attention mechanism.
Further, the step of constructing the multi-scale discriminator with the generative self-attention mechanism comprises:
and inserting the generating self-attention mechanism module in front of a down-sampling layer module of the multi-scale discriminator to obtain the multi-scale discriminator containing the generating self-attention mechanism.
Further, the generator with the generating self-attention mechanism comprises a down-sampling layer module, a modified residual block module, a generating self-attention mechanism module and an up-sampling layer module.
Further, the step of obtaining a target translation image corresponding to the image to be translated includes:
inputting the combined and superposed DSI depth information space into a down-sampling layer module in a pre-constructed and trained generator containing a generating type self-attention mechanism to obtain coding feature mapping of the combined and superposed DSI depth information space;
modifying the residual block of the multi-scale discriminator, and mapping and inputting the coding features into a modified residual block module in a generator to obtain an original domain feature image;
inputting the original domain feature map into a generative self-attention mechanism network in a generator to obtain feature information of the original domain feature map;
inputting the characteristic information of the original domain characteristic image into an up-sampling layer module in a generator to obtain a target translation image corresponding to the image to be translated, wherein the target translation image is a synthetic image formed by mapping and converting the characteristics of a target domain corresponding to the target translation image.
Further, the step of image preprocessing the image to be translated comprises:
performing convolution processing on an image to be translated, extracting image characteristics of the image to be translated, sampling the image characteristics of the image to be translated, combining hidden vectors, and extracting characteristic information of the image to be translated again through convolution operation, wherein the characteristic information is DSI depth information space.
The invention has the following beneficial effects:
the invention provides an unsupervised image translation method based on a generating type self-attention mechanism, which is used for realizing the task of translating images, and the method obtains the DSI (Digital Systems Information) depth Information space of the images to be translated by carrying out image processing on the acquired images to be translated, and converts the images into the characteristic Information of the images, thereby enhancing the efficiency of image translation to a certain extent, improving the accuracy of the input data of a generator and further improving the accuracy of the unsupervised image translation result. Compared with the prior art, the generator containing the generating type self-attention mechanism of the pre-constructed and trained generating type self-attention mechanism network is beneficial to improving the capability of extracting deeper image features, further improving the capability of transforming image space details and being beneficial to the generator to generate more vivid translated images. The pre-constructed and trained multi-scale discriminator with the generating self-attention mechanism is beneficial to further transmitting image characteristics, and finally unsupervised and unmatched image translation from the image to be translated to the target translation image is realized. Whether the target translation image is a real image or not is judged based on the corresponding distribution probability of the target translation image, so that the image quality of a synthetic image obtained in the unsupervised unpaired image translation is effectively improved, and the accuracy of the unsupervised image translation result is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative effort.
FIG. 1 is a flowchart of an unsupervised image translation method based on a generative self-attention mechanism according to the present invention;
FIG. 2 is a diagram illustrating steps of image preprocessing performed on an image to be translated by an independent encoder according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an unsupervised image translation method based on a generative self-attention mechanism according to the present invention;
FIG. 4 is a diagram illustrating a structure of a generator with a generative self-attention mechanism according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a multi-scale discriminator structure with a generating self-attention mechanism according to an embodiment of the present invention;
FIG. 6 is a comparison result between the translated images obtained by the unsupervised image translation of this embodiment and those of a plurality of existing mainstream unsupervised image translation methods, for the data set in which the source domain is a kitten and the target domain is a puppy, according to an embodiment of the present invention;
FIG. 7 is a comparison result between the translated images obtained by the unsupervised image translation of this embodiment and those of a plurality of existing mainstream unsupervised image translation methods, for the data set in which the source domain is a puppy and the target domain is a kitten, according to an embodiment of the present invention;
FIG. 8 is a comparison result between the translated images obtained by the unsupervised image translation of this embodiment and those of a plurality of existing mainstream unsupervised image translation methods, for the data set in which the source domain is an apple and the target domain is an orange and the data set in which the source domain is a girl and the target domain is a boy, according to an embodiment of the present invention.
Detailed Description
To further explain the technical means and effects of the present invention adopted to achieve the predetermined objects, the following detailed description of the embodiments, structures, features and effects of the technical solutions according to the present invention will be given with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
In order to better solve the problems of low image quality and inaccurate results in unsupervised image translation, this embodiment provides an unsupervised image translation method based on a generative self-attention mechanism. A flowchart of the method is shown in FIG. 1, a structural schematic diagram is shown in FIG. 3, and the method comprises the following steps:
(1) Obtaining an image to be translated, inputting the image to be translated into an independent encoder, and performing image preprocessing on the image to be translated to obtain the DSI depth information space of the image to be translated, and further obtain the combined and superposed DSI depth information space.
In this embodiment, in order to reduce the loss of the generator and of the multi-scale discriminator and to improve the accuracy of the image translation result, the image to be translated, which is a source domain image, is input into the independent encoder. The independent encoder is used for image preprocessing of the image to be translated; it comprises a down-sampling layer module and an up-sampling layer module and is composed of a plurality of convolution layers. FIG. 2 shows a schematic diagram of the step of performing image preprocessing on the image to be translated with the independent encoder, which specifically comprises: performing convolution processing on the image to be translated to extract its image features, sampling those image features, combining them with hidden vectors, and extracting the feature information of the image to be translated again through a convolution operation, this feature information being the DSI depth information space. Preprocessing the image to be translated makes the information of the image easier to grasp, allows each network to concentrate on its own objective, and makes the translated image more reasonable and accurate.
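For reference, a minimal sketch of how such an independent encoder could be arranged is given below (PyTorch style). The channel counts, the number of down-sampling layers and the way the hidden vector is merged are illustrative assumptions, not the exact configuration of this embodiment.

```python
import torch
import torch.nn as nn

class IndependentEncoder(nn.Module):
    """Sketch of the independent encoder: convolution -> down-sampling ->
    merge with a hidden vector -> convolution again, yielding the DSI
    depth information space (all sizes are illustrative)."""
    def __init__(self, in_ch=3, base_ch=64, hidden_dim=256):
        super().__init__()
        # first convolution: extract image features of the image to be translated
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, base_ch, kernel_size=7, stride=1, padding=3),
            nn.InstanceNorm2d(base_ch),
            nn.ReLU(inplace=True),
        )
        # down-sampling layers: reduce spatial size, increase channels
        self.down = nn.Sequential(
            nn.Conv2d(base_ch, base_ch * 2, 3, stride=2, padding=1),
            nn.InstanceNorm2d(base_ch * 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_ch * 2, base_ch * 4, 3, stride=2, padding=1),
            nn.InstanceNorm2d(base_ch * 4),
            nn.ReLU(inplace=True),
        )
        # project the hidden vector so it can be broadcast over the feature map
        self.hidden_proj = nn.Linear(hidden_dim, base_ch * 4)
        # final convolution: extract the feature information (DSI space) once more
        self.tail = nn.Conv2d(base_ch * 4, base_ch * 4, 3, padding=1)

    def forward(self, x, hidden_vec):
        f = self.down(self.head(x))              # sampled image features
        h = self.hidden_proj(hidden_vec)         # hidden vector mapped to channels
        f = f + h.view(h.size(0), -1, 1, 1)      # combine with the hidden vector
        return self.tail(f)                      # DSI depth information space


# usage sketch:
# enc = IndependentEncoder()
# dsi = enc(torch.randn(1, 3, 256, 256), torch.randn(1, 256))
```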
(2) And inputting the combined and superposed DSI depth information space into a pre-constructed and trained generator containing a generation type self-attention mechanism to obtain a target translation image corresponding to the image to be translated.
In this embodiment, the generator is configured to output a target translation image corresponding to the image to be translated; it includes a deep layer module, a down-sampling layer module, a residual layer module, a generative self-attention mechanism module and an up-sampling layer module, and a schematic diagram of the structure of the generator containing the generative self-attention mechanism is shown in FIG. 4. The specific process of constructing the generator with the generative self-attention mechanism is as follows: first, a generative self-attention mechanism module is constructed, and this module is embedded between the residual layer module and the up-sampling layer module of the generator, giving the generator containing the generative self-attention mechanism. The generator with the generative self-attention mechanism enhances the ability of the generator to extract deeper features at the central pixel, further improves the ability to change spatial details of the image, and helps guide the generator to generate more realistic images. The image translation process, i.e. the step of acquiring the target translation image corresponding to the image to be translated (the target translation image being a false image, namely an image that does not really exist), comprises the following steps:
(2-1) Inputting the combined and superposed DSI depth information space into the down-sampling layer module of the pre-constructed and trained generator containing the generative self-attention mechanism to obtain the coding feature map of the combined and superposed DSI depth information space. The down-sampling layer module is composed of convolution, spectral normalization and activation layers, and is used to encode shape or texture features with obvious changes into a high-dimensional feature image.
(2-2) Modifying the residual block of the multi-scale discriminator, and inputting the coding feature map into the modified residual block module of the generator to obtain the original domain feature image. The residual block is built on skip connections; in order to improve its effect in the image processing process, the number of residual blocks is modified and set to eight, i.e. the residual layer is composed of eight residual convolution blocks. The residual layer can also be called a hidden layer module, which is used to extract multi-level features of the high-dimensional feature image and perform an accurate linear division of different data types.
(2-3) Inputting the original domain feature map into the generative self-attention mechanism network of the generator to obtain the feature information of the original domain feature map. The generative self-attention mechanism module is composed of a plurality of convolution layers and acts as a complement to convolution: it models the relationships between widely separated spatial regions, guides the generator to attend to the spatial structure of the image, computes the relationship between any two pixel positions in the image, and obtains the global geometric features of the image from the correlations between pixel features, so that a more realistic image can be generated.
(2-4) Inputting the feature information of the original domain feature image into the up-sampling layer module of the generator to obtain the target translation image corresponding to the image to be translated, the target translation image being a synthetic image formed by mapping and converting the features of the corresponding target domain. The up-sampling layer module consists of a convolution layer and a hyperbolic tangent function, which is used to restore the image.
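A compact sketch of how these four stages could be composed is shown below. The eight residual blocks, the spectral normalization in the down-sampling layer, the self-attention placed before up-sampling and the tanh output follow the description above; the self-attention block itself uses the common query/key/value formulation with a learnable scale, and all layer sizes and scale factors are illustrative assumptions rather than the exact configuration of this embodiment.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SelfAttention(nn.Module):
    """Generative self-attention sketch: relates any two pixel positions so the
    generator can exploit global geometric structure of the feature map."""
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).view(b, -1, h * w).permute(0, 2, 1)   # B x HW x C'
        k = self.k(x).view(b, -1, h * w)                     # B x C' x HW
        attn = torch.softmax(torch.bmm(q, k), dim=-1)        # pairwise pixel relations
        v = self.v(x).view(b, -1, h * w)                     # B x C x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)   # skip connection

class Generator(nn.Module):
    """Down-sampling -> eight residual blocks -> self-attention -> up-sampling with tanh."""
    def __init__(self, in_ch=256, out_ch=3):
        super().__init__()
        self.down = nn.Sequential(   # convolution + spectral normalization + activation
            spectral_norm(nn.Conv2d(in_ch, 256, 3, stride=2, padding=1)),
            nn.ReLU(inplace=True))
        self.res = nn.Sequential(*[ResBlock(256) for _ in range(8)])
        self.attn = SelfAttention(256)
        self.up = nn.Sequential(     # restore the image
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(256, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, out_ch, 7, padding=3), nn.Tanh())

    def forward(self, dsi):
        return self.up(self.attn(self.res(self.down(dsi))))
```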
(3) And acquiring a target domain image, inputting the target domain image and the target translation image into a pre-constructed and trained multi-scale discriminator containing a generating self-attention mechanism, and judging whether the target translation image is a real image.
In this embodiment, the multi-scale discriminator containing the generative self-attention mechanism is used to judge the degree of realism and the loss of the target translation image output by the generator for the image to be translated. The multi-scale discriminator includes a generative self-attention mechanism module, a down-sampling layer module, a multi-layer perception module and a multi-scale classifier module; a schematic diagram of its structure is shown in FIG. 5. The specific process of constructing the multi-scale discriminator with the generative self-attention mechanism is as follows: the generative self-attention mechanism module is inserted in front of the down-sampling layer module of the multi-scale discriminator, giving the multi-scale discriminator containing the generative self-attention mechanism.
The target domain image and the target translation image are taken as the input images of the pre-constructed and trained multi-scale discriminator with the generative self-attention mechanism; specifically, the target domain image and the target translation image are input into the multi-scale discriminator, which outputs whether the target translation image is a real image, a real image being a genuinely existing (source domain) image rather than a false image generated by the generator.
It should be noted that the generative self-attention mechanism network in the multi-scale discriminator is composed of a plurality of convolution layers; it encodes the input image into high-dimensional features, helps preserve continuity, limits the severity of function changes, and helps avoid model collapse of the generator. The multi-layer perception module is composed of a CAM (channel attention module) attention module, which contains a convolutional neural network layer, SN normalization (a switchable, adaptive normalization learned in a differentiable way) and a LeakyReLU activation function; the CAM attention module helps the attention-guided model flexibly control changes of shape and texture. The multi-scale classifier module adopts a multi-scale design and introduces a residual attention mechanism to further promote feature propagation. Meanwhile, the present embodiment also learns the feature mapping generated by the attention vector ω; the feature mapping formula is F(a) = γ × ω × EA(a) + EA(a), which yields a feature map containing the residual attention mechanism, where the trainable parameter γ determines the balance between the attention feature and the original feature, A is the distribution of the target translation image, a is a sample drawn from A, and EA(·) is the feature map produced by the independent encoder.
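The residual-attention feature mapping F(a) = γ × ω × EA(a) + EA(a) can be sketched as follows. The CAM-style channel attention used to produce the vector ω, the patch-level classifier head and all layer sizes are illustrative assumptions, not the exact structure of the multi-scale discriminator described above.

```python
import torch
import torch.nn as nn

class ResidualAttentionHead(nn.Module):
    """Sketch of the discriminator-side residual attention: a channel attention
    vector w re-weights the feature map EA(a), and a trainable gamma balances
    the attended feature against the original one, F(a) = gamma * w * EA(a) + EA(a)."""
    def __init__(self, ch=256):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))        # balance parameter gamma
        self.cam = nn.Sequential(                        # CAM-style attention vector w
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // 16, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch // 16, ch, 1), nn.Sigmoid())
        self.classifier = nn.Conv2d(ch, 1, 4, padding=1)  # patch-level real/fake logits

    def forward(self, ea):                                # ea: encoder feature map EA(a)
        w = self.cam(ea)                                  # attention vector
        feat = self.gamma * w * ea + ea                   # residual attention feature map
        return self.classifier(feat)                      # a multi-scale head would repeat this at several resolutions
```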
In order to improve the accuracy of the image translation result, that is, to improve the image quality of the translated image, the generator with the generative self-attention mechanism and the multi-scale discriminator need to be trained. The training process plays a critical role in image translation, and the quality of training directly affects the translation effect and the quality of the final translated image. Training comprises the following steps:
(3-1) randomly extracting images from the source domain data set, cropping the extracted images, and adjusting the size of the cropped images to 256 × 256 to obtain training samples of the generator containing the generative self-attention mechanism.
And (3-2) inputting the training samples in the step (3-1) into an independent encoder to obtain a DSI depth information space corresponding to the training samples, wherein the DSI depth information space is characteristic information obtained from different layers, and the characteristic information is a hidden vector.
And (3-3) acquiring the target domain image, constructing a generator with a generating type self-attention mechanism, and inputting the combined and superposed DSI depth information space into the generator with the generating type self-attention mechanism to obtain a composite image.
(3-4) putting the composite image and the target domain image into a multi-scale discriminator having a generative self-attention mechanism, respectively, and calculating a loss degree based on an output result of the multi-scale discriminator.
(3-4-1) The adversarial loss function of the generator promotes the distribution of the synthesized image and helps it match the distribution of the target domain image. The adversarial loss function of the generator is calculated as:
L_adv^G = E_{a'~A}[log C_a(a')] + E_{b'~B}[log(1 - C_a(G(E_b(b'))))]
wherein C_a is discriminator a, G is generator b, E_a is independent encoder a, E_b is independent encoder b, B is the distribution of the images to be translated (i.e. the distribution of the source domain images), A is the distribution of the target translation images (i.e. the distribution of the synthesized images), b' is a sample drawn from B, and a' is a sample drawn from A.
(3-4-2) The cycle consistency loss function can be used to reduce the probability of model collapse: given that an image a ∈ A is transformed from A to B and back to A, the resulting image should have the same distribution as A. The cycle consistency loss function is calculated as:
L_cyc = E_{a'~A}[ || G(E_b(F(E_a(a')))) - a' ||_1 ] + E_{b'~B}[ || F(E_a(G(E_b(b')))) - b' ||_1 ]
wherein b' is a sample drawn from B, a' is a sample drawn from A, B is the distribution of the images to be translated, F is generator a, G is generator b, E_a is independent encoder a, and E_b is independent encoder b.
(3-4-3) In order to ensure that the output image of the generator has a distribution similar to that of the input image, a reconstruction consistency constraint is applied to the generator: given an image a ∈ A, the output of the generator should remain unchanged. The reconstruction loss function helps the generator extract hierarchical features and reduces the errors produced by DAIN (Depth-Aware Video Frame Interpolation) when extracting features. The reconstruction loss function is calculated as:
L_rec = E_{b'~B}[ || F(E_b(b')) - b' ||_1 ]
wherein b' is a sample drawn from B, a' is a sample drawn from A, B is the distribution of the images to be translated, F is generator a, and E_b is independent encoder b.
(3-4-4) In this embodiment, the adversarial loss of the generator is also the loss function used in the multi-scale discriminator. The adversarial loss helps guide the multi-scale discriminator to distinguish the source domain image from the target domain image, so that the distribution probability of the synthesized image output by the generator is pushed ever closer to the distribution probability of the target domain image, i.e. the distribution of the synthesized image is matched to the distribution of the target domain image; at the same time, the parameters of the multi-scale discriminator are optimized so as to maximize the adversarial loss. The loss function of the multi-scale discriminator is calculated as:
L_adv^D = E_{a'~A}[log C_b(a')] + E_{b'~B}[log(1 - C_b(G(E_b(b'))))]
wherein C_b is discriminator b, G is generator b, E_a is independent encoder a, E_b is independent encoder b, B is the distribution of the images to be translated (i.e. the distribution of the source domain images), A is the distribution of the target translation images (i.e. the distribution of the synthesized images), b' is a sample drawn from B, and a' is a sample drawn from A.
(3-5) Setting a total optimization target: assigning different weights to the loss functions above and adding the weighted loss values (a minimal sketch of this weighted combination is given after the training steps below).
And (3-6) repeating the steps (3-1) to (3-5) until the generator containing the generative self-attention mechanism and the multi-scale discriminator reach the set iteration cycle number.
It should be noted that, with continuous training of the network, the feature extraction effect of the generator including the generating-type self-attention mechanism and the multi-scale discriminator is gradually enhanced, the synthesized image approaches to the target domain image, and a satisfactory image translation result is finally achieved.
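As a reference for steps (3-4) and (3-5), the weighted combination of the adversarial, cycle-consistency and reconstruction losses could look as sketched below. It assumes a log-based adversarial term (in its non-saturating form) and L1 cycle/reconstruction terms consistent with the formulas given above, uses the weights 1, 10 and 10 mentioned in the experimental setup, and the module names (enc_a, enc_b, gen_b2a, gen_a2b, disc_a) are hypothetical placeholders covering one translation direction only.

```python
import torch
import torch.nn.functional as fn

def total_loss_b_to_a(real_b, enc_a, enc_b, gen_b2a, gen_a2b, disc_a,
                      w_adv=1.0, w_cyc=10.0, w_rec=10.0):
    """Weighted generator objective for the B -> A direction (names are hypothetical)."""
    fake_a = gen_b2a(enc_b(real_b))                      # synthesized target-domain image
    # adversarial term (non-saturating variant): push fake_a toward the target domain
    logits = disc_a(fake_a)
    adv = fn.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    # cycle-consistency term: B -> A -> B should reproduce the original image
    cyc = fn.l1_loss(gen_a2b(enc_a(fake_a)), real_b)
    # reconstruction term: a domain-B code fed to the B generator should change nothing
    rec = fn.l1_loss(gen_a2b(enc_b(real_b)), real_b)
    return w_adv * adv + w_cyc * cyc + w_rec * rec
```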
In order to test the image translation effect of the unsupervised image translation method based on the generative self-attention mechanism of this embodiment, the similarity between two image data sets can be measured with the Fréchet Inception Distance (FID), which correlates well with human judgments of visual quality and is commonly used to evaluate the image quality of samples from generative adversarial networks. Specifically, the FID is obtained by computing the Fréchet distance between the Gaussian distributions fitted to the Inception-network features of the two image data sets; the FID score measures the distance between the feature vectors of real images and synthesized images, and a lower score indicates a higher image quality of the synthesized images. The KID (Kernel Inception Distance) can also be used to measure the similarity between two image data sets; it belongs to the same family of quality metrics as the FID, the main difference being that the KID has a simple unbiased estimator, which helps improve the accuracy of the measurement result. To better match human assessment of image quality, the visual similarity between the real images and the synthesized images is obtained by computing the maximum mean discrepancy, and a smaller KID value indicates that the real images and the synthesized images are more similar visually.
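For reference, the two metrics can be computed from Inception feature vectors as sketched below. Extracting the features with an Inception-v3 network is assumed and omitted, and the cubic polynomial kernel used in the KID estimate follows the common convention rather than anything stated in this description.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_fake):
    """FID between two N x D arrays of Inception activations:
    ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^(1/2)). Lower is better."""
    mu_r, cov_r = feats_real.mean(0), np.cov(feats_real, rowvar=False)
    mu_f, cov_f = feats_fake.mean(0), np.cov(feats_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real                 # discard tiny imaginary parts
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

def kernel_inception_distance(feats_real, feats_fake):
    """Unbiased squared MMD with a cubic polynomial kernel (KID). Lower is better."""
    d = feats_real.shape[1]
    k = lambda a, b: (a @ b.T / d + 1.0) ** 3
    k_rr, k_ff, k_rf = k(feats_real, feats_real), k(feats_fake, feats_fake), k(feats_real, feats_fake)
    m, n = len(feats_real), len(feats_fake)
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return float(term_rr + term_ff - 2.0 * k_rf.mean())
```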
The translation results of this embodiment are further described below with reference to experiments.
Experimental data: three types of unpaired data sets are used, namely the cat2dog, apple2orange and man2woman data sets, i.e. a data set whose source domain is a kitten and whose target domain is a puppy, a data set whose source domain is an apple and whose target domain is an orange, and a data set whose source domain is a girl and whose target domain is a boy. For the cat2dog data set, the training data comprise 771 kitten images and 1264 puppy images, and the test data comprise 100 kitten images and 100 puppy images; for the apple2orange data set, the training data comprise 955 apple images and 1019 orange images, and the test data comprise 266 apple images and 248 orange images; for the man2woman data set, the training data comprise 1200 images of men and 1200 images of women, and the test data comprise 115 images of men and 115 images of women.
Experimental setup: both the generator and the multi-scale discriminator use ReLU (Rectified Linear Unit) activation functions with the negative slope of the activation set to 0.2, and Adam (adaptive moment estimation) is used as the optimizer with a learning rate of 0.0001. The input image is randomly flipped horizontally with probability 0.5, resized to 286, and randomly cropped to 256 × 256. The generator model and the multi-scale discriminator model are trained for 100000 iterations; the weights of the adversarial loss function, the cycle consistency loss function and the reconstruction loss function are set to 1, 10 and 10 respectively, the transfer rate of the weight transfer mechanism is set to 0.9, and the batch size is set to 1.
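These settings could be expressed as in the sketch below. The placeholder networks stand in for the actual generator and multi-scale discriminator, and the Adam beta values are an assumption, since they are not given above.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# data augmentation: resize to 286, random 256 x 256 crop, horizontal flip with p = 0.5
train_transform = transforms.Compose([
    transforms.Resize(286),
    transforms.RandomCrop(256),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),
])

# placeholder networks standing in for the real generator / multi-scale discriminator
generator = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
discriminator = nn.Sequential(nn.Conv2d(3, 1, 4, stride=2, padding=1), nn.LeakyReLU(0.2))

# Adam optimizers with learning rate 1e-4 (betas are an assumption, not stated in the text)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.999))
```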
Experimental results: evaluated from both qualitative and quantitative perspectives, this embodiment is compared on the three data sets with a plurality of existing mainstream unsupervised image translation methods, namely IEGAN, NICE-GAN, U-GAT-IT, CycleGAN, UNIT and MUNIT. The comparisons cover the data set whose source domain is a kitten and whose target domain is a puppy, the data set whose source domain is a puppy and whose target domain is a kitten, the data set whose source domain is an apple and whose target domain is an orange, and the data set whose source domain is a girl and whose target domain is a boy. For the data set whose source domain is a kitten and whose target domain is a puppy, the comparison between the translated images of this embodiment and those of the existing mainstream unsupervised image translation methods is shown in FIG. 6; for the data set whose source domain is a puppy and whose target domain is a kitten, the comparison is shown in FIG. 7; and for the data set whose source domain is an apple and whose target domain is an orange and the data set whose source domain is a girl and whose target domain is a boy, the comparison is shown in FIG. 8. FIGS. 6, 7 and 8 report the FID and KID values of this embodiment and of the existing mainstream unsupervised image translation methods on the three data sets; the smaller the FID and KID values, the better the generation effect of the synthesized image, i.e. the higher the image quality of the translated image. From these three comparisons, the unsupervised image translation method based on the generative self-attention mechanism has clear advantages over the existing mainstream unsupervised image translation methods, effectively improving the translation quality and translation accuracy of the translated images.
The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; the modifications or substitutions do not make the essence of the corresponding technical solutions deviate from the technical solutions of the embodiments of the present application, and are included in the protection scope of the present application.

Claims (7)

1. An unsupervised image translation method based on a generative self-attention mechanism is characterized by comprising the following steps:
acquiring an image to be translated, inputting the image to be translated into an independent encoder, and performing image preprocessing on the image to be translated to obtain a DSI depth information space of the image to be translated and further obtain a combined and superposed DSI depth information space;
inputting the combined and superposed DSI depth information space into a pre-constructed and trained generator containing a generation type self-attention mechanism to obtain a target translation image corresponding to the image to be translated;
acquiring a target domain image, inputting the target domain image and the target translation image into a pre-constructed and trained multi-scale discriminator containing a generating type self-attention mechanism, judging whether the target translation image is a real image or not, and enabling a generator to generate an image close to the target domain.
2. The unsupervised image translation method based on the generated self-attention mechanism as claimed in claim 1, wherein the independent encoder is configured to perform image preprocessing on the image to be translated, the generator is configured to output a target translation image corresponding to the image to be translated, the multi-scale discriminator is configured to discriminate whether the target translation image corresponding to the image to be translated generated by the generator is false, the target translation image is a false image, and both the image to be translated and the true image are source domain images.
3. The unsupervised image translation method based on the generative self-attention mechanism as claimed in claim 1, wherein the step of constructing the generator with the generative self-attention mechanism comprises:
and constructing a generating self-attention mechanism module, and embedding the generating self-attention mechanism module between a residual error layer module and an upper sampling layer module in the generator to obtain the generator containing the generating self-attention mechanism.
4. The unsupervised image translation method based on the generative self-attention mechanism as claimed in claim 1, wherein the step of constructing the multi-scale discriminator comprising the generative self-attention mechanism comprises:
and inserting the generating self-attention mechanism module in front of a down-sampling layer module of the multi-scale discriminator to obtain the multi-scale discriminator containing the generating self-attention mechanism.
5. The unsupervised image translation method based on the generative self-attention mechanism as claimed in claim 1, wherein the generator containing the generative self-attention mechanism comprises a down-sampling layer module, a modified residual block module, a generative self-attention mechanism module and an up-sampling layer module.
6. The unsupervised image translation method based on the generative self-attention mechanism as claimed in claim 5, wherein the step of obtaining the target translation image corresponding to the image to be translated comprises:
inputting the combined and superposed DSI depth information space into a down-sampling layer module in a pre-constructed and trained generator containing a generating type self-attention mechanism to obtain coding feature mapping of the combined and superposed DSI depth information space;
modifying the residual block of the multi-scale discriminator, and mapping and inputting the coding features to a modified residual block module in a generator to obtain an original domain feature image;
inputting the original domain feature map into a generating type self-attention mechanism network in a generator to obtain feature information of the original domain feature map;
inputting the characteristic information of the original domain characteristic image into an up-sampling layer module in a generator to obtain a target translation image corresponding to the image to be translated, wherein the target translation image is a synthetic image formed by mapping and converting the characteristics of a target domain corresponding to the target translation image.
7. The unsupervised image translation method based on the generative self-attention mechanism as claimed in claim 1, wherein the image preprocessing step for the image to be translated comprises:
performing convolution processing on an image to be translated, extracting image characteristics of the image to be translated, sampling the image characteristics of the image to be translated, combining hidden vectors, and extracting characteristic information of the image to be translated again through convolution operation, wherein the characteristic information is DSI depth information space.
CN202211394182.6A 2022-11-08 Unsupervised image translation method based on generation type self-attention mechanism Active CN115841589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211394182.6A CN115841589B (en) 2022-11-08 Unsupervised image translation method based on generation type self-attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211394182.6A CN115841589B (en) 2022-11-08 Unsupervised image translation method based on generation type self-attention mechanism

Publications (2)

Publication Number Publication Date
CN115841589A true CN115841589A (en) 2023-03-24
CN115841589B CN115841589B (en) 2024-06-21

Family

ID=

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111429340A (en) * 2020-03-25 2020-07-17 山东大学 Cyclic image translation method based on self-attention mechanism
CN112149802A (en) * 2020-09-17 2020-12-29 广西大学 Image content conversion method with consistent semantic structure
CN112163605A (en) * 2020-09-17 2021-01-01 中国石油大学(华东) Multi-domain image translation method based on attention network generation
AU2020103715A4 (en) * 2020-11-27 2021-02-11 Beijing University Of Posts And Telecommunications Method of monocular depth estimation based on joint self-attention mechanism
CN113191445A (en) * 2021-05-16 2021-07-30 中国海洋大学 Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm
CN113837290A (en) * 2021-09-27 2021-12-24 上海大学 Unsupervised unpaired image translation method based on attention generator network
CN114359659A (en) * 2021-12-17 2022-04-15 华南理工大学 Image automatic labeling method, system and medium based on attention disturbance
CN115115870A (en) * 2022-05-24 2022-09-27 武汉工程大学 Image translation method, system, medium and device
CN115034959A (en) * 2022-06-20 2022-09-09 重庆大学 High-definition image translation method based on cross-channel fusion space attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
向晴; 袁健华: "Image translation model based on a multi-level discriminator" (基于多层次判别器的图像翻译模型), 软件 (Software), no. 03, 15 March 2020 (2020-03-15) *
随海亮; 马军山; 李丽莹: "Research on facial expression synthesis based on generative adversarial networks and FACS" (基于生成对抗网络与FACS的面部表情合成研究), 软件导刊 (Software Guide), no. 06, 15 June 2020 (2020-06-15) *

Similar Documents

Publication Publication Date Title
CN113077471B (en) Medical image segmentation method based on U-shaped network
CN111047548B (en) Attitude transformation data processing method and device, computer equipment and storage medium
CN110706302B (en) System and method for synthesizing images by text
CN109816593B (en) Super-resolution image reconstruction method for generating countermeasure network based on attention mechanism
CN111145181B (en) Skeleton CT image three-dimensional segmentation method based on multi-view separation convolutional neural network
CN112819910B (en) Hyperspectral image reconstruction method based on double-ghost attention machine mechanism network
CN113361560B (en) Semantic-based multi-pose virtual fitting method
CN111369522B (en) Light field significance target detection method based on generation of deconvolution neural network
CN115457021A (en) Skin disease image segmentation method and system based on joint attention convolution neural network
CN110782427B (en) Magnetic resonance brain tumor automatic segmentation method based on separable cavity convolution
CN110852935A (en) Image processing method for human face image changing with age
CN113792641A (en) High-resolution lightweight human body posture estimation method combined with multispectral attention mechanism
CN115205672A (en) Remote sensing building semantic segmentation method and system based on multi-scale regional attention
CN114565738A (en) Point cloud completion method based on local geometric consistency and characteristic consistency
CN114298997B (en) Fake picture detection method, fake picture detection device and storage medium
CN115661165A (en) Glioma fusion segmentation system and method based on attention enhancement coding and decoding network
CN114494387A (en) Data set network generation model and fog map generation method
CN116958958A (en) Self-adaptive class-level object attitude estimation method based on graph convolution double-flow shape prior
CN115841589B (en) Unsupervised image translation method based on generation type self-attention mechanism
CN115841589A (en) Unsupervised image translation method based on generation type self-attention mechanism
CN114862696A (en) Facial image restoration method based on contour and semantic guidance
CN113780241A (en) Acceleration method and device for detecting salient object
CN113269702A (en) Low-exposure vein image enhancement method based on cross-scale feature fusion
CN112967295A (en) Image processing method and system based on residual error network and attention mechanism
CN111626923B (en) Image conversion method based on novel attention model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant