CN115115870A - Image translation method, system, medium and device - Google Patents

Image translation method, system, medium and device

Info

Publication number
CN115115870A
CN115115870A
Authority
CN
China
Prior art keywords
image
translation
module
translated
visible light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210571742.4A
Other languages
Chinese (zh)
Inventor
刘红
洪汉玉
马雷
陈冰川
赵凡
罗心怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Institute of Technology
Original Assignee
Wuhan Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Institute of Technology
Priority to CN202210571742.4A
Publication of CN115115870A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of deep-learning image translation, and in particular to an image translation method, system, medium, and device. The method comprises the following steps: inputting a preprocessed image to be translated into a visible light to infrared image translation network to obtain the translation result and the edge information of the image, wherein the network comprises an image translation module and an image edge information supervision module, and the image translation module comprises an attention submodule. According to the invention, a self-attention mechanism is introduced into the generator and the discriminator to guide the visible light to infrared image translation network to focus on the more discriminative regions of the image, and an edge information supervision module is introduced to constrain the edges of the generated image to be consistent with those of the real target-domain image, which helps resolve the edge blurring and misalignment of translated images in existing methods.

Description

Image translation method, system, medium and device
Technical Field
The present invention relates to the field of deep learning image translation, and in particular, to an image translation method, system, medium, and device.
Background
Traditional infrared image generation relies mainly on infrared image simulation software. Because such methods are built on physical models, they involve a large number of parameters and require manual intervention throughout research and application, so they cannot meet the demand for fast, large-scale infrared image generation. With the development of deep learning and the deepening research on image translation for image generation tasks such as image style transfer, image deraining, and image restoration, infrared image generation can also be addressed with deep-learning image translation techniques.
Image translation has been a research hotspot of deep learning in recent years; its purpose is to convert an image from the representation of one domain into the representation of another domain. The Generative Adversarial Network (GAN), proposed in 2014, has received great attention and development at home and abroad owing to its excellent performance in image conversion. A GAN consists of a generator and a discriminator; through a minimax game between the two, the network learns to output synthetic data that fits the distribution of the real data.
The original GAN is difficult to train stably, and the quality of its generated images is limited; the subsequently proposed CycleGAN introduces a cycle-consistency loss on top of the original adversarial loss to improve the structural stability of the translated images. Following the notable success of attention mechanisms on classification tasks, the SAGAN and U-GAT-IT networks introduce attention submodules into the image translation process, so that the network attends to the more discriminative regions during conversion, improving generation quality. Although existing image translation networks perform better and better, the generated images still suffer from edge diffusion, deformation, and instability.
Disclosure of Invention
The invention aims to provide an image translation method, system, medium, and device.
The technical scheme for solving the technical problems is as follows: a visible light to infrared image translation method comprising:
inputting the preprocessed image to be translated into a visible light-infrared image translation network to obtain a translation result and edge information of the image, wherein the visible light-infrared image translation network comprises an image translation module and an image edge information supervision module, and the image translation module comprises an attention submodule.
The invention has the beneficial effects that: a self-attention mechanism is introduced into the generator and the discriminator to guide the visible light to infrared image translation network to focus on the more discriminative regions of the image, so that the image translation is realized better; an edge information supervision module is introduced to constrain the edges of the generated image to be consistent with those of the input image, which alleviates the edge blurring and misalignment of translated images in existing methods.
On the basis of the technical scheme, the invention can be improved as follows.
Further, the preprocessing specifically comprises:
sequentially performing rotation, scaling, cropping and normalization on the image to be translated.
Further, the process of obtaining the translation result specifically comprises:
performing feature extraction on the preprocessed image to be translated through the encoding submodule of the generator in the image translation module to obtain the corresponding feature map, wherein the encoding submodule consists of three downsampling blocks and six residual blocks;
weighting the feature map finally output by the encoding submodule by importance through the attention submodule of the generator to obtain an attention map;
decoding the attention map through the decoding submodule of the generator to obtain the translation result.
Further, the specific process of obtaining the edge information is as follows:
attaching an edge head after the first downsampling block of the encoding submodule of the generator in the image translation module, predicting edge information from the first feature map output by that downsampling block through the edge head, and constraining the predicted edge information through an edge loss function.
Further, the visible light-to-infrared image translation network is trained through a training sample set, optimized through minimizing a total loss function, and tested through a testing sample set.
Another technical solution of the present invention for solving the above technical problems is as follows: a visible-to-infrared image translation system, comprising:
the processing module is used for inputting the preprocessed image to be translated into a visible light-infrared image translation network to obtain a translation result and edge information of the image, the visible light-infrared image translation network comprises an image translation module and an image edge information supervision module, and the image translation module comprises an attention submodule.
The invention has the beneficial effects that: a self-attention mechanism is introduced into the generator and the discriminator to guide the visible light to infrared image translation network to focus on the more discriminative regions of the image, so that the image translation is realized better; an edge information supervision module is introduced to constrain the edges of the generated image to be consistent with those of the input image, which alleviates the edge blurring and misalignment of translated images in existing methods.
Further, the preprocessing specifically comprises:
sequentially performing rotation, scaling, cropping and normalization on the image to be translated.
Further, the process of obtaining the translation result specifically comprises:
performing feature extraction on the preprocessed image to be translated through the encoding submodule of the generator in the image translation module to obtain the corresponding feature map, wherein the encoding submodule consists of three downsampling blocks and six residual blocks;
weighting the feature map finally output by the encoding submodule by importance through the attention submodule of the generator to obtain an attention map;
decoding the attention map through the decoding submodule of the generator to obtain the translation result.
Further, the specific process of obtaining the edge information is as follows:
attaching an edge head after the first downsampling block of the encoding submodule of the generator in the image translation module, predicting edge information from the first feature map output by that downsampling block through the edge head, and constraining the predicted edge information through an edge loss function.
Further, the visible light-to-infrared image translation network is trained through a training sample set, optimized through minimizing a total loss function, and tested through a testing sample set.
Another technical solution of the present invention for solving the above technical problems is as follows: a storage medium having stored therein instructions which, when read by a computer, cause the computer to perform a visible-to-infrared image translation method as in any one of the above.
The invention has the beneficial effects that: a self-attention mechanism is introduced into the generator and the discriminator to guide the visible light to infrared image translation network to focus on the more discriminative regions of the image, so that the image translation is realized better; an edge information supervision module is introduced to constrain the edges of the generated image to be consistent with those of the input image, which alleviates the edge blurring and misalignment of translated images in existing methods.
Another technical solution of the present invention for solving the above technical problems is as follows: an electronic device includes the storage medium and a processor executing instructions in the storage medium.
The beneficial effects of the invention are: a self-attention mechanism is introduced into the generator and the discriminator to guide the visible light to infrared image translation network to focus on the more discriminative regions of the image, so that the image translation is realized better; an edge information supervision module is introduced to constrain the edges of the generated image to be consistent with those of the input image, which alleviates the edge blurring and misalignment of translated images in existing methods.
Drawings
FIG. 1 is a flow chart of a method for translating a visible light image into an infrared image according to an embodiment of the present invention;
FIG. 2 is a block diagram of a visible light to infrared image translation system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a model for unidirectional conversion from a source domain (visible light) to a target domain (infrared) according to an embodiment of the visible-to-infrared image translation method of the present invention;
FIG. 4 is a schematic diagram of the self-attention mechanism in the attention submodule according to an embodiment of the visible light to infrared image translation method of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with examples, which are provided to illustrate the invention and are not to be construed as limiting its scope.
As shown in fig. 1, a method for translating a visible light image into an infrared image includes:
step 1, preprocessing an image to be translated;
and 2, inputting the preprocessed image to be translated into a visible light-infrared image translation network to obtain a translation result and edge information of the image, wherein the visible light-infrared image translation network comprises an image translation module and an image edge information supervision module, and the image translation module comprises an attention sub-module.
In some possible embodiments, a self-attention mechanism is introduced into the generator and the discriminator to guide the visible light to infrared image translation network to focus on the more discriminative regions of the image, so that the image translation is realized better; an edge information supervision module is introduced to constrain the edges of the generated image to be consistent with those of the input image, which alleviates the edge blurring and misalignment of translated images in existing methods.
It should be noted that, since the total loss function introduces a cycle-consistency loss, the system has two translation directions. One is the translation (i.e., conversion) from a visible light image to an infrared image, in which case the image to be translated is a visible light image; the other is the translation from an infrared image to a visible light image, in which case the image to be translated is an infrared image.
The specific process of preprocessing the data set comprises the following steps:
performing data preprocessing operations such as random horizontal flipping, random scaling, random cropping, and normalization on the image to be translated.
Specifically, S11: randomly flip the original image horizontally;
S12: scale the image data obtained in S11 to a size of 286 × 286;
S13: randomly crop the image data obtained in S12 to a size of 256 × 256.
The benefit of preprocessing is that it generates similar but not identical samples from the image data set, which enlarges the training set, suppresses overfitting, and improves the generalization ability of the model.
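As an illustration, the S11 to S13 pipeline could be sketched with torchvision as follows; the normalization statistics (mean and standard deviation of 0.5 per channel, mapping images to [-1, 1]) are an assumption, since the patent names only the operations and the 286/256 sizes.

```python
import torchvision.transforms as T

# Sketch of the S11-S13 preprocessing pipeline described above.
preprocess = T.Compose([
    T.RandomHorizontalFlip(p=0.5),   # S11: random horizontal flip
    T.Resize((286, 286)),            # S12: scale to 286 x 286
    T.RandomCrop(256),               # S13: random crop to 256 x 256
    T.ToTensor(),
    T.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),  # assumed stats
])
```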
For the image translation module:
as shown in fig. 3, the image translation module is composed of a generator and a discriminator, each of which introduces a self-attention mechanism;
The generator converts the input image (the preprocessed image to be translated) into a synthetic image that fits the data distribution of the target domain, and the discriminator judges whether its input, namely the synthetic image from the generator or the real image of the target domain (the target image preprocessed in the same way), is real or fake and outputs the discrimination probability. The discrimination probability is used for the optimization training of the visible light to infrared image translation network. The discriminator comprises an encoding submodule, an attention submodule and a classification submodule;
the encoding submodule of the generator performs feature extraction on the input image. The coding sub-module consists of three down-sampling blocks and six residual blocks, and the input image respectively obtains a characteristic spectrum F through the three down-sampling blocks e1 、F e2 、F e3 The channel dimensions corresponding to the three characteristic spectrums are respectively 64, 128 and 256, in order to increase the network depth, a residual layer composed of six residual blocks is stacked behind a down-sampling layer, and the output characteristic spectrum F is not changed by the residual layer e4 (the preprocessed input image outputs a first characteristic spectrum under the action of a first downsampling block of an encoding module; the first characteristic spectrum outputs a second characteristic spectrum under the action of a second downsampling block; the second characteristic spectrum outputs a third characteristic spectrum under the action of a third downsampling block; the third characteristic spectrum outputs the characteristic spectrum under the action of a residual error layer; namely, a fourth characteristic spectrum), wherein the channel dimension is still 256;
attention submodule pair feature spectrum F of generator e4 Importance weighting is performed. Firstly, to the characteristic spectrum F e4 Obtaining a self-attention spectrum F using a self-attention mechanism sa The principle of the self-attention mechanism is shown in fig. 4. In particular, the size is R C×H×W Characteristic spectrum F of e4 Respectively obtaining Q, K, V three vectors by three convolution kernels with the size of 1 multiplied by 1, wherein
Figure BDA0003659384980000071
、V∈R C×H×W And C is s C/4; then the vector is processed
Figure BDA0003659384980000072
Carrying out recombination and transposition operations to obtain a vector
Figure BDA0003659384980000073
And a vector of
Figure BDA0003659384980000074
、V∈R C×H×W Then a new vector is obtained through adaptive average pooling
Figure BDA0003659384980000075
、V′∈R C×(H′×W′) (ii) a Then multiplying the vector Q ' by the vector K ' and performing softmax operation to obtain an attention weight matrix, and multiplying the obtained attention weight by the vector V ' to obtain a self-attention spectrum F sa (ii) a Immediately after to the self-attention spectrum F sa Respectively obtaining global characteristic spectrum F by adopting global average pooling and maximum pooling gap And F gmp ,F gap And F gmp Simultaneously followed by a full link layer to obtain two classification outputs L gap And L gmp The full link layer corresponding parameter (which is initialized randomly by the network and is updated as the network is optimized) is used as the self-attention spectrum F sa Is re-weighted to obtain a weighted feature spectrum F' gap And F' gmp Prepared from F' gap And F' gmp Spliced together according to channel dimensions to obtain an attention spectrum F a (ii) a In addition to the attention weighting operation of the feature spectrum described above, two classification outputs L gap And L gmp Splicing according to the channel dimensions to obtain a final classification output result, wherein the classification output has the function of enabling a generator to pay more attention to more discriminative positions in an image;
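A minimal PyTorch sketch of this pooled self-attention is given below; the pooled spatial size H' × W' (here 8 × 8) is an assumption, while the 1 × 1 convolutions, C_s = C/4, and the Q'K' softmax followed by multiplication with V' follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PooledSelfAttention(nn.Module):
    """Self-attention as described above: 1x1 convs produce Q, K, V with
    C_s = C/4; K and V are shrunk by adaptive average pooling before the
    softmax attention. The pooled size is an assumption."""
    def __init__(self, c, pooled=8):
        super().__init__()
        cs = c // 4
        self.q = nn.Conv2d(c, cs, 1)
        self.k = nn.Conv2d(c, cs, 1)
        self.v = nn.Conv2d(c, c, 1)
        self.pooled = pooled

    def forward(self, f_e4):
        b, c, h, w = f_e4.shape
        q = self.q(f_e4).flatten(2).transpose(1, 2)           # Q': (B, H*W, C_s)
        k = F.adaptive_avg_pool2d(self.k(f_e4), self.pooled)  # K': (B, C_s, H', W')
        v = F.adaptive_avg_pool2d(self.v(f_e4), self.pooled)  # V': (B, C, H', W')
        k = k.flatten(2)                                      # (B, C_s, H'*W')
        v = v.flatten(2)                                      # (B, C, H'*W')
        attn = torch.softmax(q @ k, dim=-1)                   # attention weights
        f_sa = (v @ attn.transpose(1, 2)).view(b, c, h, w)    # self-attention map F_sa
        return f_sa
```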
decoding submodule pair attention spectrum F of generator a And decoding to obtain a translation image. The decoding submodule is composed of a residual block with adaptive example/layer normalization and an up-sampling layer, and specifically an attention spectrum F a Sending to a multilayer perceptron (MLP) to obtain parameters gamma and beta in normalization operation, and enabling the network to adaptively select case normalization or layer normalization to obtain a normalized feature spectrum F norm The purpose of the normalization operation is to enable the network to adaptively learn the corresponding changes of shape, color, texture and the like in image translation; characteristic spectrum F norm Reconstructing the image through an upper sampling layer to finally obtain a translation image with the same resolution as the input image;
output of generator by encoding submodule of discriminatorAnd (3) performing feature extraction on the translated image or the real image of the target domain (a visible light image and an infrared image are provided in a data set in pairs, and when the translation direction is from visible light to infrared image, the real image of the target domain is the infrared image provided by the data set and preprocessed. The encoding sub-module of the discriminator contains only a down-sampling layer, which consists of four down-sampling blocks. The image input into the discriminator is respectively processed by four down-sampling blocks to obtain a characteristic spectrum F D1 、F D2 、F D3 、F D4 The channel dimensions corresponding to the four characteristic spectrums are 64, 128, 256 and 512 respectively;
attention submodule pair characteristic spectrum F of discriminator D4 Importance weighting is performed. Firstly, to the characteristic spectrum F D4 Obtaining a self-attention spectrum F by applying a self-attention mechanism Dsa Immediately following the self-attention spectrum F Dsa Respectively obtaining global characteristic spectrum F by adopting global average pooling and maximum pooling Dgap And F Dgmp ,F Dgap And F Dgmp Two classification outputs L are obtained through the full connection layer Dgap And L Dgmp Taking the corresponding parameter of the full connection layer as a self-attention spectrum F Dsa Is re-weighted to obtain a weighted feature spectrum F' Dgap And F' Dgmp Prepared from F' Dgap And F' Dgmp Spliced together according to channel dimensions to obtain an attention spectrum F Da (ii) a In addition to the attention weighting operation of the feature spectrum described above, two classification outputs L Dgap And L Dgmp Splicing according to the channel dimensions to obtain a final classification output result, wherein the classification output has the function of enabling a discriminator network to pay more attention to more discriminative positions in an image;
the classification submodule of the discriminator combines the attention spectrum F Da The channel dimension of (2) is reduced to 1, and the authenticity probability of the input image of the discriminator is obtained (after the channel dimension is reduced to 1, the corresponding authenticity probability value is obtained).
The benefit of the translation module is that, when the generator converts the image and the discriminator judges it, the network can attend to the regions that are more discriminative for the translation, which improves the performance of both the generator and the discriminator, so the translated image is of higher quality and closer to the target image.
Aiming at the edge information supervision module:
in order to keep the translated image consistent (i.e. aligned) with the edge of the input image, an edge information supervision module is introduced on the basis of a translation network;
the first downsampling block of the slave generator coding module is followed by an edge prediction unit, the purpose of which is to predict the feature spectrum F e1 Corresponding edge information; the supervision information corresponding to the predicted edge information is the edge information of an image corresponding to a target domain (because the total loss function introduces cyclic loss, the system has two translation directions, one is the translation (namely conversion) from a visible light image to an infrared image, the corresponding target domain is the infrared domain, the other direction is the translation from the infrared image to the visible light image, the corresponding target domain is the visible light domain), and the supervision information is obtained by adopting a Sobel operator for the target domain image.
The edge information monitoring module has the advantages that the edge information monitoring module is beneficial to a network to translate, and meanwhile, the guiding system pays attention to the consistency of edges in the translation process, so that the outline edges of the translated images are clearer, more regular and more consistent, and the integral translation quality of the images is improved.
Training of a visible light-infrared image translation network:
first, the overall loss function of the network is defined. The overall loss function of the network is a joint function consisting of the generation penalty, the round-robin penalty, the consistency penalty, the classification penalty, and the edge penalty. The distribution of the generated image is continuously fitted with the distribution of the corresponding output domain by resisting loss, the stability of the network conversion image in structure can be improved by the cycle consistency loss, the convergence speed can be improved by the consistency loss, the network can be better distinguished by the BCE classification loss, the edge information of the generated image is guided to be consistent with the edge information of the target image by the edge loss through the Dice loss. The overall loss function of the network is:
L=p 1 L gan +p 2 L cyc +p 3 L iden +p 4 L class +p 5 L edge (1)
wherein p is 1 ~p 5 For corresponding generation of antagonistic losses L gan Cyclic loss L cyc Loss of consistency L iden Class loss L class And loss of edge information L edge All the proportionality coefficients are hyper-parameters. The optimization of the network includes both the source domain to target domain and target domain to source domain conversion. The respective loss terms corresponding to equation (1) when the source domain is converted into the target domain are expressed as follows:
L_gan^(se→tt) = E_(y~X_tt)[log D_tt(y)] + E_(x~X_se)[log(1 − D_tt(G_se→tt(x)))]    (2)

L_cyc^(se→tt) = E_(x~X_se)[‖G_tt→se(G_se→tt(x)) − x‖_1]    (3)

L_iden^(se→tt) = E_(y~X_tt)[‖G_se→tt(y) − y‖_1]    (4)

L_class^(G,se→tt) = −E_(x~X_se)[log η_se(x)] − E_(y~X_tt)[log(1 − η_se(y))]    (5)

L_class^(D,se→tt) = −E_(y~X_tt)[log η_tt^D(y)] − E_(x~X_se)[log(1 − η_tt^D(G_se→tt(x)))]    (6)

L_edge^(se→tt) = 1 − (2 Σ_(i=1)^(Pixel) e_i ê_i + σ) / (Σ_(i=1)^(Pixel) e_i + Σ_(i=1)^(Pixel) ê_i + σ)    (7)

where x and y are samples of the source domain X_se and the target domain X_tt, se → tt indicates that the translation direction is from the source domain to the target domain, and E denotes the expectation. In formulas (5) and (6), η_se(x) and η_tt^D(x) are the final classification outputs of the attention module of the generator G_se→tt and of the discriminator D_tt, respectively. In equation (7), ê_i denotes pixel i of the predicted edge map, e_i denotes pixel i of the edge information of the real target-domain image, Pixel denotes the total number of pixels, and σ is a Laplacian smoothing term with value 1. The expressions of the loss terms for the conversion from the target domain to the source domain are similar to the above and are given below. (An optimal model is obtained after the network is optimized, and this optimal model is used as the final model for testing the sample set to be tested.)
The loss terms of equation (1) for the conversion from the target domain to the source domain are expressed as follows:

L_gan^(tt→se) = E_(y~X_se)[log D_se(y)] + E_(x~X_tt)[log(1 − D_se(G_tt→se(x)))]    (8)

L_cyc^(tt→se) = E_(x~X_tt)[‖G_se→tt(G_tt→se(x)) − x‖_1]    (9)

L_iden^(tt→se) = E_(y~X_se)[‖G_tt→se(y) − y‖_1]    (10)

L_class^(G,tt→se) = −E_(x~X_tt)[log η_tt(x)] − E_(y~X_se)[log(1 − η_tt(y))]    (11)

L_class^(D,tt→se) = −E_(y~X_se)[log η_se^D(y)] − E_(x~X_tt)[log(1 − η_se^D(G_tt→se(x)))]    (12)

L_edge^(tt→se) = 1 − (2 Σ_(i=1)^(Pixel) e_i ê_i + σ) / (Σ_(i=1)^(Pixel) e_i + Σ_(i=1)^(Pixel) ê_i + σ)    (13)

where x and y are samples of the source domain X_se and the target domain X_tt, and tt → se indicates that the translation direction is from the target domain to the source domain. In formulas (11) and (12), η_tt(x) and η_se^D(x) are the final classification outputs of the attention module of the generator G_tt→se and of the discriminator D_se, respectively. In equation (13), ê_i denotes pixel i of the predicted edge map, e_i denotes pixel i of the edge information of the real source-domain image, Pixel denotes the total number of pixels, and σ is a Laplacian smoothing term with value 1.
The benefit is that the adversarial loss continuously fits the distribution of the generated images to the distribution of the corresponding output domain (source or target); the cycle-consistency loss improves the structural stability of the translated images; the consistency loss speeds up convergence; the BCE classification loss guides the network to better distinguish source-domain and target-domain images; and the edge loss, implemented as a Dice loss, guides the edge information of the generated image to be consistent with that of the target image.
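As a sketch, the edge loss of equations (7) and (13) and the Sobel-based supervision could be implemented as follows; the grayscale conversion and the normalization of the gradient magnitude are assumptions.

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def sobel_edges(img):
    """Supervision edges: Sobel gradient magnitude of the target image,
    normalized to [0, 1]. Using the channel mean as grayscale is an assumption."""
    gray = img.mean(dim=1, keepdim=True)
    gx = F.conv2d(gray, SOBEL_X.to(img.device), padding=1)
    gy = F.conv2d(gray, SOBEL_Y.to(img.device), padding=1)
    mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    return mag / (mag.amax(dim=(2, 3), keepdim=True) + 1e-8)

def dice_edge_loss(pred_edges, target_edges, sigma=1.0):
    """Equations (7)/(13): 1 - (2*sum(e*e_hat) + sigma) / (sum(e) + sum(e_hat) + sigma),
    with sigma = 1 as the Laplacian smoothing term."""
    e_hat = pred_edges.flatten(1)
    e = target_edges.flatten(1)
    inter = (e * e_hat).sum(dim=1)
    loss = 1.0 - (2.0 * inter + sigma) / (e.sum(dim=1) + e_hat.sum(dim=1) + sigma)
    return loss.mean()
```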
Then, the visible light to infrared image translation network is optimized with respect to the total loss function on the training sample set (the public data set LLVIP downloaded from the Internet comprises a training sample set and a testing sample set), using stochastic gradient descent and back-propagation with automatic differentiation (i.e., the network is optimized by minimizing the total loss function);
The network is then tested with the testing sample set on the basis of the trained weights (i.e., the testing sample set is fed into the optimized generator G_se→tt to produce the output translated images).
The network is optimized by minimizing each loss term in the total loss function, using the automatic differentiation facilities provided in PyTorch together with algorithms based on stochastic gradient descent and back-propagation; testing is then performed with the testing sample set using the trained weights.
The benefit is that the network parameters are optimized through stochastic gradient descent, the back-propagation algorithm and the loss function; the test set is then used to run tests based on the training-set weights.
The image translation module with the self-attention mechanism consists of a generator and a discriminator, each of which introduces the self-attention mechanism. Because the image translation system has two translation directions, from the source domain (visible light) to the target domain (infrared) and from the target domain (infrared) to the source domain (visible light), the system has one generator and one discriminator for each direction. The encoding module of the generator extracts features from the input image; the introduced self-attention mechanism makes the system attend to the regions that change most during translation; and the decoding module with adaptive instance-layer normalization guides the system, while reconstructing the image, to adaptively learn the changes of shape, color, texture and the like during translation;
The image edge information supervision module attaches an edge head in the encoding module of the generator to predict edge information; by supervising this prediction with a loss against the edge information of the corresponding output-domain image extracted by the Sobel operator, it guides the system to attend to edge consistency during translation, so that the contour edges of the translated image are clearer and more regular;
The loss function module defines the total loss function of the network, which consists of the generation adversarial loss, the cycle loss, the consistency loss, the classification loss and the edge loss;
The training and testing module optimizes the visible light to infrared image translation network with the training sample set and tests it with the testing sample set.
An electronic device comprising a processor, a memory, and a program stored on the memory implements the steps of the visible light to infrared image translation method when the program is executed by the processor.
The embodiment of the invention tests the proposed algorithm on the public data set LLVIP. The LLVIP data set is a visible-infrared paired data set for low-light vision. It contains 33672 images, i.e., 16836 visible-infrared image pairs, most of which were taken in very dark scenes; all images are strictly aligned in time and space.
The method provided by the invention is implemented with an NVIDIA TITAN V GPU and the open-source machine learning library PyTorch. During training, the initial learning rate is set to 10^-4, the batch size (batch-size) is set to 16, and the number of iterations is 1000000; the optimal model parameters are saved after training is completed.
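A skeleton of one training iteration under these settings might look as follows; the Adam optimizer, the alternating update order, and the helper names (total_loss, G_se2tt, etc.) are assumptions and placeholders, since the patent states only optimization by stochastic gradient descent and back-propagation.

```python
import torch

# Hypothetical names: G_se2tt/G_tt2se (generators), D_tt/D_se (discriminators),
# total_loss() combining equations (1)-(13). The 1e-4 rate follows the reported
# initial learning rate; the optimizer choice is an assumption.
def make_optimizers(G_se2tt, G_tt2se, D_tt, D_se, lr=1e-4):
    opt_G = torch.optim.Adam(list(G_se2tt.parameters()) + list(G_tt2se.parameters()), lr=lr)
    opt_D = torch.optim.Adam(list(D_tt.parameters()) + list(D_se.parameters()), lr=lr)
    return opt_G, opt_D

def train_step(batch_vis, batch_ir, models, opt_G, opt_D, total_loss):
    # One alternating update: discriminators first, then generators,
    # each by minimizing its share of the total loss function of equation (1).
    loss_D = total_loss(batch_vis, batch_ir, models, part="discriminator")
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    loss_G = total_loss(batch_vis, batch_ir, models, part="generator")
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_G.item(), loss_D.item()
```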
Example 1, described with reference to fig. 1 and 3. S1: perform data preprocessing operations such as random horizontal flipping, scaling, cropping and normalization on the image to be translated (because the objective function introduces a cycle loss, the system has two translation directions: one is the translation, i.e., conversion, from a visible light image to an infrared image, for which the corresponding input image is a visible light image; the other is the translation from an infrared image to a visible light image, for which the corresponding input image is an infrared image);
s2: construct the image translation module. The construction of the image translation module includes the construction of a generator and a discriminator. The generator converts the preprocessed input image of step S1 (i.e., the image to be translated) into a translated image fitting the data distribution of the target domain (the input image is converted into the translated image under the action of the encoding module, the attention module and the decoding module of the generator). The discriminator performs true/false discrimination on its input, namely the translated image obtained by the generator or the real image of the target domain (i.e., the target image preprocessed in step S1): under the action of the encoding module, the attention module and the classification module of the discriminator network, the input yields a probability value judging whether the image is real or synthesized (i.e., fake), where a probability value of 1 indicates that the image is real and a value of 0 indicates that the image is synthesized (fake) data. The true/false discrimination is between the translated image and the corresponding real target-domain image: when the translation direction is from visible light to infrared, it is the discrimination between the infrared image generated by the generator and the real infrared image preprocessed in step S1; when the translation direction is from infrared to visible light, it is the discrimination between the generated visible light image and the real visible light image preprocessed in S1. The discrimination probability is then output. (The discrimination probability is used in the generation adversarial loss L_gan of the total loss function.) The generator comprises an encoding module, an attention module and a decoding module, and the discriminator comprises an encoding module, an attention module and a classification module;
s3: construct the image edge information supervision module. An edge head (a network formed by two convolution layers, a batch normalization layer, an activation layer and an upsampling layer) is connected to the output feature information of the first downsampling layer of the generator's encoding module to perform edge information prediction (the output feature of the first downsampling layer is fed into the edge head, which outputs the predicted edge information; that is, the edge head is used to generate edge information). The corresponding supervision information (which can be understood as the target of the edge prediction) is the edge information of the real target-domain image extracted by the Sobel operator. (This step does not use the discrimination probability of step S2; its purpose is to constrain the generator to reduce the loss of edge information during encoding.)
Edge information prediction is performed while the generator produces the translated image. The whole visible light to infrared image translation network uses the predicted edge information to constrain the generator to preserve edge information during encoding (feature extraction): the output feature of the first downsampling layer must allow the edge information to be recovered through the edge head, which requires that feature to retain the edge information, so the predicted edge information plays a constraining role. It is the entire visible light to infrared image translation network that is optimized by minimizing the total loss function in S4. Each optimization of the network means that the network parameters are updated and the updated network produces a new output (translated image); as the number of optimization steps increases, the translated image fits the target-domain data distribution better and better, i.e., the quality of the translated image keeps rising. Finally, the optimal model (with the highest translated-image quality) is selected as the final network model, and the input image is fed into it to obtain the final translated image.
The predicted edge information is obtained through the edge head, and the corresponding target-domain image (when the direction is from visible light to infrared, this is the preprocessed infrared image provided by the data set) is processed by the Sobel operator to obtain the supervision information. Both the predicted edge information and the corresponding supervision information are used in the subsequent edge loss, whose effect is to drive the predicted edge information as close as possible to the corresponding supervision information.
Both the edge information and the target information are feature maps, whose visual appearance is an edge map.
The predicted edge information must be given supervision information (the ground truth, as it is often called in deep learning) so that the edge head knows what it should output. Without the supervision information, the output of the edge head would be meaningless, since the network would not know that it should output an edge map. The edge loss requires both the predicted edge information and its corresponding supervision information.
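For illustration, the edge head described above (two convolution layers, a batch normalization layer, an activation layer and an upsampling layer) could be sketched as follows; the channel widths, the upsampling factor and the sigmoid output are assumptions.

```python
import torch.nn as nn

class EdgeHead(nn.Module):
    """Edge head attached after the first downsampling block: two conv layers,
    a batch-norm layer, an activation and an upsampling layer, producing a
    one-channel edge map. Widths and scale are assumptions."""
    def __init__(self, in_ch=64, mid_ch=32, scale=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, 1, 3, padding=1),
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
            nn.Sigmoid(),  # edge probabilities in [0, 1] for the Dice loss
        )

    def forward(self, f_e1):
        return self.head(f_e1)
```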
S4: define the total loss function of the network (the total loss function of the visible light to infrared image translation network; the whole network comprises the image translation module of S2 and the image edge information supervision module of S3). The total loss function of the network is a combined function consisting of the generation adversarial loss, the cycle loss, the consistency loss, the classification loss and the edge loss;
Apart from the edge loss, which serves the edge information supervision module, the other losses serve the image translation module. The input of each loss can be read off its formula; e.g., D_tt(G_se→tt(x)) means that the translated image G_se→tt(x) is fed into the discriminator D_tt to obtain the discrimination probability.
S5: optimize the visible light to infrared image translation network (composed of both the image translation module of step S2 and the image edge information supervision module of step S3) with the training sample set, and test it with the testing sample set (the public data set LLVIP downloaded from the Internet comprises a training sample set and a testing sample set).
(The training sample set is fed into the visible light to infrared image translation network for training, and the network is optimized according to the total loss function of S4. The workflow of the whole system is as follows: first the image preprocessing of S1 is performed; then the preprocessed input image obtained in step S1 is fed into the visible light to infrared image translation network, which comprises the image translation module of S2 and the image edge information supervision module of S3, for image translation. After the whole network is constructed from these two modules, its total loss function is defined as in S4; the total loss function is a joint function of the several losses of S4, and training or optimizing a neural network is the process of minimizing its loss function: the smaller the loss value, the closer the corresponding predictions are to the actual results. Next, as described in S5, the training sample set is fed into the network for training and the network is optimized according to the total loss function of S4; after the network has been optimized, the testing sample set is fed into the optimized network model for testing.)
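After training, inference in the visible-to-infrared direction could look like the following sketch; the file and checkpoint names are placeholders, and a deterministic test-time preprocessing (no random flipping or cropping) is assumed.

```python
import torch
import torchvision.transforms as T
from PIL import Image

# Deterministic test-time preprocessing (assumed; training uses random ops).
to_tensor = T.Compose([
    T.Resize((256, 256)),
    T.ToTensor(),
    T.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

G_se2tt = torch.load("best_G_se2tt.pt")  # hypothetical saved optimal generator
G_se2tt.eval()
x = to_tensor(Image.open("visible.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    infrared = G_se2tt(x)  # translated infrared image tensor
```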
Preferably, in any of the above embodiments, the preprocessing specifically includes:
sequentially performing rotation, scaling, cropping and normalization on the image to be translated.
Preferably, in any of the above embodiments, the process of obtaining the translation result specifically includes:
performing feature extraction on the preprocessed image to be translated through the encoding submodule of the generator in the image translation module to obtain the corresponding feature map, wherein the encoding submodule consists of three downsampling blocks and six residual blocks;
weighting the feature map finally output by the encoding submodule by importance through the attention submodule of the generator to obtain an attention map;
decoding the attention map through the decoding submodule of the generator to obtain the translation result.
Preferably, in any of the above embodiments, the specific process of obtaining the edge information is:
attaching an edge head after the first downsampling block of the encoding submodule of the generator in the image translation module, predicting edge information from the first feature map output by that downsampling block through the edge head, and constraining the predicted edge information through an edge loss function.
Preferably, in any of the above embodiments, the visible light to infrared image translation network is trained through a training sample set, optimized by minimizing a total loss function, and tested through a testing sample set.
As shown in fig. 2, a visible light to infrared image translation system includes:
the acquiring module 100 is configured to pre-process an image to be translated to obtain a pre-processed image to be translated;
the processing module 200 is configured to input the preprocessed image to be translated into a visible light-to-infrared image translation network to obtain a translation result and edge information of the image, where the visible light-to-infrared image translation network includes an image translation module and an image edge information supervision module, and the image translation module includes an attention submodule.
The invention has the beneficial effects that: a self-attention mechanism is introduced into the generator and the discriminator to guide the visible light to infrared image translation network to focus on the more discriminative regions of the image, so that the image translation is realized better; an edge information supervision module is introduced to constrain the edges of the generated image to be consistent with those of the input image, which alleviates the edge blurring and misalignment of translated images in existing methods.
Preferably, in any of the above embodiments, the obtaining module 100 is specifically configured to:
sequentially rotate, scale, crop and normalize the image to be translated to obtain the preprocessed image to be translated.
Preferably, in any of the above embodiments, the process of obtaining the translation result specifically includes:
performing feature extraction on the preprocessed image to be translated through the encoding submodule of the generator in the image translation module to obtain the corresponding feature map, wherein the encoding submodule consists of three downsampling blocks and six residual blocks;
weighting the feature map finally output by the encoding submodule by importance through the attention submodule of the generator to obtain an attention map;
decoding the attention map through the decoding submodule of the generator to obtain the translation result.
Preferably, in any of the above embodiments, the specific process of obtaining the edge information is:
attaching an edge head after the first downsampling block of the encoding submodule of the generator in the image translation module, predicting edge information from the first feature map output by that downsampling block through the edge head, and constraining the predicted edge information through an edge loss function.
Preferably, in any of the above embodiments, the visible light to infrared image translation network is trained through a training sample set, optimized by minimizing a total loss function, and tested through a testing sample set.
Another technical solution of the present invention for solving the above technical problems is as follows: a storage medium having stored therein instructions which, when read by a computer, cause the computer to carry out a visible-to-infrared image translation method as in any one of the above.
In some possible embodiments, a self-attention mechanism is introduced into the generator and the discriminator to guide the visible light to infrared image translation network to focus on the more discriminative regions of the image, so that the image translation is realized better; an edge information supervision module is introduced to constrain the edges of the generated image to be consistent with those of the input image, which alleviates the edge blurring and misalignment of translated images in existing methods.
Another technical solution of the present invention for solving the above technical problems is as follows: an electronic device includes the storage medium and a processor executing instructions in the storage medium.
In some possible embodiments, the self-attention mechanism is introduced into the generator and the discriminator to guide the visible light to infrared image translation network to focus on the more discriminative regions of the image, so that the image translation is realized better; the edge information supervision module is introduced to constrain the edges of the generated image to be consistent with those of the input image, which alleviates the edge blurring and misalignment of translated images in existing methods.
The reader should understand that in the description of this specification, reference to the description of the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the method embodiments described above are merely illustrative; the division into steps is only a logical functional division, and there may be other divisions in actual implementation: multiple steps may be combined or integrated into another step, or some features may be omitted or not performed.
The above method, if implemented in the form of software functional units and sold or used as a stand-alone product, can be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for translating a visible light image into an infrared image is characterized by comprising the following steps:
step 1, preprocessing an image to be translated to obtain a preprocessed image to be translated;
and 2, inputting the preprocessed image to be translated into a visible light-infrared image translation network to obtain a translation result and edge information of the image, wherein the visible light-infrared image translation network comprises an image translation module and an image edge information supervision module, and the image translation module comprises an attention sub-module.
2. The method for translating a visible light image into an infrared image according to claim 1, wherein the step 1 specifically comprises:
and sequentially rotating, zooming, cutting and normalizing the image to be translated to obtain the preprocessed image to be translated.
3. The method for translating a visible light image into an infrared image according to claim 1, wherein the process of obtaining the translation result specifically comprises:
performing feature extraction on the preprocessed image to be translated through the encoding submodule of the generator in the image translation module to obtain the corresponding feature map, wherein the encoding submodule consists of three downsampling blocks and six residual blocks;
weighting the feature map finally output by the encoding submodule by importance through the attention submodule of the generator to obtain an attention map;
decoding the attention map through the decoding submodule of the generator to obtain the translation result.
4. The method for translating a visible light image into an infrared image according to claim 1, wherein the specific process of obtaining the edge information comprises:
and after a first downsampling block of a coding submodule in a generator in the image translation module, externally connecting an edge head, performing edge information prediction on a first characteristic spectrum obtained after the first downsampling block is processed through the edge head, and constraining predicted edge information through an edge loss function.
5. The visible-to-infrared image translation method of claim 1, wherein the visible-to-infrared image translation network is trained through a training sample set, optimized by minimizing a total loss function, and tested through a testing sample set.
6. A visible-to-infrared image translation system, comprising:
the acquisition module is used for preprocessing the image to be translated to obtain a preprocessed image to be translated;
the processing module is used for inputting the preprocessed image to be translated into a visible light-infrared image translation network to obtain a translation result and edge information of the image, the visible light-infrared image translation network comprises an image translation module and an image edge information supervision module, and the image translation module comprises an attention submodule.
7. The system for translating a visible light image into an infrared image according to claim 6, wherein the acquiring module specifically comprises:
and sequentially rotating, zooming, cutting and normalizing the image to be translated to obtain the preprocessed image to be translated.
8. The system for translating a visible light image into an infrared image according to claim 6, wherein the process of obtaining the translation result specifically comprises:
performing feature extraction on the preprocessed image to be translated through the encoding submodule of the generator in the image translation module to obtain the corresponding feature map, wherein the encoding submodule consists of three downsampling blocks and six residual blocks;
weighting the feature map finally output by the encoding submodule by importance through the attention submodule of the generator to obtain an attention map;
decoding the attention map through the decoding submodule of the generator to obtain the translation result.
9. A medium having stored therein instructions which, when read by a computer, cause the computer to execute a visible-to-infrared image translation method according to any one of claims 1 to 5.
10. A device comprising the storage medium of claim 9 and a processor that executes the instructions within the storage medium.
CN202210571742.4A 2022-05-24 2022-05-24 Image translation method, system, medium and device Pending CN115115870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210571742.4A CN115115870A (en) 2022-05-24 2022-05-24 Image translation method, system, medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210571742.4A CN115115870A (en) 2022-05-24 2022-05-24 Image translation method, system, medium and device

Publications (1)

Publication Number Publication Date
CN115115870A true CN115115870A (en) 2022-09-27

Family

ID=83326055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210571742.4A Pending CN115115870A (en) 2022-05-24 2022-05-24 Image translation method, system, medium and device

Country Status (1)

Country Link
CN (1) CN115115870A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115841589A (en) * 2022-11-08 2023-03-24 河南大学 Unsupervised image translation method based on generation type self-attention mechanism
CN115841589B (en) * 2022-11-08 2024-06-21 河南大学 Unsupervised image translation method based on generation type self-attention mechanism


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination