Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence research covers the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive subject covering a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-domain interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
With the research and progress of artificial intelligence technology, artificial intelligence has been developed and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart medical care, and smart customer service.
The scheme provided by the embodiments of the present application relates to artificial intelligence technologies such as machine learning, and is specifically explained by the following embodiments.
The processing method of the image generator provided by the present application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 acquires an image sample set, an image generator, and a mutual information discriminator, wherein the image sample set includes a source domain image sample and a reference image sample, and uploads them to the server 104. The server 104 generates, through the image generator, a target generated image of the source domain image sample in the target domain; extracts a first content feature of the source domain image sample, a second content feature of the reference image sample, and a third content feature of the target generated image; generates a positive sample according to the first content feature and the third content feature, and generates a negative sample according to the second content feature and at least one of the first content feature and the third content feature; and inputs the positive sample and the negative sample into the mutual information discriminator, performs iterative adversarial training on the image generator and the mutual information discriminator, and iteratively maximizes the mutual information of the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached.
The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud storage, network services, cloud communication, big data, and artificial intelligence platforms. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.
The image generation method provided by the present application can also be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 acquires an image to be migrated and uploads it to the server 104. Upon acquiring the image to be migrated, the server 104 determines the source domain to which the image belongs and the target domain to which it is to be migrated; queries an image generator for migrating images belonging to the source domain to the target domain; and generates, through the image generator, a migration image of the image to be migrated in the target domain. The content features of the migration image and the image to be migrated are the same. The image generator is obtained through iterative adversarial training with a mutual information discriminator, in which a target parameter is iteratively maximized; the target parameter is the mutual information between the content features of a source domain image sample and the content features of a generated image sample, the source domain image sample belonging to the source domain and the generated image sample belonging to the target domain and being generated from the source domain image sample by the image generator.
In a specific embodiment, as shown in FIG. 2, a front end running on the terminal 102 may obtain an image to be migrated provided by a user; the front end uploads the image to a back end (the server 104), and the back end executes the image generation method provided by the present application and feeds the generated migration image back to the front end.
In one embodiment, as shown in FIG. 3, a processing method of an image generator is provided. The method is described by taking its application to the server 104 in FIG. 1 as an example, and includes the following steps:
Step 302, an image sample set, an image generator, and a mutual information discriminator are obtained, wherein the image sample set includes a source domain image sample and a reference image sample.
In the present application, model developers have designed an adversarial learning network that includes an image generator and a mutual information discriminator. The image generator is used to migrate an original image from one image domain (the source domain) to another image domain (the target domain) to obtain a generated image belonging to the target domain. Because the generated image obtained by the migration may be deformed (for example, objects in the generated image may disappear or be distorted), that is, the generated image and the original image may differ in image content, a mutual information discriminator is introduced into a generative adversarial framework to constrain the image content of the generated image, so that the generated image and the original image remain consistent in image content and deformation of the generated image is avoided.
The image sample set is a data set for adversarially training the image generator and the mutual information discriminator. The image domains to which the source domain image sample and the reference image sample belong may be the same or different. Different image domains correspond to different image styles, mainly manifested as differences in color and brightness. A source domain image sample is an image belonging to the source domain. The reference image sample is an image belonging to either the source domain or the target domain. It will be appreciated that the source domain characterizes the image domain in which an image is located before migration, and the target domain characterizes the image domain in which the image is located after migration.
In one embodiment, the image samples in the image sample set may be medical image samples. A medical image is a special, dedicated image in the medical field, and refers to an image of an internal tissue of a target object (stomach, abdomen, heart, brain, or the like) obtained in a non-invasive manner for medical treatment or medical research. In different medical scenes, the obtained medical images differ because of different imaging devices, imaging modes, and the like. Examples include images generated by medical instruments, such as fundus images, endoscopic images, Computed Tomography (CT) images, Magnetic Resonance Imaging (MRI) images, ultrasound (US) images (e.g., B-mode ultrasound, color Doppler ultrasound, echocardiography, three-dimensional color ultrasound), X-ray images, electrocardiograms, electroencephalograms, and optical photographs.
The medical images in different medical scenes can be regarded as images of different image domains; for example, the fundus image and the endoscopic image belong to different image domains, respectively. Medical images obtained by different imaging devices or imaging modes in the same medical scene can also be regarded as images of different image domains; for example, fundus images acquired by different fundus cameras belong to different image domains, respectively.
In one embodiment, the image samples in the image sample set may be regular images captured by an image capturing device, such as landscape images or people images captured by a camera. Different image domains may be, for example, an oil painting domain and a watercolor domain. The image samples in the image sample set may also be video frames in a video.
In one particular embodiment, the image sample set may employ a training set publicly available in the field of machine learning, such as the REFUGE dataset, the LUNA16 (Lung Nodule Analysis 2016) pulmonary nodule detection dataset, the MICCAI (Medical Image Computing and Computer Assisted Intervention society) pancreas segmentation dataset, the ImageNet dataset, the Microsoft COCO dataset, and the like.
For example, as shown in FIG. 4, in the present embodiment the image sample set is specifically the REFUGE dataset, whose training set and test set were acquired by different fundus cameras, so the image domains to which the training set samples and the test set samples belong are different; the brightness of the training set samples is mainly lower than that of the test set samples. Then, when the test set in FIG. 4 is used to test a model trained on the training set in FIG. 4, the images in the test set need to be migrated to the image domain of the training set images before the model test is performed. This can improve the measured performance of the trained model.
It can be understood that, when a medical image processing model is trained on such an image sample set, the performance of the model is reduced because the image domain of the training set samples differs from that of the test set samples. Therefore, the training set samples and the test set samples can be migrated to the same image domain, for example by migrating the training set samples to the image domain of the test set samples, or migrating the test set samples to the image domain of the training set samples, thereby improving the performance of the medical image processing model.
Step 304, a target generated image of the source domain image sample in the target domain is generated through the image generator.
The target generated image is the image obtained by migrating the source domain image sample from the source domain to the target domain through the image generator.
In a specific embodiment, the image generator may employ the generator of a GAN (Generative Adversarial Network) or the like.
Step 306, a first content feature of the source domain image sample, a second content feature of the reference image sample, and a third content feature of the target generated image are respectively extracted.
In the present application, since the image content of the target generated image obtained by migration through the image generator may change, that is, the content features of the target generated image and the source domain image sample may differ, the first content feature of the source domain image sample and the third content feature of the target generated image are extracted and constrained by the mutual information discriminator, so that the source domain image sample and the target generated image remain consistent in image content.
Specifically, the mutual information discriminator may be optimized through a classification task. The purpose of introducing the mutual information discriminator is to constrain the first content feature of the source domain image sample and the third content feature of the target generated image so that the content features of the two images become the same. Because the first content feature and the third content feature therefore lack obvious distinguishability, training samples with obvious distinguishability need to be constructed. Based on this idea, the first content feature and the third content feature are used as one training sample; a reference image sample is selected from the source domain or the target domain, its second content feature is extracted, and another training sample is constructed from the second content feature together with the first content feature or the third content feature.
The content features may include texture features, shape features, spatial relationship features, and the like. Texture features describe the surface properties of the objects (e.g., scenes, people, things) to which an image or image region corresponds. Shape features come in two types: contour features, which mainly concern the outer boundary of a target, and region features, which concern the entire shape region of the target. Spatial relationship features refer to the mutual spatial positions or relative directional relationships among multiple targets segmented from an image; these relationships can be divided into connection/adjacency relationships, overlap/occlusion relationships, inclusion/containment relationships, and the like.
The content characteristics of the reference image sample are different from those of the source domain image sample and the target generation image.
In a specific embodiment, an image may be randomly extracted from the images of the source domain other than the source domain image sample to serve as the reference image sample; alternatively, an image may be randomly extracted from the images of the target domain other than the target generated image to serve as the reference image sample.
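The random selection of a reference image sample described above can be sketched as follows; the image identifiers and the helper name are hypothetical, used only for illustration:

```python
import random

def pick_reference(candidates, exclude):
    """Randomly pick a reference image (here represented by an id)
    from the images of a domain, excluding the image already used as
    the source domain sample or the target generated image."""
    pool = [c for c in candidates if c != exclude]
    return random.choice(pool)

random.seed(0)
# Select a reference sample from the target domain, excluding the
# target generated image "img_1".
ref = pick_reference(["img_0", "img_1", "img_2"], exclude="img_1")
```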
Step 308, a positive sample is generated according to the first content feature and the third content feature, and a negative sample is generated according to the second content feature and at least one of the first content feature and the third content feature.
In the present application, although there is a difference between the first content feature and the third content feature, this difference is smaller than the difference between the first and second content features and the difference between the second and third content features. A positive sample is therefore generated from the first and third content features, and a negative sample from the second content feature together with at least one of the first and third content features, so that the mutual information discriminator can gradually learn to distinguish the positive sample from the negative sample during training.
The positive sample includes the first content feature, the third content feature, and a corresponding training label. The negative sample includes the first content feature, the second content feature, and a corresponding training label; or the second content feature, the third content feature, and a corresponding training label; or the first, second, and third content features and a corresponding training label.
It is to be understood that "positive" and "negative" are used herein only to distinguish the training samples and do not limit the training labels of the samples; that is, it is equally possible to generate negative samples from the first and third content features, and positive samples from the second content feature together with at least one of the first and third content features.
In one embodiment, the training label for the positive sample may be "real" and the training label for the negative sample may be "fake".
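The construction of a positive and a negative training sample described above can be sketched as follows. This is a minimal illustration: the feature dimension, the function name, and the choice of concatenation as the pairing operation are assumptions rather than details fixed by the present application.

```python
import numpy as np

def build_training_pairs(z_a, z_ab, z_c):
    """Build one positive and one negative pair for the mutual
    information discriminator.

    z_a  -- first content feature (source domain image sample), shape (d,)
    z_ab -- third content feature (target generated image), shape (d,)
    z_c  -- second content feature (reference image sample), shape (d,)

    The positive pair combines the two features whose distributions
    should be pulled together; the negative pair swaps in the
    reference feature, giving the discriminator a separable class.
    """
    positive = (np.concatenate([z_a, z_ab]), 1)  # label "real"
    negative = (np.concatenate([z_a, z_c]), 0)   # label "fake"
    return positive, negative

d = 8
rng = np.random.default_rng(0)
pos, neg = build_training_pairs(
    rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
)
```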
Step 310, the positive sample and the negative sample are input into the mutual information discriminator, iterative adversarial training is performed on the image generator and the mutual information discriminator, and the mutual information of the first content feature and the third content feature is iteratively maximized during the adversarial training until an iteration stop condition is reached.
In the present application, the image generator and the mutual information discriminator undergo iterative adversarial training with the positive samples and negative samples as adversarial sample pairs. Adversarial training means that the image generator and the mutual information discriminator form a dynamic "game process" in which the two oppose and promote each other.
Mutual information is used to characterize the interdependence between variables. The larger the mutual information between the first content feature and the third content feature, the more similar their distributions.
In a specific embodiment, the mutual information discriminator may be a general discriminator.
In a particular embodiment, an adaptive moment estimation (Adam) optimizer may be used to optimize the parameters of the image generator and the mutual information discriminator during the adversarial training. In each iteration of the optimization, the error of the prediction result is computed and back-propagated to the model, the gradients are calculated, and the model parameters and bias parameters are updated accordingly.
Specifically, as training progresses, under the constraint of the mutual information discriminator the difference in content features between the source domain image sample and the target generated image gradually becomes smaller, while under the constraint of the image generator the classification accuracy of the mutual information discriminator becomes higher and higher. Moreover, as the parameters of the mutual information discriminator are optimized, the mutual information of the first content feature and the third content feature gradually increases.
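One adversarial round with the Adam optimizer might be sketched as follows. The network shapes, learning rates, and the use of a binary cross entropy classification loss are illustrative assumptions, and a single linear layer stands in for the generator and encoders; this is a sketch of the alternating-update scheme, not the definitive implementation.

```python
import torch
from torch import nn

torch.manual_seed(0)
dim = 16
# Discriminator scores a concatenated feature pair; the linear
# "encoder" stands in for the generator-side networks.
discriminator = nn.Sequential(nn.Linear(2 * dim, 32), nn.ReLU(), nn.Linear(32, 1))
encoder = nn.Linear(dim, dim)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(encoder.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

# Stand-ins for the first, third, and second content features.
z_a, z_ab, z_c = torch.randn(8, dim), torch.randn(8, dim), torch.randn(8, dim)

# Discriminator step: label positive pairs 1 ("real"), negative pairs
# 0 ("fake"); the error is back-propagated and Adam updates the
# discriminator parameters.
pos = discriminator(torch.cat([z_a, encoder(z_ab)], dim=1).detach())
neg = discriminator(torch.cat([z_a, z_c], dim=1))
loss_d = bce(pos, torch.ones_like(pos)) + bce(neg, torch.zeros_like(neg))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator-side step: push positive pairs toward the "real" label,
# which shrinks the content-feature gap the discriminator measures.
pos = discriminator(torch.cat([z_a, encoder(z_ab)], dim=1))
loss_g = bce(pos, torch.ones_like(pos))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```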
It can be understood that the method provided by the present application can also be used to train image migration from the target domain to the source domain, thereby realizing bidirectional and multidirectional image migration.
In the above processing method of the image generator, a source domain image sample, a reference image sample, an image generator, and a mutual information discriminator are obtained; a target generated image of the source domain image sample in the target domain is generated through the image generator; a first content feature of the source domain image sample, a second content feature of the reference image sample, and a third content feature of the target generated image are respectively extracted; a positive sample is generated according to the first and third content features, and a negative sample according to the second content feature and at least one of the first and third content features; and the positive and negative samples are input into the mutual information discriminator, iterative adversarial training is performed on the image generator and the mutual information discriminator, and the mutual information of the first and third content features is iteratively maximized during the adversarial training until an iteration stop condition is reached. In this way, through the adversarial training between the mutual information discriminator and the image generator, when the image generator migrates an image from the source domain to the target domain, the target domain image remains consistent with the source domain image in content features, so that deformation of the target domain image is avoided; moreover, when the target generated image is used to train a medical image processing model, the performance of the model can be improved.
In one embodiment, inputting the positive sample and the negative sample into the mutual information discriminator, performing iterative adversarial training on the image generator and the mutual information discriminator, and iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached includes: characterizing the mutual information of the first and third content features by the relative entropy of the first and third content features; constructing a discrimination loss function of the mutual information discriminator according to the cross entropy of the first and third content features, the cross entropy being positively correlated with the relative entropy; and inputting the positive and negative samples into the mutual information discriminator, performing iterative adversarial training on the image generator and the mutual information discriminator in combination with the discrimination loss function, and iteratively optimizing the discrimination loss function during the adversarial training, thereby maximizing the mutual information.
It can be understood that the discrimination loss function of the mutual information discriminator is constructed according to the cross entropy of the first and third content features, and is used to optimize the parameters of the mutual information discriminator according to the error of the prediction result. The cross entropy is related to the relative entropy as follows:
H(Z_a; Z_ab) = H(Z_a) + I^(DV)(Z_a; Z_ab)

where H(Z_a; Z_ab) is the cross entropy of the first content feature and the third content feature, H(Z_a) is the information entropy of the first content feature, and I^(DV)(Z_a; Z_ab) is the DV (Donsker-Varadhan) representation of the KL (Kullback-Leibler) divergence, that is, the relative entropy of the first content feature and the third content feature.
Since the information entropy of the first content feature is fixed, the cross entropy is positively correlated with the relative entropy. During the adversarial training, the parameters of the mutual information discriminator are gradually optimized, the cross entropy becomes smaller and smaller, and the relative entropy therefore becomes smaller and smaller. The relative entropy measures the similarity between two distributions: the smaller the relative entropy, the more similar the two distributions and the larger the mutual information between them. Based on this idea, the mutual information of the first and third content features can be characterized by their relative entropy, so that the mutual information of the first and third content features grows larger and larger during the adversarial training.
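The decomposition of cross entropy into information entropy plus relative entropy used above can be verified numerically on a small discrete distribution; the two example distributions below are arbitrary:

```python
import math

def entropy(p):
    """Information entropy H(p) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross entropy H(p, q) between two discrete distributions."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """Relative entropy (KL divergence) D_KL(p || q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

# H(p, q) = H(p) + D_KL(p || q): since H(p) is fixed, minimising the
# cross entropy minimises the relative entropy.
assert abs(cross_entropy(p, q) - (entropy(p) + kl(p, q))) < 1e-9
```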
Specifically, a lower bound on the mutual information of the first content feature and the third content feature is defined via the relative entropy, as follows:

I^(DV)(Z_a; Z_ab) = E_J[D_MI(Z_a; Z_ab)] − log E_M[e^(D_MI(Z_a; Z_ab))]

where I^(DV)(Z_a; Z_ab) is the DV (Donsker-Varadhan) representation of the KL (Kullback-Leibler) divergence, that is, the relative entropy of the first content feature and the third content feature; E_J[D_MI(Z_a; Z_ab)] is the expectation of the discriminator output over the joint distribution of the first and third content features; E_M[e^(D_MI(Z_a; Z_ab))] is the expectation over the product of the marginal distributions of the first and third content features; and D_MI: Z_a × Z_ab → R is the mutual information discriminator function, with R the real space.
Specifically, during the adversarial training, the mutual information discriminator gradually learns to distinguish positive samples from negative samples, its parameters are gradually optimized, and the lower bound on the mutual information of the first and third content features becomes more and more accurate, so that the mutual information of the first and third content features becomes larger and larger.
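The Donsker-Varadhan lower bound can be sketched as follows, under the assumption that the discriminator scores on joint-distribution pairs (positive samples) and marginal-product pairs (negative samples) are supplied as plain arrays:

```python
import numpy as np

def dv_lower_bound(t_joint, t_marginal):
    """Donsker-Varadhan lower bound on mutual information:

        I(Z_a; Z_ab) >= E_J[D_MI] - log E_M[exp(D_MI)]

    t_joint    -- discriminator scores D_MI(z_a, z_ab) on pairs drawn
                  from the joint distribution (positive samples)
    t_marginal -- scores on pairs drawn from the product of the
                  marginal distributions (negative samples)
    """
    return np.mean(t_joint) - np.log(np.mean(np.exp(t_marginal)))

# When the discriminator scores positive pairs higher than negative
# pairs, the bound is positive, i.e. informative.
bound = dv_lower_bound(np.array([2.0, 1.5, 1.8]),
                       np.array([-1.0, -0.5, -0.8]))
```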
In this embodiment, the mutual information of the first and third content features is characterized by their relative entropy; a discrimination loss function of the mutual information discriminator is constructed according to the cross entropy of the first and third content features, the cross entropy being positively correlated with the relative entropy; and the positive and negative samples are input into the mutual information discriminator, iterative adversarial training is performed on the image generator and the mutual information discriminator in combination with the discrimination loss function, and the discrimination loss function is iteratively optimized and the mutual information maximized during the adversarial training. In this way, the parameters of the mutual information discriminator are optimized through a classification task so as to maximize the mutual information of the first and third content features.
In one embodiment, respectively extracting the first content feature of the source domain image sample, the second content feature of the reference image sample, and the third content feature of the target generated image includes: acquiring a source domain encoder and a target domain encoder; encoding the source domain image sample into a feature space through the source domain encoder to obtain the first content feature of the source domain image sample; encoding the reference image sample into the feature space through the target domain encoder to obtain the second content feature of the reference image sample; and encoding the target generated image into the feature space through the target domain encoder to obtain the third content feature of the target generated image.
The source domain encoder is used for extracting features from an image of a source domain, and embedding the extracted features into a feature space to obtain a feature vector. The target domain encoder is used for extracting features from the image of the target domain, and embedding the extracted features into the same feature space to obtain feature vectors. The feature space is used for storing feature vectors.
In a specific embodiment, the source domain encoder and the target domain encoder may be general encoders.
In this embodiment, a source domain encoder and a target domain encoder are obtained, a source domain image sample is encoded into a feature space by the source domain encoder to obtain a first content feature of the source domain image sample, a reference image sample is encoded into the feature space by the target domain encoder to obtain a second content feature of the reference image sample, and a target generated image is encoded into the feature space by the target domain encoder to obtain a third content feature of the target generated image.
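The two encoders sharing one feature space might be sketched as follows; the convolutional architecture, image size, and feature dimension are illustrative assumptions, not details fixed by the present application:

```python
import torch
from torch import nn

class ConvEncoder(nn.Module):
    """Encodes an image into a feature vector in a shared feature space."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
    def forward(self, x):
        return self.net(x)

encoder_src = ConvEncoder()  # source domain encoder
encoder_tgt = ConvEncoder()  # target domain encoder

img_a = torch.randn(1, 3, 64, 64)   # source domain image sample
img_ab = torch.randn(1, 3, 64, 64)  # target generated image
z_a = encoder_src(img_a)    # first content feature
z_ab = encoder_tgt(img_ab)  # third content feature, same feature space
```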
In one embodiment, inputting the positive sample and the negative sample into the mutual information discriminator, performing iterative adversarial training on the image generator and the mutual information discriminator, and iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached includes: inputting the positive sample and the negative sample into the mutual information discriminator, performing iterative adversarial training on the image generator, the source domain encoder, the target domain encoder, and the mutual information discriminator, and iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training until the iteration stop condition is reached.
Specifically, as shown in FIG. 5, FIG. 5 is a block diagram of a training system of the image generator in one embodiment, where Ia is the source domain image sample, Ib is the target generated image, Ic is the reference image sample, Za is the first content feature, Zb is the third content feature, and Zc is the second content feature. During the adversarial training, the image generator, the source domain encoder, the target domain encoder, and the mutual information discriminator form a dynamic game process in which they oppose and promote one another.
Specifically, as training progresses, under the constraints of the source domain encoder, the target domain encoder, and the mutual information discriminator, the difference in content features between the target generated image produced by the image generator and the source domain image sample gradually becomes smaller; under the constraints of the image generator, the target domain encoder, and the mutual information discriminator, the source domain encoder extracts the features of the source domain image sample more and more accurately; under the constraints of the image generator, the source domain encoder, and the mutual information discriminator, the target domain encoder extracts the features of the target generated image and the reference image sample more and more accurately; and under the constraints of the image generator, the source domain encoder, and the target domain encoder, the classification accuracy of the mutual information discriminator becomes higher and higher.
In this embodiment, the positive sample and the negative sample are input into the mutual information discriminator, iterative adversarial training is performed on the image generator, the source domain encoder, the target domain encoder and the mutual information discriminator, and the mutual information of the first content feature and the third content feature is iteratively maximized during the adversarial training until the iteration stop condition is reached.
In one embodiment, the method further comprises: acquiring a source domain decoder and a target domain decoder; mapping the first content feature of the source domain image sample to the source domain through the source domain decoder to obtain a first reconstructed image; mapping the first content feature of the source domain image sample to the target domain through the target domain decoder to obtain a second reconstructed image; and constructing a first loss function based on the difference between the source domain image sample and the first reconstructed image, and a second loss function based on the difference between the target generated image and the second reconstructed image. Inputting the positive sample and the negative sample into the mutual information discriminator, performing iterative adversarial training on the image generator and the mutual information discriminator, and iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached then comprises: inputting the positive sample and the negative sample into the mutual information discriminator, and, in combination with the first loss function and the second loss function, performing iterative adversarial training on the image generator, the source domain encoder, the target domain encoder, the source domain decoder, the target domain decoder and the mutual information discriminator, iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training until the iteration stop condition is reached.
It will be appreciated that when the source domain encoder encodes the first content feature into the feature space, it may encode source domain information into the feature space along with it; similarly, when the target domain encoder encodes the second content feature and the third content feature into the feature space, it may encode target domain information into the feature space along with them. The source domain information and the target domain information may affect the training efficiency of the image generator to a certain extent. Based on this, the source domain information can be separated from the feature space by the source domain decoder, and the target domain information can be separated from the feature space by the target domain decoder.
The source domain decoder is used for reconstructing a feature vector obtained from the source domain encoder or the target domain encoder back to the source domain; the target domain decoder is used for reconstructing a feature vector obtained from the source domain encoder or the target domain encoder back to the target domain. The first reconstructed image is a real image obtained by reconstructing the first content feature to the source domain through the source domain decoder; the second reconstructed image is a real image obtained by reconstructing the first content feature to the target domain through the target domain decoder. The first loss function is used to reduce the L1 norm between the source domain image sample and the first reconstructed image; the second loss function is used to reduce the L1 norm between the target generated image and the second reconstructed image.
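As an illustrative sketch (not part of the claimed embodiments), the first and second loss functions above can be expressed as mean-absolute-error (L1) terms. The 8 × 8 image shape and pixel values below are hypothetical stand-ins for the source domain image sample and its reconstruction:

```python
import numpy as np

def l1_reconstruction_loss(image_a: np.ndarray, image_b: np.ndarray) -> float:
    """Mean absolute difference (averaged L1 norm) between two equally shaped images."""
    return float(np.mean(np.abs(image_a - image_b)))

# Hypothetical stand-ins: a source domain image sample and the first
# reconstructed image produced by the source domain decoder.
source_sample = np.ones((8, 8))
first_reconstruction = np.full((8, 8), 0.75)

# First loss: pulls the first reconstructed image toward the source domain sample.
first_loss = l1_reconstruction_loss(source_sample, first_reconstruction)
```

The second loss function takes the same form, applied to the target generated image and the second reconstructed image.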
Specifically, as shown in fig. 6, fig. 6 is a block diagram of a training system of an image generator in another embodiment. Ia is the source domain image sample, Ib is the target generated image, Za is the first content feature, Zaa is the first reconstructed image, and Zab is the second reconstructed image. When no source domain information is present in the feature space, the source domain image sample can be infinitely close to the first reconstructed image, and the target generated image can be infinitely close to the second reconstructed image. Taking the target generated image and the second reconstructed image as an example: the second reconstructed image is also an image of the source domain image sample in the target domain, differing from the target generated image only in the way it is generated, and the target generated image carries no source domain information; therefore, when no source domain information exists in the feature space, the second reconstructed image is infinitely close to the target generated image. Accordingly, the first loss function is constructed based on the difference between the source domain image sample and the first reconstructed image, and reducing this difference through the first loss function causes the source domain decoder to learn the source domain information; the second loss function is constructed based on the difference between the target generated image and the second reconstructed image, and reducing this difference through the second loss function causes the target domain decoder to learn the target domain information.
In one embodiment, the third content feature of the target generated image may be mapped to the source domain by the source domain decoder to obtain a third reconstructed image; the third content feature of the target generated image may be mapped to the target domain by the target domain decoder to obtain a fourth reconstructed image; a third loss function is constructed based on the difference between the source domain image sample and the third reconstructed image, and a fourth loss function is constructed based on the difference between the target generated image and the fourth reconstructed image; and the positive sample and the negative sample are input into the mutual information discriminator in combination with the third loss function and the fourth loss function.
The third reconstructed image is a real image obtained by reconstructing the third content feature to the source domain through the source domain decoder; the fourth reconstructed image is a real image obtained by reconstructing the third content feature to the target domain through the target domain decoder. The third loss function is used to reduce the L1 norm between the source domain image sample and the third reconstructed image; the fourth loss function is used to reduce the L1 norm between the target generated image and the fourth reconstructed image.
Specifically, as shown in fig. 7, fig. 7 is a block diagram of a training system of an image generator in a further embodiment. Ia is the source domain image sample, Ib is the target generated image, Zb is the third content feature, Zba is the third reconstructed image, and Zbb is the fourth reconstructed image. When no target domain information exists in the feature space, the source domain image sample can be infinitely close to the third reconstructed image, and the target generated image can be infinitely close to the fourth reconstructed image. Therefore, the third loss function is constructed based on the difference between the source domain image sample and the third reconstructed image, and reducing this difference through the third loss function causes the source domain decoder to learn the source domain information; the fourth loss function is constructed based on the difference between the target generated image and the fourth reconstructed image, and reducing this difference through the fourth loss function causes the target domain decoder to learn the target domain information.
In one embodiment, extracting the first content feature of the source domain image sample, the second content feature of the reference image sample, and the third content feature of the target generated image respectively comprises: acquiring a source domain encoder and a target domain encoder; encoding the source domain image sample into the feature space through the source domain encoder to obtain a first encoding feature of the source domain image sample, the first encoding feature comprising at least the first content feature; encoding the reference image sample into the feature space through the target domain encoder to obtain a second encoding feature of the reference image sample, the second encoding feature comprising at least the second content feature; and encoding the target generated image into the feature space through the target domain encoder to obtain a third encoding feature of the target generated image, the third encoding feature comprising at least the third content feature.
The processing method of the image generator further comprises: acquiring a source domain decoder and a target domain decoder; mapping the first encoding feature of the source domain image sample to the source domain through the source domain decoder to obtain a first reconstructed image; mapping the first encoding feature of the source domain image sample to the target domain through the target domain decoder to obtain a second reconstructed image; and constructing a first loss function based on the difference between the source domain image sample and the first reconstructed image, and a second loss function based on the difference between the target generated image and the second reconstructed image.
Inputting the positive sample and the negative sample into the mutual information discriminator, performing iterative adversarial training on the image generator and the mutual information discriminator, and iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached then comprises: inputting the positive sample and the negative sample into the mutual information discriminator, and, in combination with the first loss function and the second loss function, performing iterative adversarial training on the image generator, the source domain encoder, the target domain encoder, the source domain decoder, the target domain decoder and the mutual information discriminator, iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training, until the first encoding feature comprises only the first content feature, the second encoding feature comprises only the second content feature, the third encoding feature comprises only the third content feature, and the iteration stop condition is reached.
It is understood that in the early stages of the iterative adversarial training, the encoding features produced by the encoders include both content features and domain features. For example, the source domain encoder encodes the source domain image sample to obtain the first encoding feature, which includes the first content feature and a source domain feature; likewise, the target domain encoder encodes the target generated image to obtain the third encoding feature, which includes the third content feature and a target domain feature. By simultaneously minimizing the first loss function and the second loss function during the iterative adversarial training, the decoders learn to recover the domain features during decoding, so that the encoders learn to perform content feature dissociation, i.e. to remove the domain features, during encoding. In this way, in the later stages of the iterative adversarial training, the encoding features produced by the encoders include only the content features and can be better used by the mutual information discriminator to maximize the mutual information between content features. It should be noted that "including only the content features" refers to the target state; in practice, tolerable errors are allowed. Also, since the mapping between the source domain and the target domain is symmetric, the mapping from the target domain to the source domain, the content feature dissociation during encoding, and the feature recovery during decoding are similar to those in the previous embodiment.
In a specific embodiment, the source domain decoder and the target domain decoder may both adopt a general-purpose decoder.
Specifically, during the adversarial training, the image generator, the source domain encoder, the target domain encoder, the source domain decoder, the target domain decoder and the mutual information discriminator form a dynamic game in which the parties confront and mutually promote one another. As training progresses, the difference in content features between the target generated image produced by the image generator and the source domain image sample gradually decreases, the source domain encoder extracts the features of the source domain image sample more and more accurately, the target domain encoder extracts the features of the target generated image and the reference image sample more and more accurately, the source domain decoder gradually separates the source domain information from the feature space, the target domain decoder gradually separates the target domain information from the feature space, and the classification accuracy of the mutual information discriminator becomes higher and higher.
In a particular embodiment, when the positive sample is generated from the first content feature and the third content feature and the negative sample is generated from the first content feature and the second content feature, the second content feature may come from an image sample of the target domain. The first content feature comes from the source domain and the third content feature comes from the target domain; if the second content feature also comes from the target domain, interference from domain information during training can be reduced. Likewise, when the negative sample is generated from the second content feature and the third content feature, the second content feature may come from an image sample of the source domain.
In this embodiment, a source domain decoder and a target domain decoder are acquired; the first content feature of the source domain image sample is mapped to the source domain through the source domain decoder to obtain a first reconstructed image, and to the target domain through the target domain decoder to obtain a second reconstructed image; a first loss function is constructed based on the difference between the source domain image sample and the first reconstructed image, and a second loss function is constructed based on the difference between the target generated image and the second reconstructed image; the positive sample and the negative sample are input into the mutual information discriminator; and, in combination with the first loss function and the second loss function, iterative adversarial training is performed on the image generator, the source domain encoder, the target domain encoder, the source domain decoder, the target domain decoder and the mutual information discriminator, with the mutual information of the first content feature and the third content feature iteratively maximized during the adversarial training. Through this iterative adversarial training, the training efficiency of the image generator is improved.
In one embodiment, the method further comprises: acquiring an image discriminator. Inputting the positive sample and the negative sample into the mutual information discriminator, performing iterative adversarial training on the image generator and the mutual information discriminator, and iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached then comprises: inputting the source domain image sample and the reference image sample into the image discriminator, inputting the positive sample and the negative sample into the mutual information discriminator, performing iterative dual adversarial training on the image generator, the image discriminator and the mutual information discriminator, and iteratively maximizing the mutual information during the adversarial training until the iteration stop condition is reached.
The image discriminator is used for performing adversarial training with the image generator.
It can be understood that because the quality of the image generated by the image generator lacks constraints, an image discriminator is introduced to improve the image quality of the target generated image based on the generative adversarial framework. As shown in fig. 8, fig. 8 is a block diagram of a training system of an image generator in a further embodiment. The image generator generates the target generated image from the source domain image sample; the source domain image sample and the target generated image are input into the image discriminator, and the image discriminator distinguishes the source domain image sample from the target generated image, so that the image generator and the image discriminator are trained together through continuous confrontation.
Specifically, as training progresses, under the constraints of the image discriminator and the mutual information discriminator, the difference in content features between the source domain image sample and the target generated image gradually decreases; under the constraints of the image generator and the image discriminator, the classification accuracy of the mutual information discriminator becomes higher and higher.
In a specific embodiment, the image discriminator may adopt the discriminator of a GAN (Generative Adversarial Network), or the like.
In this embodiment, an image discriminator is acquired; the source domain image sample and the reference image sample are input into the image discriminator, and the positive sample and the negative sample are input into the mutual information discriminator; iterative dual adversarial training is performed on the image generator, the image discriminator and the mutual information discriminator, with the mutual information iteratively maximized during the adversarial training until the iteration stop condition is reached, so that both the performance and the training efficiency of the image generator are improved.
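The adversarial objective between the image generator and the image discriminator can be sketched with the standard non-saturating GAN losses; this is a hedged illustration of the general framework, not a verbatim part of this application, and the scalar scores below are hypothetical sigmoid outputs of the discriminator:

```python
import numpy as np

def bce(prediction: float, label: float) -> float:
    """Binary cross-entropy for a single sigmoid output."""
    eps = 1e-12
    return -(label * np.log(prediction + eps) + (1 - label) * np.log(1 - prediction + eps))

def discriminator_loss(score_real: float, score_fake: float) -> float:
    # The image discriminator classifies real images as 1 and generated images as 0.
    return bce(score_real, 1.0) + bce(score_fake, 0.0)

def generator_loss(score_fake: float) -> float:
    # The image generator tries to make the discriminator score its output as real.
    return bce(score_fake, 1.0)
```

Alternating minimization of these two losses is the "continuous confrontation" described above: each player's improvement sharpens the other's training signal.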
In one embodiment, the image generator is a target domain image generator and the image discriminator is a target domain image discriminator, and the method further comprises: acquiring a source domain image generator and a source domain image discriminator; generating a restored image of the target generated image in the source domain through the source domain image generator; and constructing a cycle consistency loss over the source domain image generator, the source domain image discriminator, the target domain image generator and the target domain image discriminator. Inputting the source domain image sample and the reference image sample into the image discriminator, inputting the positive sample and the negative sample into the mutual information discriminator, performing iterative dual adversarial training on the image generator, the image discriminator and the mutual information discriminator, and iteratively maximizing the mutual information during the adversarial training until an iteration stop condition is reached then comprises: inputting the source domain image sample and the target generated image into the target domain image discriminator, inputting the source domain image sample and the restored image into the source domain image discriminator, and inputting the positive sample and the negative sample into the mutual information discriminator; and, in combination with the cycle consistency loss, performing iterative dual adversarial training on the source domain image generator, the source domain image discriminator, the target domain image generator, the target domain image discriminator and the mutual information discriminator, iteratively maximizing the mutual information during the adversarial training until the iteration stop condition is reached.
The target domain image generator is used for migrating an image from the source domain to the target domain; the target domain image discriminator is used for discriminating the image generated by the target domain image generator; the source domain image generator is used for migrating an image from the target domain to the source domain; and the source domain image discriminator is used for discriminating the image generated by the source domain image generator. The restored image is a real image of the target generated image in the source domain, generated by the source domain image generator. The cycle consistency loss comprises the loss calculated by the target domain image discriminator and the loss calculated by the source domain image discriminator, and is used for optimizing the parameters of the source domain image generator, the source domain image discriminator, the target domain image generator and the target domain image discriminator.
It can be understood that, due to the lack of constraints on the quality of the image generated by the target domain image generator, the source domain image generator, the target domain image discriminator and the source domain image discriminator are introduced, and the quality of the target generated image is improved based on the generative adversarial framework. The target domain image generator generates the target generated image from the source domain image sample; the source domain image sample and the target generated image are input into the target domain image discriminator, which distinguishes the source domain image sample from the target generated image. The source domain image generator generates a restored image of the target generated image in the source domain; the source domain image sample and the restored image are input into the source domain image discriminator, which distinguishes the source domain image sample from the restored image. In this way, the target domain image generator is trained together with the source domain image generator, the target domain image discriminator and the source domain image discriminator through continuous confrontation.
Specifically, as training progresses, under the constraints of the source domain image generator, the target domain image discriminator, the source domain image discriminator and the mutual information discriminator, the difference in content features between the source domain image sample and the target generated image produced by the target domain image generator gradually decreases; and under the constraints of the target domain image generator, the source domain image generator, the target domain image discriminator and the source domain image discriminator, the classification accuracy of the mutual information discriminator becomes higher and higher.
In a specific embodiment, the target domain image generator and the source domain image generator may adopt the generator of CycleGAN, or the like; the target domain image discriminator and the source domain image discriminator may adopt the discriminator of CycleGAN, or the like.
In this embodiment, a source domain image generator and a source domain image discriminator are acquired; a restored image of the target generated image in the source domain is generated by the source domain image generator; a cycle consistency loss over the source domain image generator, the source domain image discriminator, the target domain image generator and the target domain image discriminator is constructed; the source domain image sample and the target generated image are input into the target domain image discriminator, the source domain image sample and the restored image are input into the source domain image discriminator, and the positive sample and the negative sample are input into the mutual information discriminator; and, in combination with the cycle consistency loss, iterative dual adversarial training is performed on the source domain image generator, the source domain image discriminator, the target domain image generator, the target domain image discriminator and the mutual information discriminator, with the mutual information iteratively maximized during the adversarial training until the iteration stop condition is reached, so that both the performance and the training efficiency of the image generator are improved.
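The cycle consistency term can be illustrated as an L1 penalty between a source domain image sample and its round trip through both generators (source → target → source, as in CycleGAN). A minimal NumPy sketch, with hypothetical additive-shift toy functions standing in for the two generators:

```python
import numpy as np

def cycle_consistency_loss(source_sample: np.ndarray, restored_image: np.ndarray) -> float:
    """L1 distance between a source domain sample and its round-trip restoration."""
    return float(np.mean(np.abs(source_sample - restored_image)))

# Toy stand-ins for the two generators: a shift into the target domain and its inverse.
target_domain_generator = lambda img: img + 0.5   # source -> target (hypothetical)
source_domain_generator = lambda img: img - 0.5   # target -> source (hypothetical)

source_sample = np.full((4, 4), 0.2)
target_generated = target_domain_generator(source_sample)
restored = source_domain_generator(target_generated)

# Zero when the round trip perfectly restores the source sample.
loss = cycle_consistency_loss(source_sample, restored)
```

Minimizing this term drives the two generators toward being mutual inverses, which is what constrains the restored image toward the original source domain image sample.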
In one embodiment, generating the positive sample from the first content feature and the third content feature, and generating the negative sample from the second content feature together with at least one of the first content feature and the third content feature, comprises: splicing the first content feature and the third content feature to obtain the positive sample; and splicing at least one of the first content feature and the third content feature with the second content feature to obtain the negative sample.
Content features can be stitched along the channel dimension. Taking the first content feature and the second content feature as an example, if both are 64 × 256, their concatenation is 64 × 512.
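The channel-dimension stitching can be sketched with NumPy; the (64, 256) shapes mirror the 64 × 256 example in the text, and the feature values are arbitrary placeholders:

```python
import numpy as np

# Hypothetical content features of shape (64, 256), the last axis being channels.
first_content = np.zeros((64, 256))       # Za, from the source domain image sample
second_content = np.ones((64, 256))       # Zc, from the reference image sample
third_content = np.full((64, 256), 0.5)   # Zb, from the target generated image

# Positive sample: first and third content features stitched along channels.
positive_sample = np.concatenate([first_content, third_content], axis=-1)
# One form of negative sample: first and second content features stitched.
negative_sample = np.concatenate([first_content, second_content], axis=-1)
```

Both stitched samples have shape (64, 512), matching the 64 × 512 result stated above.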
It can be understood that the negative sample is used to increase the relative entropy of the positive sample during classification by the mutual information discriminator. Therefore, the negative sample only needs to differ from the positive sample to a certain extent so that the mutual information discriminator can distinguish it; since the content features contained in the negative sample are the splicing result of the first content feature and the second content feature, of the third content feature and the second content feature, or of the first content feature, the third content feature and the second content feature, the relative entropy of the positive sample is not affected.
In this embodiment, the first content feature and the third content feature are spliced to obtain the positive sample, and at least one of the first content feature and the third content feature is spliced with the second content feature to obtain the negative sample, so that the mutual information discriminator can conveniently be trained on the positive sample and the negative sample.
In one embodiment, there is provided a processing method of an image generator, as shown in fig. 9, a training system of the image generator includes: a target domain image generator, a target domain image discriminator, a source domain image generator, a source domain image discriminator, a source domain encoder, a target domain encoder, a source domain decoder, a target domain decoder, and a mutual information discriminator. As shown in fig. 10, the method includes:
step 1002, obtain an image sample set, a target domain image generator, a target domain image discriminator, a source domain image generator, a source domain image discriminator, a source domain encoder, a target domain encoder, a source domain decoder, a target domain decoder, and a mutual information discriminator. The image sample set includes source domain image samples and target domain image samples.
Step 1004, generating a target generated image of the source domain image sample in the target domain through the target domain image generator, generating a restored image of the target generated image in the source domain through the source domain image generator, and constructing a cycle consistency loss over the source domain image generator, the source domain image discriminator, the target domain image generator and the target domain image discriminator.
Step 1006, encoding the source domain image sample into a feature space through a source domain encoder to obtain a first encoding feature of the source domain image sample, where the first encoding feature at least includes a first content feature, encoding the target domain image sample into the feature space through a target domain encoder to obtain a second encoding feature of the target domain image sample, where the second encoding feature at least includes a second content feature, and encoding the target generated image into the feature space through the target domain encoder to obtain a third encoding feature of the target generated image, where the third encoding feature at least includes a third content feature.
Step 1008, mapping the first encoding feature of the source domain image sample to the source domain through the source domain decoder to obtain a first reconstructed image, mapping the first encoding feature of the source domain image sample to the target domain through the target domain decoder to obtain a second reconstructed image, constructing a first loss function based on the difference between the source domain image sample and the first reconstructed image, and constructing a second loss function based on the difference between the target generated image and the second reconstructed image.
Step 1010, mapping the third encoding feature of the target generated image to the source domain through the source domain decoder to obtain a third reconstructed image; mapping the third encoding feature of the target generated image to the target domain through the target domain decoder to obtain a fourth reconstructed image; constructing a third loss function based on the difference between the source domain image sample and the third reconstructed image, and a fourth loss function based on the difference between the target generated image and the fourth reconstructed image.

Step 1012, splicing the first encoding feature and the third encoding feature to obtain a positive sample, and splicing at least one of the first encoding feature and the third encoding feature with the second encoding feature to obtain a negative sample.
Step 1014, constructing a discrimination loss function of the mutual information discriminator according to the cross entropy of the first encoding feature and the third encoding feature.
Step 1016, inputting the source domain image sample and the target generated image into the target domain image discriminator, inputting the source domain image sample and the restored image into the source domain image discriminator, and inputting the positive sample and the negative sample into the mutual information discriminator; combining the cycle consistency loss, the first loss function, the second loss function, the third loss function, the fourth loss function and the discrimination loss function, performing iterative dual adversarial training on the target domain image generator, the target domain image discriminator, the source domain image generator, the source domain image discriminator, the source domain encoder, the target domain encoder, the source domain decoder, the target domain decoder and the mutual information discriminator; and iteratively optimizing the discrimination loss function and maximizing the mutual information during the adversarial training until the first encoding feature includes only the first content feature, the second encoding feature includes only the second content feature, the third encoding feature includes only the third content feature, and the iteration stop condition is reached.
The device represents the mutual information of the first coding feature and the third coding feature by the relative entropy of the first coding feature and the third coding feature; the cross entropy is positively correlated with the relative entropy. When the first coding feature includes only the first content feature and the third coding feature includes only the third content feature, the mutual information of the first coding feature and the third coding feature is exactly the mutual information of the first content feature and the third content feature.
It can be understood that, with the method provided by the present application, both image migration from the source domain to the target domain and image migration from the target domain to the source domain can be trained, so that bidirectional image migration and multidirectional image migration are realized.
In this embodiment, through the adversarial training between the mutual information discriminator and the image generator, when the image generator migrates an image from the source domain to the target domain, the target domain image remains consistent with the source domain image in content features, so that deformation of the target domain image is avoided; moreover, when the target generated image is used to train a medical image processing model, the performance of the medical image processing model can be improved.
It will be appreciated that, in this embodiment, the computer device constructs a generative adversarial network based on the image generator from the source domain to the target domain, so as to train the image generator in an adversarial manner. The generative adversarial network comprises two pairs of image generators and discriminators, an X-shaped dual autoencoder and a mutual information discriminator. The X-shaped dual autoencoder comprises a source domain encoder, a source domain decoder, a target domain encoder and a target domain decoder. During the adversarial training, the encoders in the X-shaped dual autoencoder learn to remove domain features in the encoding process, the decoders learn to recover domain information in the decoding process, and the mutual information discriminator learns to maximize the mutual information between the content features of the source domain image sample and those of the target generated image. Therefore, the image generator can minimize the deformation of the image content when generating an image, so that the original image and the generated image have the same content features.
In one embodiment, as shown in fig. 11, an image generation method is provided, which is described by taking its application to the server 104 in fig. 1 as an example, and includes the following steps:
Step 1102, acquiring an image to be migrated.
The image to be migrated refers to an image to be subjected to image migration processing.
In one embodiment, the image to be migrated may be a medical image. A medical image is an image of a specific medical field, and refers to an internal tissue image obtained non-invasively from a target object for medical treatment or medical research. Examples include images generated by medical instruments, such as fundus images, Computed Tomography (CT) images, Magnetic Resonance Imaging (MRI) images, ultrasound images (B-mode ultrasound, color Doppler ultrasound, cardiac color ultrasound, and three-dimensional color ultrasound), X-ray images, electrocardiograms, electroencephalograms, and optical photographs.
Step 1104, determining the source domain to which the image to be migrated belongs and the target domain to which the image is to be migrated.
Wherein the source domain is used for representing the image domain where the image is located before migration, and the target domain is used for representing the image domain where the image is located after migration. Different image domains have different image styles, which is mainly reflected in differences in color and brightness.
Step 1106, querying an image generator for migrating images belonging to the source domain to the target domain.
Wherein the image generator is configured to migrate an image from the source domain to the target domain. The image generator may implement unidirectional, bidirectional and multidirectional image domain migration. Unidirectional image domain migration means that the image generator migrates an image from a first image domain to a second image domain, or from the second image domain to the first image domain. Bidirectional image domain migration means that the image generator can migrate an image not only from the first image domain to the second image domain but also from the second image domain to the first image domain. Multidirectional image domain migration means that an image is migrated from a source domain to at least two different target domains.
Step 1108, generating, through the image generator, a migration image of the image to be migrated in the target domain; the migration image and the image to be migrated have the same content features; the image generator is obtained through iterative adversarial training with a mutual information discriminator, and a target parameter is iteratively maximized during the adversarial training; the target parameter is the mutual information between the content features of a source domain image sample and the content features of a target generated image; the source domain image sample belongs to the source domain; the target generated image belongs to the target domain and is generated from the source domain image sample by the image generator.
The migration image is a real image obtained by migrating the image to be migrated from the source domain to the target domain through the image generator.
The content features may include texture features, shape features, spatial relationship features, and the like. Texture features describe the surface properties of the objects (e.g., scenes, people, objects, etc.) to which an image or image region corresponds. Shape features come in two types: contour features, which mainly concern the outer boundary of the target, and region features, which relate to the entire shape region of the target. Spatial relationship features refer to the mutual spatial positions or relative directional relationships among the multiple targets segmented from an image; these relationships can be divided into connection/adjacency relationships, overlapping relationships, inclusion/containment relationships, and the like.
The mutual information discriminator is used for performing adversarial training with the image generator and constraining the image content of the target generated image produced by the image generator, so that the target generated image and the source domain image sample are consistent in image content.
In a specific embodiment, the image generator may employ the generator of a GAN (Generative Adversarial Network), or the like. The mutual information discriminator may be a general discriminator.
It can be understood that, since the image content of the target generated image obtained through migration by the image generator may change, that is, the content features of the target generated image and the source domain image sample may differ, the first content feature of the source domain image sample and the third content feature of the target generated image are extracted and constrained by the mutual information discriminator, so that the source domain image sample and the target generated image are consistent in image content.
In particular, the mutual information discriminator may be optimized through a classification task. The purpose of introducing the mutual information discriminator is to constrain the first content feature of the source domain image sample and the third content feature of the target generated image so that the content features of the source domain image sample and the target generated image are the same. As a result, the first content feature and the third content feature are not clearly distinguishable from each other, and training samples with clear distinguishability need to be constructed. Based on this idea, the first content feature and the third content feature are used as one training sample; a reference image sample is selected from the source domain or the target domain, its second content feature is extracted, and another training sample is constructed from the second content feature together with the first content feature or the third content feature. The content features of the reference image sample differ from those of the source domain image sample and the target generated image.
In a specific embodiment, an image can be randomly extracted from the images of the source domain other than the source domain image sample as the reference image sample; or an image can be randomly extracted from the images of the target domain other than the target generated image as the reference image sample.
In the present application, although there is a difference between the first content feature and the third content feature, this difference is smaller than the difference between the first content feature and the second content feature, and smaller than the difference between the second content feature and the third content feature. A positive sample is generated from the first content feature and the third content feature, and a negative sample is generated from the second content feature together with at least one of the first content feature and the third content feature, so that the mutual information discriminator can gradually learn to distinguish the positive sample from the negative sample during training.
The positive sample comprises a first content feature, a third content feature and a corresponding training label, and the negative sample comprises the first content feature, the second content feature and the corresponding training label, or the second content feature, the third content feature and the corresponding training label, or the first content feature, the second content feature, the third content feature and the corresponding training label. It is to be understood that "positive" and "negative" are used herein only to distinguish the training samples and do not constitute a limitation on the training labels of the training samples, i.e., it is also possible to generate negative samples according to the first content feature and the third content feature, and generate positive samples according to the second content feature and at least one of the first content feature and the third content feature. In one embodiment, the training label for the positive sample may be "real" and the training label for the negative sample may be "fake".
In a specific embodiment, the first content feature and the third content feature are spliced to obtain a positive sample, and at least one of the first content feature and the third content feature is spliced with the second content feature to obtain a negative sample.
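The splicing of content features into positive and negative training samples can be sketched as follows. This is an illustrative sketch only; the function name `build_samples` and the toy feature vectors are not from the original, and the labels follow the "real"/"fake" convention mentioned above (encoded as 1/0):

```python
import numpy as np

def build_samples(z_a, z_b, z_c):
    """Illustrative sample construction.

    z_a: first content feature (from the source domain image sample)
    z_b: third content feature (from the target generated image)
    z_c: second content feature (from the reference image sample)
    """
    positive = np.concatenate([z_a, z_b])   # splice first and third content features
    negative = np.concatenate([z_a, z_c])   # splice one of them with the reference feature
    # training labels: 1 for "real" (positive), 0 for "fake" (negative)
    return (positive, 1), (negative, 0)

# toy 4-dimensional content features
z_a = np.array([0.9, 0.1, 0.2, 0.7])
z_b = np.array([0.8, 0.2, 0.2, 0.6])   # close to z_a: same image content
z_c = np.array([0.1, 0.9, 0.8, 0.1])   # far from z_a: different image content
(pos, y_pos), (neg, y_neg) = build_samples(z_a, z_b, z_c)
```

The spliced vectors are twice the feature dimension; a discriminator trained on such pairs only needs the negative pair to be distinguishable from the positive pair, as discussed below.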
It can be understood that the negative samples are used to increase the relative entropy relative to the positive samples during classification by the mutual information discriminator. Therefore, a negative sample only needs to differ from the positive sample enough for the mutual information discriminator to distinguish it; since the content features contained in the negative sample are the splicing result of the first content feature and the second content feature, or of the third content feature and the second content feature, or of the first, second and third content features, the relative entropy of the positive sample is not affected.
In the present application, the image generator and the mutual information discriminator are subjected to iterative adversarial training with the positive samples and the negative samples as adversarial sample pairs. Adversarial training means that the image generator and the mutual information discriminator form a dynamic "game process" in which they confront and promote each other.
In a specific embodiment, an adaptive moment estimation (Adam) optimizer may be used to optimize the parameters of the image generator and the mutual information discriminator during the adversarial training. In each iteration of the optimization process, the error of the prediction result is calculated and back-propagated through the model, gradients are computed, and the model parameters and bias parameters are updated.
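As a minimal sketch of the parameter update performed by an Adam optimizer, the standard single-parameter update rule is shown below; the variable names and the toy objective (minimizing theta squared) are illustrative and not part of the original:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: biased moment estimates, bias correction, parameter step."""
    m = b1 * m + (1 - b1) * grad          # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)             # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)             # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# toy problem: minimize f(theta) = theta^2, whose gradient is 2 * theta
theta, m, v = 3.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```

In the actual training system, the same update would be applied element-wise to all generator and discriminator parameters, with the gradients supplied by back-propagation.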
Specifically, as training progresses, under the constraint of the mutual information discriminator, the difference in content features between the source domain image sample and the target generated image gradually decreases; under the constraint of the image generator, the classification accuracy of the mutual information discriminator becomes higher and higher. Moreover, as the parameters of the mutual information discriminator are optimized, the mutual information between the first content feature and the third content feature gradually increases.
Wherein mutual information is used to characterize interdependencies between variables. The larger the mutual information between the first content feature and the third content feature, the more similar the distribution of the first content feature and the third content feature.
In one embodiment, the mutual information of the first content feature and the third content feature is characterized by the relative entropy of the first content feature and the third content feature; a discrimination loss function of the mutual information discriminator is constructed according to the cross entropy of the first content feature and the third content feature; the cross entropy is positively correlated with the relative entropy; the positive sample and the negative sample are input into the mutual information discriminator, iterative adversarial training is performed on the image generator and the mutual information discriminator in combination with the discrimination loss function, and the discrimination loss function is iteratively optimized during the adversarial training to maximize the mutual information.
It can be understood that the discrimination loss function of the mutual information discriminator is constructed according to the cross entropy of the first content feature and the third content feature, and is used for optimizing the parameters of the mutual information discriminator according to the error of the prediction result. The cross entropy is related to the relative entropy as follows:
H(Za; Zab) = H(Za) + I^(DV)(Za; Zab)
wherein H(Za; Zab) is the cross entropy of the first content feature and the third content feature, H(Za) is the information entropy of the first content feature, and I^(DV)(Za; Zab) is the DV (Donsker-Varadhan) representation of the KL (Kullback-Leibler) divergence, that is, the relative entropy of the first content feature and the third content feature.
Since the information entropy of the first content feature is fixed, the cross entropy is positively correlated with the relative entropy. During the adversarial training, the parameters of the mutual information discriminator are gradually optimized, and the cross entropy, and hence the relative entropy, becomes smaller and smaller. Relative entropy measures the similarity between two distributions: the smaller the relative entropy, the more similar the two distributions and the larger the mutual information between them. On this basis, the mutual information of the first content feature and the third content feature can be characterized by their relative entropy, so that this mutual information becomes larger and larger during the adversarial training.
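The identity relating cross entropy, information entropy and relative entropy can be checked numerically for discrete distributions. The two probability vectors below are illustrative stand-ins (not from the original) for the distributions of the first and third content features:

```python
import numpy as np

def entropy(p):
    """Information entropy H(p) = -sum p log p."""
    return float(-np.sum(p * np.log(p)))

def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum p log q."""
    return float(-np.sum(p * np.log(q)))

def kl_divergence(p, q):
    """Relative entropy D_KL(p || q) = sum p log(p / q)."""
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.7, 0.2, 0.1])   # stand-in distribution for the first content feature
q = np.array([0.6, 0.3, 0.1])   # stand-in distribution for the third content feature

# the relation used above: cross entropy = information entropy + relative entropy
lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
```

Because the entropy term is fixed for a given `p`, driving the cross entropy down necessarily drives the relative entropy down, which is exactly the positive correlation the text relies on.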
In one embodiment, a source domain encoder and a target domain encoder are obtained; the source domain image sample is encoded into a feature space through the source domain encoder to obtain the first content feature of the source domain image sample; the reference image sample is encoded into the feature space through the target domain encoder to obtain the second content feature of the reference image sample; and the target generated image is encoded into the feature space through the target domain encoder to obtain the third content feature of the target generated image.
The source domain encoder is used for extracting features from an image of a source domain, and embedding the extracted features into a feature space to obtain a feature vector. The target domain encoder is used for extracting features from the image of the target domain, and embedding the extracted features into the same feature space to obtain feature vectors. The feature space is used for storing feature vectors.
In a specific embodiment, the source domain encoder and the target domain encoder may be general encoders.
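The idea of two encoders embedding images from different domains into the same feature space can be sketched with minimal linear encoders. Everything here is illustrative: the names, the linear form, and the dimensions are assumptions, not the original's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(d_in, d_feat):
    """Illustrative linear encoder mapping a flattened image into a shared feature space."""
    W = rng.standard_normal((d_feat, d_in)) * 0.01
    return lambda img: W @ img.ravel()

d_img, d_feat = 16, 8
source_encoder = make_encoder(d_img, d_feat)   # hypothetical source domain encoder
target_encoder = make_encoder(d_img, d_feat)   # hypothetical target domain encoder

i_a = rng.standard_normal(d_img)   # source domain image sample
i_b = rng.standard_normal(d_img)   # target generated image
z_a = source_encoder(i_a)          # first content feature
z_b = target_encoder(i_b)          # third content feature
```

The key property is that both encoders output vectors of the same dimension in one feature space, which is what allows the features to be spliced and compared by the mutual information discriminator.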
In one embodiment, the positive and negative samples are input into the mutual information discriminator, iterative adversarial training is performed on the image generator, the source domain encoder, the target domain encoder and the mutual information discriminator, and the mutual information of the first content feature and the third content feature is iteratively maximized during the adversarial training until an iteration stop condition is reached.
Specifically, as shown in fig. 5, fig. 5 is a block diagram of a training system of an image generator in one embodiment, where Ia is the source domain image sample, Ib is the target generated image, Ic is the reference image sample, Za is the first content feature, Zb is the third content feature, and Zc is the second content feature. During the adversarial training, the image generator, the source domain encoder, the target domain encoder and the mutual information discriminator form a dynamic game process in which they confront and promote each other.
Specifically, as training progresses, under the constraints of the source domain encoder, the target domain encoder and the mutual information discriminator, the difference in content features between the target generated image produced by the image generator and the source domain image sample gradually decreases; under the constraints of the image generator, the target domain encoder and the mutual information discriminator, the source domain encoder extracts the features of the source domain image sample more and more accurately; under the constraints of the image generator, the source domain encoder and the mutual information discriminator, the target domain encoder extracts the features of the target generated image and the reference image sample more and more accurately; and under the constraints of the image generator, the source domain encoder and the target domain encoder, the classification accuracy of the mutual information discriminator becomes higher and higher.
In one embodiment, a source domain decoder and a target domain decoder are obtained; the first content feature of the source domain image sample is mapped to the source domain through the source domain decoder to obtain a first reconstructed image; the first content feature of the source domain image sample is mapped to the target domain through the target domain decoder to obtain a second reconstructed image; a first loss function is constructed based on the difference between the source domain image sample and the first reconstructed image, and a second loss function is constructed based on the difference between the target generated image and the second reconstructed image. In this case, inputting the positive sample and the negative sample into the mutual information discriminator, performing iterative adversarial training on the image generator and the mutual information discriminator, and iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached comprises: inputting the positive sample and the negative sample into the mutual information discriminator; combining the first loss function and the second loss function, performing iterative adversarial training on the image generator, the source domain encoder, the target domain encoder, the source domain decoder, the target domain decoder and the mutual information discriminator; and iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training until the iteration stop condition is reached.
It will be appreciated that when the source domain encoder encodes the first content feature into the feature space, it may encode source domain information into the feature space as well; similarly, when the target domain encoder encodes the second content feature and the third content feature into the feature space, it may encode target domain information into the feature space as well. The source domain information and the target domain information may affect the training efficiency of the image generator to a certain extent. For this reason, the source domain information can be separated from the feature space by the source domain decoder, and the target domain information can be separated from the feature space by the target domain decoder.
Wherein the source domain decoder is used for reconstructing, to the source domain, the feature vector obtained by the source domain encoder or the target domain encoder; and the target domain decoder is used for reconstructing, to the target domain, the feature vector obtained by the source domain encoder or the target domain encoder. The first reconstructed image is a real image obtained by reconstructing the first content feature to the source domain through the source domain decoder; the second reconstructed image is a real image obtained by reconstructing the first content feature to the target domain through the target domain decoder. The first loss function is used to reduce the L1 norm between the source domain image sample and the first reconstructed image; the second loss function is used to reduce the L1 norm between the target generated image and the second reconstructed image.
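The first and second loss functions described above can be sketched as L1 reconstruction losses. The tiny 2x2 "images" below are illustrative placeholders, and the mean absolute error is used as a normalized form of the L1 norm:

```python
import numpy as np

def l1_loss(x, y):
    """Mean absolute error, a normalized L1 norm of the difference between two images."""
    return float(np.mean(np.abs(x - y)))

i_a  = np.array([[0.2, 0.4], [0.6, 0.8]])   # source domain image sample
i_aa = np.array([[0.2, 0.5], [0.6, 0.7]])   # first reconstructed image (slightly off)
i_b  = np.array([[0.1, 0.3], [0.5, 0.9]])   # target generated image
i_ab = np.array([[0.1, 0.3], [0.5, 0.9]])   # second reconstructed image (perfect)

first_loss  = l1_loss(i_a, i_aa)   # first loss function term
second_loss = l1_loss(i_b, i_ab)   # second loss function term
```

Minimizing these terms drives each reconstructed image toward its reference image, which is what forces the decoders to learn the domain information that the encoders discard.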
Specifically, as shown in fig. 6, fig. 6 is a block diagram of a training system of an image generator in one embodiment, where Ia is the source domain image sample, Ib is the target generated image, Za is the first content feature, Zaa is the first reconstructed image, and Zab is the second reconstructed image. When no source domain information is present in the feature space, the source domain image sample can be infinitely close to the first reconstructed image, and the target generated image can be infinitely close to the second reconstructed image. Taking the target generated image and the second reconstructed image as an example: the second reconstructed image is also an image of the source domain image sample in the target domain, differing from the target generated image only in how it is generated, and the target generated image does not carry source domain information; therefore, when no source domain information is present in the feature space, the second reconstructed image is infinitely close to the target generated image. Accordingly, the first loss function is constructed based on the difference between the source domain image sample and the first reconstructed image, and this difference is reduced through the first loss function so that the source domain decoder learns the source domain information; the second loss function is constructed based on the difference between the target generated image and the second reconstructed image, and this difference is reduced through the second loss function so that the target domain decoder learns the target domain information.
In one embodiment, the third content feature of the target generated image may be mapped to the source domain through the source domain decoder to obtain a third reconstructed image; the third content feature of the target generated image is mapped to the target domain through the target domain decoder to obtain a fourth reconstructed image; a third loss function is constructed based on the difference between the source domain image sample and the third reconstructed image, and a fourth loss function is constructed based on the difference between the target generated image and the fourth reconstructed image; and the positive sample and the negative sample are input into the mutual information discriminator, with the iterative adversarial training performed in combination with the third loss function and the fourth loss function.
The third reconstructed image is a real image obtained by reconstructing the third content characteristics to the source domain through a source domain decoder; the fourth reconstructed image is a real image obtained by reconstructing the third content feature to the target domain through the target domain decoder. A third loss function is used to reduce the L1 norm between the source domain image samples and the third reconstructed image; the fourth loss function is used to reduce the L1 norm between the target generated image and the fourth reconstructed image.
Specifically, as shown in fig. 7, fig. 7 is a block diagram of a training system of an image generator in another embodiment, where Ia is the source domain image sample, Ib is the target generated image, Zb is the third content feature, Zba is the third reconstructed image, and Zbb is the fourth reconstructed image. When no target domain information exists in the feature space, the source domain image sample can be infinitely close to the third reconstructed image, and the target generated image can be infinitely close to the fourth reconstructed image. Therefore, the third loss function is constructed based on the difference between the source domain image sample and the third reconstructed image, and this difference is reduced through the third loss function so that the source domain decoder learns the source domain information; the fourth loss function is constructed based on the difference between the target generated image and the fourth reconstructed image, and this difference is reduced through the fourth loss function so that the target domain decoder learns the target domain information.
In one embodiment, extracting the first content feature of the source domain image sample, the second content feature of the reference image sample and the third content feature of the target generated image respectively includes: acquiring a source domain encoder and a target domain encoder; encoding the source domain image sample into a feature space through the source domain encoder to obtain a first encoding feature of the source domain image sample, the first encoding feature comprising at least the first content feature; encoding the reference image sample into the feature space through the target domain encoder to obtain a second encoding feature of the reference image sample, the second encoding feature comprising at least the second content feature; and encoding the target generated image into the feature space through the target domain encoder to obtain a third encoding feature of the target generated image, the third encoding feature comprising at least the third content feature.
The processing method of the image generator further comprises: acquiring a source domain decoder and a target domain decoder; mapping a first coding characteristic of a source domain image sample to a source domain through a source domain decoder to obtain a first reconstructed image; mapping the first coding characteristics of the source domain image sample to a target domain through a target domain decoder to obtain a second reconstructed image; a first loss function is constructed based on differences between the source domain image samples and the first reconstructed image, and a second loss function is constructed based on differences between the target generated image and the second reconstructed image.
Inputting the positive sample and the negative sample into the mutual information discriminator, performing iterative adversarial training on the image generator and the mutual information discriminator, and iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached includes: inputting the positive sample and the negative sample into the mutual information discriminator; combining the first loss function and the second loss function, performing iterative adversarial training on the image generator, the source domain encoder, the target domain encoder, the source domain decoder, the target domain decoder and the mutual information discriminator; and iteratively maximizing the mutual information of the first content feature and the third content feature during the adversarial training until the first encoding feature includes only the first content feature, the second encoding feature includes only the second content feature, the third encoding feature includes only the third content feature, and the iteration stop condition is reached.
It is understood that in the early stage of the iterative adversarial training, the encoding features obtained by the encoders include both content features and domain features. For example, the source domain encoder encodes the source domain image sample to obtain the first encoding feature, which includes the first content feature and a source domain feature; for another example, the target domain encoder encodes the target generated image to obtain the third encoding feature, which includes the third content feature and a target domain feature. During the iterative adversarial training, by simultaneously minimizing the first loss function and the second loss function, the decoders learn to recover the domain features during decoding, so that the encoders learn to perform content feature dissociation, that is, to remove the domain features, during encoding. In this way, in the later stage of the iterative adversarial training, the encoding features obtained by the encoders include only the content features and can be better used by the mutual information discriminator to maximize the mutual information between content features. It should be noted that "including only the content features" describes the target state; errors that are tolerable in practice are allowed. Also, since the mapping between the source domain and the target domain is symmetric, the mapping from the target domain to the source domain, the content feature dissociation during encoding and the feature recovery during decoding are similar to those in the previous embodiment.
In a specific embodiment, both the source domain decoder and the target domain decoder may adopt a general-purpose decoder.
Specifically, during the adversarial training, the image generator, the source domain encoder, the target domain encoder, the source domain decoder, the target domain decoder and the mutual information discriminator form a dynamic game in which the parties confront and promote each other. As the training proceeds, the difference in content features between the target generated image produced by the image generator and the source domain image sample gradually decreases; the source domain encoder extracts features of the source domain image sample more and more accurately; the target domain encoder extracts features of the target generated image and the reference image sample more and more accurately; the source domain decoder gradually separates the source domain information from the feature space, and the target domain decoder gradually separates the target domain information from the feature space, so that the classification accuracy of the mutual information discriminator becomes higher and higher.
In a specific embodiment, when the positive sample is generated from the first content feature and the third content feature and the negative sample is generated from the first content feature and the second content feature, the second content feature may come from an image sample of the target domain. The first content feature comes from the source domain and the third content feature comes from the target domain; if the second content feature also comes from the target domain, the interference of domain information during training can be reduced. Likewise, when the negative sample is generated from the second content feature and the third content feature, the second content feature may come from an image sample of the source domain.
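A minimal sketch of this sampling scheme, with toy vectors standing in for real encoder outputs (pairs are formed by concatenating the features, matching the splicing described in this application):

```python
import numpy as np

# Toy content features; in practice these come from the domain encoders.
first_content = np.array([0.2, 0.5, -0.1, 0.8])      # source domain image sample
second_content = np.array([0.9, -0.3, 0.4, 0.1])     # reference image sample
third_content = np.array([0.25, 0.45, -0.05, 0.75])  # target generated image

# Positive sample: the first and third content features paired together.
positive = np.concatenate([first_content, third_content])

# Negative samples: the second content feature paired with the first or third.
# With the domain choices described above, each pair mixes one source-domain
# and one target-domain feature, so the discriminator cannot rely on domain
# information alone and must compare content.
negative_a = np.concatenate([first_content, second_content])
negative_b = np.concatenate([third_content, second_content])
```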
In one embodiment, an image discriminator is obtained; the source domain image sample and the reference image sample are input into the image discriminator, the positive sample and the negative sample are input into the mutual information discriminator, iterative dual adversarial training is performed on the image generator, the image discriminator and the mutual information discriminator, and the mutual information is iteratively maximized during the adversarial training until an iteration stop condition is reached.
The image discriminator is used for performing adversarial training together with the image generator.
It can be understood that, because the quality of the image generated by the image generator lacks constraints, an image discriminator is introduced to improve the image quality of the target generated image based on the generative adversarial framework. Fig. 8 is a block diagram of the structure of a training system of an image generator in still another embodiment. As shown in fig. 8, the image generator generates the target generated image according to the source domain image sample; the source domain image sample and the target generated image are input into the image discriminator, and the image discriminator distinguishes between the source domain image sample and the target generated image, so that the image generator and the image discriminator are trained together through continuous adversarial interplay.
Specifically, as the training proceeds, under the constraints of the image discriminator and the mutual information discriminator, the difference in content features between the source domain image sample and the target generated image gradually decreases; under the constraints of the image generator and the image discriminator, the classification accuracy of the mutual information discriminator becomes higher and higher.
In a specific embodiment, the image discriminator may adopt the discriminator of a GAN (Generative Adversarial Network), or the like.
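The role of the image discriminator can be illustrated with a standard GAN-style objective. The discriminator scores below are hypothetical values, not outputs of a trained network:

```python
import math

def binary_cross_entropy(p, label):
    """BCE for a single probability prediction; clamp to avoid log(0)."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

# Hypothetical discriminator scores: probability that the input is a real
# image sample rather than a generated one.
d_real = 0.9   # score on the real image sample
d_fake = 0.2   # score on the target generated image

# Image discriminator objective: score real samples as 1, generated as 0.
discriminator_loss = (binary_cross_entropy(d_real, 1.0)
                      + binary_cross_entropy(d_fake, 0.0))

# Image generator objective: fool the discriminator into scoring generated
# images as real, which pushes the generated image quality upward.
generator_loss = binary_cross_entropy(d_fake, 1.0)
```

Alternating minimization of these two objectives is the "continuous adversarial interplay" described above.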
In one embodiment, the image generator is a target domain image generator and the image discriminator is a target domain image discriminator; a source domain image generator and a source domain image discriminator are obtained; a restored image of the target generated image in the source domain is generated by the source domain image generator; a cycle consistency loss of the source domain image generator, the source domain image discriminator, the target domain image generator and the target domain image discriminator is constructed; the source domain image sample and the target generated image are input into the target domain image discriminator, the source domain image sample and the restored image are input into the source domain image discriminator, and the positive sample and the negative sample are input into the mutual information discriminator; iterative dual adversarial training is performed on the source domain image generator, the source domain image discriminator, the target domain image generator, the target domain image discriminator and the mutual information discriminator in combination with the cycle consistency loss, and the mutual information is iteratively maximized during the adversarial training until an iteration stop condition is reached.
The target domain image generator is used for migrating an image from the source domain to the target domain; the target domain image discriminator is used for discriminating the image generated by the target domain image generator; the source domain image generator is used for migrating an image from the target domain to the source domain; the source domain image discriminator is used for discriminating the image generated by the source domain image generator. The restored image is the image that the source domain image generator produces in the source domain from the target generated image. The cycle consistency loss includes the loss calculated by the target domain image discriminator and the loss calculated by the source domain image discriminator, and is used for optimizing the parameters of the source domain image generator, the source domain image discriminator, the target domain image generator and the target domain image discriminator.
It can be understood that, due to the lack of constraints on the quality of the image generated by the target domain image generator, the source domain image generator, the target domain image discriminator and the source domain image discriminator are introduced, and the quality of the target generated image is improved based on the generative adversarial framework. Fig. 9 is a block diagram of the structure of a training system of an image generator in a further embodiment. As shown in fig. 9, the target domain image generator generates the target generated image according to the source domain image sample; the source domain image sample and the target generated image are input into the target domain image discriminator, which distinguishes between them; the source domain image generator then generates the restored image of the target generated image in the source domain, the source domain image sample and the restored image are input into the source domain image discriminator, and the source domain image discriminator distinguishes between the source domain image sample and the restored image. In this way, the target domain image generator is trained together with the source domain image generator, the target domain image discriminator and the source domain image discriminator through continuous adversarial interplay.
Specifically, as the training proceeds, under the constraints of the source domain image generator, the target domain image discriminator, the source domain image discriminator and the mutual information discriminator, the difference in content features between the source domain image sample and the target generated image produced by the target domain image generator gradually decreases; under the constraints of the target domain image generator, the source domain image generator, the target domain image discriminator and the source domain image discriminator, the classification accuracy of the mutual information discriminator becomes higher and higher.
In a specific embodiment, the target domain image generator and the source domain image generator may adopt the generator of CycleGAN, or the like; the target domain image discriminator and the source domain image discriminator may adopt the discriminator of CycleGAN, or the like.
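The cycle consistency idea can be sketched as follows. The two stand-in generators below are hypothetical exact inverses, chosen so the cycle loss vanishes; real CycleGAN-style generators are learned networks for which this loss is minimized during training:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in generators (exact inverses of each other).
def target_domain_generator(x):   # migrate: source domain -> target domain
    return 1.1 * x + 0.05

def source_domain_generator(x):   # migrate: target domain -> source domain
    return (x - 0.05) / 1.1

source_sample = rng.normal(size=(4, 4))
target_generated = target_domain_generator(source_sample)  # target generated image
restored = source_domain_generator(target_generated)       # restored image in the source domain

# Cycle consistency loss: the restored image should match the original sample.
# In training, this term is minimized alongside both discriminator losses.
cycle_loss = float(np.mean(np.abs(source_sample - restored)))
```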
According to the image generation method, an image to be migrated is acquired; the source domain to which the image to be migrated belongs and the target domain to which it is to be migrated are determined; an image generator for migrating images belonging to the source domain to the target domain is queried; and a migration image of the image to be migrated in the target domain is generated by the image generator, the migration image having the same content features as the image to be migrated. The image generator is obtained through iterative adversarial training with a mutual information discriminator, and a target parameter is iteratively maximized during the adversarial training; the target parameter is the mutual information between the content features of a source domain image sample and the content features of a target generated image, where the source domain image sample belongs to the source domain, and the target generated image belongs to the target domain and is generated by the image generator according to the source domain image sample. In this way, through the adversarial training between the mutual information discriminator and the image generator, when the image generator migrates an image from the source domain to the target domain, the target domain image remains consistent with the source domain image in content features, so that deformation of the target domain image is avoided.
In one embodiment, the method further comprises: acquiring a training sample set; the training sample set comprises a first medical image obtained according to a first imaging condition and a second medical image obtained according to a second imaging condition; the image domains of the medical images obtained under different imaging conditions are different; taking the first medical image as an image to be migrated; the source domain is an image domain to which the first medical image belongs; the target domain is an image domain to which the second medical image belongs; acquiring a second medical image and a migration image as an updated training sample set; and training the medical image processing model according to the updated training sample set.
The medical image processing model is a machine learning model for realizing a target function. The target function may specifically be classifying the medical image, segmenting the medical image, or the like. The imaging conditions may be environmental conditions such as ambient brightness and lighting, or parameters of the imaging device; for example, even with the same fundus camera, different imaging conditions yield fundus images belonging to different image domains.
It can be understood that, when the medical image processing model is trained by the training sample set, the performance of the medical image processing model is reduced due to the fact that image domains to which the image samples of the training sample set belong are different. The image samples of the training sample set are transferred to the same image domain through the image generation model, so that the image samples of the medical image processing model are in the same image domain, and the performance of the medical image processing model is improved.
In this embodiment, a first medical image obtained under a first imaging condition and a second medical image obtained under a second imaging condition are acquired, where medical images obtained under different imaging conditions belong to different image domains; the first medical image is taken as the image to be migrated, the source domain is the image domain to which the first medical image belongs, and the target domain is the image domain to which the second medical image belongs; the second medical image and the migration image are acquired as an updated training sample set, and the medical image processing model is trained according to the updated training sample set, so that the performance of the medical image processing model is improved.
In one embodiment, training a medical image processing model based on an updated training sample set includes: determining a task type of the medical image processing model; determining a training label which corresponds to each training sample in the updated training sample set and is matched with the task type; and training the medical image processing model according to the training samples and the corresponding training labels of the training samples.
The task type may include a task of classifying the medical image, a task of segmenting the medical image, and the like.
In a specific embodiment, the classification task may be to classify the imaging region corresponding to the medical image. The segmentation task may be to segment the optic disc and the optic cup. The optic disc is the yellowish central portion of the retina where the retinal blood vessels enter and exit. The optic cup is a bright central depression of variable size on the optic disc.
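The matching of training labels to the task type can be sketched as a simple lookup. All file names and label values below are hypothetical, for illustration only:

```python
# Hypothetical training samples and labels for a fundus image dataset.
training_samples = ["fundus_001.png", "fundus_002.png"]

labels_by_task = {
    # Classification: the imaging region each image corresponds to.
    "classification": {"fundus_001.png": "macula",
                       "fundus_002.png": "optic_disc"},
    # Segmentation: a mask marking the optic disc and optic cup pixels.
    "segmentation": {"fundus_001.png": "fundus_001_disc_cup_mask.png",
                     "fundus_002.png": "fundus_002_disc_cup_mask.png"},
}

def pair_samples_with_labels(task_type, samples):
    """Attach to each training sample the label matching the task type."""
    return [(sample, labels_by_task[task_type][sample]) for sample in samples]

segmentation_set = pair_samples_with_labels("segmentation", training_samples)
classification_set = pair_samples_with_labels("classification", training_samples)
```

The model is then trained on the (sample, label) pairs that match its task type.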
In this embodiment, the task type of the medical image processing model is determined, the training labels corresponding to the training samples and matching with the task type in the updated training sample set are determined, and the medical image processing model is trained according to the training samples and the training labels corresponding to the training samples, so that the performance of the medical image processing model is improved.
It should be understood that, although the steps in the flowcharts of figs. 3, 10 and 11 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated otherwise, the execution of these steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in figs. 3, 10 and 11 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 12, a processing apparatus of an image generator is provided, which may be implemented as part of a computer device by software modules, hardware modules, or a combination of the two. The apparatus specifically includes: an obtaining module 1202, a generating module 1204, an extracting module 1206, and an input module 1208, wherein:
an obtaining module 1202, configured to obtain an image sample set, an image generator, and a mutual information discriminator; the image sample set comprises source domain image samples and reference image samples;
a generating module 1204, configured to generate, by the image generator, a target generation image of the source domain image sample in the target domain;
an extracting module 1206, configured to extract a first content feature of the source domain image sample, a second content feature of the reference image sample, and a third content feature of the target generation image, respectively;
the generating module 1204 is further configured to generate a positive sample according to the first content feature and the third content feature, and generate a negative sample according to at least one of the first content feature and the third content feature and the second content feature;
an input module 1208, configured to input the positive sample and the negative sample into the mutual information discriminator, perform iterative adversarial training on the image generator and the mutual information discriminator, and iteratively maximize the mutual information between the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached.
In one embodiment, the input module 1208 is further configured to: characterize the mutual information between the first content feature and the third content feature by the relative entropy of the first content feature and the third content feature; construct a discrimination loss function of the mutual information discriminator according to the cross entropy of the first content feature and the third content feature, where the cross entropy is positively correlated with the relative entropy; and input the positive sample and the negative sample into the mutual information discriminator, perform iterative adversarial training on the image generator and the mutual information discriminator in combination with the discrimination loss function, and iteratively optimize the discrimination loss function during the adversarial training to maximize the mutual information.
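One common way to realize such a discrimination loss is as a binary classifier over feature pairs, in the spirit of Deep InfoMax-style mutual information estimators; this is a sketch under that assumption, and the scores below are hypothetical, not from a trained discriminator:

```python
import math

def binary_cross_entropy(p, label):
    """BCE for a single probability prediction; clamp to avoid log(0)."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

# Hypothetical mutual information discriminator outputs: probability that a
# concatenated feature pair is a positive sample (first + third content features).
score_on_positive = 0.8
score_on_negative = 0.3

# Discrimination loss: classify positive pairs as 1 and negative pairs as 0.
# Driving this cross-entropy down tightens a lower bound on the mutual
# information between the first and third content features, which is the sense
# in which the cross entropy and the relative entropy move together.
discrimination_loss = (binary_cross_entropy(score_on_positive, 1.0)
                       + binary_cross_entropy(score_on_negative, 0.0))
```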
In one embodiment, the extraction module 1206 is further configured to: acquire a source domain encoder and a target domain encoder; encode the source domain image sample into a feature space through the source domain encoder to obtain the first content feature of the source domain image sample; encode the reference image sample into the feature space through the target domain encoder to obtain the second content feature of the reference image sample; and encode the target generated image into the feature space through the target domain encoder to obtain the third content feature of the target generated image.
In one embodiment, the input module 1208 is further configured to: input the positive sample and the negative sample into the mutual information discriminator, perform iterative adversarial training on the image generator, the source domain encoder, the target domain encoder and the mutual information discriminator, and iteratively maximize the mutual information between the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached.
In one embodiment, the obtaining module 1202 is further configured to: acquire a source domain decoder and a target domain decoder. The processing apparatus of the image generator further includes a mapping module configured to: map the first content feature of the source domain image sample to the source domain through the source domain decoder to obtain a first reconstructed image; and map the first content feature of the source domain image sample to the target domain through the target domain decoder to obtain a second reconstructed image; and a build module configured to: construct a first loss function based on the difference between the source domain image sample and the first reconstructed image, and construct a second loss function based on the difference between the target generated image and the second reconstructed image. The input module 1208 is further configured to: input the positive sample and the negative sample into the mutual information discriminator, perform, in combination with the first loss function and the second loss function, iterative adversarial training on the image generator, the source domain encoder, the target domain encoder, the source domain decoder, the target domain decoder and the mutual information discriminator, and iteratively maximize the mutual information between the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached.
In one embodiment, the obtaining module 1202 is further configured to: acquire an image discriminator. The input module 1208 is further configured to: input the source domain image sample and the reference image sample into the image discriminator, input the positive sample and the negative sample into the mutual information discriminator, perform iterative dual adversarial training on the image generator, the image discriminator and the mutual information discriminator, and iteratively maximize the mutual information during the adversarial training until an iteration stop condition is reached.
In one embodiment, the image generator is a target domain image generator and the image discriminator is a target domain image discriminator. The obtaining module 1202 is further configured to: acquire a source domain image generator and a source domain image discriminator. The generating module 1204 is further configured to: generate a restored image of the target generated image in the source domain through the source domain image generator. The build module is further configured to: construct a cycle consistency loss of the source domain image generator, the source domain image discriminator, the target domain image generator and the target domain image discriminator. The input module 1208 is further configured to: input the source domain image sample and the target generated image into the target domain image discriminator, input the source domain image sample and the restored image into the source domain image discriminator, input the positive sample and the negative sample into the mutual information discriminator, perform, in combination with the cycle consistency loss, iterative dual adversarial training on the source domain image generator, the source domain image discriminator, the target domain image generator, the target domain image discriminator and the mutual information discriminator, and iteratively maximize the mutual information during the adversarial training until an iteration stop condition is reached.
In one embodiment, the generating module 1204 is further configured to: concatenate the first content feature and the third content feature to obtain the positive sample; and concatenate the second content feature with at least one of the first content feature and the third content feature to obtain the negative sample.
For the specific limitations of the processing apparatus of the image generator, reference may be made to the limitations of the processing method of the image generator above, which are not repeated here. Each module in the processing apparatus of the image generator may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
The processing apparatus of the image generator acquires a source domain image sample, a reference image sample, an image generator and a mutual information discriminator; generates, by the image generator, a target generated image of the source domain image sample in the target domain; extracts a first content feature of the source domain image sample, a second content feature of the reference image sample and a third content feature of the target generated image, respectively; generates a positive sample according to the first content feature and the third content feature, and generates a negative sample according to the second content feature and at least one of the first content feature and the third content feature; and inputs the positive sample and the negative sample into the mutual information discriminator, performs iterative adversarial training on the image generator and the mutual information discriminator, and iteratively maximizes the mutual information between the first content feature and the third content feature during the adversarial training until an iteration stop condition is reached. In this way, through the adversarial training between the mutual information discriminator and the image generator, when the image generator migrates an image from the source domain to the target domain, the target domain image remains consistent with the source domain image in content features, so that deformation of the target domain image is avoided; moreover, when the target generated image is used for training a medical image processing model, the performance of the medical image processing model can be improved.
In one embodiment, as shown in fig. 13, an image generation apparatus is provided, which may be implemented as part of a computer device by software modules, hardware modules, or a combination of the two. The apparatus specifically includes: an obtaining module 1302, a determining module 1304, a query module 1306, and a generating module 1308, wherein:
an obtaining module 1302, configured to obtain an image to be migrated;
a determining module 1304, configured to determine a source domain to which an image to be migrated belongs and a target domain to which the image to be migrated belongs;
a query module 1306 for querying an image generator for migrating images belonging to a source domain to a target domain;
a generating module 1308, configured to generate, by the image generator, a migration image of the image to be migrated in the target domain; the content characteristics of the transferred image and the image to be transferred are the same;
The image generator is obtained through iterative adversarial training with the mutual information discriminator, and a target parameter is iteratively maximized during the adversarial training; the target parameter is the mutual information between the content features of the source domain image sample and the content features of the target generated image; the source domain image sample belongs to the source domain; the target generated image belongs to the target domain and is generated from the source domain image sample by the image generator.
In one embodiment, the obtaining module 1302 is further configured to: acquiring a training sample set; the training sample set comprises a first medical image obtained according to a first imaging condition and a second medical image obtained according to a second imaging condition; the image domains of the medical images obtained under different imaging conditions are different; taking the first medical image as an image to be migrated; the source domain is an image domain to which the first medical image belongs; the target domain is an image domain to which the second medical image belongs; an obtaining module 1302, configured to: acquiring a second medical image and a migration image as an updated training sample set; the image generation apparatus further comprises a training module configured to: and training the medical image processing model according to the updated training sample set.
In one embodiment, the training module is further configured to: determining a task type of the medical image processing model; determining a training label which corresponds to each training sample in the updated training sample set and is matched with the task type; and training the medical image processing model according to the training samples and the corresponding training labels of the training samples.
For the specific limitations of the image generation apparatus, reference may be made to the limitations of the image generation method above, which are not repeated here. Each module in the image generation apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
The image generation apparatus acquires an image to be migrated; determines the source domain to which the image to be migrated belongs and the target domain to which it is to be migrated; queries an image generator for migrating images belonging to the source domain to the target domain; and generates, by the image generator, a migration image of the image to be migrated in the target domain, the migration image having the same content features as the image to be migrated. The image generator is obtained through iterative adversarial training with a mutual information discriminator, and a target parameter is iteratively maximized during the adversarial training; the target parameter is the mutual information between the content features of a source domain image sample and the content features of a target generated image, where the source domain image sample belongs to the source domain, and the target generated image belongs to the target domain and is generated by the image generator according to the source domain image sample. In this way, through the adversarial training between the mutual information discriminator and the image generator, when the image generator migrates an image from the source domain to the target domain, the target domain image remains consistent with the source domain image in content features, so that deformation of the target domain image is avoided; moreover, when the target generated image is used for training the medical image processing model, the performance of the medical image processing model can be improved.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 14. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing training data and/or image generation data of the image generator. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a processing method of an image generator and/or an image generation method.
Those skilled in the art will appreciate that the architecture shown in fig. 14 is merely a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) and dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered within the scope of this specification.
The above embodiments merely express several implementations of the present application, and their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of the patent. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.