CN111862174B - Cross-modal medical image registration method and device

Info

Publication number: CN111862174B (application CN202010652606.9A)
Authority: CN (China)
Prior art keywords: image, network, modality, cross, deformation field
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN111862174A
Inventors: 李秀, 徐哲, 马露凡, 罗凤, 严江鹏
Assignee (current and original): Shenzhen International Graduate School of Tsinghua University
Application CN202010652606.9A filed by Shenzhen International Graduate School of Tsinghua University; application granted; publications CN111862174A and CN111862174B

Classifications

    • G06T7/30 — Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 — Image registration using feature-based methods
    • G06N3/045 — Neural network architectures; combinations of networks
    • G06T2207/10081 — Computed x-ray tomography [CT]
    • G06T2207/10088 — Magnetic resonance imaging [MRI]

Abstract

A cross-modal medical image registration method, comprising: providing a training set comprising a floating image of a first modality and a reference image of a second modality; inputting the floating image into an image conversion network, which converts the floating image into a converted image of the second modality; inputting the floating image and the reference image into a cross-modal flow sub-network to output a first deformation field; inputting the converted image and the reference image into a single-modal flow sub-network to output a second deformation field; inputting the first deformation field and the second deformation field into a deformation field fusion network to output a final deformation field; inputting the floating image and the final deformation field into a spatial transformation network to obtain the floating image warped by the final deformation field; obtaining a first total loss function from the transformed floating image and the reference image, and training the networks with the aim of minimizing the first total loss function; and inputting the images to be registered into the trained networks to obtain the registered image. The invention can greatly improve the effect of cross-modal medical image registration.

Description

Cross-modal medical image registration method and device
Technical Field
The invention relates to the technical field of medical image registration, in particular to a cross-modal medical image registration method and device.
Background
Medical image registration is an optimization process that aligns a floating image with a reference image based on image appearance, with the goal of finding the spatial transformation that best aligns the regions of interest in the input images. As a key technique for image-guided therapy, medical image registration establishes anatomical correspondence between different medical images and is applied in many clinical scenarios such as endoscopy, disease diagnosis, surgical guidance and radiation therapy. Medical image registration is a widely studied topic; it can be classified into single-modality registration and cross-modality registration according to the type of images to be registered, and into rigid, affine and deformable registration according to the type of registration transformation.
Traditional registration methods solve for the optimal transformation by iteratively optimizing an image similarity index, which is computationally inefficient. Subsequent work therefore introduced deep learning into the medical image registration task, using deep neural networks to directly estimate the deformable transformation between the input image pair and thereby effectively balancing registration accuracy and computational efficiency. However, acquiring ground-truth deformation fields and three-dimensional segmentation labels is very challenging and costly, so research has focused on learning registration networks without supervision.
Existing cross-modal medical image registration techniques can be divided into two main categories: 1) modifying the loss function of existing single-modality registration, i.e. designing a cross-modal image similarity measure to guide the learning of an unsupervised deformable registration network; 2) M2U (Multimodal-to-Unimodal) registration based on cross-modal image conversion, which converts cross-modal image registration into a single-modality registration task by means of existing image conversion techniques. These are described in turn below:
(1) Registration technology based on cross-modal image similarity measurement
Such methods directly perform registration of different-modality images based on a cross-modal image similarity loss. Because of the large appearance differences between cross-modal images, traditional single-modality image similarity metrics are mostly no longer suitable for cross-modal registration tasks; an effective cross-modal image similarity loss is therefore needed to guide the training of an unsupervised cross-modal registration network. To overcome this challenge, Heinrich et al. proposed the modality independent neighbourhood descriptor (MIND) based on the concept of image self-similarity. MIND is highly robust to the marked differences between modalities and can effectively characterize the similarity of cross-modal images.
A representative method of this kind is the VoxelMorph framework combined with the MIND similarity measure. It uses the cross-modal similarity metric MIND as a loss function and directly extends the typical unsupervised registration framework VoxelMorph with it, guiding the network to learn the deformable mapping from a multi-modal input image pair. The VoxelMorph architecture is shown in fig. 1: VoxelMorph is an unsupervised deformable registration framework based on a convolutional neural network (CNN). The deep convolutional registration network cascades a UNet with a spatial transformation network, takes the floating image M and the reference image F to be registered as input, learns the deformable mapping between the input images through the registration network g_θ(F, M), and outputs a high-dimensional deformation field φ. The transformed floating image M(φ) is obtained by spatially warping the floating image M with the estimated deformation field φ. The loss function of the entire network consists of two parts: 1) a similarity loss between the transformed floating image M(φ) and the reference image F; 2) a regularization loss that smooths the estimated deformation field φ. The VoxelMorph+MIND cross-modal registration technique inputs the cross-modal image pair to be registered into a VoxelMorph network and uses MIND to compute the cross-modal image similarity loss that supervises parameter training, realizing deformable registration of three-dimensional cross-modal images.
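For concreteness, one training iteration of the VoxelMorph+MIND scheme just described can be sketched as follows. This is a minimal PyTorch-style illustration under stated assumptions, not the actual implementation of the prior art: reg_net, warp, mind_loss and smooth_loss are placeholder callables supplied by the caller (concrete sketches of a registration UNet, an STN warp and a MIND-based loss appear later in this description).

```python
# Minimal sketch of one unsupervised VoxelMorph + MIND training step;
# all callables are illustrative placeholders, not names from the patent.
import torch

def voxelmorph_mind_step(reg_net, warp, mind_loss, smooth_loss,
                         optimizer, moving, fixed, lam=1.0):
    # moving, fixed: (B, 1, D, H, W) volumes of the two modalities
    phi = reg_net(torch.cat([moving, fixed], dim=1))   # deformation field
    warped = warp(moving, phi)                         # M(phi)
    # MIND similarity between warped floating and reference image,
    # plus a smoothness regularizer on the estimated field
    loss = mind_loss(warped, fixed) + lam * smooth_loss(phi)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```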
(2) M2U registration technology based on cross-modal image conversion
These techniques complete cross-modal medical image registration by means of image conversion; the core idea is to convert complex cross-modal medical image registration into a simpler single-modality registration task. The overall flow of an image-conversion-based cross-modal registration method is as follows:
1) For the cross-modal medical image data, a cross-modal image conversion network is constructed, whose aim is to learn the mapping between images of different modalities without paired data. Generative adversarial networks (GANs), typified by Cycle-GAN, are typical image conversion networks. The Cycle-GAN-based cross-modal image conversion process is shown in figs. 2a to 2c. To realize the mutual mapping of images between two image domains X and Y, the Cycle-GAN network comprises two domain mapping networks (i.e. generators) and two associated discriminators, as shown in fig. 2a. Generator G maps an image from domain X to domain Y, i.e. G: X→Y; generator F maps an image from domain Y to domain X, i.e. F: Y→X. Discriminator Dx distinguishes real images from domain X from images converted by generator F; likewise, Dy distinguishes real images from domain Y from images converted by generator G. Fig. 2b shows the process of mapping an image from the original domain X to the target domain Y with generator G and then back to domain X with generator F, where discriminator Dy discriminates between real and generated images in domain Y and the adversarial loss is computed; fig. 2c shows the symmetric process of mapping an image from domain Y to domain X with generator F and back to domain Y with generator G, where discriminator Dx discriminates between real and generated images in domain X and the adversarial loss is computed. To ensure that the domain mappings G and F are mutually inverse, Cycle-GAN adds a cycle-consistency loss on top of the discriminators' adversarial losses. In short, Cycle-GAN uses a pair of generator sub-networks to estimate the mappings between images, uses discriminator sub-networks to judge whether generated images are real or fake, and jointly supervises network training with the adversarial and cycle-consistency losses (see the loss sketch after this list);
2) Based on the mapping learned by the Cycle-GAN network, images are converted from one modality to the other, turning the input cross-modal image pair into a single-modality pair and simplifying the problem to single-modality image registration;
3) For the single-modality images obtained above, a learning-based unsupervised deformable registration framework is constructed: a deep convolutional registration network learns the deformable mapping between the single-modality input images, and an STN module spatially warps the floating image according to the estimated deformation fields (DFs) so that the similarity between the transformed floating image and the reference image is maximized.
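As referenced in step 1) above, the Cycle-GAN generator objective can be sketched as follows. A least-squares (LSGAN-style) adversarial term is assumed for illustration, since the text does not fix the exact adversarial formulation; G, F_, Dx and Dy stand for the two generators and two discriminators of figs. 2a to 2c.

```python
# Illustrative Cycle-GAN generator objective: adversarial terms plus the
# cycle-consistency term F(G(x)) ~ x and G(F(y)) ~ y.
import torch
import torch.nn.functional as nnf

def cyclegan_generator_loss(G, F_, Dx, Dy, real_x, real_y, lam_cyc=10.0):
    fake_y = G(real_x)    # G: X -> Y
    fake_x = F_(real_y)   # F: Y -> X
    # each generator tries to make its discriminator output "real" (1)
    adv = nnf.mse_loss(Dy(fake_y), torch.ones_like(Dy(fake_y))) + \
          nnf.mse_loss(Dx(fake_x), torch.ones_like(Dx(fake_x)))
    # cycle consistency constrains the two mappings to be mutually inverse
    cyc = nnf.l1_loss(F_(fake_y), real_x) + nnf.l1_loss(G(fake_x), real_y)
    return adv + lam_cyc * cyc
```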
However, neither the VoxelMorph+MIND cross-modal registration technique nor the M2U registration technique generally achieves a satisfactory registration effect.
The foregoing background is provided only to facilitate an understanding of the principles and concepts of the application; it does not necessarily belong to the prior art of the present application, and it is not admitted that such background was publicly disclosed before the filing date of this application.
Disclosure of Invention
In order to solve the technical problems, the invention provides a cross-modal medical image registration method and device, which can greatly improve the accuracy and the robustness of cross-modal medical image registration.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
one embodiment of the invention discloses a cross-modal medical image registration method, which comprises the following steps:
S1: providing a training set comprising a floating image of a first modality and a reference image of a second modality;
S2: inputting the floating image into an image conversion network to convert the floating image from the first modality to the second modality, and outputting a converted image of the second modality;
S3: inputting the floating image and the reference image into a cross-modal flow sub-network, and outputting a first deformation field;
S4: inputting the converted image and the reference image into a single-modal flow sub-network, and outputting a second deformation field;
S5: inputting the first deformation field and the second deformation field into a deformation field fusion network so as to superimpose the first deformation field and the second deformation field and output a final deformation field;
S6: inputting the floating image and the final deformation field into a spatial transformation network to obtain the floating image warped by the final deformation field;
S7: comparing the transformed floating image with the reference image to obtain a first total loss function, and repeatedly executing steps S2–S7 to train the cross-modal flow sub-network, the single-modal flow sub-network, the deformation field fusion network and the spatial transformation network with the aim of minimizing the first total loss function; when training is complete, executing step S8;
S8: executing steps S2–S6 on the floating image of the first modality to be registered and the reference image of the second modality to obtain the transformed floating image, i.e. the registered image.
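The data flow of steps S2 to S6 can be summarized in the following sketch. The module names are placeholders for the networks defined in the steps above; the loss computation and training loop of step S7 are omitted.

```python
# Sketch of the forward pass implementing steps S2-S6.
import torch

def dual_flow_forward(conversion_net, cross_flow_net, mono_flow_net,
                      fusion_net, stn, moving_m1, fixed_m2):
    converted = conversion_net(moving_m1)                             # S2
    phi1 = cross_flow_net(torch.cat([moving_m1, fixed_m2], dim=1))    # S3
    phi2 = mono_flow_net(torch.cat([converted, fixed_m2], dim=1))     # S4
    phi = fusion_net(torch.cat([phi1, phi2], dim=1))                  # S5
    warped = stn(moving_m1, phi)                                      # S6
    return warped, phi
```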
Preferably, the image conversion network employs an improved Cycle-GAN network whose second total loss function is:

$$L = L_{adv}(D_1) + L_{adv}(D_2) + \lambda_{cyc} L_{cyc} + \lambda_{identity} L_{identity} + \lambda_{MIND} L_{MIND}$$

wherein L_adv(D_1) and L_adv(D_2) are the adversarial losses of the two discriminators D_1 and D_2 of the improved Cycle-GAN network, the cycle consistency loss L_cyc constrains the reversibility of the transformations of the two generators G_1 and G_2 of the improved Cycle-GAN network, the identity mapping loss L_identity is a normalizing constraint on images generated by conversion within the same modality, the structural consistency loss L_MIND constrains the structural similarity between the original image and the generated image, and λ_cyc, λ_identity, λ_MIND respectively represent the relative importance of L_cyc, L_identity, L_MIND;
step S7 further comprises training the improved Cycle-GAN network with the aim of minimizing the second total loss function.
Preferably, the structural consistency loss L_MIND is:

$$L_{MIND} = \frac{1}{N_2 |R|} \sum_x \big\| M(I_{r2})(x) - M\big(G_1(I_{r2})\big)(x) \big\|_1 + \frac{1}{N_1 |R|} \sum_x \big\| M(I_{r1})(x) - M\big(G_2(I_{r1})\big)(x) \big\|_1$$

wherein M denotes the modality independent neighbourhood descriptor (MIND), I_r1 denotes the floating image of the first modality, I_r2 denotes the reference image of the second modality, N_1 and N_2 respectively denote the numbers of voxels of images I_r1 and I_r2, R denotes a non-local region around voxel x, G_1(I_r2) denotes the image generated by converting image I_r2 with generator G_1, and G_2(I_r1) denotes the image generated by converting image I_r1 with generator G_2.
Preferably, the identity mapping loss L_identity is:

$$L_{identity} = \big\| G_1(I_1) - I_1 \big\|_1 + \big\| G_2(I_2) - I_2 \big\|_1$$

wherein I_1 denotes an image of the first modality, I_2 denotes an image of the second modality, G_1(I_1) denotes the image generated by passing the first-modality image I_1 through generator G_1, and G_2(I_2) denotes the image generated by passing the second-modality image I_2 through generator G_2.
Preferably, the cross-modal flow sub-network adopts a UNet structure comprising an encoder and a decoder, with skip connections between the convolution layers of the encoder and the decoder.
Preferably, the single-modal flow sub-network adopts a UNet structure comprising an encoder and a decoder, with skip connections between the convolution layers of the encoder and the decoder.
Preferably, the deformation field fusion network is a 3D convolutional neural network.
Preferably, the spatial transformation network comprises a spatial grid generator and a sampler: the grid generator generates a sampling grid from the final deformation field, and the sampler spatially warps the floating image according to the sampling grid.
Preferably, the first total loss function is:

$$L_{total} = L_{sim}\big(I_{r2},\, I_{r1} \circ \phi\big) + \lambda\, L_{smooth}(\phi)$$

wherein φ denotes the final deformation field, I_r1 denotes the floating image of the first modality, I_r2 denotes the reference image of the second modality, and I_r1∘φ denotes the floating image warped by the final deformation field; the image similarity loss L_sim represents the image similarity loss between the warped floating image I_r1∘φ and the image I_r2, L_smooth is the regularization loss constraining the smoothness of the estimated deformation field φ, and λ is the regularization coefficient.
Another embodiment of the invention discloses a cross-modal medical image registration apparatus comprising a processor and a readable storage medium storing executable instructions executable by the processor, the processor being configured to implement the above cross-modal medical image registration method through the executable instructions.
Compared with the prior art, the invention has the following beneficial effects. In the cross-modal medical image registration method and device, the floating image first undergoes modality conversion through the image conversion network; the deformation fields are then estimated in an unsupervised manner by the cross-modal flow sub-network and the single-modal flow sub-network. This dual-flow mechanism effectively integrates the information of the original image and the generated image: the cross-modal flow introduces the texture features of the original image, weakening the interference of unreal artificial features in the generated image on registration, while the single-modal flow effectively suppresses the voxel-drift effect caused by the cross-modal flow. The original cross-modal flow and the synthesized single-modal flow are optimized cooperatively, learning a more realistic deformation field from the original floating image and the generated converted image respectively. Fusing the deformation fields estimated by the two sub-networks then greatly improves the accuracy and robustness of cross-modal medical image registration and yields better registration performance. The method and device thus avoid the poor results of direct cross-modal registration caused by the huge appearance differences between medical images of different modalities.
In a further scheme, an improved Cycle-GAN network is adopted as the image conversion network. Compared with the existing Cycle-GAN network, two loss-function constraints are added, which strengthen the structural similarity between the generated image and the original image, improve structural fidelity, and prevent additional artificial features from being introduced during image conversion. This addresses the problem that converting a cross-modal image pair into a single-modality pair with an existing generative adversarial network (GAN) inevitably introduces unreal artificial anatomical features that interfere with registration and loses the detailed texture of the original image, lowering registration accuracy.
Drawings
Fig. 1 is a schematic diagram of a conventional VoxelMorph network architecture;
FIGS. 2a to 2c are schematic diagrams of image conversion based on a Cycle-GAN network;
FIG. 3a is a raw CT modality image;
FIG. 3b is an MR image generated using a Cycle-GAN network;
FIG. 4 is a flow chart of a cross-modality medical image registration method according to a preferred embodiment of the present invention;
FIG. 5 is a dual-flow cross-modal registration network architecture based on adversarial learning in an embodiment of the present invention;
FIGS. 6a and 6b are schematic diagrams of modified Cycle-GAN networks in accordance with embodiments of the present invention;
FIG. 7a is a raw CT modality image;
FIG. 7b is an MR image generated using the modified Cycle-GAN network of the present invention;
fig. 8 is a schematic structural diagram of the cross-modal flow sub-network UNet_O and the single-modal flow sub-network UNet_S;
fig. 9 is a hardware configuration diagram of a cross-modality medical image registration apparatus according to a preferred embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical schemes and beneficial effects to be solved by the embodiments of the present invention more clear, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Wherein the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present invention, the meaning of "plurality" is two or more, unless explicitly defined otherwise.
The inventors found that the VoxelMorph+MIND cross-modal registration technique in the prior art cannot obtain a satisfactory registration effect because the appearance differences between medical images of different modalities are significant: directly ignoring the modality differences makes registration difficult, and accuracy is not guaranteed. Even though such techniques rework the loss function of past single-modality registration tasks, it is still difficult to find a robust cross-modality similarity metric. In addition, the prior-art M2U registration technique based on cross-modal image conversion has the defect that artificial features are inevitably introduced into the generated images. Most such techniques use a generative adversarial network (GAN) such as Cycle-GAN to implement cross-modal image conversion. Specifically, the Cycle-GAN network uses generators to learn the mapping between the different domains of the input images and uses discriminators to judge whether the generated images in the target domain are real or fake. The network uses the adversarial loss fed back by the discriminators to guide the generators to learn the domain mapping, so that the distribution of the generated images approaches that of the target-domain images while their difference from the original-domain images keeps increasing. However, such an approach cannot guarantee that the network outputs the ideal generated image, because through the adversarial loss the network only learns a probability distribution. Although the network introduces a cycle consistency loss to constrain the reversibility of the conversions between modalities, this is not enough to ensure that the generated image retains the original image features, because the structural similarity of the images before and after modality conversion is not constrained during training. Image conversion methods based on generative adversarial networks therefore inevitably introduce unreal artificial features when generating target-domain images, increasing the mismatch rate in the subsequent registration. Moreover, the original Cycle-GAN network places no normalizing constraint on the identity mapping of an input image within the same modality, and is likely to erroneously convert an input image that is already in the target domain to another domain. A method that simply converts cross-modality registration into single-modality registration by means of existing cross-modal image conversion techniques alone is thus not robust.
Meanwhile, the target-modality images generated by existing image conversion techniques easily lose the local texture features of the original image and show large structural differences from it. Taking CT-to-MR abdomen registration as an example (figs. 3a and 3b, where fig. 3a is the original CT modality image and fig. 3b is the MR image generated with a Cycle-GAN network), fig. 3b has lost local texture features present in fig. 3a. In the unsupervised registration stage, existing cross-modal registration techniques usually feed the generated image directly into the deformable registration network as the floating image for single-modality registration, without incorporating the original image. Because the whole unsupervised registration network estimates the deformable transformation from the appearance of its input images, the fidelity of the deformation fields (DFs) estimated by the registration network naturally depends on how faithfully the input images preserve the features of the original image. Since the prior art ignores the auxiliary information provided by the original image during registration, parameter learning of the registration network is strongly influenced by the artificial features in the generated image, so the network ultimately estimates a distorted deformation field, the original floating image cannot be well aligned to the reference image, and registration accuracy drops.
Furthermore, unsupervised medical image registration supervises network training with a similarity or error loss between the transformed floating image and the reference image. Because of the large appearance differences between images of different modalities, the similarity metrics commonly used in unsupervised single-modality registration are no longer applicable to cross-modality scenes. Many learning-based unsupervised registration techniques use indices such as mutual information (MI) and cross-correlation (CC) to measure image similarity in cross-modality registration tasks and use them to guide network parameter learning. However, directly transplanting these similarity metrics to cross-modality registration cannot effectively characterize image similarity, causing the network to learn in the wrong direction under biased similarity guidance.
The invention aims to incorporate the original image information into the registration process through a dual-flow registration-field fusion design, helping the network robustly learn a more realistic deformation field and thereby obtain better registration performance. To let the original image information guide the cross-modal registration process, the method makes effective use of the deformation fields estimated by the original cross-modal flow and the synthesized single-modal flow, and automatically learns, through a convolutional network, how to better integrate the two deformation fields. Meanwhile, to avoid introducing unreal artificial features during image conversion, two loss-function constraints are added to Cycle-GAN to improve the fidelity of the anatomical structure in the generated images.
As shown in fig. 4, a preferred embodiment of the present invention proposes a cross-modality medical image registration method for registering a floating image of a first modality to a reference image of a second modality, comprising the steps of:
S1: providing a training set comprising a floating image of a first modality and a reference image of a second modality;
S2: inputting the floating image into an image conversion network to convert the floating image from the first modality to the second modality, and outputting a converted image of the second modality;
S3: inputting the floating image and the reference image into a cross-modal flow sub-network, and outputting a first deformation field;
S4: inputting the converted image and the reference image into a single-modal flow sub-network, and outputting a second deformation field;
S5: inputting the first deformation field and the second deformation field into a deformation field fusion network so as to superimpose the first deformation field and the second deformation field and output a final deformation field;
S6: inputting the floating image and the final deformation field into a spatial transformation network to obtain the floating image warped by the final deformation field;
S7: comparing the transformed floating image with the reference image to obtain a first total loss function, and repeatedly executing steps S2–S7 to train the cross-modal flow sub-network, the single-modal flow sub-network, the deformation field fusion network and the spatial transformation network with the aim of minimizing the first total loss function; when training is complete, executing step S8;
S8: executing steps S2–S6 on the floating image of the first modality to be registered and the reference image of the second modality to obtain the transformed floating image, i.e. the registered image.
The overall flow of the cross-modal medical image registration method provided by the invention is shown in fig. 5; it can be divided into a cross-modal image conversion network based on the improved Cycle-GAN and a dual-flow cross-modal image registration network. The following description takes registration of a floating CT image to a reference MR image as an example, but the method of the invention is not limited to CT-MR cross-modal medical image registration and can equally be applied to other cross-modal registration tasks, such as magnetic resonance-ultrasound (MR-US) registration and computed tomography-ultrasound (CT-US) registration.
With reference to the adversarial-learning-based dual-flow registration-fusion cross-modal medical image registration network structure in fig. 5, the overall registration procedure includes:
A1: providing a training set comprising floating images rCT of the CT modality and reference images rMR of the MR modality;
A2: inputting the floating image rCT of the original CT modality into the improved Cycle-GAN image conversion network to convert the floating image from the CT modality to the MR modality; the network outputs the generated image tMR;
As an optimal model for image conversion, the Cycle-GAN network can be trained without paired CT and MR data of the same patient. The image conversion Cycle-GAN network model used in the invention is shown in figs. 6a and 6b. Fig. 6a schematically depicts the forward (CT-to-MR) and reverse (MR-to-CT) conversion of CT and MR modality images, where the solid lines represent the forward (rCT-1 to tMR-1) and reverse (tMR-1 to tCT-1) conversion of an original CT modality image, and the dashed lines represent the forward (rMR-2 to tCT-2) and reverse (tCT-2 to tMR-2) conversion of a real MR modality image. The Cycle-GAN image conversion network consists of two generators G_MR, G_CT and two discriminators D_CT, D_MR. Generator G_MR converts images from the CT modality to the MR modality (CT-to-MR); for example, with rCT-1 as input, G_MR outputs the generated image tMR-1. Generator G_CT converts images from the MR modality to the CT modality (MR-to-CT); for example, with rMR-2 as input, G_CT outputs the generated image tCT-2. Discriminator D_CT distinguishes real CT modality images from images converted by generator G_CT, e.g. the real CT image rCT-3 from the generated image tCT-2. Likewise, discriminator D_MR distinguishes real MR modality images from images converted by generator G_MR, e.g. the real image rMR-3 from the generated image tMR-1. Fig. 6b illustrates the identity mapping loss of the improved Cycle-GAN of the invention, which constrains image conversion within the same modality.
The loss function of the improved Cycle-GAN network of the invention comprises four parts. (1) The adversarial losses L_adv(D_CT) and L_adv(D_MR) from the discriminators penalize the difference between the distribution of the generated images and that of real target-modality images, so that the images converted by the generators have a data distribution highly similar to the target modality and hard for the discriminators to distinguish. (2) The cycle consistency loss L_cyc constrains the reversibility of the transformations of the two generators G_CT and G_MR: an image converted by generator G_CT and then by generator G_MR should return to the original modality and be highly similar to the original image in data distribution. For example, in the figure, the original CT modality image rCT-1 is converted by generator G_MR into tMR-1, and generator G_CT converts tMR-1 back to the CT modality to obtain tCT-1, which should have the same data distribution as the original image rCT-1. (3) The structural consistency loss L_MIND constrains the structural similarity between the original image and the generated image, aiming to keep the structure of the image converted by a generator highly consistent with the original. For example, the original image rCT-1 is converted by generator G_MR into tMR-1; under the constraint of L_MIND, rCT-1 and tMR-1 have high structural similarity. (4) The identity mapping loss L_identity, shown in fig. 6b, is a normalizing constraint on images generated by conversion within the same modality; under the training constraint of L_identity, image conversion within the same modality should leave the image unchanged. For example, in fig. 6b the image tMR obtained by passing the MR modality image rMR through generator G_MR should be identical to the original.
The structural consistency loss L_MIND uses MIND, which describes the local structural features around each voxel, to measure the structural similarity between an original-modality image and the target-modality image obtained after generator conversion, e.g. between rCT-1 and tMR-1. L_MIND is highly robust to the significant differences between modalities and constrains the structural consistency between the generated image and the original image. L_MIND guides network training to continuously reduce the MIND loss between the generated image G_CT(I_rMR) or G_MR(I_rCT) and the corresponding original image I_rMR or I_rCT, strengthening the structural similarity between the images before and after conversion.
The structural consistency loss L_MIND used in the invention is defined in formula (1):

$$L_{MIND} = \frac{1}{N_{MR}|R|} \sum_x \big\| M(I_{rMR})(x) - M\big(G_{CT}(I_{rMR})\big)(x) \big\|_1 + \frac{1}{N_{CT}|R|} \sum_x \big\| M(I_{rCT})(x) - M\big(G_{MR}(I_{rCT})\big)(x) \big\|_1 \quad (1)$$

wherein M denotes MIND, I_rMR denotes the reference image of the MR modality, I_rCT denotes the floating image of the CT modality, N_MR and N_CT respectively denote the numbers of voxels of images I_rMR and I_rCT, R denotes the non-local region around voxel x, G_CT(I_rMR) denotes the image generated by converting image I_rMR with generator G_CT (also denoted tCT), and G_MR(I_rCT) denotes the image generated by converting image I_rCT with generator G_MR (also denoted tMR).
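A simplified code sketch of formula (1) is given below. The descriptor is a reduced six-neighbourhood MIND variant (patch sum-of-squared-differences, with the mean neighbourhood distance as the variance estimate), and the per-voxel mean replaces the explicit 1/(N|R|) normalization; the exact patch size and region R of the patent are hyperparameters this sketch only approximates.

```python
# Simplified six-neighbourhood MIND descriptor and the structural
# consistency loss of formula (1); an approximation, not the exact form.
import torch
import torch.nn.functional as nnf

def mind(img, patch=3):
    # img: (B, 1, D, H, W); patch SSD against the six axial neighbours
    pad = patch // 2
    kernel = torch.ones(1, 1, patch, patch, patch, device=img.device)
    kernel = kernel / kernel.numel()
    dists = []
    for shift in [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]:
        shifted = torch.roll(img, shifts=shift, dims=(2, 3, 4))
        dists.append(nnf.conv3d((img - shifted) ** 2, kernel, padding=pad))
    dist = torch.cat(dists, dim=1)                        # (B, 6, D, H, W)
    var = dist.mean(dim=1, keepdim=True).clamp_min(1e-6)  # variance estimate
    desc = torch.exp(-dist / var)
    return desc / desc.amax(dim=1, keepdim=True).clamp_min(1e-6)

def l_mind(real_mr, real_ct, generated_ct, generated_mr):
    # generated_ct = G_CT(real_mr), generated_mr = G_MR(real_ct)
    return (mind(real_mr) - mind(generated_ct)).abs().mean() + \
           (mind(real_ct) - mind(generated_mr)).abs().mean()
```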
In addition, the invention applies the identity mapping loss L_identity to the Cycle-GAN network as a normalizing constraint on images generated by conversion within the same modality. L_identity is defined in formula (2):

$$L_{identity} = \big\| G_{MR}(I_{MR}) - I_{MR} \big\|_1 + \big\| G_{CT}(I_{CT}) - I_{CT} \big\|_1 \quad (2)$$

wherein G_MR(I_MR) denotes the MR-modality image generated by passing the MR modality image I_MR through generator G_MR, and G_CT(I_CT) denotes the CT-modality image generated by passing the CT modality image I_CT through generator G_CT. L_identity computes the L1 distance between a generated image and the real image of the same modality, i.e. the sum of the L1 distances between G_MR(I_MR) and I_MR and between G_CT(I_CT) and I_CT. Under the training constraint of L_identity, image conversion within the same modality should leave the image unchanged, i.e. G_MR(I_MR) ≈ I_MR and G_CT(I_CT) ≈ I_CT. The identity mapping loss L_identity prevents a generator from erroneously converting an image that is already in the target modality to another modality.
In summary, the total loss L of the improved Cycle-GAN network of the invention is the weighted sum of the adversarial losses L_adv(D_CT) and L_adv(D_MR), the cycle consistency loss L_cyc, the identity mapping loss L_identity and the structural consistency loss L_MIND, defined in formula (3):

$$L = L_{adv}(D_{CT}) + L_{adv}(D_{MR}) + \lambda_{cyc} L_{cyc} + \lambda_{identity} L_{identity} + \lambda_{MIND} L_{MIND} \quad (3)$$

wherein λ_cyc, λ_identity, λ_MIND respectively represent the relative importance of the cycle consistency loss L_cyc, the identity mapping loss L_identity and the structural consistency loss L_MIND.
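Formulas (2) and (3) translate directly into code, as sketched below. The adversarial and cycle-consistency terms are assumed to be computed elsewhere (e.g. as in the Cycle-GAN sketch in the background section), and the λ weights are unspecified hyperparameters of the patent, so the defaults here are illustrative only.

```python
# Identity mapping loss (formula (2)) and the improved total loss
# (formula (3)); the per-voxel mean is a scaled form of the L1 norm.
def l_identity(G_mr, G_ct, real_mr, real_ct):
    # an image already in the target modality should pass through unchanged
    return (G_mr(real_mr) - real_mr).abs().mean() + \
           (G_ct(real_ct) - real_ct).abs().mean()

def improved_cyclegan_loss(adv_d_ct, adv_d_mr, cyc, identity, mind_term,
                           lam_cyc=10.0, lam_identity=5.0, lam_mind=1.0):
    # weighted sum of the four parts; the lambda defaults are assumptions
    return adv_d_ct + adv_d_mr + lam_cyc * cyc + \
           lam_identity * identity + lam_mind * mind_term
```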
In this step, the original CT modality image rCT is converted with the improved Cycle-GAN network, as shown in figs. 7a and 7b, where fig. 7a is the original CT modality image and fig. 7b is the MR image generated with the improved Cycle-GAN network. Compared with fig. 7a, fig. 7b preserves the local texture features of the image; the visualization therefore shows that adding the structural consistency loss L_MIND effectively strengthens the structural similarity between the generated image tMR and the original image rCT and improves the fidelity of organ boundaries.
The training loss of the existing Cycle-GAN network includes only two items: the adversarial losses given by the discriminators and the cycle consistency loss L_cyc. The adversarial losses guide the network to learn the mapping between images of different modalities, and the cycle consistency loss constrains the reversibility of the mapping. However, the inventors found that it is difficult to train a robust cross-modal medical image conversion network with only these two loss constraints, because the cycle consistency loss is insufficient to guarantee structural similarity between the generated image and the original image (see the comparison of figs. 3a and 3b above); moreover, the existing Cycle-GAN network places no normalizing constraint on the identity mapping of an input image within the same modality and is likely to erroneously convert an input image that is already in the target domain to another domain. The improvement of the Cycle-GAN network in this step therefore introduces two additional loss functions, the structural consistency loss L_MIND and the identity mapping loss L_identity, and constrains training with four losses in total, thereby ensuring structural similarity between the generated image and the original image (see the comparison of figs. 7a and 7b) and avoiding erroneous conversion of input images already in the target domain.
A3: the cross-modal flow sub-network takes the original CT modality floating image rCT and the MR modality reference image rMR as input (i.e. the input is the cross-modal image pair (rCT, rMR)), learns the deformable mapping between the input image pair through a UNet-structured network, and outputs the deformation field φ_o, which represents the deformable mapping of the input cross-modal image pair (rCT, rMR);
In this embodiment, the cross-modal flow sub-network UNet_O adopts the UNet structure shown in fig. 8. The original CT modality image (rCT) and the MR modality image (rMR) serve as the floating image I_m and the reference image I_f respectively; the single-channel grayscale floating image I_m and the single-channel grayscale reference image I_f are concatenated along the channel dimension to obtain a two-channel three-dimensional volume as the input image. The UNet adopts an encoder-decoder structure: the encoder reduces the spatial resolution of the input with 3D convolutions of stride 2, and the decoder restores the spatial resolution with 3D upsampling layers. Skip connections between the convolution layers of the encoder and decoder fuse shallow and deep features; the number of output channels of each convolution layer is marked above the corresponding rectangle in fig. 8. The deformable transformation parameters between the cross-modal input images are learned by this three-dimensional deep convolutional network, which outputs the 3-channel deformation field φ_o.
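A compact sketch of such a registration UNet follows. The channel widths, network depth and activation are assumptions (fig. 8 fixes the exact widths); only the overall pattern is taken from the description: stride-2 3D convolutional encoder, upsampling decoder, skip connections, and a 3-channel deformation-field output. Input spatial dimensions are assumed divisible by 16.

```python
# Sketch of the UNet_O / UNet_S registration sub-network of fig. 8.
import torch
import torch.nn as nn

def conv_block(cin, cout, stride=1):
    return nn.Sequential(nn.Conv3d(cin, cout, 3, stride=stride, padding=1),
                         nn.LeakyReLU(0.2))

class RegistrationUNet(nn.Module):
    """Maps a 2-channel (floating, reference) volume to a 3-channel field."""
    def __init__(self):
        super().__init__()
        self.e1, self.e2 = conv_block(2, 16, 2), conv_block(16, 32, 2)
        self.e3, self.e4 = conv_block(32, 32, 2), conv_block(32, 32, 2)
        self.up = nn.Upsample(scale_factor=2, mode="trilinear",
                              align_corners=False)
        self.d1, self.d2 = conv_block(64, 32), conv_block(64, 32)
        self.d3, self.d4 = conv_block(48, 32), conv_block(34, 16)
        self.flow = nn.Conv3d(16, 3, 3, padding=1)  # deformation-field head

    def forward(self, x):                     # x: (B, 2, D, H, W)
        s1 = self.e1(x)                       # 1/2 resolution
        s2 = self.e2(s1)                      # 1/4
        s3 = self.e3(s2)                      # 1/8
        b = self.e4(s3)                       # 1/16 bottleneck
        y = self.d1(torch.cat([self.up(b), s3], dim=1))  # skip at 1/8
        y = self.d2(torch.cat([self.up(y), s2], dim=1))  # 1/4
        y = self.d3(torch.cat([self.up(y), s1], dim=1))  # 1/2
        y = self.d4(torch.cat([self.up(y), x], dim=1))   # full resolution
        return self.flow(y)                   # (B, 3, D, H, W)
```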
In this step, the original cross-modal flow incorporates the original image rCT into the cross-modal registration framework so that the model can estimate the deformation field from the detailed texture features provided by rCT. Introducing the original information helps the model learn a more realistic deformable transformation, reducing the disturbing influence of the artificial features in the generated image tMR on registration.
A4: the single-modal flow sub-network takes the generated image tMR output by the improved Cycle-GAN network and the MR modality reference image rMR as input (i.e. the input is the single-modality image pair (tMR, rMR)), learns the deformable mapping between the input image pair with the same UNet structure as the cross-modal flow, and outputs the deformation field φ_s, which represents the deformable mapping of the input single-modality image pair (tMR, rMR);
In this embodiment, the single-modal flow sub-network UNet_S adopts the same UNet architecture as the cross-modal flow sub-network UNet_O shown in fig. 8; the only difference is that UNet_O takes cross-modal input while UNet_S takes single-modality input. Through image conversion, the network converts the original CT modality image (rCT) into an MR modality image (tMR). UNet_S takes the generated tMR image and the rMR image as the floating and reference images respectively, learns the deformable mapping between the single-modality input images through the three-dimensional deep convolutional network, and finally outputs the 3-channel deformation field φ_s.
In this step, the synthesized single-modal flow can learn more texture information from the single-modality images and effectively suppresses the voxel-drift phenomenon caused by the cross-modal flow.
A5: the deformation field fusion network takes the deformation fields φ_o and φ_s estimated by the two flows (cross-modal flow and single-modal flow) as input, mixes and superimposes the two deformation fields with a convolutional network, and outputs the final deformation field φ_f.
The cross-modal and single-modal flow sub-networks estimate the deformation fields φ_o and φ_s from the cross-modal input (rCT and rMR) and the single-modality input (tMR and rMR) respectively. In this step the deformation fields φ_o and φ_s are mixed and superimposed: a convolutional neural network with 3×3×3 kernels effectively fuses the two deformation fields and outputs the final deformation field φ_f. The 3D volumetric deformation field φ_f has the same dimensions as φ_o and φ_s, i.e. 3 channels.
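A sketch of the fusion network is given below; the patent specifies 3×3×3 convolution kernels over the two concatenated fields, while the depth and hidden width here are assumptions.

```python
# Deformation-field fusion: concatenate phi_o and phi_s (3 channels each)
# and map back to a single 3-channel field with 3x3x3 convolutions.
import torch
import torch.nn as nn

class FieldFusion(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(6, hidden, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(hidden, 3, 3, padding=1))

    def forward(self, phi_o, phi_s):
        return self.net(torch.cat([phi_o, phi_s], dim=1))  # phi_f
```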
A6: the spatial transformation network spatially warps the original CT modality floating image rCT according to the final deformation field φ_f, obtaining the transformed floating image (moved CT), denoted rCT∘φ_f.
Based on the final deformation field φ_f output by the deformation field fusion network, the floating image rCT is spatially warped in this step by a spatial transformation network (STN). In this embodiment the STN comprises a spatial grid generator and a sampler: the grid generator produces a sampling grid from the deformation field φ_f predicted by the network, and the sampler then spatially warps rCT.
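The grid generator plus sampler described above is commonly implemented with grid_sample; the following is one such sketch, assuming φ_f is a voxel-displacement field of shape (B, 3, D, H, W) with channels in (z, y, x) order.

```python
# STN warp: build the identity grid, displace it by the field, resample.
import torch
import torch.nn.functional as nnf

def stn_warp(moving, phi):
    # moving: (B, 1, D, H, W); phi: (B, 3, D, H, W) voxel displacements
    B, _, D, H, W = moving.shape
    zs, ys, xs = torch.meshgrid(torch.arange(D), torch.arange(H),
                                torch.arange(W), indexing="ij")
    grid = torch.stack([zs, ys, xs]).float().to(moving.device)  # (3,D,H,W)
    new = grid.unsqueeze(0) + phi          # displaced sampling locations
    # normalise each axis to [-1, 1] as grid_sample expects
    scale = torch.tensor([D - 1, H - 1, W - 1], device=moving.device)
    new = 2.0 * new / scale.view(1, 3, 1, 1, 1) - 1.0
    # channels last and reordered to (x, y, z): (B, D, H, W, 3)
    new = new.permute(0, 2, 3, 4, 1)[..., [2, 1, 0]]
    return nnf.grid_sample(moving, new, align_corners=True)
```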
A7: the network training loss is computed, comprising two parts: the image similarity loss between the transformed floating image and the reference image, and the regularization loss that smooths the final deformation field φ_f. Steps A2–A7 are repeated to train the dual-flow cross-modal image registration network with the aim of minimizing the loss function; when training is complete, step A8 is executed and the network directly performs registration and outputs the registered image.
This embodiment provides a dual-flow cross-modal image registration network comprising the cross-modal flow sub-network, the single-modal flow sub-network, the deformation field fusion network and the spatial transformation network (STN). Its training resembles multi-adversarial training: the cross-modal flow and the single-modal flow are mutually independent yet mutually constraining, so the whole unsupervised registration network is optimized cooperatively. From the design of the optimization objective, the loss function of the network includes two terms: the image similarity loss L_sim and the regularization loss L_smooth. The similarity loss L_sim characterizes the similarity between the transformed floating image rCT∘φ_f and the reference image rMR; in this embodiment a structural similarity index, independent of image brightness and contrast, is used to measure image similarity. Here rCT∘φ_f denotes the result of applying the final deformation field φ_f estimated by the registration network to the original floating image rCT, i.e. spatially warping rCT to obtain the transformed floating image. In addition, the regularization loss L_smooth applies a smoothness constraint to the estimated deformation field φ_f; in this embodiment the final deformation field φ_f is regularized with an L2 norm.
In summary, the total loss function L_total of the dual-flow cross-modal image registration network proposed in this embodiment is the weighted sum of the image similarity loss L_sim and the regularization loss L_smooth, defined in formula (4):

$$L_{total} = L_{sim}\big(I_{rMR},\, I_{rCT} \circ \phi_f\big) + \lambda\, L_{smooth}(\phi_f) \quad (4)$$

wherein L_sim represents the image similarity loss between the transformed floating image I_rCT∘φ_f and the MR modality reference image I_rMR, and L_smooth is the regularization loss constraining the smoothness of the estimated deformation field φ_f. I_rMR denotes the reference image of the MR modality, I_rCT the floating image of the original CT modality, φ_f the final deformation field output by the dual-flow cross-modal registration network, I_rCT∘φ_f the transformed floating image obtained by warping the original CT modality floating image I_rCT according to the final deformation field φ_f, and λ the regularization coefficient.
By minimizing the total loss function L_total, this embodiment simultaneously maximizes the similarity between the transformed floating image and the reference image (i.e. minimizes the image similarity loss L_sim) and obtains a smooth deformation field (i.e. minimizes the regularization loss L_smooth); the dual-flow cross-modal image registration network is trained with the aim of minimizing L_total.
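Formula (4) can be written out as the following sketch. The similarity term is left as a pluggable callable (e.g. one minus a structural similarity index), since the embodiment states only that a brightness- and contrast-independent structural similarity measure is used; the finite-difference penalty matches the L2-norm smoothness regularization described above.

```python
# Total registration loss of formula (4): L_sim + lambda * L_smooth.
def l2_smoothness(phi):
    # squared finite differences of the field along each spatial axis
    dz = (phi[:, :, 1:] - phi[:, :, :-1]).pow(2).mean()
    dy = (phi[:, :, :, 1:] - phi[:, :, :, :-1]).pow(2).mean()
    dx = (phi[:, :, :, :, 1:] - phi[:, :, :, :, :-1]).pow(2).mean()
    return dz + dy + dx

def total_registration_loss(sim_loss, warped, fixed, phi, lam=1.0):
    # sim_loss: callable, e.g. lambda a, b: 1 - ssim(a, b) (assumed)
    return sim_loss(warped, fixed) + lam * l2_smoothness(phi)
```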
In summary, the dual-flow cross-modal registration technique proposed in this embodiment lets the network estimate the final deformation field using both the original image and the generated image information. With this robust learning framework, robust and efficient registration of cross-modal medical images can be achieved in a completely unsupervised setting. Moreover, the dual-flow cross-modal image registration network effectively combines the original cross-modal flow and the synthesized single-modal flow, making full use of the information in the original floating image rCT, the reference image rMR and the generated image tMR. The dual-flow network therefore alleviates the problem of conversion-based registration techniques, where unreal features introduced during image generation raise the mismatch rate and distort the deformation field.
A8: the floating image rCT of the CT modality to be registered and the reference image rMR of the MR modality are input into the trained dual-flow cross-modal image registration network, yielding the transformed floating image rCT∘φ_f, i.e. the registered image.
In this embodiment, given any floating image rCT of the CT modality and reference image rMR of the MR modality, rCT is converted into the MR-type image tMR with the improved Cycle-GAN network; the original cross-modal flow and the synthesized single-modal flow then estimate the two deformation fields φ_o and φ_s respectively, a 3D convolutional network fuses φ_o and φ_s to obtain the final deformation field φ_f, and the warping of the floating image rCT is completed by the spatial transformation network (STN). The goal of the whole unsupervised cross-modal registration network is to maximize the similarity between the transformed floating image rCT∘φ_f and the reference image rMR.
The embodiment of the invention provides a novel dual-flow cross-modal medical image registration technique based on adversarial learning, realizes CT-MR cross-modal registration without supervision, overcomes the defects of existing conversion-based registration techniques, and improves the accuracy and robustness of unsupervised cross-modal medical image registration.
The dual-flow cross-modal registration network provided by the embodiment of the invention mainly comprises two parts: 1. a cross-modal image conversion network based on the improved Cycle-GAN, which converts the floating image rCT of the CT modality into the generated image tMR of the MR modality; 2. a cross-modal image registration network based on dual-flow registration-field fusion, which is divided into the cross-modal flow sub-network, the single-modal flow sub-network, the deformation field fusion network and the spatial transformation network (STN).
In the invention, the cross-modal image conversion model Cycle-GAN is improved by adding the structural consistency loss and the identity mapping loss to the network loss function, which significantly strengthens the structural similarity between the generated image and the original image. To quantitatively evaluate the improved Cycle-GAN model, the SSIM and peak signal-to-noise ratio (PSNR) metrics were chosen, and the quality of MR images generated from CT images was measured on two datasets (the Pig dataset, Pig Ex-vivo Kidney CT-MR, and the ABD dataset, Abdomen CT-MR). SSIM measures the structural similarity between the images before and after cross-modal conversion, and PSNR evaluates the quality of the generated image relative to the original image; higher is better for both. As shown in Table 1, the improved Cycle-GAN network of the invention performs better than the existing original Cycle-GAN network.
TABLE 1 comparison of Cross-modality image conversion test results
Under unsupervised conditions, the invention proposes a dual-flow cross-modal registration framework: the cross-modal flow introduces the texture features of the original image to reduce the interference of unreal artificial features in the generated image on registration, and the single-modal flow effectively suppresses the voxel-drift effect caused by the cross-modal flow. The original cross-modal flow and the synthesized single-modal flow are optimized cooperatively, estimating a more realistic deformation field from the original CT image and the generated MR image information respectively. A 3D convolutional fusion network automatically learns how to better integrate the two deformation fields to obtain better registration performance.
The Dice coefficient and the target registration error (TRE) are selected below to evaluate the performance of different cross-modal registration models. The Dice coefficient measures the overlap between the floating image transformed by the STN module and the reference image; the higher the Dice coefficient, the better. TRE is an index dedicated to measuring registration accuracy, representing the distance (in millimetres, mm) between corresponding target points on the registered image and the reference image; the lower the TRE, the better the registration performance.
The evaluation results on two clinical datasets (shown in Table 2) verify the effectiveness of the invention: compared with the traditional cross-modal registration algorithm (VoxelMorph+MIND) and other deep-learning-based cross-modal medical image registration techniques (M2U), the dual-flow cross-modal medical image registration technique provided by the invention is significantly superior in registration accuracy.
Table 2. Comparison of cross-modality image registration experimental results
Fig. 9 is a schematic diagram of the hardware structure of a cross-modal medical image registration apparatus according to another preferred embodiment of the invention. The apparatus may include a processor 901 and a readable storage medium 902 storing executable instructions. The processor 901 and the readable storage medium 902 may communicate via a system bus 903. By reading and executing the executable instructions corresponding to the registration logic in the readable storage medium 902, the processor 901 can perform the cross-modal medical image registration method described above.
The readable storage medium 902 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information, such as executable instructions, data, or the like. For example, a machine-readable storage medium may be: nonvolatile memory, flash memory, a storage drive (e.g., hard drive), solid state disk, any type of storage disk (e.g., optical disk, DVD, etc.), or similar storage medium, or a combination thereof.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and structural equivalents thereof, or a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on a manually-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general purpose and/or special purpose microprocessors, or any other type of central processing unit. Typically, the central processing unit will receive instructions and data from a read only memory and/or a random access memory. The essential elements of a computer include a central processing unit for carrying out or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks, etc. However, a computer does not have to have such a device. Furthermore, the computer may be embedded in another device, such as a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disk or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several equivalent substitutions and obvious modifications can be made without departing from the spirit of the invention, and the same should be considered to be within the scope of the invention.

Claims (7)

1. A method of cross-modality medical image registration, comprising the steps of:
s1: providing a training set comprising a floating image of a first modality and a reference image of a second modality;
s2: inputting the floating image into an image conversion network to convert the floating image from a first mode to a second mode, and outputting a converted image of the second mode; wherein the image conversion network adopts a modified Cycle-GAN network, and a second total loss function of the modified Cycle-GAN network is as follows:
$$L_{total} = L_{GAN}(G_1, D_1) + L_{GAN}(G_2, D_2) + \lambda_{cyc} L_{cyc} + \lambda_{identity} L_{identity} + \lambda_{MIND} L_{MIND}$$

wherein the adversarial losses $L_{GAN}(G_1, D_1)$ and $L_{GAN}(G_2, D_2)$ are respectively the adversarial losses of the two discriminators $D_1$, $D_2$ of the improved Cycle-GAN network; the cycle consistency loss $L_{cyc}$ is a constraint on the reversibility of the transformations of the two generators $G_1$, $G_2$; the identity mapping loss $L_{identity}$ is a regularization constraint obtained by generating a converted image within the same modality; the structural consistency loss $L_{MIND}$ is a constraint on the structural similarity between the original image and the generated image; and $\lambda_{cyc}$, $\lambda_{identity}$, $\lambda_{MIND}$ respectively represent the relative importance of $L_{cyc}$, $L_{identity}$, $L_{MIND}$;
s3: inputting the floating image and the reference image into a cross-modal flow sub-network, and outputting a first deformation field;
s4: inputting the converted image and the reference image into a single-mode flow sub-network, and outputting a second deformation field;
s5: inputting the first deformation field and the second deformation field into a deformation field fusion network to superimpose the first deformation field and the second deformation field and output a final deformation field, wherein the deformation field fusion network is a 3D convolutional neural network;
s6: inputting the floating image and the final deformation field into a space transformation network to obtain a floating image after the final deformation field is distorted and transformed;
s7: comparing the transformed floating image with the reference image to obtain a first total loss function, repeatedly executing steps S2-S7 to train the cross-modal flow sub-network, the single-modal flow sub-network, the deformation field fusion network and the space transformation network by taking the first total loss function as a target, and training the improved Cycle-GAN network by taking the second total loss function as a target until the training is completed, and executing step S8, wherein the first total loss function is as follows:
$$L = L_{sim}\big(I_{r2},\, I_{r1} \circ \phi\big) + \lambda\, L_{smooth}(\phi)$$

wherein $\phi$ represents the final deformation field, $I_{r1}$ represents the floating image of the first modality, $I_{r2}$ represents the reference image of the second modality, and $I_{r1} \circ \phi$ represents the floating image after warping by the final deformation field; the image similarity loss $L_{sim}(I_{r2}, I_{r1} \circ \phi)$ represents the image similarity loss between the transformed floating image $I_{r1} \circ \phi$ and the image $I_{r2}$; $L_{smooth}(\phi)$ represents the regularization loss constraining the smoothness of $\phi$; and $\lambda$ is the regularization coefficient;
s8: and (2) executing the steps S2-S6 again on the floating image of the first modality to be registered and the reference image of the second modality to obtain a transformed floating image, namely a registration image.
2. The cross-modality medical image registration method of claim 1, wherein the structural consistency loss $L_{MIND}$ is:

$$L_{MIND} = \frac{1}{N_1 |R|} \sum_x \big\| M(I_{r1})(x) - M\big(G_2(I_{r1})\big)(x) \big\|_1 + \frac{1}{N_2 |R|} \sum_x \big\| M(I_{r2})(x) - M\big(G_1(I_{r2})\big)(x) \big\|_1$$

wherein $M$ denotes the modality independent neighbourhood descriptor MIND, $I_{r1}$ represents the floating image of the first modality, $I_{r2}$ represents the reference image of the second modality, $N_1$ and $N_2$ respectively denote the numbers of voxels in images $I_{r1}$ and $I_{r2}$, and $R$ denotes a non-local region around voxel $x$; the image $G_1(I_{r2})$ is the generated image obtained by converting image $I_{r2}$ with generator $G_1$, and the image $G_2(I_{r1})$ is the generated image obtained by converting image $I_{r1}$ with generator $G_2$.
3. The cross-modality medical image registration method of claim 1, wherein the identity mapping loss $L_{identity}$ is:

$$L_{identity} = \| G_1(I_1) - I_1 \|_1 + \| G_2(I_2) - I_2 \|_1$$

wherein $I_1$ represents an image of the first modality and $I_2$ represents an image of the second modality; the image $G_1(I_1)$ is the generated image obtained by converting the first-modality image $I_1$ with generator $G_1$, and the image $G_2(I_2)$ is the generated image obtained by converting the second-modality image $I_2$ with generator $G_2$.
4. The cross-modality medical image registration method of claim 1, wherein the cross-modal flow sub-network adopts a UNet network structure, the UNet network structure comprises an encoder and a decoder, and the convolutional layers of the encoder and the decoder are connected by skip connections.
5. The cross-modality medical image registration method of claim 1, wherein the single-modal flow sub-network adopts a UNet network structure, the UNet network structure comprises an encoder and a decoder, and the convolutional layers of the encoder and the decoder are connected by skip connections.
6. The cross-modality medical image registration method of claim 1, wherein the spatial transformation network comprises a spatial grid generator and a sampler, the spatial grid generator generating a sampling grid from the final deformation field, and the sampler spatially warping the floating image according to the sampling grid.
7. A cross-modality medical image registration apparatus comprising a processor and a readable storage medium storing executable instructions executable by the processor, the processor being arranged to cause the cross-modality medical image registration method of any one of claims 1 to 6 to be implemented by the executable instructions.
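Purely as an illustrative sketch (not part of the claims): the spatial transformation of step S6/claim 6 and the first total loss of step S7 could be realized in PyTorch roughly as below, with mean-squared error standing in for the unspecified image similarity loss $L_{sim}$ and a finite-difference gradient penalty standing in for $L_{smooth}$; both concrete choices, and all function names, are assumptions rather than the patent's implementation.

```python
# Sketch only: assumed PyTorch realization of the STN warp (claim 6) and the
# first total loss (claim 1, step S7); MSE and the gradient penalty are
# stand-in choices, not fixed by the claims.
import torch
import torch.nn.functional as F

def stn_warp(moving: torch.Tensor, field: torch.Tensor) -> torch.Tensor:
    """Grid generator + sampler: add the displacement field (assumed shape
    (N, 3, D, H, W), in normalized [-1, 1] grid units) to an identity grid,
    then resample the floating image on that grid."""
    n = moving.shape[0]
    theta = torch.eye(3, 4).unsqueeze(0).repeat(n, 1, 1)   # identity affine
    base = F.affine_grid(theta, size=moving.shape, align_corners=False)
    grid = base + field.permute(0, 2, 3, 4, 1)             # (N, D, H, W, 3)
    return F.grid_sample(moving, grid, align_corners=False)

def smoothness(field: torch.Tensor) -> torch.Tensor:
    """L_smooth: finite-difference gradient penalty on the deformation field."""
    dz = (field[:, :, 1:] - field[:, :, :-1]).pow(2).mean()
    dy = (field[:, :, :, 1:] - field[:, :, :, :-1]).pow(2).mean()
    dx = (field[:, :, :, :, 1:] - field[:, :, :, :, :-1]).pow(2).mean()
    return dz + dy + dx

def first_total_loss(warped: torch.Tensor, reference: torch.Tensor,
                     field: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """L = L_sim(I_r2, I_r1 o phi) + lambda * L_smooth(phi)."""
    return F.mse_loss(warped, reference) + lam * smoothness(field)
```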
CN202010652606.9A 2020-07-08 2020-07-08 Cross-modal medical image registration method and device Active CN111862174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010652606.9A CN111862174B (en) 2020-07-08 2020-07-08 Cross-modal medical image registration method and device


Publications (2)

Publication Number Publication Date
CN111862174A CN111862174A (en) 2020-10-30
CN111862174B (en) 2023-10-03

Family

ID=73153705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010652606.9A Active CN111862174B (en) 2020-07-08 2020-07-08 Cross-modal medical image registration method and device

Country Status (1)

Country Link
CN (1) CN111862174B (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232362A (en) * 2020-11-04 2021-01-15 清华大学深圳国际研究生院 Cross-modal medical image registration method and computer-readable storage medium
CN112669327B (en) * 2020-12-25 2023-02-14 上海交通大学 Magnetic resonance image segmentation system and segmentation method thereof
CN112650886B (en) * 2020-12-28 2022-08-02 电子科技大学 Cross-modal video time retrieval method based on cross-modal dynamic convolution network
CN112802072B (en) * 2021-02-23 2022-10-11 临沂大学 Medical image registration method and system based on counterstudy
CN112801863A (en) * 2021-02-25 2021-05-14 浙江工业大学 Unsupervised multi-modal medical image registration method based on image conversion and domain generalization
CN112927280B (en) * 2021-03-11 2022-02-11 北京的卢深视科技有限公司 Method and device for acquiring depth image and monocular speckle structured light system
CN113012086B (en) * 2021-03-22 2024-04-16 上海应用技术大学 Cross-modal image synthesis method
WO2022198526A1 (en) * 2021-03-24 2022-09-29 Nec Corporation Methods, devices and computer readable media for image processing
CN113096169B (en) * 2021-03-31 2022-05-20 华中科技大学 Non-rigid multimode medical image registration model establishing method and application thereof
CN113012204B (en) * 2021-04-09 2024-01-16 福建自贸试验区厦门片区Manteia数据科技有限公司 Registration method, registration device, storage medium and processor for multi-mode image
CN113112534B (en) * 2021-04-20 2022-10-18 安徽大学 Three-dimensional biomedical image registration method based on iterative self-supervision
CN113298855B (en) * 2021-05-27 2021-12-28 广州柏视医疗科技有限公司 Image registration method based on automatic delineation
CN113344876B (en) * 2021-06-08 2023-05-12 安徽大学 Deformable registration method between CT and CBCT
CN113538533B (en) * 2021-06-22 2023-04-18 南方医科大学 Spine registration method, device and equipment and computer storage medium
CN113450397B (en) * 2021-06-25 2022-04-01 广州柏视医疗科技有限公司 Image deformation registration method based on deep learning
CN113487656B (en) * 2021-07-26 2022-11-22 推想医疗科技股份有限公司 Image registration method and device, training method and device, control method and device
CN113689480A (en) * 2021-08-20 2021-11-23 北京理工大学 Three-dimensional US/MR registration fusion method and device based on tubular structure detection
CN114049334A (en) * 2021-11-17 2022-02-15 重庆邮电大学 Super-resolution MR imaging method taking CT image as input
CN114049333A (en) * 2021-11-17 2022-02-15 重庆邮电大学 Image mode conversion method under unbalanced data set
CN114119689B (en) * 2021-12-02 2024-06-07 厦门大学 Multi-modal medical image unsupervised registration method and system based on deep learning
CN114359356A (en) * 2021-12-28 2022-04-15 上海联影智能医疗科技有限公司 Training method of image registration model, image registration method, device and medium
CN114511599B (en) * 2022-01-20 2022-09-20 推想医疗科技股份有限公司 Model training method and device, medical image registration method and device
CN114445471A (en) * 2022-01-26 2022-05-06 山东大学 Multi-modal medical image registration method and system based on multi-standard fusion
CN114612618A (en) * 2022-03-15 2022-06-10 腾讯医疗健康(深圳)有限公司 Image generation method, device, equipment, storage medium and computer program product
CN114359360B (en) * 2022-03-17 2022-06-10 成都信息工程大学 Two-way consistency constraint medical image registration algorithm based on confrontation
CN114387317B (en) * 2022-03-24 2022-06-17 真健康(北京)医疗科技有限公司 CT image and MRI three-dimensional image registration method and device
CN115375971B (en) * 2022-08-24 2023-04-25 北京医智影科技有限公司 Multi-mode medical image registration model training method, registration method, system and equipment
CN115830016B (en) * 2023-02-09 2023-04-14 真健康(北京)医疗科技有限公司 Medical image registration model training method and equipment
CN116402865B (en) * 2023-06-06 2023-09-15 之江实验室 Multi-mode image registration method, device and medium using diffusion model
CN117974440B (en) * 2024-04-01 2024-06-07 四川省肿瘤医院 Method and system for stitching endoscope images
CN117974735B (en) * 2024-04-02 2024-06-14 西北工业大学 Cross-modal medical image registration method, system and equipment for digital person

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108711168A (en) * 2018-06-04 2018-10-26 中北大学 Non-rigid multimodal medical image registration method based on ZMLD Yu GC discrete optimizations
CN110021037A (en) * 2019-04-17 2019-07-16 南昌航空大学 A kind of image non-rigid registration method and system based on generation confrontation network
CN110838139A (en) * 2019-11-04 2020-02-25 上海联影智能医疗科技有限公司 Training method of image registration model, image registration method and computer equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9129372B2 (en) * 2012-07-30 2015-09-08 General Electric Company Methods and systems for determining a transformation function to automatically register different modality medical images
US11158069B2 (en) * 2018-12-11 2021-10-26 Siemens Healthcare Gmbh Unsupervised deformable registration for multi-modal images

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108711168A (en) * 2018-06-04 2018-10-26 中北大学 Non-rigid multimodal medical image registration method based on ZMLD Yu GC discrete optimizations
CN110021037A (en) * 2019-04-17 2019-07-16 南昌航空大学 A kind of image non-rigid registration method and system based on generation confrontation network
CN110838139A (en) * 2019-11-04 2020-02-25 上海联影智能医疗科技有限公司 Training method of image registration model, image registration method and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Medical image registration using PCA and PSNR; Pan Meisen; Tang Jingtian; Yang Xiaoli; Infrared and Laser Engineering; Vol. 40, No. 2; pp. 355-364 *

Also Published As

Publication number Publication date
CN111862174A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN111862174B (en) Cross-modal medical image registration method and device
CN110047056B (en) Cross-domain image analysis and composition with depth image to image network and adversarial network
CN111862175B (en) Cross-modal medical image registration method and device based on cyclic canonical training
CN110910351B (en) Ultrasound image modality migration and classification method and terminal based on generative adversarial network
CN110036409B (en) System and method for image segmentation using joint deep learning model
CN111260741B (en) Three-dimensional ultrasound simulation method and device using a generative adversarial network
CN116402865B (en) Multi-mode image registration method, device and medium using diffusion model
CN117437420A (en) Cross-modal medical image segmentation method and system
Jiang et al. Unpaired cross-modality educed distillation (CMEDL) for medical image segmentation
CN114359360A (en) Two-way consistency constraint medical image registration algorithm based on adversarial learning
KR20200120311A (en) Determination method for stage of cancer based on medical image and analyzing apparatus for medical image
CN115601352A (en) Medical image segmentation method based on multi-mode self-supervision
Wang et al. Dc-cyclegan: bidirectional ct-to-mr synthesis from unpaired data
Chen et al. MASS: Modality-collaborative semi-supervised segmentation by exploiting cross-modal consistency from unpaired CT and MRI images
Luo et al. Deformable adversarial registration network with multiple loss constraints
Liu et al. Learning multi-modal brain tumor segmentation from privileged semi-paired MRI images with curriculum disentanglement learning
CN113205567A (en) Method for synthesizing CT image by MRI image based on deep learning
Ding et al. Cross-modality multi-atlas segmentation via deep registration and label fusion
Chen et al. Deep semi-supervised ultrasound image segmentation by using a shadow aware network with boundary refinement
Schmidt et al. Tracking and mapping in medical computer vision: A review
Lei et al. Generative adversarial network for image synthesis
Shenoy et al. Deformable registration of biomedical images using 2D hidden Markov models
TW202346826A (en) Image processing method
Liao et al. FisheyeEX: Polar outpainting for extending the FoV of fisheye lens
Zhao et al. PUNDIT: Pulmonary nodule detection with image category transformation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant