CN113724304A - Esophagus region image automatic registration method and system based on deep learning - Google Patents

Esophagus region image automatic registration method and system based on deep learning

Info

Publication number
CN113724304A
Authority
CN
China
Prior art keywords
image
automatic registration
deep learning
module
registration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110800261.1A
Other languages
Chinese (zh)
Inventor
李登旺
洪亭轩
张建东
侯勇
黄浦
虞刚
李续然
陆华
王建波
朱慧
李婕
吴冰
柴象飞
章桦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN202110800261.1A
Publication of CN113724304A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/14 Transformations for image registration, e.g. adjusting or mapping for alignment of images
    • G06T3/147 Transformations for image registration, e.g. adjusting or mapping for alignment of images using affine transformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The invention provides a deep-learning-based method and system for automatic registration of esophageal region images. An original fixed image and a moving image are acquired, the images are preprocessed, and a region of interest is selected. An automatic registration model is formed using a fully convolutional neural network as the generator and a convolutional neural network as the discriminator, and the region-of-interest images are registered. The registered image is compared with the ground-truth image to judge whether the images are aligned, and the parameters of the automatic registration model are optimized according to the judgment. The optimized automatic registration model then reprocesses the acquired images to obtain registered esophageal images. Compared with conventional registration, this deep-learning-based method attains a higher registration speed while ensuring greater stability and robustness.

Description

Esophagus region image automatic registration method and system based on deep learning
Technical Field
The invention belongs to the technical field of medical image processing, and particularly relates to an esophagus region image automatic registration method and system based on deep learning.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Multimodality treatment strategies such as image-guided radiation therapy (IGRT) or three-dimensional conformal radiotherapy (3DCRT) combined with surgery can improve the survival rate of patients with advanced esophageal cancer. Optimal radiation therapy of esophageal cancer typically has to account for geometric uncertainties such as respiratory motion and daily positional changes of the tumor. To compensate for these uncertainties, planning target volume (PTV) margins are typically employed. However, PTV margins are large and identical for all patients, so organs adjacent to the clinical target volume (CTV) are exposed to high levels of radiation, increasing the risk of acute radiation toxicity. Accurate determination of tumor location, respiratory motion, and the region of interest (ROI) may therefore improve radiation therapy of esophageal cancer.
Image-guided radiation therapy (IGRT) involves acquiring images of different modalities at different times for preoperative planning, intraoperative visualization, and postoperative assessment. Patient images divide mainly into preoperative images, which reflect the patient's anatomy, and intraoperative images, which reflect the physical space of the patient's tumor. Registering the corresponding regions of interest of the preoperative and intraoperative images can greatly assist clinicians during surgery. Because the goal of IGRT is to improve target coverage of the dose distribution while reducing radiation damage to normal tissue, the esophageal tumor volume and the location of mediastinal lymph nodes must be described accurately. The conventional imaging modality for radiotherapy planning in IGRT is computed tomography (CT). However, CT images cannot always distinguish the proximal and distal boundaries between malignant esophageal tumor and normal esophageal tissue, nor do they show the contours of mediastinal lymph nodes well. In contrast, X-ray imaging can provide digital radiography (DR) of the esophagus and surrounding tissue, and DR images show clear contours of the esophagus and mediastinal lymph nodes. Multimodal registration of the three-dimensional CT image with the two-dimensional DR image of esophageal tissue can therefore assist clinicians in radiotherapy planning and reduce radiation damage to normal tissue.
IGRT techniques assist clinicians in qualitative diagnosis and accurate tumor resection based on medical image information. Image registration is a fundamental task in IGRT whose main purpose is to provide better visual navigation for the clinician. In IGRT, the goal of registration is to integrate corresponding information from different images of the same organ into one common coordinate system. Image registration can also be used for quantitative comparison of medical images taken at different times, from which the evolution of a tumor over time can be deduced, helping clinicians monitor the growth of esophageal cancer. Multimodal image registration is the problem of finding the optimal transformation from one coordinate space to another. In the medical context, it plays a crucial role in matching preoperative and intraoperative image data: preoperative images (e.g., CT scans) are typically acquired several days before the actual surgery for surgical planning, while intraoperative images (e.g., X-rays) are used for real-time navigation and monitoring. Without image registration, clinicians must rely on their own clinical experience to perform a mental mapping between preoperative and intraoperative images. Automatic image registration is therefore very important for simplifying the surgical procedure, reducing cognitive burden, and potentially improving clinical outcomes.
With the rapid development of deep learning, a number of deep-learning-based image registration models have been proposed. Deep learning was initially employed to enhance the registration of iterative methods and was later introduced to predict regions of interest in registration. In recent years, the spatial transformer network (STN) has been proposed; it can generate dense deformable transformations for image registration and is widely used in methods such as affine transformation. By backpropagating the gradient of the deformation error during optimization, such deep-learning-based methods greatly improve registration speed and robustness. However, since these methods all focus on single-modality registration, the registration of medical images of different dimensions remains to be explored.
Disclosure of Invention
To solve the above problems, the invention provides a deep-learning-based method and system for automatic registration of esophageal region images.
According to some embodiments, the invention adopts the following technical scheme:
an esophagus region image automatic registration method based on deep learning comprises the following steps:
acquiring an original fixed image and a moving image, preprocessing the images, and selecting a region of interest;
forming an automatic registration model by using a full convolution neural network as a generator and a convolution neural network as a discriminator, and registering the images of the region of interest;
comparing the registered image with the real image, judging whether the registered image is aligned or not, and optimizing the parameters of the automatic registration model according to the judgment result;
and processing the acquired images with the optimized automatic registration model to obtain a registered esophageal image.
An esophageal region image automatic registration system based on deep learning, comprising:
the image preprocessing module is configured to acquire an original fixed image and a moving image, preprocess the images, and select a region of interest;
an automatic registration module configured to register the region of interest image;
the registration evaluation module is configured to compare the registered image with the real image, judge whether the registered image is aligned or not and optimize the parameters of the automatic registration module according to the judgment result;
a result output module configured to output the registered esophageal image.
As an alternative embodiment, the image preprocessing module is configured to perform preliminary normalization and standardization on the fixed image and the moving image, perform noise reduction on each of them, select a region of interest, and remove areas that interfere with the registration process.
As an alternative embodiment, the image preprocessing module is configured to divide the processed data into a training set, a validation set, and a test set.
As an alternative embodiment, the automatic registration module comprises a generator module and a discriminator module connected in series.
Further, the generator module uses a fully convolutional neural network: an auto-encoder produces a hidden vector, which is fed through the fully convolutional network comprising several fully connected layers to generate a deformation field from two-dimensional space to three-dimensional space.
As a further step, the discriminator module employs a trained convolutional neural network that encodes a direct mapping from the input medical image to an output classification value, connects the registered image to the corresponding intermediate layers for non-rigid deformation, and introduces short and long residual connections to strengthen the propagation of the registered image information.
As an alternative embodiment, the registration evaluation module is configured to evaluate the similarity degree of the predicted value and the true value of the automatic registration module by using a cross entropy loss function.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the above method.
An electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions, when executed by the processor, performing the steps of the above method.
Compared with the prior art, the invention has the beneficial effects that:
the invention learns CT and DR image information through a generator module based on a full convolutional neural network (FCN) and a discriminator module based on a Convolutional Neural Network (CNN), and responds in a mode of continuously adjusting and shaping a variable field.
The invention utilizes a deep learning network that builds features automatically from training data and performs registration by combining image features across domains. Deep learning is characterized by multiple stages of feature learning and can automatically learn a rich feature hierarchy. Compared with traditional registration methods, this deep-learning-based approach achieves a higher registration speed while ensuring greater stability and robustness, thereby realizing end-to-end registration.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIG. 1 is a schematic flow chart of an esophageal region image automatic registration method based on deep learning;
Fig. 2 is a schematic diagram of the registration method based on an adversarial network.
Detailed Description of Embodiments
the invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiment 1:
an automatic esophagus region image registration system based on deep learning mainly comprises four parts: 1. an image preprocessing unit; 2. an FCN-based generator unit; 3. a CNN-based discriminator unit; 4. and a model evaluation unit.
1. Image preprocessing unit
CT and DR image data of patients receiving image-guided radiation therapy in hospital are collected, cleaned, and sorted by the image preprocessing unit, then input into the adversarial image registration network for iterative registration. In 2D-3D image registration, the position and size of the target object may vary greatly in the three-dimensional CT images due to changes in imaging protocol (e.g., collimation factors), and the two-dimensional DR images may contain interference from various medical-device artifacts. Therefore, the CT and DR images first undergo preliminary normalization and standardization, noise reduction is then applied to the CT and DR images in the data set separately, and a region of interest (ROI) is selected to remove areas that interfere with the registration process. Finally, the data set is divided into a training set, a validation set, and a test set.
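A minimal Python sketch of this unit is given below, covering the normalization, standardization, noise reduction, ROI selection, and dataset split described above. The median filter and the split ratios are illustrative assumptions, since the patent names neither a specific filter nor specific ratios.

```python
import numpy as np
from scipy.ndimage import median_filter

def preprocess(image: np.ndarray, roi) -> np.ndarray:
    # roi: tuple of slices selecting the region of interest
    # Normalization to [0, 1], then standardization to zero mean / unit variance
    image = (image - image.min()) / (image.max() - image.min() + 1e-8)
    image = (image - image.mean()) / (image.std() + 1e-8)
    # Noise reduction; the patent does not specify a filter, a median is assumed
    image = median_filter(image, size=3)
    # Keep only the region of interest, removing areas that disturb registration
    return image[roi]

def split_dataset(samples, train=0.7, val=0.15, seed=0):
    # Shuffle, then divide into training / validation / test sets (assumed ratios)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_tr, n_va = int(train * len(samples)), int(val * len(samples))
    parts = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return tuple([samples[i] for i in p] for p in parts)
```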
2. FCN-based generator unit
While convolutional neural network (CNN) based methods improve the robustness of 2D-3D registration, they are limited to highly opaque objects such as solid models or fixed shapes (e.g., rigid metal devices) and do not address multimodal registration across different patients and differently shaped anatomies. Here a fully convolutional network (FCN) is used as the generator: it directly produces the deformation field and outputs the registered image, which is then input into the CNN-based discriminator for judgment. To register each image in sequence, the system takes as input a low-dimensional hidden vector obtained from an auto-encoder, a network structure consisting of an encoder and a decoder. A deformation field from two-dimensional space to three-dimensional space is then generated by the FCN with n fully connected layers. During optimization, the FCN-based generator updates not only the network parameters but also the input hidden vector.
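As a rough illustration of this unit, the PyTorch sketch below maps an auto-encoder hidden vector through fully connected layers to a dense deformation field. All layer widths, the depth n = 3, and the field shape are assumptions for illustration, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class FCNGenerator(nn.Module):
    """Maps the auto-encoder's hidden vector to a dense deformation field."""
    def __init__(self, latent_dim=128, field_shape=(32, 64, 64)):
        super().__init__()
        self.field_shape = field_shape  # (D, H, W) of the 3D deformation field
        d, h, w = field_shape
        # n = 3 fully connected layers (depth and widths are assumptions)
        self.fc = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 512), nn.ReLU(inplace=True),
            nn.Linear(512, d * h * w * 3),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, latent_dim) hidden vector from the auto-encoder
        flow = self.fc(z)
        # One 3-vector displacement per voxel: (batch, 3, D, H, W)
        return flow.view(-1, 3, *self.field_shape)
```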
The deep-learning-based esophageal region image automatic registration method adopts an FCN-based training strategy that can train on all observed regions in each backpropagation pass. This strategy reduces the degrees of freedom of the deformation field during registration, improving training efficiency over standard CNN-based training and enabling efficient 2D-3D registration. When severe image artifacts or occlusions are present, such as registration of images containing metal screws or guide wires, this approach performs significantly better than approaches based on a single neural network. Compared with existing registration methods, it achieves higher registration accuracy and realizes end-to-end image registration.
3. CNN-based discriminator unit
The deep-learning-based esophageal region image automatic registration method employs a trained convolutional neural network (CNN) to encode a direct mapping from an input medical image to an output classification value, so that classification scoring and weight assignment are realized jointly by the neural network through its optimized parameters. Considering the different imaging modes of multimodal medical images, the system adopts an image-pyramid-based multi-scale fusion method, making the registration process more consistent with human visual perception. The system also adopts a local-similarity strategy to adaptively adjust and fuse the decomposition coefficients of the input multimodal medical images. To better fuse the DR and CT images, the CNN takes a "dual-channel" input, which also allows the image data to be processed more quickly.
CNNs have shown remarkable speed and power in a variety of computer vision and medical image processing tasks. The system's CNN-based discriminator network connects the registered images to the corresponding intermediate layers for non-rigid deformation, and short and long residual connections are introduced into the discriminator network to strengthen the propagation of registered-image information. The registered images produced by the FCN-based generator are spatially randomly sampled, and the corresponding ground-truth image information is used to train the CNN-based discriminator. A feedback mechanism is introduced in the discriminator to replace a single registration estimate and iteratively adjust the deformation parameters. The CNN-based discriminator combines a hybrid convolutional neural network with iterative summation to realize reliable and efficient online 2D-3D image registration.
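A minimal PyTorch sketch of such a discriminator follows: the "dual-channel" input stacks the registered image and its reference as two channels, a short residual connection wraps each convolution block, and a long residual connection links the stem features to the deepest features before the classification value is produced. All channel counts and depths are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Short residual connection around two convolutions
        return self.act(x + self.conv2(self.act(self.conv1(x))))

class CNNDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(2, 32, 3, padding=1)  # "dual-channel" input
        self.blocks = nn.Sequential(ResBlock(32), ResBlock(32))
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))

    def forward(self, pair):
        # pair: (batch, 2, H, W) registered image stacked with its reference
        f0 = self.stem(pair)
        # Long residual connection from the stem to the deepest features
        return self.head(self.blocks(f0) + f0)  # raw logit; sigmoid in the loss
```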
4. Model evaluation unit
The model evaluation unit uses a loss function to evaluate the similarity between the model's predicted value and the true value; the smaller the loss, the better the model's performance. In this deep-learning-based esophageal region image automatic registration system, the quality of registration is measured by a loss function that quantifies how well the registered images are aligned. The system mainly uses the cross-entropy loss function, which takes the form:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \right]$$
in this function L denotes the cross entropy loss function, N denotes the total number of samples, i denotes the image samples, y denotes the total number of samplesiSample label, p, representing sample iiIndicating the probability that sample i is predicted as a positive class. The cross entropy can measure the difference degree of two different probability distributions in the same random variable, and is expressed as the difference between the true probability distribution and the predicted probability distribution in deep learning. The cross entropy loss function obtains the probability by using a logic function, and calculates the loss function by combining the cross entropy, so that the calculation speed is high, and the registration speed and the robustness of the preoperative image data and the intraoperative physical anatomical structure can be improved.
This embodiment mainly utilizes an adversarial network that can automatically learn the similarity metric between images and train a deformable image registration network. The adversarial image registration network consists of two sub-networks: a registration generator and a registration discriminator. More specifically, the generator network predicts the deformation field, while the discriminator network judges whether the images are well aligned and feeds misalignment information back to the generator during training. Both networks are learned by a training algorithm whose criterion is whether good registration between the images is achieved. By simultaneously training two multi-layer neural networks, one as generator and one as discriminator, the system obtains not only an image registration network but also a metric network that helps evaluate image registration quality.
Embodiment 2:
an esophagus region image automatic registration method based on deep learning is shown in fig. 1 and comprises the following steps:
the input CT and DR images are divided into moving images and fixed images after being cut and subjected to noise reduction processing by a preprocessing module, wherein the DR images serve as the moving images, the CT images serve as the fixed images, and then the moving images and the fixed images are input into a antagonistic image registration network for registration.
In the adversarial image registration network, as shown in fig. 2, the generator directly estimates the deformation field parameters between the input CT and DR images. The image resampling module then interpolates the input moving image using both the transformation estimated by the generator network and the ground-truth transformation, obtaining resampled moving images, as sketched below. The discriminator network then attempts to determine whether its input images are aligned under the generator-estimated transformation versus the ground-truth transformation.
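The resampling step can be pictured as a spatial-transformer-style warp, sketched below in PyTorch under the assumption that the deformation field holds displacements in the normalized [-1, 1] coordinates expected by grid_sample; the patent does not fix these conventions.

```python
import torch
import torch.nn.functional as F

def warp(moving: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # moving: (B, C, D, H, W) volume; flow: (B, 3, D, H, W) displacement field
    B, _, D, H, W = moving.shape
    zz, yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, D), torch.linspace(-1, 1, H),
        torch.linspace(-1, 1, W), indexing="ij")
    identity = torch.stack((xx, yy, zz), dim=-1).to(moving.device)
    identity = identity.expand(B, D, H, W, 3)
    # Displace the identity grid, then interpolate the moving image at it
    grid = identity + flow.permute(0, 2, 3, 4, 1)
    return F.grid_sample(moving, grid, align_corners=True)
```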
As training progresses, the generator and discriminator networks are updated iteratively. The discriminator's feedback is used to improve the generator, which is eventually trained to produce images approximating the true transformation well enough to pass the discriminator's test.
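A condensed sketch of these iterative updates, reusing the FCNGenerator, CNNDiscriminator, and warp sketches above, might look as follows. Here `loader` and `make_pair` (which forms the two-channel discriminator input from a warped volume and the fixed image) are hypothetical helpers, and the optimizer settings are assumptions, not patent specifics.

```python
import torch
import torch.nn as nn

gen, disc = FCNGenerator(), CNNDiscriminator()
g_opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

for z, moving, fixed, real_pair in loader:  # hypothetical preprocessed batches
    ones = torch.ones(real_pair.size(0), 1)
    zeros = torch.zeros(real_pair.size(0), 1)

    # Discriminator step: ground-truth-aligned pairs vs. generator-warped pairs
    fake_pair = make_pair(warp(moving, gen(z)).detach(), fixed)
    d_loss = bce(disc(real_pair), ones) + bce(disc(fake_pair), zeros)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: adjust the deformation field to pass the discriminator
    fake_pair = make_pair(warp(moving, gen(z)), fixed)
    g_loss = bce(disc(fake_pair), ones)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```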
The method of embodiment 2 not only uses the generator network's efficient forward path to estimate the transformation parameters directly but also uses the discriminator network to evaluate registration quality, making it well suited to applications such as image-guided radiation therapy. In multimodal image registration it delivers high registration accuracy, short computation time, and robustness to outliers (such as those produced by surgical instruments).
The specific steps and processes are introduced as follows:
step 1, preprocessing an input image.
CT and DR image data of patients receiving image-guided radiation therapy in hospital are collected, cleaned, and sorted by the image preprocessing unit, then input into the adversarial image registration network for registration. The CT and DR images first undergo preliminary normalization and standardization; noise reduction is then applied to the CT and DR images in the data set separately, and a region of interest (ROI) is selected to remove areas that interfere with the registration process. Finally, the data set is divided into a training set, a validation set, and a test set.
Step 2: generating the registered image with the FCN-based generator.
The preprocessed CT and DR images are reduced in dimensionality and input into the FCN-based registration image generator, which directly produces the deformation field and outputs the registered image; the result is then input into the CNN-based discriminator for judgment. To register each image in sequence, the system takes as input a low-dimensional hidden vector obtained from an auto-encoder, a network structure consisting of an encoder and a decoder. A deformation field from two-dimensional space to three-dimensional space is then generated by the FCN with n fully connected layers. During optimization, the FCN-based generator updates not only the network parameters but also the input hidden vector. This FCN-based training strategy can train on all observed regions in each backpropagation pass, reducing the degrees of freedom of the deformation field during registration and improving training efficiency; compared with existing registration methods it achieves higher registration accuracy and realizes end-to-end image registration.
Step 3: the CNN-based discriminator judges whether the registered images are aligned.
The registered images generated by the generator network are input to the discriminator network, a trained CNN that encodes a direct mapping from the input registered images to output classification values, so that classification scoring and weight assignment are realized jointly by the neural network through its optimized parameters. Considering the different imaging modes of multimodal medical images, the system adopts an image-pyramid-based multi-scale fusion method, making the registration process more consistent with human visual perception. The system also adaptively adjusts the decomposition coefficients of the input multimodal medical images with a local-similarity strategy. The CNN-based discriminator network connects the registered images to the corresponding intermediate layers for non-rigid deformation, with short and long residual connections introduced to strengthen the propagation of registered-image information. A feedback mechanism replaces a single registration estimate and iteratively adjusts the deformation parameters, realizing reliable and efficient online 2D-3D image registration.
Step 4: evaluating the registration model.
The model evaluation unit uses a loss function to evaluate the similarity between the model's predicted value and the true value; the smaller the loss, the better the model's performance. In this system the quality of registration is measured by a loss function that quantifies how well the registered images are aligned. The system mainly uses the cross-entropy loss; cross entropy measures the degree of difference between two probability distributions over the same random variable, and in deep learning it expresses the difference between the true and predicted probability distributions. The loss obtains probabilities via the logistic function and combines them with the cross entropy, so it is fast to compute and can improve the registration speed and the robustness between preoperative image data and the intraoperative physical anatomy.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made without inventive effort is included within the spirit and principle of the present invention.

Claims (10)

1. An esophagus region image automatic registration method based on deep learning is characterized in that: the method comprises the following steps:
acquiring an original fixed image and a moving image, preprocessing the images, and selecting a region of interest;
forming an automatic registration model by using a full convolution neural network as a generator and a convolution neural network as a discriminator, and registering the images of the region of interest;
comparing the registered image with the real image, judging whether the registered image is aligned or not, and optimizing the parameters of the automatic registration model according to the judgment result;
and processing the acquired images with the optimized automatic registration model to obtain a registered esophageal image.
2. An esophagus region image automatic registration system based on deep learning is characterized in that: the method comprises the following steps:
the image preprocessing module is configured to acquire an original fixed image and a moving image, preprocess the images, and select a region of interest;
an automatic registration module configured to register the region of interest image;
the registration evaluation module is configured to compare the registered image with the real image, judge whether the registered image is aligned or not and optimize the parameters of the automatic registration module according to the judgment result;
a result output module configured to output the registered esophageal image.
3. The deep learning based esophageal region image automatic registration system as claimed in claim 2, wherein: the image preprocessing module is configured to perform normalization and standardization preliminary preprocessing on the fixed image and the moving image, perform noise reduction processing on the fixed image and the moving image respectively, select an interested region, and remove a region influencing the registration process.
4. The deep learning based esophageal region image automatic registration system as claimed in claim 2, wherein: the image preprocessing module is configured to divide the processed data into a training set, a validation set and a test set.
5. The deep learning based esophageal region image automatic registration system as claimed in claim 2, wherein: the automatic registration module comprises a generator module and a discriminator module which are connected in sequence.
6. The deep learning based esophageal region image automatic registration system of claim 5, wherein: the generator module uses a full convolution neural network, acquires a hidden vector by using an automatic encoder, and generates a deformation field from a two-dimensional space to a three-dimensional space by using the hidden vector as the full convolution neural network comprising a plurality of full connection layers.
7. The deep learning based esophageal region image automatic registration system of claim 5, wherein: the discriminator module employs a trained convolutional neural network that encodes a direct mapping from an input medical image to an output classification value, connects the registered image to the corresponding intermediate layers for non-rigid deformation, and introduces short and long residual connections to strengthen the propagation of the registered image information.
8. The deep learning based esophageal region image automatic registration system as claimed in claim 2, wherein: the registration evaluation module is configured to evaluate the similarity degree between the predicted value and the true value of the automatic registration module by using a cross entropy loss function.
9. A computer-readable storage medium characterized by: for storing computer instructions which, when executed by a processor, perform the steps of the method of claim 1.
10. An electronic device, characterized by: comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, which when executed by the processor, perform the steps of the method of claim 1.
CN202110800261.1A 2021-07-15 2021-07-15 Esophagus region image automatic registration method and system based on deep learning Pending CN113724304A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110800261.1A CN113724304A (en) 2021-07-15 2021-07-15 Esophagus region image automatic registration method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110800261.1A CN113724304A (en) 2021-07-15 2021-07-15 Esophagus region image automatic registration method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN113724304A (en) 2021-11-30

Family

ID=78673359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110800261.1A Pending CN113724304A (en) 2021-07-15 2021-07-15 Esophagus region image automatic registration method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN113724304A (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112150425A (en) * 2020-09-16 2020-12-29 北京工业大学 Unsupervised intravascular ultrasound image registration method based on neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PINGKUN YAN et al.: "Adversarial Image Registration with Application for MR and TRUS Image Fusion", arXiv, 1 October 2018 (2018-10-01), pages 1-8 *
WANG Lihui et al.: "Research progress and development trends of deep learning in medical imaging" (深度学习在医学影像中的研究进展及发展趋势), 《学术信通》, 14 January 2021 (2021-01-14), pages 1-15 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612527A (en) * 2022-03-01 2022-06-10 京东科技信息技术有限公司 Image registration method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110047128B (en) Method and system for 3D reconstruction of X-ray CT volumes and segmentation masks from several X-ray radiographs
US11263772B2 (en) Computer assisted identification of appropriate anatomical structure for medical device placement during a surgical procedure
EP4095797B1 (en) Autonomous segmentation of three-dimensional nervous system structures from medical images
Onofrey et al. Generalizable multi-site training and testing of deep neural networks using image normalization
CN113966204B (en) Method for automatically planning a trajectory for medical interventions
CN111699021B (en) Three-dimensional tracking of targets in a body
CN111028914A (en) Artificial intelligence guided dose prediction method and system
EP3468668B1 (en) Soft tissue tracking using physiologic volume rendering
CN108367161A (en) Radiotherapy system, data processing method and storage medium
CN108885781B (en) Method and system for synthesizing a virtual high dose or high kV computed tomography image from a low dose or low kV computed tomography image
CN112598649B (en) 2D/3D spine CT non-rigid registration method based on generation of countermeasure network
CN116258732A (en) Esophageal cancer tumor target region segmentation method based on cross-modal feature fusion of PET/CT images
KR102442093B1 (en) Methods for improving surface registration in surgical navigation systems
US12046018B2 (en) Method for identifying bone images
CN114707742A (en) Artificial intelligence prediction method and system for adaptive radiotherapy strategy
CN113724304A (en) Esophagus region image automatic registration method and system based on deep learning
Patel et al. Improved automatic bone segmentation using large-scale simulated ultrasound data to segment real ultrasound bone surface data
CN116563402A (en) Cross-modal MRI-CT image synthesis method, system, equipment and medium
Oulbacha et al. MRI to C‐arm spine registration through Pseudo‐3D CycleGANs with differentiable histograms
Zheng et al. A unifying MAP-MRF framework for deriving new point similarity measures for intensity-based 2D-3D registration
US11837352B2 (en) Body representations
CN115082534B (en) Biplane image registration method and device and robot
CN117831757B (en) Pathological CT multi-mode priori knowledge-guided lung cancer diagnosis method and system
EP4287201A1 (en) Compensating for differences in medical images
Sun A Review of 3D-2D Registration Methods and Applications based on Medical Images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination