WO2024006996A1 - Controlled respiratory motion simulation for patient-specific augmentation - Google Patents

Controlled respiratory motion simulation for patient-specific augmentation

Info

Publication number
WO2024006996A1
WO2024006996A1 (PCT/US2023/069510)
Authority
WO
WIPO (PCT)
Prior art keywords
phase
imaging data
patient
breathing
phases
Prior art date
Application number
PCT/US2023/069510
Other languages
French (fr)
Inventor
Saad NADEEM
Yu-Chi Hu
Donghoon Lee
Original Assignee
Memorial Sloan-Kettering Cancer Center
Memorial Hospital For Cancer And Allied Diseases
Sloan-Kettering Institute For Cancer Research
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Memorial Sloan-Kettering Cancer Center, Memorial Hospital For Cancer And Allied Diseases, and Sloan-Kettering Institute For Cancer Research
Publication of WO2024006996A1


Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/0002Remote monitoring of patients using telemetry, e.g. transmission of vital signals via a communication network
    • A61B5/0004Remote monitoring of patients using telemetry, e.g. transmission of vital signals via a communication network characterised by the type of physiological signal transmitted
    • A61B5/0013Medical image data
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/0033Features or image-related aspects of imaging apparatus classified in A61B5/00, e.g. for MRI, optical tomography or impedance tomography apparatus; arrangements of imaging apparatus in a room
    • A61B5/004Features or image-related aspects of imaging apparatus classified in A61B5/00, e.g. for MRI, optical tomography or impedance tomography apparatus; arrangements of imaging apparatus in a room adapted for image acquisition of a particular organ or body part
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/0059Measuring for diagnostic purposes; Identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence
    • A61B5/0077Devices for viewing the surface of the body, e.g. camera, magnifying lens
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/08Detecting, measuring or recording devices for evaluating the respiratory organs
    • A61B5/0816Measuring devices for examining respiratory frequency
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/103Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B5/113Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb occurring during breathing
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/48Other medical applications
    • A61B5/4836Diagnosis combined with treatment in closed-loop systems or methods
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7264Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7271Specific aspects of physiological measurement analysis
    • A61B5/7278Artificial waveform generation or derivation, e.g. synthesising signals from measured signals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • G06T7/0014Biomedical image inspection using an image reference approach
    • G06T7/0016Biomedical image inspection using an image reference approach involving temporal comparison
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B2576/00Medical imaging apparatus involving image processing or analysis
    • A61B2576/02Medical imaging apparatus involving image processing or analysis specially adapted for a particular organ or body part
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/05Detecting, measuring or recording for diagnosis by means of electric currents or magnetic fields; Measuring using microwaves or radio waves 
    • A61B5/055Detecting, measuring or recording for diagnosis by means of electric currents or magnetic fields; Measuring using microwaves or radio waves  involving electronic [EMR] or nuclear [NMR] magnetic resonance, e.g. magnetic resonance imaging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/100764D tomography; Time-sequential 3D tomography
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20182Noise reduction or smoothing in the temporal domain; Spatio-temporal filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung
    • G06T2207/30064Lung nodule

Definitions

  • the present technology relates generally to using artificial intelligence to characterize respiratory motion, and more specifically to respiratory motion simulation using machine learning techniques that may employ deep learning models.
  • various embodiments of the present disclosure relate to a method of patient-specific simulation of respiratory motion.
  • the method may comprise: receiving multi-phase imaging data of an anatomical region of a patient, the anatomical region comprising a diaphragm of the patient, the multi-phase imaging data comprising imaging data for a plurality of phases of a breathing cycle of the patient, the plurality of phases comprising a first phase of the breathing cycle and subsequent phases following the first phase of the breathing cycle, the multi-phase imaging data comprising (i) first-phase imaging data corresponding to the first phase, and (ii) subsequent-phase imaging data corresponding to the subsequent phases; detecting, via a beacon or an external monitoring device, a 1D breathing trace corresponding to the breathing cycle of the patient; generating warped images corresponding to the breathing cycle of the patient, wherein generating the warped images comprises feeding the first-phase imaging data and the 1D breathing trace to a deep-learning model comprising one or more artificial neural networks (ANNs), the deep-learning model comprising (i) a first module configured to generate, based at least on the first-phase imaging data and the 1D breathing trace, predicted images for the subsequent phases of the breathing cycle, and (ii) a second module configured to generate the warped images based at least in part on the predicted images from the first module.
  • ANNs artificial neural networks
  • the 1D breathing trace is fed to the second module in addition to the predicted images to generate the warped images.
  • the method may comprise training the deep-learning model using one or more training datasets based on four-dimensional computed tomography (4D-CT) imaging data.
  • 4D-CT four-dimensional computed tomography
  • the 4D-CT imaging data corresponds to a cohort of reference subjects.
  • Training the deep-learning model may comprise generating, using the 4D-CT imaging data, a training 1D respiration surrogate for each reference subject in the cohort of subjects.
  • each training 1D respiration surrogate is based on diaphragm displacements across a plurality of training phases for each reference subject in the cohort of subjects.
  • the displacements are displacements of an apex of the diaphragm of each corresponding reference subject in the cohort of subjects.
  • generating each training 1D respiration surrogate comprises detecting an apex of the diaphragm in a breathing cycle of the corresponding reference subject in the cohort of subjects.
  • generating each training 1D respiration surrogate further comprises (i) registering an end-of-inhalation phase to all other phases based on diffeomorphic metric mapping to obtain a series of DVFs in the end-of-inhalation phase, and (ii) selecting the apex at a location where a z-axis displacement across the DVFs is largest.
  • registering the end-of-inhalation phase comprises using large deformation diffeomorphic metric mapping (LDDMM).
  • LDDMM large deformation diffeomorphic metric mapping
  • the first-phase imaging data comprises three-dimensional computed tomography (3D-CT) scan data.
  • 3D-CT three-dimensional computed tomography
  • the one or more ANNs comprise one or more recurrent neural networks (RNNs).
  • RNNs recurrent neural networks
  • a first subset of the plurality of phases corresponds to inhalation and a second subset of the plurality of phases corresponds to exhalation.
  • the first-phase imaging data may correspond to an inhale-phase image of the breathing cycle in the first subset.
  • the multi-phase imaging data are computed tomography (CT) scan data.
  • CT computed tomography
  • the first module is a Seq2Seq-based module.
  • the first module comprises a stacked convolutional long short-term memory (ConvLSTM) recurrent neural network.
  • ConvLSTM convolutional long short-term memory
  • the second module is a VoxelMorph-based module.
  • each training epoch of the deep-learning model back-propagates losses from the first and second modules alternately.
  • Training the deep-learning model may comprise: receiving 4D-CT imaging data of an anatomical region of each reference subject in a cohort of subjects, the anatomical region comprising a diaphragm of each reference subject, the 4D-CT imaging data comprising imaging data for phases of a reference breathing cycle of each reference subject; generating, for each reference subject in the cohort of subjects, a reference 1D respiration surrogate corresponding to the reference breathing cycle of each reference subject in the cohort of subjects; generating a training dataset comprising the reference 1D respiration surrogate for each reference subject in the cohort of subjects; and using the training dataset to train a deep-learning model comprising one or more ANNs, the deep-learning model comprising (i) a first module configured to generate, based at least on initial-phase imaging data and a detected patient 1D breathing trace, predicted images for subsequent-phase imaging data of a patient breathing cycle, and (ii) a second module configured to generate warped images based at least in part on the predicted images from the first module.
  • the method comprises simulating respiratory motion of a patient based on the trained deep-learning model.
  • Simulating respiratory motion of the patient may comprise: receiving multi-phase imaging data of the anatomical region of the patient, the multi-phase imaging data comprising imaging data for a plurality of phases of a patient breathing cycle, the plurality of phases comprising a first phase of the breathing cycle and subsequent phases following the first phase of the breathing cycle, the multi-phase imaging data comprising (i) first-phase imaging data corresponding to the first phase, and (ii) subsequent-phase imaging data corresponding to the subsequent phases; detecting, via a beacon or an external monitoring device, a 1D breathing trace corresponding to the patient breathing cycle; and generating warped images corresponding to the breathing cycle of the patient, wherein generating the warped images comprises feeding the first-phase imaging data and the 1D breathing trace to the first module to obtain predicted images for the subsequent phases of the breathing cycle, and feeding the 1D breathing trace and the predicted images from the first module to the second module to generate the warped images.
  • various embodiments of the present disclosure relate to a computing system comprising one or more processors configured to implement any of the above methods.
  • various embodiments of the present disclosure relate to a non-transitory computer-readable storage medium with instructions configured to cause one or more processors of a computing system to implement any of the above methods.
  • various embodiments of the disclosure relate to a computing system (which may be, or may comprise, one or more computing devices) comprising one or more processors that are configured to implement any of the methods disclosed herein.
  • the disclosed approach includes a method used to generate 1D respiration surrogates from 4D-CT images in order to train a deep learning model.
  • the 1D breathing trace may be obtained or determined, for example, using an implanted beacon, an external monitoring device, or another clinical device.
  • FIG. 1 depicts an example system for implementing the disclosed approach, according to various potential embodiments.
  • FIG. 2 depicts example model training and motion simulator processes, according to various potential embodiments.
  • FIG. 4 shows a schematic representation for a deep learning model according to various potential embodiments.
  • a Seq2Seq encoder-decoder framework may be used in embodiments of the disclosed model.
  • the model may be built with a 3D convolution layer and 3D convolutional Long Short-Term Memory (ConvLSTM) layers.
  • the last layer of the decoder may be inserted with a spatial transformer to warp the initial phase image with the Deformation Vector Field (DVF).
  • DVF Deformation Vector Field
  • FIG. 5 depicts a process for determining a breathing trace according to various potential embodiments.
  • the maximum z-axis displacement of the DVFs between phase 10 and the other phases may be considered as the breathing trace.
  • the LDDMM DIR may be used to calculate the DVFs.
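A minimal sketch of how such a 1D surrogate could be derived from a series of DVFs is shown below. This is an illustration only, not the Applicant's implementation: the `dvfs` array layout, the choice of channel 0 as the z-component, and the use of the maximum absolute z-displacement to locate the apex are all assumptions.

```python
import numpy as np

def breathing_trace_from_dvfs(dvfs: np.ndarray) -> np.ndarray:
    """Derive a 1D respiration surrogate from phase-wise DVFs (illustrative only).

    dvfs: array of shape (num_phases, 3, Z, Y, X) holding the displacement vector
    fields that register the end-of-inhalation phase (phase 10) to each other
    phase; channel 0 is assumed to hold the z (superior-inferior) component.
    Returns one z-displacement value per phase, sampled at the diaphragm apex.
    """
    z_disp = dvfs[:, 0]                                   # (num_phases, Z, Y, X)
    # Take the apex as the voxel with the largest z-axis displacement across the DVFs.
    score = np.abs(z_disp).max(axis=0)
    apex = np.unravel_index(np.argmax(score), score.shape)
    return z_disp[(slice(None),) + apex]                  # 1D breathing trace
```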
  • FIG. 6 shows three different breathing traces - BT1 (604), BT2 (608), and BT3 (612) - shown in the plot on the top row, that were used to predict the respiration motion of an internal case, resulting in 3 series of modulated phase images (second, third and fourth row) according to the breathing traces, according to various potential embodiments.
  • the white line (620) indicates the position of the apex of the right diaphragm at the initial phase (left-most column).
  • the figure overlays the propagated lung (in yellow, 624), heart (in red, 628), esophagus (in blue, 632) and tumor (in green, 636) contours using predicted DVFs.
  • FIG. 8 provides TRE results of the POPI dataset according to various potential embodiments.
  • VoxelMorph with the disclosed respiratory motion simulator model augmentation outperformed the vanilla VoxelMorph in all 6 cases.
  • FIG. 10 provides two examples of prediction results according to various potential embodiments.
  • the images in the first two rows are overlaid images between phase 10 and phases 9-2.
  • the images in the last two rows are overlaid images between the predicted images and the real image at each phase.
  • the green parts (1002) in the overlaid images are mismatch regions due to respiratory motion. Patient-specific motion predictions from the model in accordance with the patient’s breathing pattern are shown.
  • Motion caused by respiration makes diagnosis (e.g., lung nodule tracking) and therapeutics (e.g., lung cancer radiotherapy) more challenging.
  • Respiratory motion simulation for a given static 3D CT patient scan, modulated by a surrogate such as a 1D breathing trace, can generate large amounts of patient-specific augmentations that can drive more accurate deep learning deformable image registration (DIR).
  • Applicant presents a deep learning model (a respiratory motion simulator model) that learns to generate patient-specific realistic respiratory motion represented in time-varying displacement vector fields (DVFs) at different breathing phases from a 3D CT image and modulates predicted respiration patterns through auxiliary inputs of breathing traces.
  • DIR deep learning deformable image registration
  • the disclosed approach has several differences and advantages over prior approaches.
  • the disclosed approach is more suitable for data augmentation since 1D traces can be arbitrary or from different patients.
  • 2D surrogate images and 3D images must have been from the same patient.
  • a system 100 may include a computing system 110 (which may be or may include one or more computing devices, colocated or remote to each other), a condition detection system 160, an electronic medical record (EMR) system 170, a platform 175, and a therapeutic system 180.
  • the computing system 110 (one or more computing devices) may be used to control and/or exchange signals and/or data with condition detection system 160, EMR system 170, platform 175, and/or therapeutic system 180, directly or via another component of system 100.
  • computing system 110 may be used to control and/or exchange data or other signals with condition detection system 160, EMR system 170, platform 175, and/or therapeutic system 180.
  • Therapeutic system 180 may include a radiation source for external beam therapy (e.g., orthovoltage x-ray machines, Cobalt-60 machines, linear accelerators, proton beam machines, neutron beam machines, etc.) and/or one or more other treatment devices. Sensors 184 may be used by therapeutic system 180 to evaluate and guide a treatment (e.g., by detecting level of emitted radiation, a condition or state of the patient, or other states or conditions).
  • components of system 100 may be rearranged or integrated in other configurations.
  • computing system 110 (or components thereof) may be integrated with one or more of the condition detection system 160, therapeutic system 180, and/or components thereof.
  • the condition detection system 160, therapeutic system 180, and/or components thereof may be directed to a platform 175 on which a patient or other subject can be situated (so as to image the subject, apply a treatment or therapy to the subject, and/or detect motion by the subject).
  • the platform 175 may be movable (e.g., using any combination of motors, magnets, etc.) to allow for positioning and repositioning of subjects (such as microadjustments to compensate for motion of a subject or patient).
  • the platform 175 may include its own sensors to detect a condition or state of the patient or subject.
  • imaging data for multiple phases of a breathing cycle may be obtained. This may be obtained by or via imager 120 for each subject (e.g., while on platform 175) in a cohort of subjects (e.g., using imaging systems 162 and/or 164) to capture or represent motion of an anatomical region (e.g., a region that includes a diaphragm of each subject in the cohort).
  • Step 215 involves generating (e.g., by or via motion surrogate generator 130), for each subject in the cohort, a 1D respiration surrogate.
  • one or more datasets may be generated using the 1D respiration surrogates. This may be performed, for example, by or via machine learning platform 140.
  • the dataset may be used to train a deep learning model (e.g., by or via machine learning platform 140). More specifically, training the deep learning model may comprise receiving 4D-CT imaging data (e.g., by or via imager 120 using condition detection system 160) of an anatomical region (comprising a diaphragm) of each reference subject in a cohort of subjects.
  • the 4D-CT imaging data may comprise imaging data for phases of a breathing cycle of each reference subject.
  • a reference 1D respiration surrogate corresponding to the breathing cycle of each reference subject in the cohort of subjects may be generated and used to generate the training dataset comprising the reference 1D respiration surrogate for each reference subject in the cohort of subjects.
  • the training dataset may be used to train a deep-learning model comprising one or more ANNs.
  • the deep-learning model may be designed to have a first module (e.g., image predictor 142) configured to generate, based at least on initial-phase imaging data and a detected patient 1D respiration surrogate, predicted images for subsequent-phase imaging data of a patient breathing cycle, and a second module (e.g., warped image generator 144) configured to generate warped images based at least in part on the patient 1D breathing trace and the predicted images from the first module.
  • the initial-phase imaging data may correspond to an initial phase of the patient breathing cycle.
  • the subsequent-phase imaging data may correspond to phases following the initial phase in the patient breathing cycle.
  • multi-phase imaging data for phases of a respiratory cycle of a patient may be obtained (e.g., by or via imager 120 using condition detection system 160).
  • Imaging data may correspond to an anatomical region comprising a diaphragm of the patient.
  • the phases may include a first phase of the breathing cycle and subsequent phases following the first phase of the breathing cycle.
  • the multi-phase imaging data may comprise first-phase imaging data corresponding to the first phase, and subsequent-phase imaging data corresponding to the subsequent phases.
  • a 1D breathing trace corresponding to the breathing cycle of the patient may be determined (e.g., based on signals from a beacon or an external monitoring device 166).
  • the 1D breathing trace and first-phase imaging data may be fed to the first module (e.g., image predictor 142) of the model.
  • the predicted images from the first module may be fed to the second module (e.g., warped image generator 144) to generate the warped images corresponding to the breathing cycle of the patient.
  • the 1D breathing trace may be fed to the second module along with the predicted images to generate the warped images.
  • the warped images may be used clinically (e.g., for radiotherapy, treatment planning, or radiological diagnosis).
  • Process 200 may end (290), or return to step 250 (e.g., after administering a course of radiotherapy) for subsequent imaging and further treatment based on a change in a condition of the patient.
  • FIG. 3 shows a simplified block diagram of a representative server system 300 (e.g., computing system 110) and client computer system 314 (e.g., computing system 110, condition detection system 160, imaging system 162, imaging system 164, sensors 166, EMR system 170, platform 175, therapeutic system 180, radiation source 182, and/or sensors 184) usable to implement various embodiments of the present disclosure.
  • server system 300 or similar systems can implement services or servers described herein or portions thereof.
  • Client computer system 314 or similar systems can implement clients described herein.
  • Server system 300 can have a modular design that incorporates a number of modules 302 (e.g., blades in a blade server embodiment); while two modules 302 are shown, any number can be provided.
  • Each module 302 can include processing unit(s) 304 and local storage 306.
  • Processing unit(s) 304 can include a single processor, which can have one or more cores, or multiple processors.
  • processing unit(s) 304 can include a general-purpose primary processor as well as one or more special-purpose coprocessors such as graphics processors, digital signal processors, or the like.
  • some or all processing units 304 can be implemented using customized circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs).
  • ASICs application specific integrated circuits
  • FPGAs field programmable gate arrays
  • such integrated circuits execute instructions that are stored on the circuit itself.
  • processing unit(s) 304 can execute instructions stored in local storage 306. Any type of processors in any combination can be included in processing unit(s) 304.
  • Local storage 306 can include volatile storage media (e.g., conventional DRAM, SRAM, SDRAM, or the like) and/or non-volatile storage media (e.g., magnetic or optical disk, flash memory, or the like). Storage media incorporated in local storage 306 can be fixed, removable or upgradeable as desired. Local storage 306 can be physically or logically divided into various subunits such as a system memory, a read-only memory (ROM), and a permanent storage device.
  • the system memory can be a read-and-write memory device or a volatile read-and-write memory, such as dynamic random-access memory.
  • the system memory can store some or all of the instructions and data that processing unit(s) 304 need at runtime.
  • the ROM can store static data and instructions that are needed by processing unit(s) 304.
  • the permanent storage device can be a non-volatile read-and-write memory device that can store instructions and data even when module 302 is powered down.
  • storage medium includes any medium in which data can be stored indefinitely (subject to overwriting, electrical disturbance, power loss, or the like) and does not include carrier waves and transitory electronic signals propagating wirelessly or over wired connections.
  • local storage 306 can store one or more software programs to be executed by processing unit(s) 304, such as an operating system and/or programs implementing various server functions or any system or device described herein.
  • Software refers generally to sequences of instructions that, when executed by processing unit(s) 304 cause server system 300 (or portions thereof) to perform various operations, thus defining one or more specific machine embodiments that execute and perform the operations of the software programs.
  • the instructions can be stored as firmware residing in read-only memory and/or program code stored in non-volatile storage media that can be read into volatile working memory for execution by processing unit(s) 304.
  • Software can be implemented as a single program or a collection of separate programs or program modules that interact as desired. From local storage 306 (or non-local storage described below), processing unit(s) 304 can retrieve program instructions to execute and data to process in order to execute various operations described above.
  • modules 302 can be interconnected via a bus or other interconnect 308, forming a local area network that supports communication between modules 302 and other components of server system 300.
  • Interconnect 308 can be implemented using various technologies including server racks, hubs, routers, etc.
  • a wide area network (WAN) interface 310 can provide data communication capability between the local area network (interconnect 308) and a larger network, such as the Internet.
  • Conventional or other technologies can be used, including wired (e.g., Ethernet, IEEE 802.3 standards) and/or wireless technologies (e.g., Wi-Fi, IEEE 802.11 standards).
  • local storage 306 is intended to provide working memory for processing unit(s) 304, providing fast access to programs and/or data to be processed while reducing traffic on interconnect 308.
  • Storage for larger quantities of data can be provided on the local area network by one or more mass storage subsystems 312 that can be connected to interconnect 308.
  • Mass storage subsystem 312 can be based on magnetic, optical, semiconductor, or other data storage media. Direct attached storage, storage area networks, network-attached storage, and the like can be used. Any data stores or other collections of data described herein as being produced, consumed, or maintained by a service or server can be stored in mass storage subsystem 312.
  • additional data storage resources may be accessible via WAN interface 310 (potentially with increased latency).
  • Server system 300 can operate in response to requests received via WAN interface 310.
  • one of modules 302 can implement a supervisory function and assign discrete tasks to other modules 302 in response to received requests.
  • Conventional work allocation techniques can be used.
  • results can be returned to the requester via WAN interface 310.
  • Such operation can generally be automated.
  • WAN interface 310 can connect multiple server systems 300 to each other, providing scalable systems capable of managing high volumes of activity.
  • Conventional or other techniques for managing server systems and server farms can be used, including dynamic resource allocation and reallocation.
  • Server system 300 can interact with various user-owned or user-operated devices via a wide-area network such as the Internet.
  • An example of a user-operated device is shown in FIG. 3 as client computing system 314.
  • Client computing system 314 can be implemented, for example, as a consumer device such as a smartphone, other mobile phone, tablet computer, wearable computing device (e.g., smart watch, eyeglasses), desktop computer, laptop computer, and so on.
  • client computing system 314 can communicate via WAN interface 310.
  • Client computing system 314 can include conventional computer components such as processing unit(s) 316, storage device 318, network interface 320, user input device 322, and user output device 324.
  • Client computing system 314 can be a computing device implemented in a variety of form factors, such as a desktop computer, laptop computer, tablet computer, smartphone, other mobile computing device, wearable computing device, or the like.
  • Processor 316 and storage device 318 can be similar to processing unit(s) 304 and local storage 306 described above. Suitable devices can be selected based on the demands to be placed on client computing system 314; for example, client computing system 314 can be implemented as a “thin” client with limited processing capability or as a high-powered computing device. Client computing system 314 can be provisioned with program code executable by processing unit(s) 316 to enable various interactions with server system 300 of a message management service such as accessing messages, performing actions on messages, and other interactions described above. Some client computing systems 314 can also interact with a messaging service independently of the message management service.
  • Network interface 320 can provide a connection to a wide area network (e.g., the Internet) to which WAN interface 310 of server system 300 is also connected.
  • network interface 320 can include a wired interface (e.g., Ethernet) and/or a wireless interface implementing various RF data communication standards such as Wi-Fi, Bluetooth, or cellular data network standards (e.g., 3G, 4G, LTE, etc.).
  • User input device 322 can include any device (or devices) via which a user can provide signals to client computing system 314; client computing system 314 can interpret the signals as indicative of particular user requests or information.
  • user input device 322 can include any or all of a keyboard, touch pad, touch screen, mouse or other pointing device, scroll wheel, click wheel, dial, button, switch, keypad, microphone, and so on.
  • User output device 324 can include any device via which client computing system 314 can provide information to a user.
  • user output device 324 can include a display to display images generated by or delivered to client computing system 314.
  • the display can incorporate various image generation technologies, e.g., a liquid crystal display (LCD), light-emitting diode (LED) including organic light-emitting diodes (OLED), projection system, cathode ray tube (CRT), or the like, together with supporting electronics (e.g., digital-to-analog or analog-to-digital converters, signal processors, or the like).
  • Some embodiments can include a device such as a touchscreen that functions as both an input and an output device.
  • other user output devices 324 can be provided in addition to or instead of a display. Examples include indicator lights, speakers, tactile “display” devices, printers, and so on.
  • server system 300 and client computing system 314 are illustrative and that variations and modifications are possible. Computer systems used in connection with embodiments of the present disclosure can have other capabilities not specifically described here. Further, while server system 300 and client computing system 314 are described with reference to particular blocks, it is to be understood that these blocks are defined for convenience of description and are not intended to imply a particular physical arrangement of component parts. For instance, different blocks can be but need not be located in the same facility, in the same server rack, or on the same motherboard. Further, the blocks need not correspond to physically distinct components. Blocks can be configured to perform various operations, e.g., by programming a processor or providing appropriate control circuitry, and various blocks might or might not be reconfigurable depending on how the initial configuration is obtained. Embodiments of the present disclosure can be realized in a variety of apparatus including electronic devices implemented using any combination of circuitry and software.
  • Embodiments of the present disclosure can be realized using any combination of dedicated components and/or programmable processors and/or other programmable devices.
  • the various processes described herein can be implemented on the same processor or different processors in any combination. Where components are described as being configured to perform certain operations, such configuration can be accomplished, e.g., by designing electronic circuits to perform the operation, by programming programmable electronic circuits (such as microprocessors) to perform the operation, or any combination thereof.
  • programmable electronic circuits such as microprocessors
  • Computer programs incorporating various features of the present disclosure may be encoded and stored on various computer readable storage media; suitable media include magnetic disk or tape, optical storage media such as compact disk (CD) or DVD (digital versatile disk), flash memory, and other non-transitory media.
  • Computer readable media encoded with the program code may be packaged with a compatible electronic device, or the program code may be provided separately from electronic devices (e.g., via Internet download or as a separately packaged computer-readable storage medium).
  • Applicant used an internal lung 4D-CT dataset of 159 lung cancer patients receiving radiotherapy.
  • the 4D-CTs were acquired using Philips Brilliance Big Bore or GE Advantage and binned into 10 phases using the vendor’s proprietary software with breathing signals from bellows or external fiducial markers.
  • the x-ray energy for the CT image was 120 kVp.
  • the image slice dimension was 512 x 512, while the number of image slices varied patient by patient.
  • the internal dataset was anonymized and IRB approved for this study. Applicant split the internal dataset with 105 for training and 54 for testing.
  • Applicant used 20 cases of the Lung Nodule Analysis (LUNA) challenge dataset to show that the disclosed respiratory motion simulator model trained with the internal dataset can be effectively applied to an external radiology and/or diagnostic dataset to generate realistic respiration motions.
  • Applicant validated the effectiveness of data augmentation using synthetic respiration motion images generated from the disclosed deep-learning model in the deformable registration task.
  • Applicant used the Learn2Reg 2020 and POPI datasets.
  • the Learn2Reg dataset consists of 30 subjects (20 for training / 10 for testing) with 3D high-resolution computed tomography (HRCT) thorax images taken in inhale and exhale phases.
  • HRCT High-resolution computed tomography
  • Applicant used the POPI model which contains 6 inhale/exhale 4D-CT datasets with landmarks.
  • Applicant used VoxelMorph, a well-known unsupervised deep learning method, to train on 20 Learn2Reg training datasets and test on the 6 POPI datasets (with landmarks).
  • Applicant generated other phases of images using the deep-learning model, increasing the sample size to 200 in total to augment the training of VoxelMorph. All datasets used in this study were cropped to eliminate the background and resampled to 128 × 128 × 128 voxels with 2 mm voxel size.
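As a rough illustration of that preprocessing step, the snippet below resamples a (possibly cropped) CT volume to a 128×128×128 grid with 2 mm spacing using SimpleITK. The default pixel value, interpolator, and the commented file path are placeholder assumptions, not values taken from the study.

```python
import SimpleITK as sitk

def resample_to_grid(image: sitk.Image, size=(128, 128, 128), spacing=(2.0, 2.0, 2.0)) -> sitk.Image:
    """Resample a CT volume to a fixed matrix size and voxel spacing."""
    resampler = sitk.ResampleImageFilter()
    resampler.SetSize(size)
    resampler.SetOutputSpacing(spacing)
    resampler.SetOutputOrigin(image.GetOrigin())
    resampler.SetOutputDirection(image.GetDirection())
    resampler.SetInterpolator(sitk.sitkLinear)
    resampler.SetDefaultPixelValue(-1000)  # HU of air, used for regions outside the original field of view
    return resampler.Execute(image)

# Hypothetical usage:
# ct = sitk.ReadImage("patient_phase10.nii.gz")
# ct_128 = resample_to_grid(ct)
```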
  • the respiratory motion simulator model, illustrated in FIG. 4, is a novel deep learning architecture that can predict patient-specific respiratory motion.
  • the respiratory simulator comprises a Seq2Seq network and an unsupervised deformable image registration network, VoxelMorph.
  • the Seq2Seq model comprises a 3D convolutional layer (404) and multiple stacked 3D convolutional long short-term memory (ConvLSTM) layers (408).
  • the input sequence passes through the 3D convolution and 3D max-pooling layers (416) to extract important spatial features followed by temporal feature evaluation of the aggregated spatial feature embedding from previous time points with 3D ConvLSTMs.
  • the output of the 3D ConvLSTMs is upsampled (420) and sent to a 3D convolution layer to output the DVF (424) of a given time point (breathing phase), representing the predicted patient-specific respiratory motion.
  • the DVF is then processed by a spatial transformer (412) to transform the initial phase image to a predicted phase image at a different breathing phase.
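A highly simplified PyTorch sketch of such a pipeline is given below. It is not the Applicant's network: the layer counts, channel widths, single ConvLSTM cell, and the way the breathing-trace value is injected as an extra channel are placeholder assumptions intended only to show how 3D convolution, a ConvLSTM over breathing phases, upsampling, and a DVF head could be chained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvLSTM3dCell(nn.Module):
    """Minimal 3D convolutional LSTM cell (single layer, for illustration only)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv3d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c

class MotionSimulatorSketch(nn.Module):
    """Toy Seq2Seq-style motion predictor: initial-phase CT + per-phase trace value -> per-phase DVF."""
    def __init__(self, feat=16):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv3d(1, feat, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool3d(2))          # spatial features, downsampled
        self.cell = ConvLSTM3dCell(feat + 1, feat)             # +1 channel carries the breathing-trace value
        self.dvf_head = nn.Conv3d(feat, 3, 3, padding=1)       # 3-channel displacement field

    def forward(self, x0, trace):
        # x0: (B, 1, D, H, W) initial-phase image; trace: (B, num_phases) 1D surrogate
        feats = self.encode(x0)
        b, _, d, h, w = feats.shape
        state = (torch.zeros(b, self.cell.hid_ch, d, h, w, device=x0.device),
                 torch.zeros(b, self.cell.hid_ch, d, h, w, device=x0.device))
        dvfs = []
        for t in range(trace.shape[1]):
            amp = trace[:, t].view(b, 1, 1, 1, 1).expand(b, 1, d, h, w)
            state = self.cell(torch.cat([feats, amp], dim=1), state)
            dvf = self.dvf_head(state[0])
            dvfs.append(F.interpolate(dvf, size=x0.shape[2:], mode='trilinear', align_corners=False))
        return torch.stack(dvfs, dim=1)   # (B, num_phases, 3, D, H, W)
```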
  • the loss function for training is similar to VoxelMorph, combining the mean-squared error between the ground-truth phase image and the predicted (warped) phase image with a regularization on the gradient of the DVF that promotes DVF smoothness: Loss = Σ_t [ MSE(Y_t, T_{φ_t}(X_0)) + λ‖∇φ_t‖² ], where X_0 is the initial phase image (phase 10 in this study), T is the spatial transform, φ_t is the predicted DVF for phase t, and Y_t is the ground truth phase image at phase t.
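The two pieces of that loss can be sketched in PyTorch as follows. `warp_with_dvf` applies a predicted DVF to the initial-phase image via `grid_sample`, and the smoothness term penalizes finite differences of the DVF; the voxel-unit DVF convention and the weight `lam` are assumptions, not the Applicant's exact implementation.

```python
import torch
import torch.nn.functional as F

def warp_with_dvf(image, dvf):
    """Warp a 3D image (B,1,D,H,W) with a DVF (B,3,D,H,W) given in voxel units, ordered (z, y, x)."""
    b, _, d, h, w = image.shape
    zz, yy, xx = torch.meshgrid(torch.arange(d), torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack([zz, yy, xx]).float().to(image.device).unsqueeze(0)   # identity grid (1,3,D,H,W)
    coords = base + dvf                                                      # displaced voxel coordinates
    # grid_sample expects normalized (x, y, z) coordinates in [-1, 1]
    norm = torch.stack([2 * coords[:, 2] / (w - 1) - 1,
                        2 * coords[:, 1] / (h - 1) - 1,
                        2 * coords[:, 0] / (d - 1) - 1], dim=-1)             # (B, D, H, W, 3)
    return F.grid_sample(image, norm, mode='bilinear', align_corners=True)

def simulator_loss(x0, y_true, dvf, lam=0.01):
    """MSE between the warped initial-phase image and the ground-truth phase image, plus DVF smoothness."""
    warped = warp_with_dvf(x0, dvf)
    mse = F.mse_loss(warped, y_true)
    smooth = (dvf[:, :, 1:] - dvf[:, :, :-1]).pow(2).mean() \
           + (dvf[:, :, :, 1:] - dvf[:, :, :, :-1]).pow(2).mean() \
           + (dvf[:, :, :, :, 1:] - dvf[:, :, :, :, :-1]).pow(2).mean()
    return mse + lam * smooth
```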
  • Applicant developed the respiratory motion simulator model using the PyTorch library (version 1.2.0).
  • the learning rate was set to 0.001, following the Seq2Seq paper, and the optimizer was Adam. Due to the large size of the 4D image sequences, the batch size was limited to 1 and the number of feature channels was 64, considering GPU memory and training time.
  • the model was trained and tested on an internal high-performance computing cluster with 4 NVIDIA A40 GPUs with 48 GB of memory each. The model consumed 35.2 GB of GPU memory and the training time was approximately 72 hours. The inference time for 9 phases across the 54 test cases from the internal dataset was less than 3 minutes.
  • because the respiratory motion simulator model can generate a series of realistic respiratory motion-induced images from a single 3D CT, one of its use cases is data augmentation for training image registration algorithms.
  • Applicant randomly selected a 1D breathing trace extracted from an internal dataset to modulate the motion used to transform the inhalation phase image to the other 9 phases, thus increasing the training size 10-fold.
  • Applicant first trained a VoxelMorph model with the original 20 inhalation-to-exhalation pairs of images in the Learn2Reg training cases.
  • Applicant then trained another VoxelMorph model with the augmented data comprising 200 pairs of inhalation-to-other-phase images.
  • Applicant evaluated the results with structural similarity (SSIM) and Dice of lung segmentation between the fixed images/contours (lung) and the propagated images/contours (lung).
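For reference, Dice overlap between binary lung masks and SSIM between image volumes can be computed roughly as below. The mask/image variables are placeholders, and scikit-image's SSIM implementation is used, which may differ in detail from the measure reported in the study.

```python
import numpy as np
from skimage.metrics import structural_similarity

def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Hypothetical usage with a fixed lung mask and a propagated (warped) lung mask:
# dsc = dice_coefficient(fixed_lung_mask, propagated_lung_mask)
# ssim = structural_similarity(fixed_image, warped_image,
#                              data_range=fixed_image.max() - fixed_image.min())
```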
  • FIG. 6 is an example of a case simulated by an embodiment of the respiratory motion simulator model.
  • the plot in the top row in the figure illustrates the three ID breathing traces used for the modulation.
  • the breathing trace 1 (BT1, 604), the middle line, represents the original respiratory signal for the case.
  • BT2 (608) and BT3 (612), the bottom and top lines, respectively, are traces from other cases and used to generate the simulated images in the second and third rows.
  • the white line (620) indicates the position of the apex of the diaphragm in the initial phase (the first column). It is used as a reference to show the relative positions of the diaphragm at different phases.
  • the diaphragm on the third row clearly shows the most significant movement among the three as BT3 has the largest amplitude in the trace.
  • Applicant used the respiratory motion simulator model for augmenting the Learn2Reg Challenge dataset.
  • the Dice similarity coefficient of lung segmentation for the 10 Learn2Reg testing cases using the vanilla VoxelMorph was 0.96 ± 0.01, while the model with data augmentation using the respiratory motion simulator model achieved 0.97 ± 0.01 (p < 0.001 using the paired t-test).
  • the SSIM between the warped images and the ground truth images was 0.88 ⁇ 0.02 for the vanilla model and 0.89 ⁇ 0.02 (p ⁇ 0.001) for the model with data augmentation.
  • a 3D Seq2Seq network can be used in respiratory motion simulation models to predict patient-specific realistic motion modulated with a 1D breathing trace. This generates patient-specific augmentations for improving deep learning DIR algorithm performance and provides a new DIR validation dataset.
  • Applicant evaluated the model on an internal 4D-CT dataset and showed a high similarity between the predicted and ground truth images.
  • Applicant applied the internally trained model to an external 4D-CT case in POPI, where Applicant achieved sub-millimeter accuracy on average with the motion predicted from the disclosed model.
  • Applicant also augmented the Learn2Reg dataset with RMSim to train a VoxelMorph model.
  • the validation of registrations on the POPI dataset showed improvement over the vanilla VoxelMorph (without augmentation).
  • Applicant also augmented the LUNA dataset to demonstrate that the model trained on the internal dataset can be effectively applied to an external radiology dataset.
  • although the model was used to predict the motion in one breathing cycle, the model can be fine-tuned to predict multiple cycles in one shot. In various embodiments, this may be accomplished by making the model bi-directional and using cross-attention to improve temporal dynamics in a long sequence.
  • the model may be extended to 4D-MR and 4D-CBCT datasets.
  • FIG. 9 is a schematic image for embodiments of the disclosed deep learning framework.
  • the network consisted of the Seq2Seq and VoxelMorph modules.
  • the convolutional long short-term memory (ConvLSTM) was mainly used in the Seq2Seq module to learn spatiotemporal features across different phases in a 4D-CT and its 1D respiratory motion trace.
  • Applicant used the superior-inferior displacements of the apex of the diaphragm across the phases as the breathing trace.
  • To automatically detect the apex, Applicant first registered the end-of-inhalation phase (phase 10 in the study) to all other phases using large deformation diffeomorphic metric mapping (LDDMM) to obtain a series of DVFs in the reference phase (phase 10) CT. The apex of the diaphragm is then selected at the location where it has the largest z-axis displacement across the DVFs. The output of the Seq2Seq module is the predicted images at the later phases, which consequently are sent to a VoxelMorph network along with the initial phase image to generate DVFs. Applicant trained the proposed network end-to-end. In each epoch, the loss from VoxelMorph and the loss from Seq2Seq are back-propagated alternately. The loss functions are defined as follows:
  • Loss_VoxelMorph = MSE(DVF_t(Y_0), Y_t) + α·‖∇DVF_t‖²
  • Loss_Seq2Seq = MSE(Ŷ_t, Y_t) + β·MSE(DVF_t(Y_0), Y_t), where Y_0 is the initial phase (phase 10) image, Y_t is the ground-truth image at phase t, Ŷ_t is the Seq2Seq-predicted image at phase t, and DVF_t(Y_0) is the initial phase image warped by the predicted DVF.
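A schematic training loop for the alternating scheme described above might look like the following. The module names, data loader, loss weights, and the exact alternation granularity are placeholders; the two loss terms follow the reconstructed forms above rather than a verified reference implementation.

```python
import torch
import torch.nn.functional as F

def smoothness(dvf):
    """Mean squared finite difference of the DVF along each spatial axis (gradient penalty)."""
    return sum((torch.diff(dvf, dim=d) ** 2).mean() for d in (-3, -2, -1))

def train_one_epoch(seq2seq, voxelmorph, loader, optimizer, use_seq2seq_loss, alpha=0.01, beta=1.0):
    """One epoch; the caller flips `use_seq2seq_loss` each epoch to alternate the back-propagated loss."""
    for x0, trace, phases in loader:                     # phases: ground-truth images for the later phases
        optimizer.zero_grad()
        pred_phases = seq2seq(x0, trace)                 # predicted phase images (Ŷ_t)
        dvfs, warped = voxelmorph(x0, pred_phases)       # DVF_t and DVF_t(Y_0)
        if use_seq2seq_loss:
            loss = F.mse_loss(pred_phases, phases) + beta * F.mse_loss(warped, phases)
        else:
            loss = F.mse_loss(warped, phases) + alpha * smoothness(dvfs)
        loss.backward()
        optimizer.step()

# Hypothetical driver, alternating the back-propagated loss every epoch:
# for epoch in range(num_epochs):
#     train_one_epoch(seq2seq, voxelmorph, loader, optimizer, use_seq2seq_loss=(epoch % 2 == 0))
```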
  • 3D CT images at each phase are depicted as 904, 3D convolution layers are depicted as 908, 3D ConvLSTM is depicted as 912, 3D maximum pooling is depicted as 916, 3D predicted images at future phases are depicted as 920, 3D warped image (phase 10 to each phase) is depicted as 924, predicted DVF is depicted as 928, and 3D upsampling is depicted as 932.
  • FIG. 10 provides two examples of motion prediction from the disclosed deep learning framework.
  • the first two rows of images are overlaid images between an initial phase (phase 10) and the 9 later phases (phases 9-1), and the last two rows are overlaid images between phase 10 and warped phase 10 images at the later phases using the predicted DVFs.
  • the motion of the diaphragm was well predicted in the DVFs, and thus the mismatch was diminished.
  • each patient has a different respiratory pattern, and the proposed model reflects the patient-specific respiratory motion well by incorporating the patient-specific breathing trace in the proposed framework.
  • Table 1 provides the results of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM).
  • PSNR Peak Signal-to-Noise Ratio
  • SSIM Structural Similarity Index Measure
  • Table 1: The average Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) over the 40 testing patients.
  • PSNR Peak Signal-to-Noise Ratio
  • SSIM Structural Similarity Index Measure
  • a range includes each individual member.
  • a group having 1-3 cells refers to groups having 1, 2, or 3 cells.
  • a group having 1-5 cells refers to groups having 1, 2, 3, 4, or 5 cells, and so forth.

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Veterinary Medicine (AREA)
  • Public Health (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Theoretical Computer Science (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Pathology (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Physiology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Mathematical Physics (AREA)
  • Radiology & Medical Imaging (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Psychiatry (AREA)
  • Pulmonology (AREA)
  • Fuzzy Systems (AREA)
  • Dentistry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Quality & Reliability (AREA)
  • Multimedia (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The present disclosure provides a machine learning model that learns to generate patient-specific realistic respiratory motion represented in time-varying displacement vector fields (DVFs) at different breathing phases from a 3D CT image and modulates predicted respiration patterns through auxiliary inputs of breathing traces. The model may include a 3D Seq2Seq architecture that includes VoxelMorph to learn the spatiotemporal motion representation in 4D-CT images. Adding patient-specific augmentations to training data can improve performance and accuracy of state-of-the-art deep learning registration algorithms. Also disclosed are breathing trace-modulated respiratory motion simulations for static radiology scans. The disclosed approach can be used for validating DIR algorithms as well as for patient-specific augmentations.

Description

CONTROLLED RESPIRATORY MOTION SIMULATION FOR PATIENT-SPECIFIC AUGMENTATION
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/357,908, filed July 1, 2022, the content of which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
[0002] The present technology relates generally to using artificial intelligence to characterize respiratory motion, and more specifically to respiratory motion simulation using machine learning techniques that may employ deep learning models.
BACKGROUND
[0003] Motion caused by respiration, along with muscular, cardiac, and gastrointestinal systems, complicates diagnosis and image-guided procedures, such as nodule tracking in radiology diagnostics and tumor tracking in radiotherapy, where motion may lead to poor local tumor control and radiation toxicity to the normal organs. This complexity can exhibit itself as motion artifacts in the acquired images along with the inherent difficulty in disentangling the changes in nodule/tumor morphology from those induced by respiratory motion, making the image registration task across different breathing phases as well as across different time points challenging. To validate the image registration accuracy / performance for commissioning solutions available in clinical commercial systems, the American Association of Physicists in Medicine (AAPM) TG-132 recommended independent quality checks using digital phantoms. Current commercial solutions allow creation of synthetic deformation vector fields (DVFs) by user-defined transformations with only a limited degree of freedom. These monotonic transformations cannot capture the realistic respiratory motion.
SUMMARY
[0004] In one aspect, various embodiments of the present disclosure relate to a method of patient-specific simulation of respiratory motion. The method may comprise: receiving multi-phase imaging data of an anatomical region of a patient, the anatomical region comprising a diaphragm of the patient, the multi-phase imaging data comprising imaging data for a plurality of phases of a breathing cycle of the patient, the plurality of phases comprising a first phase of the breathing cycle and subsequent phases following the first phase of the breathing cycle, the multi-phase imaging data comprising (i) first-phase imaging data corresponding to the first phase, and (ii) subsequent-phase imaging data corresponding to the subsequent phases; detecting, via a beacon or an external monitoring device, a 1D breathing trace corresponding to the breathing cycle of the patient; generating warped images corresponding to the breathing cycle of the patient, wherein generating the warped images comprises feeding the first-phase imaging data and the 1D breathing trace to a deep-learning model comprising one or more artificial neural networks (ANNs), the deep-learning model comprising (i) a first module configured to generate, based at least on the first-phase imaging data and the 1D breathing trace, predicted images for the subsequent phases of the breathing cycle, and (ii) a second module configured to generate the warped images based at least in part on the predicted images from the first module.
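Purely as an illustration of that two-module flow, the sketch below wires a first-phase CT and a 1D breathing trace through two placeholder networks; the module names and call signatures are assumptions, not the claimed implementation.

```python
import torch

def simulate_respiratory_motion(first_phase_ct, breathing_trace, first_module, second_module):
    """Hedged sketch of the claimed flow: 3D first-phase CT + 1D breathing trace -> warped phase images.

    first_module:  predicts images for the subsequent breathing phases (e.g., a Seq2Seq/ConvLSTM network).
    second_module: produces warped images (via DVFs) from those predictions (e.g., a VoxelMorph-style network).
    """
    with torch.no_grad():
        predicted_phases = first_module(first_phase_ct, breathing_trace)
        warped_images = second_module(first_phase_ct, predicted_phases, breathing_trace)
    return warped_images
```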
[0005] In various embodiments, the 1D breathing trace is fed to the second module in addition to the predicted images to generate the warped images.
[0006] In various embodiments, the method may comprise training the deep-learning model using one or more training datasets based on four-dimensional computed tomography (4D-CT) imaging data.
[0007] In various embodiments, the 4D-CT imaging data corresponds to a cohort of reference subjects. Training the deep-learning model may comprise generating, using the 4D-CT imaging data, a training 1D respiration surrogate for each reference subject in the cohort of subjects.
[0008] In various embodiments, each training 1D respiration surrogate is based on diaphragm displacements across a plurality of training phases for each reference subject in the cohort of subjects.
[0009] In various embodiments, the displacements are displacements of an apex of the diaphragm of each corresponding reference subject in the cohort of subjects.
[0010] In various embodiments, generating each training 1D respiration surrogate comprises detecting an apex of the diaphragm in a breathing cycle of the corresponding reference subject in the cohort of subjects.
[0011] In various embodiments, generating each training 1D respiration surrogate further comprises (i) registering an end-of-inhalation phase to all other phases based on diffeomorphic metric mapping to obtain a series of DVFs in the end-of-inhalation phase, and (ii) selecting the apex at a location where a z-axis displacement across the DVFs is largest.
[0012] In various embodiments, registering the end-of-inhalation phase comprises using large deformation diffeomorphic metric mapping (LDDMM).
[0013] In various embodiments, the first-phase imaging data comprises three-dimensional computed tomography (3D-CT) scan data.
[0014] In various embodiments, the one or more ANNs comprise one or more recurrent neural networks (RNNs).
[0015] In various embodiments, a first subset of the plurality of phases corresponds to inhalation and a second subset of the plurality of phases corresponds to exhalation. The first-phase imaging data may correspond to an inhale-phase image of the breathing cycle in the first subset.
[0016] In various embodiments, the multi-phase imaging data are computed tomography (CT) scan data.
[0017] In various embodiments, the first module is a Seq2Seq-based module.
[0018] In various embodiments, the first module comprises a stacked convolutional long short-term memory (ConvLSTM) recurrent neural network.
[0019] In various embodiments, the second module is a VoxelMorph-based module.
[0020] In various embodiments, each training epoch of the deep-learning model back-propagates losses from the first and second modules alternately.
[0021] In another aspect, various embodiments of the present disclosure relate to a method comprising training a deep-learning model. Training the deep-learning model may comprise: receiving 4D-CT imaging data of an anatomical region of each reference subject in a cohort of subjects, the anatomical region comprising a diaphragm of each reference subject, the 4D-CT imaging data comprising imaging data for phases of a reference breathing cycle of each reference subject; generating, for each reference subject in the cohort of subjects, a reference 1D respiration surrogate corresponding to the reference breathing cycle of each reference subject in the cohort of subjects; generating a training dataset comprising the reference 1D respiration surrogate for each reference subject in the cohort of subjects; and using the training dataset to train a deep-learning model comprising one or more ANNs, the deep-learning model comprising (i) a first module configured to generate, based at least on initial-phase imaging data and a detected patient 1D breathing trace, predicted images for subsequent-phase imaging data of a patient breathing cycle, and (ii) a second module configured to generate warped images based at least in part on the predicted images from the first module, wherein the initial-phase imaging data corresponds to an initial phase of the patient breathing cycle, and the subsequent-phase imaging data corresponds to phases following the initial phase in the patient breathing cycle.
[0022] In various embodiments, the method comprises simulating respiratory motion of a patient based on the trained deep-learning model. Simulating respiratory motion of the patient may comprise: receiving multi-phase imaging data of the anatomical region of the patient, the multi-phase imaging data comprising imaging data for a plurality of phases of a patient breathing cycle, the plurality of phases comprising a first phase of the breathing cycle and subsequent phases following the first phase of the breathing cycle, the multi-phase imaging data comprising (i) first-phase imaging data corresponding to the first phase, and (ii) subsequent-phase imaging data corresponding to the subsequent phases; detecting, via a beacon or an external monitoring device, a 1D breathing trace corresponding to the patient breathing cycle; and generating warped images corresponding to the breathing cycle of the patient, wherein generating the warped images comprises feeding the first-phase imaging data and the 1D breathing trace to the first module to obtain predicted images for the subsequent phases of the breathing cycle, and feeding the 1D breathing trace and the predicted images from the first module to the second module to generate the warped images.
[0023] In yet another aspect, various embodiments of the present disclosure relate to a computing system comprising one or more processors configured to implement any of the above methods.
[0024] In yet another aspect, various embodiments of the present disclosure relate to a non-transitory computer-readable storage medium with instructions configured to cause one or more processors of a computing system to implement any of the above methods.
[0025] In yet other aspects, various embodiments of the disclosure relate to a computing system (which may be, or may comprise, one or more computing devices) comprising one or more processors that are configured to implement any of the methods disclosed herein.
[0026] In yet other aspects, various embodiments of the disclosure relate to non-transitory computer-readable storage media comprising instructions configured to cause one or more processors of a computing system (which may be, or may comprise, one or more computing devices) to implement any of the methods disclosed herein.
[0027] In yet other aspects, various embodiments of the disclosure relate to processes performed using devices and systems disclosed herein.
[0028] In various embodiments, the disclosed approach includes a method used to generate 1D respiration surrogates from 4D-CT images in order to train a deep learning model. Clinically, the 1D breathing trace may be obtained or determined, for example, using an implanted beacon, an external monitoring device, or another clinical device.
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] FIG. 1 depicts an example system for implementing the disclosed approach, according to various potential embodiments.
[0030] FIG. 2 depicts example model training and motion simulator processes, according to various potential embodiments.
[0031] FIG. 3 shows a simplified block diagram of a representative server system and client computer system usable to implement certain embodiments of the present disclosure.
[0032] FIG. 4 shows a schematic representation for a deep learning model according to various potential embodiments. A Seq2Seq encoder-decoder framework may be used in embodiments of the disclosed model. The model may be built with a 3D convolution layer and 3D convolutional Long Short-Term Memory (ConvLSTM) layers. A spatial transformer may be inserted after the last layer of the decoder to warp the initial phase image with the Deformation Vector Field (DVF). To modulate the respiratory motions, the 1D breathing trace may be input along with the phase image.
[0033] FIG. 5 depicts a process for determining a breathing trace according to various potential embodiments. The maximum z-axis displacement of DVFs between phase 10 and the other phases may be considered as the breathing trace. The LDDMM DIR may be used to calculate the DVFs.
[0034] FIG. 6 shows three different breathing traces - BT1 (604), BT2 (608), and BT3 (612) - shown in the plot on the top row, that were used to predict the respiration motion of an internal case, resulting in 3 series of modulated phase images (second, third and fourth row) according to the breathing traces, according to various potential embodiments. The white line (620) indicates the position of the apex of the right diaphragm at the initial phase (left-most column.) The figure overlays the propagated lung (in yellow, 624), heart (in red, 628), esophagus (in blue, 632) and tumor (in green, 636) contours using predicted DVFs.
[0035] FIG. 7 provides Target Registration Error (TRE) results of a point-validated pixel-based breathing thorax model (POPI) dataset. VoxelMorph (a general purpose library for learning-based tools for alignment/registration, and more generally modeling with deformations) with RMSim augmentation outperformed the vanilla VoxelMorph in all 6 cases.
[0036] FIG. 8 provides TRE results of the POPI dataset according to various potential embodiments. VoxelMorph with the disclosed respiratory motion simulator model augmentation outperformed the vanilla VoxelMorph in all 6 cases.
[0037] FIG. 9 provides a schematic diagram for a deep learning architecture according to various potential embodiments. The model includes a Seq2Seq module and a VoxelMorph module. The Seq2Seq module predicts images for each phase from the input phases, and the VoxelMorph module generates DVFs and the warped images.
[0038] FIG. 10 provides two examples of prediction results according to various potential embodiments. The images in the first two rows are overlaid images between phase 10 and phases 9-2. The images in the last two rows are overlaid images between the predicted images and the real image at each phase. The green parts (1002) in the overlaid images are mismatch regions due to respiratory motion. Patient-specific motion predictions from the model in accordance with the patient’s breathing pattern are shown.
[0039] The foregoing and other features of the present disclosure will become apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. Understanding that these drawings depict only several embodiments in accordance with the disclosure and are, therefore, not to be considered limiting of its scope, the disclosure will be described with additional specificity and detail through use of the accompanying drawings.
DETAILED DESCRIPTION
[0040] It is to be appreciated that certain aspects, modes, embodiments, variations and features of the present methods are described below in various levels of detail in order to provide a substantial understanding of the present technology. It is to be understood that the present disclosure is not limited to particular uses, methods, devices, or systems, each of which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
[0041] Motion caused by respiration makes diagnosis (e.g., lung nodule tracking) and therapeutics (e.g., lung cancer radiotherapy) more challenging. Respiratory motion simulation for a given static 3D CT patient scan, modulated by a surrogate such as a 1D breathing trace, can generate large amounts of patient-specific augmentations that can drive more accurate deep learning deformable image registration (DIR). In this disclosure, Applicant presents a deep learning model (a respiratory motion simulator model) that learns to generate patient-specific realistic respiratory motion represented in time-varying displacement vector fields (DVFs) at different breathing phases from a 3D CT image and modulates predicted respiration patterns through auxiliary inputs of breathing traces. Specifically, the model consists of a 3D Seq2Seq architecture that includes VoxelMorph to learn the spatiotemporal motion representation in 4D-CT images. Applicant validated the model with both private and public datasets (healthy and cancer patients) and demonstrated that adding particular patient-specific augmentations to training data can improve performance and accuracy of state-of-the-art deep learning registration algorithms. Applicant also showcases breathing trace-modulated respiratory motion simulations for static radiology scans. The proposed approach can be used for validating DIR algorithms as well as for patient-specific augmentations.
[0042] For modeling respiration motion, a representation of motion is time-varying displacement fields obtained by deformable image registrations (DIR) in 4D images acquired in a breathing cycle. Surrogate-driven approaches connect the surrogate breathing signals with the displacement field model parameters using the displacement field as a function of the surrogate signals. However, an exact and direct solution in the high-dimensional space of displacement fields is intractable. Traditional approaches employed dimension reduction techniques such as principal component analysis (PCA) for low-dimensional model parameters. In various embodiments, Applicant provides a 3D Seq2Seq model to learn the respiration motion directly from 4D-CT images in an unsupervised way to simultaneously predict DVFs of other breathing phases given a static 3D image. The disclosed approach also allows modulation of this simulated motion via arbitrary 1D breathing traces to create large variations. This in turn creates diverse patient-specific data augmentations while also generating ground truth for DIR validation.
[0043] The disclosed approach has several differences and advantages over prior approaches. First, the limited availability of 2D on-treatment images as surrogates makes them less useful than the 1D breathing traces used in various embodiments for driving motion prediction. Secondly, the disclosed approach is more suitable for data augmentation since 1D traces can be arbitrary or from different patients. In prior approaches, 2D surrogate images and 3D images must have been from the same patient. Third, Applicant thoroughly evaluated the disclosed model on external datasets to show the applicability on radiology as well as radiotherapy images.
[0044] Referring to FIG. 1, in various embodiments, a system 100 may include a computing system 110 (which may be or may include one or more computing devices, colocated or remote to each other), a condition detection system 160, an electronic medical record (EMR) system 170, a platform 175, and a therapeutic system 180. The computing system 110 (one or more computing devices) may be used to control and/or exchange signals and/or data with condition detection system 160, EMR system 170, platform 175, and/or therapeutic system 180, directly or via another component of system 100. In certain embodiments, computing system 110 may be used to control and/or exchange data or other signals with condition detection system 160, EMR system 170, platform 175, and/or therapeutic system 180. The computing system 110 may include one or more processors and one or more volatile and non-volatile memories for storing computing code and data that are captured, acquired, recorded, and/or generated. The computing system 110 may include a controller 112 that is configured to exchange control signals with condition detection system 160, EMR system 170, platform 175, therapeutic system 180, and/or any components thereof, allowing the computing system 110 to be used to control, for example, capture of images, acquisition of signals by sensors, positioning or repositioning of subjects and patients, recording or obtaining subject or patient information, and applying therapies.
[0045] A transceiver 114 allows the computing system 110 to exchange readings, control commands, and/or other data, wirelessly or via wires, directly or via networking protocols, with condition detection system 160, EMR system 170, platform 175, and/or therapeutic system 180, or components thereof. One or more user interfaces 116 allow the computing device 110 to receive user inputs (e.g., via a keyboard, touchscreen, microphone, camera, etc.) and provide outputs (e.g., via a display screen, audio speakers, etc.). The computing device 110 may additionally include one or more databases 118 for storing, for example, signals acquired via one or more sensors, biomarker signatures, etc. In some implementations, database 118 (or portions thereof) may alternatively or additionally be part of another computing device that is co-located or remote and in communication with computing device 110, condition detection system 160, EMR system 170, platform 175, and/or therapeutic system 180 or components thereof.
[0046] Condition detection system 160 may include a first imaging system 162 (which may be or may include, e.g., a positron emission tomography (PET) scanner, a single photon emission computed tomography (SPECT) scanner, a magnetic resonance imaging (MRI) scanner, a computed tomography (CT) scanner, and/or other imaging devices and/or sensors), a second imaging system 164 (which may be or may include, e.g., a PET scanner, a SPECT scanner, an MRI scanner, a CT scanner, and/or other imaging devices and/or sensors), and sensors 166 (which may detect, e.g., a position or motion of a patient, organs, tissues, or other states or conditions).
[0047] Therapeutic system 180 may include a radiation source for external beam therapy (e.g., orthovoltage x-ray machines, Cobalt-60 machines, linear accelerators, proton beam machines, neutron beam machines, etc.) and/or one or more other treatment devices. Sensors 184 may be used by therapeutic system 180 to evaluate and guide a treatment (e.g., by detecting level of emitted radiation, a condition or state of the patient, or other states or conditions). In various implementations, components of system 100 may be rearranged or integrated in other configurations. For example, computing system 110 (or components thereof) may be integrated with one or more of the condition detection system 160, therapeutic system 180, and/or components thereof. The condition detection system 160, therapeutic system 180, and/or components thereof may be directed to a platform 175 on which a patient or other subject can be situated (so as to image the subject, apply a treatment or therapy to the subject, and/or detect motion by the subject). In various embodiments, the platform 175 may be movable (e.g., using any combination of motors, magnets, etc.) to allow for positioning and repositioning of subjects (such as microadjustments to compensate for motion of a subject or patient). The platform 175 may include its own sensors to detect a condition or state of the patient or subject.
[0048] The computing system 110 may include an imager 120 configured to direct image capture and obtain imaging data. Imager 120 may include an image generator that may convert raw imaging data from condition detection system 160 into usable medical images or into another form to be analyzed. Computing system 110 may include an image analyzer configured to identify features in images or imaging data or otherwise make use of images or imaging data. Image analyzer may, for example, identify or characterize phases in a respiration cycle. Computing system 110 may also include a motion tracker 125, which may be or may include a beacon (e.g., an implanted beacon) and/or another external monitoring device that is configured to receive signals corresponding to, for example, position or motion of various parts of a subject or patient. Motion surrogate generator 130 of computing system 110 may generate 1D respiration surrogates from 4D-CT images in order to train a machine learning model.
[0049] The computing system 110 also includes a machine learning platform 140 that is used to train and apply various machine learning models as disclosed herein. Machine learning platform 140 may generate training datasets that include or are based on, for example, respiration surrogates from motion surrogate generator 130. A trained model may include multiple modules, such as a first module that is or that includes an image predictor 142, and a second module that is or that includes a warped image generator 144. For example, the model may be or may include a deep-learning model comprising one or more artificial neural networks (ANNs), with the first module configured to generate, based on imaging data corresponding to a first phase of a breathing cycle and a breathing trace, predicted images for subsequent phases of the breathing cycle, and the second module configured to generate warped images based at least in part on the breathing trace and the predicted images from the first module. The model may have been trained using 1D respiration surrogates based on imaging data for a cohort of subjects as disclosed herein.
[0050] Referring to FIG. 2, an example process 200 is illustrated, according to various potential embodiments. Various elements of process 200 may be implemented by or via system 100 or components thereof. Process 200 may begin (205) with model training (on the left side of FIG. 2), which may be implemented by or via computing system 110 (e.g., imager 120, motion surrogate generator 130, and machine learning platform 140), if a trained model is not already available (e.g., in database 118), or if additional models (or modules thereof) are to be generated or updated through training or retraining with new training datasets. Alternatively, process 200 may begin with motion simulation (on the right side of FIG. 2) for a patient if a trained model is already available. Motion simulation may be implemented by or via computing system 110 (e.g., imager 120, motion tracker 125, and machine learning platform 140) if a suitable trained model is available. In various embodiments, process 200 may comprise both model training (e.g., steps 210 - 225) followed by motion simulation (e.g., steps 250 - 270).
[0051] At 210, imaging data for multiple phases of a breathing cycle may be obtained. This may be obtained by or via imager 120 for each subject (e.g., while on platform 175) in a cohort of subjects (e.g., using imaging systems 162 and/or 164) to capture or represent motion of an anatomical region (e.g., a region that includes a diaphragm of each subject in the cohort). Step 215 involves generating (e.g., by or via motion surrogate generator 130), for each subject in the cohort, a 1D respiration surrogate.
[0052] At 220, one or more datasets (e.g., training datasets) may be generated using the 1D respiration surrogates. This may be performed, for example, by or via machine learning platform 140. At 225, the dataset may be used to train a deep learning model (e.g., by or via machine learning platform 140). More specifically, training the deep learning model may comprise receiving 4D-CT imaging data (e.g., by or via imager 120 using condition detection system 160) of an anatomical region (comprising a diaphragm) of each reference subject in a cohort of subjects. The 4D-CT imaging data may comprise imaging data for phases of a breathing cycle of each reference subject. A reference 1D respiration surrogate corresponding to the breathing cycle of each reference subject in the cohort of subjects may be generated and used to generate the training dataset comprising the reference 1D respiration surrogate for each reference subject in the cohort of subjects. The training dataset may be used to train a deep-learning model comprising one or more ANNs. The deep-learning model may be designed to have a first module (e.g., image predictor 142) configured to generate, based at least on initial-phase imaging data and a detected patient 1D respiration surrogate, predicted images for subsequent-phase imaging data of a patient breathing cycle, and a second module (e.g., warped image generator 144) configured to generate warped images based at least in part on the patient 1D breathing trace and the predicted images from the first module. The initial-phase imaging data may correspond to an initial phase of the patient breathing cycle, and the subsequent-phase imaging data may correspond to phases following the initial phase in the patient breathing cycle.
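By way of a non-limiting illustration only, the following Python sketch outlines how the training 1D respiration surrogates and training dataset of steps 215-220 might be assembled. The callables register_lddmm and load_4dct_phases are hypothetical placeholders for a deformable registration tool and an image loader, and the array layouts are assumptions of the sketch, not a description of the disclosed system.

```python
# Non-limiting sketch: assembling training 1D respiration surrogates from 4D-CT.
# `register_lddmm` and `load_4dct_phases` are hypothetical placeholder callables.
import numpy as np

def extract_surrogate(phases, lung_mask, register_lddmm):
    """phases: list of 3D arrays; phases[0] is the end-of-inhalation reference."""
    # DVFs from the reference phase to every other phase, each of shape (X, Y, Z, 3)
    dvfs = [register_lddmm(moving=phases[0], fixed=p) for p in phases[1:]]
    dz = np.stack([np.abs(d[..., 2]) for d in dvfs])      # |z-displacement| per phase
    # Apex of the diaphragm: lung voxel with the largest z-motion across all phases
    apex = np.unravel_index(np.argmax(dz.max(axis=0) * lung_mask), lung_mask.shape)
    # 1D surrogate: signed apex z-displacement per phase (0 at the reference phase)
    return np.array([0.0] + [float(d[apex][2]) for d in dvfs])

def build_training_set(cohort, register_lddmm, load_4dct_phases):
    dataset = []
    for subject in cohort:
        phases, lung_mask = load_4dct_phases(subject)
        trace = extract_surrogate(phases, lung_mask, register_lddmm)
        dataset.append({"phases": phases, "trace": trace})
    return dataset
```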
[0053] The trained model may be stored (e.g., in database 118) for subsequent use. Process 200 may end (290), or proceed to step 250 for use in motion simulation. (As represented by the dotted line from step 225 to step 260, the trained deep-learning model may subsequently be used to generate and use warped images.)
[0054] At 250, multi-phase imaging data for phases of a respiratory cycle of a patient may be obtained (e.g., by or via imager 120 using condition detection system 160). Imaging data may correspond to an anatomical region comprising a diaphragm of the patient. The phases may include a first phase of the breathing cycle and subsequent phases following the first phase of the breathing cycle. The multi-phase imaging data may comprise first-phase imaging data corresponding to the first phase, and subsequent-phase imaging data corresponding to the subsequent phases.
[0055] At 255, a 1D breathing trace corresponding to the breathing cycle of the patient may be determined (e.g., based on signals from a beacon or an external monitoring device 166). At 260, the 1D breathing trace and first-phase imaging data may be fed to the first module (e.g., image predictor 142) of the model. At 265, the predicted images from the first module may be fed to the second module (e.g., warped image generator 144) to generate the warped images corresponding to the breathing cycle of the patient. The 1D breathing trace may be fed to the second module along with the predicted images to generate the warped images. At 270, the warped images may be used clinically (e.g., for radiotherapy, treatment planning, or radiological diagnosis).
[0056] Process 200 may end (290), or return to step 250 (e.g., after administering a course of radiotherapy) for subsequent imaging and further treatment based on a change in a condition of the patient.
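Purely as a hedged usage sketch, the simulation steps above could be invoked along the following lines. The attribute names image_predictor and warp_generator and the tensor shapes are assumptions made for illustration and do not denote a specific implementation of the disclosed modules.

```python
# Hedged usage sketch of the simulation steps; attribute names and shapes are
# assumptions, not a specific implementation of the disclosed modules.
import torch

@torch.no_grad()
def simulate_breathing_cycle(model, first_phase_ct, breathing_trace):
    # first_phase_ct: (1, 1, D, H, W) CT tensor; breathing_trace: (1, T) amplitudes
    predicted = model.image_predictor(first_phase_ct, breathing_trace)   # first module
    warped, dvfs = model.warp_generator(first_phase_ct, predicted,
                                        breathing_trace)                 # second module
    return warped, dvfs   # one warped volume and one DVF per subsequent phase
```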
[0057] Various operations described herein can be implemented on computer systems, which can be of generally conventional design. FIG. 3 shows a simplified block diagram of a representative server system 300 (e.g., computing system 110) and client computer system 314 (e.g., computing system 110, condition detection system 160, imaging system 162, imaging system 164, sensors 166, EMR system 170, platform 175, therapeutic system 180, radiation source 182, and/or sensors 184) usable to implement various embodiments of the present disclosure. In various embodiments, server system 300 or similar systems can implement services or servers described herein or portions thereof. Client computer system 314 or similar systems can implement clients described herein.
[0058] Server system 300 can have a modular design that incorporates a number of modules 302 (e.g., blades in a blade server embodiment); while two modules 302 are shown, any number can be provided. Each module 302 can include processing unit(s) 304 and local storage 306.
[0059] Processing unit(s) 304 can include a single processor, which can have one or more cores, or multiple processors. In some embodiments, processing unit(s) 304 can include a general-purpose primary processor as well as one or more special-purpose coprocessors such as graphics processors, digital signal processors, or the like. In some embodiments, some or all processing units 304 can be implemented using customized circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself. In other embodiments, processing unit(s) 304 can execute instructions stored in local storage 306. Any type of processors in any combination can be included in processing unit(s) 304.
[0060] Local storage 306 can include volatile storage media (e.g., conventional DRAM, SRAM, SDRAM, or the like) and/or non-volatile storage media (e.g., magnetic or optical disk, flash memory, or the like). Storage media incorporated in local storage 306 can be fixed, removable or upgradeable as desired. Local storage 306 can be physically or logically divided into various subunits such as a system memory, a read-only memory (ROM), and a permanent storage device. The system memory can be a read-and-write memory device or a volatile read-and-write memory, such as dynamic random-access memory. The system memory can store some or all of the instructions and data that processing unit(s) 304 need at runtime. The ROM can store static data and instructions that are needed by processing unit(s) 304. The permanent storage device can be a non-volatile read-and-write memory device that can store instructions and data even when module 302 is powered down. The term “storage medium” as used herein includes any medium in which data can be stored indefinitely (subject to overwriting, electrical disturbance, power loss, or the like) and does not include carrier waves and transitory electronic signals propagating wirelessly or over wired connections.
[0061] In some embodiments, local storage 306 can store one or more software programs to be executed by processing unit(s) 304, such as an operating system and/or programs implementing various server functions or any system or device described herein.
[0062] "Software" refers generally to sequences of instructions that, when executed by processing unit(s) 304, cause server system 300 (or portions thereof) to perform various operations, thus defining one or more specific machine embodiments that execute and perform the operations of the software programs. The instructions can be stored as firmware residing in read-only memory and/or program code stored in non-volatile storage media that can be read into volatile working memory for execution by processing unit(s) 304. Software can be implemented as a single program or a collection of separate programs or program modules that interact as desired. From local storage 306 (or non-local storage described below), processing unit(s) 304 can retrieve program instructions to execute and data to process in order to execute various operations described above.
[0063] In some server systems 300, multiple modules 302 can be interconnected via a bus or other interconnect 308, forming a local area network that supports communication between modules 302 and other components of server system 300. Interconnect 308 can be implemented using various technologies including server racks, hubs, routers, etc.
[0064] A wide area network (WAN) interface 310 can provide data communication capability between the local area network (interconnect 308) and a larger network, such as the Internet. Conventional or other technologies can be used, including wired (e.g., Ethernet, IEEE 802.3 standards) and/or wireless technologies (e.g., Wi-Fi, IEEE 802.11 standards).
[0065] In some embodiments, local storage 306 is intended to provide working memory for processing unit(s) 304, providing fast access to programs and/or data to be processed while reducing traffic on interconnect 308. Storage for larger quantities of data can be provided on the local area network by one or more mass storage subsystems 312 that can be connected to interconnect 308. Mass storage subsystem 312 can be based on magnetic, optical, semiconductor, or other data storage media. Direct attached storage, storage area networks, network-attached storage, and the like can be used. Any data stores or other collections of data described herein as being produced, consumed, or maintained by a service or server can be stored in mass storage subsystem 312. In some embodiments, additional data storage resources may be accessible via WAN interface 310 (potentially with increased latency).
[0066] Server system 300 can operate in response to requests received via WAN interface 310. For example, one of modules 302 can implement a supervisory function and assign discrete tasks to other modules 302 in response to received requests. Conventional work allocation techniques can be used. As requests are processed, results can be returned to the requester via WAN interface 310. Such operation can generally be automated. Further, in some embodiments, WAN interface 310 can connect multiple server systems 300 to each other, providing scalable systems capable of managing high volumes of activity. Conventional or other techniques for managing server systems and server farms (collections of server systems that cooperate) can be used, including dynamic resource allocation and reallocation.
[0067] Server system 300 can interact with various user-owned or user-operated devices via a wide-area network such as the Internet. An example of a user-operated device is shown in FIG. 3 as client computing system 314. Client computing system 314 can be implemented, for example, as a consumer device such as a smartphone, other mobile phone, tablet computer, wearable computing device (e.g., smart watch, eyeglasses), desktop computer, laptop computer, and so on.
[0068] For example, client computing system 314 can communicate via WAN interface 310. Client computing system 314 can include conventional computer components such as processing unit(s) 316, storage device 318, network interface 320, user input device 322, and user output device 324. Client computing system 314 can be a computing device implemented in a variety of form factors, such as a desktop computer, laptop computer, tablet computer, smartphone, other mobile computing device, wearable computing device, or the like.
[0069] Processor 316 and storage device 318 can be similar to processing unit(s) 304 and local storage 306 described above. Suitable devices can be selected based on the demands to be placed on client computing system 314; for example, client computing system 314 can be implemented as a “thin” client with limited processing capability or as a high-powered computing device. Client computing system 314 can be provisioned with program code executable by processing unit(s) 316 to enable various interactions with server system 300 of a message management service such as accessing messages, performing actions on messages, and other interactions described above. Some client computing systems 314 can also interact with a messaging service independently of the message management service.
[0070] Network interface 320 can provide a connection to a wide area network (e.g., the Internet) to which WAN interface 310 of server system 300 is also connected. In various embodiments, network interface 320 can include a wired interface (e.g., Ethernet) and/or a wireless interface implementing various RF data communication standards such as Wi-Fi, Bluetooth, or cellular data network standards (e.g., 3G, 4G, LTE, etc.).
[0071] User input device 322 can include any device (or devices) via which a user can provide signals to client computing system 314; client computing system 314 can interpret the signals as indicative of particular user requests or information. In various embodiments, user input device 322 can include any or all of a keyboard, touch pad, touch screen, mouse or other pointing device, scroll wheel, click wheel, dial, button, switch, keypad, microphone, and so on.
[0072] User output device 324 can include any device via which client computing system 314 can provide information to a user. For example, user output device 324 can include a display to display images generated by or delivered to client computing system 314. The display can incorporate various image generation technologies, e.g., a liquid crystal display (LCD), light-emitting diode (LED) including organic light-emitting diodes (OLED), projection system, cathode ray tube (CRT), or the like, together with supporting electronics (e.g., digital-to-analog or analog-to-digital converters, signal processors, or the like). Some embodiments can include a device such as a touchscreen that functions as both an input and an output device. In some embodiments, other user output devices 324 can be provided in addition to or instead of a display. Examples include indicator lights, speakers, tactile “display” devices, printers, and so on.
[0073] Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a computer readable storage medium. Many of the features described in this specification can be implemented as processes that are specified as a set of program instructions encoded on a computer readable storage medium. When these program instructions are executed by one or more processing units, they cause the processing unit(s) to perform various operations indicated in the program instructions. Examples of program instructions or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter. Through suitable programming, processing unit(s) 304 and 316 can provide various functionality for server system 300 and client computing system 314, including any of the functionality described herein as being performed by a server or client, or other functionality associated with message management services.
[0074] It will be appreciated that server system 300 and client computing system 314 are illustrative and that variations and modifications are possible. Computer systems used in connection with embodiments of the present disclosure can have other capabilities not specifically described here. Further, while server system 300 and client computing system 314 are described with reference to particular blocks, it is to be understood that these blocks are defined for convenience of description and are not intended to imply a particular physical arrangement of component parts. For instance, different blocks can be but need not be located in the same facility, in the same server rack, or on the same motherboard. Further, the blocks need not correspond to physically distinct components. Blocks can be configured to perform various operations, e.g., by programming a processor or providing appropriate control circuitry, and various blocks might or might not be reconfigurable depending on how the initial configuration is obtained. Embodiments of the present disclosure can be realized in a variety of apparatus including electronic devices implemented using any combination of circuitry and software.
[0075] While the disclosure has been described with respect to specific embodiments, one skilled in the art will recognize that numerous modifications are possible. For instance, although specific examples of rules (including triggering conditions and/or resulting actions) and processes for generating suggested rules are described, other rules and processes can be implemented. Embodiments of the disclosure can be realized using a variety of computer systems and communication technologies including but not limited to specific examples described herein.
[0076] Embodiments of the present disclosure can be realized using any combination of dedicated components and/or programmable processors and/or other programmable devices. The various processes described herein can be implemented on the same processor or different processors in any combination. Where components are described as being configured to perform certain operations, such configuration can be accomplished, e.g., by designing electronic circuits to perform the operation, by programming programmable electronic circuits (such as microprocessors) to perform the operation, or any combination thereof. Further, while the embodiments described above may make reference to specific hardware and software components, those skilled in the art will appreciate that different combinations of hardware and/or software components may also be used and that particular operations described as being implemented in hardware might also be implemented in software or vice versa.
[0077] Computer programs incorporating various features of the present disclosure may be encoded and stored on various computer readable storage media; suitable media include magnetic disk or tape, optical storage media such as compact disk (CD) or DVD (digital versatile disk), flash memory, and other non-transitory media. Computer readable media encoded with the program code may be packaged with a compatible electronic device, or the program code may be provided separately from electronic devices (e.g., via Internet download or as a separately packaged computer-readable storage medium).
[0078] Thus, although the disclosure has been described with respect to specific embodiments, it will be appreciated that the disclosure is intended to cover all modifications and equivalents within the scope of the following claims.
EXAMPLES
[0079] The present technology is further illustrated by the following Examples, which should not be construed as limiting in any way.
Example 1
Datasets
[0080] Applicant used an internal lung 4D-CT dataset of 159 lung cancer patients receiving radiotherapy. The 4D-CTs were acquired using Philips Brilliance Big Bore or GE Advantage and binned into 10 phases using the vendor’s proprietary software with breathing signals from bellows or external fiducial markers. The x-ray energy for the CT image was 120 kVp. The image slice dimension was 512 x 512, while the number of image slices varied patient by patient. The internal dataset was anonymized and IRB approved for this study. Applicant split the internal dataset with 105 for training and 54 for testing.
[0081] Applicant used 20 cases of the Lung Nodule Analysis (LUNA) challenge dataset to show that the disclosed respiratory motion simulator model trained with the internal dataset can be effectively applied to an external radiology and/or diagnostic dataset to generate realistic respiration motions. Applicant validated the effectiveness of data augmentation using synthetic respiration motion images generated from the disclosed deep-learning model in the deformable registration task. For this, Applicant used the Learn2Reg 2020 and POPI datasets. The Learn2Reg dataset consists of 30 subjects (20 for training / 10 for testing) with 3D high-resolution computed tomography (HRCT) thorax images taken in inhale and exhale phases. For landmark evaluation, Applicant used the POPI model, which contains 6 inhale/exhale 4D-CT datasets with landmarks. For the deformable registration task, Applicant used VoxelMorph, a well-known unsupervised deep learning method, to train on the 20 Learn2Reg training datasets and test on the 6 POPI datasets (with landmarks). For each of the 20 Learn2Reg inhale/exhale pairs, Applicant generated other phases of images using the deep-learning model, increasing the sample size to 200 in total to augment the training of VoxelMorph. All datasets used in this study were cropped to eliminate the background and resampled to 128 × 128 × 128 with a 2 mm voxel size.
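As a non-limiting illustration of the resampling step described above, a cropped volume may be brought to a 128 × 128 × 128 grid along the following lines; the use of scipy here is an assumption made for brevity and does not reflect the actual preprocessing pipeline used in the study.

```python
# Illustrative preprocessing sketch: resampling a cropped CT volume to a
# 128 x 128 x 128 grid. scipy is used purely for brevity (an assumption);
# the study's actual preprocessing pipeline is not reproduced here.
from scipy.ndimage import zoom

def resample_to_grid(volume, target_shape=(128, 128, 128)):
    factors = [t / s for t, s in zip(target_shape, volume.shape)]
    return zoom(volume, factors, order=1)   # linear interpolation
```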
Realistic Respiratory Motion Simulation
[0082] The respiratory motion simulator model, illustrated in FIG. 4, is a novel deep learning architecture that can predict patient-specific respiratory motion. The respiratory simulator comprises a Seq2Seq network and an unsupervised deformable image registration network, VoxelMorph. The Seq2Seq model comprises a 3D convolutional layer (404) and multiple stacked 3D Convolutional Long Short-Term Memory (ConvLSTM) layers (408). The input sequence passes through the 3D convolution and 3D max-pooling layers (416) to extract important spatial features, followed by temporal feature evaluation of the aggregated spatial feature embedding from previous time points with the 3D ConvLSTMs. The output of the 3D ConvLSTMs is upsampled (420) and sent to a 3D convolution layer to output the DVF (424) of a given time point (breathing phase), representing the predicted patient-specific respiratory motion. The DVF is then processed by a spatial transformer (412) to transform the initial phase image into a predicted phase image at a different breathing phase.
[0083] Moreover, to modulate the predicted motion with a patient-specific pattern, Applicant used an auxiliary input of a 1D breathing trace. In this study, Applicant considered the amplitude of diaphragm motion as the surrogate of the respiratory signal. To extract the 1D breathing trace for each training case, Applicant used large deformation diffeomorphic metric mapping (LDDMM) DIR provided by Advanced Normalization Tools (ANTs) to obtain DVFs between the phase at the end of inhalation and the other phases. The apex of the diaphragm was determined by finding the lung surface voxel with the maximum z-axis displacement among the DVFs. The z-axis displacement of the apex voxel at each phase serves as the 1D breathing trace. FIG. 5 depicts the process of preparing the 1D respiratory signal. The hidden state of the ConvLSTM at each phase is modulated by an element-wise multiplication with the phase amplitude of the trace:

$m(H_t, b_t) = b_t H_t$ (Eq. 1)

where $H_t$ is the hidden state of phase $t$ and $b_t$ is the amplitude of the breathing trace at phase $t$.
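The following is a minimal, non-limiting sketch of how such an architecture could be organized in PyTorch. It assumes a user-supplied 3D ConvLSTM cell (PyTorch provides no built-in ConvLSTM) and DVF channels already normalized to the grid_sample coordinate convention; it illustrates the layer ordering and the Eq. 1 modulation rather than the disclosed implementation.

```python
# Minimal architectural sketch (not the disclosed implementation): 3D conv +
# pooling encoder, a 3D ConvLSTM (assumed user-supplied cell accepting a None
# initial state and returning an (h, c) tuple), Eq. 1 modulation by the breathing
# trace, upsampling, a conv head emitting a DVF, and a grid_sample-based spatial
# transformer. DVF channels are assumed to be (x, y, z) normalized displacements.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RespMotionSimSketch(nn.Module):
    def __init__(self, convlstm3d_cls, feat=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv3d(1, feat, 3, padding=1),
                                    nn.ReLU(), nn.MaxPool3d(2))
        self.convlstm = convlstm3d_cls(feat, feat)      # assumed (h, c) state cell
        self.to_dvf = nn.Conv3d(feat, 3, 3, padding=1)  # 3-channel DVF head

    def warp(self, img, dvf):
        # Spatial transformer: displace a normalized identity grid by the DVF
        b, _, d, h, w = img.shape
        zz, yy, xx = torch.meshgrid(torch.linspace(-1, 1, d),
                                    torch.linspace(-1, 1, h),
                                    torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack((xx, yy, zz), dim=-1).unsqueeze(0).to(img)
        return F.grid_sample(img, grid + dvf.permute(0, 2, 3, 4, 1),
                             align_corners=True)

    def forward(self, x0, trace):
        # x0: (B, 1, D, H, W) initial-phase CT; trace: (B, T) breathing amplitudes
        feats, state = self.encode(x0), None
        warped, dvfs = [], []
        for t in range(trace.shape[1]):
            state = self.convlstm(feats, state)
            h_t = state[0] * trace[:, t].view(-1, 1, 1, 1, 1)   # Eq. 1 modulation
            dvf = self.to_dvf(F.interpolate(h_t, scale_factor=2, mode="trilinear"))
            dvfs.append(dvf)
            warped.append(self.warp(x0, dvf))
        return warped, dvfs
```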
[0084] The loss function for training is similar to VoxelMorph, which includes the mean-squared error between the ground truth phase image and the predicted phase image, and a regularization on the gradient of the DVF promoting smoothness of the DVF:

$\mathcal{L} = \sum_{t}\big[\mathrm{MSE}(\mathcal{T}(X_0, \phi_t), Y_t) + \lambda \lVert \nabla \phi_t \rVert\big]$ (Eq. 2)

where $X_0$ is the initial phase image (phase 10 in this study), $\mathcal{T}$ is the spatial transform, $\phi_t$ is the predicted DVF for phase $t$, $Y_t$ is the ground truth phase image at phase $t$, and $\lambda$ is a regularization weight.
[0085] Applicant developed the respiratory motion simulator model using the PyTorch library (version 1.2.0). The learning rate was 0.001, following the Seq2Seq paper, and the optimizer was Adam. Due to the large data size of the 4D image sequence, the batch size was limited to 1 and the number of feature channels was set to 64 in consideration of GPU memory and training time. The model was trained and tested on an internal high-performance computing cluster with 4 NVIDIA A40 GPUs with 48 GB of memory. The model consumed 35.2 GB of GPU memory and the training time was approximately 72 hours. The inference time for 9 phases and 54 total test cases from the internal dataset was less than 3 minutes.
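Purely for illustration, and not as a description of the actual training code, the Eq. 2 loss and the stated settings (Adam optimizer, learning rate 0.001, batch size 1) may be combined roughly as follows. The regularization weight lam is an assumed value, and the model is presumed to return warped images and DVFs per phase as in the sketch above; the dataset interface is likewise an assumption.

```python
# Non-limiting training-loop sketch combining the Eq. 2 loss with the stated
# settings (Adam, learning rate 0.001, batch size 1). `lam` is an assumed value;
# `dataset` is assumed to yield (initial image, list of later-phase images, trace).
import torch
import torch.nn.functional as F

def dvf_smoothness(dvf):
    # Finite-difference penalty on the spatial gradient of the DVF (B, 3, D, H, W)
    return sum(((dvf.narrow(dim, 1, dvf.size(dim) - 1)
                 - dvf.narrow(dim, 0, dvf.size(dim) - 1)) ** 2).mean()
               for dim in (2, 3, 4))

def train(model, dataset, lam=0.01, epochs=100):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x0, later_phases, trace in dataset:          # effective batch size 1
            warped, dvfs = model(x0, trace)
            loss = sum(F.mse_loss(w, y) + lam * dvf_smoothness(d)
                       for w, y, d in zip(warped, later_phases, dvfs))
            opt.zero_grad()
            loss.backward()
            opt.step()
```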
Data Augmentation by Respiratory Motion Simulator Model
[0086] Since the respiratory motion simulator model can generate a series of realistic respiratory motion-induced images from a single 3D CT, one of its use cases is data augmentation for training image registration algorithms. For each of the 20 training cases in the Learn2Reg Grand Challenge dataset, Applicant randomly selected a 1D breathing trace extracted from an internal dataset to modulate the motion used to transform the inhalation phase image to the other 9 phases, thus increasing the training size 10-fold. Applicant first trained a VoxelMorph model with the original 20 inhalation-to-exhalation pairs of images in the Learn2Reg training cases. Applicant then trained another VoxelMorph model with the augmented data including 200 pairs of inhalation-to-a-phase images. To compare the two models for the effectiveness of data augmentation using the respiratory motion simulator model, Applicant evaluated the results with structural similarity and the Dice coefficient of lung segmentation between fixed images/contours (lung) and propagated images/contours (lung).
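A hedged sketch of this augmentation recipe is given below; simulator and traces stand in for the trained respiratory motion simulator and the pool of internal 1D traces, and both names, like the data layout, are assumptions of the sketch rather than the study's actual code.

```python
# Sketch of the described augmentation: each Learn2Reg inhale image is paired
# with a randomly chosen 1D trace to synthesize the remaining phases, growing
# 20 pairs into roughly 200 training pairs. `simulator` and `traces` are assumed
# to come from the trained model and the internal cohort, respectively.
import random
import torch

@torch.no_grad()
def augment_pairs(inhale_images, traces, simulator):
    augmented = []
    for inhale in inhale_images:                      # e.g., 20 Learn2Reg cases
        trace = random.choice(traces)                 # trace from another patient
        phases, _ = simulator(inhale, trace)          # simulated later phases
        augmented.extend((inhale, phase) for phase in phases)
    return augmented                                  # moving/fixed image pairs
```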
Results
[0087] FIG. 6 is an example of a case simulated by an embodiment of the respiratory motion simulator model. The plot in the top row in the figure illustrates the three ID breathing traces used for the modulation. The breathing trace 1 (BT1, 604), the middle line, represents the original respiratory signal for the case. BT2 (608) and BT3 (612), the bottom and top lines, respectively, are traces from other cases and used to generate the simulated images in the second and third rows. The white line (620) indicates the position of the apex of the diaphragm in the initial phase (the first column). It is used as a reference to show the relative positions of the diaphragm at different phases. The diaphragm on the third row clearly shows the most significant movement among the three as BT3 has the largest amplitude in the trace.
[0088] The results of anatomical structure propagation using the predicted DVFs are also shown in FIG. 6. Applicant propagated the lung (624), heart (628), esophagus (632), and tumor (636) from the initial phase image. The propagated contours are well-matched with the predicted image and the motion of the structures looks very realistic. Applicant also provided a video of the simulated 4D CT along with the ground truth 4D CT, as well as 3D visualization, in the supplementary material. Applicant calculated the SSIM (structural similarity index measure) between the ground truth phase images and predicted phase images for all internal test cases. The average SSIM is 0.93±0.03.
[0089] In the POPI dataset, there is one case in which the lung segmentation on all phases is available. For this case, Applicant extracted the 1D breathing trace from the lung segmentations as was done for Applicant’s internal dataset. The respiratory motion simulator model trained with Applicant’s internal dataset predicted the remaining phases from the inhale phase with the modulation from the 1D breathing trace. The average TRE of landmarks propagated with the predicted DVFs in this case is 0.92±0.64 mm, showing that the respiratory motion simulator model can accurately predict the patient-specific motion from the patient’s 1D breathing trace. FIG. 7 shows the TRE results for all predicted phases in this case. For the three other 4D-CT cases in POPI which do not have lung segmentation masks, Applicant performed automatic segmentation for extracting the 1D breathing traces.
[0090] Additionally, Applicant used the respiratory motion simulator model for augmenting the Learn2Reg Challenge dataset. The Dice similarity coefficient of lung segmentation for the 10 Learn2Reg testing cases using the vanilla VoxelMorph was 0.96 ± 0.01, while the model with data augmentation using the respiratory motion simulator model was 0.97 ± 0.01 (p < 0.001 using the paired t-test). The SSIM between the warped images and the ground truth images was 0.88 ± 0.02 for the vanilla model and 0.89 ± 0.02 (p < 0.001) for the model with data augmentation. To validate the improvement of DIR using VoxelMorph with augmentation, Applicant propagated the landmark points from the inhale phase to the exhale phase for the 6 cases available in the POPI dataset and the TREs were calculated. On average, pre-DIR TRE was 8.05±5.61 mm and vanilla VoxelMorph was 8.12±5.78 mm, compared to 6.58±6.38 mm for VoxelMorph with augmentation (p < 3e-48). The TRE comparison of all 6 cases is shown in FIG. 8.
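For clarity, the TRE reported here is the distance between propagated and reference landmarks. A minimal illustrative computation, assuming a millimeter-valued DVF sampled at landmark voxels and an isotropic 2 mm spacing, might look as follows; the names and array layouts are assumptions of the sketch.

```python
# Minimal illustrative TRE computation: landmarks are displaced by the DVF
# sampled at their (nearest) voxels and compared with the reference landmarks.
# The isotropic 2 mm spacing and mm-valued DVF are assumptions of the sketch.
import numpy as np

def propagate_landmarks(landmarks_vox, dvf_mm, spacing_mm=2.0):
    pts = np.asarray(landmarks_vox, dtype=int)          # (N, 3) voxel indices
    disp = np.stack([dvf_mm[tuple(p)] for p in pts])    # (N, 3) displacements (mm)
    return pts * spacing_mm + disp                      # propagated points (mm)

def target_registration_error(propagated_mm, reference_mm):
    diff = np.asarray(propagated_mm) - np.asarray(reference_mm)
    return float(np.linalg.norm(diff, axis=1).mean())   # mean TRE in mm
```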
[0091] As discussed above, in various embodiments, a 3D Seq2Seq network can be used in respiratory motion simulation models to predict patient-specific realistic motion modulated with a 1D breathing trace. This generates patient-specific augmentations for improving deep learning DIR algorithm performance and provides a new DIR validation dataset. Applicant evaluated the model on an internal 4D-CT dataset and showed a high similarity between the predicted and ground truth images. Applicant applied the internally trained model to an external 4D-CT case in POPI where Applicant achieved sub-millimeter accuracy on average with the motion predicted from the disclosed model. Applicant also augmented the Learn2Reg dataset with RMSim to train a VoxelMorph model. The validation of registrations on the POPI dataset showed improvement over the vanilla VoxelMorph (without augmentation). Applicant also augmented the LUNA dataset to demonstrate that the model trained on the internal dataset can be effectively applied to an external radiology dataset. Although, for demonstration purposes, the model was used to predict the motion in one breathing cycle, the model can be fine-tuned to predict multiple cycles in one shot. In various embodiments, this may be accomplished by making the model bi-directional and using cross-attention to improve temporal dynamics in a long sequence. In various embodiments, the model may be extended to 4D-MR and 4D-CBCT datasets.
Example 2
[0092] To validate the accuracy/performance of deformable image registration solutions when commissioning them for clinical use, the American Association of Physicists in Medicine (AAPM) TG-132 recommended independent quality checks using digital phantoms. Commercial software was developed to create synthetic deformation vector fields (DVFs) that serve as ground truth for this purpose, but these monotonic transformations with a limited degree of freedom cannot represent realistic respiratory motion. On the other hand, biomechanical modeling addresses the issue but requires delineation of the supported organs and is time-consuming. Embodiments of the disclosed approach use a novel deep learning approach that only requires a 3D CT and a 1D breathing trace at inference time to predict a sequence of DVFs representing realistic respiration motion. The deep learning model learned the motion representation from 4D-CT and its corresponding breathing trace.
[0093] FIG. 9 is a schematic image for embodiments of the disclosed deep learning framework. In this example, the network consisted of Seq2Seq and VoxelMorph modules. The convolutional long short-term memory (ConvLSTM) was mainly used for the Seq2Seq module to learn spatiotemporal features across different phases in a 4D-CT and its 1D respiratory motion trace. Applicant used the superior-inferior displacements of the apex of the diaphragm across the phases as the breathing trace. To automatically detect the apex, Applicant first registered the end-of-inhalation phase (phase 10 in the study) to all other phases using large deformation diffeomorphic metric mapping (LDDMM) to obtain a series of DVFs in the reference phase (phase 10) CT. Then the apex of the diaphragm is selected at the location where it has the largest z-axis displacement across the DVFs. The output of the Seq2Seq module is predicted images at the later phases, which are then sent to a VoxelMorph network along with the initial phase image to generate DVFs. Applicant trained the proposed network end-to-end. In each epoch, the loss from VoxelMorph and the loss from Seq2Seq are back-propagated alternately. The loss functions are defined as follows:
$\mathrm{Loss}_{\mathrm{VoxelMorph}} = \mathrm{MSE}\big(\mathrm{DVF}_i(Y_0), Y_i\big) + \alpha \lVert \nabla \mathrm{DVF}_i \rVert$ (Eq. 3)

$\mathrm{Loss}_{\mathrm{Seq2Seq}} = \mathrm{MSE}\big(\hat{Y}_i, Y_i\big) + \beta\, \mathrm{MSE}\big(\mathrm{DVF}_i(Y_0), Y_i\big)$ (Eq. 4)

where $Y_0$ is the initial phase (phase 10) image, $Y_i$ is the ground truth image at phase $i$, $\hat{Y}_i$ is the image predicted by the Seq2Seq module for phase $i$, $\mathrm{DVF}_i$ is the DVF predicted by VoxelMorph for phase $i$ (with $\mathrm{DVF}_i(Y_0)$ denoting the initial image warped by that DVF), and $\alpha$ and $\beta$ are weighting factors.
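As a non-limiting sketch only, alternating back-propagation of Eq. 3 and Eq. 4 within a training epoch could be organized as follows. Here seq2seq, voxelmorph, and warp are assumed module/function handles, and the weights alpha and beta are illustrative values not taken from the study.

```python
# Non-limiting sketch of alternating back-propagation of Eq. 3 and Eq. 4 within
# an epoch. `seq2seq`, `voxelmorph`, and `warp` are assumed handles; `alpha` and
# `beta` are illustrative weights not taken from the study.
import torch
import torch.nn.functional as F

def grad_penalty(dvf):
    # Finite-difference stand-in for the ||grad DVF|| term of Eq. 3
    return sum((dvf.narrow(dim, 1, dvf.size(dim) - 1)
                - dvf.narrow(dim, 0, dvf.size(dim) - 1)).abs().mean()
               for dim in (2, 3, 4))

def train_epoch(seq2seq, voxelmorph, warp, loader, opt_v, opt_s,
                alpha=0.01, beta=1.0):
    for y0, later_phases, trace in loader:
        # Update VoxelMorph with Eq. 3 (Seq2Seq predictions detached)
        pred = [p.detach() for p in seq2seq(y0, trace)]
        dvfs = voxelmorph(y0, pred)
        loss_v = sum(F.mse_loss(warp(y0, d), y) + alpha * grad_penalty(d)
                     for d, y in zip(dvfs, later_phases))
        opt_v.zero_grad()
        loss_v.backward()
        opt_v.step()

        # Update Seq2Seq with Eq. 4
        pred = seq2seq(y0, trace)
        dvfs = voxelmorph(y0, pred)
        loss_s = sum(F.mse_loss(p, y) + beta * F.mse_loss(warp(y0, d), y)
                     for p, y, d in zip(pred, later_phases, dvfs))
        opt_s.zero_grad()
        loss_s.backward()
        opt_s.step()
```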
[0094] In FIG. 9, 3D CT images at each phase are depicted as 904, 3D convolution layers are depicted as 908, 3D ConvLSTM is depicted as 912, 3D maximum pooling is depicted as 916, 3D predicted images at future phases are depicted as 920, 3D warped image (phase 10 to each phase) is depicted as 924, predicted DVF is depicted as 928, and 3D upsampling is depicted as 932.
[0095] FIG. 10 provides two examples of motion prediction from the disclosed deep learning framework. The images in the first two rows are overlaid images between an initial phase (phase 10) and the 9 later phases (phases 9-1), and the images in the last two rows are overlaid images between phase 10 and the warped phase 10 images at the later phases using the predicted DVFs. The motion of the diaphragm was well predicted by the DVFs; thus, the mismatch was diminished. Each patient has a different respiratory pattern, and the proposed model reflects the patient-specific respiratory motion well by incorporating the patient-specific breathing trace in the proposed framework.
[0096] Table 1 provides the results for Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM). The PSNR and SSIM were variable based on patient respiratory motion. As the respiratory phase moved from phase 10 (exhale) toward phase 5 or 4 (inhale), the quantitative values of PSNR and SSIM decreased due to respiratory motion, and they recovered as the phase moved from phase 5 or 4 back toward phase 1 (exhale). The proposed framework can predict patient-specific respiratory motion; thus, the differences in PSNR and SSIM between inhale and exhale were diminished.
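Metrics of the kind reported in Table 1 below can be computed, for example, with scikit-image as sketched here; the data_range value is an assumed CT intensity range used only for illustration.

```python
# Example metric computation with scikit-image; `data_range` is an assumed CT
# intensity range used only for illustration.
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def psnr_ssim(predicted_volume, ground_truth_volume, data_range=2000.0):
    psnr = peak_signal_noise_ratio(ground_truth_volume, predicted_volume,
                                   data_range=data_range)
    ssim = structural_similarity(ground_truth_volume, predicted_volume,
                                 data_range=data_range)
    return psnr, ssim
```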
Table 1: The average Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) over the 40 testing patients.
EQUIVALENTS
[0097] The present technology is not to be limited in terms of the particular embodiments described in this application, which are intended as single illustrations of individual aspects of the present technology. Many modifications and variations of this present technology can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the present technology, in addition to those enumerated herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the present technology. It is to be understood that this present technology is not limited to particular methods, reagents, compounds, compositions or biological systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
[0098] In addition, where features or aspects of the disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.
[0099] As will be understood by one skilled in the art, for any and all purposes, particularly in terms of providing a written description, all ranges disclosed herein also encompass any and all possible subranges and combinations of subranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art, all language such as “up to,” “at least,” “greater than,” “less than,” and the like, include the number recited and refer to ranges which can be subsequently broken down into subranges as discussed above. Finally, as will be understood by one skilled in the art, a range includes each individual member. Thus, for example, a group having 1-3 cells refers to groups having 1, 2, or 3 cells. Similarly, a group having 1-5 cells refers to groups having 1, 2, 3, 4, or 5 cells, and so forth.
[0100] All patents, patent applications, provisional applications, and publications referred to or cited herein are incorporated by reference in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.

Claims
1. A method of patient-specific simulation of respiratory motion, the method comprising: receiving multi-phase imaging data of an anatomical region of a patient, the anatomical region comprising a diaphragm of the patient, the multi-phase imaging data comprising imaging data for a plurality of phases of a breathing cycle of the patient, the plurality of phases comprising a first phase of the breathing cycle and subsequent phases following the first phase of the breathing cycle, the multi-phase imaging data comprising (i) first-phase imaging data corresponding to the first phase, and (ii) subsequent-phase imaging data corresponding to the subsequent phases; detecting, via a beacon or an external monitoring device, a 1D breathing trace corresponding to the breathing cycle of the patient; generating warped images corresponding to the breathing cycle of the patient, wherein generating the warped images comprises feeding the first-phase imaging data and the 1D breathing trace to a deep-learning model comprising one or more artificial neural networks (ANNs), the deep-learning model comprising (i) a first module configured to generate, based at least on the first-phase imaging data and the 1D breathing trace, predicted images for the subsequent phases of the breathing cycle, and (ii) a second module configured to generate the warped images based at least in part on the predicted images from the first module.
2. The method of claim 1, wherein the 1D breathing trace is fed to the second module in addition to the predicted images to generate the warped images.
3. The method of claim 1, further comprising training the deep-learning model using one or more training datasets based on four-dimensional computed tomography (4D-CT) imaging data.
4. The method of claim 3, wherein the 4D-CT imaging data corresponds to a cohort of reference subjects, and wherein training the deep-learning model comprises generating, using the 4D-CT imaging data, a training 1D respiration surrogate for each reference subject in the cohort of subjects.
5. The method of claim 4, wherein each training 1D respiration surrogate is based on diaphragm displacements across a plurality of training phases for each reference subject in the cohort of subjects.
6. The method of claim 5, wherein the displacements are displacements of an apex of the diaphragm of each corresponding reference subject in the cohort of subjects.
7. The method of claim 4, wherein generating each training 1D respiration surrogate comprises detecting an apex of the diaphragm in a breathing cycle of the corresponding reference subject in the cohort of subjects.
8. The method of claim 7, wherein generating each training 1D respiration surrogate further comprises (i) registering an end-of-inhalation phase to all other phases based on diffeomorphic metric mapping to obtain a series of DVFs in the end-of-inhalation phase, and (ii) selecting the apex at a location where a z-axis displacement across the DVFs is largest.
9. The method of claim 8, wherein registering the end-of-inhalation phase comprises using large deformation diffeomorphic metric mapping (LDDMM).
10. The method of claim 1, wherein the first-phase imaging data comprises three-dimensional computed tomography (3D-CT) scan data.
11. The method of claim 1, wherein the one or more ANNs comprise one or more recurrent neural networks (RNNs).
12. The method of claim 1, wherein a first subset of the plurality of phases corresponds to inhalation and a second subset of the plurality of phases corresponds to exhalation, and wherein the first-phase imaging data corresponds to an inhale-phase image of the breathing cycle in the first subset.
13. The method of claim 1, wherein the multi-phase imaging data are computed tomography (CT) scan data.
14. The method of claim 1, wherein the first module is a Seq2Seq-based module.
15. The method of claim 1, wherein the first module comprises a stacked convolutional long short-term memory (ConvLSTM) recurrent neural network.
16. The method of claim 1, wherein the second module is a VoxelMorph-based module.
17. The method of claim 1, wherein each training epoch of the deep-learning model back-propagates losses from the first and second modules alternately.
18. A method comprising training a deep-learning model, wherein training the deep-learning model comprises: receiving 4D-CT imaging data of an anatomical region of each reference subject in a cohort of subjects, the anatomical region comprising a diaphragm of each reference subject, the 4D-CT imaging data comprising imaging data for phases of a reference breathing cycle of each reference subject; generating, for each reference subject in the cohort of subjects, a reference 1D respiration surrogate corresponding to the reference breathing cycle of each reference subject in the cohort of subjects; generating a training dataset comprising the reference 1D respiration surrogate for each reference subject in the cohort of subjects; and using the training dataset to train a deep-learning model comprising one or more ANNs, the deep-learning model comprising (i) a first module configured to generate, based at least on initial-phase imaging data and a detected patient 1D breathing trace, predicted images for subsequent-phase imaging data of a patient breathing cycle, and (ii) a second module configured to generate warped images based at least in part on the predicted images from the first module, wherein the initial-phase imaging data corresponds to an initial phase of the patient breathing cycle, and the subsequent-phase imaging data corresponds to phases following the initial phase in the patient breathing cycle.
19. The method of claim 18, further comprising simulating respiratory motion of a patient based on the trained deep-learning model, wherein simulating respiratory motion of the patient comprises: receiving multi-phase imaging data of the anatomical region of the patient, the multi-phase imaging data comprising imaging data for a plurality of phases of a patient breathing cycle, the plurality of phases comprising a first phase of the breathing cycle and subsequent phases following the first phase of the breathing cycle, the multi-phase imaging data comprising (i) first-phase imaging data corresponding to the first phase, and (ii) subsequent-phase imaging data corresponding to the subsequent phases; detecting, via a beacon or an external monitoring device, a 1D breathing trace corresponding to the patient breathing cycle; and generating warped images corresponding to the breathing cycle of the patient, wherein generating the warped images comprises feeding the first-phase imaging data and the 1D breathing trace to the first module to obtain predicted images for the subsequent phases of the breathing cycle, and feeding the 1D breathing trace and the predicted images from the first module to generate the warped images.
20. A computing system comprising one or more processors configured to implement any of the above methods, or a non-transitory computer-readable storage medium with instructions configured to cause one or more processors of a computing system to implement any of the above methods.
PCT/US2023/069510 2022-07-01 2023-06-30 Controlled respiratory motion simulation for patient-specific augmentation WO2024006996A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263357908P 2022-07-01 2022-07-01
US63/357,908 2022-07-01

Publications (1)

Publication Number Publication Date
WO2024006996A1 true WO2024006996A1 (en) 2024-01-04

Family

ID=89381566

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/069510 WO2024006996A1 (en) 2022-07-01 2023-06-30 Controlled respiratory motion simulation for patient-specific augmentation

Country Status (1)

Country Link
WO (1) WO2024006996A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010092366A1 (en) * 2009-02-12 2010-08-19 Vision Rt Limited Patient monitor and method
US20190088367A1 (en) * 2012-06-18 2019-03-21 Breathresearch Inc. Method and apparatus for training and evaluating artificial neural networks used to determine lung pathology
US20210383538A1 (en) * 2018-07-30 2021-12-09 Memorial Sloan Kettering Cancer Center Multi-modal, multi-resolution deep learning neural networks for segmentation, outcomes prediction and longitudinal response monitoring to immunotherapy and radiotherapy

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Nabavi, Shahabedin; Abdoos, Monireh; Moghaddam, Mohsen; Mohammadi, Mohammad: "Respiratory motion prediction using deep convolutional long short-term memory network", Journal of Medical Signals & Sensors, vol. 10, no. 2, 1 April 2020 (2020-04-01), pages 69-75, XP093125151, ISSN: 2228-7477, DOI: 10.4103/jmss.JMSS_38_19 *
San José Estépar, Raúl: "Artificial intelligence in functional imaging of the lung", British Journal of Radiology, British Institute of Radiology, London, GB, vol. 95, no. 1132, 1 April 2022 (2022-04-01), XP093125149, ISSN: 0007-1285, DOI: 10.1259/bjr.20210527 *

Similar Documents

Publication Publication Date Title
US11547874B2 (en) Machine learning approach to real-time patient motion monitoring
US11491348B2 (en) Real-time patient motion monitoring using a magnetic resonance linear accelerator (MRLINAC)
JP7245364B2 (en) sCT Imaging Using CycleGAN with Deformable Layers
CN111356930B (en) Method and system for locating anatomical landmarks of a predefined anatomical structure
JP7039153B2 (en) Image enhancement using a hostile generation network
US11342066B2 (en) Real-time motion monitoring using deep neural network
RU2698997C1 (en) Neural network for generating synthetic medical images
US10169871B2 (en) Systems and methods for segmentation of intra-patient medical images
JP6782051B2 (en) Atlas-based automatic segmentation enhanced by online learning
US11715212B2 (en) Heatmap and atlas
CN112154483A (en) Method and system for synthesizing real-time image by using optical body surface motion signal
CN111161371A (en) Imaging system and method
WO2024006996A1 (en) Controlled respiratory motion simulation for patient-specific augmentation
Lee et al. RMSim: controlled respiratory motion simulation on static patient scans
CN113327224A (en) System and method for automatic field of view (FOV) bounding
WO2022054541A1 (en) Image processing device, method, and program

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23832645

Country of ref document: EP

Kind code of ref document: A1