CN114072845A - SCT image generation using cycleGAN with deformable layers - Google Patents


Info

Publication number: CN114072845A
Application number: CN201980098267.XA (CN201980098267A)
Authority: CN (China)
Prior art keywords: image, CBCT, model, layers, images
Other languages: Chinese (zh)
Inventor: 徐峤峰
Current Assignee: Elekta Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Elekta Ltd
Application filed by Elekta Ltd
Publication of CN114072845A
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 5/00 Image enhancement or restoration
                    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
                    • G06T 5/70 Denoising; Smoothing
                • G06T 11/00 2D [Two Dimensional] image generation
                    • G06T 11/003 Reconstruction from projections, e.g. tomography
                    • G06T 11/008 Specific post-processing after tomographic reconstruction, e.g. voxelisation, metal artifact correction
                • G06T 2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T 2207/10 Image acquisition modality
                        • G06T 2207/10072 Tomographic images
                            • G06T 2207/10081 Computed x-ray tomography [CT]
                    • G06T 2207/20 Special algorithmic details
                        • G06T 2207/20081 Training; Learning
                        • G06T 2207/20084 Artificial neural networks [ANN]
                    • G06T 2207/30 Subject of image; Context of image processing
                        • G06T 2207/30004 Biomedical image processing
                • G06T 2211/00 Image generation
                    • G06T 2211/40 Computed tomography
                        • G06T 2211/424 Iterative
            • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 10/00 Arrangements for image or video recognition or understanding
                    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
                        • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
                        • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                            • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
                            • G06V 10/776 Validation; Performance evaluation
                        • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Analysis (AREA)
  • Radiation-Therapy Devices (AREA)

Abstract

Techniques are provided for generating a synthetic computed tomography (sCT) image from a Cone Beam Computed Tomography (CBCT) image. The techniques include: receiving a CBCT image of a subject; generating an sCT image corresponding to the CBCT image using a generative model trained in a generative adversarial network (GAN) based on one or more deformable offset layers to process the CBCT image as input and provide the sCT image as output; and generating a display of the sCT image for medical analysis of the subject.

Description

SCT image generation using cycleGAN with deformable layers
Priority claim
This application claims priority to U.S. provisional application No. 62/858,156, filed 6/2019, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure generally relate to Cone Beam Computed Tomography (CBCT) imaging, computed tomography imaging, and artificial intelligence processing techniques. In particular, the present disclosure relates to the generation and use of data models in a generative adversarial network (GAN) suitable for use with CBCT and computed tomography images and system operations.
Background
X-ray Cone Beam Computed Tomography (CBCT) imaging has been used in radiotherapy for patient setup and adaptive re-planning. In some cases, CBCT imaging has also been used for diagnostic purposes, such as dental imaging and implant planning. In addition, X-ray CBCT imaging has been used in many imaging-related applications, such as micro-computed tomography. However, as observed by medical physicists, doctors, and researchers, the image quality of CBCT images can be quite low. In general, CBCT images may contain different types of artifacts, including various types of noise or visualized structures in the reconstructed data that are not present in the real object under study.
Artifacts and noise in CBCT images can corrupt adaptive therapy re-planning, affect diagnosis, or make many other image processing steps difficult or even impossible (e.g., image segmentation). Since each artifact may be caused by one or more different factors, different methods may be used to suppress different artifacts. For radiotherapy and other clinical applications, in addition to the CBCT images (which may be acquired daily), there are typically one or more other Computed Tomography (CT) image datasets (e.g., planning CT images) available. In general, CT images have much higher image quality, with more accurate contrast or other information and fewer artifacts. Although researchers have conducted many studies and developed several related methods to reduce artifacts in CBCT images, there is currently no simple and effective method that can suppress all or most of the common artifacts. Therefore, there is a strong need for a novel, effective, and simple method to suppress and eliminate artifacts and noise in CBCT images.
Disclosure of Invention
The present disclosure includes processes for developing, training, and utilizing Artificial Intelligence (AI) processing techniques to generate simulated or synthetic CT (sCT) images that correspond to or represent input CBCT images. Such AI processing techniques may include generative adversarial networks (GANs), cycle-consistent generative adversarial networks (CycleGANs), Convolutional Neural Networks (CNNs), Deep Convolutional Neural Networks (DCNNs), deformable convolutional networks, deformable offset layers, spatial transformer networks, and other forms of Machine Learning (ML) implementations. The present disclosure specifically includes a number of illustrative examples of operating a discriminator model and a generator model within a GAN or CycleGAN to learn a model of paired CBCT images and real CT images, in order to enhance and produce artifact-free or substantially artifact-free CBCT images, or CBCT images with a substantially reduced amount of artifacts. It will be apparent that the presently described use and analysis of imaging data (e.g., CBCT images) as part of a GAN or CycleGAN (as well as other disclosed AI and ML techniques) can be incorporated into other medical workflows for a wide variety of diagnostic, assessment, interpretation, or treatment settings.
In some embodiments, a method, system, and transitory or non-transitory computer readable medium are provided for generating a synthetic computed tomography (sCT) image from a Cone Beam Computed Tomography (CBCT) image, the method comprising: receiving a CBCT image of a subject; generating an sCT image corresponding to the CBCT image using a generative model trained in a generative adversarial network (GAN) based on one or more deformable offset layers to process the CBCT image as input and provide the sCT image as output; and generating a display of the sCT image for medical analysis of the subject.
In some implementations, the generative adversarial network is configured to: train the generative model using a discriminative model; use adversarial training between the discriminative model and the generative model to establish the values applied by the generative model and the discriminative model; and the generative model and the discriminative model comprise respective convolutional neural networks.
In some implementations, the adversarial training includes: training the generative model to generate a first sCT image from a given CBCT image by applying a first set of the one or more deformable offset layers to the given CBCT image; training the generative model to generate a second sCT image from the given CBCT image without applying the first set of the one or more deformable offset layers to the given CBCT image; and training a discriminative model to classify the first sCT image as a synthetic Computed Tomography (CT) image or a real Computed Tomography (CT) image, wherein an output of the generative model is used to train the discriminative model, and an output of the discriminative model is used to train the generative model.
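For illustration only, the following sketch (in Python/PyTorch) shows how a single generator with a shared convolutional portion can expose two input paths, one that applies a deformable offset layer (such as the one sketched later in this summary) and one that does not, so that the same weights produce both the first (deformed-path) and second (plain-path) sCT images described above. The module names, layer sizes, and the simplification of the input interfaces to a single optional offset layer in front of a shared trunk are assumptions made here for illustration, not details taken from this disclosure.

    # Hedged sketch of a two-path generator (PyTorch); sizes and names are illustrative.
    import torch
    import torch.nn as nn

    class SharedTrunk(nn.Module):
        """Shared convolutional generator portion used by both input paths."""
        def __init__(self, ch=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, 1, 3, padding=1),
            )

        def forward(self, x):
            return self.net(x)

    class TwoPathGenerator(nn.Module):
        """First path: deformable offset layer followed by the shared convolutional layers.
        Second path: the same convolutional layers without the offset layer."""
        def __init__(self, deform_layer):
            super().__init__()
            self.deform = deform_layer
            self.trunk = SharedTrunk()

        def forward(self, cbct):
            sct_deformed = self.trunk(self.deform(cbct))  # first sCT image (offsets applied)
            sct_plain = self.trunk(cbct)                  # second sCT image (no offset layer)
            return sct_deformed, sct_plain

In this sketch, only sct_deformed would be passed to the discriminative model, while sct_plain would feed the pixel-based and cycle-consistency loss terms described below.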
In some implementations, the GAN is trained using a cycle generative adversarial network (CycleGAN) that includes the generative model and a discriminative model, wherein the generative model is a first generative model and the discriminative model is a first discriminative model, and wherein the CycleGAN further includes: a second generative model trained to: process a given CT image as input; provide as output a first synthetic CBCT (sCBCT) image by applying a second set of the one or more deformable offset layers to the given CT image; and provide as output a second synthetic CBCT (sCBCT) image without applying the second set of the one or more deformable offset layers to the given CT image; and a second discriminative model trained to classify the first synthetic sCBCT image as a synthetic CBCT image or a real CBCT image.
In some implementations, the CycleGAN includes a first portion for training the first generative model, wherein the first generative model includes a first input interface, a second input interface, and a first shared generator portion, and the second generative model includes a third input interface, a fourth input interface, and a second shared generator portion, the first portion being trained to: obtain a training CBCT image paired with a real CT image; send the training CBCT image to an input of the first generative model via a first path and a second path to output a first sCT image and a second sCT image, respectively, the first path including the first input interface comprising a first set of the one or more deformable offset layers and a first set of one or more convolutional layers, the second path including the second input interface comprising the first set of one or more convolutional layers without the first set of deformable offset layers; receive the first sCT image at an input of the first discriminative model to classify the first sCT image as a synthetic CT image or a real CT image; and receive the first and second sCT images at an input of the second generative model via a third path and a fourth path to generate a first cyclic CBCT image and a second cyclic CBCT image, respectively, for calculating a cycle-consistency loss, the third path including the third input interface comprising a second set of the one or more deformable offset layers and a second set of one or more convolutional layers, the fourth path including the fourth input interface comprising the second set of one or more convolutional layers without the second set of deformable offset layers.
In some implementations, the CycleGAN includes a second portion trained to: send the real CT image to an input of the second generative model via a fifth path and a sixth path to output a first synthetic CBCT image and a second synthetic CBCT image, respectively, the fifth path including the third input interface comprising the second set of the one or more deformable offset layers and the second set of one or more convolutional layers, the sixth path including the fourth input interface comprising the second set of one or more convolutional layers without the second set of deformable offset layers; receive the first synthetic CBCT image at an input of the second discriminative model to classify the first synthetic CBCT image as a synthetic CBCT image or a real CBCT image; and receive the first and second synthetic CBCT images at an input of the first generative model via a seventh path and an eighth path to generate first and second cyclic CT images for calculating the cycle-consistency loss, the seventh path including the first input interface comprising the first set of the one or more deformable offset layers and the first set of one or more convolutional layers, the eighth path including the second input interface comprising the first set of one or more convolutional layers without the first set of deformable offset layers.
In some implementations, the cycle-consistency loss is generated based on a comparison of the first and second cyclic CBCT images to the training CBCT image and a comparison of the first and second cyclic CT images to the real CT image; the first generative model is trained using the second sCT image to minimize or reduce a first pixel-based loss term, which represents an expectation of the differences between a plurality of synthetic CT images and the respectively paired real CT images; and the second generative model is trained using the second synthetic CBCT (sCBCT) image to minimize or reduce a second pixel-based loss term, which represents an expectation of the differences between a plurality of synthetic CBCT images and the respectively paired real CBCT images.
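As a non-authoritative sketch of this loss bookkeeping, the helper below reuses the two-path generators from the sketch above (each returning a deformed-path output and a plain-path output). The use of L1 distances for all terms and the equal weighting of the four cycle comparisons are assumptions made for illustration.

    # Hedged sketch: cycle-consistency losses (both paths) and pixel-based L1 losses
    # (plain path only) for one CBCT/CT training pair.
    import torch.nn.functional as F

    def cycle_and_pixel_losses(cbct, ct, G_cbct2ct, G_ct2cbct):
        # First portion: CBCT -> (first sCT, second sCT) -> cyclic CBCT images.
        sct_def, sct_plain = G_cbct2ct(cbct)
        cyc_cbct_def = G_ct2cbct(sct_def)[0]      # third path (with offset layers)
        cyc_cbct_plain = G_ct2cbct(sct_plain)[1]  # fourth path (without offset layers)

        # Second portion: CT -> (first sCBCT, second sCBCT) -> cyclic CT images.
        scbct_def, scbct_plain = G_ct2cbct(ct)
        cyc_ct_def = G_cbct2ct(scbct_def)[0]      # seventh path (with offset layers)
        cyc_ct_plain = G_cbct2ct(scbct_plain)[1]  # eighth path (without offset layers)

        cycle_loss = (F.l1_loss(cyc_cbct_def, cbct) + F.l1_loss(cyc_cbct_plain, cbct)
                      + F.l1_loss(cyc_ct_def, ct) + F.l1_loss(cyc_ct_plain, ct))

        # Pixel-based terms compare only the plain-path outputs with the paired images.
        pixel_loss = F.l1_loss(sct_plain, ct) + F.l1_loss(scbct_plain, cbct)
        return cycle_loss, pixel_loss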
In some implementations, the CycleGAN is trained to apply a metric to the first and second pixel-based loss terms, the metric being generated from a map of the same size as the paired CBCT and real CT images, such that each pixel value in the map represents a degree of similarity between a given CBCT image and the real CT image paired with it; and the CycleGAN is trained to apply a threshold to the metric such that, when the degree of similarity exceeds the threshold, the metric is applied to the first and second pixel-based loss terms, and otherwise a zero value is applied to the first and second pixel-based loss terms.
In some implementations, the CycleGAN is trained to apply one of a plurality of metrics to the first and second pixel-based loss terms, the metrics generated using low-pass filtering and downsampling of the paired CBCT and CT images at different image resolutions or view levels.
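A minimal sketch of such a similarity-weighted, thresholded pixel term and its multi-resolution variant follows. The exponential similarity measure, the threshold value of 0.5, and the use of average pooling as the low-pass filtering and downsampling step are assumptions made for illustration; this disclosure does not specify those particular choices.

    import torch
    import torch.nn.functional as F

    def similarity_weighted_l1(pred, target_ct, paired_cbct, threshold=0.5):
        # Same-size similarity map in [0, 1]; high where the paired CBCT and CT agree.
        sim = torch.exp(-torch.abs(paired_cbct - target_ct))
        weight = torch.where(sim > threshold, sim, torch.zeros_like(sim))
        return (weight * torch.abs(pred - target_ct)).mean()

    def multiscale_similarity_weighted_l1(pred, target_ct, paired_cbct, levels=3):
        # Apply the weighted term at several resolutions, low-pass filtering and
        # downsampling the images between levels.
        total = 0.0
        for _ in range(levels):
            total = total + similarity_weighted_l1(pred, target_ct, paired_cbct)
            pred = F.avg_pool2d(pred, 2)
            target_ct = F.avg_pool2d(target_ct, 2)
            paired_cbct = F.avg_pool2d(paired_cbct, 2)
        return total / levels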
In some implementations, based on the adversarial training, the one or more deformable offset layers are trained to vary the sampling, introduce coordinate offsets, and resample the images using interpolation, so as to store or absorb the deformed structural information between the paired CBCT and CT images.
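The following is a hedged sketch of one way such a deformable offset layer could be realized: a small convolutional head predicts a per-pixel coordinate offset field, and the input is resampled at the shifted coordinates with bilinear interpolation. The channel counts, the zero initialization of the offsets, and the use of torch.nn.functional.grid_sample are illustrative choices, not details taken from this disclosure.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DeformableOffsetLayer(nn.Module):
        """Predicts per-pixel (dx, dy) coordinate offsets and resamples the input
        at the shifted coordinates with bilinear interpolation."""
        def __init__(self, in_ch=1, hidden=16):
            super().__init__()
            self.offset_head = nn.Sequential(
                nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(hidden, 2, 3, padding=1),  # 2 channels: dx and dy per pixel
            )
            # Start near zero offsets so training begins close to identity resampling.
            nn.init.zeros_(self.offset_head[-1].weight)
            nn.init.zeros_(self.offset_head[-1].bias)

        def forward(self, x):
            b, _, h, w = x.shape
            offsets = self.offset_head(x)  # (b, 2, h, w), offsets in pixels
            # Base sampling grid in the normalized [-1, 1] coordinates used by grid_sample.
            ys, xs = torch.meshgrid(
                torch.linspace(-1, 1, h, device=x.device),
                torch.linspace(-1, 1, w, device=x.device),
                indexing="ij",
            )
            base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
            shift = torch.stack((offsets[:, 0] * 2 / max(w - 1, 1),
                                 offsets[:, 1] * 2 / max(h - 1, 1)), dim=-1)
            return F.grid_sample(x, base + shift, mode="bilinear", align_corners=True)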
In some embodiments, a method, system, and transitory or non-transitory computer readable medium are provided for training a model to generate a synthetic computed tomography (sCT) image from a Cone Beam Computed Tomography (CBCT) image, including: receiving a CBCT image of a subject as an input of a generative model; and training the generative model in a generative adversarial network (GAN), via a first path and a second path, to process the CBCT image and provide first and second synthetic computed tomography (sCT) images corresponding to the CBCT image as outputs of the generative model, the first path including a first set of one or more deformable offset layers and a first set of one or more convolutional layers, the second path including the first set of one or more convolutional layers without the first set of deformable offset layers.
In some implementations, the GAN is trained using a cycle generative adversarial network (CycleGAN) that includes the generative model and a discriminative model, wherein the generative model is a first generative model and the discriminative model is a first discriminative model, further comprising: training a second generative model to process the generated first and second sCT images as inputs and provide first and second cyclic CBCT images as outputs via a third path and a fourth path, respectively, the third path including a second set of the one or more deformable offset layers and a second set of one or more convolutional layers, the fourth path including the second set of one or more convolutional layers without the second set of deformable offset layers; and training a second discriminative model to classify the first cyclic CBCT image as a synthetic CBCT image or a real CBCT image.
In some implementations, the CycleGAN includes a first portion and a second portion for training the first generative model, further including: obtaining a training CBCT image paired with a real CT image; sending the training CBCT image to an input of the first generative model via the first path and the second path to output a first synthetic CT image and a second synthetic CT image; receiving the first synthetic CT image at an input of the first discriminative model; classifying, with the first discriminative model, the first synthetic CT image as a synthetic CT image or a real CT image; receiving the first and second synthetic CT images at an input of the second generative model via the third path and the fourth path to generate first and second cyclic CBCT images for calculating a cycle-consistency loss; sending the real CT image to an input of the second generative model via a fifth path and a sixth path to output a first synthetic training CBCT image and a second synthetic training CBCT image, the fifth path including the second set of the one or more deformable offset layers and the second set of one or more convolutional layers, the sixth path including the second set of one or more convolutional layers without the second set of deformable offset layers; receiving the first synthetic training CBCT image at an input of the second discriminative model; classifying, with the second discriminative model, the first synthetic training CBCT image as a synthetic CBCT image or a real CBCT image; receiving the first and second synthetic CBCT images at an input of the first generative model via a seventh path and an eighth path to generate first and second cyclic CT images for calculating the cycle-consistency loss, the seventh path including the first set of the one or more deformable offset layers and the first set of one or more convolutional layers, the eighth path including the first set of one or more convolutional layers without the first set of deformable offset layers; training the first generative model using the second sCT image to minimize or reduce a first pixel-based loss term representing an expectation of the differences between a plurality of synthetic CT images and the respectively paired real CT images; and training the second generative model using the second synthetic CBCT (sCBCT) image to minimize or reduce a second pixel-based loss term representing an expectation of the differences between a plurality of synthetic CBCT images and the respectively paired real CBCT images.
The above summary is intended to provide an overview of the subject matter of the present patent application. It is not intended to provide an exclusive or exhaustive description of the inventive subject matter. Specific embodiments are included to provide further information regarding the present patent application.
Drawings
In the drawings, which are not necessarily drawn to scale, like reference numerals describe substantially similar components throughout the several views. Like reference numerals having different letter suffixes represent different instances of substantially similar components. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.
Fig. 1 illustrates an exemplary radiation therapy system suitable for performing a treatment plan generation process, according to some examples.
Fig. 2A and 2B illustrate an example image-guided radiation therapy apparatus according to some examples of the present disclosure.
Fig. 3A and 3B illustrate an exemplary convolutional neural network model adapted to generate and discriminate sCT images from received CBCT images, according to some examples of the present disclosure.
Fig. 4 illustrates an exemplary data flow for training and use of a generative adversarial network adapted to generate sCT images from received CBCT images, in accordance with some examples of the present disclosure.
Fig. 5 illustrates training of a generative adversarial network for generating sCT images from received CBCT images, according to some examples of the present disclosure.
Fig. 6A-6D illustrate training and use of a cycle-consistent generative adversarial network (CycleGAN) for generating sCT images from received CBCT images, according to some examples of the present disclosure.
Fig. 7 illustrates changes in the input images and anatomical region information used in connection with training and generating sCT images from received CBCT images, according to some examples of the present disclosure.
Fig. 8 illustrates a flowchart of exemplary operations for training a generative model adapted to output sCT images from received CBCT images, according to some examples of the present disclosure.
Detailed Description
The present disclosure includes various techniques for improving and enhancing CBCT imaging by generating sCT images (synthetic or simulated CT images representing received CBCT images), including in a manner that provides technical benefits over manual (e.g., human-directed, human-assisted, or human-guided) and conventional approaches for improving CBCT images. These technical benefits include: reduced computational processing time to generate enhanced CBCT images or sCT images, elimination of artifacts in CBCT images and enhanced CBCT images, and accompanying improvements in the processing, memory, and network resources used to generate and enhance CBCT images and sCT images. In addition to improvements in the data management, visualization, and control systems that manage data to support such images, these improved CBCT images or sCT images may be applicable to a wide variety of medical treatment and diagnostic settings, and to the information technology systems used in such settings. Accordingly, in addition to these technical benefits, the present techniques may also yield a number of significant medical treatment benefits (including improved accuracy of radiation therapy treatment, reduced exposure to unintended radiation, and the like).
As discussed further herein, the following use and deployment of a generative adversarial network (GAN), a form of supervised Artificial Intelligence (AI) machine learning, enables improvements in the accuracy and usefulness of CBCT images through the generation of sCT images via a learned model. In an example, the present techniques output a synthetic CT image that corresponds to an input CBCT image (e.g., a CBCT image of a human subject) and may include pixel values that exactly match or are comparable to real, true, or actual CT imaging data (such CT images may be referred to as real, true, or actual CT images throughout). The learned models discussed herein can also produce a synthetic CT (sCT) image of high image quality from an original CBCT image of low image quality. Such sCT images preserve the anatomy in the original CBCT image and can remove or substantially remove scatter, streak artifacts, and other noise artifacts to achieve high image quality with correct HU values. The sCT images can be generated on the fly in real time (e.g., the disclosed model can generate and enhance CBCT images as they are received).
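For illustration, inference with such a learned model could look like the sketch below, assuming a trained two-path generator as sketched earlier in this document. Keeping the non-deformed (plain-path) output at test time is an assumption consistent with the structure-preserving behavior described here, and the function and variable names are not taken from this disclosure.

    import torch

    def generate_sct(generator, cbct_slices):
        # Map each received CBCT slice to an sCT slice with the trained generator.
        generator.eval()
        outputs = []
        with torch.no_grad():
            for cbct in cbct_slices:            # each slice: tensor of shape (1, 1, H, W)
                _, sct_plain = generator(cbct)  # keep the non-deformed (plain-path) output
                outputs.append(sct_plain)
        return torch.cat(outputs, dim=0)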
In an example, the learned model is produced using a pair of deep convolutional neural networks operating in a GAN or CycleGAN: a generator (also referred to as a "generative model") that produces estimates of the probability distribution describing the training data; and a discriminator (also referred to as a "discriminative model") that classifies generator samples as belonging either to the generator or to the training data. The generator aims to imitate the data distribution of the training data as completely as possible, thereby confusing the discriminator to the greatest extent possible. As a result, a generator is produced that is trained (essentially "tuned") to maximize the results of regression in predictive modeling.
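A minimal sketch of one such adversarial update is shown below; this sketch treats the generator abstractly, producing a single sample per input. The binary cross-entropy formulation, optimizer handling, and names are illustrative assumptions, and the disclosure does not commit to a specific GAN loss here.

    import torch
    import torch.nn.functional as F

    def gan_update(generator, discriminator, g_opt, d_opt, cbct_batch, real_ct_batch):
        # Discriminator step: real CT images labeled 1, generator samples labeled 0.
        fake = generator(cbct_batch)
        d_real = discriminator(real_ct_batch)
        d_fake = discriminator(fake.detach())
        d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                  + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()

        # Generator step: try to have the discriminator label its samples as real.
        d_fake = discriminator(generator(cbct_batch))
        g_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
        g_opt.zero_grad()
        g_loss.backward()
        g_opt.step()
        return d_loss.item(), g_loss.item()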
In an example, the GAN is trained on a set of paired CBCT images and real CT images to train a model to generate an sCT image given a CBCT image. The CBCT image may be registered with the real CT image. The GAN two-network (generator-discriminator) architecture can be used to produce a trained generative model that generates sCT images corresponding to received CBCT images, which improves on previous implementations, including neural networks and previous approaches to supervised ML in CycleGANs. These and various other technical and functional benefits will become apparent from the following sections.
In particular, the disclosed techniques train the GAN by applying a deformable offset layer to the CBCT or CT image in one generator model training path, and not in a second generator model training path. The discriminators in the disclosed techniques are trained to operate only on the generator output to which the deformable offset layer has been applied. A cycle-consistency loss term is used to train the generator model based on the images generated in the two generator paths. In this way, any adversarial-loss-driven introduction of artificial or deformed anatomy is applied only to the deformed image output of the generator. This limits the impact of artificial or deformed anatomy caused by the adversarial loss when training the generator, thereby improving the sCT images produced by the generator. That is, the effect of potentially learning unwanted structural information and potentially creating hallucinated artificial structures, caused by the adversarial loss, is decoupled from the effect of preserving the original structure, caused by the cycle-consistency loss term. By minimizing the cycle-consistency loss term, the sCT image produced by the generator without the offset layer retains all of the real anatomical structures present in the original CBCT image. At the same time, because the adversarial loss acts only on images produced with the deformable offset layers, those offset layers absorb any unwanted shape deformations or phantom structures that may be produced.
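To make this decoupling concrete, the hedged sketch below combines the pieces from the earlier sketches (the two-path generators and the cycle_and_pixel_losses helper): the adversarial terms are computed only on the deformed-path outputs, while the cycle-consistency terms use both paths and the pixel-based terms use only the plain path. The least-squares form of the adversarial term and the lambda weights are assumptions made for illustration.

    import torch
    import torch.nn.functional as F

    def generator_loss(cbct, ct, G_cbct2ct, G_ct2cbct, D_ct, D_cbct,
                       lambda_cycle=10.0, lambda_pixel=1.0):
        sct_def, _ = G_cbct2ct(cbct)
        scbct_def, _ = G_ct2cbct(ct)

        # Adversarial terms: only the images produced via the deformable offset layers.
        p_ct, p_cbct = D_ct(sct_def), D_cbct(scbct_def)
        adv = (F.mse_loss(p_ct, torch.ones_like(p_ct))
               + F.mse_loss(p_cbct, torch.ones_like(p_cbct)))

        # Structure-preserving terms (see the cycle_and_pixel_losses sketch earlier).
        cycle_loss, pixel_loss = cycle_and_pixel_losses(cbct, ct, G_cbct2ct, G_ct2cbct)

        return adv + lambda_cycle * cycle_loss + lambda_pixel * pixel_loss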
The approaches discussed herein enable the characteristics of CBCT images and real CT images to be discovered, so that new sCT images can be generated in real time as CBCT images are received. These approaches use the statistical learning employed by the GAN to obtain a more detailed model of the connection between CBCT images and real CT images.
One approach has explored the use of CycleGANs to generate sCT images from CBCT images. Such an approach is discussed in co-pending, commonly assigned U.S. patent application No. 16/044,245 to Jiaofeng Xu et al., entitled "CONE-BEAM CT IMAGE ENHANCEMENT USING GENERATIVE ADVERSARIAL NETWORKS" (attorney docket No. 4186.042US1), filed July 24, 2018, which is hereby incorporated by reference in its entirety. That approach incorporates into the CycleGAN a weighted L1 loss term based on paired CBCT-CT data. In particular, the approach adds L1 loss terms between the sCT image and the real CT image and between the synthetic CBCT image and the real CBCT image, and compensates for imperfect matches using weighted pixel-based terms. The approach is suitable for generating sCT images when the CBCT and CT images are similar in shape distribution or have no significant differences. However, when there are large differences in shape distribution or other feature distributions between the CT images and the original CBCT images, the adversarial loss may introduce some artificial anatomy or some deformed anatomy into the sCT image, causing unexpected results in the generated sCT image. This is because the generator in that approach is trained by the adversarial loss to learn unintended shape deformations of certain regions.
In particular, the basic problem with the previous approach is that all of these loss terms are used to train the same generator. The same generator is thus trained to achieve multiple different objectives, and in some cases it may not achieve all of those objectives in an optimal manner. For example, a single generator is trained to convert the CBCT image appearance to a CT image in a manner that removes artifacts in the original CBCT image and converts to the correct CT numbers, while the same generator is also trained toward some degree of structural deformation. When the shape distribution or other feature distribution in the CT image domain differs substantially from that of the original CBCT image domain, such unwanted structural or other deformations may not be suppressed, resulting in unwanted structural or other artifacts appearing in the generated sCT image.
Conventional systems have not explored a way to improve the operation and accuracy of a CycleGAN that separates the effect of potentially learning unwanted structural information and potentially creating hallucinated artificial structures, caused by the adversarial loss, from the effect of preserving the original structure, caused by the cycle-consistency loss term. In particular, conventional systems have not explored an approach in which the generator is trained, in parallel or simultaneously, to generate sCT images via multiple paths, with the adversarial loss applied only to images produced in the one path to which the deformable offset layers have been applied.
Fig. 1 illustrates an exemplary radiation therapy system 100, the radiation therapy system 100 being adapted to perform radiation therapy planning processing operations using one or more of the methods discussed herein. These radiation therapy planning processing operations are performed to enable the radiation therapy system 100 to provide radiation therapy to a patient based on certain aspects of the captured medical imaging data and therapy dose calculations. Specifically, the following processing operations may be implemented: an image generation workflow 130 implemented by image processing logic 120 and a portion of an image generation training workflow 140. However, it should be understood that many variations and use cases of the following training model and image processing logic 120 may be provided, including data validation, visualization, and other medical evaluation and diagnosis settings. The radiation therapy system 100 can generate sCT images from the received CBCT images using GAN. The sCT image can represent an improved CBCT image with sharp edge appearance similar to a real CT image. Thus, the radiation therapy system 100 can generate sCT-type images in real time for medical analysis using captured lower quality CBCT images of a region of a subject.
The radiation therapy system 100 includes a radiation therapy processing computing system 110 that hosts image processing logic 120. The radiation therapy treatment computing system 110 may be connected to a network (not shown), and such network may be connected to the internet. For example, the network may connect the radiation therapy treatment computing system 110 with one or more medical information sources (e.g., a Radiology Information System (RIS), a medical record system (e.g., an Electronic Medical Record (EMR)/Electronic Health Record (EHR) system), an Oncology Information System (OIS)), one or more image data sources 150, an image acquisition device 170 (e.g., an imaging modality), a treatment device 180 (e.g., a radiation therapy device), and a treatment data source 160. As an example, the radiation therapy treatment computing system 110 may be configured to: the CBCT image of the subject is received by executing instructions or data from image processing logic 120 and the sCT image corresponding to the CBCT image is generated as part of the operation of generating an improved CBCT image to be used by treatment apparatus 180 and/or for output on apparatus 146.
The radiation therapy treatment computing system 110 may include processing circuitry 112, memory 114, storage 116, and other hardware and software operable features such as a user interface 142, a communication interface (not shown), and the like. The storage 116 may store transient or non-transient computer-executable instructions such as an operating system, a radiation therapy treatment plan (e.g., training CBCT images, real CT images, pairing information associating training images and real CT images, generated sCT images, modified or altered CBCT images, etc.), software programs (e.g., image processing software; image or anatomical visualization software; AI implementations and algorithms such as provided by DL models, ML models, and neural networks, etc.), and any other computer-executable instructions to be executed by the processing circuitry 112.
In an example, the processing circuitry 112 may include a processing device, e.g., one or more general-purpose processing devices such as a microprocessor, Central Processing Unit (CPU), Graphics Processing Unit (GPU), Accelerated Processing Unit (APU), etc. More particularly, the processing circuitry 112 may be a Complex Instruction Set Computing (CISC) microprocessor, Reduced Instruction Set Computing (RISC) microprocessor, Very Long Instruction Word (VLIW) microprocessor, a processor implementing other instruction sets, or processors implementing a combination of instruction sets. The processing circuitry 112 may also be implemented by one or more special-purpose processing devices such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), a System on a Chip (SoC), or the like. As will be understood by those skilled in the art, in some examples the processing circuitry 112 may be a special-purpose processor rather than a general-purpose processor. The processing circuitry 112 may include one or more known processing devices, such as microprocessors from the Pentium™, Core™, or Xeon™ families manufactured by Intel™, microprocessors from the Turion™, Athlon™, Sempron™, Opteron™, FX™, or Phenom™ families manufactured by AMD™, or any of the various processors manufactured by Sun Microsystems. The processing circuitry 112 may also include graphics processing units, such as GPU series manufactured by Nvidia™, the GMA or Iris™ series manufactured by Intel™, or the Radeon™ series manufactured by AMD™. The processing circuitry 112 may also include accelerated processing units such as the Xeon Phi™ series manufactured by Intel™. The disclosed embodiments are not limited to any type of processor that is otherwise configured to meet the computing needs of identifying, analyzing, maintaining, generating, and/or providing large amounts of data, or of manipulating such data to perform the methods disclosed herein. In addition, the term "processor" may include more than one physical (circuitry-based) or software-based processor, such as a multi-core design or multiple processors each having a multi-core design. The processing circuitry 112 may execute sequences of transitory or non-transitory computer program instructions stored in the memory 114 and accessed from the storage 116 to perform the various operations, processes, and methods that will be described in more detail below. It should be understood that any component in the system 100 may be implemented separately and operate as an independent device, and may be coupled to any other component in the system 100 to perform the techniques described in this disclosure.
The memory 114 may include Read Only Memory (ROM), phase change random access memory (PRAM), Static Random Access Memory (SRAM), flash memory, Random Access Memory (RAM), Dynamic Random Access Memory (DRAM) such as synchronous DRAM (sdram), Electrically Erasable Programmable Read Only Memory (EEPROM), static memory (e.g., flash memory, flash disk, static random access memory) and other types of random access memory, cache memory, registers, compact disk read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, other magnetic storage devices, or any other non-transitory medium that can be used to store information, including images, data, or transitory or non-transitory computer-executable instructions (e.g., stored in any format), that can be accessed by the processing circuitry 112 or any other type of computer device. For example, the computer program instructions may be accessed by the processing circuitry 112, read from a ROM or any other suitable memory location, and loaded into RAM for execution by the processing circuitry 112.
The storage device 116 may constitute a drive unit comprising a transitory or non-transitory machine-readable medium having stored thereon one or more transitory or non-transitory sets of instructions and data structures (e.g., software) implemented or utilized by any one or more of the methods or functions described herein (including, in various examples, the image processing logic 120 and the user interface 142). The instructions may also reside, completely or at least partially, within the memory 114 and/or within the processing circuitry 112 during execution thereof by the radiation therapy treatment computing system 110, with the memory 114 and the processing circuitry 112 also constituting transitory or non-transitory machine-readable media.
Memory 114 and storage 116 may constitute non-transitory computer-readable media. For example, the memory 114 and storage 116 may store or load transient or non-transient instructions for one or more software applications on a computer-readable medium. Software applications stored or loaded using memory 114 and storage 116 may include, for example, operating systems for general purpose computer systems and for software controlled devices. The radiation therapy treatment computing system 110 may also operate various software programs including software code for implementing the image processing logic 120 and the user interface 142. Further, the memory 114 and storage 116 may store or load an entire software application, a portion of a software application, or code or data associated with a software application that is capable of being executed by the processing circuitry 112. In another example, the memory 114 and storage 116 may store, load, and manipulate one or more radiation therapy treatment plans, imaging data, segmentation data, treatment visualizations, histograms or measurements, AI model data (e.g., weights and parameters), labeling and mapping data, and the like. It is contemplated that the software program may be stored not only on storage device 116 and memory 114, but also on removable computer media such as hard drives, computer diskettes, CD-ROMs, DVDs, blu-ray DVDs, USB flash drives, SD cards, memory sticks, or any other suitable medium; such software programs may also be transmitted or received over a network.
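As a small, hedged illustration of storing and reloading such AI model data, the sketch below persists and restores learned generator weights; the file name and the reuse of the TwoPathGenerator and DeformableOffsetLayer sketches from the summary above are assumptions, not details of this disclosure.

    import torch

    def save_and_restore_generator(trained_generator, path="sct_generator.pt"):
        # Persist the learned weights and parameters, then restore them for later use.
        torch.save(trained_generator.state_dict(), path)
        restored = TwoPathGenerator(DeformableOffsetLayer())
        restored.load_state_dict(torch.load(path, map_location="cpu"))
        restored.eval()
        return restored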
Although not depicted, the radiation therapy treatment computing system 110 may include a communication interface, a network interface card, and communication circuitry. Example communication interfaces can include, for example, network adapters, cable connectors, serial connectors, USB connectors, parallel connectors, high-speed data transmission adapters (e.g., such as fiber optic, USB 3.0, thunderbolt interfaces (thunderbolt), etc.), wireless network adapters (e.g., such as IEEE 802.11/Wi-Fi adapters), telecommunications adapters (e.g., to communicate with 3G, 4G/LTE, and 5G networks, etc.), and so forth. Such a communication interface may include one or more digital and/or analog communication devices that allow the machine to communicate with other machines and devices, such as remotely located components, via a network. The network may provide the functionality of a Local Area Network (LAN), a wireless network, a cloud computing environment (e.g., software as a service, platform as a service, infrastructure as a service, etc.), a client-server, a Wide Area Network (WAN), etc. For example, the network may be a LAN or WAN that may include other systems (including additional image processing computing systems or image-based components associated with medical imaging or radiation therapy operations).
In an example, the radiation therapy treatment computing system 110 can obtain image data 152 (e.g., CBCT images) from an image data source 150 for hosting on the storage 116 and the memory 114. An exemplary image data source 150 is described in detail in conjunction with Fig. 2B. In an example, a software program running on the radiation therapy treatment computing system 110 can convert a medical image in one format (e.g., MRI) to another format (e.g., CT), for example, by producing a synthetic image such as a pseudo-CT image or an sCT image. In another example, the software program may register or associate a patient medical image (e.g., a CT image or MR image) with a subsequently created or captured CBCT image of that patient, so that the respective images are appropriately paired and associated. In yet another example, the software program may substitute functions of the patient images, such as a signed distance function or a processed version of the image that emphasizes some aspects of the image information.
In an example, the radiation therapy treatment computing system 110 can obtain CBCT imaging data 152 from the image data source 150 or transmit the CBCT imaging data 152 to the image data source 150. Such imaging data may be provided to the computing system 110 to enhance or improve the imaging data using GAN or CycleGAN modeling to produce sCT images. The therapy data source 160 or the device 180 can treat the human subject using the sCT images. In further examples, the treatment data source 160 receives or updates planning data as a result of the sCT images generated by the image generation workflow 130; the image data source 150 may also provide or host imaging data 152 for use in the image generation training workflow 140.
In an example, the computing system 110 may use the image data source 150 to generate paired CBCT images and real CT images. For example, the computing system 110 may instruct a CBCT apparatus to acquire an image of a target region (e.g., a brain region) of the subject. The computing system 110 may store the image data in the storage 116, along with the CBCT capture time and an indication of the associated target region. The computing system 110 may also instruct a CT imaging device to acquire an image of the same target region (e.g., the same cross-section of the brain region) as the real CT image. The computing system 110 may associate the real CT image with the previously obtained CBCT image of the same region to form a real CT image and CBCT image pair, to be stored as a training pair in the storage device 116. The computing system 110 may continue to generate such training image pairs until a threshold number of pairs is obtained. In some implementations, a human operator may guide the computing system 110 as to which target region and which CBCT images are to be paired with real CT images.
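A hedged sketch of how such training pairs might be organized for the workflows described here follows; the in-memory list-of-tensors layout and the class name are illustrative assumptions.

    import torch
    from torch.utils.data import DataLoader, Dataset

    class PairedCbctCtDataset(Dataset):
        """Each item is a CBCT image of a target region paired with the real CT image
        of the same region, accumulated until a threshold number of pairs is reached."""
        def __init__(self, pairs):
            self.pairs = pairs  # list of (cbct_tensor, ct_tensor) tuples

        def __len__(self):
            return len(self.pairs)

        def __getitem__(self, idx):
            cbct, ct = self.pairs[idx]
            return cbct, ct

    # Example usage: a loader that feeds paired batches to the training workflow.
    # loader = DataLoader(PairedCbctCtDataset(pairs), batch_size=4, shuffle=True)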
The processing circuitry 112 may be communicatively coupled to the memory 114 and the storage device 116, and the processing circuitry 112 may be configured to: executing computer-executable instructions stored thereon from memory 114 or storage 116. Processing circuitry 112 may execute instructions to cause medical images from image data 152 to be received or acquired into memory 114 and processed using image processing logic 120. In further examples, the processing circuitry 112 may utilize a software program (e.g., image processing software) as well as the image data 152 and other patient data to enhance or generate the sCT image.
Additionally, the processing circuitry 112 may utilize software programs to generate intermediate data, such as updated parameters to be used by, for example, neural network models, machine learning models, the image generation workflow 130, the image generation training workflow 140, or other aspects related to generating sCT images using a GAN or CycleGAN as discussed herein. Moreover, using techniques discussed further herein, such software programs may utilize the image processing logic 120 to implement the image generation workflow 130 to generate new or updated sCT images for deployment to the therapy data source 160 and/or presentation on the output device 146. The processing circuitry 112 may then transmit the new or updated images to the treatment device 180 via the communication interface and network, where a radiation therapy plan will be used to treat the patient with radiation via the treatment device 180, consistent with the results of the workflow 130 as trained with the workflow 140. Other outputs and uses of the software programs and of the workflows 130, 140 may occur with use of the radiation therapy processing computing system 110.
In the examples herein (e.g., with reference to the generative adversarial network processing discussed with reference to Figs. 3 and 4), the processing circuitry 112 may execute software programs that invoke the image processing logic 120 to implement functions of ML, DL, neural networks, and other aspects of artificial intelligence for generating sCT images from input CBCT images. For example, the processing circuitry 112 may execute software programs that train, analyze, predict, evaluate, and generate sCT images from received CBCT images as discussed herein. In accordance with the disclosed embodiments, a generator of sCT images is trained over multiple paths, wherein a first path includes one or more deformable offset layers and one or more convolutional layers, and a second path includes the one or more convolutional layers without the one or more deformable offset layers. Specifically, the generator is trained by performing two parallel or sequential processes: one process generates a first sCT image from a training CBCT image via the first path, and a second process generates a second sCT image from the training CBCT image via the second path. The first sCT image is provided to train a first discriminator to distinguish whether the first sCT image is a real CT image or a synthetic CT image. The first and second sCT images are then provided to another generator to generate CBCT images, providing first and second cyclic CBCT images for training based on the cycle-consistency loss term using the training CBCT image. In this way, the effects of learning unwanted structural deformations and potentially creating phantom structures, caused by training with the adversarial loss term, are decoupled from the effect of preserving the original structure, caused by the cycle-consistency loss term. Another portion of the CycleGAN trains the same generator in a similar manner based on the CT images paired with the training CBCT images, using two separate paths: one path with a deformable offset layer and one or more convolutional layers, and another path with the one or more convolutional layers without a deformable offset layer.
In an example, the image data 152 can include one or more MRI images (e.g., 2D MRI, 3D MRI, 2D streaming MRI, 4D volumetric MRI, 4D cine MRI, etc.), functional MRI images (e.g., fMRI, DCE-MRI, diffusion MRI), Computed Tomography (CT) images (e.g., 2D CT, 2D cone beam CT, 3D CBCT, 4D CBCT), ultrasound images (e.g., 2D ultrasound, 3D ultrasound, 4D ultrasound), Positron Emission Tomography (PET) images, X-ray images, fluoroscopic images, radiotherapy portal images, Single Photon Emission Computed Tomography (SPECT) images, computer-generated synthetic images (e.g., pseudo-CT images), and so forth. Furthermore, the image data 152 may also include or be associated with medical image processing data such as training images, ground truth images, contour images, and dose images. In other examples, an equivalent representation of an anatomical region may be represented in a non-image format (e.g., coordinates, maps, etc.).
In an example, the image data 152 can be received from the image acquisition device 170 and stored in one or more of the image data sources 150 (e.g., Picture Archiving and Communication System (PACS), vendor-neutral archive (VNA), medical records or information system, data warehouse, etc.). Thus, the image acquisition device 170 may include an MRI imaging device, a CT imaging device, a PET imaging device, an ultrasound imaging device, a fluoroscopic device, a SPECT imaging device, an integrated linear accelerator and MRI imaging device, a CBCT imaging device, or other medical imaging device for acquiring medical images of a patient. The image data 152 may be received and stored in any data type or any format type (e.g., in digital imaging and communications in medicine (DICOM) format) that the image acquisition device 170 and radiation therapy processing computing system 110 may use to perform operations consistent with the disclosed embodiments. Further, in some examples, the models discussed herein may be trained to: the raw image data format or a derivative thereof is processed.
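For example, a DICOM series received from the image acquisition device 170 could be read into an array along the following lines; this is a hedged sketch using the pydicom library, and sorting by InstanceNumber while ignoring rescale slope/intercept tags are simplifying assumptions rather than requirements of this disclosure.

    from pathlib import Path

    import numpy as np
    import pydicom

    def load_dicom_series(series_dir):
        # Read all slices of a CBCT or CT series and stack them into a 3D volume.
        slices = [pydicom.dcmread(str(p)) for p in sorted(Path(series_dir).glob("*.dcm"))]
        slices.sort(key=lambda ds: int(ds.InstanceNumber))
        return np.stack([ds.pixel_array.astype(np.float32) for ds in slices], axis=0)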
In an example, the image acquisition device 170 may be integrated with the treatment device 180 into a single device (e.g., an MRI device combined with a linear accelerator, also referred to as "MRI-Linac"). Such an MRI-Linac may be used, for example, to determine the location of a target organ or target tumor within the patient's body, thereby accurately directing radiation therapy to a predetermined target according to the radiation therapy treatment plan. For example, a radiation therapy treatment plan may provide information about the specific radiation dose to be applied to each patient. The radiation therapy treatment plan may also include other radiation therapy information, such as beam angles, dose-volume histogram information, the number of radiation therapy beams to be used during treatment, the dose per beam, and the like. In some examples, the GAN-trained models in the image generation workflow 130 are used only to generate enhanced CBCT images, while other workflows or logic (not shown) are used to convert the enhanced CBCT images into the specific beam angles and radiation physical quantities used to complete the radiation therapy treatment.
The radiation therapy treatment computing system 110 may communicate with an external database over a network to send/receive a plurality of various types of data related to image processing and radiation therapy operations. For example, the external database may include machine data (including device constraints) that provides information associated with the treatment device 180, the image acquisition device 170, or other machines related to the radiation therapy or medical procedure. The machine data information may include radiation therapy beam size, arc placement, beam on and off durations, machine parameters, segments, multi-leaf collimator (MLC) configuration, gantry speed, MRI pulse sequence, and so forth. The external database may be a storage device and may be provided with a suitable database management software program. Further, such a database or data source may include multiple devices or systems located in a centralized or distributed manner.
The radiation therapy treatment computing system 110 may collect and acquire data and communicate with other systems via a network using one or more communication interfaces communicatively coupled to the processing circuitry 112 and the memory 114. For example, the communication interface may provide a communicative connection between the radiation therapy treatment computing system 110 and the radiation therapy system components (e.g., allowing data to be exchanged with an external device). For example, in some examples, the communication interface may have appropriate interface circuitry with the output device 146 or the input device 148 to connect to the user interface 142, where the user interface 142 may be a hardware keyboard, keypad, or touch screen through which a user may input information into the radiation therapy system 100.
As an example, the output device 146 may include a display device that outputs: a representation of the user interface 142; one or more aspects, visualizations, or representations of the medical images or treatment plans; and the state of training, generation, validation, or implementation of such plans. The output device 146 may include one or more display screens that display medical images, interface information, treatment plan parameters (e.g., contours, dose, beam angles, markers, maps, etc.), treatment plans, targets, locating and/or tracking of targets, or any user-related information. The input device 148 connected to the user interface 142 may be a keyboard, keypad, touch screen, or any type of device by which a user may input information to the radiation therapy system 100. Alternatively, the features of the output device 146, the input device 148, and the user interface 142 may be integrated into a single device such as a smartphone or tablet computer (e.g., devices from Apple, Lenovo, Samsung, etc.).
Further, any and all components of the radiation therapy system 100 may be implemented as virtual machines (e.g., via VMWare, Hyper-V, etc. virtualization platforms) or as stand-alone devices. For example, a virtual machine may be software acting as hardware. Thus, a virtual machine may include at least one or more virtual processors, one or more virtual memories, and one or more virtual communication interfaces that together act as hardware. For example, the radiation therapy processing computing system 110, the image data source 150, or similar components may be implemented as virtual machines or within a cloud-based virtualized environment.
Image processing logic 120 or other software programs may cause the computing system to communicate with the image data source 150 to read images into the memory 114 and the storage 116, or to store images or associated data from the memory 114 or the storage 116 to the image data source 150, and vice versa. For example, in a model training or generation use case, the image data source 150 may be configured to store and provide a plurality of images (e.g., 3D MRI, 4D MRI, 2D MRI slice images, CT images, 2D fluoroscopic images, X-ray images, raw data from MR scans or CT scans, Digital Imaging and Communications in Medicine (DICOM) metadata, etc.) from a set of images in the image data 152 obtained from one or more patients via the image acquisition device 170. The image data source 150 or other databases may also store data to be used by the image processing logic 120 when executing a software program that performs image processing operations to create, modify, or generate sCT images from received CBCT images. In addition, various databases may store data generated by the trained models, including the network parameters and resulting prediction data that make up the models learned by the generative adversarial network model 138. Accordingly, the radiation therapy processing computing system 110 may acquire and/or receive image data 152 (e.g., 2D MRI slice images, CT images, 2D fluoroscopic images, X-ray images, 3D MRI images, 4D MRI images, etc.) from the image data source 150, the image acquisition device 170, the treatment device 180 (e.g., an MRI-Linac), or other information systems associated with performing radiation therapy or diagnostic procedures.
The image acquisition device 170 may be configured to acquire one or more images of the patient's anatomy for a region of interest (e.g., a target organ, a target tumor, or both). Each image, typically a 2D image or slice, may include one or more parameters (e.g., 2D slice thickness, orientation, position, etc.). In an example, the image acquisition device 170 may acquire a 2D slice at any orientation. For example, the orientation of the 2D slice may include a sagittal orientation, a coronal orientation, or an axial orientation. The processing circuitry 112 may adjust one or more parameters, such as the thickness and/or orientation of the 2D slice, to include the target organ and/or the target tumor. In an example, a 2D slice may be determined from information such as a 3D CBCT, CT, or MRI volume. Such 2D slices may be acquired by the image acquisition device 170 in "near real-time" while the patient is undergoing radiation therapy treatment, such as when the treatment device 180 is in use (where "near real-time" means that the data is acquired in milliseconds or less).
The image processing logic 120 in the radiation therapy processing computing system 110 is depicted as implementing an image generation workflow 130, which workflow 130 involves using a trained (learned) generative model (e.g., implementing the method described below with reference to FIG. 8). The generative model may be provided by a generator 138B trained as part of a generative adversarial network (GAN) model 138. In an example, the image generation workflow 130 operated by the image processing logic 120 is combined with real CT image data processing 132 and CBCT image data processing 134 to generate sCT images based on the mapped (paired) CT images and CBCT images used in training.
In an example, the generator 138B includes learned weights and values as a result of training that involves using the discriminator 138A and the generator 138B in the GAN model 138 in conjunction with an image generation training workflow 140, the image generation training workflow 140 processing paired training data (e.g., paired CBCT images and real CT images). As described above, the training workflow 140 may obtain and utilize imaging data and associated image data 152 from the data sources 160, 150.
Fig. 2A shows an exemplary image-guided radiation therapy device 202, the image-guided radiation therapy device 202 comprising: a radiation source such as an X-ray source or a linear accelerator, a couch 216, an imaging detector 214, and a radiation therapy output 204. The radiation therapy device 202 may be configured to: a radiation therapy beam 208 is emitted to provide therapy to the patient. The radiation therapy output 204 can include one or more attenuators or collimators, such as a multi-leaf collimator (MLC). As will be appreciated, the radiation therapy output 204 may be provided in conjunction with image processing logic 120, which image processing logic 120 enables the associated use of the image generation workflow 130 and the image generation from the GAN generator 138B.
As an example, a patient may be placed in region 212 supported by treatment couch 216 to receive a radiation therapy dose according to a radiation therapy treatment plan. The radiation therapy output 204 can be mounted or attached to a gantry 206 or other mechanical support. One or more chassis motors (not shown) may rotate the gantry 206 and the radiation therapy output 204 about the couch 216 when the couch 216 is inserted into the treatment region. In an example, the gantry 206 may continuously rotate around the bed 216 as the bed 216 is inserted into the treatment region. In another example, the gantry 206 can rotate to a predetermined position when the couch 216 is inserted into the treatment region. For example, the gantry 206 can be configured to rotate the therapy output 204 about an axis ("a"). Both the couch 216 and the radiation therapy output 204 may be independently movable to other positions around the patient, for example, movable in a transverse direction ("T"), movable in a lateral direction ("L"), or rotatable about one or more other axes, such as about a transverse axis (denoted as "R"). A controller communicatively connected to one or more actuators (not shown) may control the movement or rotation of the couch 216 in order to properly position the patient in or out of the radiation therapy beam 208 according to the radiation therapy treatment plan. Both the couch 216 and the gantry 206 can move independently of each other in multiple degrees of freedom, which allows the patient to be placed such that the radiation therapy beam 208 can be accurately targeted at the tumor.
The coordinate system shown in fig. 2A, including axes A, T and L, may have an origin located at the isocenter (isocenter) 210. The isocenter 210 can be defined as the location where the central axis of the radiation therapy beam 208 intersects the origin of the coordinate axes, e.g., to deliver a prescribed radiation dose to a location on or within the patient. Alternatively, the isocenter 210 can be defined as the position at which the central axis of the radiation therapy beam 208 intersects the patient for various rotational positions of the radiation therapy output 204 about the axis a as positioned by the gantry 206.
The gantry 206 may also have an imaging detector 214 attached. The imaging detector 214 is preferably located opposite the radiation source (output 204), and in an example, the imaging detector 214 may be located within the field of the radiation therapy beam 208. The imaging detector 214 may implement the image processing logic 120 (fig. 1) to generate sCT images from CBCT images in real time. The imaging detector 214 may preferably be mounted on the gantry 206 opposite the radiation therapy output 204, for example, to remain aligned with the radiation therapy beam 208. As the gantry 206 rotates, the imaging detector 214 rotates about a rotational axis. In an example, the imaging detector 214 can be a flat panel detector (e.g., a direct detector or a scintillator detector). In this manner, the imaging detector 214 may be used to monitor the radiation therapy beam 208, or the imaging detector 214 may be used to image the patient's anatomy, such as portal imaging. The control circuitry of the radiation therapy device 202 can be integrated within the radiation therapy system 100 or remote from the radiation therapy system 100.
In an illustrative example, one or more of the couch 216, therapy output 204, or gantry 206 may be automatically placed, and the therapy output 204 may create the radiation therapy beam 208 according to a specified dose for a particular therapy delivery instance. The sequence of therapy delivery may be specified according to a radiation therapy treatment plan, e.g., using one or more different orientations or positions of the gantry 206, the couch 216, or the therapy output 204. Therapy delivery may occur sequentially, but may intersect on or within the patient in the desired therapy site, e.g., at isocenter 210. Thus, a prescribed cumulative dose of radiation therapy can be delivered to the therapy site while damage to tissue near the therapy site can be reduced or avoided.
Accordingly, fig. 2A specifically illustrates an example of a radiation therapy device 202, the radiation therapy device 202 operable to provide radiation therapy treatment to a patient, the radiation therapy device 202 having a configuration in which the radiation therapy output is rotatable about a central axis (e.g., axis "a"). Other radiation therapy output configurations may be used. For example, the radiation therapy output may be mounted to a robotic arm or manipulator having multiple degrees of freedom. In yet another example, the therapy output can be fixed, e.g., positioned in a region laterally separated from the patient, and a platform supporting the patient can be used to align the radiation therapy isocenter with a designated target site within the patient. In another example, the radiation therapy device may be a combination of a linear accelerator and an image acquisition device. As one of ordinary skill in the art will recognize, in some examples, the image acquisition device may be an MRI, X-ray, CT, CBCT, helical CT, PET, SPECT, optical tomography, fluorescence imaging, ultrasound imaging, or radiotherapy portal imaging device, among others.
Fig. 2B shows an example of an X-ray cone beam computed tomography scanner 220 as one of the image acquisition devices of fig. 2A and 170 (fig. 1). The X-ray cone-beam computed tomography scanner 220 may include an X-ray tube 224 and a detector 222. During operation, photons may be emitted from the X-ray tube 224 and may travel through a 3D object (e.g., a portion of a patient's anatomy) before reaching the detector 222. The 3D object may absorb a portion of the emitted photons. The detector 222 may comprise a 2D flat panel that may convert received photons into corresponding electronic signals. The electronic signal may record the absorption intensity along a particular X-ray path (a straight-line path), for example to form a 2D projection space image. To obtain 3D structural information of the 3D object, the 3D object may be rotated around a rotation axis, or the X-ray tube 224 and the detector 222 may be scanned along an orbital trajectory to obtain 2D projection space images from different perspectives. In an example, 2D projection space images may be collected over a range greater than 200 degrees, which may correspond to hundreds of 2D projection space images, for example.
An image reconstruction algorithm may be employed to form a 3D image of the 3D object from the 2D projection space image collected by the X-ray cone beam computed tomography scanner 220. The reconstruction algorithm may include analytical and iterative reconstruction algorithms. In an example, the 2D projection space images collected by the scanner 220 may be processed using an analysis algorithm (e.g., Feldkamp or Feldkamp modification algorithm) to obtain a 3D reconstructed image. In an example, the analysis algorithm may process the 2D projection space image in a few seconds. However, the 3D reconstructed image may be affected by artifacts, for example, artifacts introduced due to differences between the collected 2D projection space image and mathematical assumptions associated with the analysis algorithm. In addition, artifacts may come from other sources, such as noise. In an example, the 2D projection space image collected by the scanner 220 may be processed using an iterative algorithm to obtain a 3D reconstructed image. Iterative algorithms may suppress some but not all types of artifacts associated with analysis algorithms, and iterative algorithms may achieve better image quality than analysis algorithms, but even with advanced GPU techniques, iterative algorithms may take longer than analysis algorithms. Neither analytical algorithms nor iterative algorithms work for all types of artifacts. The artifacts in the image may include any one or more of noise, scattering artifacts, extinction artifacts (extinction artifacts), beam hardening artifacts (beam hardening artifacts), exponential edge gradient effects, aliasing effects, ring artifacts, motion artifacts, or misalignment effects.
The noise artifact may include additive noise or electrical noise from rounding errors. Noise artifacts may also include photon counting noise, which may follow a Poisson distribution (Poisson distribution). CBCT machines can operate at milliamps of current, which may be about an order of magnitude lower than the current of CT machines, and thus the signal-to-noise ratio in CBCT images may be lower than in CT images. Scatter artifacts may be caused by photons scattered by the object that deviate from traveling along a straight path. In some reconstruction algorithms, which may assume that photons travel along a straight path, artifacts may be introduced due to scattering. The scatter artifacts may include uneven darkening in the CBCT 2D/3D image. In case the object contains a strongly absorbing material and photons cannot penetrate the object, extinction artifacts may occur, resulting in a very weak or zero signal on the detector. In case the signal on the detector is very weak or zero, absorption information may be lost. Extinction artifacts in 2D CBCT projection space images may cause artifacts, such as strong bright streak-like artifacts, in the reconstructed CBCT 2D/3D images. In the case of using polychromatic X-ray beams to form 2D CBCT projection space images, beam hardening artifacts may occur. In a polychromatic x-ray beam, low energy x-rays may be preferentially absorbed by tissue within the patient, such as may result in a relative increase in the ratio of high energy x-rays to low energy x-rays. A relative increase in this ratio may lead to artifacts in the reconstructed CBCT 2D/3D image. Exponential Edge Gradient Effects (EEGE) may occur at sharp edges with high contrast to neighboring structures. EEGE may be induced by averaging the measured intensity over a limited beam width, while the algorithm used for reconstruction assumes a beam width of zero. EEGE can provide reduced calculated density values and can result in stripes that are tangential to the long straight side in the projection direction. Aliasing artifacts may occur when the image sampling frequency (pixels per unit area) is less than twice the value of the spatial frequency being sampled. Aliasing artifacts may also occur due to a diverging cone beam, such as the cone beam used in collecting CBCT projection space images. Ringing artifacts may be caused by defects or misalignment of the detector elements. The ringing artifacts may appear as concentric rings centered on the axis of rotation. Motion artifacts and misalignment effects may be due to misalignment of any of the source, object, and detector during the collection of CBCT images.
These CBCT images may be improved by using the GAN-related deep learning (DL)/machine learning (ML) methods discussed herein, for example, using the image processing logic 120 (FIG. 1). AI, DL, and ML are based on the mathematical analysis of random variables and their probability distributions. Typically, observations come in pairs X, Y = {x_i, y_i}, i = 1, ..., N of random variables, where for each value x_i ∈ X we want either to assign it to a class or category of Y indexed by a scalar y_i (classification), or to assign it a numerical value according to some function y_i = f(x_i) (regression). All classification or regression methods rely on the concept of probability distributions to describe the random variables X, Y. The probability distribution p(x) of a random variable X (discrete or continuous) must satisfy: 1) the domain of p(x) must be the set of all possible values of x; 2) for all x ∈ X, p(x) ≥ 0; and 3) the probabilities must normalize, Σ_x p(x) = 1 (or ∫ p(x) dx = 1 for continuous x).

A sample x drawn from the distribution p(x) is written x ~ p(x). The joint distribution of X, Y is written p(x, y), and the marginal distribution of x is p(x) = ∫ p(x, y) dy. The probability of observing y given the value of x is p(y|x) = p(x, y)/p(x). The conditional probability of observing y given data x is called the data likelihood. Bayes' rule relates the conditional likelihoods of X, Y by p(y|x) = p(x|y) p(y)/p(x).
The goal of statistical learning is to determine a mapping f: x → y that associates any y with an x. One of the most important methods is maximum likelihood estimation. Suppose the training data are generated by a process p_data(x, y). Finding the mapping involves learning a model process p_model(x; θ), which, in addition to x, includes a parameter θ on which the mapping depends. For example, θ may include the neural network layer weights and bias parameters. Maximum likelihood estimates the parameter θ_ML that makes the observed x most probable as:

θ_ML = argmax_θ E_{x~p_data} [log p_model(x; θ)]   (Equation 1)

where E is the expected value of the bracketed argument. Since the probability distributions are difficult to approximate, and since the goal is to minimize the difference between the p_data(x) distribution and the p_model(x; θ) distribution, the KL divergence provides a data-driven alternative:

D_KL(p_data || p_model) = E_{x~p_data} [log p_data(x) − log p_model(x; θ)]   (Equation 2)

where maximizing the likelihood amounts to minimizing the difference between the model and the data distributions. The log p_data(x) term is independent of the model, so to minimize D_KL it suffices to minimize Equation 3 below:

−E_{x~p_data} [log p_model(x; θ)]   (Equation 3)

This is the same as Equation 1, where θ is implicit in the model expression. The desired mapping is then f(θ): x ~ p_model → y.
The presently disclosed system for CT image modeling and sCT image generation using a multi-path approach (one path with and one path without deformable offset layers) provides a useful application of modern neural network techniques to model radiation therapy treatment planning and image generation. Neural networks (NNs) have been studied since the 1960s to solve the classification problem (assigning observed data x to one of two or more classes y_i, i = 1, ..., n) and the regression problem (relating observed data x to the value y of a parameter associated with that data). The generation of CT image parameters from CBCT images can be viewed as a regression problem, addressed by an NN generative model that is learned via the GAN configuration. Although the above and following description refers to a multi-path approach in which one path has deformable offset layers and one path does not, the deformable offset layers may take other forms or types (e.g., spatial transform layers or spatial transformers) that can be used to store the deformed structure information.
A simple NN consists of an input layer, an intermediate or hidden layer, and an output layer, each layer containing computational units or nodes. The hidden layer node(s) have inputs from all input layer nodes and are connected to all nodes in the output layer. Such a network is referred to as "fully connected." Each node transmits a signal to the output nodes according to a non-linear function of the sum of its inputs. For a classifier, the number of input layer nodes is typically equal to the number of features of each of the objects being classified, while the number of output layer nodes is equal to the number of classes. A network is trained by presenting it with the features of objects of known classes and adjusting the node weights, by an algorithm called back-propagation, to reduce the training error. The trained network can then classify novel objects whose class is unknown.
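As a concrete illustration only, a minimal fully connected network of the kind described above, trained by back-propagation, might be sketched as follows; this assumes the PyTorch library, and the feature count, class count, and learning rate are illustrative assumptions rather than values from this disclosure:

```python
import torch
from torch import nn

# A fully connected network: input layer -> one hidden layer -> output layer.
model = nn.Sequential(
    nn.Linear(16, 32),   # 16 input features per object
    nn.ReLU(),           # non-linear activation at the hidden nodes
    nn.Linear(32, 3),    # 3 output nodes, one per class
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

features = torch.randn(8, 16)        # 8 training objects of known class
labels = torch.randint(0, 3, (8,))   # class index for each object

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)  # training error
    loss.backward()                          # back-propagation adjusts the node weights
    optimizer.step()
```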
Neural networks have the ability to discover relationships between the data and the class or regression values, and under certain conditions can model any function y = f(x), including non-linear functions. In ML, it is assumed that both the training and the test data are generated by the same data-generating process p_data, with each {x_i, y_i} sample identically and independently distributed (i.i.d.). In ML, the goals are to minimize the training error and to minimize the difference between the training error and the test error. Underfitting occurs if the training error is too large; overfitting occurs when the training-test error gap is too large. Both types of performance deficiency are related to model capacity: a large capacity may fit the training data very well but lead to overfitting, while a small capacity may result in underfitting. Since DNNs have very large capacity, overfitting is the more common concern.
Deep learning is a machine learning method that employs DNNs with a large number of hidden layers, with inputs and outputs arranged in complex ways, and that produces human-level performance on image and speech recognition tasks. In the present example, a DNN can be trained to determine the relationship between the observed data X and the output Y. The data X = {X_1, ..., X_n} is a set of CBCT images, and the output Y is an sCT image.
The action of the DNN is captured symbolically by the function f(·):

Y* = f(X; Θ)   (Equation 4)

where Θ = (θ_1, ..., θ_n)^T is the vector of parameters associated with the trained NN, and Y* is the closest approximation to the true Y observed in training. The DNN is trained using a data set {X, Y}_i (i = 1, ..., N) of training CBCT images X and known, corresponding or registered true CT images Y. Training minimizes a cost function J(Θ) of the form:

Θ* = argmin_Θ ||Y − Y*||²   (Equation 5)

where Θ* is the parameter set that minimizes the mean squared error between the actual Y and its estimate Y*. In deep learning, the cost function typically expresses the data-approximating function as a probability function of the problem variables, i.e., the conditional likelihood of observing Y given X subject to the parameter values Θ, written P(Y|X; Θ), for which the optimal parameters Θ_ML are obtained by maximizing the likelihood:

Θ_ML = argmax_Θ P(Y|X; Θ)   (Equation 6)

or, equivalently, by summing over the training data:

Θ_ML = argmax_Θ Σ_{i=1}^{N} log P(Y_i | X_i; Θ)   (Equation 7)
A DNN output that produces an identification of an sCT image as belonging to the true CT class would be an example of classification. In the present case, however, the DNN output is an image map Y = (y_1, ..., y_M)^T of real-valued elements y_i computed from the CBCT image, which means that the network computation is an example of regression.
DNNs have more layers (deeper) than the basic NN implementation, since DNNs typically include tens or hundreds of layers, each layer consisting of thousands to hundreds of thousands of nodes, where the layers are arranged in complex geometries. In addition to the weighted sum of the inputs, some layers compute other operations on the outputs of the previous layer, such as convolution. Convolution and filters derived from the convolution can locate edges in the image or time/pitch features in the sound stream, and subsequent layers will find larger structures consisting of these primitives (primitives). Such trained DNNs involving the use of convolutional layers are referred to as Convolutional Neural Networks (CNNs).
The skip connection is an important CNN architectural innovation. Originally introduced to improve accuracy and shorten training, a skip connection splices the node data at one level of the network with the node data at another level. An important example is the U-Net architecture developed for medical image segmentation. As discussed further below, the "left" portion of the "U" encodes the image data as convolutional filter features, and the "right" portion of the "U" decodes those features into successively higher-resolution representations. The combination of encoded and decoded features at the same level of the network hierarchy can result in more accurate classification. Another variant of the skip connection is implemented within each CNN block and forces the training of the differences (residuals) between the layer outputs, rather than training the layer outputs directly. This "ResNet" architecture and its many variants can improve the accuracy of NNs.
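A minimal sketch of a ResNet-style skip connection, in which the block learns the residual between its input and its output, is shown below; this again assumes PyTorch, and the block structure and channel counts are illustrative assumptions rather than the disclosed architecture:

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """conv/BN/ReLU block whose output is added to its input (residual learning)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: the block trains the residual, which is added back to the input.
        return torch.relu(x + self.body(x))
```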
Fig. 3A illustrates an exemplary CNN model 300 suitable for generating a composite CT image (sCT) in accordance with the present disclosure. In particular, the model 300 depicts an arrangement of "U-Net" depths CNN designed to generate an output dataset (output sCT image 306) based on an input training set (e.g., a paired CBCT image 302 and CT image 304). The name is derived from the "U" configuration, and it is well understood that this form of NN model may yield pixel-by-pixel classification or regression results. In some cases, the first path to the CNN model 300 includes one or more deformable offset layers and one or more convolutional layers, while the second path to the CNN model 300 includes one or more convolutional layers without deformable offset layers. The model 300 generates, in parallel or sequentially, via first and second paths, first and second sCT images 306 (one generated via the first path and one generated via the second path) as an output data set.
The left side of the model operation (the "encode" operation 312) learns a set of features that the right side (the "decode" operation 314) uses to reconstruct the output result. The U-Net has n stages consisting of conv/BN/ReLU (convolution/batch normalization/rectified linear unit) blocks 316, and each block has a skip connection to implement residual learning. The block sizes are denoted in FIG. 3A by the "S" and "F" numbers; the size of the input image is S x S, and the number of feature layers is equal to F. The output of each block is a pattern of feature responses in an array of the same size as the image.
Proceeding along the encoding path, the block size is reduced by 1/2 (i.e., by a factor of 2^-1) at each stage, while by convention the number of features is increased by a factor of 2. The decoding side of the network scales back up from S/2^n while adding feature content from the left side at the same level; this is the copy/concatenate data communication. The input images 302, 304 shown in FIG. 3A may be provided for training the network, that is, for evaluating the conv/BN/ReLU layer parameters, since at that stage no output image has yet been produced. To perform inference or testing using this model, the input would be a single CBCT image 302, and the output would be the sCT image 306.
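The encode/decode structure with a copy/concatenate skip connection might be sketched, in a highly simplified single-level form, as follows; the layer sizes and channel counts are illustrative assumptions (a sketch assuming PyTorch, not the disclosed network):

```python
import torch
from torch import nn

def conv_bn_relu(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One encode stage, one decode stage, and a copy/concatenate skip connection."""
    def __init__(self):
        super().__init__()
        self.enc = conv_bn_relu(1, 32)               # S x S, F = 32
        self.down = nn.MaxPool2d(2)                  # S/2 x S/2
        self.bottom = conv_bn_relu(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = conv_bn_relu(64, 32)              # 64 = 32 upsampled + 32 copied
        self.out = nn.Conv2d(32, 1, 1)               # regression output, same size as input

    def forward(self, cbct):
        e = self.enc(cbct)
        b = self.bottom(self.down(e))
        d = self.dec(torch.cat([self.up(b), e], dim=1))  # copy/concatenate from the left side
        return self.out(d)                               # sCT estimate
```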
The representation of the model 300 of FIG. 3A thus illustrates the training and prediction of a generative model suited to performing regression rather than classification. FIG. 3B illustrates an exemplary CNN model suitable for discriminating synthetic CT images (sCT) in accordance with the present disclosure. The discriminator network shown in FIG. 3B may include several levels of blocks configured with stride-2 convolutional layers, batch normalization layers, and ReLU layers, as well as separate pooling layers. At the end of the network there may be one or several fully connected layers that form a 2D patch for discrimination purposes. The discriminator shown in FIG. 3B may be a patch-based discriminator configured to: receive an input sCT image (e.g., generated from the first path, which includes the deformable offset layers, of the generator shown in FIG. 3A), classify the image as real or fake, and provide the classification as an output 350.
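A discriminator of the kind shown in FIG. 3B might be sketched roughly as follows. This is an illustrative sketch assuming PyTorch; it uses a final convolution rather than fully connected layers to produce the 2D patch of scores (one common variant), and the number of levels and channel counts are assumptions, not the disclosed design:

```python
import torch
from torch import nn

class PatchDiscriminator(nn.Module):
    """Stride-2 conv/BN/ReLU levels that end in a 2D patch of real/fake scores."""
    def __init__(self, in_ch=1, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.BatchNorm2d(base * 2), nn.ReLU(inplace=True),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1),
            nn.BatchNorm2d(base * 4), nn.ReLU(inplace=True),
            nn.Conv2d(base * 4, 1, 4, padding=1),  # one score per patch of the input image
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Returns real/fake logits per patch; pair with a logit-based loss (no sigmoid here).
        return self.net(image)
```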
Consistent with embodiments of the present disclosure, a treatment modeling method, system, device, and/or process based on such models includes two phases: training the generative model using a discriminator/generator pair in the GAN; and using the GAN-trained generator to make predictions with the generative model. Various examples involving GANs and CycleGANs for sCT image generation are discussed in detail in the examples below. It will be appreciated that other variations and combinations of deep learning model types, as well as other neural network processing methods, may also be implemented with the present techniques. Further, while the following examples are discussed with reference to images and image data, it will be understood that the following networks and GANs may operate using other non-image data representations and formats. Furthermore, while two paths are discussed for generating the first and second sCT images during training, only one path (the second path, which does not include the deformable offset layers) is actually used after training, when the generator generates an sCT image for a given CBCT image.
In deep CNN training, the learned model consists of the values of the layer node parameters θ (node weights and layer biases) determined during training. Training uses maximum likelihood, or equivalently the cross entropy between the training data and the model distribution. The cost function expressing this relationship is

J(θ) = −E_{x,y~p_data} log p_model(y | x; θ)   (Equation 8)

The exact form of the cost function for a particular problem depends on the nature of the model used. A Gaussian model p_model(y | x) = N(y; f(x; θ)) implies a cost function such as:

J(θ) = ½ E_{x,y~p_data} ||y − f(x; θ)||² + const   (Equation 9)

which includes a constant term that does not depend on θ. Thus, minimizing J(θ) yields the mapping f(x; θ) that approximates the distribution of the training data.
FIG. 4 illustrates an exemplary data flow for training and using a generative adversarial network adapted to generate a composite CT image from a received CBCT image. For example, the generator model 432 of FIG. 4, trained via multiple parallel paths (one including one or more deformable offset layers together with the convolutional layers, and one including only the convolutional layers) to produce the trained generator model 460, may be trained to implement the processing functions 132, 134 provided as part of the image processing logic 120 in the radiation therapy system 100 of FIG. 1. The data flow of GAN model use 450 (prediction) is accordingly depicted in FIG. 4 as follows: the trained generator model 460 receives new data 470 (e.g., CBCT input images from a new patient) and is used to produce a prediction or estimate of the generated result 480 (e.g., an sCT image corresponding to the input CBCT image).
The GAN includes two networks: a generative network (e.g., generator model 432) trained to perform classification or regression; and a discriminative network (e.g., discriminator model 440) that samples the output distribution of the generative network (e.g., simulated output 436) and determines whether the samples are the same as or different from the true test distribution. The goal of this network system is to drive the generator network to learn the real model as accurately as possible, such that the discriminator network has only a 50% chance of determining the correct source of a generator sample, at which point it is in equilibrium with the generator network. The discriminator has access to the real data, but the generator only learns about the real data through the discriminator's response to the generator's output.
The data flow of fig. 4 illustrates the receipt of training input 410, training input 410 including model parameters 412 and various values of training data 420 (where such training image 423 includes CBCT patient imaging data, true CT images corresponding to the patient imaging data, and/or maps 424 and 425 of anatomical regions, conditions or constraints 426). The training inputs 410 are provided to the GAN model training 430 to produce a trained builder model 460 for use in the GAN model usage 450. The maps 424 and 425 of the anatomical regions provide a metric for comparing similarity between the two images (e.g., using SSIM weights).
As part of the GAN model training 430, a generator model 432 is trained on the map 424 of the anatomical region and the real CT and CBCT image pairs 422 (also depicted as 302, 304 in fig. 3A) to produce and map segment pairs in the CNN. In this manner, the generator model 432 is trained via multiple paths to produce first and second simulated or composite CT image representations 436 based on the input map. The first sCT image representation 436 is generated by the generator model 432 by applying one or more deformable offset layers and one or more convolutional layers to an input training image at a first input interface of the generator model 432. The second sCT image representation 436 is generated by the generator model 432 by applying one or more convolutional layers without applying a deformable offset layer at a second input interface of the generator model 432. All remaining components of the generator model 432 (e.g., the components used to process information and generate the sCT image representation via the first and second input interfaces) are shared by multiple paths, meaning that the generator is trained based on the output of both paths. The discriminator model 440 determines whether the simulated representation 436 is from training data (e.g., a true CT image) or from a generator (e.g., an sCT as communicated between the generator model 432 and the discriminator model 440 using the generation results 434 and the detection results 444). The discriminator model 440 operates and trains based only on the first sCT image representation 436 (e.g., a representation generated using deformable offset layers). In this manner, the generator model 432 is trained with a discriminator on images generated by a first path including a deformable offset layer, and is further trained based on cyclic consistency loss information generated based on two images generated by the first path including the deformable offset layer and a second path without the deformable offset layer. This training process results in a back propagation of the weight adjustments 438, 442 to refine the generator model 432 and the discriminator model 440.
Thus, in this example, data for GAN model training 430 prepares CT images that need to be paired with CBCT images (these may be referred to as training CBCT/CT images). In an example, the raw data includes paired CBCT image sets and corresponding CT images that can be registered and resampled to a common coordinate system to produce paired anatomical-derived images.
In detail, in the GAN model, the generator (e.g., generator model 432) learns a distribution p_G(x) over the data x, starting from a noise input having a distribution p_Z(z); that is, the generator learns a mapping G(z; θ_G): p_Z(z) → p_G(x), where G is a differentiable function represented by a neural network with layer weight and bias parameters θ_G. The discriminator D(x; θ_D) (e.g., discriminator model 440) maps the generator output to a binary scalar {true, false}, deciding true if the generator output is from the actual data distribution p_data(x) and false if the generator output is from the generator distribution p_G(x). That is, D(x) is the probability that x came from p_data(x) rather than from p_G(x).
FIG. 5 illustrates training in a GAN for generating a synthetic CT image model according to the example techniques discussed herein. FIG. 5 specifically illustrates the operational flow 550 of a GAN generator model G 560 designed to produce a simulated (e.g., estimated, artificial, etc.) output sCT image 580 as a result of an input CBCT image 540. FIG. 5 also shows the operational flow 500 of a GAN discriminator model D 520 designed to produce a determination value 530 (e.g., real or fake, true or false) based on an input (e.g., a real CT image 510 or a generated sCT image 580). In particular, the discriminator model D 520 is trained to produce an output indicating whether the discriminator model D 520 determines the generated sCT image 580 to be real or fake.
In the case of the GAN, based on the adjusted training weights 570 applied during training, the discriminator D 520 is trained to maximize the probability of assigning the correct label to samples from both distributions, while the generator G 560 is trained to minimize log(1 − D(G(z))). D and G can be viewed as playing a two-player minimax game with the following value function V(D, G):

min_G max_D V(D, G) = E_{y~p(CT)}[log D(y)] + E_{x~p(CBCT)}[log(1 − D(G(x)))]   (Equation 10)

Early in learning, when G performs poorly, the log(1 − D(G(x))) term dominates V(D, G) and leads to early, incorrect termination. Instead of training G to minimize log(1 − D(G(x))), G may be trained to maximize log(D(G(x))), which produces more informative gradients early in training. Furthermore, as training progresses, the distribution p_G(x) converges to the true data distribution p_data(x).
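The value function of Equation 10, together with the more informative non-saturating generator objective, might be computed during training roughly as follows. This is a sketch assuming PyTorch; G, D, real_ct, and cbct are placeholder names for the generator, discriminator, and training batches described above, and the use of a logit-based binary cross-entropy loss is an implementation assumption:

```python
import torch
from torch import nn

bce = nn.BCEWithLogitsLoss()  # assumes D outputs logits rather than probabilities

def discriminator_loss(D, G, real_ct, cbct):
    # D is trained to label real CT images as 1 and generated sCT images as 0.
    fake_sct = G(cbct).detach()                     # do not backpropagate into G here
    loss_real = bce(D(real_ct), torch.ones_like(D(real_ct)))
    loss_fake = bce(D(fake_sct), torch.zeros_like(D(fake_sct)))
    return loss_real + loss_fake

def generator_loss(D, G, cbct):
    # Non-saturating form: maximize log D(G(x)) instead of minimizing log(1 - D(G(x))).
    fake_sct = G(cbct)
    return bce(D(fake_sct), torch.ones_like(D(fake_sct)))
```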
A useful extension of the GAN is the CycleGAN described below in connection with FIGS. 6A-6D. FIG. 6A illustrates training and use of a CycleGAN 600 for generating sCT images from received CBCT images via multiple paths (some paths including one or more deformable offset layers and one or more convolutional layers/blocks, and other paths including one or more convolutional layers/blocks without the deformable offset layers), according to some examples of the present disclosure. The CycleGAN 600 includes one or more deformable offset layers 660A, 660B, one or more convolution blocks 661A, 661B, a first placement portion 610 of the generator model, a second placement portion 620 of the generator model, a first discriminator model 630, and a second discriminator model 640. The first placement portion 610 of the generator model, together with its first and second input interfaces (e.g., deformable offset layers 660A and convolution blocks 661A), constitutes the first generator model 606, and the second placement portion 620 of the generator model, together with its first and second input interfaces (e.g., deformable offset layers 660B and convolution blocks 661B), constitutes the second generator model 608. The two models 606 and 608 may each be an implementation of the generator model 432 (FIG. 4) (e.g., a DCNN model of the regression type), and the first and second discriminator models 630 and 640 may each be an implementation of the discriminator model 440 (e.g., a DCNN model of the classification type). The CycleGAN 600 may be divided into two parts: a first portion 650 and a second portion 652.
The first placement portion 610 of the generator model represents the portion of the first generator model that is shared by the two separate input interfaces to the first generator model. That is, one input interface of the first placement portion 610 of the generator model includes a path through the deformable offset layers 660A and the convolution blocks 661A, and the second input interface of the first placement portion 610 of the generator model includes a path through only the convolution blocks 661A, without the deformable offset layers 660A. After the first generator model is trained, only the input interface that includes the convolution blocks 661A without the deformable offset layers 660A is used. The second placement portion 620 of the generator model represents the portion of the second generator model that is shared by the two separate input interfaces to the second generator model. That is, one input interface of the second placement portion 620 of the generator model includes a path through the deformable offset layers 660B and the convolution blocks 661B, and the second input interface of the second placement portion 620 of the generator model includes a path through only the convolution blocks 661B, without the deformable offset layers 660B. After the second generator model is trained, only the input interface that includes the convolution blocks 661B without the deformable offset layers 660B is used. The convolution blocks 661A and 661B may be trained together with, or separately from, the training of the generator and discriminator models. In particular, the convolution blocks 661A and 661B are trained to obtain the correct weights to perform their functions.
The deformable offset layers 660A and 660B may each be trained to coordinate offsetting, resampling, and interpolation. The deformable offset layers 660A and 660B may be trained together with, or separately from, the training of the generator and discriminator models. In particular, the deformable offset layers 660A and 660B are trained to obtain the correct weights to perform their functions. In effect, these offset layers alter the original regular sampling grid of the convolution block, introduce coordinate offsets, and resample the image using interpolation. In this manner, the deformable offset layers 660A and 660B can encode structural deformation information. An illustrative implementation of one of the deformable offset layers 660A and 660B is shown and described in connection with FIG. 6B. The deformable offset layers 660A and 660B may alternatively or additionally be implemented using spatial transformers, other types of convolutional layers, and/or any other module that can store the deformed structural information of an image. The number of offset layers among the deformable offset layers 660A and 660B may vary based on the image size, the number of downsampling convolutional layers (e.g., the number of convolution blocks in the one or more convolution blocks 661A and 661B), and other factors.
As shown in FIG. 6B, each of the deformable offset layers 660A/660B involves an input feature map, a convolution block, an offset field, and an output feature map. FIG. 6B specifically illustrates a 3 x 3 deformable convolution, but a deformable convolution of any other size may be provided in the same manner. A 2D convolution comprises two steps: 1) sampling the input feature map x on a regular grid R; and 2) summing the sampled values weighted by w. The grid R defines the receptive field size and dilation. For example, R = {(-1, -1), (-1, 0), ..., (0, 1), (1, 1)} defines a 3 x 3 kernel with a dilation of 1. For each position p_0 on the output feature map y, the convolution computes:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n)

where p_n enumerates the positions in R.

In the deformable convolution, the regular grid R is augmented by offsets {Δp_n | n = 1, ..., N}, where N = |R|. In this case, the above expression becomes:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)

Now the sampling is performed at the irregular, offset positions p_n + Δp_n. Because the offsets Δp_n are typically fractional, the above expression can be implemented by bilinear interpolation as:

x(p) = Σ_q G(q, p) · x(q)

where p denotes an arbitrary (fractional) position (p = p_0 + p_n + Δp_n), q enumerates all integral spatial locations in the feature map x, and G(·,·) is the bilinear interpolation kernel. G is two-dimensional and can be separated into two one-dimensional kernels:

G(q, p) = g(q_x, p_x) · g(q_y, p_y)

where g(a, b) = max(0, 1 − |a − b|).
As shown in FIG. 6B, the offsets are obtained by applying a convolutional layer over the same input feature map. The convolution kernel has the same spatial resolution and dilation as the current convolutional layer. The output offset field has the same spatial resolution as the input feature map. The channel dimension 2N corresponds to N 2D offset maps (i.e., N channels for the x direction and another N for the y direction, giving N + N = 2N). During training, both the convolution kernels used to generate the output features and the offsets are learned simultaneously. To learn the offsets, the gradients are back-propagated through the bilinear interpolation operation in the equation above. In some cases, the convolution kernel in FIG. 6B is implemented using, or shared with, the convolution block 661.
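One possible realization of such a deformable offset layer, in which the offset field is itself predicted by a convolution over the same input feature map, is sketched below. This assumes PyTorch and the torchvision library's deformable convolution operator and is only one way the layers 660A/660B might be built; the zero-initialization of the offset predictor is an illustrative choice, not part of this disclosure:

```python
import torch
from torch import nn
from torchvision.ops import DeformConv2d

class DeformableOffsetLayer(nn.Module):
    """3x3 deformable convolution whose offsets are learned from the input feature map."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # 2N channels: an (x, y) offset for each of the N = k*k kernel sampling positions.
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        nn.init.zeros_(self.offset_conv.weight)   # start from the regular grid (zero offsets)
        nn.init.zeros_(self.offset_conv.bias)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_conv(x)              # offset field, same spatial size as x
        return self.deform_conv(x, offsets)        # bilinear sampling at p0 + pn + delta_pn
```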
Referring back to FIG. 6A, in an example, in the first portion 650, the first placement portion 610 of the generator model may be trained to receive a CBCT training image 602 (which may include one of the image pairs 422) via the first and second paths and to generate respective first and second sCT images as generation results 612 and 614. Specifically, the CBCT training image 602 may be processed by the one or more deformable offset layers 660A and the one or more convolution blocks 661A in the first path to produce a first sCT image as the generation result 612. The CBCT training image 602 may be processed by the convolution blocks 661A in the second path, without being processed by the deformable offset layers 660A, to produce a second sCT image as the generation result 614. The first generator model 610 that includes the first input interface, with the deformable offset layers 660A and the convolution blocks 661A, and that generates the first generation result 612 via the first path is referred to as Gcbct2ct^offset, and the first generator model 610 that includes the second input interface, with the convolution blocks 661A and without the deformable offset layers 660A, and that generates the second generation result 614 via the second path is referred to as Gcbct2ct. The generator Gcbct2ct and the generator Gcbct2ct^offset share all network layers and weights; only the offset layers 660A of Gcbct2ct^offset are not shared.
In the first path, the CBCT training image 602 may be processed by a first one of the deformable offset layers 660A, and the output of the first deformable offset layer is provided to a first one of the one or more convolution blocks 661A. The output of the first convolution block is then provided for processing by a second one of the deformable offset layers 660A. The output of the second deformable offset layer is then provided to another convolution block (if present) or directly to the first placement portion 610 of the generator model. In particular, the deformable offset layers 660A may be interleaved with the convolution blocks 661A.
In parallel with the first path, the CBCT training image 602 may be processed in the second path only by the one or more convolution blocks 661A, without passing through the deformable offset layers 660A. The convolution blocks 661A through which the CBCT training image 602 passes may be shared by the first and second paths. Specifically, the CBCT training image 602 passes through the first convolution block and any further convolution blocks of the one or more convolution blocks 661A in the second path. The output of the convolution block(s) is then provided to the first placement portion 610 of the generator model. The first placement portion 610 of the generator model processes the images output by the first and second paths in parallel or sequentially.
The first generation result 612 may be provided to the first discriminator model 630, while the second generation result 614 is not. The first discriminator model 630 may classify the sCT image as a real CT training image or a simulated CT training image and provide the classification as the detection result 632. The first generation result 612 and the detection result 632 may be fed back to the first generator model 606 and the first discriminator model 630 to adjust the weights implemented by the first generator model 606 and the first discriminator model 630, including those of the deformable offset layers 660A and the convolutional layers 661A. For example, the first generation result 612 (e.g., the sCT image generated by the first generator model 610 via the first path) and the detection result 632 may be used to calculate the adversarial loss.
FIG. 6C shows an illustrative implementation of the first and second paths of the first portion 650. As shown, a real CBCT image 602 is received and provided to a plurality of deformable offset layers 660A in the first path. The CBCT image 602 passes through the deformable offset layers 660A interleaved with the convolution blocks 661A. Although only four deformable offset layers 660A are shown, more or fewer deformable offset layers may be used. In parallel with the first path, the CBCT image 602 passes, via the second path, only through the convolution blocks 661A and not through the deformable offset layers 660A. The image outputs of both the first and second paths are provided to the shared first placement portion 610 of the generator model for parallel processing (e.g., by running parallel processes implementing the functionality of the first placement portion 610 of the generator model) to output respective sCT images as a first generation result 612 and a second generation result 614. Specifically, the first generation result 612 is the sCT image generated using the offset layers, and the second generation result 614 is the sCT image generated without using the offset layers. The result 612, comprising the sCT image generated using the offset layers, is provided to the first discriminator model 630 of the CT domain, while the result 614 is not provided to the first discriminator model 630.
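The two parallel paths into the shared portion 610 might be expressed, in greatly simplified form, as follows. This is only a sketch assuming PyTorch; offset_layers, conv_blocks, and shared_generator are placeholder names standing in for the elements 660A, 661A, and 610, and the assumption that the offset layers and conv blocks alternate one-for-one is an illustrative simplification:

```python
import torch
from torch import nn

class DualPathGenerator(nn.Module):
    """First path: offset layers interleaved with conv blocks; second path: conv blocks only.
    Everything after the two input interfaces is shared."""
    def __init__(self, offset_layers: nn.ModuleList, conv_blocks: nn.ModuleList,
                 shared_generator: nn.Module):
        super().__init__()
        self.offset_layers = offset_layers   # e.g., deformable offset layers 660A
        self.conv_blocks = conv_blocks       # e.g., convolution blocks 661A (shared by both paths)
        self.shared = shared_generator       # e.g., first placement portion 610

    def forward(self, cbct: torch.Tensor):
        # First path: interleave the offset layers with the shared conv blocks.
        x1 = cbct
        for offset, conv in zip(self.offset_layers, self.conv_blocks):
            x1 = conv(offset(x1))
        # Second path: the same conv blocks, with no offset layers.
        x2 = cbct
        for conv in self.conv_blocks:
            x2 = conv(x2)
        sct_with_offsets = self.shared(x1)      # generation result 612 (fed to the discriminator)
        sct_without_offsets = self.shared(x2)   # generation result 614 (anatomy-preserving)
        return sct_with_offsets, sct_without_offsets
```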
Referring back to FIG. 6A, the first generation result 612 (e.g., an sCT image) can also be provided to the second generator model 608, simultaneously with the second generation result 614, via third and fourth paths, respectively. The second generator model 608 can receive the first generation result 612 and generate a corresponding simulated CBCT image as an output. This simulated CBCT image may be referred to as a cycle CBCT image 622 and may be used to calculate the cycle loss used to adjust the weights of the first/second generator models 606/608. The second generator model 608 that generates the first cycle CBCT image 622 via the third path is referred to as Gct2cbct^offset, and the second generator model 608 that generates the second cycle CBCT image 622 via the fourth path is referred to as Gct2cbct. The generator Gct2cbct and the generator Gct2cbct^offset share all network layers and weights; only the offset layers of Gct2cbct^offset are not shared.
In particular, the first generated result 612 may be processed by the one or more deformable offset layers 660B and the one or more convolution blocks 661B in the third path to generate the first simulated CBCT image as the first cyclic CBCT image 622. The second generation result 614 may be processed by the convolution block 661B without the deformable offset layer 660B in the fourth path to produce the second simulated CBCT image as the second cyclic CBCT image 622.
In the third path, the first generation result 612 may be processed by a first one of the deformable offset layers 660B, and the output of the first deformable offset layer is provided to a first one of the one or more convolution blocks 661B. The output of the first convolution block is then provided for processing by a second one of the deformable offset layers 660B. The output of the second deformable offset layer is then provided to another convolution block (if present) or directly to the second placement portion 620 of the generator model. In parallel with the third path, the second generation result 614 may be processed in the fourth path only by the one or more convolution blocks 661B, without passing through the deformable offset layers 660B. The convolution blocks 661B through which the second generation result 614 passes may be shared by the third and fourth paths. Specifically, the second generation result 614 passes through the first convolution block and any further convolution blocks of the one or more convolution blocks 661B in the fourth path. The output of the convolution block(s) is then provided to the second placement portion 620 of the generator model. The second placement portion 620 of the generator model processes the images output by the third and fourth paths in parallel or sequentially.
With the additional offset layers in the first portion 650 (in the forward direction comprising the first and second paths and the backward direction comprising the third and fourth paths), the effect of learning unwanted structural deformations, and potentially creating phantom structures, caused by the "adversarial" loss terms is separated from the effect of preserving the original structures caused by the "cycle-consistency" loss terms. That is, the additional deformable offset layers 660A and 660B in the first and third paths of the first portion 650 absorb or merge those shape distribution differences, or other feature distribution differences, between the CBCT image domain and the CT image domain. At the same time, the second and fourth paths are constrained to perfectly preserve all of the true anatomical structures. In this way, the sCT image produced from the first path has an appearance similar to a planning CT image, but may also contain deformed structures or other phantom structures, while the sCT image produced by the second path, having no offset layers, strictly preserves the original true CBCT anatomy without any such deformed structures. Furthermore, since the first, second, third, and fourth paths share all processing layers except the deformable offset layers 660A and 660B, the sCT image produced by the second path also has an appearance similar to a planning CT image and accurate CT numbers.
FIG. 6D shows an illustrative implementation of the third and fourth paths of the first portion 650. As shown, the sCT image (of result 612) generated using the offset layers is received and provided to a plurality of deformable offset layers 660B in the third path. The sCT image (of result 612) generated using the offset layers passes through the deformable offset layers 660B interleaved with the convolution blocks 661B. Although only four deformable offset layers 660B are shown, more or fewer deformable offset layers may be used. In parallel with the third path, the sCT image (of result 614) generated without using the offset layers passes, via the fourth path, only through the convolution blocks 661B and not through the deformable offset layers 660B. The image outputs of the third and fourth paths are provided to the shared second placement portion 620 of the generator model for parallel processing (e.g., by running parallel processes implementing the functionality of the second placement portion 620 of the generator model) to output respective cycle-CBCT images 628 and 629. Specifically, the cycle-CBCT image 628 is generated using the offset layers, and the cycle-CBCT image 629 is generated without using the offset layers. The cycle-CBCT image 628 generated using the offset layers is provided to the second discriminator model 640 of the CBCT domain, while the cycle-CBCT image 629 is not provided to the second discriminator model 640.
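Using the outputs described above, the loss terms of the first portion 650 might be assembled roughly as follows. This is a sketch assuming PyTorch; the helper names, the L1 form of the cycle term, the choice of applying the cycle term to both cycle images, and the weighting factor lambda_cycle are all illustrative assumptions rather than the disclosed training procedure:

```python
import torch
from torch import nn

l1 = nn.L1Loss()
bce = nn.BCEWithLogitsLoss()

def first_portion_losses(sct_offset, sct_plain, cycle_cbct_offset, cycle_cbct_plain,
                         real_cbct, d_ct, lambda_cycle=10.0):
    # Adversarial loss: only the offset-path sCT (result 612) is shown to the CT-domain discriminator.
    adv = bce(d_ct(sct_offset), torch.ones_like(d_ct(sct_offset)))

    # Cycle-consistency loss: the cycle CBCT images should reproduce the original CBCT image 602.
    cycle = l1(cycle_cbct_offset, real_cbct) + l1(cycle_cbct_plain, real_cbct)

    return adv + lambda_cycle * cycle
```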
Referring back to fig. 6A, in an example, in a second portion 652, the second generator model 608 may be trained to receive the real CT training image 604 (which may include one of the image pairs 422) via fifth and sixth paths and to generate respective first and second sCBCT images (synthesized or simulated CBCT images) as first and second generation results 626 and 627. In particular, the real CT training image 604 may be processed by the one or more deformable offset layers 660B and the one or more convolution blocks 661B in a fifth path to produce a first sCBCT image as the first generation result 626. The real CT training image 604 may be processed by the convolution blocks 661B in the sixth path, without passing through the deformable offset layers 660B, to produce a second sCBCT image as the second generation result 627. The second generator model 608 that generates the first generation result 626 via the fifth path, using the first input interface including the convolution blocks 661B and the deformable offset layers 660B, is the same generator with offset layers used in the first portion 650, and the second generator model 608 that generates the second generation result 627 via the sixth path, using a second input interface that includes the convolution blocks 661B and no deformable offset layers 660B, is the same generator Gct2cbct used in the first portion 650.
In the fifth path, the real CT training image 604 may be processed by a first one of the deformable offset layers 660B, and the output of the first deformable offset layer is provided to a first one of the one or more convolution blocks 661B. The output of the first convolution block is then provided for processing by a second one of the deformable offset layers 660B. The output of the second deformable offset layer is then provided to another convolution block (if present) or directly to the second portion 620 of the generator model. In particular, the deformable offset layers 660B may be interleaved with the convolution blocks 661B.
In parallel with the fifth path, the real CT training image 604 may be processed in the sixth path only by the one or more convolution blocks 661B, without passing through the deformable offset layers 660B. The convolution blocks 661B through which the real CT training image 604 passes may be shared by the fifth and sixth paths. Specifically, the real CT training image 604 passes through a first and, if present, another of the one or more convolution blocks 661B in the sixth path. The resulting output is then provided to the second portion 620 of the generator model. The second portion 620 of the generator model processes the images output from the fifth and sixth paths in parallel or sequentially.
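A minimal sketch of how the two input interfaces can share their convolution blocks while differing only in the offset layers; this assumes PyTorch and builds on the DeformableOffsetLayer sketch above, and the block structure (instance normalization, ReLU, two blocks) is an assumption rather than a detail from the disclosure.

import torch.nn as nn

class DualPathInputInterface(nn.Module):
    def __init__(self, channels, num_blocks=2):
        super().__init__()
        self.conv_blocks = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                           nn.InstanceNorm2d(channels),
                           nn.ReLU(inplace=True)) for _ in range(num_blocks)])
        self.offset_layers = nn.ModuleList(
            [DeformableOffsetLayer(channels) for _ in range(num_blocks)])

    def forward(self, x, use_offsets):
        # use_offsets=True  -> offset-layer path (first/third/fifth/seventh style)
        # use_offsets=False -> offset-free path (second/fourth/sixth/eighth style)
        for offset_layer, conv_block in zip(self.offset_layers, self.conv_blocks):
            if use_offsets:
                x = offset_layer(x)
            x = conv_block(x)   # convolution blocks are shared by both paths
        return x

Because only the offset layers differ between the two calls, the convolution-block weights are updated by both paths, which is what lets the offset-free path inherit the CT-like appearance learned by the adversarially trained path.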
The first generation result 626, but not the second generation result 627, may be provided to the second discriminator model 640. The second discriminator model 640 may classify the sCBCT image as a real CBCT training image or a simulated CBCT training image and provide the classification as the detection result 642. The first generation result 626 and the detection result 642 may be fed back to the second generator model 608 and the second discriminator model 640 to adjust the weights implemented by the second generator model 608 and the second discriminator model 640. For example, the first generation result 626 (e.g., the sCBCT image generated by the second generator model 608) and the detection result 642 may be used to calculate the adversarial loss.
The first generation result 626 (e.g., an sCBCT image) may also be provided to the first generator model 606, together with the second generation result 627, via seventh and eighth paths, respectively. The first generator model 606 may receive the first generation result 626 and generate as output a corresponding cycle-CT image 624. The cycle-CT image 624 may be used to calculate a cycle consistency loss to adjust the weights of the first/second generator models 606/608. The first generator model 606 that generates the cycle-CT image via the seventh path is the same generator with offset layers used in the first portion 650, and the first generator model 606 that generates the cycle-CT image via the eighth path is the same generator Gcbct2ct used in the first portion 650.
In some examples, the "adversarial loss" may account for the classification loss of the first and second discriminator models 630, 640. The first and second discriminator models 630, 640 may classify whether a composite image has a distribution similar to that of a real image. For the cycle consistency loss, the loss between each pair of real CBCT image and cycle-CBCT image and the loss between each pair of real CT image and cycle-CT image are calculated, respectively. For example, a first loss between the CBCT training image 602 and the cycle-CBCT image 622 may be calculated, and a second loss between the real training CT image 604 and the cycle-CT image 624 may be calculated. Both the cycle-CBCT image 622 and the cycle-CT image 624 can be obtained by performing a forward cycle and a backward cycle. Each pair of real CBCT image 602 and cycle-CBCT image 622 can be in the same CBCT image domain, and each pair of real training CT image 604 and cycle-CT image 624 can be in the same CT image domain. Thus, the CycleGAN 600 can rely on the entire pool (or a majority) of real CBCT training images 602 and the entire pool (or a majority) of real training CT images 604 to produce a composite CT image (sCT image) with the deformable offset layers 660A applied (e.g., result 612), a composite CT image (sCT image) without the deformable offset layers applied (e.g., result 614), a composite CBCT image (sCBCT image) with the deformable offset layers 660B applied (e.g., result 626), a composite CBCT image without the deformable offset layers applied (e.g., result 627), the cycle-CBCT image 622, and the cycle-CT image 624. Based on the "adversarial loss" and the "cycle consistency loss," the CycleGAN 600 can produce a sharp synthetic CT image with an image resolution similar to that of a real CT image. This is at least one technical improvement over earlier methods of improving the image quality of a given CBCT image.
In some examples, the processor (e.g., of system 100) may apply image registration to register the real CT training images and the training CBCT images. This may create a one-to-one correspondence between CBCT images and CT images in the training data. This relationship may be referred to as paired CBCT and CT images. In one example, the CycleGAN 600 can generate one or more sCT images that retain exactly, or substantially, the same anatomical structure as in the corresponding CBCT image and that also have high image quality, including pixel value accuracy, similar to a real CT image. In some cases, these anatomical structures may be determined from a map 424 (fig. 4) of the anatomical region that provides a metric representing the similarity between the two images. In an example, to preserve the pixel value accuracy of the sCT image corresponding to the CBCT image (e.g., to make the CBCT image look like a real CT image), additional constraints can be added to the CycleGAN 600. These constraints may include pixel value losses between the sCT image and the real CT image and between the sCBCT image and the real CBCT image. Such constraints may be part of constraints 426 (fig. 4). The constraint that directly relates the sCT image to the corresponding CT image can be represented by the following pixel value loss term (pixel-based loss term):
sCT-CT L1 term: E_{x~p(CBCT), y~p(CT)} || Gcbct2ct(x) − y ||_1,
where x and y are a paired CBCT image and real CT image, Gcbct2ct is the first generator model 606 that includes an input interface with the convolution blocks 661A and without the deformable offset layers 660A (the CBCT-to-sCT generator Gcbct2ct), E denotes the expectation of the L1 difference over all sCT images and the corresponding real CT images, and Gcbct2ct(x) is the second sCT image. The value for E may be retrieved from storage 116, an external data source, or human input, and/or may be continuously updated as additional sCT images and sCBCT images are generated.
Another pixel value loss term may also be added as a constraint to directly associate an sCBCT image with the corresponding CBCT image, which may be represented by the following pixel value loss term:
sCBCT-CBCT L1 term: E_{x~p(CBCT), y~p(CT)} || Gct2cbct(y) − x ||_1,
where x and y are a paired CBCT image and CT image, Gct2cbct is the second generator model 608 that includes an input interface with the convolution blocks 661B and without the deformable offset layers 660B (the real-CT-to-sCBCT generator Gct2cbct), and Gct2cbct(y) is the second sCBCT image.
These two pixel value loss terms (L1 norm terms) compare the sCT image with the corresponding real CT image and the sCBCT image with the corresponding real CBCT image. By minimizing or reducing the two L1 norm terms, the pixel values of the composite image will more likely match the pixel values of the target real image. More specifically, by minimizing or reducing the sCT-CT L1 norm term, the resulting sCT image can be forced to have, or is more likely to have, pixel-level accuracy and similarity to the corresponding CT image. This may provide a technical improvement over conventional systems, since absolute pixel values in medical images may represent a specific physical measure or quantity. For example, CT pixel values contain information about electron density, which is useful for dosimetry calculations. As another example, the radiation therapy system 100 can more accurately direct radiation therapy or radiation to a target region within a subject.
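A minimal sketch (PyTorch assumed; function and variable names are hypothetical) of the two pixel-based L1 terms, with the expectation E approximated by the batch mean, as is usual in practice. Here g_cbct2ct and g_ct2cbct stand for forward passes through the offset-free paths of the two generators.

def pixel_l1_terms(x_cbct, y_ct, g_cbct2ct, g_ct2cbct):
    sct = g_cbct2ct(x_cbct)                          # second sCT image (no offset layers)
    scbct = g_ct2cbct(y_ct)                          # second sCBCT image (no offset layers)
    loss_sct_ct = (sct - y_ct).abs().mean()          # sCT-CT L1 term
    loss_scbct_cbct = (scbct - x_cbct).abs().mean()  # sCBCT-CBCT L1 term
    return loss_sct_ct, loss_scbct_cbct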
In another example, a metric such as a weight may be added to the pixel-based loss terms (e.g., to account for the case where the paired images are not perfectly aligned). Such metrics may be retrieved from constraints 426. The metric may be generated using predetermined assigned weights, a universal quality index (UQI), local cross-correlation, or any other technique that measures similarity or another relationship between two images. For purposes of this disclosure, the metric used to measure the relationship between two images is the SSIM weight, but any other metric may be used. For example, an SSIM-weighted L1 norm method may be used to compensate for imperfect matching of paired CBCT-CT images. In particular, by minimizing or reducing the sCT-CT L1 norm term, the CycleGAN 600 can enforce the constraint that the pixel values of the sCT image match the pixel values of the target CT image. If all or substantially all of the paired CBCT-CT images are perfectly aligned, the real target CT image has the same anatomy as the CBCT image. In this case, the generated sCT image also retains the same anatomical structures as the CBCT image. In some cases, however, the paired CBCT image and CT image may not be perfectly aligned (e.g., because some anatomical structures of the patient, particularly some organs or soft tissues, have changed between acquiring the CBCT image and the corresponding real CT image). In these cases, minimizing or reducing the sCT-CT and/or sCBCT-CBCT L1 loss terms may introduce errors into the trained generator and discriminator models (e.g., by forcing the generated composite CT image to match the corresponding CT image in those misaligned regions), resulting in some distortion of the generated images. One way to eliminate, suppress, or reduce these potential distortion effects is to add SSIM weights to the sCT-CT pixel-based loss term and the sCBCT-CBCT pixel-based loss term, such as:
SSIM-weighted sCT-CT L1 term: E_{x~p(CBCT), y~p(CT)} [ SSIM(x, y) · || Gcbct2ct(x) − y ||_1 ]
and
SSIM-weighted sCBCT-CBCT L1 term: E_{x~p(CBCT), y~p(CT)} [ SSIM(x, y) · || Gct2cbct(y) − x ||_1 ].
In some implementations, the weight SSIM(x, y) may be a map (e.g., anatomical region map 424) that has the same size, or substantially the same size, as the paired CBCT image and CT image. Each pixel value in SSIM(x, y) may represent the degree of similarity between the paired original CBCT image and the corresponding real CT image at the same pixel location. The pixel values of the SSIM map may be in the range of 0 to 1 (although other ranges are possible). A value of 1 may indicate complete structural similarity, meaning that the CBCT image and the corresponding real CT image are well aligned at that image location, while 0 may indicate minimal structural similarity. For example, for each pair of CBCT image and real CT image (e.g., for each pair stored in the storage 116), an SSIM map/image may be calculated (e.g., by a processor of the system 100). The SSIM map may be calculated by the computing system 110 at the time a given pair of training images is formed or after a threshold number of training images have been obtained and stored. The computing system 110 may use image modeling to determine, for each pixel, a probability that the CBCT and real CT image pair is aligned, and store the probability or update the SSIM value for the pixel with the determined probability. In some implementations, the probability value may be input by a human operator or provided by a physician. The SSIM-weighted sCT-CT pixel-based loss term may mean the following: if a pair of pixels between a paired CBCT image and real CT image has a high SSIM value (close to 1), indicating strong similarity, the CycleGAN 600 can match the sCT image with the target CT image by minimizing the SSIM-weighted L1 norm term (e.g., because the similarity between the CBCT and CT images is significant). On the other hand, if a pair of pixels between a paired CBCT image and real CT image has a low SSIM value (close to 0), the CycleGAN 600 can avoid matching the composite CT image to the target CT image when minimizing or reducing the SSIM-weighted L1 norm term (e.g., because the weight is already small, close to 0). In particular, when the weights of the map indicate a low degree of similarity between the paired CT/CBCT images, the contribution of the pixel value loss term at a given pixel location is reduced. This reduces the effect of the difference between the two images at that pixel location on the sCT. Conversely, when the weights of the map indicate a high degree of similarity between the paired CT/CBCT images, the contribution of the pixel value loss term at a given pixel location increases. This increases the effect of the difference between the two images at that pixel location on the sCT. The SSIM weights thus provide a mechanism for enforcing, on a pixel-by-pixel basis, different intensities with which the sCT image is made to match the targeted real CT image.
Similarly, the SSIM-weighted sCBCT-CBCT pixel-based loss term may mean the following: if a pair of pixels between a paired CBCT image and real CT image has a high SSIM value (close to 1), indicating strong similarity, the CycleGAN 600 can match the sCBCT image with the target CBCT image by minimizing the SSIM-weighted L1 norm term (e.g., because the similarity between the CBCT and CT images is significant). On the other hand, if a pair of pixels between a paired CBCT image and real CT image has a low SSIM value (close to 0), the CycleGAN 600 can avoid matching the composite CBCT image to the target CBCT image when minimizing or reducing the SSIM-weighted L1 norm term (e.g., because the weight is already small, close to 0). In particular, when the weights of the map indicate a low degree of similarity between the paired CT/CBCT images, the contribution of the pixel value loss term at a given pixel location is reduced. This reduces the effect of the difference between the two images at that pixel location on the sCBCT. Conversely, when the weights of the map indicate a high degree of similarity between the paired CT/CBCT images, the contribution of the pixel value loss term at a given pixel location increases. This increases the effect of the difference between the two images at that pixel location on the sCBCT. The SSIM weights thus provide a mechanism for enforcing, on a pixel-by-pixel basis, different intensities with which the sCBCT image is made to match the targeted real CBCT image.
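A sketch of one way the per-pixel SSIM map and the SSIM-weighted L1 terms could be computed (PyTorch assumed). The window size and the constants c1 and c2 follow the common SSIM formulation with intensities normalized to [0, 1]; the patent does not specify these details, so they are assumptions.

import torch.nn.functional as F

def ssim_map(x, y, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window, stride=1, padding=pad)
    var_x = F.avg_pool2d(x * x, window, stride=1, padding=pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, stride=1, padding=pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, stride=1, padding=pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).clamp(0, 1)                   # per-pixel weights in [0, 1]

def ssim_weighted_l1(x_cbct, y_ct, sct_no_offset, scbct_no_offset):
    w = ssim_map(x_cbct, y_ct)
    loss_sct_ct = (w * (sct_no_offset - y_ct).abs()).mean()
    loss_scbct_cbct = (w * (scbct_no_offset - x_cbct).abs()).mean()
    return loss_sct_ct, loss_scbct_cbct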
In another example, an SSIM-weighted L1 norm method with a threshold may be used. The threshold may be retrieved from constraints 426 (fig. 4). In particular, a threshold may be set for the SSIM weights to select high-similarity regions between the paired CBCT image and real CT image and to ignore low-similarity regions. For example, when the weight values of SSIM(x, y) are less than a threshold α (a hyperparameter), all of those weights may be set to zero (e.g., the weights may be ignored, reducing the likelihood that the loss term is reduced or minimized). For such regions, the sCT is not forced to match the CT image, since the weights are zero (information about the differences between the two images in those regions may be unreliable). Because the weight is 0, multiplying it by the difference yields 0, meaning the difference between the two images at those pixels has no effect. Instead, the process relies on the adversarial loss and the cycle consistency loss to restore those pixel regions. When the weight value is greater than or equal to the threshold α, the weight has the same effect on the pixel-based loss value as discussed above. The thresholded SSIM weight may be expressed as:
SSIM_α(x, y) = SSIM(x, y), if SSIM(x, y) ≥ α; SSIM_α(x, y) = 0, if SSIM(x, y) < α.
Since the sCT images produced by paths with deformable offset layers (e.g., the first and third paths) may include deformed structures or potentially other artificial structures, the SSIM-weighted L1 norm is placed only on sCT images produced by the generators in the paths that do not include deformable offset layers (e.g., the second and fourth paths), which strictly preserve all of the original anatomical structures in the CBCT images. Thus, the thresholded SSIM-weighted sCT-CT pixel-based L1 loss term can be expressed as:
Thresholded SSIM-weighted sCT-CT L1 term:
E_{x~p(CBCT), y~p(CT)} [ SSIM_α(x, y) · || Gcbct2ct(x) − y ||_1 ].
The thresholded SSIM-weighted sCBCT-CBCT L1 term can be expressed as:
Thresholded SSIM-weighted sCBCT-CBCT L1 term:
E_{x~p(CBCT), y~p(CT)} [ SSIM_α(x, y) · || Gct2cbct(y) − x ||_1 ].
The hyperparameter α may be set in the range between 0 and 1. In some examples, the value of the hyperparameter α may be set to 0.5 or 0.6. The additional thresholded SSIM-weighted L1 norm terms are placed only on the generators without offset layers (e.g., the generators for the second and fourth paths) and may be expressed together as:
Thresholded SSIM-weighted L1 loss:
L_SSIM = E_{x~p(CBCT), y~p(CT)} [ SSIM_α(x, y) · || Gcbct2ct(x) − y ||_1 ]
       + E_{x~p(CBCT), y~p(CT)} [ SSIM_α(x, y) · || Gct2cbct(y) − x ||_1 ].
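A sketch of the thresholding step (PyTorch assumed; alpha=0.5 is one of the example values mentioned above). Weights below alpha are zeroed so that low-similarity regions do not pull the offset-free sCT/sCBCT outputs toward a misaligned target, and the combined L_SSIM term is applied only to those offset-free outputs.

import torch

def thresholded_ssim_weights(x_cbct, y_ct, alpha=0.5):
    w = ssim_map(x_cbct, y_ct)                       # from the sketch above
    return torch.where(w >= alpha, w, torch.zeros_like(w))

def l_ssim(x_cbct, y_ct, sct_no_offset, scbct_no_offset, alpha=0.5):
    w = thresholded_ssim_weights(x_cbct, y_ct, alpha)
    return (w * (sct_no_offset - y_ct).abs()).mean() + \
           (w * (scbct_no_offset - x_cbct).abs()).mean()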
In some implementations, the CycleGAN 600 can be implemented to generate sCT images according to an objective function that includes an adversarial loss term, a cycle consistency loss term, and a pixel-based loss term (L1 norm term). The pixel-based loss term may be weighted (e.g., SSIM weighted) and/or thresholded. The adversarial loss may be determined using the first/second generator models 610/620 and the first/second discriminator models 630/640, and may be expressed as:
Adversarial loss:
L_GAN = L_GAN(G*cbct2ct, Dct) + L_GAN(G*ct2cbct, Dcbct),
where
L_GAN(G*cbct2ct, Dct) = E_{y~p(CT)} [ log Dct(y) ] + E_{x~p(CBCT)} [ log(1 − Dct(G*cbct2ct(x))) ],
L_GAN(G*ct2cbct, Dcbct) = E_{x~p(CBCT)} [ log Dcbct(x) ] + E_{y~p(CT)} [ log(1 − Dcbct(G*ct2cbct(y))) ],
and G*cbct2ct and G*ct2cbct denote the generators that include the deformable offset layers (the offset-layer counterparts of Gcbct2ct and Gct2cbct).
Here, Dct is the first discriminator model, which determines whether an image is a real CT image or an sCT image, and Dcbct is the second discriminator model, which determines whether an image is a real CBCT image or an sCBCT image.
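A sketch of the adversarial terms (PyTorch assumed; the discriminators are assumed to output raw logits). The log-likelihood form given above is approximated with binary cross-entropy over batch means; the exact GAN objective used in a deployment may differ, and only the offset-layer outputs are passed to the discriminators, per the description of the first and second portions.

import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def adversarial_losses(d_ct, d_cbct, real_ct, real_cbct, sct_offset, scbct_offset):
    def is_real(logits):
        return bce(logits, torch.ones_like(logits))
    def is_fake(logits):
        return bce(logits, torch.zeros_like(logits))
    # Generator side: offset-layer outputs should be classified as real.
    g_loss = is_real(d_ct(sct_offset)) + is_real(d_cbct(scbct_offset))
    # Discriminator side: real images as real, detached synthetic images as fake.
    d_loss = is_real(d_ct(real_ct)) + is_fake(d_ct(sct_offset.detach())) + \
             is_real(d_cbct(real_cbct)) + is_fake(d_cbct(scbct_offset.detach()))
    return g_loss, d_loss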
In some embodiments, a single-scale SSIM metric is employed to assess the structural similarity between two images (between an sCT image and the corresponding CT image). Rather than using the single-scale version of the SSIM metric, the disclosed embodiments may also employ a multi-scale version of the SSIM metric (MS-SSIM) to assess structural similarity between two images. The multi-scale version of the SSIM metric uses low-pass filtering and downsampling to obtain multiple SSIM values at different resolution or view levels.
The cycle consistency loss is applied to the generators in both directions, i.e., to the generators used in the first through eighth paths (G*cbct2ct and Gcbct2ct in the CBCT-to-CT direction, and G*ct2cbct and Gct2cbct in the CT-to-CBCT direction). Such a cycle consistency loss term can be determined using the cycle-CBCT image 622 and the cycle-CT image 624, and can be expressed as:
L_cyc = E_{x~p(CBCT)} [ || Gct2cbct(Gcbct2ct(x)) − x ||_1 ] + E_{y~p(CT)} [ || Gcbct2ct(Gct2cbct(y)) − y ||_1 ].
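A sketch of the cycle consistency term (PyTorch assumed). Per the description of the third/fourth and seventh/eighth paths, both the offset-layer and offset-free cycle outputs can be compared with the original images; whether an implementation uses both or only one of them is an assumption made here for illustration.

def cycle_consistency_loss(x_cbct, y_ct,
                           cyc_cbct_offset, cyc_cbct_plain,
                           cyc_ct_offset, cyc_ct_plain):
    l1 = lambda a, b: (a - b).abs().mean()
    return (l1(cyc_cbct_offset, x_cbct) + l1(cyc_cbct_plain, x_cbct) +
            l1(cyc_ct_offset, y_ct) + l1(cyc_ct_plain, y_ct))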
practice of the disclosureMode only to the generator
Figure BDA0003458222110000424
And
Figure BDA0003458222110000425
(e.g., generators in the first, third, fifth, and seventh paths) apply a "antagonism" penalty, and for all generators (e.g.,
Figure BDA0003458222110000426
and Gcbct2ct、Gct2cbct) A "cycle consistency" loss is applied. The effect of minimizing the "round robin consistency" penalties is to preserve the original structure and avoid unnecessary structural deformation, and the effect of minimizing the "antagonism" penalties is to learn the mapping or distribution transformation from one domain to its opposite hand domain. Unlike previous single generator approaches, two different generators are provided in each direction (e.g., two generators, one for each of the first and second paths, and two generators, one for each of the third and fourth paths). One generator is provided with an offset layer (e.g. in the first path)
Figure BDA0003458222110000427
) And one without an offset layer (e.g. G in the second stream)cbct2ct). In addition, the generator shares weights and other modules with all other layers (except those offset layers). By combining the two loss terms on separate generators, the loss terms are separated and will not compete with each other. That is, by minimizing the "circular consistency" loss term, by the generator G having no offset layer in the second pathcbct2ctThe generated sCT images retain all true anatomical structures present in the original CBCT images. At the same time, losses due to "antagonism" are placed in the generators
Figure BDA0003458222110000431
So the offset layer will be trained to accommodate all unwanted shape deformations or illusive structures that may be created, and all of themHe layer will not introduce shape distortions or other potentially illusive structures. In contrast, the generator Gcbct2ctAn sCT image is generated with the true original anatomy. Since all other layers are generated by the generator
Figure BDA0003458222110000432
And generator Gcbct2ctShared, so they are simultaneously controlled by a "round robin consistency" penalty. Based on this approach, the disclosed technique can successfully separate two distinct effects caused by the "antagonism" and "cyclic consistency" loss terms. Thus, when the shape distribution or other feature distribution in the CT image is very different from the shape distribution or other feature distribution of the original CBCT image, the generator G generates a large differencecbct2ctThe generated sCT images can still retain the true original anatomical structures present in the original CBCT images.
Accordingly, the overall objective function can be expressed as the total loss: L_total = L_GAN + λ1·L_cyc + λ2·L_SSIM, where λ1 and λ2 control the relative strengths of the respective loss terms.
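A one-line sketch of the combined objective (PyTorch assumed; the lambda values are hypothetical examples, not values given in the disclosure).

def total_loss(l_gan, l_cyc, l_ssim_term, lambda1=10.0, lambda2=5.0):
    # L_total = L_GAN + lambda1 * L_cyc + lambda2 * L_SSIM
    return l_gan + lambda1 * l_cyc + lambda2 * l_ssim_term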
The CycleGAN 600 may train the first generator 610 and the second generator 620 according to:
argmin_{G*cbct2ct, G*ct2cbct} max_{Dct, Dcbct} L_total.
This can be done using common optimization algorithms from the field of deep learning, such as stochastic gradient descent, the Adam method, or other popular methods. Once the generator G*cbct2ct is obtained, the generator Gcbct2ct is also obtained, since Gcbct2ct is a part of G*cbct2ct (the two share all layers other than the deformable offset layers).
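A sketch of a single training step using the Adam optimizer (PyTorch assumed). It reuses the loss sketches above; the generators are assumed to expose a use_offsets flag as in the DualPathInputInterface sketch, and the learning rate, betas, and loss weights are hypothetical choices, not values from the disclosure.

import torch

opt_g = torch.optim.Adam(list(g_cbct2ct.parameters()) + list(g_ct2cbct.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(list(d_ct.parameters()) + list(d_cbct.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))

def train_step(x_cbct, y_ct):
    # Forward passes with and without the deformable offset layers.
    sct_off = g_cbct2ct(x_cbct, use_offsets=True)
    sct_plain = g_cbct2ct(x_cbct, use_offsets=False)
    scbct_off = g_ct2cbct(y_ct, use_offsets=True)
    scbct_plain = g_ct2cbct(y_ct, use_offsets=False)
    cyc_cbct = g_ct2cbct(sct_off, use_offsets=True)     # backward cycle (offset path shown)
    cyc_ct = g_cbct2ct(scbct_off, use_offsets=True)     # forward cycle (offset path shown)

    g_adv, d_adv = adversarial_losses(d_ct, d_cbct, y_ct, x_cbct, sct_off, scbct_off)
    l_cyc = (cyc_cbct - x_cbct).abs().mean() + (cyc_ct - y_ct).abs().mean()
    l_pix = l_ssim(x_cbct, y_ct, sct_plain, scbct_plain)  # thresholded SSIM-weighted L1
    g_total = g_adv + 10.0 * l_cyc + 5.0 * l_pix

    opt_g.zero_grad(); g_total.backward(); opt_g.step()
    opt_d.zero_grad(); d_adv.backward(); opt_d.step()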
In some implementations, after training using the CycleGAN 600, the first generator model 606, including the input interface with the convolution blocks 661A and without the deformable offset layers 660A, may be used in the system 100 to generate sCT images from acquired CBCT images. Other components of the CycleGAN 600 may be excluded from the system 100.
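A deployment-time sketch (PyTorch assumed; the file name and the volume-loading helper are hypothetical). Only the trained offset-free generator is needed to produce an sCT image from a newly acquired CBCT image.

import torch

generator = torch.load("g_cbct2ct_no_offset.pt")          # trained first generator model 606
generator.eval()

with torch.no_grad():
    cbct = load_cbct_volume_as_tensor("patient_cbct.nii")  # hypothetical loader returning (N, C, H, W)
    sct = generator(cbct)                                   # synthetic CT output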
The foregoing examples illustrate how to train a GAN or CycleGAN based on pairs of CT images and CBCT images, in particular from image data in 2D image slices processed in multiple parallel or sequential paths. It will be appreciated that the GAN or CycleGAN may process other forms of image data (e.g., 3D or other multi-dimensional images). Further, while only grayscale (including black and white) images are depicted in the figures, it will be understood that color images may also be generated and/or processed by the GAN, as discussed in the examples below.
Fig. 7 shows a pair of CBCT input images and real CT input images used in connection with training and generating the sCT image model. In fig. 7, an image 702 shows an input CBCT image paired with a real CT image 704. Image 706 represents an image used as a map 424 of the anatomical region to provide an image similarity metric (e.g., SSIM weight), and image 708 is an image used as a map 424 of the anatomical region to provide a thresholded similarity metric (e.g., thresholded SSIM weight). After the generator of CycleGAN has been trained, a new CBCT image 712 may be received. The conventional CycleGAN can process the new CBCT image 712 and output the sCT image 714. Compared to the registered CT image 716, the conventionally generated sCT image 714 includes a large mismatch in the structures shown in the regions framed in the images 714 and 716. To improve the quality of the sCT image output of the CycleGAN, a CycleGAN or GAN model trained via multiple paths (e.g., a first path, a second path, a third path, a fourth path, a fifth path, a sixth path, a seventh path, and an eighth path) in accordance with the disclosed techniques generates a modified sCT image 718 that correctly preserves the fine structure of the CBCT image 712 that was incorrectly represented by the conventionally-produced sCT image 714.
FIG. 8 shows a flowchart of a process 800 of exemplary operations for training a generative model adapted to output sCT images from input CBCT images via multiple parallel or sequential paths. The process 800 is illustrated from the perspective of the radiation therapy system 100, which uses a GAN or CycleGAN to train and utilize a generative model as discussed in the previous examples. However, the corresponding operations may be performed by other devices or systems (including in an offline training or verification setting separate from a particular image processing workflow or medical treatment).
As shown, the first phase of the flowchart workflow begins with operations (810, 820) that establish the parameters for training and model operation. The process 800 begins with receiving (e.g., obtaining, extracting, identifying) training image data (operation 810) and receiving constraints or conditions for training (operation 820). In an example, the training image data may include image data from a plurality of human subjects relating to a particular condition, anatomical feature, or anatomical region, such as pairs of CBCT images and real CT images of the target region. Also in examples, the constraints may relate to imaging devices, treatment devices, patients, or medical considerations. In an example, the constraints can include an adversarial loss, a cycle-consistency-based loss, and a pixel-based value loss term (or a weighted pixel-based loss term and/or a thresholded weighted pixel-based loss term).
The second phase of process 800 continues with training operations, including adversarial training of the generative model and the discriminative model in a generative adversarial network (operation 830). In an example, the adversarial training includes: training the generative model to process the input CBCT images to generate first and second simulated CT images via first and second paths (the first path including an input interface that includes the deformable offset layers and the convolution blocks, and the second path including an input interface that includes the same convolution blocks as the first path but without those offset layers) (operation 842). The first simulated CT image produced via the first path is provided to the discriminative model to train the discriminative model to classify the generated simulated CT image as simulated or real training data (operation 844). Also in such adversarial training, the output of the generative model is used to train the discriminative model, and the output of the discriminative model is used to train the generative model. The first and second simulated CT images are passed to the second generator model via third and fourth paths, respectively. The third path includes an input interface that includes the deformable offset layers and the convolution blocks, and the fourth path includes an input interface that includes the same convolution blocks as the third path but without those offset layers. The third and fourth paths pass through the shared portion of the second generative model to generate cycle-CBCT images from the first and second simulated CT images, processed with and without the deformable offset layers, respectively. The cycle-CBCT images are used in the loss terms for training the generative model.
In various examples, the generative model and the discriminative model include respective convolutional neural networks (e.g., as discussed above with reference to fig. 3A and 3B, respectively). In other examples, the generative adversarial network is a cycle generative adversarial network (e.g., as discussed above with reference to fig. 6) in which multiple generative models and discriminative models are employed and the output from one generative model is provided as input to a second generative model.
The process 800 continues with outputting the generative model for use in generating sCT images (operation 850), the generative model being adapted to generate an sCT image based on an input CBCT image of a subject. The generative model may be employed in any component of the system 100 to enhance the CBCT image or perform image processing. In some implementations, the generative model may be added to the system 100 from an external source (e.g., a third party vendor).
The process 800 continues with generating sCT images based on the input CBCT images of the subject using the trained generative model (operation 860). The generative model may be employed in any component of the system 100 to enhance the CBCT image or perform image processing. In some implementations, the generative model may be added to the system 100 from an external source (e.g., a third party vendor).
The process 800 ends with a final stage of implementing an update to the generative model, including updating the generative model based on additional training data (operation 870) and outputting an updated trained generative model (operation 880). In various examples, the updates may be generated in connection with receiving additional training image data and constraints (e.g., in a manner similar to operations 810, 820) or performing additional counter training (e.g., in a manner similar to operations 830, 842, 844). In further examples, the generative model may be specifically updated based on approval, changes, or use of the sCT image (e.g., resulting from modification, verification, or change to the image data by a healthcare professional). The flowchart ends with using the updated trained generative model (operation 890), such as may be performed when using the updated generative model for a subsequent radiation therapy treatment.
As discussed above with reference to fig. 6 and 8, the generative adversarial network may be a cycle generative adversarial network that includes a generative model and a discriminative model.
As previously discussed, various electronic computing systems or devices may implement one or more of the method or functional operations as discussed herein. In one or more embodiments, the radiation therapy treatment computing system 110 may be configured, adapted or used to: controlling or operating the image-guided radiotherapy device 202; performing or implementing training or prediction operations from the model 300; operating the trained generator model 460; perform or implement data flow 500, 550; perform or implement the operations of process 800; or perform any one or more of the other methods discussed herein (e.g., as part of the image processing logic 120 and workflows 130, 140). In various embodiments, such electronic computing systems or devices operate as standalone devices or may be connected (e.g., networked) to other machines. For example, such a computing system or apparatus may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The features of a computing system or apparatus may be implemented by a Personal Computer (PC), a tablet PC, a Personal Digital Assistant (PDA), a cellular telephone, a web appliance or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine.
As also described above, the functions discussed above may be implemented by instructions, logic or other information storage on a machine-readable medium. While the machine-readable medium may have been described in various examples with reference to a single medium, the term "machine-readable medium" may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more transitory or non-transitory instructions or data structures. The term "machine-readable medium" shall also be taken to include any tangible medium that is capable of storing, encoding or carrying transitory or non-transitory instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present subject matter, or that is capable of storing, encoding or carrying data structures utilized by or associated with such instructions.
The foregoing detailed description includes references to the accompanying drawings, which form a part hereof. The drawings illustrate by way of example, and not by way of limitation, specific embodiments in which the subject matter may be practiced. These embodiments are also referred to herein as "examples. Such examples may include elements in addition to those shown or described. However, the present disclosure also contemplates examples in which only those elements shown or described are provided. Moreover, this disclosure also contemplates examples using any combination or permutation of those elements (or one or more aspects of those elements) shown or described with respect to a particular example (or one or more aspects of a particular example) or with respect to other examples (or one or more aspects of other examples) shown or described herein.
All publications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as if individually incorporated by reference. In the event of inconsistent usage between this document and those documents incorporated by reference, the usage in the incorporated reference(s) should be considered supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.
In this document, when introducing elements of aspects of the present subject matter or embodiments thereof, the terms "a," "an," "the," and "said" are used, as is common in patent documents, to include one or more than one of the elements, independent of any other instances or usages of "at least one" or "one or more." In this document, unless otherwise indicated, the term "or" is used to refer to a nonexclusive or, such that "A or B" includes "A but not B," "B but not A," and "A and B."
In the appended claims, the terms "including" and "in which" are used as the plain-English equivalents of the respective terms "comprising" and "wherein." Furthermore, in the following claims, the terms "comprising," "including," and "having" are intended to be open-ended; a claim reciting such a term may include elements in addition to those listed and still be considered to fall within the scope of that claim. Moreover, in the appended claims, the terms "first," "second," and "third," etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.
The present subject matter also relates to a computing system adapted, configured, or operated to perform the operations herein. The system may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program (e.g., instructions, code, etc.) stored in the computer. The order of carrying out or performing the operations in the embodiments of the invention illustrated and described herein is not essential, unless otherwise specified. That is, the operations may be performed in any order, unless otherwise specified, and embodiments of the invention may include additional or fewer operations than those disclosed herein. For example, it is contemplated that implementing or performing a particular operation before, concurrently with, or after another operation is within the scope of aspects of the subject matter.
In view of the above, it will be seen that the several objects of the subject matter are achieved and other advantageous results attained. Having described aspects of the inventive subject matter in detail, it will be apparent that modifications and variations are possible without departing from the scope of aspects of the inventive subject matter as defined in the appended claims. As various changes could be made in the above constructions, products, and methods without departing from the scope of aspects of the inventive subject matter, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
The examples described herein may be implemented in a wide variety of embodiments. For example, one embodiment includes a computing device that includes processing hardware (e.g., a processor or other processing circuitry) and memory hardware (e.g., a storage device or volatile memory) including instructions embodied thereon such that, when executed by the processing hardware, the instructions cause the computing device to implement, perform, or coordinate electronic operations for these techniques and system configurations. Another embodiment discussed herein includes a computer program product, such as may be implemented by a machine-readable medium or other storage device, that provides transitory or non-transitory instructions for implementing, executing, or coordinating electronic operations for these techniques and system configurations. Another embodiment discussed herein includes a method operable on processing hardware of a computing device to implement, perform, or coordinate electronic operations for these techniques and system configurations.
In other embodiments, the logic, commands, or transitory or non-transitory instructions to implement the aspects of the electronic operations described above may be provided in a distributed or centralized computing system, including any number of form factors with respect to computing systems such as desktop or notebook personal computers, mobile devices such as tablet computers, netbooks, and smart phones, client terminals, and server-hosted machine instances. Another embodiment discussed herein includes incorporating the techniques discussed herein into other forms, including programmed logic, hardware configurations, or other forms of specialized components or modules, including apparatus having corresponding means for performing the functions of such techniques. Various algorithms for implementing the functionality of such techniques may include sequences of some or all of the above described electronic operations or other aspects as depicted in the figures and the above detailed description.
The above description is intended to be illustrative, and not restrictive. For example, the above-described examples (or one or more aspects of an example) may be used in combination with each other. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the inventive subject matter without departing from its scope. While the dimensions, types and exemplary parameters, functions, and implementations of the materials described herein are intended to define the parameters of the inventive subject matter, they are by no means limiting embodiments, but rather are exemplary embodiments. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the inventive subject matter should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
Furthermore, in the foregoing detailed description, various features may be grouped together to simplify the present disclosure. This should not be construed as an intention: the features of the disclosure that are not claimed are essential to any claims. Rather, inventive subject matter may lie in less than all features of a particular disclosed embodiment. Thus the following claims are hereby incorporated into the detailed description, with each claim standing on its own as a separate embodiment. The scope of the present subject matter should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (20)

1. A computer-implemented method for generating a composite computed tomography (sCT) image from Cone Beam Computed Tomography (CBCT) images, the method comprising:
receiving a CBCT image of a subject;
generating an sCT image corresponding to the CBCT image using a generative model trained in a generative adversarial network (GAN) based on one or more deformable offset layers to process the CBCT image as input and provide the sCT image as output; and
generating a display of the sCT image for medical analysis of the subject.
2. The method of claim 1, wherein:
the generative adversarial network is configured to train the generative model using a discriminative model;
adversarial training between the discriminative model and the generative model is used to establish values applied by the generative model and the discriminative model; and
the generative model and the discriminative model include respective convolutional neural networks.
3. The method of claim 2, wherein:
the adversarial training includes:
training the generative model to generate a first sCT image from a given CBCT image by applying a first set of one or more deformable offset layers of the one or more deformable offset layers to the given CBCT image;
training the generative model to generate a second sCT image from the given CBCT image without applying the first set of one or more deformable offset layers of the one or more deformable offset layers to the given CBCT image; and
training the discriminative model to classify the first sCT image as a synthetic Computed Tomography (CT) image or a real Computed Tomography (CT) image, and an output of the generative model is used to train the discriminative model and an output of the discriminative model is used to train the generative model.
4. The method of claim 3, wherein the GAN is trained using a cycle generative adversarial network (CycleGAN) that includes the generative model and the discriminative model, wherein the generative model is a first generative model and the discriminative model is a first discriminative model, wherein the CycleGAN further comprises:
a second generative model trained to:
processing a given CT image as input;
providing as output a first composite CBCT (sCBCT) image by applying a second set of one or more deformable offset layers of the one or more deformable offset layers to the given CT image; and
providing a second composite (sCBCT) image as output without applying the second set of one or more deformable offset layers of the one or more deformable offset layers to the given CT image; and
a second discriminative model trained to classify the first synthesized sCBCT image as either a synthesized CBCT image or a true CBCT image.
5. The method of claim 4, wherein the CycleGAN includes a first portion for training the first generative model, wherein the first generative model includes first and second input interfaces and a first shared generator portion, wherein the second generative model includes third and fourth input interfaces and a second shared generator portion, the first portion trained to:
obtaining a training CBCT image paired with the real CT image;
sending the training CBCT image to an input of the first generative model via a first path and a second path to output the first sCT image and the second sCT image, respectively, the first path including a first input interface including the first set of one or more deformable offset layers of the one or more deformable offset layers and a first set of one or more convolutional layers, the second path including a second input interface including the first set of one or more convolutional layers of the one or more convolutional layers without the first set of one or more deformable offset layers;
receiving the first sCT image at an input of the first discriminant model to classify the first sCT image as the composite CT image or the real CT image; and
receiving the first sCT image and the second sCT image at an input of the second generative model via a third path and a fourth path to generate a first cyclic CBCT image and a second cyclic CBCT image, respectively, for calculating a cyclic consistency loss, the third path comprising a third input interface comprising a second set of one or more deformable offset layers of the one or more deformable offset layers and a second set of one or more convolutional layers of the one or more convolutional layers, the fourth path comprising a fourth input interface comprising the second set of one or more convolutional layers of the one or more convolutional layers without the second set of one or more deformable offset layers.
6. The method of claim 5, wherein the CycleGAN includes a second portion trained to:
sending the real CT image to an input of the second generative model via a fifth path and a sixth path to output a first composite CBCT image and a second composite CBCT image, respectively, the fifth path including a third input interface including the second set of one or more deformable offset layers of the one or more deformable offset layers and the second set of one or more convolutional layers of the one or more convolutional layers, the sixth path including a fourth input interface including the second set of one or more convolutional layers of the one or more convolutional layers without the second set of one or more deformable offset layers;
receiving the first composite CBCT image at an input of the second discriminative model to classify the first composite CBCT image as a composite CBCT image or a true CBCT image; and
receiving the first and second composite CBCT images at an input of the first generative model via a seventh path and an eighth path to generate first and second cyclic CT images for calculating a cyclic consistency loss, the seventh path comprising a first input interface comprising the first set of one or more deformable offset layers of the one or more deformable offset layers and the first set of one or more convolutional layers of the one or more convolutional layers, the eighth path comprising a second input interface comprising the first set of one or more convolutional layers of the one or more convolutional layers without the first set of one or more deformable offset layers.
7. The method of claim 6, wherein:
generating the cycle consistency loss based on a comparison of the first and second cycle CBCT images to the training CBCT image and a comparison of the first and second cycle CT images to the real CT image;
training the first generative model using the second sCT image to minimize or reduce a first pixel-based loss term representing an expectation of differences between a plurality of synthetic CT images and respective pairs of real CT images; and
training the second generative model using the second composite (sCBCT) image to minimize or reduce a second pixel-based loss term representing an expectation of differences between the plurality of composite CBCT images and the respectively paired true CBCT images.
8. The method of claim 7, wherein:
the CycleGAN is trained to apply a metric to the first and second pixel-based loss terms, the metric being generated based on a map of the same size as a pair of CBCT images and real CT images, such that each pixel value in the map represents a degree of similarity between a given CBCT image and a given real CT image paired with the given CBCT image; and
the CycleGAN is trained to apply a threshold to the metric such that when the degree of similarity exceeds the threshold, the metric is applied to the first and second pixel-based loss terms, and otherwise zero values are applied to the first and second pixel-based loss terms.
9. The method of any one of claims 1 to 8, wherein the CycleGAN is trained to apply one of a plurality of metrics to the first and second pixel-based loss terms, the metrics generated using low pass filtering and downsampling of the paired CBCT and CT images at different image resolutions or view levels.
10. The method of claim 9, wherein the one or more deformable offset layers are trained based on the antagonistic training to change an amount of sampling, introduce coordinate offsets, and resample images using interpolation to store or absorb deformed structural information between the paired CBCT and CT images.
11. A computer-implemented method for training a model to generate a composite computed tomography (sCT) image from Cone Beam Computed Tomography (CBCT) images, the method comprising:
receiving a CBCT image of a subject as an input to a generative model; and
training the generative model in a generative adversarial network (GAN) via a first path and a second path to process the CBCT image to provide first and second synthetic computed tomography (sCT) images corresponding to the CBCT image as an output of the generative model, the first path including a first set of one or more deformable offset layers of the one or more deformable offset layers and a first set of one or more convolutional layers of the one or more convolutional layers, the second path including the first set of one or more convolutional layers of the one or more convolutional layers without the first set of one or more deformable offset layers.
12. The method of claim 11, wherein the GAN is trained using a cycle generative adversarial network (CycleGAN) that includes the generative model and a discriminative model, wherein the generative model is a first generative model and the discriminative model is a first discriminative model, the method further comprising:
training a second generative model to process the generated first and second sCT images as input and provide as output first and second cyclical CBCT images via a third path and a fourth path, respectively, the third path including a second set of one or more deformable offset layers of the one or more deformable offset layers and a second set of one or more convolutional layers of the one or more convolutional layers, the fourth path including the second set of one or more convolutional layers of the one or more convolutional layers without the second set of one or more deformable offset layers; and
training a second discriminative model to classify the first cycle CBCT image as a composite CBCT image or a true CBCT image.
13. The method of claim 12, wherein the CycleGAN includes a first portion and a second portion for training the first generative model, the method further comprising:
obtaining a training CBCT image paired with the real CT image;
sending the training CBCT image to an input of the first generative model via the first path and the second path to output a first composite CT image and a second composite CT image;
receiving the first composite CT image at an input of the first discriminant model;
classifying the first composite CT image as a composite CT image or a true CT image using the first discriminative model;
receiving the first and second composite CT images at an input of the second generative model via the third and fourth paths to generate the first and second cyclic CBCT images for calculating cyclic consistency loss;
sending the real CT image to an input of the second generative model via a fifth path and a sixth path to output a first composite training CBCT image and a second composite training CBCT image, the fifth path including the second set of one or more deformable offset layers of the one or more deformable offset layers and the second set of one or more convolutional layers of the one or more convolutional layers, the sixth path including the second set of one or more convolutional layers of the one or more convolutional layers without the second set of one or more deformable offset layers;
receiving the first synthetic training CBCT image at an input of the second discriminative model;
classifying the first synthetic training CBCT image as a synthetic CBCT image or a true CBCT image using the second discriminative model;
receiving first and second composite CBCT images at an input of the first generative model via a seventh path and an eighth path to generate first and second cyclic CT images for calculating a cyclic consistency loss, the seventh path comprising the first set of one or more of the one or more deformable offset layers and the first set of one or more of the one or more convolutional layers, the eighth path comprising the first set of one or more of the one or more convolutional layers without the first set of one or more deformable offset layers;
training the first generative model using the second sCT image to minimize or reduce a first pixel-based loss term representing an expectation of differences between a plurality of synthetic CT images and respective pairs of real CT images; and
training the second generative model using a second composite CBCT (sCBCT) image to minimize or reduce a second pixel-based loss term, the second pixel-based loss term representing an expectation of differences between the plurality of composite CBCT images and respectively paired real CBCT images.
14. A system for generating a composite computed tomography (sCT) image from Cone Beam Computed Tomography (CBCT) images, the system comprising:
processing circuitry comprising at least one processor; and
a storage medium comprising instructions that, when executed by the at least one processor, cause the processor to perform operations comprising:
receiving a CBCT image of a subject;
generating an sCT image corresponding to the CBCT image using a generative model trained in a generative adversarial network (GAN) based on one or more deformable offset layers to process the CBCT image as input and provide the sCT image as output; and
generating a display of the sCT image for medical analysis of the subject.
15. The system of claim 14, wherein:
the one or more deformable offset layers is a first set of one or more deformable offset layers;
the generative adversarial network is configured to train the generative model using a discriminative model;
the values applied by the generative model and the discriminative model are established using adversarial training between the discriminative model and the generative model; and
the generative model and the discriminative model comprise respective convolutional neural networks; wherein
the adversarial training includes:
training the generative model to generate a first sCT image from a given CBCT image by applying the first set of one or more deformable offset layers of the one or more deformable offset layers to the given CBCT image,
training the generative model to generate a second sCT image from the given CBCT image without applying the first set of one or more deformable offset layers of the one or more deformable offset layers to the given CBCT image, and
training the discriminative model to classify the first sCT image as a composite Computed Tomography (CT) image or a real CT image, and
wherein the output of the generative model is used to train the discriminative model, and the output of the discriminative model is used to train the generative model.
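A compact sketch of one adversarial training iteration in the spirit of the claim above, assuming PyTorch, a binary cross-entropy objective (an assumption, not stated in the patent), and the hypothetical generator interface g(x, use_offsets=...) used in the other sketches:

```python
import torch
import torch.nn.functional as F

def adversarial_step(g, d, opt_g, opt_d, cbct, real_ct):
    """One adversarial update following the two-path scheme: the first sCT
    image uses the deformable offset layers, the second does not
    (illustrative sketch; losses and optimizers are assumptions)."""
    sct_with_offsets = g(cbct, use_offsets=True)    # first sCT image
    # (The second sCT image, generated with use_offsets=False, is the one
    #  consumed by the pixel-based loss term sketched earlier.)

    # Discriminator update: real CT labeled real, first sCT labeled synthetic.
    d_real = d(real_ct)
    d_fake = d(sct_with_offsets.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator update: the discriminator's output is used to train the generator.
    d_fake_for_g = d(sct_with_offsets)
    loss_g = F.binary_cross_entropy_with_logits(d_fake_for_g,
                                                torch.ones_like(d_fake_for_g))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```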
16. The system of claim 15, wherein the GAN is trained using a cycle generative adversarial network (CycleGAN) that includes the generative model and the discriminative model, wherein the generative model is a first generative model and the discriminative model is a first discriminative model, wherein the CycleGAN further comprises:
a second generative model trained to:
processing a given CT image as input;
providing as output a first composite (sCBCT) image by applying a second set of one or more deformable offset layers of the one or more deformable offset layers to the given CT image; and
providing a second composite (sCBCT) image as output without applying the second set of one or more deformable offset layers of the one or more deformable offset layers to the given CT image; and
a second discriminative model trained to classify the first sCBCT image as either a synthetic CBCT image or a true CBCT image.
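For orientation only, a dual-path generator of the kind recited in claims 15 and 16 might be wrapped as follows (PyTorch sketch; the layer counts and the offset_layers argument are assumptions, and one possible offset layer is sketched after claim 20). The same wrapper could serve as either the first (CBCT to sCT) or the second (CT to sCBCT) generative model; only the direction of the training data differs.

```python
import torch.nn as nn

class DualPathGenerator(nn.Module):
    """Illustrative generator with a shared stack of convolutional layers,
    optionally preceded by deformable offset layers (assumed structure,
    not the patented architecture)."""
    def __init__(self, channels=1, features=64, offset_layers=None):
        super().__init__()
        self.offset_layers = offset_layers if offset_layers is not None else nn.Identity()
        self.conv_layers = nn.Sequential(
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, channels, 3, padding=1),
        )

    def forward(self, x, use_offsets=True):
        if use_offsets:
            x = self.offset_layers(x)   # path with the deformable offset layers
        return self.conv_layers(x)      # path with the convolutional layers only
```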
17. The system of claim 16, wherein the CycleGAN includes a first portion for training the first generative model, the first portion being trained to:
obtaining a training CBCT image paired with the real CT image;
sending the training CBCT image to an input of the first generative model via a first path and a second path to output the first sCT image and the second sCT image, respectively, the first path including the first set of one or more deformable offset layers of the one or more deformable offset layers and a first set of one or more convolutional layers of the one or more convolutional layers, the second path including the first set of one or more convolutional layers of the one or more convolutional layers without the first set of one or more deformable offset layers;
receiving the first sCT image at an input of the first discriminative model to classify the first sCT image as the synthesized CT image or the real CT image; and
receiving the first sCT image and the second sCT image at an input of the second generative model via a third path and a fourth path to generate a first cyclic CBCT image and a second cyclic CBCT image, respectively, for calculating a cyclic consistency loss, the third path including the second set of one or more deformable offset layers of the one or more deformable offset layers and the second set of one or more convolutional layers of the one or more convolutional layers, the fourth path including the second set of one or more convolutional layers of the one or more convolutional layers without the second set of one or more deformable offset layers.
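Sticking with the hypothetical interfaces from the earlier sketches, the first portion described above could be exercised as below; the second portion of claim 18 is the mirror image with the two generators' roles swapped.

```python
import torch.nn.functional as F

def first_portion(g_cbct2ct, g_ct2cbct, d_ct, train_cbct):
    """CBCT -> two sCT images -> two cycle CBCT images (illustrative only)."""
    sct_1 = g_cbct2ct(train_cbct, use_offsets=True)     # first path
    sct_2 = g_cbct2ct(train_cbct, use_offsets=False)    # second path

    d_out = d_ct(sct_1)  # first sCT image classified as synthetic or real CT

    cycle_cbct_1 = g_ct2cbct(sct_1, use_offsets=True)   # third path
    cycle_cbct_2 = g_ct2cbct(sct_2, use_offsets=False)  # fourth path

    cycle_loss = (F.l1_loss(cycle_cbct_1, train_cbct)
                  + F.l1_loss(cycle_cbct_2, train_cbct))
    return d_out, cycle_loss
```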
18. The system of claim 17, wherein the CycleGAN includes a second portion trained to:
sending the real CT image to an input of the second generative model via a fifth path and a sixth path to output a first composite CBCT image and a second composite CBCT image, respectively, the fifth path including the second set of one or more deformable offset layers of the one or more deformable offset layers and the second set of one or more convolutional layers of the one or more convolutional layers, the sixth path including the second set of one or more convolutional layers of the one or more convolutional layers without the second set of one or more deformable offset layers;
receiving the first composite CBCT image at an input of the second discriminative model to classify the first composite CBCT image as a composite CBCT image or a true CBCT image; and
receiving the first and second composite CBCT images at an input of the first generative model via a seventh path and an eighth path to generate first and second cyclic CT images for calculating a cyclic consistency loss, the seventh path comprising the first set of one or more deformable offset layers of the one or more deformable offset layers and the first set of one or more convolutional layers of the one or more convolutional layers, the eighth path comprising the first set of one or more convolutional layers of the one or more convolutional layers without the first set of one or more deformable offset layers.
19. The system of claim 18, wherein:
the cycle consistency loss is generated based on a comparison of the first and second cycle CBCT images to the training CBCT image and a comparison of the first and second cycle CT images to the real CT image;
training the first generative model using the second sCT image to minimize or reduce a first pixel-based loss term representing an expectation of differences between a plurality of synthetic CT images and respective pairs of real CT images; and
training the second generative model using the second composite (sCBCT) image to minimize or reduce a second pixel-based loss term representing an expectation of differences between the plurality of composite CBCT images and the respectively paired true CBCT images.
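One plausible way to assemble the terms named in claims 14 to 19 into a single training objective is sketched below; the weighting is illustrative and is not taken from the patent.

```python
def total_generator_objective(adv_ct, adv_cbct, cycle_loss,
                              pixel_ct, pixel_cbct,
                              lambda_cycle=10.0, lambda_pixel=1.0):
    """Sum of the adversarial terms, the cycle consistency loss, and the two
    pixel-based loss terms (weights are assumptions)."""
    return (adv_ct + adv_cbct
            + lambda_cycle * cycle_loss
            + lambda_pixel * (pixel_ct + pixel_cbct))
```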
20. The system of any one of claims 14 to 19, wherein the one or more deformable offset layers comprise at least one of: one or more modules trained by adversarial training to change sampling amounts, introduce coordinate offsets, and resample the image using interpolation, one or more spatial transformers, one or more convolution layers, or one or more modules configured to store deformed structural information of the image.
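Claim 20 lists spatial transformers and interpolation-based resampling driven by coordinate offsets as possible realizations of a deformable offset layer. A minimal spatial-transformer-style sketch in PyTorch (an assumption made for illustration, not the patented layer) is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableOffsetLayer(nn.Module):
    """One possible deformable offset layer: a small convolution predicts
    per-pixel coordinate offsets, and the image is resampled with bilinear
    interpolation (illustrative sketch only)."""
    def __init__(self, channels):
        super().__init__()
        self.offset_pred = nn.Conv2d(channels, 2, kernel_size=3, padding=1)
        nn.init.zeros_(self.offset_pred.weight)   # start from the identity warp
        nn.init.zeros_(self.offset_pred.bias)

    def forward(self, x):
        n, _, h, w = x.shape
        # Identity sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=x.device),
                                torch.linspace(-1, 1, w, device=x.device),
                                indexing="ij")
        base_grid = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
        # Learned coordinate offsets, reshaped from (N, 2, H, W) to (N, H, W, 2).
        offsets = self.offset_pred(x).permute(0, 2, 3, 1)
        return F.grid_sample(x, base_grid + offsets, mode="bilinear",
                             align_corners=True)
```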
CN201980098267.XA 2019-06-06 2019-06-27 SCT image generation using cycleGAN with deformable layers Pending CN114072845A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201962858156P 2019-06-06 2019-06-06
US62/858,156 2019-06-06
PCT/US2019/039538 WO2020246996A1 (en) 2019-06-06 2019-06-27 Sct image generation using cyclegan with deformable layers

Publications (1)

Publication Number Publication Date
CN114072845A true CN114072845A (en) 2022-02-18

Family

ID=67303525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980098267.XA Pending CN114072845A (en) 2019-06-06 2019-06-27 SCT image generation using cycleGAN with deformable layers

Country Status (6)

Country Link
US (1) US20220318956A1 (en)
EP (1) EP3980972A1 (en)
JP (1) JP7245364B2 (en)
CN (1) CN114072845A (en)
AU (1) AU2019449137B2 (en)
WO (1) WO2020246996A1 (en)


Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023503355A (en) * 2019-11-27 2023-01-27 シンハ,パーヴェル Systems and methods for performing direct conversion of image sensor data to image analysis
CN113498526A (en) * 2020-02-05 2021-10-12 谷歌有限责任公司 Image transformation using interpretable transformation parameters
KR102159052B1 (en) * 2020-05-12 2020-09-23 주식회사 폴라리스쓰리디 Method and apparatus for classifying image
WO2022120661A1 (en) * 2020-12-09 2022-06-16 深圳先进技术研究院 Priori-guided network for multi-task medical image synthesis
CN112561782B (en) * 2020-12-15 2023-01-03 哈尔滨工程大学 Method for improving reality degree of simulation picture of offshore scene
CN113140019B (en) * 2021-05-13 2022-05-31 电子科技大学 Method for generating text-generated image of confrontation network based on fusion compensation
EP4095796A1 (en) 2021-05-29 2022-11-30 Bayer AG Machine learning in the field of radiology with contrast agent
CN113487657B (en) * 2021-07-29 2022-02-01 广州柏视医疗科技有限公司 Deep learning-based mode conversion method
US11900534B2 (en) * 2021-07-30 2024-02-13 The Boeing Company Systems and methods for synthetic image generation
US11651554B2 (en) * 2021-07-30 2023-05-16 The Boeing Company Systems and methods for synthetic image generation
US11775606B2 (en) * 2021-10-29 2023-10-03 Weld North Education LLC Inter-browser presentation control
CN116071401B (en) * 2023-01-28 2023-08-01 中日友好医院(中日友好临床医学研究所) Virtual CT image generation method and device based on deep learning
CN116523983B (en) * 2023-06-26 2023-10-27 华南师范大学 Pancreas CT image registration method integrating multipath characteristics and organ morphology guidance
CN117078612A (en) * 2023-08-09 2023-11-17 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) CBCT image-based rapid three-dimensional dose verification method and device
CN117934869A (en) * 2024-03-22 2024-04-26 中铁大桥局集团有限公司 Target detection method, system, computing device and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3509696A1 (en) * 2016-09-06 2019-07-17 Elekta, Inc. Neural network for generating synthetic medical images
CN111684492B (en) 2017-06-26 2024-03-15 医科达有限公司 Method for improving cone beam CT image quality using deep convolutional neural network
US10672164B2 (en) 2017-10-16 2020-06-02 Adobe Inc. Predicting patch displacement maps using a neural network

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820730A (en) * 2022-03-08 2022-07-29 安徽慧软科技有限公司 CT and CBCT registration method based on pseudo CT
CN114820730B (en) * 2022-03-08 2023-04-07 安徽慧软科技有限公司 CT and CBCT registration method based on pseudo CT
CN115170424A (en) * 2022-07-07 2022-10-11 北京安德医智科技有限公司 Heart ultrasonic image artifact removing method and device
CN116503505A (en) * 2023-06-20 2023-07-28 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) Artifact removal method, device, equipment and medium for CBCT image
CN116503505B (en) * 2023-06-20 2024-04-05 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) Artifact removal method, device, equipment and medium for CBCT image
CN116630466A (en) * 2023-07-26 2023-08-22 济南大学 Spine CT-MR conversion method and system based on generation antagonism network
CN116630466B (en) * 2023-07-26 2023-10-24 济南大学 Spine CT-MR conversion method and system based on generation antagonism network

Also Published As

Publication number Publication date
WO2020246996A1 (en) 2020-12-10
AU2019449137A1 (en) 2022-02-03
JP2022536107A (en) 2022-08-12
US20220318956A1 (en) 2022-10-06
EP3980972A1 (en) 2022-04-13
AU2019449137B2 (en) 2023-03-02
JP7245364B2 (en) 2023-03-23

Similar Documents

Publication Publication Date Title
CN112204620B (en) Image enhancement using a generative adversarial network
JP7245364B2 (en) sCT Imaging Using CycleGAN with Deformable Layers
US11547874B2 (en) Machine learning approach to real-time patient motion monitoring
US11491348B2 (en) Real-time patient motion monitoring using a magnetic resonance linear accelerator (MRLINAC)
CN112041026B (en) Method and system for generating a radiation therapy dose distribution
CN111727458B (en) Atlas-based segmentation using deep learning
US10762398B2 (en) Modality-agnostic method for medical image representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination