WO2024081596A1

WO2024081596A1 - Identification and characterization of gene fusions by CRISPR-targeted nanopore sequencing

Info

Publication number: WO2024081596A1
Application number: PCT/US2023/076391
Authority: WO
Inventors: Giwon Shin; Hanlee P. Ji
Original assignee: The Board Of Trustees Of The Leland Stanford Junior University
Priority date: 2022-10-10
Filing date: 2023-10-09
Publication date: 2024-04-18

Abstract

Provided herein is a method that involves lysing cells that have a fusion between a first gene and a second gene at an end of an electrophoresis gel, applying a voltage potential to the gel to trap intact genomic DNA at one end of the gel, digesting the trapped genomic DNA using two or more pairs of RNA-guided endonucleases to release segments, electrophoresing the segments, eluting the segments into different fractions, and analyzing the sequences of the nucleic acid collected in the fractions to identify a fraction that contains the segments of the first and second genes and a fraction that contains the segment of the gene fusion.

Description

S22-267

IDENTIFICATION AND CHARACTERIZATION OF GENE FUSIONS BY CRISPR-TARGETED NANOPORE SEQUENCING

GOVERNMENT RIGHTS

This invention was made with Government support under contracts CA247700, HG006137, and HG010963 awarded by the National Institutes of Health. The Government has certain rights in the invention.

CROSS-REFERENCING

This application claims the benefit of U.S. provisional application serial no. 63/414,889, filed on October 10, 2022, which application is incorporated by reference herein.

INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS A SEQUENCE LISTING XML FILE

A Sequence Listing is provided herewith as a Sequence Listing XML, “STAN-2002WO_SEQ_LIST”, created on October 9, 2023 and having a size of 13,404 bytes. The contents of the Sequence Listing XML are incorporated herein by reference in their entirety.

BACKGROUND

Genomic instability is the increased frequency of DNA mutations and is characteristic of nearly all human cancer types. This instability can manifest in somatic chromosomes in many ways, ranging from single base-pair alterations to chromosomal translocations and deletions. Increased mutation rate due to genomic instability drives cancer progression through the silencing of tumor suppressor genes and activation of proto-oncogenes. For this reason, genomic instability is a hallmark of cancer. However, genomic instability is not required for cancer development, as can be observed in mismatch repair deficient colorectal tumors and colorectal adenomas. Still, 90% of all cancer types feature alterations in chromosome number and structure, with 16.5% possessing a driving fusion mutation. Leukemias commonly have at least one significant chromosomal variation. Understanding and identifying the specific genomic aberrations in cancer types can be used to more accurately provide diagnoses, prognoses, and targeted therapies.
In acute myeloid leukemia (AML), certain chromosomal translocations are associated with a good prognosis, whereas some chromosomal deletions or duplications are indicative of an aggressive cancer with a poor prognosis. The ability to determine the specific mutations in cancer provides significant clinical utility and allows for the precise delivery of targeted therapies. Chromosomal rearrangement due to genetic instability can result in the fusion of genes. The resulting proteins from these gene fusions have altered functionality and can drive cancer progression, such as the BCR-ABL fusion protein observed in chronic myeloid leukemia (CML). This protein has increased tyrosine kinase activity as compared to its normally functioning counterpart. Patients with the BCR-ABL gene fusion can now be treated with specific tyrosine kinase inhibitors. The identification of this gene fusion and resulting protein allows for targeted therapy and improved prognosis of CML patients. As manageable as this example seems, the variable nature of genetic instability can produce far more complex cases. An example of such complexity is illustrated through KMT2A gene fusions. The KMT2A gene consists of 36 exons and is located on chromosome 11 in the q23 position. The gene encodes a protein with an H3K4 methyltransferase domain that plays a critical role in the regulation of gene expression in early development and hematopoiesis. KMT2A gene fusions are among the most common chromosomal abnormalities in acute leukemias, occurring in 80% of infant acute lymphoblastic leukemias (ALL), 5% of acute myeloid leukemia (AML) cases, and in 85% of secondary AML cases in patients previously treated with topoisomerase II inhibitors. KMT2A has 135 reported fusion partners, some of which are pathogenic and some of which are not. Regardless of the pathogenicity, it is well established that KMT2A gene fusions are drivers of acute leukemia.
The most frequent partner genes in KMT2A gene fusions are AFF1, MLLT3, MLLT10, ELL, and AFDN. These partner genes make up 80% of the gene fusions in KMT2A-positive acute leukemias. The specific fusion partner contributes to the determination of either the myeloid or lymphoblastic disease phenotype, and the nature of the rearrangement can be used to predict prognosis. The KMT2A gene has a multitude of breakpoint regions in several different exons. The majority of KMT2A breakpoints are localized to a breakpoint cluster region between exon 7 and exon 14, but there is more than one cluster region, and their combined range is greater than 22 kb. Cytogenetic analysis can be used to identify chromosomal abnormalities through karyotyping and fluorescence in-situ hybridization (FISH). These methods provide visual information on chromosome structure and some specific variations, but they are limited by low resolution. In addition, these cytogenetic methods cannot provide information on i) the breakpoints with base pair resolution, and ii) whether the fusion gene is transcriptionally active. A more common method to detect and characterize fusion genes is reverse transcription polymerase chain reaction (RT-PCR), which uses transcribed mRNA to determine the fusion gene. This method is highly tedious and requires prior cytogenetic analysis to confirm the results. RNA-sequencing can also be used for whole transcriptome analysis by sequencing chimeric gene fusions through mRNA expression. This method is not without fault, as it is also highly complex and returns a considerable number of false positives and negatives. A recent report showed that RNA-seq failed to detect 10% of the gene fusions that were determined by routine cytogenetics methods with high confidence. The false negative cases generally coincided with a low sequencing coverage for the transcribed fusion genes, which was likely due to instability of RNA molecules and/or low expression of the fusion gene.
To avoid the false positives related to the stability issues and the dynamics of the gene expression, sequencing gene fusions at the gene level could, in theory, be a solution. However, the size of a fusion gene can be greater than 100 kb, making it very difficult to target. When focusing only on the breakpoint regions, the size of the target can be smaller. For example, the KMT2A gene has two major breakpoint cluster regions, whose combined length is less than 30 kb. However, even with this information, designing an assay for determining the breakpoint is still challenging because i) although less frequent, the breakpoint can be located outside of these cluster regions and ii) the range is still larger than the size that is amplifiable with routine PCR. These issues can generate false positive results. Furthermore, DNA-based fusion detection does not generally provide information about whether the fusion gene is transcriptionally active. By way of example, acute leukemia is a complex disease with varying prognoses depending on specific chromosomal abnormalities. In particular, certain chromosomal translocations are associated with a favorable prognosis, while other abnormalities, such as deletions or duplications, indicate a more aggressive course and a poor prognosis. One of the most common chromosomal abnormalities in acute leukemias is the fusion of the KMT2A gene with various partner genes. However, characterizing these KMT2A gene fusions has been challenging due to their complexity. There are numerous reported partner genes, and the participating genes have multiple breakpoint regions spanning different exons. This complexity has hindered the genetic and epigenetic characterization of these gene fusions using existing methods. Consequently, the relationship between the genetic and epigenetic variations of gene fusions and the cellular physiology of leukemia remains unclear. The present method is believed to solve this problem.
SUMMARY

This disclosure provides a way to characterize gene fusions. In the example, KMT2A gene fusions are characterized in two types of acute leukemia (lymphocytic and myelocytic) from both genetic and epigenetic perspectives. A targeted long-read nanopore sequencing approach is used to analyze gene fusions between KMT2A and five frequently observed partner genes. In this targeted approach, the fusion and wild-type genes are physically separated. The method provides a way to identify the exact breakpoints with base pair resolution, determine phased variants by assembling complete contigs of both wild-type and fusion alleles, and examine epigenetic changes. The epigenetic information obtained potentially allows one to predict the expression activity of the fusion and wild-type genes. In some embodiments, the method may comprise: (a) lysing cells that have a fusion between a first gene and a second gene at an end of an electrophoresis gel to release the genomic DNA from the cells; (b) applying a voltage potential to the gel, thereby trapping intact genomic DNA at one end of the gel; (c) digesting the trapped genomic DNA using two or more pairs of RNA-guided endonucleases to release: (i) a segment of the first gene; (ii) a segment of the second gene; and (iii) a segment of a gene fusion between the first and second genes; wherein the segments of the first and second genes are approximately the same size and the segment of the gene fusion is resolvable from the segments of the first and second genes in the gel; (d) electrophoresing the segments of (i), (ii) and (iii) through the gel, thereby separating the segment of (iii) from the segments of (i) and (ii); (e) eluting the segments into different fractions by applying a second voltage potential to the gel, wherein the second voltage potential is orthogonal to the potential of (a), wherein the segment of (iii) is eluted into a fraction that is different to the fraction into which the segments of (i) and (ii) are 
eluted; and (f) assaying for the flanking sequences of the first and second genes in the fractions collected in (e), thereby identifying a fraction that contains the segments of the first and second genes and a fraction that contains the segment of the gene fusion.

BRIEF DESCRIPTION OF THE FIGURES

The skilled artisan will understand that the drawings, described below, are for illustration purposes only. The drawings are not intended to limit the scope of the present teachings in any way.

Figs. 1A and 1B. Limitations in fusion gene characterization using targeted long-read sequencing (Fig. 1A), and automated and streamlined separation of targets from genomic DNA and between wild-type and structural variant (Fig. 1B). Simple qPCR analysis for quantification of target HMW fragments in elution modules indicates the existence of a fusion gene even before the downstream sequencing step.

Figs. 2A and 2B. TaqMan qPCR copy number analysis for detecting a size-shifted fusion gene. Fig. 2A: Illustration of wild-type and fusion genes in CATCH assay products. The sizes of the HMW assay products are provided, and red arrowheads indicate CRISPR target sites. Dumbbells with bent and circled ends indicate TaqMan probes binding to different locations of target HMW fragments. The probes are color-coded in the illustration, and some assay probes bind to both wild-type and fusion fragments because they target shared sequences. Fig. 2B: Copy numbers measured by each TaqMan assay probe. For each elution module, copy numbers of both target and control (RNaseP) are shown. The elution modules are labeled with the expected length of the HMW DNA fragments collected in them.

Figs. 3A and 3B. ONT sequencing read alignment for identification of fusion breakpoints (Fig. 3A) and phased variants (Fig. 3B). Bars representing the sequence context (wild-type or fusion genes with flanking sequences) are shown at the top of the sequence pile-ups.
For the fusion gene sequences, pile-ups for two different genomic locations are shown with solid lines marking the breakpoint locations. Zoom-ins for examples of phased variants are also shown.

Figs. 4A and 4B. Differential DNA modification in KMT2A promoter CpG. Sequence pile-ups and read depths are shown with information about CpG modifications (Fig. 4A). Red bars indicate 5-methylcytosine (5mC), and blue bars indicate 5-hydroxymethylcytosine (5hmC). A depletion of 5mC in both the wild-type and fusion genes was observed. Fig. 4B illustrates graphs showing this depletion of 5mC in both the wild-type and fusion genes.

Fig. 5. Flow chart illustrating a guide RNA design process.

Fig. 6. Two gRNA designs for the KMT2A gene. The two designs generate HMW fragments in which the location of the KMT2A gene is different.

Fig. 7. Detection of fusion genes resulting from reciprocal translocation. In the right panels, illustrations of wild-type and fusion genes in CATCH assay products are shown with breakpoint cluster regions as well as the expected length of the contributing parts of the fusion genes. Two TaqMan copy number assay probes are shown with their target locations in the assay products and the copy number data for individual elution modules. Two fusion fragments were detected, which are longer or shorter than the wild-type fragments. Dotted guide lines indicate the target HMW fragments collected in the corresponding elution modules.

Fig. 8. Comparison of ligation- and tagmentation-based ONT library preparations. (Upper panels) For a genomic region including the KMT2A gene, binned sequencing read depth (bin size: 5 kb) is shown for different library preparations (ligation versus tagmentation) and CATCH assays (300-kb versus 1-Mb assays). Blue color indicates alignments to the forward strand, and red color indicates alignments to the reverse strand. (Lower panels) Box plots are shown for location-specific read depth for all six targets in the multiplex CATCH assay.
Mean coverage for 15-kb regions at the 5’-end, middle, and 3’-end of each gene is used to compare uniformity of sequence coverage.

Fig. 9. Table of gRNA sequences. From top to bottom: SEQ ID NOS: 1-14.

DETAILED DESCRIPTION

Unless defined otherwise herein, all technical and scientific terms used in this specification have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described. All patents and publications, including all sequences disclosed within such patents and publications, referred to herein are expressly incorporated by reference. Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively. The headings provided herein are not limitations of the various aspects or embodiments of the invention. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton, et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY, 2D ED., John Wiley and Sons, New York (1994), and Hale & Markham, THE HARPER COLLINS DICTIONARY OF BIOLOGY, Harper Perennial, N.Y. (1991) provide one of ordinary skill in the art with the general meaning of many of the terms used herein. Still, certain terms are defined below for the sake of clarity and ease of reference. Other definitions of terms may appear throughout the specification.
In some embodiments, the method may comprise lysing cells that have a fusion between a first gene and a second gene at an end of an electrophoresis gel to release the genomic DNA from the cells. Methods for lysing cells are well known. In any embodiment, the cells may be blood cells (e.g., PBMCs), cultured cells, or a dissociated tissue cell suspension, for example, although a biopsy could be used. Several gene fusions that are thought to cause cancer have already been identified and may be targeted using the present method. As such, in some embodiments, the first region to which the forward primers bind may be selected from the group consisting of ROS1, ALK, EML4, BCR, ABL, TCF3, PBX1, ETV6, RUNX1, MLL, AF4, SIL, TAL1, RET, NTRK1, PAX8, PPARG, MECT1, MAML2, TFE3, TFEB, BRD4, NUT, ETV6, NTRK3, TMPRSS2, NKRT2, KMT2A and ERG. In some embodiments, the cells may have a fusion involving ALK, RET, KMT2A, NTRK1, ROS1, BRAF, EGFR, NRG1 or MET. Possible fusion partners for these genes are numerous. For example, if the fusion involves the ALK gene, then the fusion partner may be EML4, STRN, KIF5B and/or TFG. Likewise, if the fusion involves the ROS1 gene, then the fusion partner may be CD74, SLC34A2, SDC4, TPM3 and/or EZR. In some embodiments, the fusion-specific primers may target any one or more of the following fusions: CD74-ROS1, SLC34A2-ROS1, SDC4-ROS1, EZR-ROS1, GOPC-ROS1, LRIG3-ROS1, TPM3-ROS1, PPFIBP1-ROS1, EML4-ALK, BCR-ABL, TCF3-PBX1, ETV6-RUNX1, MLL-AF4, SIL-TAL1, RET-NTRK1, PAX8-PPARG, MECT1-MAML2, TFE3-TFEB, BRD4-NUT, ETV6-NTRK3, TMPRSS2-ERG, TPM3-NTRK1, SQSTM1-NTRK1, CD74-NTRK1, MPRIP-NTRK1 and TRIM24-NTRK2. The method finds particular utility in analyzing blood cells. In these embodiments, the patient from which the blood cells were obtained may have a blood cancer such as acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), or mixed-phenotype acute leukemia (MPAL).
Gene fusions found in AML include RUNX1-RUNX1T1, PML-RARA, ZNF292-PNRC1, NUP98-NSD1, CBFB-MYH11, KMT2A-MLLT4, KMT2A-MLLT3, KMT2A-MLLT10, and DEK-NUP214. Gene fusions found in ALL include BCR-ABL1, ETV6-RUNX1, EP300-ZNF384, TCF3-PBX1, KMT2A-AFF1, MEF2D-BCL9, STIL-TAL1, TCF3-HLF, ZNF292-PNRC1, EBF1-PDGFRB, PAX5-NOL4L, PICALM-MLLT10, and TCF3-ZNF384. Gene fusions found in MPAL patients include BCR-ABL1, ETV6-ARNT, ETV6-NCOA2, ETV6-LOH12CR1, PICALM-MLLT10, NAP1L1-MLLT10, RUNX1-MECOM, TRA2B-MECOM, SET-NUP214 and KMT2A-MLLT4. KMT2A fusions are relatively common in ALL and AML. This gene has more than 100 fusion partners. Gene fusions involving KMT2A include KMT2A-MLLT4, KMT2A-MLLT3, KMT2A-MLLT10, KMT2A-AFF1, KMT2A-MLLT1, KMT2A-ELL, KMT2A-MLLT6, KMT2A-USP2, KMT2A-MAML2, KMT2A-MLLT11, KMT2A-MYO1F, KMT2A-SEPT5, KMT2A-SEPT6, and KMT2A-CARS, among others. The gene fusion may involve a kinase, a transcription factor or an epigenetic gene (a chromatin modifier), for example. Next, the method may involve applying a voltage potential to the gel, thereby trapping intact genomic DNA at one end of the gel, and digesting the trapped genomic DNA using two or more pairs of RNA-guided endonucleases to release: a segment of the first gene; a segment of the second gene (i.e., a ‘fusion partner’ for the first gene); and a segment of a gene fusion between the first and second genes. In these embodiments, the segments of the first and second genes are approximately the same size (e.g., the segments have a size difference of less than 100 kb, less than 50 kb, or less than 20 kb) and the segment of the gene fusion is resolvable from the segments of the first and second genes in the gel. In these embodiments, the segments may be 100 kb to 1 Mb in length and the segment of the gene fusion is smaller or larger than the other segments by at least 50 kb, at least 100 kb, or at least 200 kb.
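The size constraints described above can be checked numerically when cut sites are being chosen. The following is a minimal sketch, not part of the disclosed method: the function names, the coordinate convention (breakpoint offsets measured in kb from each segment's 5' cut site) and the default thresholds (50 kb, taken from the ranges listed above) are assumptions made for illustration only.

```python
def fusion_segment_length(bp_offset_a_kb, len_b_kb, bp_offset_b_kb):
    """Length of the excised fusion segment (hypothetical model).

    The fusion segment is assumed to consist of the 5' part of segment A
    up to its breakpoint plus the 3' part of segment B after its breakpoint.
    """
    return bp_offset_a_kb + (len_b_kb - bp_offset_b_kb)


def is_resolvable(len_a_kb, len_b_kb, fusion_kb,
                  max_wt_diff_kb=50, min_shift_kb=50):
    """True if the wild-type segments are similar in size and the fusion
    segment differs from both of them by at least min_shift_kb."""
    wt_similar = abs(len_a_kb - len_b_kb) < max_wt_diff_kb
    shifted = all(abs(fusion_kb - wt) >= min_shift_kb
                  for wt in (len_a_kb, len_b_kb))
    return wt_similar and shifted


# Two 300-kb wild-type segments; breakpoints placed off-center.
fusion = fusion_segment_length(bp_offset_a_kb=250, len_b_kb=300, bp_offset_b_kb=50)
print(fusion, is_resolvable(300, 300, fusion))        # 500 True

# Breakpoints in the middle of both segments give no size shift.
centered = fusion_segment_length(150, 300, 150)
print(centered, is_resolvable(300, 300, centered))    # 300 False
```

The second case suggests why the position of the gene within the excised fragment matters: if both breakpoints sit at matching offsets, the fusion fragment co-migrates with the wild-type fragments, and a different pair of cut sites would be needed.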
After digestion, the method may comprise electrophoresing the segments through the gel, thereby separating the segment of the gene fusion from the segments of the first and second genes. This step may be done by pulsed field electrophoresis. Some of the initial steps of the present method may be adapted from Zhou et al (BioRxiv, 2020. 10.1101/2020.10.23.349621v3), which describes a method for isolating targets that are from 50 kb to 1 Mb (e.g., 200-500 kb) in length. In some embodiments, the method may comprise eluting the segments into different fractions (e.g., 4-10 fractions) by applying a second voltage potential to the gel, wherein the second voltage potential is orthogonal to the potential used earlier in the method, wherein the segment of the gene fusion is eluted into a fraction that is different to the fraction into which the segments of the first and second genes are eluted. After the fractions have been collected, the flanking sequences of the first and second genes may be assayed, e.g., by quantitative PCR (e.g., TaqMan), to identify a fraction that contains the segments of the first and second genes and a fraction that contains the segment of the gene fusion. If desired (e.g., to identify a breakpoint), the eluted segment of the gene fusion may be sequenced using any suitable long-range sequencing technology, e.g., nanopore sequencing (e.g., as described in Soni et al. Clin. Chem. 2007 53: 1996-2001, or as described by Oxford Nanopore Technologies). Nanopore sequencing is a single-molecule sequencing technology whereby a single molecule of DNA is sequenced directly as it passes through a nanopore. A nanopore is a small hole, of the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential (voltage) across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current which flows is sensitive to the size and shape of the nanopore.
As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree, changing the magnitude of the current through the nanopore to a different degree. Thus, this change in the current as the DNA molecule passes through the nanopore represents a reading of the DNA sequence. Nanopore sequencing technology is disclosed in U.S. Pat. Nos. 5,795,782, 6,015,714, 6,627,067, 7,238,485 and 7,258,838 and U.S. Pat Appln Nos. 2006003171 and 20090029477. See also Greninger Genome Medicine. 20151: 99, among others. The junction of the fusion can be identified in the sequence reads. In some embodiments, the cells are blood cells from a patient that has a blood cancer associated with the gene fusion, e.g., ALL or AML, etc. In these embodiments, the results of the method may be used as a diagnostic, to measure the severity of the cancer, to monitor the disease, to determine if a treatment is working and to make treatment decisions. For example, if a gene fusion involves a kinase, then a drug that targets that kinase can be administered. In addition, the method may be multiplexed, as needed. For example, the DNA may be cleaved by two or more sets of three pairs of RNA-guided endonucleases, where each set targets a different fusion. In some embodiments, the present method may be carried out on a SageHLS device, as described at sagescience.com and US20210062180, which is incorporated by reference in its entirety.
As described in US20210062180, in some embodiments, the device may comprise a molecule retention cassette for retaining molecules during electrophoresis, the cassette comprising: a housing; a lane configured within the housing, the lane having a first elongate edge and a second elongate edge; an elution module configured to be received in the lane to divide the lane into a first chamber and a second chamber; a first buffer reservoir positioned adjacent the first elongate edge; and a second buffer reservoir positioned adjacent the second elongate edge; wherein: a first side of the elution module facing the first chamber comprises a porous sterile filtration membrane; and a second side of the elution module facing the second chamber comprises an ultrafiltration membrane, the ultrafiltration membrane having a pore size to retain molecules during electrophoresis. The device may be used as follows for isolating and collecting target segments of target particles, the method comprising: receiving a sample in a sample well of an elution module; receiving an SDS-containing lysis buffer in a first buffer chamber, the first buffer chamber being configured along a first side of the elution module; applying a first electrophoresis voltage to migrate components of the sample towards a second buffer chamber configured along a second side of the elution module, such that: target particles are immobilized in a gel segment configured along the second side of the elution module between the elution module and the second buffer chamber, and non-target particles pass through the gel segment and into the second buffer chamber; washing the first buffer chamber, the second buffer chamber, and the elution module; filling the first buffer chamber, the second buffer chamber, and the elution module with a Cas9 reaction buffer; emptying the elution module; refilling the elution module with a Cas9 enzyme mix to cleave sections of target particles immobilized in the gel segment; loading the elution module 
with an SDS stop solution; applying a second electrophoresis voltage to release the Cas9 from the target particles and migrate Cas9 into the second buffer chamber; washing the first buffer chamber, the second buffer chamber, and the elution module; filling the first buffer chamber, the second buffer chamber, and the elution module with elution buffer; and applying a third electrophoresis voltage in a reverse direction to migrate the cleaved sections of the target particles from the gel segment and into the elution module. For example, the device can be configured as a semi-automated research instrument system for extraction and enzymatic processing of extremely high molecular weight (HMW) DNA (100-2000 kb). The system uses intact cells or isolated nuclei as input samples. Input samples are loaded into agarose gel cassettes, and chromosome-length DNA is extracted from the samples by electrophoresis of SDS through the sample well compartment. SDS-coated proteins and lipids are electrophoresed away from the sample well through the central agarose gel column, but the chromosome-length DNA becomes firmly entangled and immobilized in the agarose gel wall of the sample well. The sample well can be emptied and refilled without any loss of DNA. This allows for treatment of the immobilized DNA by refilling the sample well with an enzyme reaction mixture. Many commonly used DNA processing enzymes readily diffuse into the agarose, including many restriction enzymes, DNA polymerases, ligases, transposases, non-specific DNA cleavases, and S. pyogenes Cas9. After DNA processing, an additional round of size-selection electrophoresis is performed, followed by electroelution of the DNA products into a series of six buffer-filled elution modules arranged along one side of the gel separation column.
The DNA processing step includes some cleavage to reduce the size of the desired DNA products to below 2 megabases (Mb) in length. DNA greater than 2 Mb will remain immobilized in the sample well, unable to move during electrophoresis. Each cassette can have two physically isolated sample processing areas. The cassette may have a standard 96-well plate footprint. The central agarose channel has two loading wells. Cells or nuclei are loaded in the sample well, and an SDS-based lysis reagent is loaded into the reagent well. Electrophoresis is carried out to drive the SDS through the sample well compartment where the cells or nuclei are lysed. Chromosome-sized genomic DNA becomes immobilized in the sample well wall, while other components are carried to the bottom electrode chamber along with the SDS. After DNA processing and size selection electrophoresis, the DNA products are electroeluted into an array of six elution modules positioned along the right side of the agarose channel.

In theory, the amount of fusion DNA in the cell-free fraction of a patient’s bloodstream (i.e., cfDNA) should correlate with disease severity for those cancers that are associated with the fusion, e.g., a subset of non-small cell lung cancers. Thus, tracking the amount of fusion DNA over time could be used to, for example, determine if a treatment is working. Assays for accurately quantifying the amount of a particular fusion sequence in a sample are well known. For example, qPCR or an Invader assay could be used. However, in the clinic, such assays are not straightforward to implement because different patients have different fusions and, even if the genes that are fused together in a patient’s cancer are known, the genes can be fused in different places. Such analyses are complicated by the fact that cfDNA is highly fragmented and, as such, samples that contain cfDNA are not amenable to analysis by some of the methods that are used to analyze samples that contain an intact genome.
Thus, identifying and quantifying gene fusions in cfDNA would logically be implemented in two steps, where the first step involves sequencing a patient’s cfDNA to identify which genes are fused as well as the sequence at the junction of the fusion, and a second step that involves quantifying the amount of fusion DNA in the cfDNA (see, e.g., Harris et al, Nature Scientific Reports 2016 6: 29831). The problem with this approach is that the latter step is patient-specific in the sense that most reliable quantification methods (e.g., qPCR or Invader) only work if primers that flank the fusion junction are used. Thus, in order to implement the conventional workflow, one would have to carefully select a custom primer pair for each patient being tested, before quantifying the amount of fusion DNA. This is problematic because performing patient-specific assays using, e.g., custom sets of primers, is time consuming, inefficient and creates a significant potential for human error. Therefore, such assays should be avoided in the clinic, where robust, high-throughput methods are required.

EXAMPLES

Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.

The present method aims to overcome some of the challenges of DNA-based approaches in fusion gene sequencing using nanopore sequencing combined with a targeted enrichment method. There already exist targeted nanopore sequencing methods, such as nanopore Cas9-targeted sequencing (nCATS). However, because these targeted methods start sequencing only from the ends, the long and fragile strands can easily break apart before the middle of the fragment is sequenced.
Therefore, the existing targeted nanopore sequencing methods are often useful only when a part of the fusion gene (e.g., a region including the breakpoint) is targeted. When targeting the entire fusion gene is desired, one needs an alternative strategy. Sequencing the entire gene fusion provides information such as fully phased germline and somatic variants and CpG island DNA methylation of the gene fusion. This information cannot be obtained when targeting only a small part of the gene fusion that includes the breakpoint. The general strategy is illustrated in Figs. 1A and 1B. A high molecular weight (HMW) DNA enrichment based on Cas9-assisted targeting of chromosome segments (CATCH) was used. The targeted fragments are then eluted and quantified by qPCR. The method can enrich and extract target regions that are as large as 1 Mb. The enriched target regions can then be sequenced and analyzed for the purpose of characterizing gene fusions. The method combines targeted in vitro genome cutting and pulsed-field electrophoresis to sequence targets that were previously not targetable in a single HMW molecule. The resulting fragments are then sequenced without amplification using a nanopore sequencer to resolve large and complex rearrangements. This method has been used to sequence 22q11.2 and 16q11.2 CNV rearrangements that are prominent risk factors for neurological disorders such as schizophrenia and autism. The present study aims to use a modified version of CTLR-Seq to investigate blood samples from leukemia (AML or ALL) patients. These samples have been previously tested with clinical cytogenetic methods, and the method described in this study characterizes gene fusions between KMT2A and the aforementioned frequent partner genes. In particular, the qPCR step was used both for target quantification and as a simple detection method for the presence of a rearrangement event.
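The use of per-module qPCR copy numbers as a rearrangement screen can be sketched as follows. This is a hypothetical illustration rather than the analysis used in the study: the enrichment threshold, the function names and the copy numbers in the example are all invented, and the only ideas taken from the text are that each elution module is measured with a target probe and a control (RNaseP) probe and that a target signal in an unexpected module suggests a size-shifted fragment.

```python
def enriched_modules(target_copies, control_copies, min_ratio=10.0):
    """Return indices of elution modules in which the target probe is
    enriched over the control probe by at least min_ratio (assumed cutoff)."""
    hits = []
    for i, (t, c) in enumerate(zip(target_copies, control_copies)):
        ratio = t / max(c, 1.0)   # floor the control to avoid division by zero
        if ratio >= min_ratio:
            hits.append(i)
    return hits


def size_shift_detected(hit_modules, expected_wt_module):
    """A probe on shared sequence that is enriched in any module other than
    the one expected for the wild-type fragment suggests a rearrangement."""
    return any(m != expected_wt_module for m in hit_modules)


# Invented copy numbers for six elution modules (index 0 to 5).
target = [5, 8, 400, 6, 350, 4]
control = [10, 12, 9, 11, 10, 8]

hits = enriched_modules(target, control)
print(hits)                                              # [2, 4]
print(size_shift_detected(hits, expected_wt_module=2))   # True
```

In practice the cutoff would need to be calibrated against control samples, and the module expected for the wild-type fragment depends on the assay design (for example, the 300-kb versus 1-Mb assays).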
When a size shift is detected from clinical samples, additional ONT sequencing data can be used to further characterize the event. Physically separated fusions and the wild type genes are sequenced (Fig. 1B). The characterization includes breakpoint identification with base pair resolution, entirely phased variants using end-to-end contig assembly of wild type and fusion alleles, and also the epigenetic changes. Particularly with the epigenetic information, one should be able to predict the expression activity of the fusion and wild type genes, which will be validated by RNA sequencing and/or traditional RT-PCR-based gene expression analysis. This information can be obtained only when wild type and fusion genes are binned and the entire gene is targeted. Methods Guide RNA design: To design 20-bp target sequences of gRNAs, all 20-bp sequences (20-mers) in the region of the target cut sites that occur directly adjacent to a Cas9 binding motif are considered. Only those candidate guide sequences that are unique in the human genome and that have no other alignments with fewer than three mismatches were retained. The GRCh38 genome was used to identify off-target CRISPR sites. An off-target gRNA sequence is defined as a 20-mer with three or four mismatches where the mismatches are located within the first 10 bases (the half distant from the PAM domain). This high level of mismatches enables the gRNA to anneal across many different positions in the genome. Eliminating them reduces the off-target annealing. The candidate designs do not have any off-target sites near or inside the targets (Fig. 5). Also, a candidate design has no pair of off-targets neighboring each other within a range and generating an off-target fragment that can co-migrate with target fragments in the electrophoretic separation. Some candidate gRNA sequences are included even if they have off-targets elsewhere in the genome. 
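The guide RNA design rules described above can be illustrated with a minimal sketch. It assumes the Cas9 binding motif is the SpCas9 "NGG" PAM and scans only the forward strand; neither detail is stated in the text, and the function names are hypothetical. The off-target test follows the stated definition (three or four mismatches, all within the first 10 bases, i.e., the PAM-distal half).

```python
import re

def candidate_guides(seq):
    """Return (position, 20-mer) pairs lying directly 5' of an NGG motif.

    Assumption: forward strand only, SpCas9-style NGG PAM.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]

def is_off_target(guide, site):
    """Apply the off-target definition from the text to one genomic 20-mer.

    A site counts as an off-target when it differs from the guide at three
    or four positions and every mismatch falls within the first 10 bases.
    """
    if len(guide) != 20 or len(site) != 20:
        raise ValueError("guide and site must both be 20-mers")
    mismatched = [i for i, (g, s) in enumerate(zip(guide, site)) if g != s]
    return len(mismatched) in (3, 4) and all(i < 10 for i in mismatched)
```

In practice the candidate 20-mers would be compared against alignments to the whole GRCh38 genome; the sketch only shows the per-site rule.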
The overall number of off-targets is considered as a parameter when assessing the design. Fig. 9 provides the final gRNA designs with information about their potential off-targets. Cell preparation: Peripheral blood mononuclear cells (PBMCs) were collected from patients’ blood using a Ficoll (Sigma-Aldrich, St. Louis, MO) separation, and stored frozen in 90% fetal calf serum (FCS) and 10% dimethyl sulfoxide (DMSO) until being used for enrichment of high molecular weight target DNA. A mammalian white blood cell (WBC) suspension kit (Sage Science, Beverly, MA) was used to prepare cells for sample loading. Briefly, frozen cell stock was thawed in a 30°C bead bath. To remove residual red blood cells (RBCs), PBMCs were incubated with the 1X RBC lysis buffer (Sage Science) at 4°C for 5 min. After the incubation, WBCs were washed twice with the 1X RBC lysis buffer (Sage Science) using centrifugation at >2,000g. After the second wash, the pellet was resuspended in 280 µL of the resuspension buffer (Sage Science). The genomic DNA content of the cell suspension was quantified using the Qubit lysis buffer (Sage Science) and the Qubit 1X dsDNA High Sensitivity assay (Thermo Fisher Scientific, Waltham, MA) following the manufacturers’ guides. The cell suspension was diluted with the resuspension buffer (Sage Science) so that the concentration of genomic DNA was approximately 100 ng/µL, at which a 70-µL aliquot included 1 million cells. Custom-designed synthetic CRISPR RNAs (crRNAs) and the trans-activating CRISPR RNA (tracrRNA) were purchased from Integrated DNA Technologies (Coralville, IA, USA). Per target, a pair of crRNAs was used to excise the 300 kb or 1 Mb fragments, one crRNA targeting the 5’ flanking region and the other targeting the 3’ flanking region. Up to six pairs of crRNAs were multiplexed (i.e., up to six targets in a single assay). 
When preparing guide RNA (gRNA)-Cas9 assembly for four sample runs (the maximum capacity of the Sage HLS machine), 800 fmol of pooled crRNAs was annealed to 520 fmol of tracrRNA in 44 µL of 1X duplex buffer (Integrated DNA Technologies) at 95°C for 10 minutes, followed by cooling at room temperature for 5 minutes. The annealed gRNA mix was assembled with 160 fmol of Cas9 endonuclease in 80 µL of 1X enzyme buffer (Sage Science) at 37°C for 10 minutes. HLS-CATCH: The DNA from 70 µL of the cell suspension was extracted by using the workflow “CATCH 100-300 kb extr3hr inj4m80v sep4hr.shflow” or “CATCH 1000 kb extr3hr inj4m80v sep8hr.shflow” on the Sage HLS instrument (Sage Science). Intact PBMCs (∼1.0 million) were loaded into the sample well, and a lysis buffer containing 3% sodium dodecyl sulphate (SDS) was loaded into a reagent well upstream of the sample well. Electrophoresis was carried out for 3 hr, driving the SDS through the sample well to lyse the cells. The SDS, proteins, and membrane components were carried away from the sample well to the bottom electrode chamber. The large genomic DNA (>2 Mb) was embedded in the agarose wall of the sample well during the extraction electrophoresis. At the end of the extraction stage, the electrophoresis was halted and the reagent well was emptied and refilled with the Cas9–gRNA reaction mixture. The reaction mixture was diluted with 3X volume of 1X enzyme buffer (Sage Science) prior to loading. Electrophoresis was carried out for 4 min to drive the Cas9 enzyme into contact with the genomic DNA inside the sample well wall. Then, the electrophoresis was stopped, followed by Cas9 digestion of the genomic DNA at room temperature for 30 minutes. After Cas9 digestion, the reagent well was emptied and refilled with the SDS lysis reagent, and size selection electrophoresis was carried out for 4 hr. 
The electrophoresis process used a pulsed-field waveform designed for optimal resolution of DNA fragments 300 kb or 1 Mb in size. After size separation, a second orthogonal set of electrodes was used to elute the size-separated DNA into a series of elution modules located along one side of the gel column. The DNA was removed from the elution modules 12 hours after run termination. Quantitation of targeted high molecular weight DNA: The DNA from each elution module was prepared with a 1:5 dilution in 33% bCD. TaqMan qPCR Copy Number assays (Thermo Fisher Scientific) were used to measure the DNA concentration after extraction. The 10 µL reaction included 2 µL of diluted target DNA sample, 1X TaqMan Genotyping Mix, 1X TaqMan RNaseP reference and 1X TaqMan assay for the specific targets. The samples were denatured at 95°C for 10 min, followed by 50 cycles of 15 s at 95°C and 1 min at 60°C. For a relative quantification (i.e., target versus RNaseP reference), a modified ΔΔCt method [20] was used. One ng of NA18507 genomic DNA was used as a control. For an estimation of absolute copy number, it was assumed that 290 genome copies were in 1 ng of the control sample. Library preparation and nanopore sequencing: When using multiple sample runs for sequencing, the enriched targets from the elution modules were pooled in accordance with the results of the qPCR. The pooled sample DNA was first purified using 0.45X Ampure XP beads with gentle liquid handling to minimize physical DNA shearing. The beads were washed twice with 80% ethanol, and then cleaned DNA was eluted in 17 µL of 10 mM Tris for 1 hr at 37°C with 400 rpm shaking, and then at 4°C overnight. The Qubit dsDNA HS assay (Thermo Fisher Scientific) was used to measure the yield of this purification, which was generally 50-60%. A SQK-RAD004 kit (Oxford Nanopore Technologies, Littlemore, Oxford, UK) was used for the library preparation with modifications. 
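The qPCR quantitation described above can be sketched as follows. The text refers to a modified ΔΔCt method without detailing the modification, so this sketch uses the standard 2^(-ΔΔCt) calculation and assumes 100% amplification efficiency; the function names are hypothetical. The value of 290 genome copies per ng of control DNA is taken from the text.

```python
GENOME_COPIES_PER_NG = 290  # assumed in the text for the control genomic DNA

def relative_quantity(ct_target_sample, ct_ref_sample,
                      ct_target_control, ct_ref_control):
    """Standard 2^(-ddCt) fold change of target versus RNaseP reference.

    Assumption: the unmodified Livak method with perfect doubling per cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)

def estimated_copies(ct_sample, ct_control, control_ng=1.0):
    """Estimate target copies in a reaction from the Ct offset to the control.

    The control is assumed to carry 290 copies per ng; each cycle of earlier
    detection corresponds to a two-fold higher starting copy number.
    """
    control_copies = GENOME_COPIES_PER_NG * control_ng
    return control_copies * 2.0 ** (ct_control - ct_sample)
```

For example, a target detected two cycles earlier than the 1 ng control would be estimated at 290 x 4 = 1160 copies in the reaction under these assumptions.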
First, FRA was prepared in 1:20, 1:30, or 1:40 dilutions with FRA dilution buffer from SQK-ULK001 (Oxford Nanopore Technologies). The purified CATCH product was combined with 5 µL of a FRA dilution for a total volume of 20 µL. The resulting mixture was incubated at 30°C for 1.5 minutes, and then at 80°C for 1.5 minutes. Following the incubation period, 20 µL of the tagmented CATCH product was combined with 1 µL of RAP and incubated at room temperature for 5 minutes. The library was loaded onto a PromethION flow cell (R9.4.1) after being combined with water, loading beads, and sequencing buffer following the manufacturer’s instructions. The library was sequenced using an Oxford Nanopore Technologies PromethION 24 sequencing machine (Oxford Nanopore Technologies). The sequencing was performed for 72 hr with a high-accuracy base calling model (‘dna_r9.4.1_450bps_hac_prom’) and pore scanning every 1.5 hr. The base calls generated during sequencing were used only for real-time monitoring of sequencing run quality. Base calling, alignment, and assembly: For base calling, Guppy (v6.1.1, Oxford Nanopore Technologies) was used with a super-accuracy model (‘dna_r9.4.1_450bps_sup_prom’). Using the fast5 files generated by the Nanopore sequencing as input, fastq files were generated. For alignment, Minimap2 (v2.17) [21] was used with a preset for Nanopore sequencing reads (‘map-ont’). The sequencing reads were aligned to the GRCh38 human genome. To get on-target mean coverage, ‘bedtools coverage’ (v2.25) [22] was used. To identify the on-target reads for local sequence assemblies, ‘samtools view’ (v.1.10) [23] was used with the ‘-L’ option. The bed file used for the coverage analysis and identification of on-target reads is provided as a supplementary table. For sequence assembly, Flye (v2.9.1-b1780) [24] was used with a preset for Nanopore sequencing reads base-called with super-accuracy models (‘nano-hq’). 
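The alignment, coverage, on-target selection, and assembly steps named above could be chained roughly as sketched below. This is only an illustration: the file names, the `prefix` argument, and the helper name `build_commands` are hypothetical, the exact options used in the study are not given in the text beyond ‘map-ont’, ‘-L’, and ‘nano-hq’, and the conversion of the on-target BAM file to FASTQ for Flye is assumed to happen separately.

```python
def build_commands(reference, reads_fastq, targets_bed,
                   on_target_fastq, genome_size, prefix="sample"):
    """Return argument lists for the tools named in the text.

    Nothing is executed here; each list can be passed to subprocess.run.
    All paths are placeholders chosen for this sketch.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    on_target_bam = f"{prefix}.on_target.bam"
    return [
        # Align nanopore reads with the 'map-ont' preset, writing SAM.
        ["minimap2", "-a", "-x", "map-ont", "-o", sam, reference, reads_fastq],
        # Coordinate-sort and index the alignments.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Mean depth over each target interval in the BED file.
        ["bedtools", "coverage", "-mean", "-a", targets_bed, "-b", bam],
        # Keep only reads overlapping the target intervals ('-L').
        ["samtools", "view", "-b", "-L", targets_bed, "-o", on_target_bam, bam],
        # Assemble the on-target reads with the 'nano-hq' preset.
        ["flye", "--nano-hq", on_target_fastq, "--genome-size", genome_size,
         "--out-dir", f"{prefix}_flye"],
    ]
```

Returning argument lists rather than shell strings keeps the sketch free of quoting issues and lets each step be inspected before it is run.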
The on-target reads were used as input, and the genome size estimate was calculated for possible scenarios based on the results of the qPCR assays. Methylation analysis: Megalodon (v2.5.0, Oxford Nanopore Technologies) was used to call CpG methylations from the Nanopore sequence reads. Fast5 files were used as input, and the GRCh38 human genome was used as the reference. CpG methylation was called with two different models: i) 5-methylcytosine (‘dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom’) and ii) 5-hydroxymethylcytosine or 5-methylcytosine (‘dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_sup_prom’). Single cell multi-omics assay: For single cell multi-omics assay, the Chromium Next GEM Single Cell Multiome ATAC and Gene Expression Reagent kit (10X Genomics, Pleasanton, CA) was used. The ATAC and RNA seq libraries were prepared according to the manufacturer’s guide, and sequenced using NovaSeq 6000 with 50:8:24:49 (ATAC) or 28:10:10:90 (RNA) paired end format. Using Cell Ranger ARC (v2.0.0, 10X Genomics), the raw sequencing data was then demultiplexed, aligned, and initially analyzed to produce matrix tables for ATAC peaks/fragments and the gene expression. Finally, the Signac package [25] was used to generate Seurat objects and to cluster cells based on the multi-omics features. Pathway analysis: A pathway enrichment analysis was conducted based on cancer hallmark signatures downloaded from MSigDB (v.6.2) [26, 27]. The pathway enrichment scores of clusters were calculated using Gene Set Variation Analysis (GSVA) (v.1.32.0) [26] with parameters “kcdf = Poisson”, “mx.diff=TRUE” and “min.sz=10”. An ANOVA test was performed to compare the enrichment scores among the clusters. Pathways with an adjusted P-value < 0.05 after FDR correction were considered significant for each cluster. Results Study overview This study aimed to comprehensively analyze the genetic and epigenetic characteristics of KMT2A gene fusions in acute leukemia. 
The approach involved a targeted analysis of the entire gene regions, including exons and introns, of both KMT2A and its partner gene. This allowed the detection of sequence variations as well as fully phased simple and complex structural variations. In total, seven patient samples were analyzed, consisting of five cases of AML and two cases of B-ALL. The mega haplotypes associated with the fusions were cataloged to gain insights into their genetic makeup. In addition, the promoter CpG methylation of the fusion and wild type genes was assessed to infer their expression activity. Modified bases, such as 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC), were considered during the base calling process from the raw sequencing data. By examining the methylation levels, the expression patterns of these genes were predicted. To gain a deeper understanding of the heterogeneity within leukemia cells, a single-cell multi-ome assay was used. This enables one to identify distinct clusters of leukemia cells based on their transcriptome and genome-wide chromatin accessibility profiles. In particular, changes at the individual gene level were examined, specifically the expression levels and promoter chromatin accessibility of the KMT2A gene within different cell clusters. The changes were cross-referenced with the methylation level alterations identified through the targeted approach. To explore the broader genomic signatures, the set of genes that exhibited differential expression in leukemia cell clusters compared to normal cell clusters within the same sample was analyzed. Additionally, the enriched pathways associated with these genes were investigated. CRISPR targeting and fusion detection by TaqMan assays The approach was to target KMT2A gene fusions involving partner genes such as AFF1, MLLT3, MLLT10, ELL, and AFDN. 
In this study, three multiplex CATCH assays (Assays 1, 2, and 3), designed to enrich 300-kb genomic segments including KMT2A gene fusion targets, were tested. The multiplex assays targeted either two or six genes simultaneously, resulting in the extraction of genomic DNA enriched for two or six 300-kb target fragments from cells without fusion events (e.g., GM18507). Initially, Assay 1 was designed to target only two genes (KMT2A and a partner gene); it was later expanded to include more partner genes in Assays 2 and 3. Assays 2 and 3 differed in the gRNA pair targeting the KMT2A gene. One gRNA pair positioned KMT2A in the middle of the resulting fragment, while the other targeted a shifted location toward the 3’-flanking region of KMT2A (Fig. 6). The shifted KMT2A location improved the separation resolution between wild type and fusion fragments in some samples. To detect gene fusion events with size shifts, TaqMan DNA copy number assays were performed to quantify the absolute amount of target HMW DNA molecules in the elution modules. These modules collected CRISPR-excised fragments in different size ranges (Fig. 1B). To enable straightforward detection, all target fragments had similar sizes when no rearrangement events occurred, and thus were collected in the same elution module. When a gene was shorter than the other genes in a multiplex assay, the target fragment included the gene along with flanking regions to match the size of the other gene targets. For one AML sample (SU710) with a known gene fusion between KMT2A and MLLT3, four TaqMan assays were employed (Fig. 2A). Each gene was targeted by two TaqMan assays, which focused on regions near the inside edge of the CATCH product to detect size shifts. When only the wild type gene was present, the assays generated the same enrichment patterns across all the TaqMan assays. 
However, when both wild type and fusion alleles were present, two of the TaqMan assays showed a different pattern because the fusion gene was shorter than the wild type genes (Fig. 2B). Even in cases where electrophoresis separation was not reproducible, the size shift could still be detected by comparing the two assay patterns. Therefore, a minimum of two TaqMan assays was required to detect the fusion: one targeting a region that exists only in the wild type and the other targeting a region that exists in both the wild type and fusion. Additional TaqMan assays can be added to further confirm fusion events and detect more complex events, such as chimeric fusions. Fully phased variants in fusion gene assembly In this study, the nanopore sequencing library preparation method was modified from that of the CTLR-Seq approach. By comparing ligation-based and tagmentation-based library preparations using a human cell line sample, it was observed that the tagmentation-based method demonstrated superior uniformity in sequencing coverage. It eliminated the need for post-adapter ligation cleanup and allowed direct loading of the library. Through optimization of the tagmentation enzyme concentration and enzymatic reaction duration, long sequence reads with N50 values comparable to the ligation-based method were obtained while maintaining uniform coverage. With the optimized tagmentation-based method, nanopore sequencing reads with N50 values ranging from 15 kb to 35 kb were generated, and the longest on-target reads generally exceeded 100 kb for most samples. These uniformly distributed, long on-target reads were utilized for sequence assembly. For all the samples, full-length assembly contigs were obtained. Importantly, those assemblies are not collapsed versions of multiple haplotypes, but the reconstructed sequence of only the fusion itself. 
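The two-assay comparison used for fusion detection can be sketched as below. Each profile is a list of copy numbers per elution module, one from a TaqMan assay whose region exists in both wild type and fusion fragments and one from an assay whose region exists only in the wild type. The normalisation step, the 0.2 threshold, and the function names are assumptions made for illustration and are not taken from the text.

```python
def normalise(profile):
    """Convert per-module copy numbers to fractions of the total signal."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must contain a positive signal")
    return [value / total for value in profile]

def shifted_modules(shared_assay, wild_type_only_assay, min_excess=0.2):
    """Return indices of elution modules that suggest a size-shifted fragment.

    A module is flagged when the assay present in both alleles carries a
    clearly larger share of its signal there than the wild-type-only assay.
    The threshold is an arbitrary illustrative value.
    """
    shared = normalise(shared_assay)
    wild_type = normalise(wild_type_only_assay)
    return [i for i, (s, w) in enumerate(zip(shared, wild_type))
            if s - w >= min_excess]
```

Because both profiles are normalised before comparison, the sketch depends on the relative pattern of the two assays rather than on where the wild type fragment happens to elute in a given run.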
This was possible because the DNA molecules were binned experimentally with an electrophoretic separation and then separately sequenced. No prior variant analysis and binning for sequence reads were required. Fig. 3A shows an example of experimentally binned sequence reads (i.e., two separate sequencing runs) aligned to the human genome. The fusion-participating genes existed as wild type in one of the runs, but as a fusion gene in the other. The alignment showed clear break points in only one of the sequencing runs. The observed variants were compared to the assembly made from each sequencing run, confirming that the assembly process accurately selected the variant allele predominantly observed in the sequence pile up (Fig. 3B). With the efficient targeted long-read sequencing, one can phase not only variants proximal to fusion breakpoints, but also variants distant to the breakpoints, which was not possible with previous sequencing approaches. Differential DNA methylation between wild-type and fusion genes An analysis of the DNA methylation status in both wild-type and fusion genes was conducted, focusing on the CpG island of the KMT2A gene promoter (Fig. 4A). A base-calling model was employed that considers both 5mC and 5hmC. One aim was to identify any differential methylation patterns between the wild-type and fusion genes. Overall, no significant difference in 5mC levels between the two gene types was found. Within the promoter CpG island, a depletion of 5mC was observed in both the wild-type and fusion genes (an example shown in Fig. 4B). In contrast, the 5hmC levels exhibited variation between the wild-type and fusion genes across different samples. Inside the CpG island, there are two exons: exon 1 and an alternative exon adjacent to the first exon. The changes in 5hmC levels were not uniform across the CpG island but rather localized around these exons. The direction of changes varied among the samples. 
Some samples exhibited a decrease in 5hmC levels (SU659 and SU714), while others showed an increase (SU710, SU847, and SU968). In contrast to the samples with decreased fusion 5hmC levels, the samples with increased fusion 5hmC levels displayed variability in the location of these changes relative to the exons. For SU710 and SU847, the increased 5hmC levels were observed only before exon 1. However, SU968 exhibited two peaks in fusion 5hmC levels, with each peak located before one of the exons. Identification of leukemia clusters with single-cell multi-ome assay In addition, a single-cell multi-ome assay on four samples (SU659, SU710, SU847, and SU968) was conducted, which provided transcriptome and genome-wide chromatin accessibility profiles at the single-cell level. Utilizing both sets of profiles, the Signac package [25] was used to generate UMAP cell clusters. These clusters were then annotated using gene markers for B-ALL and AML, as well as general cell markers for various blood cell types including T and B cells [28]. Across all four samples, leukemia cell clusters were identified that aligned with the respective disease type (B-ALL or AML). Each sample exhibited between four and seven leukemia clusters, each characterized by distinct cell markers. While most clusters were in close proximity to one another in the UMAP spaces, an exceptional case was observed in SU968, where one AML cluster was notably distant from the other AML clusters within the same sample. Additionally, normal blood cell clusters (B and T cells) were present in all samples, serving as internal controls for analyzing gene expression and chromatin accessibility on an individual basis, as well as on a transcriptome basis. KMT2A promoter chromatin accessibility and expression of cell clusters Initially, the investigation focused on analyzing the single-cell assay to examine potential epigenetic modifications associated with the fusion gene and their impact on gene expression. 
It is important to note that leukemia cells can harbor both fusion genes and wild-type genes, including both KMT2A wild type and KMT2A-derived fusion genes. As a result, the single-cell profile represented a composite signal from both types, rather than separate signals for wild-type or fusion genes. Despite this complexity, distinct clusters of leukemia cells were identified, where variations in ATAC signal and the corresponding expression patterns were observed. These findings suggest that specific leukemia cell populations exhibited differential ATAC signal, indicating potential regulatory changes, accompanied by corresponding alterations in gene expression. One interesting observation pertains to the consistency exhibited by both AML samples. These samples showed consistent changes in terms of i) the structure of cell clusters, ii) ATAC coverage alterations, and iii) the corresponding expression changes. Notably, the ATAC coverage spikes, indicative of open chromatin regions, were located before the first two exons (Exon 1 and the alternative exon adjacent to the first exon). Generally, there was a decrease in coverage in those peaks within AML clusters. However, AML clusters in close proximity to each other in the UMAP space (AML1-3 clusters in SU710 and AML2-7 clusters in SU968) exhibited a more pronounced decrease compared to the distant clusters (AML4 cluster in SU710 and AML1 cluster in SU968) within each AML sample. Another intriguing finding was that the ATAC coverage peaks preceding the alternative exon showed more significant differences between subsets of AML clusters and normal B/T cell clusters compared to the differences observed in cell clusters of B-ALL samples. Both leukemia and normal cell clusters in B-ALL samples displayed prominent ATAC peaks before the alternative exon, whereas in AML samples, the leukemia cell clusters did not exhibit such pronounced coverage peaks before the alternative exon. 
This suggests different isoform structures in different types of leukemia. In contrast to the AML samples, B-ALL samples demonstrated inconsistencies in terms of the direction of changes in KMT2A promoter accessibility and gene expression. One sample (SU659) featured a B-ALL cluster showing increased chromatin accessibility and up-regulated gene expression. Conversely, the other sample (SU847) exhibited a B-ALL cluster with decreased chromatin accessibility and down-regulated gene expression. All of these observations aligned with the quantified 5hmC level of the fusion gene promoter using a targeted approach. Pathway enrichment analysis Using the hallmark gene sets [27], a pathway enrichment analysis was performed to determine which enriched pathways were shared or not shared between leukemia clusters with and without epigenetic and transcriptional modulations in KMT2A. For all leukemia cell clusters, significant enrichment and depletion against the internal control cell clusters (normal B and T cells) was filtered. Then, the clusters with changes in KMT2A were distinguished from the others that did not have the KMT2A changes. For the B-ALL samples (SU659 and SU847), each had one leukemia cluster that showed epigenetic and transcriptional changes in KMT2A. However, when compared to the other leukemia clusters that did not share the KMT2A change, there was no uniquely enriched pathway for them. Instead, they shared enriched pathways with some of the leukemia clusters, suggesting similar cellular physiology. Interestingly, in both B-ALL samples, leukemia cell clusters were observed that were unique among all leukemia cell clusters in terms of enriched pathways. Although from two different samples, they shared the most enriched pathways, such as "G2M_CHECKPOINT" and "E2F_TARGETS." For the AML samples (SU710 and SU968), each had three or six leukemia clusters that showed epigenetic and transcriptional changes in KMT2A. 
Unlike the clusters with KMT2A changes in the B-ALL samples, they shared most of their enriched pathways and were distinguished from the other ones that did not have the changes in KMT2A. Each had a leukemia cell cluster that showed no epigenetic and transcriptional changes in KMT2A and also had a set of uniquely enriched pathways. From SU710, the unique leukemia cell cluster had enrichment in "ANGIOGENESIS," "ADIPOGENESIS," and "XENOBIOTIC_METABOLISM." From SU968, the unique leukemia cell cluster had enrichment in "HEDGEHOG_SIGNALING," "MYC_TARGETS_V1," and "MYC_TARGETS_V2." Shifted KMT2A location in target HMW fragment improved fusion separation A multiplex assay was designed, locating the target genes in the middle of the target fragments (e.g., Assay 2). However, in some samples, the gene fusion and the wild type fragments had similar lengths, and the difference was less than 50 kb. The HMW DNA fragments with similar lengths were eluted in the same elution module, and therefore the simple qPCR method was not able to detect the fusion. For example, the initially designed CATCH assay was not able to separate fusion fragments, whose lengths were 330 kb and 270 kb, from SU659 (sample with a KMT2A-AFF1 fusion). Another pair of KMT2A gRNAs was designed to be compatible with other existing gRNAs to locate the gene at a shifted location toward the 5’-end. Using this newer multiplex assay (Assay 3), SU659 was analyzed again, and additional enrichment of targets was confirmed in elution modules neighboring the elution module that collected wild type fragments (Fig. 7). In addition to 300-kb wild type targets, 400-kb and 200-kb rearranged DNA molecules were separated in the electrophoretic separation step, and collected at the neighboring elution modules. These two additional enrichments suggested two fusion genes resulting from a reciprocal translocation. 
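The effect of shifting the KMT2A cut sites on the fusion fragment sizes can be worked through with simple arithmetic. The sketch below assumes a simple reciprocal translocation between two excised fragments in the same orientation; the function name and the breakpoint positions (180 kb, 250 kb, and 150 kb) are hypothetical values chosen only so that the results reproduce the fragment lengths reported above (330/270 kb and 400/200 kb). They are not measured breakpoints.

```python
def derivative_lengths(len_a, bp_a, len_b, bp_b):
    """Lengths of the two derivative fragments of a reciprocal translocation.

    len_a, len_b: lengths (kb) of the wild type fragments excised for
                  gene A and gene B.
    bp_a, bp_b:   distance (kb) from the 5' cut site to the breakpoint
                  within each wild type fragment.
    """
    if not (0 <= bp_a <= len_a and 0 <= bp_b <= len_b):
        raise ValueError("breakpoint must lie inside the excised fragment")
    der_ab = bp_a + (len_b - bp_b)  # 5' part of A joined to 3' part of B
    der_ba = bp_b + (len_a - bp_a)  # 5' part of B joined to 3' part of A
    return der_ab, der_ba

# Gene centred in a 300-kb fragment (illustrative breakpoints):
print(derivative_lengths(300, 180, 300, 150))
# Cut sites for gene A shifted by 70 kb, moving the breakpoint off-centre:
print(derivative_lengths(300, 250, 300, 150))
```

The two derivative lengths always sum to the combined length of the two wild type fragments, so moving one breakpoint away from the centre of its fragment necessarily widens the gap between the fusion products and the 300-kb wild type fragments.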
Tagmentation-based ONT sequencing library preparation The ONT sequencing library preparation of the CTLR-Seq method was modified. Using a human cell line sample (GM18507), ligation- (CTLR-Seq) and tagmentation-based (this study) library preparations were compared. For the 300-kb targets (Assay 2), both ligation- and tagmentation-based methods generated reads covering the entire region with reasonably good coverage, although better uniformity was apparent in the tagmentation-based method (Fig. 8). Therefore, end-to-end assembly contigs could be made from both sequencing runs. However, when the target is longer than 300 kb, an end-to-end assembly may be challenging because of the poorly covered middle part of the target. For example, if a rearrangement event increases the size of CRISPR-targeted products, the issue of uniformity may obscure the rearrangement. To test a HMW target longer than 300 kb, a CATCH assay for the KMT2A gene was designed, and an end-to-end assembly was generated with targeted ONT sequencing (Fig. 8). In addition to uniform sequencing coverage, the tagmentation-based method appeared to be superior in terms of efficiency. When using ligation for introducing adapters, the library should be purified before loading onto the sequencer. Some degree of fragmentation may occur, negatively impacting the uniformity of sequencing coverage. In the version of tagmentation-based nanopore library preparation that was used in this study, no post-adapter cleanup was required and the library could be loaded directly after the reaction. DNA shearing after addition of adapter could be minimized and the library inserts were generated as randomly fragmented targets. The distribution of insert sizes and the length of the sequence reads could be optimized by selecting the concentration of transposase enzyme and the duration of the enzymatic reaction. The goal of optimization was achieving the longest sequence reads while keeping the yield comparable to the ligation-based method. 
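Read-length N50, the statistic used in this comparison of library preparations, can be computed as in the following minimal sketch. It uses the common definition (the length of the read at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total bases); the text does not state which tool was used to obtain N50, and the function name is hypothetical.

```python
def n50(read_lengths):
    """Return the N50 of a collection of read lengths.

    Reads are sorted from longest to shortest, and the length of the read
    at which the running total first reaches half of all bases is returned.
    """
    if not read_lengths:
        raise ValueError("at least one read length is required")
    ordered = sorted(read_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

Because N50 is weighted by bases rather than by read count, a small number of very long reads can raise it substantially, which is why it is a convenient summary when tuning the transposase dilution and reaction time.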
Dilutions of 1:5, 1:20, and 1:40 of the original tagmentation enzyme mix were tested with different clinical samples, and an N50 comparable to that of the ligation-based method was achieved while keeping uniform sequencing coverage throughout the target region.

References
1. Bodmer, W., J.H. Bielas, and R.A. Beckman, Genetic instability is not a requirement for tumor development. Cancer Res, 2008. 68(10): p. 3558-60; discussion 3560-1.
2. Kou, F., et al., Chromosome Abnormalities: New Insights into Their Clinical Significance in Cancer. Mol Ther Oncolytics, 2020. 17: p. 562-570.
3. Gao, Q., et al., Driver Fusions and Their Implications in the Development and Treatment of Human Cancers. Cell Rep, 2018. 23(1): p. 227-238 e3.
4. Puiggros, A., G. Blanco, and B. Espinet, Genetic abnormalities in chronic lymphocytic leukemia: where we are and where we go. Biomed Res Int, 2014. 2014: p. 435983.
5. Glassman, A.B., Chromosomal abnormalities in acute leukemias. Clin Lab Med, 2000. 20(1): p. 39-48.
6. Powers, M.P., The ever-changing world of gene fusions in cancer: a secondary gene fusion and progression. Oncogene, 2019. 38(47): p. 7197-7199.
7. Engvall, M., et al., Detection of leukemia gene fusions by targeted RNA-sequencing in routine diagnostics. BMC Med Genomics, 2020. 13(1): p. 106.
8. Hess, J.L., MLL: a histone methyltransferase disrupted in leukemia. Trends Mol Med, 2004. 10(10): p. 500-7.
9. Gestrich, C.K., et al., Reciprocal ATP5L-KMT2A gene fusion in a paediatric B lymphoblastic leukaemia/lymphoma (B-ALL) patient. Br J Haematol, 2020. 191(2): p. e61-e64.
10. Yoshida, A., et al., KMT2A (MLL) fusions in aggressive sarcomas in young adults. Histopathology, 2019. 75(4): p. 508-516.
11. Zerkalenkova, E., et al., BTK, NUTM2A, and PRPF19 Are Novel KMT2A Partner Genes in Childhood Acute Leukemia. Biomedicines, 2021. 9(8).
12. Ney Garcia, D.R., et al., Molecular characterization of KMT2A fusion partner genes in 13 cases of pediatric leukemia with complex or cryptic karyotypes. Hematol Oncol, 2017. 35(4): p. 760-768.
13. Bataller, A., et al., KMT2A-CBL rearrangements in acute leukemias: clinical characteristics and genetic breakpoints. Blood Adv, 2021. 5(24): p. 5617-5620.
14. Meyer, C., et al., Human MLL/KMT2A gene exhibits a second breakpoint cluster region for recurrent MLL-USP2 fusions. Leukemia, 2019. 33(9): p. 2306-2340.
15. Markey, F.B., et al., Fusion FISH imaging: single-molecule detection of gene fusion transcripts in situ. PLoS One, 2014. 9(3): p. e93488.
16. Lyu, X., et al., Detection of 22 common leukemic fusion genes using a single-step multiplex qRT-PCR-based assay. Diagn Pathol, 2017. 12(1): p. 55.
17. Zhou, B., Shin, G., Greer, S.U., Vervoort, L., Huang, Y., Pattni, R., Ho, M., Wong, W.H., Vermeesch, J.R., Ji, H.P., Urban, A.E., Complete and haplotype-specific sequence assembly of segmental duplication-mediated genome rearrangements using CRISPR-targeted ultra-long read sequencing (CTLR-Seq). BioRxiv, 2020. https://doi.org/10.1101/2020.10.23.349621.
18. Kerbs, P., et al., Fusion gene detection by RNA-sequencing complements diagnostics of acute myeloid leukemia and identifies recurring NRIP1-MIR99AHG rearrangements. Haematologica, 2022. 107(1): p. 100-111.
19. Gilpatrick, T., et al., Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat Biotechnol, 2020. 38(4): p. 433-438.
20. Livak, K.J. and T.D. Schmittgen, Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods, 2001. 25(4): p. 402-8.
21. Li, H., Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 2018. 34(18): p. 3094-3100.
22. Quinlan, A.R. and I.M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 2010. 26(6): p. 841-2.
23. Li, H., et al., The Sequence Alignment/Map format and SAMtools. Bioinformatics, 2009. 25(16): p. 2078-9.
24. Kolmogorov, M., et al., Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol, 2019. 37(5): p. 540-546.
25. Stuart, T., et al., Single-cell chromatin state analysis with Signac. Nat Methods, 2021. 18(11): p. 1333-1341.
26. Hanzelmann, S., R. Castelo, and J. Guinney, GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics, 2013. 14: p. 7.
27. Liberzon, A., et al., The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst, 2015. 1(6): p. 417-425.
28. Khabirova, E., et al., Single-cell transcriptomics reveals a distinct developmental state of KMT2A-rearranged infant B-cell acute lymphoblastic leukemia. Nat Med, 2022. 28(4): p. 743-751.

Claims

What is claimed is:

1. A method comprising:
(a) lysing cells that have a fusion between a first gene and a second gene at an end of an electrophoresis gel to release the genomic DNA from the cells;
(b) applying a voltage potential to the gel, thereby trapping intact genomic DNA at one end of the gel;
(c) digesting the trapped genomic DNA using two or more pairs of RNA-guided endonucleases to release: (i) a segment of the first gene; (ii) a segment of the second gene; and (iii) a segment of a gene fusion between the first and second genes; wherein the segments of the first and second genes are approximately the same size and the segment of the gene fusion is resolvable from the segments of the first and second genes in the gel;
(d) electrophoresing the segments of (i), (ii) and (iii) through the gel, thereby separating the segment of (iii) from the segments of (i) and (ii);
(e) eluting the segments into different fractions by applying a second voltage potential to the gel, wherein the second voltage potential is orthogonal to the potential of (b), wherein the segment of (iii) is eluted into a fraction that is different from the fraction into which the segments of (i) and (ii) are eluted; and
(f) assaying for the flanking sequences of the first and second genes in the fractions collected in (e), thereby identifying a fraction that contains the segments of the first and second genes and a fraction that contains the segment of the gene fusion.

2. The method of claim 1, wherein the assaying is done by quantitative PCR.

3. The method of claim 2, wherein the assaying is done by TaqMan.

4. The method of any prior claim, wherein the electrophoresing of (d) is pulsed field electrophoresis.

5. The method of any of claims 1-4, wherein the segments are 100 kb to 1 Mb in length.

6. The method of any of claims 1-5, wherein the segments of (i) and (ii) have a size difference of less than 100 kb.

7. The method of any of claims 1-6, wherein the segments of (i) and (ii) have a size difference of less than 20 kb.

8. The method of any prior claim, wherein the segment of (iii) is smaller or larger than the segments of (i) and (ii) by at least 50 kb.

9. The method of any prior claim, wherein the segment of (iii) is smaller or larger than the segments of (i) and (ii) by at least 200 kb.

10. The method of any prior claim, wherein nucleic acid in the gel is eluted into 4-10 fractions in step (e).

11. The method of any prior claim, wherein the cells of (a) are blood cells, cultured cells, or a dissociated tissue cell suspension.

12. The method of any prior claim, wherein the cells are blood cells from a patient that has a blood cancer associated with the gene fusion.

13. The method of claim 12, wherein the patient has ALL or AML.

14. The method of any prior claim, further comprising sequencing the eluted segment of the gene fusion using nanopore sequencing.

15. The method of claim 14, further comprising identifying a breakpoint in the fusion.
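
By way of illustration only, the size relationships recited in the claims can be sketched in Python. The sketch assumes a simple model that is not part of the disclosure: each gene is flanked by one guide pair, the fusion joins the left arm of the first gene to the right arm of the second gene at a single breakpoint per gene, and each fraction is described by the set of flanking sequences detected in it. All names (`CutPair`, `fusion_size`, `design_is_resolvable`, `classify_fraction`), the coordinates, and the flank labels are hypothetical; the default thresholds are taken from the 20 kb and 50 kb values in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CutPair:
    """Hypothetical guide-pair cut sites flanking one gene (coordinates in bp)."""
    left: int
    right: int
    breakpoint: int  # assumed fusion breakpoint, between left and right

    @property
    def wild_type_size(self) -> int:
        # Size of the segment released from the unrearranged allele.
        return self.right - self.left


def fusion_size(first: CutPair, second: CutPair) -> int:
    """Segment size when the left arm of `first` is joined to the right arm of `second`."""
    return (first.breakpoint - first.left) + (second.right - second.breakpoint)


def design_is_resolvable(first, second, max_wt_diff=20_000, min_fusion_diff=50_000):
    """Check that wild-type segments are similar in size and the fusion segment is not."""
    wt_diff = abs(first.wild_type_size - second.wild_type_size)
    fused = fusion_size(first, second)
    fusion_diff = min(abs(fused - first.wild_type_size),
                      abs(fused - second.wild_type_size))
    return wt_diff < max_wt_diff and fusion_diff >= min_fusion_diff


ALL_FLANKS = {"first_left", "first_right", "second_left", "second_right"}


def classify_fraction(flanks):
    """Label a fraction from the set of flanking sequences detected in it (assumed labels)."""
    if flanks >= ALL_FLANKS:
        return "wild-type"  # co-eluting segments of the first and second genes
    if flanks == {"first_left", "second_right"}:
        return "fusion"
    return "undetermined"


# Hypothetical design: two wild-type segments of 400 kb and 410 kb.
first = CutPair(left=0, right=400_000, breakpoint=100_000)
second = CutPair(left=0, right=410_000, breakpoint=300_000)
```

Under these assumed coordinates the wild-type segments differ by 10 kb, while the fusion segment is 210 kb and therefore at least 190 kb smaller than either wild-type segment. A reciprocal fusion product, or a fusion in the opposite orientation, would need its own size calculation.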