WO2024081596A1 - Identification and characterization of gene fusions by CRISPR-targeted nanopore sequencing
- Publication number
- WO2024081596A1 (application PCT/US2023/076391)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1003—Extracting or separating nucleic acids from biological samples, e.g. pure separation or isolation methods; Conditions, buffers or apparatuses therefor
- C12N15/1006—Extracting or separating nucleic acids from biological samples, e.g. pure separation or isolation methods; Conditions, buffers or apparatuses therefor by means of a solid support carrier, e.g. particles, polymers
- C12N15/101—Extracting or separating nucleic acids from biological samples, e.g. pure separation or isolation methods; Conditions, buffers or apparatuses therefor by means of a solid support carrier, e.g. particles, polymers by chromatography, e.g. electrophoresis, ion-exchange, reverse phase
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/62—DNA sequences coding for fusion proteins
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N27/00—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
- G01N27/26—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating electrochemical variables; by using electrolysis or electrophoresis
- G01N27/416—Systems
- G01N27/447—Systems using electrophoresis
Definitions
- genomic instability drives cancer progression through the silencing of tumor suppressor genes and activation of protooncogenes. For this reason, genomic instability is a hallmark of cancer. However, genomic instability is not required for cancer development, as can be observed in mismatch repair deficient colorectal tumors and colorectal adenomas. Still, 90% of all cancer types feature alterations in chromosome number and structure, with 16.5% possessing a driving fusion mutation. Leukemias commonly have at least one significant chromosomal variation. Understanding and identifying the specific genomic aberrations in cancer types can be used to more accurately provide diagnoses, prognoses, and targeted therapies.
- AML acute myeloid leukemia
- certain chromosomal translocations are associated with a good prognosis, whereas some chromosomal deletions or duplications are indicative of an aggressive cancer with a poor prognosis.
- the ability to determine the specific mutations in cancer provides significant clinical utility and allows for the precise delivery of targeted therapies.
- Chromosomal rearrangement due to genetic instability can result in the fusion of genes.
- the resulting proteins from these gene fusions have altered functionality and can drive cancer progression, such as the BCR-ABL fusion protein observed in chronic myeloid leukemia (CML). This protein has increased tyrosine kinase activity as compared to its normally functioning counterpart.
- KMT2A gene fusions An example of such complexity is illustrated through KMT2A gene fusions.
- the KMT2A gene consists of 36 exons and is located on chromosome 11 in the q23 position.
- the gene encodes a protein with a H3K4 methyltransferase domain that plays a critical role in the regulation of gene expression in early development and hematopoiesis.
- KMT2A gene fusions are among the most common chromosomal abnormalities in acute leukemias, occurring in 80% of infant acute lymphoblastic leukemias (ALL), 5% of acute myeloid leukemia (AML) cases, and in 85% of secondary AML cases in patients previously treated with topoisomerase II inhibitors.
- KMT2A has 135 reported fusion partners, some of which are pathogenic and some are not. Regardless of the pathogenicity, it is well established that KMT2A gene fusions are drivers of acute leukemia.
- the most frequent partner genes in KMT2A gene fusions are AFF1, MLLT3, MLLT10, ELL, and AFDN.
- KMT2A-positive acute leukemias The specific fusion partner contributes to the determination of either the myeloid or lymphoblastic disease phenotype, and the nature of the rearrangement can be used to predict prognosis.
- the KMT2A gene has a multitude of breakpoint regions in several different exons. The majority of KMT2A breakpoints are localized to a breakpoint cluster region between exon 7 and exon 14, but there is more than one cluster region, and their combined range is greater than 22 kb. Cytogenetic analysis can be used to identify chromosomal abnormalities through karyotyping and fluorescence in-situ hybridization (FISH).
- FISH fluorescence in-situ hybridization
- RNA-sequencing can also be used for whole transcriptome analysis by sequencing chimeric gene fusions through mRNA expression. This method is not without fault, as it is also highly complex and returns a considerable number of false positives and negatives.
- RNA-seq failed to detect 10% of the gene fusions that were determined by routine cytogenetics methods with high confidence.
- the false negative cases generally coincided with a low sequencing coverage for the transcribed fusion genes, which was likely due to instability of RNA molecules and/or low expression of the fusion gene.
- sequencing gene fusions at the gene level could, in theory, be a solution.
- the size of a fusion gene can be greater than 100 kb, making it very difficult to target. When focusing only on the breakpoint regions, the size of the target can be smaller.
- the KMT2A gene has two major breakpoint cluster regions, whose combined length is less than 30 kb.
- DNA-based fusion detection does not generally provide information about whether the fusion gene is transcriptionally active.
- acute leukemia is a complex disease with varying prognoses depending on specific chromosomal abnormalities.
- certain chromosomal translocations are associated with a favorable prognosis, while other abnormalities, such as some chromosomal deletions or duplications, indicate a more aggressive course and a poor prognosis.
- KMT2A gene fusions are characterized in two types of acute leukemia (lymphocytic and myelocytic) from both genetic and epigenetic perspectives.
- a targeted long-read nanopore sequencing approach to analyze gene fusions between KMT2A and five frequently observed partner genes.
- the fusion and wild-type genes are physically separated.
- the method provides a way to identify the exact breakpoints with base-pair resolution, to determine phased variants by assembling complete contigs of both wild-type and fusion alleles, and to examine epigenetic changes.
- the epigenetic information obtained potentially allows one to predict the expression activity of the fusion and wild-type genes.
- the method may comprise: (a) lysing cells that have a fusion between a first gene and a second gene at an end of an electrophoresis gel to release the genomic DNA from the cells; (b) applying a voltage potential to the gel, thereby trapping intact genomic DNA at one end of the gel; (c) digesting the trapped genomic DNA using two or more pairs of RNA-guided endonucleases to release: (i) a segment of the first gene; (ii) a segment of the second gene; and (iii) a segment of a gene fusion between the first and second genes; wherein the segments of the first and second genes are approximately the same size and the segment of the gene fusion is resolvable from the segments of the first and second genes in the gel; (d) electrophoresing the segments of (i), (ii) and (iii) through the gel, thereby separating the segment of (iii) from the segments of (i) and (ii); (e) eluting the
- FIG. 1A and 1B Limitations in fusion gene characterization using targeted long-read sequencing (Fig. 1A), and automated and streamlined separation of targets from genomic DNA and between wild-type and structural variant (Fig. 1B). Simple qPCR analysis for quantification of target HMW fragments in elution modules informs existence of fusion gene even before downstream sequencing step.
- Figs. 2A and 2B TaqMan qPCR copy number analysis for detecting size-shifted fusion gene.
- Fig. 2A Illustration of wild type and fusion genes in CATCH assay products.
- Fig. 2B Copy numbers measured by each TaqMan assay probes. For each elution module, copy numbers of both target and control (RNaseP) are shown. The elution modules are indicated as expected length of the HMW DNA fragments collected in them.
- Figs. 3A and 3B ONT sequencing read alignment for identification of fusion breakpoints (Fig. 3A) and phased variants (Fig. 3B).
- Figs. 4A and 4B Differential DNA modification in KMT2A promoter CpG. Sequence pile-ups and read depths are shown with information about CpG modifications (Fig. 4A). Red bars indicate 5-methylcytosine (5mC), and blue bars indicate 5-hydroxymethylcytosine (5hmC). A depletion of 5mC in both the wild-type and fusion genes was observed.
- FIG. 4B illustrates graphs showing a depletion of 5mC in both the wild-type and fusion genes.
- Fig. 5. Flow chart illustrating a guide RNA design process.
- Fig. 6. Two gRNA designs for KMT2A gene. The two designs generate HMW fragments where the location of KMT2A gene is different.
- Fig. 7. Detection of fusion genes resulting from reciprocal translocation. In the right panels, illustrations of wild type and fusion genes in CATCH assay products are shown with breakpoint cluster regions as well as the expected length of the contributing parts of the fusion genes.
- Two TaqMan copy number assay probes are shown with their target locations in the assay products and the copy number data for individual elution modules.
- Fig. 8 Comparison of ligation- and tagmentation-based ONT library preparations.
- binned sequencing read depth (bin size: 5 kb) is shown for different library preparations (ligation versus tagmentation) and CATCH assays (300-kb versus 1-Mb assays). Blue color indicates alignments to forward strand, and red color indicates alignments to reverse strand.
- Box plots are shown for location-specific read depth for all six targets in the multiplex CATCH assay.
- nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
- the headings provided herein are not limitations of the various aspects or embodiments of the invention. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
- the method may comprise lysing cells that have a fusion between a first gene and a second gene at an end of an electrophoresis gel to release the genomic DNA from the cells.
- the cells may be blood cells (e.g., PBMCs), cultured cells, or a dissociated tissue cell suspension, for example, although a biopsy could be used.
- PBMCs blood cells
- the first region to which the forward primers bind may be selected from the group consisting of ROS1, ALK, EML4, BCR, ABL, TCF3, PBX1, ETV6, RUNX1, MLL, AF4, SIL, TAL1, RET, NTRK1, PAX8, PPARG, MECT1, MAML2, TFE3, TFEB, BRD4, NUT, ETV6, NTRK3, TMPRSS2, NTRK2, KMT2A and ERG.
- the cells may have a fusion involving ALK, RET, KMT2A, NTRK1, ROS1, BRAF, EGFR, NRG1 or MET. Possible fusion partners for these genes are numerous.
- the fusion partner may be EML4, STRN, KIF5B and/or TFG.
- the fusion partner may be CD74, SLC34A2, SDC4, TPM3 and/or EZR.
- the fusion-specific primers may target any one or more of the following fusions: CD74-ROS1, SLC34A2-ROS1, SDC4-ROS1, EZR-ROS1, GOPC-ROS1, LRIG3-ROS1, TPM3-ROS1, PPFIBP1-ROS1, EML4-ALK, BCR-ABL, TCF3-PBX1, ETV6-RUNX1, MLL-AF4, SIL-TAL1, RET-NTRK1, PAX8-PPARG, MECT1-MAML2, TFE3-TFEB, BRD4-NUT, ETV6-NTRK3, TMPRSS2-ERG, TPM3-NTRK1, SQSTM1-NTRK1, CD74-NTRK1, MPRIP-NTRK1 and TRIM24-NTRK2.
- the method finds particular utility in analyzing blood cells.
- the patient from which the blood cells were obtained may have a blood cancer such as acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), or mixed-phenotype acute leukemia (MPAL).
- ALL acute lymphoblastic leukemia
- MPAL mixed-phenotype acute leukemia
- Gene fusions found in AML include RUNX1-RUNX1T1, PML-RARA, ZNF292-PNRC1, NUP98-NSD1, CBFB-MYH11, KMT2A-MLLT4, KMT2A-MLLT3, KMT2A-MLLT10, and DEK-NUP214.
- Gene fusions found in ALL include BCR-ABL1, ETV6-RUNX1, EP300-ZNF384, TCF3-PBX1, KMT2A-AFF1, MEF2D-BCL9, STIL-TAL1, TCF3-HLF, ZNF292-PNRC1, EBF1-PDGFRB, PAX5-NOL4L, PICALM-MLLT10, and TCF3-ZNF384.
- Gene fusions found in MPAL patients include BCR-ABL1, ETV6-ARNT, ETV6-NCOA2, ETV6-LOH12CR1, PICALM-MLLT10, NAP1L1-MLLT10, RUNX1-MECOM, TRA2B-MECOM, SET-NUP214 and KMT2A-MLLT4.
- KMT2A fusions are relatively common in ALL and AML. This gene has more than 100 fusion partners.
- KMT2A-MLLT4 Gene fusions involving KMT2A include KMT2A-MLLT4, KMT2A-MLLT3, KMT2A-MLLT10, KMT2A-AFF1, KMT2A-MLLT1, KMT2A-ELL, KMT2A-MLLT6, KMT2A-USP2, KMT2A-MAML2, KMT2A-MLLT11, KMT2A-MYO1F, KMT2A-SEPT5, KMT2A-SEPT6, and KMT2A-CARS, among others.
- the gene fusion may involve a kinase, a transcription factor, or an epigenetic gene (e.g., a chromatin modifier), for example.
- the method may involve applying a voltage potential to the gel, thereby trapping intact genomic DNA at one end of the gel, and digesting the trapped genomic DNA using two or more pairs of RNA-guided endonucleases to release: a segment of the first gene; a segment of the second gene (i.e., a ‘fusion partner’ for the first gene); and a segment of a gene fusion between the first and second genes.
- the segments of the first and second genes are approximately the same size (e.g., the segments have a size difference of less than 100 kb, less than 50 kb, or less than 20 kb) and the segment of the gene fusion is resolvable from the segments of the first and second genes in the gel.
- the segments may be 100 kb to 1 Mb in length, and the segment of the gene fusion is smaller or larger than the other segments by at least 50 kb, at least 100 kb, or at least 200 kb.
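- The size criteria above can be checked numerically before a guide design is run. The following is a minimal sketch, not the applicants' procedure: the function names, the default thresholds (taken from the example values of 100 kb and 50 kb in the text), and the assumption that the fusion segment keeps the 5' part of the first-gene segment and the 3' part of the second-gene segment are all illustrative.

```python
def fusion_segment_kb(first_5prime_kb, second_3prime_kb):
    """Expected fusion segment length (kb).

    Assumption for illustration: the fusion keeps the part of the first-gene
    segment 5' of its breakpoint and the part of the second-gene segment 3'
    of its breakpoint.
    """
    return first_5prime_kb + second_3prime_kb


def is_design_resolvable(first_kb, second_kb, fusion_kb,
                         max_wt_difference_kb=100.0,
                         min_fusion_shift_kb=50.0):
    """Return True if the three segment sizes satisfy the stated criteria."""
    # Wild-type segments should be 100 kb to 1 Mb long.
    in_range = all(100.0 <= size <= 1000.0 for size in (first_kb, second_kb))
    # Wild-type segments should be approximately the same size.
    wt_similar = abs(first_kb - second_kb) < max_wt_difference_kb
    # The fusion segment should differ from both wild-type segments.
    shift = min(abs(fusion_kb - first_kb), abs(fusion_kb - second_kb))
    return in_range and wt_similar and shift >= min_fusion_shift_kb


# Example: 300 kb and 310 kb wild-type segments, breakpoints leaving
# 100 kb of the first segment and 100 kb of the second segment.
fusion = fusion_segment_kb(100.0, 100.0)
print(fusion, is_design_resolvable(300.0, 310.0, fusion))
```

  A design that fails the check could be adjusted by moving one cut site so that the wild-type and fusion segments fall into different elution fractions.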
- the method may comprise electrophoresing the segments through the gel, thereby separating the segment of the gene fusion from the segments of the first and second genes. This step may be done by pulsed field electrophoresis.
- the method may comprise eluting the segments into different fractions (e.g., 4-10 fractions) by applying a second voltage potential to the gel, wherein the second voltage potential is orthogonal to the potential used earlier in the method, wherein the segment of gene fusion is eluted into a fraction that is different to the fraction into which the segments of first and second genes are eluted.
- flanking sequences of the first and second genes may be assayed, e.g., by quantitative PCR (e.g., Taqman), to identify a fraction that contains the segments of the first and second genes and a fraction that contains the segment of the gene fusion.
- quantitative PCR e.g., Taqman
- the eluted segment of the gene fusion may be sequenced using any suitable long range sequencing technology, e.g., nanopore sequencing (e.g., as described in Soni et al. Clin. Chem. 2007 53: 1996-2001, or as described by Oxford Nanopore Technologies).
- Nanopore sequencing is a single-molecule sequencing technology whereby a single molecule of DNA is sequenced directly as it passes through a nanopore.
- a nanopore is a small hole, of the order of 1 nanometer in diameter.
- Immersion of a nanopore in a conducting fluid and application of a potential (voltage) across it results in a slight electrical current due to conduction of ions through the nanopore.
- the amount of current which flows is sensitive to the size and shape of the nanopore.
- As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree, changing the magnitude of the current through the nanopore to different degrees.
- the cells are blood cells from a patient that has a blood cancer associated with the gene fusion, e.g., ALL or AML, etc.
- the results of the method may be used as a diagnostic, to measure the severity of the cancer, to monitor the disease, to determine if a treatment is working and to make treatment decisions. For example, if a gene fusion involves a kinase, then a drug that targets that kinase can be administered.
- the method may be multiplexed, as needed.
- the DNA may be cleaved by two or more sets of three pairs of RNA-guided endonucleases, where each set targets a different fusion.
- the present method may be carried out on a SageHLS device, as described at sagescience.com and US20210062180, which is incorporated by reference in its entirety.
- the device may comprise a molecule retention cassette for retaining molecules during electrophoresis, the cassette comprising: a housing; a lane configured within the housing, the lane having a first elongate edge and a second elongate edge; an elution module configured to be received in the lane to divide the lane into a first chamber and a second chamber; a first buffer reservoir positioned adjacent the first elongate edge; and a second buffer reservoir positioned adjacent the second elongate edge; wherein: a first side of the elution module facing the first chamber comprises a porous sterile filtration membrane; and a second side of the elution module facing the second chamber comprises an ultrafiltration membrane, the ultrafiltration membrane having a pore size to retain molecules during electrophoresis.
- the device may be used as follows for isolating and collecting target segments of target particles, the method comprising: receiving a sample in a sample well of an elution module; receiving an SDS-containing lysis buffer in a first buffer chamber, the buffer chamber being configured along a first side of the elution module; applying a first electrophoresis voltage to migrate components of the sample towards a second buffer chamber configured along a second side of the elution module, such that: target particles are immobilized in a gel segment configured along the second side of the elution module between the elution module and the second buffer chamber, and non-target particles pass through the gel segment and into the second buffer chamber; washing the first buffer chamber, the second buffer chamber, and the elution module; filling the first buffer chamber, the second buffer chamber, and the elution module with a Cas9 reaction buffer; emptying the elution module; refilling the elution module with a Cas9 enzyme mix to cleave sections
- the system uses intact cells or isolated nuclei as input samples.
- Input samples are loaded into an agarose gel cassette, and chromosome-length DNA is extracted from the samples by electrophoresis of SDS through the sample well compartment.
- SDS-coated proteins and lipids are electrophoresed away from the sample well through the central agarose gel column, but the chromosome-length DNA becomes firmly entangled and immobilized in the agarose gel wall of the sample well.
- the sample well can be emptied and refilled without any loss of DNA. This allows for treatment of the immobilized DNA by refilling the sample well with an enzyme reaction mixture.
- DNA processing enzymes readily diffuse into the agarose, including many restriction enzymes, DNA polymerases, ligases, transposases, non-specific DNA cleavases, and S. pyogenes Cas9.
- an additional round of size-selection electrophoresis is performed, followed by electroelution of the DNA products into a series of six buffer-filled elution modules arranged along one side of the gel separation column.
- the DNA processing step includes some cleavage to reduce the size of the desired DNA products to below 2 megabases (Mb) in length. DNA greater than 2 Mb will remain immobilized in the sample well, unable to move during electrophoresis.
- Each cassette can have two physically isolated sample processing areas.
- the cassette may have a standard 96-well plate footprint.
- the central agarose channel has two loading wells. Cells or nuclei are loaded in the sample well, and an SDS-based lysis reagent is loaded into the reagent well. Electrophoresis is carried out to drive the SDS through the sample well compartment where the cells or nuclei are lysed. Chromosome-sized genomic DNA becomes immobilized in the sample well wall, while other components are carried to the bottom electrode chamber along with the SDS. After DNA processing and size selection electrophoresis, the DNA products are electroeluted into an array of six elution modules positioned along the right side of the agarose channel.
- the amount of fusion DNA in the cell-free fraction of a patient’s bloodstream should correlate with disease severity for those cancers that are associated with the fusion, e.g., a subset of non-small cell lung cancers.
- tracking the amount of fusion DNA over time could be used to, for example, determine if a treatment is working.
- Assays for accurately quantifying the amount of a particular fusion sequence in a sample are well known. For example, qPCR or Invader assay could be used.
- identifying and quantifying gene fusions in cfDNA would logically be implemented in two steps, where the first step involves sequencing a patient’s cfDNA to identify which genes are fused as well as the sequence at the junction of the fusion, and a second step that involves quantifying the amount of fusion DNA in the cfDNA (see, e.g., Harris et al, Nature Scientific Reports 2016 6: 29831).
- the problem with this approach is that the latter step is patient-specific in the sense that most reliable quantification methods (e.g., qPCR or Invader) only work if primers that flank the fusion junction are used.
- Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.
- the present method aims to overcome some of the challenges of DNA-based approaches in fusion gene sequencing using nanopore sequencing combined with a targeted enrichment method.
- nanopore sequencing methods such as nanopore Cas9-targeted sequencing (nCATS).
- nCATS nanopore Cas9-targeted sequencing
- because these targeted methods start sequencing only from the ends, the long and fragile strands can easily break apart before the middle of the fragment is sequenced. Therefore, the existing targeted nanopore sequencing methods are often useful only when a part of the fusion gene (e.g., a region including the breakpoint) is targeted. When targeting the entire fusion gene is desired, one needs an alternative strategy.
- Sequencing the entire gene fusion provides information such as fully phased germline and somatic variants and CpG island DNA methylation of the gene fusion. This information cannot be obtained when targeting only a small part of the gene fusion that includes the breakpoint.
- the general strategy is illustrated in Figs. 1A and 1B.
- a high molecular weight (HMW) DNA enrichment based on Cas9-assisted targeting of chromosome segments (CATCH) was used.
- the targeted fragments are then eluted and quantified by qPCR.
- the method can enrich and extract target regions that are as large as 1 Mb.
- the enriched target regions can then be sequenced and analyzed for the purpose of characterizing gene fusions.
- the method combines targeted in vitro genome cutting and pulsed-field electrophoresis to sequence targets that were previously not targetable in a single HMW molecule.
- the resulting fragments are then sequenced without amplification using a nanopore sequencer to resolve large and complex rearrangements.
- This method has been used to sequence 22q11.2 and 16q11.2 CNV rearrangements that are prominent risk factors for neurological disorders such as schizophrenia and autism.
- the present study aims to use a modified version of CTLR-Seq to investigate blood samples from leukemia (AML or ALL) patients. These samples have been previously tested with clinical cytogenetic methods, and the method described in this study characterizes gene fusions between KMT2A and the aforementioned frequent partner genes.
- the qPCR step was used for target quantification and also as a simple detection method for the presence of a rearrangement event.
- additional ONT sequencing data can be used to further characterize the event.
- Physically separated fusions and the wild type genes are sequenced (Fig. 1B).
- the characterization includes breakpoint identification with base pair resolution, entirely phased variants using end-to-end contig assembly of wildtype and fusion alleles, and also the epigenetic changes.
- from the epigenetic information, one should be able to predict the expression activity of the fusion and wild type genes, which will be validated by RNA sequencing and/or traditional RT-PCR-based gene expression analysis.
- Methods Guide RNA design To design 20-bp target sequences of gRNAs, all 20-bp sequences (20-mers) in the region of the target cut sites that occur directly adjacent to a Cas9 binding motif are considered. Only those candidate guide sequences that are unique in the human genome and that have no other alignments with fewer than three mismatches were retained. The GRCh38 genome was used to identify off-target CRISPR sites. An off-target gRNA sequence is defined as a 20-mer with three or four mismatches where the mismatches are located within the first 10 bases (the half distant from the PAM domain).
- a candidate design does not have any off-target sites near or inside the targets (Fig. 5). Also, a candidate design has no pair of off-targets neighboring each other within a range and generating an off-target fragment that can co-migrate with target fragments in the electrophoretic separation. Some candidate gRNA sequences are included even if they have off-targets elsewhere in the genome. The overall number of off-targets is considered as a parameter when assessing the design.
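- The candidate enumeration and the off-target definition above can be illustrated with a short sketch. It is not the design pipeline of Fig. 5: it assumes an NGG motif (the usual S. pyogenes Cas9 PAM) as the "Cas9 binding motif", scans the forward strand only, and uses hypothetical function names and labels. A genome-wide search against GRCh38 would require an aligner and is not shown.

```python
def candidate_protospacers(region):
    """List (position, 20-mer) pairs directly followed by an NGG motif.

    Assumption: NGG is the Cas9 binding motif; only the forward strand is
    scanned in this sketch.
    """
    region = region.upper()
    hits = []
    for i in range(len(region) - 22):
        # Bases i..i+19 are the 20-mer, i+20 is N, i+21..i+22 must be GG.
        if region[i + 21:i + 23] == "GG":
            hits.append((i, region[i:i + 20]))
    return hits


def mismatch_positions(guide, site):
    """0-based positions at which two equal-length sequences differ."""
    return [k for k, (a, b) in enumerate(zip(guide, site)) if a != b]


def classify_site(guide, site):
    """Label a genomic 20-mer relative to a guide (labels are hypothetical).

    'perfect'       : identical sequence
    'disqualifying' : fewer than three mismatches, so the guide is not unique
    'off_target'    : three or four mismatches, all in the first 10 bases
                      (the half distant from the PAM)
    'ignored'       : anything else
    """
    mismatches = mismatch_positions(guide, site)
    if not mismatches:
        return "perfect"
    if len(mismatches) < 3:
        return "disqualifying"
    if len(mismatches) <= 4 and all(k < 10 for k in mismatches):
        return "off_target"
    return "ignored"


guide = "ACGTACGTACGTACGTACGT"
print(candidate_protospacers(guide + "TGG"))
print(classify_site(guide, "TGCTACGTACGTACGTACGT"))
```

  In a fuller implementation, each retained guide would carry its list of off-target coordinates so that neighboring off-target pairs able to produce co-migrating fragments can be excluded.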
- Fig. 9 provides the final gRNA designs with information about their potential off-targets.
- PBMCs Peripheral blood mononuclear cells
- FCS fetal calf serum
- DMSO dimethyl sulfoxide
- crRNAs Custom-designed synthetic CRISPR RNAs
- tracrRNA trans-activating CRISPR RNA
- Per target a pair of crRNAs were used to excise the 300 kb or 1 Mb fragments, one crRNA targeting the 5’ flanking region and the other targeting the 3’ flanking region. Up to six pairs of crRNAs were multiplexed (i.e. up to six targets in a single assay).
- gRNA-Cas9 assembly When preparing the guide RNA (gRNA)-Cas9 assembly for four sample runs (the maximum capacity of the Sage HLS machine), 800 fmol of pooled crRNAs was annealed to 520 fmol of tracrRNA in 44 µL of 1X duplex buffer (Integrated DNA Technologies) at 95°C for 10 minutes, followed by cooling at room temperature for 5 minutes. The annealed gRNA mix was assembled with 160 fmol of Cas9 endonuclease in 80 µL of 1X enzyme buffer (Sage Science) at 37°C for 10 minutes.
- 1X duplex buffer Integrated DNA Technologies
- the SDS, proteins, and membrane components were carried away from the sample well to the bottom electrode chamber.
- the large genomic DNA (>2 Mb) was embedded in the agarose wall of the sample well during the extraction electrophoresis.
- the electrophoresis was halted and the reagent well was emptied and refilled with the Cas9–gRNA reaction mixture.
- the reaction mixture was diluted with 3X volume of 1X enzyme buffer (Sage Science) prior to loading. Electrophoresis was carried out for 4 min to drive the Cas9 enzyme into contact with the genomic DNA inside the sample well wall. Then, the electrophoresis was stopped, followed by Cas9 digestion of the genomic DNA at room temperature for 30 minutes.
- the electrophoresis process used a pulsed-field waveform, designed for optimal resolution of DNA fragments 300 kb or 1 Mb in size.
- a second orthogonal set of electrodes was used to elute the size-separated DNA into a series of elution modules located along one side of the gel column. The DNA was removed from the elution modules 12 hours after run termination. Quantitation of targeted high molecular weight DNA: The DNA from each elution module was prepared with a 1:5 dilution in 33% bCD.
- TaqMan qPCR Copy Number assays (Thermo Fisher Scientific) were used to measure the DNA concentration after extraction.
- the 10 µL reaction included 2 µL of diluted target DNA sample, 1X TaqMan Genotyping Mix, 1X TaqMan RNaseP reference and 1X TaqMan assay for the specific targets.
- the samples were denatured at 95°C for 10 min, followed by 50 cycles of 15 s at 95°C and 1 min at 60°C.
- a relative quantification i.e. target versus RNaseP reference
- a modified ΔΔCt method [20] was used.
- One ng of NA18507 genomic DNA was used as a control.
- for absolute copy number, it was assumed that 290 genome copies were in 1 ng of the control sample.
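- The quantification described above can be sketched with a textbook comparative Ct calculation. This is an illustration only: the modification referred to in [20] is not specified in this text, so the formulas below assume 100% PCR efficiency (a doubling per cycle), and all function and parameter names are hypothetical. The only value taken from the text is the assumption of 290 genome copies in 1 ng of control DNA.

```python
CONTROL_COPIES_PER_NG = 290.0  # stated assumption for the control sample


def relative_enrichment(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Target-versus-reference ratio in a sample, normalized to the control.

    Standard comparative Ct form: 2 ** -(dCt_sample - dCt_control), where
    dCt = Ct(target assay) - Ct(RNaseP reference assay).
    """
    delta_sample = ct_target - ct_ref
    delta_control = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** (-(delta_sample - delta_control))


def absolute_copies(ct_sample, ct_control, control_ng=1.0):
    """Copies of one assay's target in the reaction, scaled from the control.

    Each cycle of earlier amplification is taken as a two-fold increase
    in starting copies (assumed 100% efficiency).
    """
    control_copies = CONTROL_COPIES_PER_NG * control_ng
    return control_copies * 2.0 ** (ct_control - ct_sample)


# A target that amplifies 5 cycles earlier than RNaseP, with both assays
# at equal Ct in the control, is about 32-fold enriched over the reference.
print(relative_enrichment(22.0, 27.0, 25.0, 25.0))
print(absolute_copies(24.0, 25.0))
```

  A value well above 1 from `relative_enrichment` in an elution module would indicate that the module is enriched for the targeted fragment relative to bulk genomic DNA.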
- a SQK-RAD004 kit (Oxford Nanopore Technologies, Littlemore, Oxford, UK) was used for the library preparation with modifications. First, FRA was prepared in 1:20, 1:30, or 1:40 dilutions with FRA dilution buffer from SQK-ULK001 (Oxford Nanopore Technologies). The purified CATCH product was combined with 5 µL of a FRA dilution for a total volume of 20 µL. The resulting mixture was incubated at 30°C for 1.5 minutes, and then at 80°C for 1.5 minutes. Following the incubation period, 20 µL of the tagmented CATCH product was combined with 1 µL of RAP and incubated at room temperature for 5 minutes.
- the library was loaded onto a PromethION flowcell (R9.4.1) after being combined with water, loading beads, and sequencing buffer following the manufacturer’s instructions.
- the library was sequenced using an Oxford Nanopore Technologies PromethION 24 sequencing machine (Oxford Nanopore Technologies). The sequencing was performed for 72 hr with a high-accuracy base calling model (‘dna_r9.4.1_450bps_hac_prom’) and pore scanning every 1.5 hr.
- the base calls generated during sequencing were used only for real-time monitoring of sequencing run quality.
- Base calling, alignment, and assembly For base calling, Guppy (v6.1.1, Oxford Nanopore Technologies) was used with a super-accuracy model (‘dna_r9.4.1_450bps_sup_prom’).
- CpG methylation was called with two different models: i) 5-methylcytosine (‘dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom’) and ii) 5-hydroxymethylcytosine or 5-methylcytosine (‘dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_sup_prom’).
- Single cell multi-omics assay For single cell multi-omics assay, the Chromium Next GEM Single Cell Multiome ATAC and Gene Expression Reagent kit (10X Genomics, Pleasanton, CA) was used.
- the ATAC and RNA seq libraries were prepared according to the manufacturer’s guide, and sequenced using NovaSeq 6000 with 50:8:24:49 (ATAC) or 28:10:10:90 (RNA) paired end format.
- Cell Ranger ARC v2.0.0, 10X Genomics
- the raw sequencing data was then demultiplexed, aligned, and initially analyzed to produce matrix tables for ATAC peaks/fragments and the gene expression.
- Signac package [25] was used to generate Seurat objects and to cluster cells based on the multi-omics features.
- Pathway analysis A pathway enrichment analysis was conducted based on cancer hallmark signatures downloaded from MSigDB (v.6.2) [26, 27].
- GSVA Gene Set Variation Analysis
- Assays 2 and 3 differed in the gRNA pair targeting the KMT2A gene.
- One gRNA pair positioned KMT2A in the middle of the resulting fragment, while the other targeted a shifted location toward the 3’-flanking region of KMT2A (Fig. 6).
- the shifted KMT2A location improved the separation resolution between wild type and fusion fragments in some samples.
- TaqMan DNA copy number assays were performed to quantify the absolute amount of target HMW DNA molecules in the elution modules. These modules collected CRISPR-excised fragments in different size ranges (Fig. 1B).
- TaqMan assays showed a different pattern because the fusion gene was shorter than the wild type genes (Fig. 2B). Even in cases where electrophoresis separation was not reproducible, the size shift could still be detected by comparing the two assay patterns. Therefore, a minimum of two TaqMan assays were required to detect the fusion: one targeting a region that exists only in the wild type and the other targeting a region that exists in both the wild type and fusion. Additional TaqMan assays can be added to further confirm fusion events and detect more complex events, such as chimeric fusions.
- the nanopore sequencing library preparation method was modified for the CTRL-seq approach.
- the tagmentation-based method demonstrated superior uniformity in sequencing coverage. It eliminated the need for post-adapter-ligation cleanup and allowed direct loading of the library.
- By adjusting the tagmentation enzyme concentration and the enzymatic reaction duration, long sequence reads with N50 values comparable to the ligation-based method were obtained while maintaining uniform coverage.
- With the optimized tagmentation-based method, nanopore sequencing reads with N50 values ranging from 15 kb to 35 kb were generated, and the longest on-target reads generally exceeded 100 kb for most samples.
- Fig. 3A shows an example of experimentally binned sequence reads (i.e. two separate sequencing runs) aligned to human genome.
- the fusion-participating genes existed as wild type in one of the runs, but as a fusion gene in the other. The alignment showed clear breakpoints in only one of the sequencing runs.
- the single-cell profile represented a composite signal from both types, rather than separate signals for wild-type or fusion genes.
- distinct clusters of leukemia cells were identified, where variations in ATAC signal and the corresponding expression patterns were observed.
- These findings suggest that specific leukemia cell populations exhibited differential ATAC signal, indicating potential regulatory changes, accompanied by corresponding alterations in gene expression.
- One interesting observation pertains to the consistency exhibited by both AML samples. These samples showed consistent changes in terms of i) the structure of cell clusters, ii) ATAC coverage alterations, and iii) the corresponding expression changes.
- the ATAC coverage spikes, indicative of open chromatin regions, were located before the first two exons (Exon 1 and the alternative exon adjacent to the first exon).
- AML clusters in close proximity to each other in the UMAP space (AML1-3 clusters in SU710 and AML2-7 clusters in SU968) exhibited a more pronounced decrease compared to the distant clusters (AML4 cluster in SU710 and AML1 cluster in SU968) within each AML sample.
- the unique leukemia cell cluster had enrichment in “ANGIOGENESIS,” “ADIPOGENESIS,” and “XENOBIOTIC_METABOLISM.” From SU968, the unique leukemia cell cluster had enrichment in “HEDGEHOG_SIGNALING,” “MYC_TARGETS_V1,” and “MYC_TARGETS_V2.” Shifted KMT2A location in target HMW fragment improved fusion separation A multiplex assay was designed, locating the target genes in the middle of the target fragments (e.g. Assay 2). However, in some samples, the gene fusion and the wild type fragments had similar lengths, differing by less than 50 kb.
- the HMW DNA fragments with similar lengths were eluted in the same elution well, and therefore the simple qPCR method was not able to detect the fusion.
- the initially designed CATCH assay was not able to separate fusion fragments, whose lengths were 330 kb and 270 kb, from SU659 (sample with a KMT2A-AFF1 fusion).
- Another pair of KMT2A gRNAs was designed to be compatible with other existing gRNAs to locate the gene at a shifted location toward 5’-end.
- the distribution of insert sizes and the length of the sequence reads could be optimized by selecting concentration of transposase enzyme and the duration of the enzymatic reaction.
- the goal of optimization was to achieve the longest sequence reads while keeping the yield comparable to the ligation-based method. 1:5, 1:20, and 1:40 dilutions of the original tagmentation enzyme mix were tested with different clinical samples, and an N50 comparable to that of the ligation-based method was achieved while keeping uniform sequencing coverage throughout the target region.
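For reference, the N50 statistic used above to compare library preparations can be computed from a list of read lengths roughly as follows. This is a minimal sketch and not the analysis pipeline used in the study; the function name `read_n50` and the example read lengths are assumptions made purely for illustration.

```python
from typing import Iterable


def read_n50(read_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    lengths = sorted(read_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one read length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            # The cumulative sum reaches half the total by the last read
            # at the latest, so the function always returns here.
            return length


# Hypothetical read lengths in bases (not data from the study).
example_lengths = [40_000, 35_000, 30_000, 20_000, 15_000, 10_000]
print(read_n50(example_lengths))
```

Because N50 is weighted by bases rather than by read count, a few very long on-target reads can raise it noticeably, which is why it is a common summary for long-read libraries.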
Abstract
Provided herein is a method that involves lysing cells that have a fusion between a first gene and a second gene at an end of an electrophoresis gel, applying a voltage potential to the gel to trap intact genomic DNA at one end of the gel, digesting the trapped genomic DNA using two or more pairs of RNA-guided endonucleases to release segments, electrophoresing the segments, eluting the segments into different fractions, and analyzing the sequences of the nucleic acid collected in the fractions to identify a fraction that contains the segments of the first and second genes and a fraction that contains the segment of the gene fusion.
Description
S22-267 IDENTIFICATION AND CHARACTERIZATION OF GENE FUSIONS BY CRISPR- TARGETED NANOPORE SEQUENCING GOVERNMENT RIGHTS This invention was made with Government support under contracts CA247700, HG006137, and HG010963 awarded by the National Institutes of Health. The Government has certain rights in the invention. CROSS-REFERENCING This application claims the benefit of U.S. provisional application serial no. 63/414,889, filed on October 10, 2022, which application is incorporated by reference herein. INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS A SEQUENCE LISTING XML FILE A Sequence Listing is provided herewith as a Sequence Listing XML, “STAN-2002WO_SEQ_LIST”, created on October 9, 2023 and having a size of 13,404 bytes. The contents of the Sequence Listing XML are incorporated herein by reference in their entirety. BACKGROUND Genomic instability is the increased frequency of DNA mutations and is characteristic to nearly all human cancer types. This instability can manifest in somatic chromosomes in many ways, ranging from single base-pair alterations to chromosomal translocations and deletions. Increased mutation rate due to genomic instability drives cancer progression through the silencing of tumor suppressor genes and activation of protooncogenes. For this reason, genomic instability is a hallmark of cancer. However, genomic instability is not required for cancer development, as can be observed in mismatch repair deficient colorectal tumors and colorectal adenomas. Still, 90% of all cancer types feature alterations in chromosome number and structure, with 16.5% possessing a driving fusion mutation. Leukemias commonly have at least one significant chromosomal variation. Understanding and identifying the specific genomic
aberrations in cancer types can be used to more accurately provide diagnoses, prognoses, and targeted therapies. In acute myeloid leukemia (AML), certain chromosomal translocations are associated with a good prognosis, whereas some chromosomal deletions or duplications are indicative of an aggressive cancer with a poor prognosis. The ability to determine the specific mutations in cancer provides significant clinical utility and allows for the precise delivery of targeted therapies. Chromosomal rearrangement due to genetic instability can result in the fusion of genes. The resulting proteins from these gene fusions have altered functionality and can drive cancer progression, such as the BCR-ABL fusion protein observed in chronic myeloid leukemia (CML). This protein has increased tyrosine kinase activity as compared to its normally functioning counterpart. Patients with the BCR-ABL gene fusion can now be treated with specific tyrosine kinase inhibitors. The identification of this gene fusion and resulting protein allows for targeted therapy and improved prognosis of CML patients. As manageable as this example seems, the variable nature of genetic instability maintains great complexities. An example of such complexity is illustrated through KMT2A gene fusions. The KMT2A gene consists of 36 exons and is located on chromosome 11 in the q23 position. The gene encodes a protein with a H3K4 methyltransferase domain that plays a critical role in the regulation of gene expression in early development and hematopoiesis. KMT2A gene fusions are among the most common chromosomal abnormalities in acute leukemias, occurring in 80% of infant acute lymphoblastic leukemias (ALL), 5% of acute myeloid leukemia (AML) cases, and in 85% of secondary AML cases in patients previously treated with topoisomerase II inhibitors. KMT2A has 135 reported fusion partners, some of which are pathogenic and some of which are not.
Regardless of the pathogenicity, it is well established that KMT2A gene fusions are drivers of acute leukemia. The most frequent partner genes in KMT2A gene fusions are AFF1, MLLT3, MLLT10, ELL, and AFDN. These partner genes make up 80% of the gene fusions in KMT2A-positive acute leukemias. The specific fusion partner contributes to the determination of either the myeloid or lymphoblastic disease phenotype, and the nature of the rearrangement can be used to predict prognosis. The KMT2A gene has a multitude of breakpoint regions in several different exons. The majority of KMT2A breakpoints are localized to a breakpoint cluster region between exon 7 and exon 14, but there is more than
one cluster region, and their combined range is greater than 22 kb. Cytogenetic analysis can be used to identify chromosomal abnormalities through karyotyping and fluorescence in-situ hybridization (FISH). These methods provide visual information on chromosome structure and some specific variations, but they are limited by low resolution. In addition, these cytogenetic methods cannot provide information on i) the breakpoints with base pair resolution, and ii) whether the fusion gene is transcriptionally active. A more common method to detect and characterize fusion genes is reverse transcription polymerase chain reaction (RT-PCR), which uses transcribed mRNA to determine the fusion gene. This method is highly tedious and requires prior cytogenetic analysis to confirm the results. RNA-sequencing can also be used for whole transcriptome analysis by sequencing chimeric gene fusions through mRNA expression. This method is not without fault, as it is also highly complex and returns a considerable number of false positives and negatives. A recent report showed that RNA-seq failed to detect 10% of the gene fusions that were determined by routine cytogenetics methods with high confidence. The false negative cases generally coincided with a low sequencing coverage for the transcribed fusion genes, which was likely due to instability of RNA molecules and/or low expression of the fusion gene. To avoid the false positives related to the stability issues and the dynamics of the gene expression, sequencing gene fusions at the gene level could, in theory, be a solution. However, the size of a fusion gene can be greater than 100 kb, making it very difficult to target. When focusing only on the breakpoint regions, the size of the target can be smaller. For example, the KMT2A gene has two major breakpoint cluster regions, whose combined length is less than 30kb.
However, even with this information, designing an assay for determining the breakpoint is still challenging because i) although less frequent, the breakpoint can be located outside of these cluster regions and ii) the range is still larger than the size that is amplifiable with routine PCR. These issues can generate false positive results. Furthermore, DNA-based fusion detection does not generally provide information about whether the fusion gene is transcriptionally active. By way of example, acute leukemia is a complex disease with varying prognoses depending on specific chromosomal abnormalities. In particular, certain chromosomal translocations are associated with a favorable prognosis, while others, such as translocations, indicate a more aggressive course and a poor prognosis. One of the most common
chromosomal abnormalities in acute leukemias is the fusion of the KMT2A gene with various partner genes. However, characterizing these KMT2A gene fusions has been challenging due to their complexity. There are numerous reported partner genes, and the participating genes have multiple breakpoint regions spanning different exons. This complexity has hindered the genetic and epigenetic characterization of these gene fusions using existing methods. Consequently, the relationship between the genetic and epigenetic variations of gene fusions and the cellular physiology of leukemia remains unclear. The present method is believed to solve this problem.

SUMMARY

This disclosure provides a way to characterize gene fusions. In the example, KMT2A gene fusions are characterized in two types of acute leukemia (lymphocytic and myelocytic) from both genetic and epigenetic perspectives. A targeted long-read nanopore sequencing approach is used to analyze gene fusions between KMT2A and five frequently observed partner genes. In this targeted approach, the fusion and wild-type genes are physically separated. The method provides a way to identify the exact breakpoints with base pair resolution, determine phased variants by assembling complete contigs of both wild-type and fusion alleles, and examine epigenetic changes. The epigenetic information obtained potentially allows one to predict the expression activity of the fusion and wild-type genes.
In some embodiments, the method may comprise: (a) lysing cells that have a fusion between a first gene and a second gene at an end of an electrophoresis gel to release the genomic DNA from the cells; (b) applying a voltage potential to the gel, thereby trapping intact genomic DNA at one end of the gel; (c) digesting the trapped genomic DNA using two or more pairs of RNA-guided endonucleases to release: (i) a segment of the first gene; (ii) a segment of the second gene; and (iii) a segment of a gene fusion between the first and second genes; wherein the segments of the first and second genes are approximately the same size and the segment of the gene fusion is resolvable from the segments of the first and second genes in the gel; (d) electrophoresing the segments of (i), (ii) and (iii) through the gel, thereby separating the segment of (iii) from the segments of (i) and (ii); (e) eluting the segments into different fractions by applying a second voltage potential to the gel, wherein the second voltage potential is
orthogonal to the potential of (a), wherein the segment of (iii) is eluted into a fraction that is different to the fraction into which the segments of (i) and (ii) are eluted; and (f) assaying for the flanking sequences of the first and second genes in the fractions collected in (e), thereby identifying a fraction that contains the segments of the first and second genes and a fraction that contains the segment of the gene fusion.

BRIEF DESCRIPTION OF THE FIGURES

The skilled artisan will understand that the drawings, described below, are for illustration purposes only. The drawings are not intended to limit the scope of the present teachings in any way.

Figs. 1A and 1B. Limitations in fusion gene characterization using targeted long-read sequencing (Fig. 1A), and automated and streamlined separation of targets from genomic DNA and between wild-type and structural variant (Fig. 1B). Simple qPCR analysis for quantification of target HMW fragments in elution modules indicates the existence of a fusion gene even before the downstream sequencing step.

Figs. 2A and 2B. TaqMan qPCR copy number analysis for detecting size-shifted fusion gene. Fig. 2A: Illustration of wild type and fusion genes in CATCH assay products. The sizes of the HMW assay products are provided, and red arrowheads indicate CRISPR target sites. Dumbbells with bent and circled ends indicate TaqMan probes binding to different locations of target HMW fragments. The probes are color-coded in the illustration, and some assay probes bind to both wild type and fusion fragments because they target shared sequences. Fig. 2B: Copy numbers measured by each TaqMan assay probe. For each elution module, copy numbers of both target and control (RNaseP) are shown. The elution modules are indicated as expected length of the HMW DNA fragments collected in them.

Figs. 3A and 3B. ONT sequencing read alignment for identification of fusion breakpoints (Fig. 3A) and phased variants (Fig. 3B).
Bars representing the sequence context (wild type or fusion genes with flanking sequences) are shown at the top of sequence pile ups. For fusion genes sequence, pile-ups for two different genomic locations are shown with solid lines guiding the breakpoint locations. Zoom-ins for examples of phased variants are also
shown.

Figs. 4A and 4B. Differential DNA modification in KMT2A promoter CpG. Sequence pile-ups and read depths are shown with information about CpG modifications (Fig. 4A). Red bars indicate 5-methylcytosine (5mC), and blue bars indicate 5-hydroxymethylcytosine (5hmC). A depletion of 5mC in both the wild-type and fusion genes was observed. Fig. 4B illustrates graphs showing a depletion of 5mC in both the wild-type and fusion genes.

Fig. 5. Flow chart illustrating a guide RNA design process.

Fig. 6. Two gRNA designs for KMT2A gene. The two designs generate HMW fragments where the location of KMT2A gene is different.

Fig. 7. Detection of fusion genes resulting from reciprocal translocation. In the right panels, illustrations of wild type and fusion genes in CATCH assay products are shown with breakpoint cluster regions as well as the expected length of contributing parts of fusion genes. Two TaqMan copy number assay probes are shown with their target locations in the assay products and the copy number data for individual elution modules. Two fusion fragments were detected, one longer and one shorter than the wild type fragments. Dotted guide lines indicate the target HMW fragments collected in corresponding elution modules.

Fig. 8. Comparison of ligation- and tagmentation-based ONT library preparations. (Upper panels) For a genomic region including KMT2A gene, binned sequencing read depth (bin size: 5 kb) is shown for different library preparations (ligation versus tagmentation) and CATCH assays (300-kb versus 1-Mb assays). Blue color indicates alignments to forward strand, and red color indicates alignments to reverse strand. (Lower panels) Box plots are shown for location-specific read depth for all six targets in the multiplex CATCH assay. Mean coverage for 15-kb regions at 5’-end, middle, and 3’-end of each gene is used to compare uniformity of sequence coverage.

Fig. 9: Table of gRNA sequences. From top to bottom: SEQ ID NOS: 1-14.

DETAILED DESCRIPTION
Unless defined otherwise herein, all technical and scientific terms used in this specification have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described. All patents and publications, including all sequences disclosed within such patents and publications, referred to herein are expressly incorporated by reference. Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively. The headings provided herein are not limitations of the various aspects or embodiments of the invention. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton, et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY, 2D ED., John Wiley and Sons, New York (1994), and Hale & Markham, THE HARPER COLLINS DICTIONARY OF BIOLOGY, Harper Perennial, N.Y. (1991) provide one of ordinary skill in the art with the general meaning of many of the terms used herein. Still, certain terms are defined below for the sake of clarity and ease of reference. Other definitions of terms may appear throughout the specification. In some embodiments, the method may comprise lysing cells that have a fusion between a first gene and a second gene at an end of an electrophoresis gel to release the genomic DNA from the cells. Methods for lysing cells are well known.
In any embodiment, the cells may be blood cells (e.g., PBMCs), cultured cells, or a dissociated tissue cell suspension, for example, although a biopsy could be used. Several gene fusions that are thought to cause cancer have already been identified and may be targeted using the present method. As such, in some embodiments, the first region to which the forward primers bind may be selected from the group consisting of ROS1, ALK, EML4, BCR, ABL, TCF3, PBX1, ETV6, RUNX1, MLL, AF4, SIL, TAL1, RET, NTRK1, PAX8, PPARG, MECT1, MAML2, TFE3, TFEB, BRD4, NUT, ETV6,
NTRK3, TMPRSS2, NTRK2, KMT2A and ERG. In some embodiments, the cells may have a fusion involving ALK, RET, KMT2A, NTRK1, ROS1, BRAF, EGFR, NRG1 or MET. Possible fusion partners for these genes are numerous. For example, if the fusion involves the ALK gene, then the fusion partner may be EML4, STRN, KIF5B and/or TFG. Likewise, if the fusion involves the ROS1 gene, then the fusion partner may be CD74, SLC34A2, SDC4, TPM3 and/or EZR. In some embodiments, the fusion-specific primers may target any one or more of the following fusions: CD74-ROS1, SLC34A2-ROS1, SDC4-ROS1, EZR-ROS1, GOPC-ROS1, LRIG3-ROS1, TPM3-ROS1, PPFIBP1-ROS1, EML4-ALK, BCR-ABL, TCF3-PBX1, ETV6-RUNX1, MLL-AF4, SIL-TAL1, RET-NTRK1, PAX8-PPARG, MECT1-MAML2, TFE3-TFEB, BRD4-NUT, ETV6-NTRK3, TMPRSS2-ERG, TPM3-NTRK1, SQSTM1-NTRK1, CD74-NTRK1, MPRIP-NTRK1 and TRIM24-NTRK2. The method finds particular utility in analyzing blood cells. In these embodiments, the patient from which the blood cells were obtained may have a blood cancer such as acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), or mixed-phenotype acute leukemia (MPAL). Gene fusions found in AML include RUNX1-RUNX1T1, PML-RARA, ZNF292-PNRC1, NUP98-NSD1, CBFB-MYH11, KMT2A-MLLT4, KMT2A-MLLT3, KMT2A-MLLT10, and DEK-NUP214. Gene fusions found in ALL include BCR-ABL1, ETV6-RUNX1, EP300-ZNF384, TCF3-PBX1, KMT2A-AFF1, MEF2D-BCL9, STIL-TAL1, TCF3-HLF, ZNF292-PNRC1, EBF1-PDGFRB, PAX5-NOL4L, PICALM-MLLT10, and TCF3-ZNF384. Gene fusions found in MPAL patients include BCR-ABL1, ETV6-ARNT, ETV6-NCOA2, ETV6-LOH12CR1, PICALM-MLLT10, NAP1L1-MLLT10, RUNX1-MECOM, TRA2B-MECOM, SET-NUP214 and KMT2A-MLLT4. KMT2A fusions are relatively common in ALL and AML. This gene has more than 100 fusion partners. Gene fusions involving KMT2A include KMT2A-MLLT4, KMT2A-MLLT3, KMT2A-MLLT10, KMT2A-AFF1, KMT2A-MLLT1, KMT2A-ELL, KMT2A-MLLT6, KMT2A-USP2, KMT2A-MAML2, KMT2A-MLLT11, KMT2A-MYO1F, KMT2A-SEPT5, KMT2A-SEPT6, and KMT2A-CARS, among others.
The gene fusion may involve a kinase, a transcription factor or an epigenetic gene (a chromatin modifier), for example. Next, the method may involve applying a voltage potential to the gel, thereby trapping intact genomic DNA at one end of the gel and digesting the trapped genomic DNA using two or
more pairs of RNA-guided endonucleases to release: a segment of the first gene; a segment of the second gene (i.e., a ‘fusion partner’ for the first gene); and a segment of a gene fusion between the first and second genes. In these embodiments, the segments of the first and second genes are approximately the same size (e.g., the segments have a size difference of less than 100kb, less than 50kb, or less than 20kb) and the segment of the gene fusion is resolvable from the segments of the first and second genes in the gel. In these embodiments, the segments may be 100kb - 1 Mb in length and the segment of the gene fusion is smaller or larger than the other segments by at least 50kb, at least 100kb, or at least 200kb. After digestion, the method may comprise electrophoresing the segments through the gel, thereby separating the segment of the gene fusion from the segments of the first and second genes. This step may be done by pulsed field electrophoresis. Some of the initial steps of the present method may be adapted from Zhou et al (BioRxiv, 2020. 10.1101/2020.10.23.349621v3), which describes a method for isolating targets that are from 50 kb to 1 Mb (e.g., 200-500kb) in length. In some embodiments, the method may comprise eluting the segments into different fractions (e.g., 4-10 fractions) by applying a second voltage potential to the gel, wherein the second voltage potential is orthogonal to the potential used earlier in the method, wherein the segment of the gene fusion is eluted into a fraction that is different to the fraction into which the segments of the first and second genes are eluted. After the fractions have been collected, the flanking sequences of the first and second genes may be assayed, e.g., by quantitative PCR (e.g., Taqman), to identify a fraction that contains the segments of the first and second genes and a fraction that contains the segment of the gene fusion.
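The size criteria and the fraction assay described above can be sketched in code as follows. This is only an illustrative sketch under stated assumptions, not the claimed method or the software used in the examples: the function names, the default thresholds (`max_wild_type_diff_kb`, `min_fusion_shift_kb`, `min_copies`, `max_ratio`) and the example copy numbers are hypothetical. The second function follows one reading of the two-probe qPCR idea, in which one probe targets a region present only in the wild type segment and the other targets a region shared by the wild type and fusion segments.

```python
from typing import Dict, List


def segments_meet_size_criteria(
    first_kb: float,
    second_kb: float,
    fusion_kb: float,
    max_wild_type_diff_kb: float = 50.0,  # hypothetical default
    min_fusion_shift_kb: float = 50.0,    # hypothetical default
) -> bool:
    """Check that the two wild type segments are similar in size and that
    the fusion segment differs from both by a resolvable amount."""
    wild_types_similar = abs(first_kb - second_kb) < max_wild_type_diff_kb
    fusion_shift = min(abs(fusion_kb - first_kb), abs(fusion_kb - second_kb))
    return wild_types_similar and fusion_shift >= min_fusion_shift_kb


def fusion_candidate_fractions(
    wild_type_only: Dict[str, float],
    shared: Dict[str, float],
    min_copies: float = 10.0,  # hypothetical detection floor
    max_ratio: float = 0.2,    # hypothetical wild-type/shared cutoff
) -> List[str]:
    """Return fractions where the shared probe detects target DNA but the
    wild-type-only probe does not, as expected for a fusion segment."""
    candidates = []
    for fraction, shared_copies in shared.items():
        if shared_copies < min_copies:
            continue  # no target DNA detected in this fraction
        wild_type_copies = wild_type_only.get(fraction, 0.0)
        if wild_type_copies / shared_copies <= max_ratio:
            candidates.append(fraction)
    return candidates


# Hypothetical copy numbers per elution fraction (not data from the study).
wt_probe = {"F1": 0.0, "F2": 5.0, "F3": 480.0}
shared_probe = {"F1": 1.0, "F2": 300.0, "F3": 500.0}
print(segments_meet_size_criteria(300, 310, 200))
print(fusion_candidate_fractions(wt_probe, shared_probe))
```

In practice the thresholds would need to be set from control samples, since the copy number floor and the acceptable probe ratio depend on input amount and assay efficiency.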
If desired (e.g., to identify a breakpoint), the eluted segment of the gene fusion may be sequenced using any suitable long range sequencing technology, e.g., nanopore sequencing (e.g., as described in Soni et al. Clin. Chem. 2007 53: 1996-2001, or as described by Oxford Nanopore Technologies). Nanopore sequencing is a single-molecule sequencing technology whereby a single molecule of DNA is sequenced directly as it passes through a nanopore. A nanopore is a small hole, of the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential (voltage) across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current which flows is sensitive to the size and shape of the nanopore.
As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree, changing the magnitude of the current through the nanopore to different degrees. Thus, this change in the current as the DNA molecule passes through the nanopore represents a reading of the DNA sequence. Nanopore sequencing technology is disclosed in U.S. Pat. Nos. 5,795,782, 6,015,714, 6,627,067, 7,238,485 and 7,258,838 and U.S. Pat Appln Nos. 2006003171 and 20090029477. See also Greninger Genome Medicine. 20151: 99, among others. The junction of the fusion can be identified in the sequence reads. In some embodiments, the cells are blood cells from a patient that has a blood cancer associated with the gene fusion, e.g., ALL or AML, etc. In these embodiments, the results of the method may be used as a diagnostic, to measure the severity of the cancer, to monitor the disease, to determine if a treatment is working and to make treatment decisions. For example, if a gene fusion involves a kinase, then a drug that targets that kinase can be administered. In addition, the method may be multiplexed, as needed. For example, the DNA may be cleaved by two or more sets of three pairs of RNA-guided endonucleases, where each set targets a different fusion. In some embodiments, the present method may be carried out on a SageHLS device, as described at sagescience.com and US20210062180, which is incorporated by reference in its entirety.
As described in US20210062180, in some embodiments, the device may comprise a molecule retention cassette for retaining molecules during electrophoresis, the cassette comprising: a housing; a lane configured within the housing, the lane having a first elongate edge and a second elongate edge; an elution module configured to be received in the lane to divide the lane into a first chamber and a second chamber; a first buffer reservoir positioned adjacent the first elongate edge; and a second buffer reservoir positioned adjacent the second elongate edge; wherein: a first side of the elution module facing the first chamber comprises a porous sterile filtration membrane; and a second side of the elution module facing the second chamber comprises an ultrafiltration membrane, the ultrafiltration membrane having a pore size to retain molecules during electrophoresis. The device may be used as follows for isolating and collecting target segments of target particles, the method comprising: receiving a sample in a sample well of an elution module; receiving an SDS-containing lysis buffer in a first buffer chamber, the first buffer chamber being
configured along a first side of the elution module; applying a first electrophoresis voltage to migrate components of the sample towards a second buffer chamber configured along a second side of the elution module, such that: target particles are immobilized in a gel segment configured along the second side of the elution module between the elution module and the second buffer chamber, and non-target particles pass through the gel segment and into the second buffer chamber; washing the first buffer chamber, the second buffer chamber, and the elution module; filling the first buffer chamber, the second buffer chamber, and the elution module with a Cas9 reaction buffer; emptying the elution module; refilling the elution module with a Cas9 enzyme mix to cleave sections of target particles immobilized in the gel segment; loading the elution module with an SDS stop solution; applying a second electrophoresis voltage to release the Cas9 from the target particles and migrate Cas9 into the second buffer chamber; washing the first buffer chamber, the second buffer chamber, and the elution module; filling the first buffer chamber, the second buffer chamber, and the elution module with elution buffer; and applying a third electrophoresis voltage in a reverse direction to migrate the cleaved sections of the target particles from the gel segment and into the elution module. For example, the device can be configured as a semi-automated research instrument system for extraction and enzymatic processing of extremely high molecular weight (HMW) DNA (100-2000 kb). The system uses intact cells or isolated nuclei as input samples. Input samples are loaded into agarose gel cassettes, and chromosome-length DNA is extracted from the samples by electrophoresis of SDS through the sample well compartment.
SDS-coated proteins and lipids are electrophoresed away from the sample well through the central agarose gel column, but the chromosome-length DNA becomes firmly entangled and immobilized in the agarose gel wall of the sample well. The sample well can be emptied and refilled without any loss of DNA. This allows for treatment of the immobilized DNA by refilling the sample well with an enzyme reaction mixture. Many commonly used DNA processing enzymes readily diffuse into the agarose, including many restriction enzymes, DNA polymerases, ligases, transposases, non-specific DNA cleavases, and S. pyogenes Cas9. After DNA processing, an additional round of size-selection electrophoresis is performed, followed by electroelution of the DNA products into a series of six buffer-filled elution modules arranged along one side of the gel separation column. The DNA processing step includes some cleavage to reduce the size of the desired
DNA products to below 2 megabases (Mb) in length; DNA greater than 2 Mb will remain immobilized in the sample well, unable to move during electrophoresis. Each cassette can have two physically isolated sample processing areas. The cassette may have a standard 96-well plate footprint. The central agarose channel has two loading wells. Cells or nuclei are loaded in the sample well, and an SDS-based lysis reagent is loaded into the reagent well. Electrophoresis is carried out to drive the SDS through the sample well compartment where the cells or nuclei are lysed. Chromosome-sized genomic DNA becomes immobilized in the sample well wall, while other components are carried to the bottom electrode chamber along with the SDS. After DNA processing and size selection electrophoresis, the DNA products are electroeluted into an array of six elution modules positioned along the right side of the agarose channel. In theory, the amount of fusion DNA in the cell-free fraction of a patient’s bloodstream (i.e., cfDNA) should correlate with disease severity for those cancers that are associated with the fusion, e.g., a subset of non-small cell lung cancers. Thus, tracking the amount of fusion DNA over time could be used to, for example, determine if a treatment is working. Assays for accurately quantifying the amount of a particular fusion sequence in a sample are well known. For example, qPCR or an Invader assay could be used. However, in the clinic, such assays are not straightforward to implement because different patients have different fusions and, even if the genes that are fused together in a patient’s cancer are known, the genes can be fused in different places. Such analyses are complicated by the fact that cfDNA is highly fragmented and, as such, samples that contain cfDNA are not amenable to analysis by some of the methods that are used to analyse samples that contain an intact genome.
Thus, identifying and quantifying gene fusions in cfDNA would logically be implemented in two steps: a first step that involves sequencing a patient’s cfDNA to identify which genes are fused as well as the sequence at the junction of the fusion, and a second step that involves quantifying the amount of fusion DNA in the cfDNA (see, e.g., Harris et al., Nature Scientific Reports 2016, 6: 29831). The problem with this approach is that the latter step is patient-specific in the sense that most reliable quantification methods (e.g., qPCR or Invader) only work if primers that flank the fusion junction are used. Thus, in order to implement the conventional workflow, one would have to carefully select a custom primer pair for each patient being tested, before quantifying
the amount of fusion DNA. This is problematic because performing patient-specific assays using, e.g., custom sets of primers, is time consuming, inefficient and creates a significant potential for human error. Therefore, such assays should be avoided in the clinic, where robust, high-throughput methods are required.

EXAMPLES

Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.

The present method aims to overcome some of the challenges of DNA-based approaches in fusion gene sequencing using nanopore sequencing combined with a targeted enrichment method. There already exist targeted nanopore sequencing methods, such as nanopore Cas9-targeted sequencing (nCATS). However, because these targeted methods start sequencing only from the ends, the long and fragile strands can easily break apart before the middle of the fragment is sequenced. Therefore, the existing targeted nanopore sequencing methods are often useful only when a part of the fusion gene (e.g., a region including the breakpoint) is targeted. When targeting the entire fusion gene is desired, one needs an alternative strategy. Sequencing the entire gene fusion provides information such as fully phased germline and somatic variants and CpG island DNA methylation of the gene fusion. This information cannot be obtained when targeting only a small part of the gene fusion that includes the breakpoint. The general strategy is illustrated in Figs. 1A and 1B. A high molecular weight (HMW) DNA enrichment based on Cas9-assisted targeting of chromosome segments (CATCH) was used. The targeted fragments are then eluted and quantified by qPCR.
The method can enrich and extract target regions that are as large as 1 Mb. The enriched target regions can then be sequenced and analyzed for the purpose of characterizing gene fusions. The method combines targeted in vitro genome cutting and pulsed-field electrophoresis to sequence targets that were previously not targetable in a single HMW molecule. The resulting fragments are then sequenced without amplification using a nanopore
sequencer to resolve large and complex rearrangements. This method has been used to sequence 22q11.2 and 16q11.2 CNV rearrangements that are prominent risk factors for neurological disorders such as schizophrenia and autism. The present study aims to use a modified version of CTLR-Seq to investigate blood samples from leukemia (AML or ALL) patients. These samples have been previously tested with clinical cytogenetic methods, and the method described in this study characterizes gene fusions between KMT2A and the aforementioned frequent partner genes. In particular, the qPCR step used for target quantification also served as a simple detection method for the presence of a rearrangement event. When a size shift is detected in clinical samples, additional ONT sequencing data can be used to further characterize the event. Physically separated fusions and the wild type genes are sequenced (Fig. 1B). The characterization includes breakpoint identification with base pair resolution, entirely phased variants using end-to-end contig assembly of wild type and fusion alleles, and also the epigenetic changes. In particular, with the epigenetic information, one should be able to predict the expression activity of the fusion and wild type genes, which will be validated by RNA sequencing and/or traditional RT-PCR-based gene expression analysis. This information can be obtained only when wild type and fusion genes are binned and the entire gene is targeted.

Methods

Guide RNA design: To design 20-bp target sequences of gRNAs, all 20-bp sequences (20-mers) in the region of the target cut sites that occur directly adjacent to a Cas9 binding motif are considered. Only those candidate guide sequences that are unique in the human genome and that have no other alignments with fewer than three mismatches were retained. The GRCh38 genome was used to identify off-target CRISPR sites.
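As a minimal sketch of the candidate enumeration described above, the following Python functions list forward-strand 20-mers that sit directly 5' of a PAM and then filter them by mismatch count against other genomic sites. The text only refers to a "Cas9 binding motif"; the NGG motif used here is an assumption based on S. pyogenes Cas9. The function names and the in-memory list of other sites are hypothetical; an actual design would scan both strands and search the whole GRCh38 genome with an aligner.

```python
import re

def candidate_guides(region: str):
    """Return (offset, 20-mer) pairs lying directly 5' of an NGG PAM (forward strand only)."""
    region = region.upper()
    hits = []
    # The lookahead allows overlapping candidates to be reported.
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", region):
        hits.append((m.start(), m.group(1)))
    return hits

def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def is_unique(guide: str, other_sites, min_mismatches: int = 3) -> bool:
    """Retain a guide only if every other site differs by at least min_mismatches."""
    return all(mismatches(guide, s) >= min_mismatches for s in other_sites)
```

A candidate would be retained only when `is_unique` returns True for all other 20-mer sites found in the genome.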
An off-target gRNA sequence is defined as a 20-mer with three or four mismatches where the mismatches are located within the first 10 bases (the half distant from the PAM domain). This high level of mismatches enables the gRNA to anneal across many different positions in the genome. Eliminating such sequences reduces off-target annealing. The candidate designs do not have any off-target sites near or inside the targets (Fig. 5). Also, a candidate design has no pair of off-targets neighboring each other within a range and
generating an off-target fragment that can co-migrate with target fragments in the electrophoretic separation. Some candidate gRNA sequences are included even if they have off-targets elsewhere in the genome. The overall number of off-targets is considered as a parameter when assessing the design. Fig. 9 provides the final gRNA designs with information about their potential off-targets.

Cell preparation: Peripheral blood mononuclear cells (PBMCs) were collected from patients’ blood using a Ficoll (Sigma-Aldrich, St. Louis, MO) separation, and stored frozen in 90% fetal calf serum (FCS) and 10% dimethyl sulfoxide (DMSO) until being used for enrichment of high molecular weight target DNA. A mammalian white blood cell (WBC) suspension kit (Sage Science, Beverly, MA) was used to prepare cells for sample loading. Briefly, frozen cell stock was thawed in a 30°C bead bath. To remove residual red blood cells (RBCs), PBMCs were incubated with the 1X RBC lysis buffer (Sage Science) at 4°C for 5 min. After the incubation, WBCs were washed twice with the 1X RBC lysis buffer (Sage Science) using centrifugation at >2,000g. After the second wash, the pellet was resuspended in 280 µL of the resuspension buffer (Sage Science). The genomic DNA content of the cell suspension was quantified using the Qubit lysis buffer (Sage Science) and the Qubit 1X dsDNA high sensitivity assay (Thermo Fisher Scientific, Waltham, MA) following the manufacturers’ guides. The cell suspension was diluted with the resuspension buffer (Sage Science) so that the concentration of genomic DNA was approximately 100 ng/µL, with which a 70-µL aliquot included 1 million cells. Custom-designed synthetic CRISPR RNAs (crRNAs) and the trans-activating CRISPR RNA (tracrRNA) were purchased from Integrated DNA Technologies (Coralville, IA, USA).
Per target, a pair of crRNAs were used to excise the 300 kb or 1 Mb fragments, one crRNA targeting the 5’ flanking region and the other targeting the 3’ flanking region. Up to six pairs of crRNAs were multiplexed (i.e. up to six targets in a single assay). When preparing guide RNA (gRNA)-Cas9 assembly for four sample runs (the maximum capacity of Sage HLS machine), 800 fmol of pooled crRNAs was annealed to 520 fmol of tracrRNA in 44 µL of 1X duplex buffer (Integrated DNA Technologies) at 95°C for 10 minutes, followed by cooling at room temperature for 5 minutes. The annealed gRNA mix was assembled with 160 fmol of Cas9 endonuclease in 80 µL of 1X enzyme buffer (Sage Science) at 37°C for 10 minutes.
HLS-CATCH: The DNA from 70 µL of the cell suspension was extracted by using the workflow “CATCH 100-300 kb extr3hr inj4m80v sep4hr.shflow” or “CATCH 1000 kb extr3hr inj4m80v sep8hr.shflow” on the Sage HLS instrument (Sage Science). Intact PBMCs (∼1.0 million) were loaded into the sample well, and a lysis buffer containing 3% sodium dodecyl sulphate (SDS) was loaded into a reagent well upstream of the sample well. Electrophoresis was carried out for 3 hours, driving the SDS through the sample well to lyse the cells. The SDS, proteins, and membrane components were carried away from the sample well to the bottom electrode chamber. The large genomic DNA (>2 Mb) was embedded in the agarose wall of the sample well during the extraction electrophoresis. At the end of the extraction stage, the electrophoresis was halted and the reagent well was emptied and refilled with the Cas9–gRNA reaction mixture. The reaction mixture was diluted with 3X volume of 1X enzyme buffer (Sage Science) prior to loading. Electrophoresis was carried out for 4 min to drive the Cas9 enzyme into contact with the genomic DNA inside the sample well wall. Then, the electrophoresis was stopped, followed by Cas9 digestion of the genomic DNA at room temperature for 30 minutes. After Cas9 digestion, the reagent well was emptied and refilled with the SDS lysis reagent, and size selection electrophoresis was carried out for 4 hr. The electrophoresis process used a pulsed-field waveform, designed for optimal resolution of DNA fragments 300 kb or 1 Mb in size. After size separation, a second orthogonal set of electrodes was used to elute the size-separated DNA into a series of elution modules located along one side of the gel column. The DNA was removed from the elution modules 12 hours after run termination.

Quantitation of targeted high molecular weight DNA: The DNA from each elution module was prepared with a 1:5 dilution in 33% bCD.
TaqMan qPCR Copy Number assays (Thermo Fisher Scientific) were used to measure the DNA concentration after extraction. The 10 µL reaction included 2 µL of diluted target DNA sample, 1X TaqMan Genotyping Mix, 1X TaqMan RNaseP reference and 1X TaqMan assay for the specific targets. The samples were denatured at 95°C for 10 min, followed by 50 cycles of 15 s at 95°C and 1 min at 60°C. For a relative quantification (i.e. target versus RNaseP reference), a modified ΔΔCt method [20] was used. One ng of NA18507 genomic DNA was used as a control. For an estimation of absolute copy number, it was assumed that 290 genome copies were in 1 ng of the control sample. Library preparation and nanopore sequencing: When using multiple sample runs for
sequencing, the enriched targets from the elution modules were pooled in accordance with the results of the qPCR. The pooled sample DNA was first purified using 0.45X Ampure XP beads with gentle liquid handling to minimize physical DNA shearing. The beads were washed twice with 80% ethanol, and then the cleaned DNA was eluted in 17 µL of 10 mM Tris for 1 hr at 37°C with 400 rpm shaking, and then at 4°C overnight. The Qubit dsDNA HS assay (Thermo Fisher Scientific) was used to measure the yield of this purification, which was generally 50-60%. A SQK-RAD004 kit (Oxford Nanopore Technologies, Littlemore, Oxford, UK) was used for the library preparation with modifications. First, FRA was prepared in 1:20, 1:30, or 1:40 dilutions with FRA dilution buffer from SQK-ULK001 (Oxford Nanopore Technologies). The purified CATCH product was combined with 5 µL of a FRA dilution for a total volume of 20 µL. The resulting mixture was incubated at 30°C for 1.5 minutes, and then at 80°C for 1.5 minutes. Following the incubation period, 20 µL of the tagmented CATCH product was combined with 1 µL of RAP and incubated at room temperature for 5 minutes. The library was loaded onto a PromethION flowcell (R9.4.1) after being combined with water, loading beads, and sequencing buffer following the manufacturer’s instructions. The library was sequenced using an Oxford Nanopore Technologies PromethION 24 sequencing machine (Oxford Nanopore Technologies). The sequencing was performed for 72 hr with a high-accuracy base calling model (‘dna_r9.4.1_450bps_hac_prom’) and pore scanning every 1.5 hr. The base calls generated during sequencing were used only for real-time monitoring of sequencing run quality.

Base calling, alignment, and assembly: For base calling, Guppy (v6.1.1, Oxford Nanopore Technologies) was used with a super-accuracy model (‘dna_r9.4.1_450bps_sup_prom’). Using the fast5 files generated by the Nanopore sequencing as input, fastq files were generated.
For alignment, Minimap2 (v2.17) [21] was used with a preset for Nanopore sequencing reads (‘map-ont’). The sequencing reads were aligned to the GRCh38 human genome. To get on-target mean coverage, ‘bedtools coverage’ (v2.25) [22] was used. To identify the on-target reads for local sequence assemblies, ‘samtools view’ (v1.10) [23] was used with the ‘-L’ option. The bed file used for the coverage analysis and identification of on-target reads is provided as a supplementary table. For sequence assembly, Flye (v2.9.1-b1780) [24] was used with a preset for Nanopore sequencing reads base-called with super-accuracy models (‘nano-hq’). The on-target reads were used as input, and the estimation of
genome size was calculated with possible scenarios based on the results of the qPCR assays.

Methylation analysis: Megalodon (v2.5.0, Oxford Nanopore Technologies) was used to call CpG methylation from the Nanopore sequence reads. Fast5 files were used as input, and the GRCh38 human genome was used as the reference. CpG methylation was called with two different models: i) 5-methylcytosine (‘dna_r9.4.1_450bps_modbases_5mc_cg_sup_prom’) and ii) 5-hydroxymethylcytosine or 5-methylcytosine (‘dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_sup_prom’).

Single cell multi-omics assay: For the single cell multi-omics assay, the Chromium Next GEM Single Cell Multiome ATAC and Gene Expression Reagent kit (10X Genomics, Pleasanton, CA) was used. The ATAC and RNA seq libraries were prepared according to the manufacturer’s guide, and sequenced using NovaSeq 6000 with 50:8:24:49 (ATAC) or 28:10:10:90 (RNA) paired end format. Using Cell Ranger ARC (v2.0.0, 10X Genomics), the raw sequencing data were then demultiplexed, aligned, and initially analyzed to produce matrix tables for ATAC peaks/fragments and the gene expression. Finally, the Signac package [25] was used to generate Seurat objects and to cluster cells based on the multi-omics features.

Pathway analysis: A pathway enrichment analysis was conducted based on cancer hallmark signatures downloaded from MSigDB (v.6.2) [26, 27]. The pathway enrichment scores of clusters were calculated using Gene Set Variation Analysis (GSVA) (v.1.32.0) [26] with parameters “kcdf = Poisson”, “mx.diff=TRUE” and “min.sz=10”. An ANOVA test was performed to compare the enrichment scores among the clusters. Pathways with an adjusted P-value < 0.05 after FDR correction were considered significant for each cluster.

Results

Study overview

This study aimed to comprehensively analyze the genetic and epigenetic characteristics of KMT2A gene fusions in acute leukemia.
The approach involved a targeted analysis of the entire gene regions, including exons and introns, of both KMT2A and its partner gene. This allowed the detection of sequence variations as well as fully phased simple and complex structural variations. In total, seven patient samples were analyzed, consisting of five cases of AML and two cases of B-ALL. The mega haplotypes associated with the fusions were cataloged to gain insights into their genetic makeup.
In addition, the promoter CpG methylation of the fusion and wild type genes was examined to assess their expression activity. Modified bases, such as 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC), were considered during the base calling process from the raw sequencing data. By examining the methylation levels, the expression patterns of these genes were predicted. To gain a deeper understanding of the heterogeneity within leukemia cells, a single-cell multi-ome assay was used. This enables one to identify distinct clusters of leukemia cells based on their transcriptome and genome-wide chromatin accessibility profiles. In particular, changes at the individual gene level were examined, specifically the expression levels and promoter chromatin accessibility of the KMT2A gene within different cell clusters. The changes were cross-referenced with the methylation level alterations identified through a previous targeted approach. To explore the broader genomic signatures, the set of genes that exhibited differential expression in leukemia cell clusters compared to normal cell clusters within the same sample was analyzed. Additionally, the enriched pathways associated with these genes were investigated.

CRISPR targeting and fusion detection by TaqMan assays

The approach was to target KMT2A gene fusions involving partner genes such as AFF1, MLLT3, MLLT10, ELL, and AFDN. In this study, three multiplex CATCH assays (Assays 1, 2, and 3) were tested, each designed to enrich 300-kb genomic segments, including KMT2A gene fusion targets. The multiplex assays targeted either two or six genes simultaneously, resulting in the extraction of genomic DNA enriched for two or six 300-kb target fragments from cells without fusion events (e.g., GM18507). Initially, Assay 1 was designed to target only two genes (KMT2A and a partner gene), and it was later expanded to include more partner genes in Assays 2 and 3.
Assays 2 and 3 differed in the gRNA pair targeting the KMT2A gene. One gRNA pair positioned KMT2A in the middle of the resulting fragment, while the other targeted a shifted location toward the 3’-flanking region of KMT2A (Fig. 6). The shifted KMT2A location improved the separation resolution between wild type and fusion fragments in some samples. To detect gene fusion events with size shifts, TaqMan DNA copy number assays were
performed to quantify the absolute amount of target HMW DNA molecules in the elution modules. These modules collected CRISPR-excised fragments in different size ranges (Fig. 1B). To enable straightforward detection, all target fragments had similar sizes when no rearrangement events occurred, and thus were collected in the same elution module. When a gene was shorter than the other genes in a multiplex assay, the target fragment included the gene along with flanking regions to match the size of the other gene targets. For one AML sample (SU710) with a known gene fusion between KMT2A and MLLT3, four TaqMan assays were employed (Fig. 2A). Each gene was targeted by two TaqMan assays, which focused on regions near the inside edge of the CATCH product to detect size shifts. When only the wild type gene was present, the assays generated the same enrichment patterns across all the TaqMan assays. However, when both wild type and fusion alleles were present, two of the TaqMan assays showed a different pattern because the fusion gene was shorter than the wild type genes (Fig. 2B). Even in cases where electrophoresis separation was not reproducible, the size shift could still be detected by comparing the two assay patterns. Therefore, a minimum of two TaqMan assays were required to detect the fusion: one targeting a region that exists only in the wild type and the other targeting a region that exists in both the wild type and fusion. Additional TaqMan assays can be added to further confirm fusion events and detect more complex events, such as chimeric fusions.

Fully phased variants in fusion gene assembly

In this study, the nanopore sequencing library preparation method was modified for the CTLR-Seq approach. By comparing ligation-based and tagmentation-based library preparations using a human cell line sample, it was observed that the tagmentation-based method demonstrated superior uniformity in sequencing coverage.
The tagmentation-based method also eliminated the need for post-adapter ligation cleanup and allowed direct loading of the library. Through optimization of the tagmentation enzyme concentration and enzymatic reaction duration, long sequence reads were obtained with N50 values comparable to the ligation-based method while maintaining uniform coverage. With the optimized tagmentation-based method, the nanopore sequencing reads had N50 values ranging from 15 kb to 35 kb, and the longest on-target reads generally exceeded 100 kb for most samples. These uniformly distributed, long on-target reads were utilized for sequence assembly.
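For readers unfamiliar with the N50 metric quoted above, the following is a minimal Python sketch of the conventional definition: the read length at which reads of that length or longer contain at least half of all sequenced bases. The function name and the example lengths are illustrative only and are not taken from the study data.

```python
def n50(read_lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    lengths = sorted(read_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("read_lengths is empty")

# Illustrative read lengths in kb (hypothetical values)
print(n50([35, 30, 20, 15]))
```

Because N50 is weighted by bases rather than by read count, a small number of very long reads can raise it substantially, which is why it is reported alongside the longest on-target read length.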
For all the samples, full-length assembly contigs were obtained. Importantly, those assemblies are not a collapsed version of multiple haplotypes, but the reconstructed sequence of only the fusion itself. This was possible because the DNA molecules were binned experimentally with an electrophoretic separation and then separately sequenced. No prior variant analysis and binning for sequence reads were required. Fig. 3A shows an example of experimentally binned sequence reads (i.e., two separate sequencing runs) aligned to the human genome. The fusion-participating genes existed as wild type in one of the runs, but as a fusion gene in the other. The alignment showed clear breakpoints in only one of the sequencing runs. The observed variants were compared to the assembly made from each sequencing run, which confirmed that the assembly process accurately selected the variant allele predominantly observed in the sequence pileup (Fig. 3B). With the efficient targeted long-read sequencing, one can phase not only variants proximal to fusion breakpoints, but also variants distant from the breakpoints, which was not possible with previous sequencing approaches.

Differential DNA methylation between wild-type and fusion genes

An analysis of the DNA methylation status in both wild-type and fusion genes was conducted, focusing on the CpG island of the KMT2A gene promoter (Fig. 4A). A base-calling model was employed that considers both 5mC and 5hmC. One aim was to identify any differential methylation patterns between the wild-type and fusion genes. Overall, no significant difference in 5mC levels between the two gene types was found. Within the promoter CpG island, a depletion of 5mC was observed in both the wild-type and fusion genes (an example is shown in Fig. 4B). In contrast, the 5hmC levels exhibited variation between the wild-type and fusion genes across different samples. Inside the CpG island, there are two exons: exon 1 and an alternative exon adjacent to the first exon.
The changes in 5hmC levels were not uniform across the CpG island but rather localized around these exons. The direction of changes varied among the samples. Some samples exhibited a decrease in 5hmC levels (SU659 and SU714), while others showed an increase (SU710, SU847, and SU968). In contrast to the samples with decreased fusion 5hmC levels, the samples with increased fusion 5hmC levels displayed variability in the location of these changes relative to the exons. For SU710 and SU847, the increased 5hmC levels were observed only before exon 1.
However, SU968 exhibited two peaks in fusion 5hmC levels, with each peak located before one of the exons.

Identification of leukemia clusters with single-cell multi-ome assay

In addition, a single-cell multi-ome assay was conducted on four samples (SU659, SU710, SU847, and SU968), which provided transcriptome and genome-wide chromatin accessibility profiles at the single-cell level. Utilizing both sets of profiles, the Signac package [25] was used to generate UMAP cell clusters. These clusters were then annotated using gene markers for B-ALL and AML, as well as general cell markers for various blood cell types including T and B cells [28]. Across all four samples, leukemia cell clusters were identified that aligned with the respective disease type (B-ALL or AML). Each sample exhibited between four and seven leukemia clusters, each characterized by distinct cell markers. While most clusters were in close proximity to one another in the UMAP spaces, an exceptional case was observed in SU968, where one AML cluster was notably distant from the other AML clusters within the same sample. Additionally, normal blood cell clusters (B and T cells) were present in all samples, serving as internal controls for analyzing gene expression and chromatin accessibility on an individual basis, as well as on a transcriptome basis.

KMT2A promoter chromatin accessibility and expression of cell clusters

Initially, the investigation focused on analyzing the single-cell assay to examine potential epigenetic modifications associated with the fusion gene and their impact on gene expression. It is important to note that leukemia cells can harbor both fusion genes and wild-type genes, including both KMT2A wild type and KMT2A-derived fusion genes. As a result, the single-cell profile represented a composite signal from both types, rather than separate signals for wild-type or fusion genes.
Despite this complexity, distinct clusters of leukemia cells were identified, where variations in ATAC signal and the corresponding expression patterns were observed. These findings suggest that specific leukemia cell populations exhibited differential ATAC signal, indicating potential regulatory changes, accompanied by corresponding alterations in gene expression. One interesting observation pertains to the consistency exhibited by both AML samples.
These samples showed consistent changes in terms of i) the structure of cell clusters, ii) ATAC coverage alterations, and iii) the corresponding expression changes. Notably, the ATAC coverage spikes, indicative of open chromatin regions, were located before the first two exons (Exon 1 and the alternative exon adjacent to the first exon). Generally, there was a decrease in coverage in those peaks within AML clusters. However, AML clusters in close proximity to each other in the UMAP space (AML1-3 clusters in SU710 and AML2-7 clusters in SU968) exhibited a more pronounced decrease compared to the distant clusters (AML4 cluster in SU710 and AML1 cluster in SU968) within each AML sample. Another intriguing finding was that the ATAC coverage peaks preceding the alternative exon showed more significant differences between subsets of AML clusters and normal B/T cell clusters compared to the differences observed in cell clusters of B-ALL samples. Both leukemia and normal cell clusters in B-ALL samples displayed prominent ATAC peaks before the alternative exon, whereas in AML samples, the leukemia cell clusters did not exhibit such pronounced coverage peaks before the alternative exon. This suggests different isoform structures in different types of leukemia. In contrast to the AML samples, B-ALL samples demonstrated inconsistencies in terms of the direction of changes in KMT2A promoter accessibility and gene expression. One sample (SU659) featured a B-ALL cluster showing increased chromatin accessibility and up-regulated gene expression. Conversely, the other sample (SU847) exhibited a B-ALL cluster with decreased chromatin accessibility and down-regulated gene expression. All of these observations aligned with the quantified 5hmC level of the fusion gene promoter using a targeted approach. 
Pathway enrichment analysis

Using the hallmark gene sets [27], a pathway enrichment analysis was performed to determine which enriched pathways were shared or not shared between leukemia clusters with and without epigenetic and transcriptional modulations in KMT2A. For all leukemia cell clusters, significant enrichment and depletion against the internal control cell clusters (normal B and T cells) was filtered. Then, the clusters with changes in KMT2A were distinguished from the others that did not have the KMT2A changes. For the B-ALL samples (SU659 and SU847), each had one leukemia cluster that showed epigenetic and transcriptional changes in KMT2A. However, when compared to the other
leukemia clusters that did not share the KMT2A change, there was no uniquely enriched pathway for them. Instead, they shared enriched pathways with some of the leukemia clusters, suggesting similar cellular physiology. Interestingly, in both B-ALL samples, leukemia cell clusters were observed that were unique among all leukemia cell clusters in terms of enriched pathways. Although from two different samples, they shared the most enriched pathways, such as "G2M_CHECKPOINT" and "E2F_TARGETS." For the AML samples (SU710 and SU968), each had three or six leukemia clusters that showed epigenetic and transcriptional changes in KMT2A. Unlike the clusters with KMT2A changes in the B-ALL samples, they shared most of their enriched pathways and were distinguished from the other ones that did not have the changes in KMT2A. Each had a leukemia cell cluster that showed no epigenetic and transcriptional changes in KMT2A and also had a set of uniquely enriched pathways. From SU710, the unique leukemia cell cluster had enrichment in "ANGIOGENESIS," "ADIPOGENESIS," and "XENOBIOTIC_METABOLISM." From SU968, the unique leukemia cell cluster had enrichment in "HEDGEHOG_SIGNALING," "MYC_TARGETS_V1," and "MYC_TARGETS_V2."

Shifted KMT2A location in target HMW fragment improved fusion separation

A multiplex assay was designed, locating the target genes in the middle of the target fragments (e.g., Assay 2). However, in some samples, the gene fusion and the wild type fragments had similar lengths, and the difference was less than 50 kb. The HMW DNA fragments with similar lengths were eluted in the same elution well, and therefore the simple qPCR method was not able to detect the fusion. For example, the initially designed CATCH assay was not able to separate fusion fragments, whose lengths were 330 kb and 270 kb, from SU659 (sample with a KMT2A-AFF1 fusion).
Another pair of KMT2A gRNAs was designed to be compatible with other existing gRNAs to locate the gene at a shifted location toward the 5’-end. Using this newer multiplex assay (Assay 3), SU659 was analyzed again, which confirmed additional enrichment of targets in elution modules neighboring the elution module that collected wild type fragments (Fig. 7). In addition to 300-kb wild type targets, 400-kb and 200-kb rearranged DNA molecules were separated in the electrophoretic separation step, and collected at the neighboring elution modules. These two additional enrichments suggested two fusion genes resulting from a reciprocal translocation.
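The size logic behind this interpretation can be illustrated with simple arithmetic. The sketch below assumes an idealized model that is not stated in the source: each CRISPR-excised wild type fragment is broken once, and each reciprocal product joins the 5' part of one fragment to the 3' part of the other. Under that assumption the two product lengths always sum to the combined wild type length. The function name and the breakpoint offsets are hypothetical values chosen only to reproduce the fragment sizes reported above.

```python
def reciprocal_products(len_a, bp_a, len_b, bp_b):
    """Lengths (kb) of the two derivative fragments from a balanced reciprocal exchange.

    len_a, len_b: lengths of the wild type excised fragments.
    bp_a, bp_b: breakpoint offsets measured from the 5' cut site of each fragment.
    """
    der1 = bp_a + (len_b - bp_b)  # 5' part of A joined to 3' part of B
    der2 = bp_b + (len_a - bp_a)  # 5' part of B joined to 3' part of A
    return der1, der2

# Hypothetical offsets giving a 30-kb shift (hard to resolve from 300-kb wild type)
print(reciprocal_products(300, 180, 300, 150))  # (330, 270)
# Moving the cut sites of fragment A by 70 kb enlarges the shift to 100 kb
print(reciprocal_products(300, 250, 300, 150))  # (400, 200)
```

With equal wild type lengths, each product differs from the wild type by the difference between the two breakpoint offsets, so relocating the KMT2A cut sites changes the size shift. This is consistent with, though not proof of, the 330/270-kb and 400/200-kb products described for SU659 under the two designs.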

Tagmentation-based ONT sequencing library preparation

The ONT sequencing library preparation of the CTLR-seq method was modified. Using a human cell line sample (GM18507), ligation-based (CTLR-seq) and tagmentation-based (this study) library preparations were compared. For the 300-kb targets (Assay 2), both the ligation- and tagmentation-based methods generated reads covering the entire region with reasonably good coverage, although better uniformity was apparent with the tagmentation-based method (Fig. 8). Therefore, end-to-end assembly contigs could be generated from both sequencing runs. However, for targets longer than 300 kb, an end-to-end assembly may be challenging because the middle part of the target is poorly covered. For example, if a rearrangement event increases the size of the CRISPR-targeted products, the lack of uniformity may obscure the rearrangement. To test an HMW target longer than 300 kb, a CATCH assay for the KMT2A gene was designed, and an end-to-end assembly was generated with targeted ONT sequencing (Fig. 8). In addition to providing uniform sequencing coverage, the tagmentation-based method appeared to be superior in terms of efficiency. When ligation is used to introduce adapters, the library must be purified before loading onto the sequencer. Some degree of fragmentation may occur, negatively impacting the uniformity of sequencing coverage. In the version of tagmentation-based nanopore library preparation used in this study, no post-adapter cleanup was required and the library could be loaded directly after the reaction. DNA shearing after the addition of adapters could be minimized, and the library inserts were generated as randomly fragmented targets. The distribution of insert sizes and the length of the sequence reads could be optimized by adjusting the concentration of transposase enzyme and the duration of the enzymatic reaction. The goal of optimization was to achieve the longest sequence reads while keeping the yield comparable to that of the ligation-based method.
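Read length in comparisons of this kind is commonly summarized by the read-length N50: the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch follows; the function name and the example read lengths are hypothetical and are not data from this study.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths (0 for empty input)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first read that pushes the cumulative sum to half of all
        # bases defines the N50.
        if running >= half_total:
            return length
    return 0


example_reads_kb = [10, 20, 30, 40, 50]  # hypothetical read lengths in kb
print(read_n50(example_reads_kb))  # 40
```

Comparing N50 values across enzyme dilutions, together with total yield, is one simple way to express the trade-off between read length and yield described above.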
Dilutions of 1:5, 1:20, and 1:40 of the original tagmentation enzyme mix were tested with different clinical samples, and an N50 comparable to that of the ligation-based method was achieved while maintaining uniform sequencing coverage throughout the target region.

References
1. Bodmer, W., J.H. Bielas, and R.A. Beckman, Genetic instability is not a requirement for tumor development. Cancer Res, 2008. 68(10): p. 3558-60; discussion 3560-1.
2. Kou, F., et al., Chromosome Abnormalities: New Insights into Their Clinical Significance in Cancer. Mol Ther Oncolytics, 2020. 17: p. 562-570.
3. Gao, Q., et al., Driver Fusions and Their Implications in the Development and Treatment of Human Cancers. Cell Rep, 2018. 23(1): p. 227-238 e3.
4. Puiggros, A., G. Blanco, and B. Espinet, Genetic abnormalities in chronic lymphocytic leukemia: where we are and where we go. Biomed Res Int, 2014. 2014: p. 435983.
5. Glassman, A.B., Chromosomal abnormalities in acute leukemias. Clin Lab Med, 2000. 20(1): p. 39-48.
6. Powers, M.P., The ever-changing world of gene fusions in cancer: a secondary gene fusion and progression. Oncogene, 2019. 38(47): p. 7197-7199.
7. Engvall, M., et al., Detection of leukemia gene fusions by targeted RNA-sequencing in routine diagnostics. BMC Med Genomics, 2020. 13(1): p. 106.
8. Hess, J.L., MLL: a histone methyltransferase disrupted in leukemia. Trends Mol Med, 2004. 10(10): p. 500-7.
9. Gestrich, C.K., et al., Reciprocal ATP5L-KMT2A gene fusion in a paediatric B lymphoblastic leukaemia/lymphoma (B-ALL) patient. Br J Haematol, 2020. 191(2): p. e61-e64.
10. Yoshida, A., et al., KMT2A (MLL) fusions in aggressive sarcomas in young adults. Histopathology, 2019. 75(4): p. 508-516.
11. Zerkalenkova, E., et al., BTK, NUTM2A, and PRPF19 Are Novel KMT2A Partner Genes in Childhood Acute Leukemia. Biomedicines, 2021. 9(8).
12. Ney Garcia, D.R., et al., Molecular characterization of KMT2A fusion partner genes in 13 cases of pediatric leukemia with complex or cryptic karyotypes. Hematol Oncol, 2017. 35(4): p. 760-768.
13. Bataller, A., et al., KMT2A-CBL rearrangements in acute leukemias: clinical characteristics and genetic breakpoints. Blood Adv, 2021. 5(24): p. 5617-5620.
14. Meyer, C., et al., Human MLL/KMT2A gene exhibits a second breakpoint cluster region for recurrent MLL-USP2 fusions. Leukemia, 2019. 33(9): p. 2306-2340.
15. Markey, F.B., et al., Fusion FISH imaging: single-molecule detection of gene fusion transcripts in situ. PLoS One, 2014. 9(3): p. e93488.
16. Lyu, X., et al., Detection of 22 common leukemic fusion genes using a single-step multiplex qRT-PCR-based assay. Diagn Pathol, 2017. 12(1): p. 55.
17. Zhou, B., Shin, G., Greer, S.U., Vervoort, L., Huang, Y., Pattni, R., Ho, M., Wong, W.H., Vermeesch, J.R., Ji, H.P., Urban, A.E., Complete and haplotype-specific sequence assembly of segmental duplication-mediated genome rearrangements using CRISPR-targeted ultra-long read sequencing (CTLR-Seq). BioRxiv, 2020. https://doi.org/10.1101/2020.10.23.349621.
18. Kerbs, P., et al., Fusion gene detection by RNA-sequencing complements diagnostics of acute myeloid leukemia and identifies recurring NRIP1-MIR99AHG rearrangements. Haematologica, 2022. 107(1): p. 100-111.
19. Gilpatrick, T., et al., Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat Biotechnol, 2020. 38(4): p. 433-438.
20. Livak, K.J. and T.D. Schmittgen, Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods, 2001. 25(4): p. 402-8.
21. Li, H., Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 2018. 34(18): p. 3094-3100.
22. Quinlan, A.R. and I.M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 2010. 26(6): p. 841-2.
23. Li, H., et al., The Sequence Alignment/Map format and SAMtools. Bioinformatics, 2009. 25(16): p. 2078-9.
24. Kolmogorov, M., et al., Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol, 2019. 37(5): p. 540-546.
25. Stuart, T., et al., Single-cell chromatin state analysis with Signac. Nat Methods, 2021. 18(11): p. 1333-1341.
26. Hanzelmann, S., R. Castelo, and J. Guinney, GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics, 2013. 14: p. 7.
27. Liberzon, A., et al., The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst, 2015. 1(6): p. 417-425.
28. Khabirova, E., et al., Single-cell transcriptomics reveals a distinct developmental state of KMT2A-rearranged infant B-cell acute lymphoblastic leukemia. Nat Med, 2022. 28(4): p. 743-751.
Claims
What is claimed is:
1. A method comprising:
(a) lysing cells that have a fusion between a first gene and a second gene at an end of an electrophoresis gel to release the genomic DNA from the cells;
(b) applying a voltage potential to the gel, thereby trapping intact genomic DNA at one end of the gel;
(c) digesting the trapped genomic DNA using two or more pairs of RNA-guided endonucleases to release: (i) a segment of the first gene; (ii) a segment of the second gene; and (iii) a segment of a gene fusion between the first and second genes; wherein the segments of the first and second genes are approximately the same size and the segment of the gene fusion is resolvable from the segments of the first and second genes in the gel;
(d) electrophoresing the segments of (i), (ii) and (iii) through the gel, thereby separating the segment of (iii) from the segments of (i) and (ii);
(e) eluting the segments into different fractions by applying a second voltage potential to the gel, wherein the second voltage potential is orthogonal to the potential of (b), wherein the segment of (iii) is eluted into a fraction that is different to the fraction into which the segments of (i) and (ii) are eluted; and
(f) assaying for the flanking sequences of the first and second genes in the fractions collected in (e), thereby identifying a fraction that contains the segments of the first and second genes and a fraction that contains the segment of the gene fusion.
2. The method of claim 1, wherein the assay is done by quantitative PCR.
3. The method of claim 2, wherein the assay is done by TaqMan.
4. The method of any prior claim, wherein the electrophoresing of (d) is pulsed field electrophoresis.
5. The method of any of claims 1-4, wherein the segments are 100 kb to 1 Mb in length.
6. The method of any of claims 1-5, wherein the segments of (i) and (ii) have a size difference of less than 100 kb.
7. The method of any of claims 1-6, wherein the segments of (i) and (ii) have a size difference of less than 20 kb.
8. The method of any prior claim, wherein the segment of (iii) is smaller or larger than the segments of (i) and (ii) by at least 50 kb.
9. The method of any prior claim, wherein the segment of (iii) is smaller or larger than the segments of (i) and (ii) by at least 200 kb.
10. The method of any prior claim, wherein nucleic acid in the gel is eluted into 4-10 fractions in step (e).
11. The method of any prior claim, wherein the cells of (a) are blood cells, cultured cells, or a dissociated tissue cell suspension.
12. The method of any prior claim, wherein the cells are blood cells from a patient that has a blood cancer associated with the gene fusion.
13. The method of claim 12, wherein the patient has ALL or AML.
14. The method of any prior claim, further comprising sequencing the eluted segment of the gene fusion using nanopore sequencing.
15. The method of claim 14, further comprising identifying a breakpoint in the fusion.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263414889P | 2022-10-10 | 2022-10-10 | |
US63/414,889 | 2022-10-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024081596A1 (en) | 2024-04-18 |
Family
ID=90670264
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/076391 | WO2024081596A1 (en): Identification and characterization of gene fusions by CRISPR-targeted nanopore sequencing | 2022-10-10 | 2023-10-09 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024081596A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140090113A1 (en) * | 2012-09-07 | 2014-03-27 | Dow Agrosciences Llc | Engineered transgene integration platform (etip) for gene targeting and trait stacking |
US20180135104A1 (en) * | 2002-12-04 | 2018-05-17 | Applied Biosystems, Llc | Multiplex amplification of polynucleotides |
US20210207122A1 (en) * | 2015-11-20 | 2021-07-08 | Sage Science, Inc. | Preparative electrophoretic method for targeted purification of genomic dna fragments |
Non-Patent Citations (1)
Title |
---|
BO ZHOU: "Complete and haplotype-specific sequence assembly of segmental duplicationmediated genome rearrangements using targeted CRISPR-targeted ultra-long read sequencing (CTLR-Seq)", BIORXIV, 1 January 2020 (2020-01-01), XP093163524, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/2020.10.23.349621v2.full.pdf> * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23878115; Country of ref document: EP; Kind code of ref document: A1 |