CA3196269A1 - Safe harbor loci - Google Patents

Safe harbor loci

Info

Publication number
CA3196269A1
Authority
CA
Canada
Prior art keywords
cell
safe harbor
seq
locus
engineered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3196269A
Other languages
French (fr)
Inventor
Xinying ZHENG
Brendan GALVIN
Somya Khare
Aaron Cooper
Michelle NGUYEN
Anzhi YAO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Arsenal Biosciences Inc
Original Assignee
Arsenal Biosciences Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Arsenal Biosciences Inc

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N5/00 Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/06 Animal cells or tissues; Human cells or tissues
    • C12N5/0602 Vertebrate cells
    • C12N5/0634 Cells from the blood or the immune system
    • C12N5/0636 T lymphocytes
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K35/00 Medicinal preparations containing materials or reaction products thereof with undetermined constitution
    • A61K35/12 Materials from mammals; Compositions comprising non-specified tissues or cells; Compositions comprising non-embryonic stem cells; Genetically modified cells
    • A61K35/28 Bone marrow; Haematopoietic stem cells; Mesenchymal stem cells of any origin, e.g. adipose-derived stem cells
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K35/00 Medicinal preparations containing materials or reaction products thereof with undetermined constitution
    • A61K35/12 Materials from mammals; Compositions comprising non-specified tissues or cells; Compositions comprising non-embryonic stem cells; Genetically modified cells
    • A61K35/48 Reproductive organs
    • A61K35/54 Ovaries; Ova; Ovules; Embryos; Foetal cells; Germ cells
    • A61K35/545 Embryonic stem cells; Pluripotent stem cells; Induced pluripotent stem cells; Uncharacterised stem cells
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K39/00 Medicinal preparations containing antigens or antibodies
    • A61K39/46 Cellular immunotherapy
    • A61K39/461 Cellular immunotherapy characterised by the cell type used
    • A61K39/4611 T-cells, e.g. tumor infiltrating lymphocytes [TIL], lymphokine-activated killer cells [LAK] or regulatory T cells [Treg]
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K39/00 Medicinal preparations containing antigens or antibodies
    • A61K39/46 Cellular immunotherapy
    • A61K39/463 Cellular immunotherapy characterised by recombinant expression
    • A61K39/4631 Chimeric Antigen Receptors [CAR]
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K39/00 Medicinal preparations containing antigens or antibodies
    • A61K39/46 Cellular immunotherapy
    • A61K39/464 Cellular immunotherapy characterised by the antigen targeted or presented
    • A61K39/4643 Vertebrate antigens
    • A61K39/4644 Cancer antigens
    • A61K39/464402 Receptors, cell surface antigens or cell surface determinants
    • A61K39/464411 Immunoglobulin superfamily
    • A61K39/464412 CD19 or B4
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K39/00 Medicinal preparations containing antigens or antibodies
    • A61K39/46 Cellular immunotherapy
    • A61K39/464 Cellular immunotherapy characterised by the antigen targeted or presented
    • A61K39/4643 Vertebrate antigens
    • A61K39/4644 Cancer antigens
    • A61K39/464466 Adhesion molecules, e.g. NRCAM, EpCAM or cadherins
    • A61K39/464468 Mesothelin [MSLN]
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P35/00 Antineoplastic agents
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/705 Receptors; Cell surface antigens; Cell surface determinants
    • C07K14/70503 Immunoglobulin superfamily
    • C07K14/7051 T-cell receptor (TcR)-CD3 complex
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K2239/00 Indexing codes associated with cellular immunotherapy of group A61K39/46
    • A61K2239/31 Indexing codes associated with cellular immunotherapy of group A61K39/46 characterized by the route of administration
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K2239/00 Indexing codes associated with cellular immunotherapy of group A61K39/46
    • A61K2239/38 Indexing codes associated with cellular immunotherapy of group A61K39/46 characterised by the dose, timing or administration schedule
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K2239/00 Indexing codes associated with cellular immunotherapy of group A61K39/46
    • A61K2239/46 Indexing codes associated with cellular immunotherapy of group A61K39/46 characterised by the cancer treated
    • A61K2239/48 Blood cells, e.g. leukemia or lymphoma
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/03 Fusion polypeptide containing a localisation/targetting motif containing a transmembrane segment
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2510/00 Genetically modified cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Cell Biology (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Medicinal Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Microbiology (AREA)
  • Biotechnology (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Animal Behavior & Ethology (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Epidemiology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mycology (AREA)
  • Biochemistry (AREA)
  • Developmental Biology & Embryology (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Plant Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Virology (AREA)
  • Reproductive Health (AREA)
  • Hematology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Toxicology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Gynecology & Obstetrics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)

Abstract

Provided herein are safe harbor loci and methods for identifying and using safe harbor loci. The safe harbor loci exhibit increased knock-in efficiency and allow for increased, stable expression of transgenes.

Description

SAFE HARBOR LOCI
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to and the benefit of U.S. Provisional Patent Application No. 63/179,143, filed on April 23, 2021, U.S. Provisional Patent Application No. 63/141,926, filed on January 26, 2021, and U.S. Provisional Patent Application No. 63/105,834, filed on October 26, 2020, the entire contents of which are incorporated by reference herein for all purposes.
SEQUENCE LISTING
[0002] The instant application contains a Sequence Listing which will be submitted via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on October 15, 2021, is named ANB-203W0 SequenceListing and is 623,177 bytes in size.
BACKGROUND
[0003] Cancer continues to present a significant clinical burden despite the substantial research efforts and scientific advances in cancer therapies. Blood and bone marrow cancers are frequently diagnosed cancer types, including multiple myelomas, leukemia, and lymphomas. Current treatment options for these cancers are not effective for all patients and/or can have substantial adverse side effects. Other types of cancer also remain challenging to treat using existing therapeutic options. Cancer immunotherapies are a promising solution because they can be highly specific, allowing for increased therapeutic effectiveness and the mitigation of side effects.
[0004] Genetically engineered immune cell therapy is a growing field with promising applications for the treatment of diseases including, but not limited to, cancer. Through the alteration of coding and/or non-coding genomic regions, researchers are identifying transgenes and insertion sites within cells that facilitate, for example, enhanced cell function, arrested cell growth, induced cell death, and tumor size/volume reduction. The identification of safe harbor sites (SHS) has improved outcomes of genome-engineering therapies. Well-known SHS include the AAVS1 adeno-associated virus insertion site on chromosome 19, the human homolog of the murine Rosa26 locus, and the CCR5 chemokine receptor gene, the absence of which confers HIV resistance. (See, for example, Pellenz et al., 2018, the relevant disclosures of which are herein incorporated by reference.) However, there is still a need for improved guidelines for gene editing therapies and additional SHS to address challenges such as poor knock-in (KI) efficiency, insertional oncogenesis, and unstable and/or anomalous expression of transgenes and/or adjacent genes.
SUMMARY
[0005] The present disclosure is directed, inter alia, to safe harbor loci that exhibit high knock-in efficiency and stable expression of their transgenes. These safe harbor loci can be used to alter T cells for immunotherapy. These safe harbor loci are useful for the treatment of various diseases, including cancer.
[0006] In one aspect, the present disclosure provides an engineered cell, comprising at least one sequence encoding a transgene, wherein the at least one sequence is inserted within a safe harbor locus, the safe harbor locus is at any one or more of the sgRNA target loci provided in Table 4; and wherein expression of the at least one sequence encoding the transgene is operatively linked to an endogenous promoter. In another aspect, the present disclosure provides an engineered cell, comprising at least one sequence encoding a transgene, wherein the at least one sequence is inserted within a safe harbor locus, the safe harbor locus is at any one or more of the sgRNA target loci provided in Table 4; and wherein expression of the at least one sequence encoding the transgene is operatively linked to an exogenous promoter.
[0007] In some embodiments, the target locus is selected from: chr10:33130000-33140000, chr10:72290000-72300000, chr11:128340000-128350000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, and chr9:7970000-7980000. In some embodiments, the target locus is selected from: chr10:72290000-72300000, chr11:128340000-128350000, chr15:92830000-92840000, and chr16:11220000-11230000. In some embodiments, the target locus is chr11:128340000-128350000. In some embodiments, the target locus is chr15:92830000-92840000.
In some embodiments, the target locus is a gene selected from: APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, and TRIM28.
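As a non-limiting illustration, the coordinate windows recited above can be handled programmatically, for example to check whether a candidate insertion position falls within one of the recited target loci. The following Python sketch is not taken from the disclosure: the names `Locus`, `parse_locus`, and `find_window` are hypothetical, the intervals are assumed to be closed (both endpoints included), and the genome assembly is not stated in this passage and is therefore left unspecified.

```python
import re
from typing import NamedTuple, Optional, Sequence


class Locus(NamedTuple):
    """A genomic window; endpoints are treated as inclusive (an assumption)."""
    chrom: str
    start: int
    end: int


# The four-locus embodiment recited above, copied verbatim.
# The genome assembly is not given in this passage, so these coordinates
# are only meaningful against whichever build the disclosure relies on.
TARGET_LOCI = (
    "chr10:72290000-72300000",
    "chr11:128340000-128350000",
    "chr15:92830000-92840000",
    "chr16:11220000-11230000",
)

_LOCUS_RE = re.compile(r"^(chr[0-9XYM]+):(\d+)-(\d+)$")


def parse_locus(text: str) -> Locus:
    """Parse 'chrN:start-end', tolerating stray spaces around the colon."""
    match = _LOCUS_RE.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a locus string: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start exceeds end in {text!r}")
    return Locus(chrom, start, end)


def find_window(chrom: str, position: int,
                windows: Sequence[str] = TARGET_LOCI) -> Optional[Locus]:
    """Return the first window containing the position, or None."""
    for text in windows:
        locus = parse_locus(text)
        if locus.chrom == chrom and locus.start <= position <= locus.end:
            return locus
    return None
```

For example, `find_window("chr15", 92835000)` returns the chr15 window, whereas a position on a chromosome that is not listed returns `None`.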
[0008] In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS88, GS89, GS90, GS91, GS92, GS93, GS94, GS95, GS96, GS97, GS98, GS99, GS100, GS101, GS102, GS103, GS104, GS105, GS106, GS107, GS108, GS109, GS110, GS111, GS112, GS113, GS114, GS115, GS116, GS117, GS118, GS119, GS120. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS91, GS92, GS93, GS94, GS95, GS96, GS100, GS101, GS102, GS103, GS104, and GS105. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated:
GS103, GS104, and GS105. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS95, and GS96. In some embodiments, the safe harbor locus is the GS94 integration site in Table 4. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS100, GS101, and GS102. In some embodiments, the safe harbor locus is the GS102 integration site in Table 4. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS91, GS92, and GS93.
[0009] In some embodiments, the exogenous promoter is an EF1α promoter. In some embodiments, the engineered cell is a stem cell, a human cell, a primary cell, a hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or a T cell progenitor. In some embodiments, the transgene encodes a recombinant protein, optionally a therapeutic agent. In some embodiments, the transgene encodes a chimeric antigen receptor (CAR).
[0010] In another aspect, the present disclosure provides a composition comprising the engineered cell as described herein and a pharmaceutical excipient.
[0011] In another aspect, the present disclosure provides a guide ribonucleic acid (gRNA) for editing a cell at a safe harbor locus, wherein the gRNA comprises any one of the sgRNA sequences in Table 4.
[0012] In some embodiments, the gRNA comprises any one of SEQ ID NOS: 1-120. In some embodiments, the gRNA comprises any one of SEQ ID NOS: 91-96 and 100-105. In some embodiments, the gRNA comprises SEQ ID NO:94 or SEQ ID NO:102. In some embodiments, the gRNA comprises SEQ ID NO:94. In some embodiments, the gRNA comprises SEQ ID NO:102. In some embodiments, the cell is a stem cell, a human cell, a primary cell, a hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or a T cell progenitor.
[0013] In another aspect, the present disclosure provides a method of editing a cell having chromosomal DNA, comprising inserting at least one sequence encoding a transgene within a safe harbor locus in the chromosomal DNA of the cell, wherein the safe harbor locus is any one or more of the sgRNA target loci provided in Table 4.
[0014] In some embodiments, the target locus is selected from: chr10:33130000-33140000, chr10:72290000-72300000, chr11:128340000-128350000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, and chr9:7970000-7980000. In some embodiments, the target locus is selected from: chr10:72290000-72300000, chr11:128340000-128350000, chr15:92830000-92840000, and chr16:11220000-11230000. In some embodiments, the target locus is chr11:128340000-128350000. In some embodiments, the target locus is chr15:92830000-92840000.
In some embodiments, the target locus is a gene selected from: APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, and TRIM28.
[0015] In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS88, GS89, GS90, GS91, GS92, GS93, GS94, GS95, GS96, GS97, GS98, GS99, GS100, GS101, GS102, GS103, GS104, GS105, GS106, GS107, GS108, GS109, GS110, GS111, GS112, GS113, GS114, GS115, GS116, GS117, GS118, GS119, GS120. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS91, GS92, GS93, GS94, GS95, GS96, GS100, GS101, GS102, GS103, GS104, and GS105. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated:
GS103, GS104, and GS105. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS95, and GS96. In some embodiments, the safe harbor locus is the GS94 integration site in Table 4. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS100, GS101, and GS102. In some embodiments, the safe harbor locus is the GS102 integration site in Table 4. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS91, GS92, and GS93.
[0016] In some embodiments, the transgene encodes a recombinant protein, optionally a therapeutic agent. In some embodiments, the transgene encodes a chimeric antigen receptor (CAR). In some embodiments, the at least one sequence comprises an exogenous promoter and the exogenous promoter is operably linked to the transgene. In some embodiments, the exogenous promoter is an EF1α promoter. In some embodiments, the cell is a stem cell, a human cell, a primary cell, a hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or a T cell progenitor. In some embodiments, the at least one sequence is inserted using homology-directed repair. In some embodiments, the at least one sequence is inserted using homology-independent targeted insertion. In some embodiments, the at least one sequence is inserted using one or more guide ribonucleic acids (gRNAs) and one or more Cas9 endonucleases.
[0017] In some embodiments, the one or more gRNAs comprises any one of SEQ ID NOS: 1-120. In some embodiments, the one or more gRNAs comprises any one of SEQ ID NOS: 91-96 and 100-105. In some embodiments, the gRNA comprises SEQ ID NO:94 or SEQ ID NO:102. In some embodiments, the gRNA comprises SEQ ID NO:94. In some embodiments, the gRNA comprises SEQ ID NO:102.
[0018] In another aspect, the present disclosure provides a method of editing a T cell, comprising contacting a T cell with one or more guide ribonucleic acids (gRNAs), at least one sequence encoding a transgene, and one or more Cas9 endonucleases, wherein the one or more gRNAs and Cas9 endonucleases facilitate the insertion of the at least one sequence into chromosomal DNA within a safe harbor locus, wherein the safe harbor locus is selected from any one or more of the sgRNA target loci in Table 4.
[0019] In some embodiments, the one or more gRNAs comprises a sequence selected from any one of the sgRNA sequences in Table 4. In some embodiments, the one or more gRNAs comprises any one of SEQ ID NOS: 1-120. In some embodiments, the one or more gRNAs comprises any one of SEQ ID NOS: 91-96 and 100-105. In some embodiments, the gRNA comprises SEQ ID NO:94 or SEQ ID NO:102. In some embodiments, the gRNA comprises SEQ ID NO:94. In some embodiments, the gRNA comprises SEQ ID NO:102.
[0020] In some embodiments, the target locus is selected from: chr10:33130000-33140000, chr10:72290000-72300000, chr11:128340000-128350000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, and chr9:7970000-7980000. In some embodiments, the target locus is selected from: chr10:72290000-72300000, chr11:128340000-128350000, chr15:92830000-92840000, and chr16:11220000-11230000. In some embodiments, the target locus is chr11:128340000-128350000. In some embodiments, the target locus is chr15:92830000-92840000.
In some embodiments, the target locus is a gene selected from: APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, and TRIM28.
[0021] In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS88, GS89, GS90, GS91, GS92, GS93, GS94, GS95, GS96, GS97, GS98, GS99, GS100, GS101, GS102, GS103, GS104, GS105, GS106, GS107, GS108, GS109, GS110, GS111, GS112, GS113, GS114, GS115, GS116, GS117, GS118, GS119, GS120. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS91, GS92, GS93, GS94, GS95, GS96, GS100, GS101, GS102, GS103, GS104, and GS105. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated:
GS103, GS104, and GS105. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS95, and GS96. In some embodiments, the safe harbor locus is the GS94 integration site in Table 4. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS100, GS101, and GS102. In some embodiments, the safe harbor locus is the GS102 integration site in Table 4. In some embodiments, the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS91, GS92, and GS93.
[0022] In another aspect, the present disclosure provides an ex vivo method of obtaining an engineered cell or population thereof, comprising (a) obtaining a cell; and (b) genetically modifying the cell by inserting at least one sequence encoding a transgene within a safe harbor locus, wherein the safe harbor locus is selected from any one of the sgRNA target loci in Table 4.
[0023] In some embodiments, obtaining the cell comprises: (i) collecting a tissue sample from a subject, (ii) isolating the cells from the tissue sample, and (iii) culturing the cells in vitro. In some embodiments, the tissue sample is a blood sample. In some embodiments, the cell is a stem cell, a human cell, a primary cell, a hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or a T cell precursor (T cell progenitor). In some embodiments, the at least one sequence is inserted using homology-directed repair. In some embodiments, the at least one sequence is inserted using homology-independent targeted insertion.
[0024] In some embodiments, the genetically modifying in step (b) comprises contacting the cell with one or more guide ribonucleic acids (gRNAs), the at least one sequence, and one or more Cas9 endonucleases, wherein the one or more gRNAs and Cas9 endonucleases facilitate the insertion of the at least one sequence into chromosomal DNA within the safe harbor locus. In some embodiments, the one or more gRNAs comprises a sequence selected from any one of the sgRNA sequences in Table 4.
[0025] In some embodiments, the transgene encodes a recombinant protein, optionally a therapeutic agent. In some embodiments, the transgene encodes a chimeric antigen receptor (CAR). In some embodiments, the at least one sequence comprises an exogenous promoter and the exogenous promoter is operably linked to the transgene. In some embodiments, the exogenous promoter is an EF1α promoter.
[0026] In yet another aspect, the present disclosure provides a method of treating a subject having or at risk of having a disease, comprising administering to the subject an effective amount of an engineered cell as described herein, a population thereof, or a composition as described herein. In some embodiments, the cell, the population thereof, or the composition is administered to the subject by infusion.
[0027] In yet another aspect, the present disclosure provides a method of treating a subject having or at risk of having a disease, comprising (a) conducting any one of the methods described supra; and (b) administering to the subject an effective amount of a composition comprising the cell or a population thereof. In some embodiments, the composition is administered to the subject by infusion. In some embodiments, the disease is cancer. In some embodiments, the disease is blood cancer.
[0028] In another aspect, the present disclosure provides a method of identifying a safe harbor locus, comprising: (a) identifying genes or non-coding regions in a chromosome that are above a threshold level for expression across developmental cell states and/or a threshold level for accessibility of chromatin; (b) generating a linear model that correlates the gene or non-coding region from step (a) with knock-in (KI) efficiency and estimates the KI efficiency of any gene or coding region on the chromosome; and (c) selecting the safe harbor locus based on threshold parameters; wherein the safe harbor locus is selected for insertion of at least one sequence encoding a transgene within a cell.
[0029] In some embodiments, the threshold parameters include one or more of: stable expression of a transgene, knockout of the gene confers benefit to the function of the cell, no known function within the cell, stable transgene expression in vitro with or without CD3/CD28 stimulation, negligible off-target cleavage as detected by iGuide-Seq or CRISPR-Seq, less off-target cleavage relative to other loci as detected by iGuide-Seq or CRISPR-Seq, negligible transgene-independent cytotoxicity, negligible transgene-independent cytokine expression, negligible transgene-independent chimeric antigen receptor expression, negligible deregulation or silencing of nearby genes, and positioned outside of a cancer-related gene. In some embodiments, the stable expression of a transgene at the safe harbor locus is less than or equal to 2-fold expression change over the course of at least 1, 2, 3, 4, 5, 6, or 7 days, and wherein expression change is measured by mean fluorescence intensity of a reporter gene encoded by the at least one sequence. In some embodiments, the accessibility of chromatin is measured using an assay for transposase-accessible chromatin using sequencing (ATAC-seq). In some embodiments, the level of expression across developmental cell states is measured using RNA sequencing (RNA-seq).
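By way of illustration only, the stable-expression criterion described above (no more than a 2-fold change in reporter mean fluorescence intensity over a time course) can be expressed as a simple check. The Python sketch below is not part of the disclosure. It assumes that the fold change is taken as the ratio of the highest to the lowest MFI reading across the time course, since the passage does not specify the reference time point, and the function names `max_fold_change` and `is_stable` are hypothetical.

```python
from typing import Sequence


def max_fold_change(mfi_by_day: Sequence[float]) -> float:
    """Largest fold change across a time course of MFI readings.

    Assumption: fold change = max reading / min reading. The source text
    does not say which time points are compared.
    """
    if len(mfi_by_day) < 2:
        raise ValueError("need at least two time points")
    if min(mfi_by_day) <= 0:
        raise ValueError("MFI readings must be positive")
    return max(mfi_by_day) / min(mfi_by_day)


def is_stable(mfi_by_day: Sequence[float], max_fold: float = 2.0) -> bool:
    """True when expression changes by no more than max_fold (2-fold by default)."""
    return max_fold_change(mfi_by_day) <= max_fold
```

Under these assumptions, a hypothetical time course of 1000, 1400, and 1900 MFI units would pass the check, while a jump from 1000 to 2500 would not.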
[0030] In some embodiments, the cell is a stem cell, a human cell, a primary cell, a hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or a T cell progenitor. In some embodiments, the linear model has a coefficient of determination (R² value) of at least 30%.
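The identification method described above (thresholding on expression and/or chromatin accessibility, fitting a linear model to knock-in efficiency, and requiring a coefficient of determination of at least 30%) may be sketched as follows. This is a minimal Python illustration under stated assumptions, not the model used in the disclosure: it uses a single predictor fitted by ordinary least squares, reads "and/or" in step (a) as a logical "or", and all function names are hypothetical. The actual predictors, thresholds, and fitting procedure are not specified in this passage.

```python
from statistics import mean
from typing import Sequence, Tuple


def passes_step_a(expression: float, accessibility: float,
                  min_expression: float, min_accessibility: float) -> bool:
    """Step (a): keep a region that exceeds either threshold.

    Assumption: the "and/or" in the text is read here as "or".
    """
    return expression > min_expression or accessibility > min_accessibility


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(x: Sequence[float], y: Sequence[float],
              slope: float, intercept: float) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("response has no variance")
    return 1.0 - ss_res / ss_tot


def predict_ki(x_new: float, slope: float, intercept: float) -> float:
    """Step (b): estimate KI efficiency for a region not in the training set."""
    return slope * x_new + intercept


def meets_r2_floor(r2: float, floor: float = 0.30) -> bool:
    """The R^2 floor of at least 30% recited above."""
    return r2 >= floor
```

In use, regions passing `passes_step_a` would supply the paired predictor and measured KI-efficiency values for `fit_line`. The fitted model would be retained only if `meets_r2_floor` holds, and `predict_ki` would then rank untested regions for selection in step (c).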
[0031] In yet another aspect, the present disclosure provides an engineered cell, composition, gRNA or method as described herein, wherein insertion within the safe harbor locus increases cell cytotoxicity of diseased cells.
[0032] In yet another aspect, the present disclosure provides an engineered cell, composition, gRNA or method as described herein, wherein knock-in efficiency at the safe harbor locus is increased relative to other locations along the chromosome.
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, and accompanying drawings, where:
[0034] FIG. 1 includes a schematic depicting the methodology used to identify safe harbor loci in the present disclosure, in some embodiments.
[0035] FIGS. 2A-2C include schematics depicting three top safe harbor loci known in the art: AAVS1 (FIG. 2A), CCR5 (FIG. 2B), and Rosa26 (FIG. 2C). The figures were retrieved from Sadelain, M., et al. (2012). Safe harbours for the integration of new DNA in the human genome. Nature Reviews Cancer, 12(1), 51-58, the relevant disclosures of which are herein incorporated by reference in their entirety.
[0036] FIG. 3 includes a schematic outlining the data provided in Roth, T. L., et al. (2019). Rapid discovery of synthetic DNA sequences to rewrite endogenous T cell circuits. bioRxiv, 604561, the relevant disclosures of which are herein incorporated by reference in their entirety.
[0037] FIGS. 4A-4B include heatmaps with samples clustered based on activation status and cell type, generated with processed RNA-seq data. FIG. 4A includes a heatmap of samples generated with processed RNA-seq data. FIG. 4B includes a heatmap generated with processed RNA-seq data, showing the clustering of ~20k genes.
[0038] FIGS. 5A-5B include plots generated using processed RNA-seq data. FIG. 5A includes a plot showing direct correlation between transcript expression data from Roth, T. L., et al. 2019 and transcript expression data generated by the inventors of the present disclosure. FIG. 5B includes a heatmap showing clusters of the top 10% expressed genes.
[0039] FIG. 6 includes a heatmap of samples generated with processed ATAC-seq data.
[0040] FIGS. 7A-7B include plots depicting the signal enrichment at transcription start sites (TSS) (FIG. 7A) and peak size distribution (FIG. 7B), generated with processed ATAC-seq data for coding regions.
[0041] FIG. 8 includes a plot depicting the open chromatin regions around the TRAC locus.
[0042] FIG. 9 includes a plot depicting KI (knock-in) efficiency for coding regions/genes with GFP as a reporter.
[0043] FIG. 10 includes a plot depicting KI efficiency for coding regions/genes with tNGFR as a reporter.
[0044] FIG. 11 includes plots showing scaled KI efficiency for 90 genes vs. d2 (day-2) RNA-seq data, d4 (day-4) RNA-seq data and ATAC-seq data, utilized for the predictive linear model.
[0045] FIG. 12 includes a plot showing chromatin accessibility for a top candidate non-coding region, measured using ATAC sequencing.
[0046] FIG. 13A includes plots showing the cell counts for GFP controls used for the evaluation of candidate KI loci. FIG. 13B includes pmax (GFP high) readings for GFP controls used for the evaluation of candidate KI loci.
[0047] FIG. 14A includes plots showing the maximum episomal GFP expression (GFP high) readings for non-targeting controls. FIG. 14B includes the cell count for non-targeting controls from donors 1, 2, and 3.
[0048] FIG. 15 includes the cell count and maximum GFP expression (GFP high) readings for WT controls.
[0049] FIG. 16 includes plots showing the cell count and maximum GFP expression (GFP high) readings when using sgRNA5, which targets a B2M safe harbor locus, and the construct expression was driven by an endogenous promoter.
[0050] FIG. 17 includes plots showing the cell count and maximum GFP expression (GFP high) readings when using sgRNA5, which targets a B2M safe harbor locus, and the construct expression was driven by an EF1a promoter.
[0051] FIG. 18 includes plots showing the cell count and maximum GFP expression (GFP high) readings when using sgRNA79 to target a TRAC safe harbor locus, and the construct expression was driven by an endogenous promoter.
[0052] FIG. 19 includes plots showing the cell count and maximum GFP expression (GFP high) readings when using sgRNA79, which targets a TRAC safe harbor locus, and the construct expression was driven by an exogenous promoter.
[0053] FIG. 20 includes plots showing TCR (T cell receptor) vs. GFP among all donors and time points when using sgRNA79, and the construct is driven by an endogenous promoter.
[0054] FIG. 21 includes plots showing TCR vs. GFP among all donors and time points when using sgRNA79 and the construct is driven by an exogenous promoter.
[0055] FIG. 22 includes plots showing the cell count and maximum GFP expression (GFP high) readings when using sgRNA83, which targets a TRAC safe harbor locus, and the construct expression was driven by an endogenous promoter.
[0056] FIG. 23 includes plots showing the cell count and maximum GFP expression (GFP high) readings when using sgRNA83, which targets a TRAC safe harbor locus, and the construct expression was driven by an exogenous promoter.
[0057] FIG. 24 includes plots showing TCR vs. GFP among all donors and time points when using sgRNA83 and the construct is driven by an endogenous promoter.
[0058] FIG. 25 includes plots showing TCR vs. GFP among all donors and time points when using sgRNA83 and the construct is driven by an exogenous promoter.
[0059] FIG. 26 includes plots illustrating potential sources of variation (e.g., edge effects and electroporation errors) observed between replicates and donors.
[0060] FIG. 27 includes plots illustrating potential sources of variation observed between replicates and donors, e.g., relating to inherent differences between donors.
[0061] FIG. 28 includes plots illustrating potential sources of variation observed between replicates and donors, e.g., relating to gating errors.
[0062] FIG. 29 includes plots showing the GFP mean fluorescence intensity (GFP MFI) and KI efficiency for top KI loci evaluated with endogenous promoters.
[0063] FIG. 30 includes plots showing the GFP mean fluorescence intensity (GFP MFI) and KI efficiency for top KI loci evaluated with the EF1a promoter.
[0064] FIGS. 31A-31C include plots showing all the significant KI loci (top KI loci) as evaluated with endogenous and EF1a promoters. FIG. 31A includes a plot showing that the expression from the EF1a promoter was approximately 10 times higher than expression from an endogenous promoter. FIG. 31B shows the top KI loci ranked by GFP MFI at week 3. FIG. 31C shows the top integration loci ranked by GFP MFI (at weeks 3 and 4; donors 1-3).
[0065] FIG. 32A includes a plot showing some target loci and their measured transgene expression levels. FIG. 32B includes plots showing the transgene (Prime Receptor (PrimeR)) and TCR expression for the control and insertion at GS94, GS102 and TRAC loci.
[0066] FIG. 33A includes a plot showing the PrimeR levels measured for the indicated integration sites. FIG. 33B includes a schematic showing the GS94 integration site on Chromosome 11.
[0067] FIG. 34A includes plots showing CAR induction and PrimeR expression of engineered T cells after 48 hours of coculturing with K562-CD19 cells. FIG. 34B includes plots showing the cytotoxicity and cytokine secretion levels for engineered T cells after 48 hours of coculturing with K562-CD19/MSLN cells.
[0068] FIG. 35A includes a schematic showing the experimental overview for evaluating the effect of integration site on cytotoxicity. FIG. 35B includes plots showing the measured cytotoxicity for engineered T cells cocultured for 48 hours with the K562 CD19+/MSLN+ or K562 CD19-/MSLN+ cells.
[0069] FIG. 36A includes a schematic showing the experimental overview for evaluating the effect of integration site on cytokine secretion. FIG. 36B includes plots showing the measured cytokine levels for engineered T cells cocultured for 48 hours with CD19+/MSLN+ cells.
[0070] FIG. 37 includes a schematic showing the in vitro experiment conducted to determine the effect of integration site on PrimeR-independent CAR expression. "Flow" refers to flow cytometry and "restim" refers to repetitive CD3/CD28 stimulation of the engineered T cells. "EP" refers to electroporation.
[0071] FIGS. 38A and 38B include plots showing the stability of PrimeR expression over time when using the indicated integration sites. "Flow" refers to flow cytometry and "restim" refers to repetitive CD3/CD28 stimulation of the engineered T cells. "EP" refers to electroporation. In FIG. 38B, the PrimeR expression is normalized to the expression obtained using the TRAC integration site.
[0072] FIG. 39A includes a schematic showing the iGuide-Seq assay technique. FIG. 39B includes a plot showing the on-target efficiency, using the iGuide-Seq assay, for the indicated integration sites. FIG. 39C includes schematics from the iGuide-Seq analysis showing that GS94 had no reproducible putative off-targets across two donors.
[0073] FIG. 40 includes a schematic showing the iGuide-Seq workflow and data.
[0074] FIG. 41 includes a plot showing rhAmp-seq analysis of putative off-target sites identified by iGuide-Seq and Elevation prediction.
[0075] FIG. 42 includes plots showing RNA-seq analysis of cells with GS94, GS102 and TRAC knock-in of CD19/MSLN circuits. Each plot is a scatterplot of gene expression in cells with integration at the GS94 locus (y-axis) vs. cells with integration at either the TRAC or the GS102 locus (x-axis) in two donors. The yellow dots correspond to ETS1 and FLI1. In blue are the genes that were found to be differentially expressed using edgeR (fold-change > 0, FDR-corrected p-value < 0.01, average counts-per-million across compared conditions at least 2).
[0076] FIG. 43 includes plots showing the absence of cytokine-independent growth in cells with CD19/MSLN circuit KI at GS94.
[0077] FIG. 44 shows a diagram of an 8.3 kb cassette that was inserted into the GS94 safe harbor locus.
[0078] FIG. 45 shows the expression of an 8.3 kb transgene circuit comprising a priming receptor and CAR in K562 cells.
[0079] FIG. 46 shows that non-viral editing generated less differentiated T cells.
DETAILED DESCRIPTION
[0080] The present disclosure provides safe harbor loci and methods for identifying safe harbor loci that exhibit high integration efficiency (e.g., high knock-in (KI) efficiency), high and constant levels of transgene expression, and such benefits independent of T cell activation/differentiation state. In some embodiments, the safe harbor loci also exhibit minimal to no disruption to T cell function and/or capacity for product manufacturing. In some embodiments, these loci are useful for effective and safe integration and expression of transgenes in T cells (e.g. in CAR T therapies). In some embodiments, the methods described herein can be used for the identification of safe harbor loci for insertion of transgenes in other types of cells.
[0081] FIG. 1 illustrates the overall approach that the inventors of the present disclosure used to identify safe harbor loci. In some embodiments, the present disclosure provides a method comprising the identification of genes within a genome and non-coding regions with sustained expression in a treatment cell (e.g., T cell) and using a predictive model of KI efficiency as a function of T cell chromatin state and computational analysis to predict candidate integration sites. The method further comprises evaluating the candidate integration sites for actual KI efficiency, sustained levels of transgene expression, and minimal disruption to the treatment cell phenotype (e.g., T cell function and/or capacity for treatment product expansion and manufacturing). In some embodiments, the safe harbor loci allow for integration of a transgene driven by an endogenous promoter. In some embodiments, the safe harbor loci allow for integration of a transgene driven by an exogenous promoter (e.g., EF1a promoter).
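As a rough illustration of a predictive linear model of KI efficiency, the sketch below fits an ordinary least-squares line with a single predictor and reports the coefficient of determination. It is not the inventors' model: the single chromatin accessibility predictor, the function names, and all numbers are hypothetical assumptions chosen only to show how an R² value is obtained.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: a scaled accessibility score and the measured
# KI efficiency (fraction of reporter-positive cells) for five loci.
accessibility = [1.0, 2.0, 3.0, 4.0, 5.0]
ki_efficiency = [0.10, 0.18, 0.22, 0.35, 0.40]

slope, intercept = fit_line(accessibility, ki_efficiency)
r2 = r_squared(accessibility, ki_efficiency, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.3f}")
```

A model with several predictors (for example, RNA-seq and ATAC-seq signals together) would use multiple regression, but R² is computed from the residual and total sums of squares in the same way.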
[0082] To facilitate an understanding of the present disclosure, a number of terms and phrases are defined below. Unless otherwise defined herein, scientific and technical terms used in this application shall have the meanings that are commonly understood by those of ordinary skill in the art. Generally, nomenclature used in connection with, and techniques of pharmacology, cell and tissue culture, molecular biology, cell and cancer biology, neurobiology, neurochemistry, virology, immunology, microbiology, genetics and protein and nucleic acid chemistry, described herein, are those well-known and commonly used in the art. In case of conflict, the present specification, including definitions, will control.
[0083] Unless otherwise defined, all terms of art, notations and other scientific terminology used herein are intended to have the meanings commonly understood by those of skill in the art. In some cases, terms with commonly understood meanings are defined herein for clarity and/or for ready reference, and the inclusion of such definitions herein should not necessarily be construed to represent a difference over what is generally understood in the art. The techniques and procedures described or referenced herein are generally well understood and commonly employed using conventional methodologies by those skilled in the art, such as, for example, the widely utilized molecular cloning methodologies described in Sambrook et al., Molecular Cloning: A Laboratory Manual 4th ed. (2012) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. As appropriate, procedures involving the use of commercially available kits and reagents are generally carried out in accordance with manufacturer-defined protocols and conditions unless otherwise noted.
[0084] As used herein, the singular forms "a," "an," and "the" include the plural referents unless the context clearly indicates otherwise. The terms "include," "such as," and the like are intended to convey inclusion without limitation, unless otherwise specifically indicated.
[0085] As used herein, the term "comprising" also specifically includes embodiments "consisting of" and "consisting essentially of" the recited elements, unless specifically indicated otherwise.
[0086] The term "about" indicates and encompasses an indicated value and a range above and below that value. In certain embodiments, the term "about" indicates the designated value ± 10%, ± 5%, or ± 1%. In certain embodiments, where applicable, the term "about" indicates the designated value(s) ± one standard deviation of that value(s).
[0087] As used herein, the term "gene" refers to the basic unit of heredity, consisting of a segment of DNA arranged along a chromosome, which codes for a specific protein or segment of protein. A gene typically includes a promoter, a 5' untranslated region, one or more coding sequences (exons), optionally introns, and a 3' untranslated region. The gene may further comprise a terminator, enhancers and/or silencers.
[0088] As used herein, the term "locus" refers to a specific, fixed physical location on a chromosome where a gene or genetic marker is located.
[0089] As used herein, the term "target locus" refers to a locus on a chromosome within which a safe harbor locus can be used for the insertion of a sequence. A target locus can consist of multiple potential safe harbor loci (integration sites). Examples of target loci are provided in Table 4, as sgRNA target loci. The notation used for the sgRNA target loci in Table 4 refers to the genomic region of the target locus, defined by the chromosome of the target locus and the coordinate range for that target locus. For example, chr10:33130000-33140000 refers to a target locus on Chr10 (chromosome 10) starting from coordinate 33130000 and ending with coordinate 33140000.
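The target locus notation described above can be read programmatically. The following is a minimal sketch, assuming that both endpoints of the coordinate range are inclusive (the disclosure does not state the convention); the `Region` type and the helper names are hypothetical.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


# Matches strings such as "chr10:33130000-33140000".
_REGION_PATTERN = re.compile(r"^(chr[0-9A-Za-z]+):(\d+)-(\d+)$")


def parse_region(text):
    """Parse 'chrN:start-end' into a Region, validating the range."""
    match = _REGION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a genomic region: {text!r}")
    chrom = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start coordinate exceeds end coordinate")
    return Region(chrom, start, end)


def contains(region, chrom, position):
    """True if the position lies in the region (inclusive endpoints assumed)."""
    return chrom == region.chrom and region.start <= position <= region.end


locus = parse_region("chr10:33130000-33140000")
print(locus, contains(locus, "chr10", 33135000))
```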
[0090] The term "safe harbor locus" refers to a locus at which genes or genetic elements can be incorporated without disruption to expression or regulation of adjacent genes. These safe harbor loci are also referred to as safe harbor sites (SHS). As used herein, a safe harbor locus refers to an "integration site" or "knock-in site" at which a sequence encoding a transgene, as defined herein, can be inserted. In some embodiments the insertion occurs with replacement of a sequence that is located at the integration site. In some embodiments, the insertion occurs without replacement of a sequence at the integration site. Examples of integration sites contemplated are provided in Table 4.
[0091] As used herein, the term "insert" refers to a nucleotide sequence that is integrated (inserted) at a safe harbor site. The insert can be used to refer to the genes or genetic elements that are incorporated at the safe harbor site using, for example, homology-directed repair (HDR) CRISPR/Cas9 genome-editing or other methods for inserting nucleotide sequences into a genomic region known to those of ordinary skill in the art.
[0092] The "CRISPR/Cas" system refers to a widespread class of bacterial systems for defense against foreign nucleic acid. CRISPR/Cas systems are found in a wide range of eubacterial and archaeal organisms. CRISPR/Cas systems include type I, II, and III sub-types. Wild-type type II CRISPR/Cas systems utilize an RNA-mediated nuclease, Cas9, in complex with guide and activating RNA to recognize and cleave foreign nucleic acid. Guide RNAs having the activity of both a guide RNA and an activating RNA are also known in the art. In some cases, such dual activity guide RNAs are referred to as a small guide RNA (sgRNA).
[0093] Cas9 homologs are found in a wide variety of eubacteria, including, but not limited to bacteria of the following taxonomic groups: Actinobacteria, Aquificae, Bacteroidetes-Chlorobi, Chlamydiae-Verrucomicrobia, Chloroflexi, Cyanobacteria, Firmicutes, Proteobacteria, Spirochaetes, and Thermotogae. An exemplary Cas9 protein is the Streptococcus pyogenes Cas9 protein. Additional Cas9 proteins and homologs thereof are described in, e.g., Chylinski, et al., RNA Biol. 2013 May 1; 10(5): 726-737; Nat. Rev. Microbiol. 2011 June; 9(6): 467-477; Hou, et al., Proc Natl Acad Sci U S A. 2013 Sep 24; 110(39): 15644-9; Sampson et al., Nature. 2013 May 9; 497(7448): 254-7; and Jinek, et al., Science. 2012 Aug 17; 337(6096): 816-21. The Cas9 nuclease domain can be optimized for efficient activity or enhanced stability in the host cell.
[0094] As used herein, the term "Cas9" refers to an RNA-mediated nuclease (e.g., of bacterial or archaeal origin, or derived therefrom). Exemplary RNA-mediated nucleases include the foregoing Cas9 proteins and homologs thereof, and include but are not limited to, CPF1 (See, e.g., Zetsche et al., Cell, Volume 163, Issue 3, p759-771, 22 October 2015). Similarly, as used herein, the term "Cas9 ribonucleoprotein" complex and the like refers to a complex between the Cas9 protein and a crRNA (e.g., guide RNA or small guide RNA), the Cas9 protein and a trans-activating crRNA (tracrRNA), the Cas9 protein and a small guide RNA, or a combination thereof (e.g., a complex containing the Cas9 protein, a tracrRNA, and a crRNA guide RNA).
[0095] As used herein, the terms "T lymphocyte" and "T cell" are used interchangeably and refer to cells that have completed maturation in the thymus, and identify certain foreign antigens in the body. The terms also refer to the major leukocyte types that have various roles in the immune system, including activation and deactivation of other immune cells. The T cell can be any T cell such as a cultured T cell, e.g., a primary T cell, or a T cell derived from a cultured T cell line, e.g., a Jurkat, SupT1, etc., or a T cell obtained from a mammal. The T cell can be a CD3+ cell. The T cell can be any type of T cell, including but not limited to CD4+/CD8+ double positive T cells, CD4+ helper T cells (e.g., Th1 and Th2 cells), CD8+ T cells (e.g., cytotoxic T cells), peripheral blood mononuclear cells (PBMC), peripheral blood leukocytes (PBL), tumor infiltrating lymphocytes (TIL), memory T cells, naive T cells, regulatory T cells, T cells, etc. It can be any T cell at any stage of development. Additional types of helper T cells include Th3 (Treg) cells, Th17 cells, Th9 cells, or Tfh cells. Additional types of memory T cells include cells such as central memory T cells (Tcm cells) and effector memory T cells (Tem cells and TEMRA cells). A T cell can also refer to a genetically modified T cell, such as a T cell that has been modified to express a T cell receptor (TCR) or a chimeric antigen receptor (CAR). T cells can also be differentiated from stem cells or progenitor cells (e.g., precursor cells).
[0096] "CD4+ T cells" refers to a subset of T cells that express CD4 on their surface and are associated with a cellular immune response. CD4+ T cells are characterized by a post-stimulation secretion profile that can include secretion of cytokines such as IFN-γ, TNF-α, IL-2, IL-4 and IL-10. "CD4" is a 55 kD glycoprotein originally defined as a differentiation antigen on T lymphocytes, but was also found on other cells including monocytes/macrophages. The CD4 antigen is a member of the immunoglobulin superfamily and has been implicated as an associative recognition element in MHC (major histocompatibility complex) class II restricted immune responses. On T lymphocytes, the CD4 antigen defines a helper/inducer subset.
[0097] "CD8+ T cells" refers to a subset of T cells that express CD8 on their surface, are MHC class I restricted, and function as cytotoxic T cells. The "CD8" molecule is a differentiation antigen present on thymocytes, as well as on cytotoxic and suppressor T lymphocytes. The CD8 antigen is a member of the immunoglobulin superfamily and is an associative recognition element in major histocompatibility complex class I restriction interactions.
[0098] As used herein, the term "ex vivo" generally includes experiments or measurements made in or on living tissue, preferably in an artificial environment outside the organism, preferably with minimal differences from natural conditions.
[0099] As used herein, the term "construct" refers to a complex of molecules, including macromolecules or polynucleotides.
[00100] As used herein, the term "integration" refers to the process of stably inserting one or more nucleotides of a construct into the cell genome, i.e., covalently linking to a nucleic acid sequence in the chromosomal DNA of the cell. It may also refer to nucleotide deletions at a site of integration. Where there is a deletion at the insertion site, "integration" may further include substitution of the endogenous sequence or nucleotide deleted with one or more inserted nucleotides.
[00101] As used herein, the term "exogenous" refers to a molecule or activity that has been introduced into a host cell and is not native to that cell. The molecule can be introduced, for example, by introduction of the encoding nucleic acid into host genetic material, such as by integration into a host chromosome, or as non-chromosomal genetic material, such as a plasmid. Thus, the term, when used in connection with expression of an encoding nucleic acid, refers to the introduction of the encoding nucleic acid into a cell in an expressible form.
The term "endogenous" refers to a molecule or activity that is present in a host cell under natural, unedited conditions. Similarly, the term, when used in connection with expression of the encoding nucleic acid, refers to expression of the encoding nucleic acid that is contained within the cell and not introduced exogenously.
[00102] As used herein, a "polynucleotide donor construct" refers to a nucleotide sequence (e.g., DNA sequence) that is genetically inserted into a polynucleotide and is exogenous to that polynucleotide. The polynucleotide donor construct is transcribed into RNA and optionally translated into a polypeptide. The polynucleotide donor construct can include prokaryotic sequences, cDNA from eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g., mammalian) DNA, and synthetic DNA sequences. For example, the polynucleotide donor construct can be a miRNA, shRNA, natural polypeptide (i.e., a naturally occurring polypeptide) or fragment thereof or a variant polypeptide (e.g., a natural polypeptide having less than 100% sequence identity with the natural polypeptide) or fragments thereof.
[00103] As used herein, the term "transgene" refers to a polynucleotide that has been transferred naturally, or by any of a number of genetic engineering techniques from one organism to another. It is optionally translated into a polypeptide. It is optionally translated into a recombinant protein. A "recombinant protein" is a protein encoded by a gene (recombinant DNA) that has been cloned in a system that supports expression of the gene and translation of messenger RNA (see expression system). The recombinant protein can be a therapeutic agent, e.g., a protein that treats a disease or disorder disclosed herein. As used herein, transgene can refer to a polynucleotide that encodes a polypeptide. A transgene can also refer to a non-encoding sequence, such as, but not limited to shRNAs, miRNAs, and miRs.
[00104] The terms "protein," "polypeptide," and "peptide" are used herein interchangeably.
[00105] As used herein, the term "operably linked" refers to the binding of a nucleic acid sequence to a single nucleic acid fragment such that one function is affected by the other. For example, if a promoter is capable of affecting the expression of a coding sequence or functional RNA (i.e., the coding sequence or functional RNA is under transcriptional control by the promoter), the promoter is operably linked thereto. Coding sequences can be operably linked to control sequences in both sense and antisense orientation.
[00106] As used herein, the term "developmental cell states" refers to, for example, states when the cell is inactive, actively expressing, differentiating, senescent, etc. A developmental cell state may also refer to a cell in a precursor state (e.g., a T cell precursor or T cell progenitor).
[00107] As used herein, the term "encoding" refers to a sequence of nucleic acids which codes for a protein or polypeptide of interest. The nucleic acid sequence may be either a molecule of DNA or RNA. In preferred embodiments, the molecule is a DNA molecule. In other preferred embodiments, the molecule is an RNA molecule. When present as an RNA molecule, it will comprise sequences which direct the ribosomes of the host cell to start translation (e.g., a start codon, ATG) and direct the ribosomes to end translation (e.g., a stop codon). Between the start codon and stop codon is an open reading frame (ORF). Such terms are known to one of ordinary skill in the art.
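The relationship between a start codon, a stop codon, and the open reading frame between them can be shown with a small sketch. It assumes a DNA-alphabet sequence read on the forward strand only and the three standard stop codons; the function name and the example sequence are hypothetical.

```python
# Standard stop codons, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(dna):
    """Return the first ATG-to-stop open reading frame, or None.

    The scan starts at each ATG in turn and steps codon by codon
    (three bases at a time) until an in-frame stop codon is reached.
    """
    seq = dna.upper()
    start = seq.find("ATG")
    while start != -1:
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        start = seq.find("ATG", start + 1)
    return None


# Hypothetical sequence: two flanking bases, then ATG GCT TGA.
print(first_orf("ccATGGCTTGAaa"))
```

For an RNA molecule the same logic applies with U in place of T (AUG as the start codon).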
[00108] The term "inserting" refers to a manipulation of a nucleotide sequence to introduce a non-native sequence. This is done, for example, via the use of restriction enzymes and ligases whereby the DNA sequence of interest, usually encoding the gene of interest, can be incorporated into another nucleic acid molecule by digesting both molecules with appropriate restriction enzymes in order to create compatible overlaps and then using a ligase to join the molecules together. One skilled in the art is very familiar with such manipulations and examples may be found in Sambrook et al. (Sambrook, Fritsch, & Maniatis, "Molecular Cloning: A Laboratory Manual", 2nd ed., Cold Spring Harbor Laboratory, 1989), which is hereby incorporated by reference in its entirety including any drawings, figures and tables.
[00109] As used herein, the term "subject" refers to a mammalian subject. Exemplary subjects include humans, monkeys, dogs, cats, mice, rats, cows, horses, camels, goats, rabbits, pigs and sheep. In certain embodiments, the subject is a human. In some embodiments the subject has a disease or condition that can be treated with an engineered cell provided herein or population thereof. In some aspects, the disease or condition is a cancer.
[00110] As used herein, the term "promoter" refers to a nucleotide sequence (e.g., DNA sequence) capable of controlling the expression of a coding sequence or functional RNA. The promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. A promoter can be derived from natural genes in its entirety, can be composed of different elements from different promoters found in nature, and/or may comprise synthetic DNA segments. A promoter, as contemplated herein, can be endogenous to the cell of interest or exogenous to the cell of interest. It is appreciated by those skilled in the art that different promoters can induce gene expression in different tissue or cell types, or at different developmental stages, or in response to different environmental conditions. As is known in the art, a promoter can be selected according to the strength of the promoter and/or the conditions under which the promoter is active, e.g., constitutive promoter, strong promoter, weak promoter, inducible/repressible promoter, tissue specific or developmentally regulated promoters, cell cycle-dependent promoters, and the like.
[00111] A promoter can be an inducible promoter (e.g., a heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc.). The promoter can be a constitutive promoter (e.g., CMV promoter, UBC promoter). In some embodiments, the promoter can be a spatially restricted and/or temporally restricted promoter (e.g., a tissue specific promoter, a cell type specific promoter, etc.). See for example US Application No. 15/715,068, the disclosures of which are herein incorporated by reference in their entirety.
[00112] Gene editing, as contemplated herein, may involve a gene (or nucleotide sequence) knock-in or knock-out. As used herein, the term "knock-in" refers to an addition of a DNA sequence, or fragment thereof, into a genome. Such DNA sequences to be knocked-in may include an entire gene or genes, may include regulatory sequences associated with a gene or any portion or fragment of the foregoing. For example, a polynucleotide donor construct encoding a recombinant protein may be inserted into the genome of a cell carrying a mutant gene. In some embodiments, a knock-in strategy involves substitution of an existing sequence with the provided sequence, e.g., substitution of a mutant allele with a wild-type copy. On the other hand, the term "knock-out" refers to the elimination of a gene or the expression of a gene. For example, a gene can be knocked out by either a deletion or an addition of a nucleotide sequence that leads to a disruption of the reading frame. As another example, a gene may be knocked out by replacing a part of the gene with an irrelevant (e.g., non-coding) sequence.
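The reading-frame disruption mentioned above follows from simple arithmetic: because codons are three nucleotides long, an insertion or deletion shifts the frame whenever the net change in length is not a multiple of three. The sketch below illustrates only that rule; the function name is hypothetical, and it deliberately ignores other ways an edit can abolish gene function (for example, an in-frame edit that introduces a stop codon).

```python
def disrupts_reading_frame(inserted_nt, deleted_nt):
    """True if the net length change shifts the downstream reading frame.

    inserted_nt and deleted_nt are counts of nucleotides added and removed
    within a coding sequence; both must be non-negative.
    """
    if inserted_nt < 0 or deleted_nt < 0:
        raise ValueError("nucleotide counts must be non-negative")
    return (inserted_nt - deleted_nt) % 3 != 0


print(disrupts_reading_frame(1, 0))  # single-base insertion
print(disrupts_reading_frame(0, 6))  # deletion of two whole codons
```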
[00113] As used herein, the term "non-homologous end joining" or NHEJ refers to a cellular process in which cut or nicked ends of a DNA strand are directly ligated without the need for a homologous template nucleic acid. NHEJ can lead to the addition, the deletion, substitution, or a combination thereof, of one or more nucleotides at the repair site.
[00114] As used herein, the term "homology directed repair" or HDR refers to a cellular process in which cut or nicked ends of a DNA strand are repaired by polymerization from a homologous template nucleic acid. Thus, the original sequence is replaced with the sequence of the template. The homologous template nucleic acid can be provided by homologous sequences elsewhere in the genome (sister chromatids, homologous chromosomes, or repeated regions on the same or different chromosomes). Alternatively, an exogenous template nucleic acid can be introduced to obtain a specific HDR-induced change of the sequence at the target site. In this way, specific mutations can be introduced at the cut site.
[00115] The terms "vector" and "plasmid" are used interchangeably and as used herein refer to polynucleotide vehicles useful to introduce genetic material into a cell. Vectors can be linear or circular. Vectors can integrate into a target genome of a host cell or replicate independently in a host cell. Vectors can comprise, for example, an origin of replication, a multicloning site, and/or a selectable marker. An expression vector typically comprises an expression cassette. Vectors and plasmids include, but are not limited to, integrating vectors, prokaryotic plasmids, eukaryotic plasmids, plant synthetic chromosomes, episomes, viral vectors, cosmids, and artificial chromosomes.
[00116] As used herein, the term "expression cassette" refers to a polynucleotide construct, generated recombinantly or synthetically, comprising regulatory sequences operably linked to a selected polynucleotide to facilitate expression of the selected polynucleotide in a host cell. For example, the regulatory sequences can facilitate transcription of the selected polynucleotide in a host cell, or transcription and translation of the selected polynucleotide in a host cell. An expression cassette can, for example, be integrated in the genome of a host cell or be present in an expression vector.
[00117] As used herein, the phrase "subject in need thereof" refers to a subject that exhibits and/or is diagnosed with one or more symptoms or signs of a disease or disorder as described herein.
[00118] A "chemotherapeutic agent" refers to a chemical compound useful in the treatment of cancer. Chemotherapeutic agents include "anti-hormonal agents" or "endocrine therapeutics" which act to regulate, reduce, block, or inhibit the effects of hormones that can promote the growth of cancer.
[00119] The term "composition" refers to a mixture that contains, e.g., an engineered cell or protein contemplated herein. In some embodiments, the composition may contain additional components, such as adjuvants, stabilizers, excipients, and the like. The term "composition" or "pharmaceutical composition" refers to a preparation which is in such form as to permit the biological activity of an active ingredient contained therein to be effective in treating a subject, and which contains no additional components which are unacceptably toxic to the subject in the amounts provided in the pharmaceutical composition.
[00120] As used herein, the term "effective amount" refers to the amount of a compound (e.g., a composition described herein, cells described herein) sufficient to effect beneficial or desired results. An effective amount can be administered in one or more administrations, applications or dosages and is not intended to be limited to a particular formulation or administration route. As used herein, the term "treating" includes any effect, e.g., lessening, reducing, modulating, ameliorating or eliminating, that results in the improvement of the condition, disease, disorder, and the like, or ameliorating a symptom thereof.
[00121] The terms "modulate" and "modulation" refer to reducing or inhibiting or, alternatively, activating or increasing, a recited variable.
[00122] The terms "increase" and "activate" refer to an increase of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, 20-fold, 50-fold, 100-fold, or greater in a recited variable.
[00123] The terms "reduce" and "inhibit" refer to a decrease of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, 20-fold, 50-fold, 100-fold, or greater in a recited variable.

Safe Harbor Loci
[00124] Gene editing therapies include, for example, viral vector integration and site-specific integration. Site-specific integration is a promising alternative to random integration of viral vectors, as it mitigates the risks of insertional mutagenesis or insertional oncogenesis (Kolb et al. Trends Biotechnol. 2005 23:399-406; Porteus et al. Nat Biotechnol. 2005 23:967-973; Paques et al. Curr Gen Ther. 2007 7:49-66). However, site-specific integration continues to face challenges such as poor knock-in efficiency, risk of insertional oncogenesis, unstable and/or anomalous expression of adjacent genes or the transgene, low accessibility (e.g. within 20 kB of adjacent genes), etc. These challenges can be addressed, in part, through the identification and use of safe harbor loci or safe harbor sites (SHS), which are sites in which genes or genetic elements can be incorporated without disruption to expression or regulation of adjacent genes.
[00125] The most widely used of the putative human safe harbor sites is the AAVS1 site on chromosome 19q, which was initially identified as a site for recurrent adeno-associated virus insertion. Other potential SHS have been identified on the basis of homology, with sites first identified in other species (e.g., the human homolog of the permissive murine Rosa26 locus) or among the growing number of human genes that appear non-essential under some circumstances. One putative SHS of this type is the CCR5 chemokine receptor gene, which, when disrupted, confers resistance to human immunodeficiency virus infection. Additional potential genomic SHS have been identified in human and other cell types on the basis of viral integration site mapping or gene-trap analyses, as was the original murine Rosa26 locus. The three top SHS, AAVS1, CCR5, and Rosa26, are in close proximity to many protein coding genes and regulatory elements (See FIG. 2 from Sadelain, M., et al. (2012). Safe harbours for the integration of new DNA in the human genome. Nature reviews Cancer, 12(1), 51-58, the relevant disclosures of which are herein incorporated by reference in their entirety).
[00126] The AAVS1 site (also known as the PPP1R12C locus) on human chromosome 19 is a known SHS for hosting transgenes (e.g. DNA transgenes) with expected function. It is at position 19q13.42. It has an open chromatin structure and is transcription-competent. The canonical SHS locus for AAVS1 is chr19: 55,625,241-55,629,351. See Pellenz et al. "New Human Chromosomal Sites with "Safe Harbor" Potential for Targeted Transgene Insertion." Human gene therapy vol. 30,7 (2019): 814-828, the relevant disclosures of which are herein incorporated by reference. An exemplary AAVS1 target gRNA and target sequence are provided below:
AAVS1-gRNA sequence: ggggccactagggacaggatGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT
AAVS1 target sequence: ggggccactagggacaggat
[00127] CCR5, which is located on chromosome 3 at position 3p21.31, encodes the major co-receptor for HIV-1. Disruption at this site in the CCR5 gene has been beneficial in HIV/AIDS therapy and prompted the development of zinc-finger nucleases that target its third exon. The canonical SHS locus for CCR5 is chr3: 46,414,443-46,414,942. See Pellenz et al. "New Human Chromosomal Sites with "Safe Harbor" Potential for Targeted Transgene Insertion." Human gene therapy vol. 30,7 (2019): 814-828, the relevant disclosures of which are herein incorporated by reference.
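The exemplary AAVS1 gRNA given in paragraph [00126] begins with the 20-nucleotide AAVS1 target sequence (lower case) and continues with an upper-case remainder. The following minimal sketch only checks that relationship and splits the two parts; the names `split_guide`, `targeting_part` and `remainder` are illustrative assumptions, and the sketch makes no claim about how the sequences were designed.

```python
# Sequences copied from paragraph [00126]; the gRNA is split across two
# string literals purely for readability.
AAVS1_GRNA = (
    "ggggccactagggacaggatGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTA"
    "GTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT"
)
AAVS1_TARGET = "ggggccactagggacaggat"


def split_guide(grna: str, target: str) -> tuple[str, str]:
    """Split a gRNA into its target-matching prefix and the remaining sequence.

    Raises ValueError if the gRNA does not start with the target sequence
    (comparison is case-insensitive).
    """
    if not grna.lower().startswith(target.lower()):
        raise ValueError("gRNA does not begin with the target sequence")
    return grna[: len(target)], grna[len(target):]


targeting_part, remainder = split_guide(AAVS1_GRNA, AAVS1_TARGET)
```

Here `targeting_part` is the 20-nucleotide stretch that matches the target sequence, and `remainder` is the constant upper-case portion of the gRNA.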
[00128] The mouse Rosa26 locus is particularly useful for genetic modification as it can be targeted with high efficiency and is expressed in most cell types tested. Irion et al. 2007 ("Identification and targeting of the ROSA26 locus in human embryonic stem cells." Nature biotechnology 25.12 (2007): 1477-1482, the relevant disclosures of which are herein incorporated by reference) identified the human homolog, human ROSA26, in chromosome 3 (position 3p25.3). The canonical SHS locus for human Rosa26 (hRosa26) is chr3: 9,415,082-9,414,043. See Pellenz et al. "New Human Chromosomal Sites with "Safe Harbor" Potential for Targeted Transgene Insertion." Human gene therapy vol. 30,7 (2019): 814-828, the relevant disclosures of which are herein incorporated by reference.
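The canonical SHS coordinates quoted above for AAVS1, CCR5 and hRosa26 can be handled programmatically. The sketch below is illustrative only: `Locus` and `parse_locus` are hypothetical names, and it assumes that both endpoints are inclusive and that a range written with the larger coordinate first (as for hRosa26) denotes the same interval as the ordered range.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int  # smaller coordinate, assumed inclusive
    end: int    # larger coordinate, assumed inclusive

    @property
    def length(self) -> int:
        """Number of positions spanned, counting both endpoints."""
        return self.end - self.start + 1


def parse_locus(text: str) -> Locus:
    """Parse strings such as 'chr19: 55,625,241-55,629,351'."""
    match = re.fullmatch(r"\s*(chr\w+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized locus string: {text!r}")
    first = int(match.group(2).replace(",", ""))
    second = int(match.group(3).replace(",", ""))
    # Normalize so that start <= end regardless of the order in the text.
    return Locus(match.group(1), min(first, second), max(first, second))


aavs1 = parse_locus("chr19: 55,625,241-55,629,351")
ccr5 = parse_locus("chr3: 46,414,443-46,414,942")
hrosa26 = parse_locus("chr3: 9,415,082-9,414,043")
```

Under these assumptions the three intervals span 4,111, 500 and 1,040 positions, respectively.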
[00129] Additional examples of safe harbor sites are provided in Pellenz et al. "New Human Chromosomal Sites with "Safe Harbor" Potential for Targeted Transgene Insertion." Human gene therapy vol. 30,7 (2019): 814-828, the relevant disclosures of which are herein incorporated by reference.
[00130] The present disclosure is directed to methods for identifying safe harbor loci with benefits including, but not limited to, high knock-in efficiency and high expression of transgene. An example of applications of the presently disclosed methods is the identification of safe harbor loci for insertion of transgenes (e.g., chimeric antigen receptors (CAR)) into T-cells. In some embodiments, the safe harbor loci of the present disclosure are useful for the insertion of a sequence encoding a transgene. In some embodiments, the safe harbor sites allow for high transgene expression (sufficient to allow for transgene functionality or treatment of a disease of interest) and stable expression of the transgene over several days, weeks or months. In some embodiments, knockout of the gene at the safe harbor locus confers benefit to the function of the cell, or the gene at the safe harbor locus has no known function within the cell. In some embodiments, the safe harbor locus results in stable transgene expression in vitro with or without CD3/CD28 stimulation, negligible off-target cleavage as detected by iGuide-Seq or CRISPR-Seq, less off-target cleavage relative to other loci as detected by iGuide-Seq or CRISPR-Seq, negligible transgene-independent cytotoxicity, negligible transgene-independent cytokine expression, negligible transgene-independent chimeric antigen receptor expression, negligible deregulation or silencing of nearby genes, and positioning outside of a cancer-related gene.
[00131] As used herein, a "nearby gene" can refer to a gene that is within about 100kB, about 125kB, about 150kB, about 175kB, about 200kB, about 225kB, about 250kB, about 275kB, about 300kB, about 325kB, about 350kB, about 375kB, about 400kB, about 425kB, about 450kB, about 475kB, about 500kB, about 525kB, or about 550kB away from the safe harbor locus (integration site).
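The "nearby gene" definition above amounts to a distance test around the integration site. The following sketch shows one way such a test could be written; it is not the method of the disclosure. The `Gene` record, the function names, the placeholder gene names and coordinates, and the default 300 kB window (one of the recited values) are all assumptions chosen for illustration, and distance is measured from the integration position to the closest edge of the gene.

```python
from typing import List, NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def distance_to_gene(chrom: str, position: int, gene: Gene) -> Optional[int]:
    """Distance in bp from a position to the nearest gene edge.

    Returns None for a different chromosome and 0 if the position lies
    inside the gene.
    """
    if gene.chrom != chrom:
        return None
    if gene.start <= position <= gene.end:
        return 0
    return min(abs(position - gene.start), abs(position - gene.end))


def nearby_genes(chrom: str, position: int, genes: List[Gene],
                 window_bp: int = 300_000) -> List[str]:
    """Names of genes whose nearest edge is within window_bp of the site."""
    hits = []
    for gene in genes:
        distance = distance_to_gene(chrom, position, gene)
        if distance is not None and distance <= window_bp:
            hits.append(gene.name)
    return hits


# Hypothetical annotation, not real gene coordinates.
EXAMPLE_GENES = [
    Gene("GENE_A", "chr1", 1_000, 2_000),
    Gene("GENE_B", "chr1", 500_000, 510_000),
    Gene("GENE_C", "chr2", 100, 200),
]
within_300kb = nearby_genes("chr1", 150_000, EXAMPLE_GENES)
within_400kb = nearby_genes("chr1", 150_000, EXAMPLE_GENES, window_bp=400_000)
```

Widening the window from 300 kB to 400 kB brings the second placeholder gene into the "nearby" set, which shows how the choice among the recited distances changes the result.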
[00132] In some embodiments, the present disclosure contemplates inserts that comprise one or more transgenes. The transgene can encode a therapeutic protein, an antibody, a peptide, a suicide gene, an apoptosis gene or any other gene of interest. The safe harbor loci identified using the method described herein allow for transgene integration that results in, for example, enhanced therapeutic properties. These enhanced therapeutic properties, as used herein, refer to an enhanced therapeutic property of a cell when compared to a typical immune cell of the same normal cell type. For example, an NK cell having "enhanced therapeutic properties" has an enhanced, improved, and/or increased treatment outcome when compared to a typical, unmodified and/or naturally occurring NK cell. The therapeutic properties of immune cells can include, but are not limited to, cell transplantation, transport, homing, viability, self-renewal, persistence, immune response control and regulation, survival, and cytotoxicity. The therapeutic properties of immune cells are also manifested by: antigen-targeted receptor expression; HLA presentation or lack thereof; tolerance to the intratumoral microenvironment; induction of bystander immune cells and immune regulation; improved target specificity with reduction; resistance to treatments such as chemotherapy.
[00133] As used herein, the term "insert size" refers to the length of the nucleotide sequence being integrated (inserted) at the safe harbor site. In some embodiments, the insert size comprises at least about 100, 200, 300, 400 or 500 basepairs. In some embodiments, the insert size comprises about 500 nucleotides or basepairs. In some embodiments, the insert size comprises up to 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 kbp (kilo basepairs) or the sizes in between. In some embodiments, the insert size is greater than 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 kbp or the sizes in between. In some embodiments, the insert size is within the range of 3-15 kbp or is any number in that range. In some embodiments, the insert size is within the range of 1.5-8.3 kbp or is any number in that range. In some embodiments, the insert size is within the range of 1.5-15 kbp or is any number in that range. In some embodiments, the insert size is within the range of 0.5-20 kbp or is any number in that range.
In some embodiments, the insert size is 0.5-10, 0.6-10, 0.7-10, 0.8-10, 0.9-10, 1-10, 2-10, 3-10, 4-10, 5-10, 6-10, 7-10, 8-10, or 9-10 kbp. In some embodiments, the insert size is 0.5-11, 0.6-11, 0.7-11, 0.8-11, 0.9-11, 1-11, 2-11, 3-11, 4-11, 5-11, 6-11, 7-11, 8-11, 9-11, or 10-11 kbp. In some embodiments, the insert size is 0.5-12, 0.6-12, 0.7-12, 0.8-12, 0.9-12, 1-12, 2-12, 3-12, 4-12, 5-12, 6-12, 7-12, 8-12, 9-12, 10-12, or 11-12 kbp. In some embodiments, the insert size is 0.5-13, 0.6-13, 0.7-13, 0.8-13, 0.9-13, 1-13, 2-13, 3-13, 4-13, 5-13, 6-13, 7-13, 8-13, 9-13, 10-13, 11-13, or 12-13 kbp. In some embodiments, the insert size is 0.5-14, 0.6-14, 0.7-14, 0.8-14, 0.9-14, 1-14, 2-14, 3-14, 4-14, 5-14, 6-14, 7-14, 8-14, 9-14, 10-14, 11-14, 12-14 or 13-14 kbp. In some embodiments, the insert size is 0.5-15, 0.6-15, 0.7-15, 0.8-15, 0.9-15, 1-15, 2-15, 3-15, 4-15, 5-15, 6-15, 7-15, 8-15, 9-15, 10-15, 11-15, 12-15, 13-15, or 14-15 kbp. In some embodiments, the insert size is 0.5-16, 0.6-16, 0.7-16, 0.8-16, 0.9-16, 1-16, 2-16, 3-16, 4-16, 5-16, 6-16, 7-16, 8-16, 9-16, 10-16, 11-16, 12-16, 13-16, 14-16 or 15-16 kbp. In some embodiments, the insert size is 0.5-17, 0.6-17, 0.7-17, 0.8-17, 0.9-17, 1-17, 2-17, 3-17, 4-17, 5-17, 6-17, 7-17, 8-17, 9-17, 10-17, 11-17, 12-17, 13-17, 14-17, 15-17 or 16-17 kbp. In some embodiments, the insert size is 0.5-18, 0.6-18, 0.7-18, 0.8-18, 0.9-18, 1-18, 2-18, 3-18, 4-18, 5-18, 6-18, 7-18, 8-18, 9-18, 10-18, 11-18, 12-18, 13-18, 14-18, 15-18, 16-18 or 17-18 kbp. In some embodiments, the insert size is 0.5-19, 0.6-19, 0.7-19, 0.8-19, 0.9-19, 1-19, 2-19, 3-19, 4-19, 5-19, 6-19, 7-19, 8-19, 9-19, 10-19, 11-19, 12-19, 13-19, 14-19, 15-19, 16-19, 17-19, or 18-19 kbp. In some embodiments, the insert size is 0.5-20, 0.6-20, 0.7-20, 0.8-20, 0.9-20, 1-20, 2-20, 3-20, 4-20, 5-20, 6-20, 7-20, 8-20, 9-20, 10-20, 11-20, 12-20, 13-20, 14-20, 15-20, 16-20, 17-20, 18-20, or 19-20 kbp.
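The enumerated insert-size ranges above follow a regular pattern: for each upper bound from 10 to 20 kbp, the lower bound runs over 0.5, 0.6, 0.7, 0.8 and 0.9 kbp and then over every whole number of kbp below the upper bound. The sketch below reproduces that pattern as a reading aid; the function names are hypothetical, and treating both endpoints as inclusive is an assumption.

```python
from typing import List, Tuple, Union

Number = Union[int, float]


def enumerated_ranges(upper_kbp: int) -> List[Tuple[Number, int]]:
    """List the (lower, upper) kbp ranges recited for one upper bound."""
    lowers: List[Number] = [0.5, 0.6, 0.7, 0.8, 0.9] + list(range(1, upper_kbp))
    return [(lower, upper_kbp) for lower in lowers]


def size_in_range(size_kbp: float, lower: Number, upper: Number) -> bool:
    """Inclusive range test (inclusivity is an assumption of this sketch)."""
    return lower <= size_kbp <= upper


# All ranges recited for upper bounds of 10 through 20 kbp.
all_ranges = {upper: enumerated_ranges(upper) for upper in range(10, 21)}
# Ranges with a 15 kbp upper bound that would cover an 8.3 kbp insert.
covering_8_3 = [r for r in all_ranges[15] if size_in_range(8.3, *r)]
```

For example, an 8.3 kbp insert falls within every recited range ending at 15 kbp whose lower bound is 8 kbp or less.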
[00134] The inserts of the present disclosure refer to nucleic acid molecules or polynucleotides inserted at a safe harbor site. In some embodiments, the nucleotide sequence is a DNA molecule, e.g., genomic DNA, or comprises deoxy-ribonucleotides. In some embodiments, the insert comprises a smaller fragment of DNA, such as a plastid DNA, mitochondrial DNA, or DNA isolated in the form of a plasmid, a fosmid, a cosmid, a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), and/or any other sub-genome segment of DNA. In some embodiments, the insert is an RNA molecule or comprises ribonucleotides. The nucleotides in the insert are contemplated as naturally occurring nucleotides, non-naturally occurring nucleotides, and modified nucleotides. Nucleotides may be modified chemically or biochemically, or may contain non-natural or derivatized nucleotide bases, as will be readily appreciated by those of skill in the art. Such modifications include, for example, labels, methylation, substitution of one or more of the naturally occurring nucleotides with an analog, and internucleotide modifications. The polynucleotides can be in any topological conformation, including single-stranded, double-stranded, partially duplexed, triplexed, hairpinned, circular conformations, and other three-dimensional conformations contemplated in the art.
[00135] The inserts can have coding and/or non-coding regions. The insert can comprise a non-coding sequence (e.g., control elements, e.g., a promoter sequence). In some embodiments, the insert encodes transcription factors. In some embodiments, the insert encodes antigen binding receptors such as single receptors, T-cell receptors (TCRs), syn-notch, CARs, mAbs, etc. In some embodiments, the inserts are RNAi molecules, including, but not limited to, miRNAs, siRNAs, shRNAs, etc. In some embodiments, the insert is a human sequence. In some embodiments, the insert is chimeric. In some embodiments, the insert is a multi-gene/multi-module therapeutic cassette. A multi-gene/multi-module therapeutic cassette refers to an insert or cassette having one or more than one receptor (e.g., synthetic receptors), other exogenous protein coding sequences, non-coding RNAs, transcriptional regulatory elements, and/or insulator sequences, etc.
[00136] Various cell types are contemplated as having the safe harbor sites in the present disclosure. A cell comprising a safe harbor site and/or a cell comprising an insert at a safe harbor site as described in the present disclosure can be referred to as an engineered cell. The cells can include, but are not limited to, eukaryotic cells, prokaryotic cells, animal cells, plant cells, fungal cells and the like. Optionally, the cell is a mammalian cell, for example, a human cell. In some embodiments, the engineered cell is a stem cell, a human cell, a primary cell, a hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or a T cell progenitor. Non-limiting examples of immune cells that are contemplated in the present disclosure include T cells, B cells, natural killer (NK) cells, NKT/iNKT cells, macrophages, myeloid cells, and dendritic cells. Non-limiting examples of stem cells that are contemplated in the present disclosure include pluripotent stem cells (PSCs), embryonic stem cells (ESCs), induced pluripotent stem cells (iPSCs), embryo-derived embryonic stem cells obtained by nuclear transfer (ntES; nuclear transfer ES), male germline stem cells (GS cells), embryonic germ cells (EG cells), hematopoietic stem/progenitor stem cells (HSPCs), somatic stem cells (adult stem cells), hemangioblasts, neural stem cells, mesenchymal stem cells and stem cells of other cells (including osteocyte, chondrocyte, myocyte, cardiac myocyte, neuron, tendon cell, adipocyte, pancreocyte, hepatocyte, nephrocyte and follicle cells and so on). In some embodiments, the engineered cell is a T cell, an NK cell, an iPSC, or an HSPC. In some embodiments, the engineered cells used in the present disclosure are human cell lines grown in vitro (e.g. deliberately immortalized cell lines, cancer cell lines, etc.).
[00137] The methods for integrating the inserts at the safe harbor sites can be viral or non-viral delivery techniques.
[00138] In some embodiments, the nucleic acid sequence is inserted into the genome of the engineered cell by introducing a vector, for example, a viral vector, comprising the nucleic acid. Examples of viral vectors include, but are not limited to, adeno-associated viral (AAV) vectors, retroviral vectors or lentiviral vectors. In some embodiments, the lentiviral vector is an integrase-deficient lentiviral vector.
[00139] In some embodiments, the nucleic acid sequence is inserted into the genome of the T cell via non-viral delivery. In non-viral delivery methods, the nucleic acid can be naked DNA, or in a non-viral plasmid or vector. Non-viral delivery techniques can be site-specific integration techniques, as described herein or known to those of ordinary skill in the art. Examples of site-specific techniques for integration into the safe harbor loci include, without limitation, homology-dependent engineering using nucleases and homology independent targeted insertion using Cas9. In some embodiments, the non-viral delivery method comprises electroporation.
[00140] In some embodiments, the insert is integrated at a safe harbor site by introducing into the engineered cell (a) a targeted nuclease that cleaves a target region in the safe harbor site to create the insertion site; and (b) the nucleic acid sequence (insert), wherein the insert is incorporated at the insertion site by, e.g., HDR. Examples of non-viral delivery techniques that can be used in the methods of the present disclosure are provided in US Application Nos. 16/568,116 and 16/622,843, the relevant disclosures of which are herein incorporated by reference in their entirety.

[00141] The engineered cell can retain its undifferentiated state after insertion of the transgenes. In some embodiments, the engineered cell is undifferentiated. In some embodiments, the engineered cell is undifferentiated after insertion of the transgene. In some embodiments, the engineered cell is CD45RA+ and CCR7+ after insertion of the transgene. In some embodiments, the engineered cell is CD45RA+CCR7+CD27+ after insertion of the transgene.
CAR T cell Therapy
[00142] Chimeric antigen receptor (CAR) T cells are T cells that have been genetically engineered to produce an artificial T-cell receptor for use in immunotherapy. Chimeric antigen receptors are receptor proteins that have been engineered to confer T cells with the ability to target a specific protein. The genetic modification of lymphocytes (e.g. T cells) by incorporation of, for example, CARs, and administration of the engineered cells to a subject is an example of "adoptive cell therapy". As used herein, the term "adoptive cell therapy" refers to cell-based immunotherapy for transfusion of autologous or allogeneic lymphocytes, referred to as T cells or B cells. In this CAR therapy approach, cells are expanded and cultured ex vivo and genetically modified, prior to transfusion.
[00143] The expression of CARs allows the engineered T-cells to target and bind specific proteins, for example, tumor antigens. In CAR therapy, T-cells are harvested from a subject; they can be autologous T-cells from the subject's own blood or T-cells from a donor who will not be receiving the CAR therapy. Once isolated, the T-cells are genetically modified with a CAR, expanded ex vivo, and administered to the subject (i.e. patient) by, e.g., infusion.
[00144] The CARs may be introduced into the T-cells using, for example, a viral technique (e.g., retroviral integration) or site-specific technique. With site-specific integration of the transgenes (e.g. CARs), the transgenes may be targeted to a safe harbor locus. Examples of site-specific techniques for integration into the safe harbor loci include, without limitation, homology-dependent engineering using nucleases and homology independent targeted insertion using Cas9.
[00145] The engineered CAR T cells have applications in immuno-oncology. The CAR, for example, can be selected to target a specific tumor antigen. Examples of cancers that can be effectively targeted using CAR T cells are blood cancers. In some embodiments, CAR T cell therapy can be used to treat solid tumors.

Gene editing
[00146] The terms "gene editing" or "genome editing", as used herein, refer to a type of genetic manipulation in which DNA is inserted, replaced, or removed from the genome using artificially manipulated nucleases or "molecular scissors". It is a useful tool for elucidating the function and effect of sequence-specific genes or proteins or altering cell behavior (e.g. for therapeutic purposes).
[00147] Currently available genome editing tools include zinc finger nucleases (ZFN) and transcription activator-like effector nucleases (TALENs) to incorporate genes at safe harbor loci (e.g., the adeno-associated virus integration site 1 (AAVS1) safe harbor locus). The DICE (dual integrase cassette exchange) system utilizing phiC31 integrase and Bxb1 integrase is a tool for target integration. Additionally, clustered regularly interspaced short palindromic repeat/Cas9 (CRISPR/Cas9) techniques can be used for targeted gene insertion.
[00148] Site specific gene editing approaches can include homology dependent mechanisms or homology independent mechanisms.
[00149] All methods known in the art for targeted insertion of gene sequences are contemplated in the methods described herein to insert constructs at safe harbor loci.
CRISPR-Cas Gene editing
[00150] One effective example of gene editing is the CRISPR-Cas approach (e.g. CRISPR-Cas9). This approach incorporates the use of a guide polynucleotide (e.g. guide ribonucleic acid or gRNA) and a Cas endonuclease (e.g. Cas9 endonuclease).
[00151] As used herein, a polypeptide referred to as a "Cas endonuclease" or having "Cas endonuclease activity" refers to a CRISPR-related (Cas) polypeptide encoded by a Cas gene, wherein the Cas polypeptide can cleave a target DNA sequence when operably linked to one or more guide polynucleotides (see, e.g., US Pat. No. 8,697,359). Also included in this definition are variants of Cas endonuclease that retain guide polynucleotide-dependent endonuclease activity. The Cas endonuclease used in the donor DNA insertion method detailed herein is an endonuclease that introduces double-strand breaks into DNA at the target site (e.g., within the target locus or at the safe harbor site).
[00152] As used herein, the term "guide polynucleotide" relates to a polynucleotide sequence capable of complexing with a Cas endonuclease and allowing the Cas endonuclease to recognize and cleave a DNA target site. The guide polynucleotide can be a single molecule or a double molecule. The guide polynucleotide sequence can be an RNA sequence, a DNA sequence, or a combination thereof (RNA-DNA combination sequence). A guide polynucleotide comprising only ribonucleic acid is also referred to as "guide RNA". In some embodiments, a polynucleotide donor construct is inserted at a safe harbor locus using a guide RNA (gRNA) in combination with a Cas endonuclease (e.g. Cas9 endonuclease).
[00153] The guide polynucleotide includes a first nucleotide sequence domain (also referred to as a variable targeting domain or VT domain) that is complementary to a nucleotide sequence in the target DNA, and a second nucleotide sequence domain (referred to as a Cas endonuclease recognition domain or CER domain) that interacts with a Cas endonuclease polypeptide. The guide polynucleotide can be a double molecule (also referred to as a double-stranded guide polynucleotide). The CER domain of this double molecule guide polynucleotide comprises two separate molecules that hybridize along the complementary region. The two separate molecules can be RNA sequences, DNA sequences and/or RNA-DNA combination sequences.
[00154] Genome editing using CRISPR-Cas approaches relies on the repair of site-specific DNA double-strand breaks (DSBs) induced by the RNA-guided Cas endonuclease (e.g. Cas9 endonuclease). Homology-directed repair (HDR) of these DSBs enables precise editing of the genome by introducing defined genomic changes, including base substitutions, sequence insertions, and deletions. Conventional HDR-based CRISPR/Cas9 genome-editing involves transfecting cells with Cas9, gRNA and donor DNA containing homologous arms matching the genomic locus of interest.
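The layout of an HDR donor, an insert flanked by arms that match the genomic sequence on either side of the break, can be sketched with plain strings. This is a toy illustration under stated assumptions, not a donor design procedure from the disclosure: the function names, the toy genome, the cut position and the arm length are all hypothetical, and real donors involve considerations (strand, arm length, re-cutting) that are not modeled here.

```python
def build_hdr_donor(genome: str, cut_index: int, insert: str,
                    arm_length: int) -> str:
    """Return left homology arm + insert + right homology arm.

    cut_index is the position of the double-strand break, i.e. the break
    lies between genome[cut_index - 1] and genome[cut_index].
    """
    if arm_length <= 0:
        raise ValueError("arm_length must be positive")
    if not (arm_length <= cut_index <= len(genome) - arm_length):
        raise ValueError("not enough sequence around the cut for both arms")
    left_arm = genome[cut_index - arm_length:cut_index]
    right_arm = genome[cut_index:cut_index + arm_length]
    return left_arm + insert + right_arm


def expected_knock_in(genome: str, cut_index: int, insert: str) -> str:
    """Sequence expected if the insert is copied in precisely at the break."""
    return genome[:cut_index] + insert + genome[cut_index:]


# Toy example: upper case is genomic sequence, lower case is the insert.
toy_genome = "AAAACCCCGGGGTTTT"
donor = build_hdr_donor(toy_genome, cut_index=8, insert="nnn", arm_length=4)
edited = expected_knock_in(toy_genome, cut_index=8, insert="nnn")
```

In the toy example the donor reads `CCCCnnnGGGG`, and the same stretch appears in the edited sequence, which is what matching homology arms are meant to achieve.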
[00155] HITI (homology independent targeted insertion) uses a non-homologous end joining (NHEJ)-based homology-independent strategy and the method can be more efficient than HDR. Guide RNAs (gRNAs) target the insertion site. For HITI, donor plasmids lack homology arms and DSB repair does not occur through the HDR pathway. The donor polynucleotide construct can be engineered to include Cas9 cleavage site(s) flanking the gene or sequence to be inserted. This results in Cas9 cleavage at both the donor plasmid and the genomic target sequence. Both target and donor have blunt ends and the linearized donor DNA plasmid is used by the NHEJ pathway, resulting in integration into the genomic DSB site. (See, for example, Suzuki, K., et al. (2016). In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration. Nature, 540(7631), 144-149, the relevant disclosures of which are herein incorporated in their entirety).
[00156] Methods for conducting gene editing using CRISPR-Cas approaches are known to those of ordinary skill in the art. (See, for example, US Application Nos. US16/312,676, US15/303,722, and US15/628,533, the disclosures of which are herein incorporated by reference in their entirety). Additionally, uses of endonucleases for inserting transgenes into safe harbor loci are described, for example, in US Application No. 13/036,343, the disclosures of which are herein incorporated by reference in their entirety.
[00157] The guide RNAs and/or mRNA (or DNA) encoding an endonuclease can be chemically linked to one or more moieties or conjugates that enhance the activity, cellular distribution, or cellular uptake of the oligonucleotide. Non-limiting examples of such moieties include lipid moieties such as a cholesterol moiety, cholic acid, a thioether, a thiocholesterol, an aliphatic chain (e.g., dodecandiol or undecyl residues), a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate, a polyamine or a polyethylene glycol chain, adamantane acetic acid, a palmityl moiety and an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety. See for example US Application No. 15/715,068, the disclosures of which are herein incorporated by reference in their entirety.
Therapeutic Applications
[00158] For therapeutic applications, the engineered cells, populations thereof, or compositions thereof are administered to a subject, generally a mammal, generally a human, in an effective amount.
[00159] The engineered cells may be administered to a subject by infusion (e.g., continuous infusion over a period of time) or other modes of administration known to those of ordinary skill in the art.
[00160] The engineered cells provided herein can be administered as part of a pharmaceutical composition. In some embodiments, the present disclosure provides compositions comprising a guide RNA of the present disclosure. The pharmaceutical composition may comprise one or more pharmaceutical excipients. Any suitable pharmaceutical excipient may be used, and one of ordinary skill in the art is capable of selecting suitable pharmaceutical excipients. Accordingly, the pharmaceutical excipients provided below are intended to be illustrative, and not limiting. Additional pharmaceutical excipients include, for example, those described in the Handbook of Pharmaceutical Excipients, Rowe et al. (Eds.) 6th Ed. (2009), incorporated by reference in its entirety.
[00161] The engineered cells provided herein not only find use in gene therapy but also in non-pharmaceutical uses such as, e.g., production of animal models and production of recombinant cell lines expressing a protein of interest.

[00162] The engineered cells of the present disclosure can be any cell, generally a mammalian cell, generally a human cell that has been modified by integrating a transgene at a safe harbor locus described herein. In some embodiments, the engineered cells are immune cells. In some embodiments, the engineered cells are lymphocytes. In some embodiments, the engineered cells are T cells or T cell progenitors.
[00163] The engineered cells, compositions and methods of the present disclosure are useful for therapeutic applications such as CAR T cell therapy and TCR T cell therapy. In some embodiments, the insertion of a sequence encoding a transgene within a safe harbor locus maintains the TCR expression relative to instances when there is no insertion and enables transgene expression while maintaining TCR function.
[00164] Various diseases treated using the engineered cells, populations thereof, or compositions thereof are provided herein. Non-limiting examples of such diseases include alopecia areata, autoimmune hemolytic anemia, autoimmune hepatitis, cancer, dermatomyositis, diabetes (type 1), certain juvenile idiopathic arthritis, glomerulonephritis, Graves' disease, Guillain-Barré syndrome, idiopathic thrombocytopenic purpura, myasthenia gravis, certain myocarditis, multiple sclerosis, pemphigus/pemphigoid, pernicious anemia, polyarteritis nodosa, polymyositis, primary biliary cirrhosis, psoriasis, rheumatoid arthritis, scleroderma/systemic sclerosis, Sjogren's syndrome, systemic lupus erythematosus, certain thyroiditis, certain uveitis, vitiligo, multiple vasculitis (Wegener); autoimmune disorders including, but not limited to, granulomatosis; hematopoietic tumors including but not limited to acute and chronic leukemia, lymphoma, multiple myeloma and myelodysplastic syndrome; solid tumors of the prostate, breast, lung, colon, uterus, skin, liver, bone, pancreas, ovary, testis, bladder, kidney, head, neck, stomach, cervix, rectum, larynx, or esophagus; HIV (human immunodeficiency virus) related disorders; RSV (respiratory syncytial virus) related disorders; EBV (Epstein-Barr virus) related disorders; CMV (cytomegalovirus) related disorders; and infectious diseases including, but not limited to, adenovirus-related disorders and BK polyomavirus-related disorders.
[00165] Cancers that can be treated with the engineered cells (e.g., CAR T-cells) of the present disclosure, populations thereof, or compositions thereof include blood cancers. In some embodiments, the cancer treated using the engineered cells (e.g., CAR T-cells) described herein, populations thereof, or compositions thereof is a hematologic malignancy or leukemia. In some embodiments, the engineered cells (e.g., CAR T-cells) described herein, populations thereof, or compositions thereof are used for the treatment of acute lymphoblastic leukemia (ALL) or diffuse large B-cell lymphoma (DLBCL). In some embodiments, the cancer is acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), myelodysplasia, myelodysplastic syndromes, acute T-lymphoblastic leukemia, acute promyelocytic leukemia, chronic myelomonocytic leukemia, or myeloid blast crisis of chronic myeloid leukemia. Examples of cancers treatable using the engineered cells (e.g., CAR T-cells) described herein include, without limitation, breast cancer, ovarian cancer, esophageal cancer, bladder or gastric cancer, salivary duct carcinoma, adenocarcinoma of the lung, or aggressive forms of uterine cancer, such as uterine serous endometrial carcinoma. In some other embodiments, the cancer is brain cancer, breast cancer, cervical cancer, colon cancer, colorectal cancer, endometrial cancer, esophageal cancer, leukemia, lung cancer, liver cancer, melanoma, ovarian cancer, pancreatic cancer, rectal cancer, renal cancer, stomach cancer, testicular cancer, or uterine cancer.
In yet other embodiments, the cancer is a squamous cell carcinoma, adenocarcinoma, small cell carcinoma, melanoma, neuroblastoma, sarcoma (e.g., an angiosarcoma or chondrosarcoma), larynx cancer, parotid cancer, biliary tract cancer, thyroid cancer, acral lentiginous melanoma, actinic keratoses, acute lymphocytic leukemia, acute myeloid leukemia, adenoid cystic carcinoma, adenomas, adenosarcoma, adenosquamous carcinoma, anal canal cancer, anal cancer, anorectum cancer, astrocytic tumor, bartholin gland carcinoma, basal cell carcinoma, biliary cancer, bone cancer, bone marrow cancer, bronchial cancer, bronchial gland carcinoma, carcinoid, cholangiocarcinoma, chondrosarcoma, choroid plexus papilloma/carcinoma, chronic lymphocytic leukemia, chronic myeloid leukemia, clear cell carcinoma, connective tissue cancer, cystadenoma, digestive system cancer, duodenum cancer, endocrine system cancer, endodermal sinus tumor, endometrial hyperplasia, endometrial stromal sarcoma, endometrioid adenocarcinoma, endothelial cell cancer, ependymal cancer, epithelial cell cancer, Ewing's sarcoma, eye and orbit cancer, female genital cancer, focal nodular hyperplasia, gallbladder cancer, gastric antrum cancer, gastric fundus cancer, gastrinoma, glioblastoma, glucagonoma, heart cancer, hemangioblastomas, hemangioendothelioma, hemangiomas, hepatic adenoma, hepatic adenomatosis, hepatobiliary cancer, hepatocellular carcinoma, Hodgkin's disease, ileum cancer, insulinoma, intraepithelial neoplasia, interepithelial squamous cell neoplasia, intrahepatic bile duct cancer, invasive squamous cell carcinoma, jejunum cancer, joint cancer, Kaposi's sarcoma, pelvic cancer, large cell carcinoma, large intestine cancer, leiomyosarcoma, lentigo maligna melanomas, lymphoma, male genital cancer, malignant melanoma, malignant mesothelial tumors, medulloblastoma, medulloepithelioma, meningeal cancer, mesothelial cancer, metastatic carcinoma, mouth cancer, mucoepidermoid carcinoma, multiple myeloma, muscle cancer, nasal tract cancer, nervous system cancer, neuroepithelial adenocarcinoma, nodular melanoma, non-epithelial skin cancer, non-Hodgkin's lymphoma, oat cell carcinoma, oligodendroglial cancer, oral cavity cancer, osteosarcoma, papillary serous adenocarcinoma, penile cancer, pharynx cancer, pituitary tumors, plasmacytoma, pseudosarcoma, pulmonary blastoma, rectal cancer, renal cell carcinoma, respiratory system cancer, retinoblastoma, rhabdomyosarcoma, sarcoma, serous carcinoma, sinus cancer, skin cancer, small cell carcinoma, small intestine cancer, smooth muscle cancer, soft tissue cancer, somatostatin-secreting tumor, spine cancer, squamous cell carcinoma, striated muscle cancer, submesothelial cancer, superficial spreading melanoma, T cell leukemia, tongue cancer, undifferentiated carcinoma, ureter cancer, urethra cancer, urinary bladder cancer, urinary system cancer, uterine cervix cancer, uterine corpus cancer, uveal melanoma, vaginal cancer, verrucous carcinoma, VIPoma, vulva cancer, well-differentiated carcinoma, or Wilms tumor.
[00166] In some embodiments, the present disclosure provides methods of treating a subject in need of treatment by administering to the subject a composition comprising any of the engineered cells described herein. As used herein, the terms "treat," "treatment," and the like refer generally to obtaining a desired pharmacological and/or physiological effect. That effect may be preventive, in terms of complete or partial prevention of the disease, and/or therapeutic, in terms of partial or complete cure of the disease and/or adverse effects resulting from the disease. The term "treatment", as used herein, encompasses any treatment of a disease in a subject (e.g., mammal, e.g., human). Treatment may also refer to the administration of the engineered cells provided herein to a subject that is susceptible to the disease but has not yet been diagnosed as suffering from it, including preventing the disease from occurring; inhibiting disease progression; or reducing the disease (i.e., causing a regression of the disease). Further, treatment may stabilize or reduce undesirable clinical symptoms in subjects (e.g., patients). The cells provided herein, populations thereof, or compositions thereof may be administered before, during, or after the occurrence of the disease or injury.
[00167] In certain embodiments, the subject has a disease, condition, and/or injury that can be treated and/or ameliorated by cell therapy. In some embodiments, the subject in need of cell therapy is a subject having an injury, disease, or condition for which cell therapy (e.g., therapy in which cellular material is administered to the subject) is indicated, such that at least one symptom associated with the injury, disease, or condition can be treated, ameliorated, and/or reduced in severity. In certain embodiments, a subject in need of cell therapy includes, but is not limited to, a bone marrow transplant or stem cell transplant candidate, a subject who has received chemotherapy or radiation therapy, a subject having or at risk of developing a hyperproliferative disease or cancer (e.g., of the hematopoietic system), a subject having or at risk of developing a tumor (e.g., a solid tumor), and a subject suffering from or at risk of suffering from a viral infection or a disease associated with an infection.
[00168] In some embodiments, the present disclosure provides a kit comprising a composition of the present disclosure along with instructions for use. The instructions for use can be present in the kit as a package insert, in the labeling of the container of the kit or components thereof, or can be in digital form (e.g., on a CD-ROM or via a link on the internet). A kit can include one or more of a genome-targeting nucleic acid, a polynucleotide encoding a genome-targeting nucleic acid, a site-directed polypeptide, and/or a polynucleotide encoding a site-directed polypeptide. Additional components within the kits are also contemplated, for example, buffer (such as reconstituting buffer, stabilizing buffer, diluting buffer), and/or one or more control vectors.
Combination Therapies
[00169] In some embodiments, an engineered cell of the present disclosure or composition thereof is administered with at least one additional therapeutic agent. Any suitable additional therapeutic agent may be administered with an engineered cell provided herein, populations thereof, or compositions thereof. In some aspects, the additional therapeutic agent is selected from radiation, an ophthalmologic agent, a cytotoxic agent, a chemotherapeutic agent, a cytostatic agent, an anti-hormonal agent, an immunostimulatory agent, an anti-angiogenic agent, and combinations thereof.
[00170] In some embodiments, an engineered cell of the present disclosure or composition thereof is administered with a steroid. The administration of a steroid can prevent or mitigate the risk of a subject receiving the engineered cell(s) or composition thereof having an autoimmune reaction.
[00171] The additional therapeutic agent may be administered by any suitable means. In some embodiments, the engineered cells described herein, populations thereof, or compositions thereof and the additional therapeutic agent are administered in the same pharmaceutical composition, e.g., by infusion. In some embodiments, the engineered cells described herein and the additional therapeutic agent are included in different pharmaceutical compositions.
[00172] The pharmaceutical composition may comprise one or more pharmaceutical excipients. Any suitable pharmaceutical excipient may be used, and one of ordinary skill in the art is capable of selecting suitable pharmaceutical excipients. Accordingly, the pharmaceutical excipients provided below are intended to be illustrative, and not limiting. Additional pharmaceutical excipients include, for example, those described in the Handbook of Pharmaceutical Excipients, Rowe et al. (Eds.) 6th Ed. (2009), incorporated by reference in its entirety.
[00173] Various modes of administering the additional therapeutic agents are contemplated herein. In some embodiments, the additional therapeutic agent is administered by any suitable mode of administration. Generally, modes of administration include, without limitation, intravitreal, subretinal, suprachoroidal, intraarterial, intradermal, intramuscular, intraperitoneal, intravenous, nasal, parenteral, topical, pulmonary, and subcutaneous routes.
[00174] In embodiments where the engineered cells provided herein and the additional therapeutic agent are included in different pharmaceutical compositions, administration of the engineered cells provided herein can occur prior to, simultaneously, and/or following, administration of the additional therapeutic agent.
Additional Embodiments
[00175] In some aspects, provided herein are engineered cells, comprising at least one sequence encoding a transgene, wherein the at least one sequence is inserted within a safe harbor locus; wherein the safe harbor locus is at any one or more of the sgRNA target loci selected from: chr10:33130000-33140000, chr10:72290000-72300000, chr11:128340000-128350000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, chr9:7970000-7980000, APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, and TRIM28.
[00176] In some embodiments, expression of the at least one sequence encoding the transgene is operatively linked to an endogenous promoter.
[00177] In some embodiments, expression of the at least one sequence encoding the transgene is operatively linked to an exogenous promoter.

[00178] In some embodiments, the target locus is selected from: chr10:33130000-33140000, chr10:72290000-72300000, chr11:128340000-128350000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, and chr9:7970000-7980000.
[00179] In some embodiments, the target locus is chr11:128340000-128350000 or chr15:92830000-92840000.
[00180] In some embodiments, the target locus is a gene selected from: APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, and TRIM28.
[00181] In some embodiments, the safe harbor locus is the GS94 or GS102 integration site in Table 4.
[00182] In some embodiments, the exogenous promoter is an EF1a promoter.
[00183] In some embodiments, the engineered cell is a natural killer (NK) cell, an induced pluripotent stem cell (iPSC), a human pluripotent stem cell (HSPC), a T cell, or a T cell progenitor.
[00184] In some embodiments, the transgene encodes a recombinant protein, a therapeutic agent, or a chimeric antigen receptor (CAR).
[00185] In some aspects, provided herein are compositions comprising the engineered cell described herein and a pharmaceutical excipient.
[00186] In some aspects, provided herein are guide ribonucleic acids (gRNA) for editing a cell at a safe harbor locus, wherein the gRNA comprises any one of SEQ ID NOS: 1-120.
[00187] In some aspects, provided herein are methods of editing a cell having chromosomal DNA, comprising inserting at least one sequence encoding a transgene within a safe harbor locus in the chromosomal DNA of the cell, wherein the safe harbor locus is at any one or more of the sgRNA target loci selected from: chr10:33130000-33140000, chr10:72290000-72300000, chr11:128340000-128350000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, chr9:7970000-7980000, APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, and TRIM28.

[00188] In some embodiments, the target locus is selected from: chr10:33130000-33140000, chr10:72290000-72300000, chr11:128340000-128350000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, and chr9:7970000-7980000.
[00189] In some embodiments, the target locus is chr11:128340000-128350000 or chr15:92830000-92840000.
[00190] In some embodiments, the target locus is a gene selected from: APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, and TRIM28.
[00191] In some embodiments, the transgene encodes a recombinant protein, a therapeutic agent, or a chimeric antigen receptor (CAR).
[00192] In some embodiments, the at least one sequence comprises an exogenous promoter and the exogenous promoter is operably linked to the transgene.
[00193] In some embodiments, the cell is a T cell or T cell progenitor.
[00194] In some embodiments, the at least one sequence is inserted using a homology-directed repair or a homology independent targeted insertion.
[00195] In some embodiments, the at least one sequence is inserted using one or more guide ribonucleic acids (gRNAs) and one or more Cas9 endonucleases, wherein the one or more gRNAs comprises any one of SEQ ID NOS: 1-120.
[00196] In some aspects, provided herein are ex vivo methods of obtaining an engineered cell or population thereof, comprising: obtaining a cell; genetically modifying the cell by inserting at least one sequence encoding a transgene within a safe harbor locus, wherein the safe harbor locus is at any one or more of the sgRNA target loci selected from: chr10:33130000-33140000, chr10:72290000-72300000, chr11:128340000-128350000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, chr9:7970000-7980000, APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, and TRIM28.

[00197] In some embodiments, obtaining the cell comprises: (i) collecting a tissue sample from a subject, (ii) isolating the cells from the tissue samples, and (iii) culturing the cells in vitro.
[00198] In some embodiments, the cell is a stem cell, a natural killer (NK) cell, an induced pluripotent stem cell (iPSC), a human pluripotent stem cell (HSPC), a T cell, or a T cell progenitor.
[00199] In some embodiments, the at least one sequence is inserted using a homology-directed repair or a homology independent targeted insertion.
[00200] In some embodiments, the genetically modifying in step (b) comprises contacting the cell with one or more guide ribonucleic acids (gRNAs), the at least one sequence, and one or more Cas9 endonucleases, wherein the one or more gRNAs and Cas9 endonucleases facilitate the insertion of the at least one sequence into chromosomal DNA within the safe harbor locus and wherein the one or more gRNAs comprises any one of SEQ ID NOS: 1-120.
[00201] In some embodiments, the at least one sequence comprises an exogenous promoter and the exogenous promoter is operably linked to the transgene.
[00202] In some aspects, provided herein are methods of treating a subject having or at risk of having a disease, comprising administering to the subject an effective amount of the engineered cell described herein.
[00203] In some aspects, provided herein are methods of treating a subject having or at risk of having a disease, comprising: conducting the method described herein; and administering to the subject an effective amount of a composition comprising the cell or a population thereof.
[00204] In some embodiments, the disease is cancer.
EXAMPLES
[00205] The following are examples of methods and compositions of the invention. It is understood that various other embodiments may be practiced, given the general description provided herein.
[00206] Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.

[00207] The practice of the present invention will employ, unless otherwise indicated, conventional methods of protein chemistry, biochemistry, recombinant DNA techniques and pharmacology, within the skill of the art. Such techniques are explained fully in the literature. See, e.g., T.E. Creighton, Proteins: Structures and Molecular Properties (W.H. Freeman and Company, 1993); A.L. Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al., Molecular Cloning: A Laboratory Manual (2nd Edition, 1989); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.); Remington's Pharmaceutical Sciences, 18th Edition (Easton, Pennsylvania: Mack Publishing Company, 1990).
Example 1: Identification of Knock-In Loci in Coding Regions
[00208] The objective of the experiments in this set of examples was to identify T-cell knock-in (KI) loci in coding regions outside of the T Cell Receptor Alpha Constant (TRAC) locus. To select candidate loci, the criteria and requirements shown in Table 1 were used.
Table 1: Criteria used for identification of candidate knock-in loci within coding regions

Criteria: KI efficiency
  Detailed Requirements:
    - High expression at d2 (day 2)
    - Similar expression dynamics as TRAC (stable expression upon activation)
    - Accessible chromatin
    - gRNA editing efficiency
  Datasets Considered for identification of loci:
    - Knock-in efficiency data on ~90 genes derived from Roth, T. L., et al. 2019. (Rapid discovery of synthetic DNA sequences to rewrite endogenous T cell circuits. bioRxiv, 604561.)
    - Bulk RNA-seq data at d0, d2, d3 and d4: (i) publicly available data; (ii) data derived from Roth, T. L., et al. 2019. (Rapid discovery of synthetic DNA sequences to rewrite endogenous T cell circuits. bioRxiv, 604561.); and (iii) internally generated data
    - Bulk ATAC-seq data at d0, d2, d3 and d4: (i) data derived from Roth, T. L., et al. 2019. (Rapid discovery of synthetic DNA sequences to rewrite endogenous T cell circuits. bioRxiv, 604561.); and (ii) internally generated data

Criteria: Safety
  Detailed Requirements:
    - Does not affect proliferation, cytolytic activity or signaling of T cells
    - Be careful with tumor suppressors and oncogenes
  Datasets Considered for identification of loci:
    - T cell screening data
    - Achilles project on cancer cell lines data
    - Annotated oncogenes and tumor suppressors from TCGA
    - Annotated functions in immune cells

[00209] Generally, the requirement for candidate KI loci within the coding regions was that the coding gene is not essential for T cell function or that the knock-out of that gene would be beneficial to chimeric antigen receptor therapy (CAR T) functions.
[00210] As shown in Table 1, some of the data utilized, for comparison purposes, to identify candidate KI loci was sourced, in part, from Roth, T. L., et al. 2019 (Rapid discovery of synthetic DNA sequences to rewrite endogenous T cell circuits. bioRxiv, 604561), the relevant disclosures of which are herein incorporated by reference in their entirety. A summary of data in the Roth, T.L. et al. paper is shown in FIG. 3.
[00211] FIG. 4A and FIG. 4B show the processed RNA-seq data, showing that samples cluster based on activation status and by cell type. Donor 4 (AKI4) was identified as an outlier, as it clustered differently than the others, and was removed from the remaining analysis. The transcript expression data from RNA-seq experiments on the 90 genes cited in the Roth, T.L. et al. paper was correlated to the transcript expression data reported in the Roth, T.L. et al. paper. (See FIG. 5A and FIG. 5B).
[00212] FIG. 6 shows the processed ATAC-seq data, which focused on the 10 kb area around the transcription start site (TSS). Again, donor 4 (AKI4) was identified as an outlier, based on expression profile in both CD4 and CD8 cells, and was removed from the remaining analysis. The results of the remaining ATAC-seq analysis are shown in FIGS. 7A-8. The data in FIG. 7A for TSS enrichment scores also indicated that all libraries were high quality. The data on open chromatin regions around the TRAC locus revealed the highest signal near exon 3 rather than exon 1 (FIG. 8).
Determination of Knock-In Efficiency:
[00213] The KI efficiency results (obtained using the model described infra) based on the donors, 2 gRNA, and 2 replicates/gRNAs from the Roth, T.L. et al. paper are shown in FIG. 9 and FIG. 10.
[00214] A linear model was built to estimate KI efficiency of approximately 90 genes using RNA-seq and ATAC-seq data. The linear model captured 33% of the variation in the data (i.e., R² = 0.33). FIG. 11 shows KI efficiency versus day 2 (d2) RNA-seq data, day 4 (d4) RNA-seq data, and d2 ATAC data. The linear model was applied to remaining candidate genes to estimate their KI efficiency.
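As an illustration of what "captured 33% of the variation" means, the following is a minimal sketch of an ordinary least-squares fit and its coefficient of determination. It is not the model used in the disclosure: it assumes a single predictor for brevity (the disclosed model used RNA-seq and ATAC-seq data), and the function name `fit_line` and all numeric values are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = my - b * mx                    # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    # R^2 is the fraction of variance in y explained by the fit.
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical values (NOT data from the disclosure): an expression
# measure per gene and the observed KI efficiency for the same genes.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
ki_efficiency = [0.10, 0.35, 0.20, 0.45, 0.40]

intercept, slope, r_squared = fit_line(expression, ki_efficiency)
# Predict KI efficiency for a new candidate gene from its expression value.
predicted = intercept + slope * 3.5
```

An R² of 0.33 indicates that roughly one third of the gene-to-gene variance in KI efficiency is accounted for by the predictors, so predictions from such a model are best read as a coarse ranking aid.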
[00215] The candidate coding loci were selected by first ranking all genes in the pooled data sets by predicted KI efficiency, using RNA-seq expression data from d2 (day 2). Candidate coding loci were required to be stably expressed during T cell activation (e.g., <=2-fold expression change relative to day 0 (d0)). The candidate loci also had to be accessible based on ATAC-seq data. Using that selection process/requirements, 16 well-characterized coding genes (with known functions) were selected as candidate genes. The knockout of these 16 would confer a benefit to the function of CAR T cells (e.g., B2M, CD5, SMAD2, PTPRC, CD3E). An additional 12 coding genes with high predicted KI efficiency and no apparent essential function (inert coding genes) were selected as candidate genes. (See Table 2).
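The selection steps described above (stable expression within 2-fold of d0, accessible chromatin, ranking by predicted KI efficiency) can be sketched as a simple filter. This is an illustrative sketch only: the record layout, the boolean `accessible` flag, the function names, and the gene names and values are all hypothetical and are not taken from the disclosure.

```python
def fold_change(value, baseline):
    """Symmetric fold change (always >= 1); infinite if a value is not positive."""
    low, high = sorted((value, baseline))
    return float("inf") if low <= 0 else high / low


def select_candidates(genes, max_fold=2.0):
    """Keep accessible genes whose expression stays within max_fold of d0,
    then rank them by predicted KI efficiency (highest first)."""
    kept = []
    for gene in genes:
        d0 = gene["expr"]["d0"]
        stable = all(fold_change(v, d0) <= max_fold for v in gene["expr"].values())
        if stable and gene["accessible"]:
            kept.append(gene)
    return sorted(kept, key=lambda g: g["predicted_ki"], reverse=True)


# Hypothetical records (NOT data from the disclosure).
genes = [
    {"name": "GENE_A", "expr": {"d0": 10.0, "d2": 15.0, "d4": 12.0},
     "accessible": True, "predicted_ki": 0.4},
    {"name": "GENE_B", "expr": {"d0": 10.0, "d2": 50.0, "d4": 30.0},
     "accessible": True, "predicted_ki": 0.9},   # 5-fold change: rejected
    {"name": "GENE_C", "expr": {"d0": 8.0, "d2": 9.0, "d4": 7.0},
     "accessible": True, "predicted_ki": 0.6},
    {"name": "GENE_D", "expr": {"d0": 5.0, "d2": 5.0, "d4": 5.0},
     "accessible": False, "predicted_ki": 0.8},  # closed chromatin: rejected
]

selected = [g["name"] for g in select_candidates(genes)]
```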
Table 2: KI Candidate Loci selected

Genes with Known Functions:
  - Tested in Roth, T.L. et al. paper (reported editing efficiency from Roth, T.L. et al. paper): B2M (~70%), CD3E (~50%), PTPRC (~50%), CD3G (~30%)
  - Others: CD2, SMAD2, SLC38A1, TIGIT, CBLB, PTEN, PTPN6
Inert Coding Genes:
  - SUB1, EDF1, CAPNS1, SRP14, PTPRCAP, SRSF9, FTL

Example 2: Identification of Knock-In Loci in Gene Deserts
[00216] The criteria for selection of candidate loci in non-coding regions or gene deserts are summarized in Table 3.
Table 3: Criteria used for identification of candidate knock-in loci within non-coding regions

Criteria: KI efficiency
  Detailed Requirements:
    - Accessible chromatin
  Datasets Considered for identification of loci:
    - Bulk ATAC-seq data at d0, d2, d3 and d4: (i) data derived from Roth, T. L., et al. 2019. (Rapid discovery of synthetic DNA sequences to rewrite endogenous T cell circuits. bioRxiv, 604561.); and (ii) internally generated data

Criteria: Safety
  Detailed Requirements:
    - >10kb from cancer-related genes or TADS
    - >10kb from any miRNA/functional small RNAs
    - >10kb from any 5' gene end
    - >10kb from any regulatory regions (ultra-conserved elements, enhancers)
  Datasets Considered for identification of loci:
    - Ensembl genes
    - Annotated oncogenes and tumor suppressors from TCGA
    - Annotated enhancers from Encode and papers

[00217] To select candidate regions within the non-coding regions, the inventors here started by looking in highly accessible regions of the genome (10 kb windows). The most accessible regions overlapped with (i) annotated protein coding genes (>50% accessible regions), (ii) pseudogenes and noncoding RNAs (~20% accessible regions), and (iii) enhancer/regulatory regions (~20% accessible regions). The candidate genes were required to be < 10 kb from the coding regions. A few regions that overlapped with long intergenic noncoding RNAs (lincRNAs) but did not have apparent function in T cells were also considered.
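The ">10kb from" safety requirements listed in Table 3 amount to an interval-distance check. The following is a minimal sketch assuming half-open coordinate tuples of the form (chromosome, start, end); the function names, the feature list, and the coordinates of the hypothetical features are illustrative assumptions and not part of the disclosure.

```python
def gap(a_start, a_end, b_start, b_end):
    """Distance in bp between two intervals; 0 if they touch or overlap."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def is_isolated(window, features, min_gap=10_000):
    """True if the window is more than min_gap bp from every feature
    on the same chromosome."""
    chrom, start, end = window
    return all(
        gap(start, end, f_start, f_end) > min_gap
        for f_chrom, f_start, f_end in features
        if f_chrom == chrom
    )


# One of the 10 kb windows named in the disclosure, used only as an example.
window = ("chr15", 92_830_000, 92_840_000)

# Hypothetical annotated features (NOT real annotations).
far_feature = ("chr15", 92_855_000, 92_860_000)    # 15 kb downstream
near_feature = ("chr15", 92_845_000, 92_846_000)   # 5 kb downstream
other_chrom = ("chr2", 92_830_000, 92_840_000)     # different chromosome

passes = is_isolated(window, [far_feature, other_chrom])
fails = is_isolated(window, [far_feature, near_feature])
```

In practice the feature list would be assembled from the annotation sources named in Table 3; the sketch only shows the distance test itself.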
[00218] Examples of other criteria that have been used in the art for selection of SHS are described, for example, in Pellenz, S., et al. (2019). New human chromosomal sites with "safe harbor" potential for targeted transgene insertion. Human gene therapy, 30(7), 814-828, the relevant disclosures of which are herein incorporated by reference in their entirety.
[00219] FIG. 12 is a plot showing the normalized ATAC-seq data for a top candidate non-coding region.
[00220] Using the above-described selection process/requirement, 11 candidate regions in gene deserts were selected. In total, there were 39 KI candidate loci, including those in coding and non-coding regions, that would be evaluated as safe harbor loci (predicted loci).
Example 3: Experimental Evaluation of Predicted Loci
[00221] Materials
- 120 sgRNAs were ordered from Synthego, 3 sgRNAs per region, and editing efficiency was assessed by next generation sequencing (NGS).
  - 87 targeted coding genes
  - 33 targeted gene deserts
- 189 constructs were synthesized by Genscript
  - Homology arm: 450 bp each
  - eGFP was used as the reporter
  - Vector backbone: pUC57-Kan backbone
  - 78 with endogenous promoters
  - 111 with EF1a-HTLV promoter
- Constructs were sequence verified by Genscript and the majority were internally verified by sequencing.
The sgRNA sequences are provided in Table 4 and the construct sequences are provided in Table 5.
[00222] Methods
[00223] To evaluate the effectiveness of the predicted loci, the following were used.
[00224] (87 endogenous constructs + 120 exogenous constructs)*2 = 414 wells + Controls = 460
[00225] 3 donors, 2 replicates at 3 time points: wk1 (d6=day 6), wk3 (d21=day 21) and wk4 (d28=day 28)
[00226] Controls
[00227] DNA-only (no ribonucleoprotein (RNP)): episomal expression (n=8)
[00228] DNA non-targeting guides + RNP: to check if DNA delivery is more efficient with RNP (n=8)
[00229] pMax GFP was measured to check if transfection is ok (n=3).
[00230] WT cells with no electroporation
[00231] WT cells with electroporation
[00232] Controls were spread out across plates.
[00233] Cells were electroporated without RNP/HDR to check if cell counting makes sense.
[00234] 2 more donors were added and changed to 3 replicates at 1 time point: wk1 (d6)
[00235] FACS panel: GFP, TCR, Zombie
[00236] Washed every 3 wells.
[00237] The secondary data collected included: RNA-seq data: at d0, d2, and d4 (to determine stable expression) and ATAC-seq data: at d2 (to identify activated cells). The methods used for conducting the RNA-seq and ATAC-seq experiments are in Roth, T. L., et al. 2019. (Rapid discovery of synthetic DNA sequences to rewrite endogenous T cell circuits. bioRxiv, 604561), which is herein incorporated by reference in its entirety.
[00238] Automated gating was performed on FACS data, using .fcs files. For constructs with endogenous promoters, all 5 donors and all 3 time points were used. For constructs with the EF1a promoter, 3 donors (donors 1-3) and 2 time points (wk3 and wk4) were used.

[00239] To summarize data, wells with a %KI efficiency (% GFP high cells) <=10% (likely a gating error, experimental error or minimal integration) were removed. Loci were then ranked by median expression among donors and median %KI efficiency among donors.
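The summarization described above can be sketched as a filter followed by a per-locus median. This is an illustrative sketch under stated assumptions: the well record layout, the function name `rank_loci`, and the choice to sort by median MFI first and median %KI second are hypothetical, since the disclosure does not specify how the two medians were combined. The locus names come from the disclosure; the numeric values are made up.

```python
from collections import defaultdict
from statistics import median


def rank_loci(wells, min_ki_pct=10.0):
    """Drop wells with %KI <= min_ki_pct, then rank loci by
    (median MFI, median %KI) across the remaining wells."""
    by_locus = defaultdict(list)
    for well in wells:
        if well["ki_pct"] > min_ki_pct:
            by_locus[well["locus"]].append(well)
    summary = {
        locus: (median(w["mfi"] for w in ws), median(w["ki_pct"] for w in ws))
        for locus, ws in by_locus.items()
    }
    # Highest median MFI first; median %KI breaks ties (an assumption).
    return sorted(summary.items(), key=lambda item: item[1], reverse=True)


# Hypothetical wells (NOT data from the disclosure): one row per donor/well.
wells = [
    {"locus": "B2M", "donor": 1, "mfi": 900.0, "ki_pct": 35.0},
    {"locus": "B2M", "donor": 2, "mfi": 1100.0, "ki_pct": 45.0},
    {"locus": "B2M", "donor": 3, "mfi": 5000.0, "ki_pct": 4.0},   # removed
    {"locus": "TRAC", "donor": 1, "mfi": 600.0, "ki_pct": 50.0},
    {"locus": "TRAC", "donor": 2, "mfi": 700.0, "ki_pct": 60.0},
]

ranking = rank_loci(wells)
```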
[00240] Loci significance was calculated using robust rank aggregation, which was the methodology used for consistent ranking of loci among donors and time points. Methods for conducting robust rank aggregation are known in the art. (See, for example, Kolde, R., et al. (2012). Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics (Oxford, England), 28(4), 573-580, the disclosure of which is herein incorporated by reference in its entirety).
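For readers unfamiliar with robust rank aggregation, the core score can be sketched as follows. This is a simplified illustration and not necessarily the implementation used in the disclosure: each locus has one normalized rank (rank divided by list length) per donor or time point, and the score asks how surprising its k-th best rank would be if ranks were uniformly random. The function names are hypothetical, and the multiple-testing correction applied in the published method is omitted here.

```python
from math import comb


def order_statistic_pvalue(r, k, n):
    """Probability that the k-th smallest of n uniform(0, 1) ranks is <= r,
    i.e. that at least k of the n ranks fall at or below r."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))


def rho_score(normalized_ranks):
    """Minimum order-statistic p-value over the sorted normalized ranks.
    Smaller values mean the item ranks consistently near the top."""
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    return min(
        order_statistic_pvalue(r, k, n) for k, r in enumerate(ranks, start=1)
    )


# Hypothetical normalized ranks of two loci across five lists
# (e.g., donors or time points); values are illustrative only.
consistent_locus = [0.02, 0.05, 0.03, 0.10, 0.04]
inconsistent_locus = [0.02, 0.90, 0.55, 0.70, 0.35]

score_consistent = rho_score(consistent_locus)
score_inconsistent = rho_score(inconsistent_locus)
```

A locus that is near the top in most lists receives a much smaller score than one that is near the top in only a single list, which is the sense in which the ranking is robust to outlier donors.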
[00241] Results
[00242] The results from the analysis of controls and experimental groups are shown in FIGS. 13A-25.
[00243] The results of the pMax GFP control experiments revealed that donor 5 had a significantly higher cell count than other donors. Generally, the GFP high readings decreased over time, as expected. (FIGS. 13A and 13B).
[00244] Experiments to analyze non-targeting controls for changes in episomal GFP expression revealed that episomal GFP expression decreased to less than 0.05 over time. (FIGS. 14A and 14B) Some of the outliers were attributed to gating errors, which were most severe in donor 3. WT control experiments also exhibited <0.05 GFP high readings, as expected (FIG. 15).
[00245] As shown in FIG. 16 and FIG. 17, there was a more consistent trend among donors when using sgRNA5 to target the B2M safe harbor locus.
[00246] A comparison of GFP expression trends between sgRNA79 and sgRNA83 for TRAC insertion and expression with an endogenous reporter revealed that donor 2 trends were different between sgRNA79 and sgRNA83, while donor 4 trends were the same (FIG. 18 and FIG. 22). Some potential reasons for the observed variation between replicates and donors include:
[00247] (1) Manual handling of a large number of plates (e.g., FIG. 26):
[00248] Edge effects
[00249] Electroporation errors
[00250] (2) Inherent differences among donors (e.g., FIG. 27):
[00251] There were consistent differences with certain wells throughout the assay (spanning multiple weeks)
[00252] There were variable cell numbers between donors
[00253] (3) Gating errors (e.g., FIG. 28):
[00254] 1 peak vs >2 peaks in GFP signals
Example 4: Ranking Loci by Expression and KI Efficiency
[00255] The loci were ranked based on mean fluorescence intensity (MFI) of the reporter gene, GFP, and KI efficiency.
[00256] FIG. 29 shows GFP MFI and KI efficiency for all significant loci having endogenous promoters. The B2M locus reported the highest GFP MFI. The TRAC locus was among the top 10 MFI. The highest GFP MFI readings were reported in week 1 and reduced slightly in weeks 3 and 4. The results also showed most top loci have more than 1 sgRNA. With regard to KI efficiency, the SOCS1 locus reported the highest KI efficiency. Overall, KI efficiency showed greater variation among donors than GFP MFI.
[00257] FIG. 30 shows GFP MFI and KI efficiency for all significant loci having exogenous promoters (e.g., EF1a promoters). The TRAC locus reported the highest GFP MFI. There was comparable expression between weeks 3 and 4. The results also showed that most top loci have more than 1 sgRNA. With regard to KI efficiency, the SOCS1 locus reported the highest KI efficiency. Overall, KI efficiency showed less variation when driven by the exogenous promoters than when driven by endogenous promoters.
[00258] FIGS. 31A, 31B, and 31C show GFP MFI and KI efficiency for all significant loci having endogenous and exogenous (e.g., EF1a) promoters. The results showed that expression driven by EF1a promoters is about 10X higher than expression driven by endogenous promoters (FIG. 31A). With regard to KI efficiency, the SOCS1 locus reported the highest KI efficiency. At weeks 3 and 4, KI efficiency was higher when driven by the exogenous promoters than when driven by endogenous promoters.
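The ranking described in Example 4 can be pictured with the following minimal Python sketch. It assumes, purely for illustration, that each measurement is a record with a locus name, a GFP MFI value, and a KI efficiency, and that loci are ranked by the mean of one metric across donors and sgRNAs; the aggregation method, the function name `rank_loci`, the field names, and all numeric values are hypothetical placeholders and are not taken from FIGS. 29-31.

```python
from statistics import mean

def rank_loci(records, key):
    """Rank loci by the mean of `key` across all records, highest first.

    Averaging across donors and sgRNAs is an assumption of this sketch.
    """
    by_locus = {}
    for rec in records:
        by_locus.setdefault(rec["locus"], []).append(rec[key])
    means = {locus: mean(values) for locus, values in by_locus.items()}
    return sorted(means, key=means.get, reverse=True)

# Hypothetical placeholder readings (not data from the figures).
records = [
    {"locus": "B2M", "donor": 1, "gfp_mfi": 9000.0, "ki_eff": 0.20},
    {"locus": "B2M", "donor": 2, "gfp_mfi": 8000.0, "ki_eff": 0.30},
    {"locus": "TRAC", "donor": 1, "gfp_mfi": 5000.0, "ki_eff": 0.35},
    {"locus": "TRAC", "donor": 2, "gfp_mfi": 5200.0, "ki_eff": 0.25},
    {"locus": "SOCS1", "donor": 1, "gfp_mfi": 3000.0, "ki_eff": 0.50},
    {"locus": "SOCS1", "donor": 2, "gfp_mfi": 2800.0, "ki_eff": 0.40},
]

mfi_ranking = rank_loci(records, "gfp_mfi")
ki_ranking = rank_loci(records, "ki_eff")
```

Ranking the two metrics separately mirrors the way the figures report GFP MFI and KI efficiency side by side, so a locus can rank high on one metric and lower on the other.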
Example 5: Evaluation of TCR expression in candidate noncoding sites
[00259] This experiment was conducted to identify target loci and integration sites that enable high transgene expression without disrupting the TCR. Some gene deserts (e.g., gene deserts 2, 3, 5, and 6) were identified as having high transgene expression (FIG. 32A).
[00260] To evaluate and identify preferred knock-in sites, TCR expression was measured following circuit cassette knock-in of PrimeR (Prime Receptor (Myc)). Myc denotes an N-terminal Myc epitope tag to facilitate detection of surface-expressed prime receptor. Briefly, CD3-CD28 Dynabead-activated T cells were electroporated with sgRNA/Cas9 RNPs targeting the indicated sites (see FIGS. 32B and 33A), as well as HDRTs with homology arms directing HDR-mediated integration into the indicated sites. At day 6 post-electroporation, cells were stained with anti-TCRalpha/beta and anti-myc antibodies and analyzed on an Attune NxT flow cytometer. As shown in FIG. 32B, TCR expression was maintained at the GS94 and GS102 candidate integration (knock-in) sites. Only 2/3 of the cells that had circuit cassette knock-in of PrimeR at the GS79 (TRAC) locus maintained TCR expression. These results indicated that the GS94 and GS102 sites showed better potential for TCR stimulation.
[00261] The percentage of cells showing effective knock-in (based on measurements of PrimeR) was 36% ± 4% when using the GS94 integration site, as compared to 32% ± 5% when using the TRAC integration site. These results revealed integration sites, including GS94, that supported reproducibly high circuit cassette knock-in rates. See FIG. 33A.
Example 6: Evaluation of GS94 circuit expression and function
[00262] GS94 is a candidate integration site located on the distal q arm of chromosome 11. It is within 180-350 kb of the promoters for ETS1 and FLI1 (FIG. 33B); however, that is considered low-risk for integration vector gene therapy. The circuit expression and function potential of the GS94 site was evaluated.
[00263] T cells underwent circuit cassette integration with PrimeR at the GS79 (TRAC) integration site and the GS94 integration site. The cells were cocultured with K562 C19 cells for 48 hours, and then the PrimeR-induced CAR MFI was compared to the PrimeR MFI.
[00264] Briefly, T cells generated as described in the Example above were cocultured with K562 CD19+/MSLN- cells at day 7 post-electroporation. MSLN (mesothelin) is a gene that is overexpressed in human pancreatic cancer. Cells were then stained with anti-FLAG antibody 48 h post initiation of coculture and analyzed on an Attune NxT flow cytometer. The results revealed that GS94 yields superior CAR induction with high PrimeR expression following the 48-hour coculture with the K562 C19 cells. See FIG. 34A. GS94 resulted in prime antigen-dependent CAR expression that was approximately two-fold higher than the expression in several other candidate integration sites as well as the TRAC integration site. Additionally, on average, the prime receptor surface expression level was no less than 50% of the expression level when using the TRAC integration site.
Cytotoxicity and Cytokine Secretion
[00265] To evaluate the effect of the candidate integration site on cytotoxicity and cytokine secretion, T cells that had undergone circuit cassette integration with PrimeR at the GS79 (TRAC) integration site and the GS94 integration site were cocultured with C19/MSLN cells for 48 hours. MSLN is a gene that is overexpressed in human pancreatic cancer. The cells were treated at a 1:1 effector:target cell ratio (1:1 E:T). Briefly, T cells generated as described above were cocultured with K562 CD19+/MSLN+ cells at day 7 post-electroporation. At 48 h post initiation of coculture, supernatants were collected and analyzed via Luminex for cytokine levels. The cytokines measured were IL-2, IFNg, and TNF. Cytotoxicity was analyzed by measuring luciferase activity of remaining target cells after 48 h. Each of the data points in FIG. 34B represents two replicates, and the lines represent the range of cytotoxicity for the replicates. As shown in FIG. 34B, the GS94 integration site resulted in superior cytotoxic ability and cytokine secretion following the 48-hour coculture with K562 C19/MSLN cells.
Prime-independent cytotoxicity
[00266] To compare the effect of the candidate integration site on cytotoxicity versus prime-independent cytotoxicity, T cells that had undergone circuit cassette integration with PrimeR at the GS79 (TRAC) integration site and the GS94 integration site were cocultured with K562 CD19+/MSLN+ cells or K562 CD19-/MSLN+ cells ("K562 MSLN") at day 7 post-electroporation for 48 hours. E:T cell ratios of 0.3, 1.0, and 3.0 were tested. See FIG. 35A and FIG. 35B. At 48 h post initiation of coculture, cytotoxicity was analyzed by measuring luciferase activity of remaining target cells. As shown in FIG. 35B, the GS94 integration site resulted in equivalent cytotoxic potential to the TRAC integration site and there was no prime-independent cytotoxicity.
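As an illustration of how a luciferase readout of remaining target cells can be turned into a cytotoxicity value, the following minimal Python sketch normalizes each coculture well to a targets-only well. The normalization formula, the function name `percent_cytotoxicity`, and the example luminescence values are assumptions of this sketch; the source does not state how the luciferase signal was converted to cytotoxicity.

```python
def percent_cytotoxicity(sample_signal, targets_only_signal):
    """Percent of target cells killed, estimated from luciferase signal.

    Assumes signal is proportional to the number of live target cells and
    that a targets-only well (E:T = 0) defines 0% cytotoxicity.
    """
    if targets_only_signal <= 0:
        raise ValueError("targets-only signal must be positive")
    return 100.0 * (1.0 - sample_signal / targets_only_signal)

# Hypothetical luminescence values for E:T ratios of 0.3, 1.0, and 3.0.
targets_only = 1000.0
signal_by_et = {0.3: 750.0, 1.0: 500.0, 3.0: 250.0}
cytotoxicity_by_et = {
    et: percent_cytotoxicity(signal, targets_only)
    for et, signal in signal_by_et.items()
}
```

Under these assumptions, prime-independent cytotoxicity would show up as a nonzero value for cocultures with target cells lacking the priming antigen.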
Prime-independent cytokine secretion
[00267] To compare the effect of the candidate integration site on cytokine secretion versus prime-independent cytokine secretion, T cells that had undergone circuit cassette integration with PrimeR at the GS79 (TRAC) integration site and the GS94 integration site, generated as described above, were cocultured with K562 CD19+/MSLN+ cells. A group that had target cells only (E:T = 0; Targets only) was compared to a group with an E:T cell ratio of 1. Following the 48 h coculture with the K562 CD19+/MSLN+ cells, supernatant was collected and analyzed via Luminex for cytokine levels to measure secretion of the IL-2, IFNg and TNF cytokines. See FIG. 36A and FIG. 36B. As shown in FIG. 36B, the GS94 integration site resulted in equivalent cytokine secretion to the TRAC integration site and there was no prime-independent secretion of IL-2, IFNg or TNF.
Prime-independent CAR expression
[00268] To evaluate the effect of the candidate integration site on prime-independent CAR expression, T cells that had undergone circuit cassette integration with PrimeR at the GS79 (TRAC) integration site, the GS94 integration site, and the GS102 integration site were cultured in vitro for 32 days. The cells were treated with repetitive CD3/CD28 stimulation at days 5, 12, 19 and 28 of the experiment. On day 16, the cells were evaluated for CAR expression using a flow cytometry assay. As shown in FIG. 37, T cell activation through the TCR did not result in PrimeR-independent CAR expression from circuit cassette integration at the candidate integration sites.
[00269] T cells generated as described above were cultured in 96-well plates, with T cell growth medium being exchanged every 2 days. At days 5, 12, 19 and 28, T cells were stimulated with 1:1 CD3/CD28 Dynabeads. Cells were analyzed for PrimeR
expression by myc epitope tag staining, and for CAR expression by FLAG epitope tag staining at the indicated time points. Flow analysis was performed on an Attune NxT flow cytometer.
Example 7: Evaluating stability of prime receptor expression over several weeks
[00270] To evaluate the effect of the candidate integration sites on stable (sustained) expression of PrimeR, T cells that had undergone circuit cassette integration with PrimeR at the integration sites indicated in FIG. 38 were cultured in vitro for 32 days. Briefly, T cells generated as described above were cultured in 96-well plates, with T cell growth medium being exchanged every 2 days. At days 5, 12, 19 and 28, T cells were repetitively stimulated with 1:1 CD3/CD28 Dynabeads. Flow cytometry assays were run on days 16 and 32 using an Attune NxT flow cytometer. The cells were analyzed for PrimeR expression by myc epitope tag staining. As shown in FIGS. 38A and 38B, the GS94 integration site resulted in stable PrimeR expression over at least a 4-week period.
Example 8: Evaluation of on-target editing efficiency
[00271] To evaluate the on-target editing efficiency of candidate knock-in sites, an iGUIDE-Seq assay was used. The methods used for conducting the iGUIDE-Seq assay are illustrated in FIG. 39A and provided in Nobles et al., Genome Biology (2019), which is hereby incorporated by reference in its entirety. As shown in FIG. 39B, the GS94 integration site had the highest on-target editing efficiency of the evaluated candidate integration sites. As shown in FIG. 39C, GS94 resulted in no putative off-target editing as observed with two donors.
Example 9: Evaluation of GS94 knock-in
Methods
[00272] Elevation prediction: Computational predictions of potential off-target sites from (gs94) were performed using Elevation-search (algorithm described in Listgarten et al. 2018. Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs. Nat Biomed Eng 2, 37-48; software obtained from https://github.com/Microsoft/Elevation). All sites identified by Elevation-search were subjected to analysis using rhAmp-seq.
[00273] rhAmpSeq: 49 candidate off-target sites for GS94 identified by iGUIDE or the Elevation prediction algorithm, together with the GS94 target site, were characterized by rhAmpSeq (Integrated DNA Technologies, Inc.). This targeted amplification enables NGS-based quantification of the editing occurring at numerous sites simultaneously. Genomic DNA was isolated with the GenFind DNA purification system (Beckman Coulter) from T cells from at least 2 donors that had been treated singly with each of the following 7 guides: GS84, GS94, GS95, GS96, GS102, GS108, and GS138. Two separate rhAmpSeq amplification pools were used to cover the 50 loci, and the procedure was performed as recommended by Integrated DNA Technologies for each of the samples. The rhAmpSeq libraries were sequenced on a MiniSeq with a Mid Output Kit (300-cycles) (Illumina). The CRISPResso2 algorithm (https://github.com/pinellolab/CRISPResso2) was used to determine the percentage of insertions and deletions at each of the amplified loci. Statistical significance (FDR-adjusted p-value < 0.001) using a chi-squared test was only observed at the GS94 site.
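The statistical step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each locus is treated as a 2x2 table of modified versus unmodified reads in a GS94-treated sample versus a comparison sample, and the FDR adjustment is taken to be Benjamini-Hochberg. The source names neither the contingency layout nor the FDR method, and the function names and read counts below are hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)
    for the 2x2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-squared survival function is erfc(sqrt(x/2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical (modified, unmodified) read counts: treated vs comparison sample.
loci = {
    "on_target": (900, 100, 5, 995),
    "candidate_off_target_1": (6, 994, 5, 995),
    "candidate_off_target_2": (4, 996, 5, 995),
}
pvals = [chi2_2x2(*counts)[1] for counts in loci.values()]
significant = [
    name for name, q in zip(loci, benjamini_hochberg(pvals)) if q < 0.001
]
```

With these placeholder counts only the on-target locus passes the cutoff, which is the pattern the paragraph reports for the GS94 site.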
[00274] RNA-seq: To evaluate changes induced by GS94 integration at the transcriptional level, a CD19/MSLN circuit was integrated at the GS79 (TRAC), GS94, and GS102 integration sites. On day 6 post-integration, 1e6 edited cells were sorted using a BD FACSAria based on transgene expression. RNA was isolated from sorted T cells with the RNeasy kit (Qiagen). Purified RNA was converted into an NGS library using the TruSeq RNA Library Prep Kit v2 (Illumina). Libraries were sequenced on either the NovaSeq 6000 or NextSeq 550 instruments (Illumina). The STAR 2.7.3a aligner (Dobin A. et al. Bioinformatics. 2013. 29:15-21) was used to align the RNA-seq data against the reference human GRCh38 transcriptome and to obtain gene-level read counts. edgeR (Robinson MD et al. Bioinformatics. 2010. 26: 139-140) was used to compute differential expression, combining data across both donors. The only genes within 300 kb of the GS94 site, ETS1 and FLI1, were not differentially expressed in cells with integration at the GS94 integration site compared to cells with integration at either of the other two loci. At an FDR-adjusted p-value cutoff of 0.01, the number of differentially expressed genes was minimal (<100 genes genome-wide).
[00275] Cytokine-independent growth assay: To evaluate the safety of primary T cells with GS94 locus KI, a cytokine-independent growth assay was performed to evaluate the potential for oncogenic transformation. Briefly, primary human T cells that had undergone CD19/MSLN circuit cassette integration at the GS94 locus were thawed and recovered overnight. 1x10^6 cells were then seeded in one well of a 24-well GRex plate and cultured for 5 days in medium with or without cytokines. Cell number and viability were recorded at days 0, 3 and 5. As a positive control, 1x10^6 Jurkat cells were cultured in medium without cytokines in parallel. As shown in FIG. 43, while GS94 KI T cells maintained good viability and total cell count when cultured with cytokine, the viability of GS94 KI T cells decreased drastically over the course of 5 days when cultured without cytokine, and there were no viable cells left on day 5. The positive control Jurkat cells maintained good viability and expansion without cytokine throughout the assay. Taken together, these data show that GS94-edited primary human T cells still depend on exogenous cytokine for growth, survival and expansion; therefore, there is no concern for cellular transformation.
Results
[00276] The specificity of CRISPR reagents (e.g., SpCas9 complexed with sgRNA) targeting candidate loci including GS94 was evaluated by iGUIDE-seq (FIG. 40). GS94-targeting CRISPR RNP showed the highest percentage of iGUIDE-seq oligo cassette trapping events of all candidates evaluated, and the control sgRNA sequences from the iGUIDE-seq paper showed similar specificities to what was reported in the original publication, suggesting that the assay performed as expected.
[00277] Putative off-target sites were taken from the iGUIDE-seq output, which already suggested that the putative sites were spurious. Additional target sites were predicted by a computational approach (Elevation software package). rhAmp-seq was used to prepare high-throughput sequencing libraries for each of the putative off-target sites, and the method was applied to DNA samples from T cells electroporated with CRISPR RNPs targeting the candidate target sites. The resulting NGS data were processed with CRISPResso2 software, and the frequency of insertions and deletions (indels) was taken as indication of CRISPR
cleavage activity, as is common in the field. T cells electroporated with GS94-targeting CRISPR RNP showed no greater frequency of indels at the set of putative off-target sites than T cells treated with CRISPR RNP targeting other sites, consistent with the GS94-targeting CRISPR RNP having no consequential or detectable off-target activity, and therefore being the most specific out of the set evaluated (FIG. 41).
[00278] Potential effects of transgene integration at the GS94 site on the regulation of the T cell transcriptome were evaluated by knocking in a large cassette to the site, growing T
cells for several days, sorting cells expressing the transgene within the cassette, and then collecting RNA from the cells. RNA-seq libraries were prepared and sequenced, and analysis of the resulting Illumina sequencing data revealed no biologically or statistically significant differences in expression of any genes within 300kb of the GS94 site in cells with integrations at GS94 compared to cells with integration at TRAC or the GS102 sites (FIG.
42). Furthermore, other gene expression differences that reached statistical significance were minimal in number and in effect size, consistent with them being noise in the comparison.
[00279] To assess whether transgene integration at GS94 could confer a transformed phenotype, cells with integrations at the GS94 site were cultured with and without cytokines in vitro. Cells remained alive and viable with cytokine addition, but lost their viability and died without cytokine supplementation (FIG. 43). The positive control Jurkat cells remained viable and proliferated. Overall, this indicates that integration of a transgene at GS94 does not confer capacity for cytokine-independent growth, which is a hallmark of T cell transformation.
Example 10: In vivo Insertion of a CAR expressing cassette
[00280] In vivo efficacy of T cells with a transgene cassette expressing a CAR recognizing a tumor antigen, or a CAR recognizing a tumor antigen under control of a priming receptor recognizing an antigen in the anatomical vicinity of the tumor, is assessed against human tumor cells such as K562 engineered to express the CAR antigen or to express antigens recognized by both the priming receptor and the CAR. Tumor cells (e.g., 1e6) are subcutaneously injected into the flank of NSG mice (Jackson Laboratories). Tumor growth is assessed by dimensional measurement by calipers every 2-4 days. When the tumor volume reaches ~100 cubic mm, mice are intravenously injected with 5e6 T cells with a CAR or prime-CAR circuit cassette integrated at a specific site by CRISPR-mediated insertion, or with T cells engineered with CRISPR RNP alone, or with PBS alone as a sham injection. Tumor growth is monitored and mice are euthanized when tumor volume reaches 2000 cubic mm. Peripheral blood is collected from mice through a retro-orbital procedure, and flow cytometry and/or ddPCR is used to observe engineered T cell expansion over time. At time of sacrifice, spleen, blood, tumor and/or other tissue is analyzed via flow cytometry, ddPCR, and/or immunohistochemistry for the presence of engineered T cells. The results demonstrate that T cells engineered with cassette integration at one of the defined genomic loci lead to tumor regression and clearance in injected mice as compared to T cells without cassette integration, and that engineered T cells are detectable in the peripheral blood and tissues of injected mice.
Example 11: Evaluation of non-viral insertion of a large 8.3kb expression cassette in GS94
[00281] Next, an 8.3 kb insert was inserted into a T cell at the GS94 safe harbor locus using materials/methods as previously described. A diagram of the cassette is provided in FIG. 44.
[00282] Construct generation
[00283] To generate plasmid constructs for knock-in, synthetic DNA was ordered from Twist, IDT and GENEWIZ and assembled via Gibson Assembly and Golden Gate Assembly. Plasmids contained homology arms, 1.2 kb or 450 bp in length, homologous to sequences flanking the CRISPR target sites in the genome.
[00284] T cell engineering
[00285] T-cells were enriched from peripheral blood mononuclear cells (PBMCs) obtained from normal donor Leukopaks (STEMCELL Technologies) using Lymphoprep (STEMCELL Technologies) and the EasySep Human T-Cell Isolation Kit (STEMCELL Technologies). T-cells were subsequently activated with CD3/CD28 Dynabeads at a 1:1 bead-to-cell ratio (ThermoFisher, 40203D) in TexMACS medium (Miltenyi 130-197-196) supplemented with 3% human AB serum (Gemini Bio) and 12.5 ng/ml human IL-7 and IL-15 (Miltenyi premium grade) and cultured at 37 °C, 5% CO2 for 48 hours before electroporation.
[00286] CRISPR RNPs were prepared by combining 120 µM sgRNA (Synthego) targeting DNA sequence GAGCCATGCTTGGCTTACGA (GS94, SEQ ID NO: 94), 62.5 µM sNLS-SpCas9-sNLS (Aldevron) and P3 buffer (Lonza) at a volume ratio of 5:1:3:6, and incubating for 15 minutes at room temperature. An optimized amount of plasmid DNA, determined by dose titration experiments (ranging from 0.5-3 micrograms), was mixed with 3.5 µl of RNP. T-cells were counted, debeaded, centrifuged at 90 x g for 10 minutes and resuspended at 10^6 cells/14.5 µl of P3 with supplement added (Lonza). 14.5 µl of T-cell suspension was added to the DNA/RNP mixture, transferred to a Lonza 384-well nucleocuvette plate, and pulsed in a Lonza HT Nucleofector System with code EH-115. Cells were allowed to rest for 15 minutes at room temperature before transfer to 96-well plates (Sarstedt) in TexMACS medium supplemented with 12.5 ng/ml human IL-7 and IL-15 (Miltenyi premium grade).
[00287] Transgene expression was detected by staining with anti-Myc antibody (Cell Signaling Technology clone 9B11) and anti-Flag antibody (RnD systems, clone 1042E) and analyzed on an Attune NxT Flow Cytometer. Other antibodies used were live/dead Fixable Near-IR (Thermo Fisher), TCRalpha/beta antibody (BioLegend clone IP26), CD4 antibody (BioLegend clone RPA-T4), and CD8 antibody (BioLegend clone SK1).
[00288] Priming receptor induction
[00289] To assess functional activity of the transgene (i.e., synthetic circuit), edited T cells were co-cultured with a target cell line expressing priming antigen at a 1:1 E:T ratio and incubated for 24 hours. T cells were harvested and stained with anti-Myc and anti-Flag antibodies to assess Priming Receptor and CAR expression, respectively.
[00290] To assess whether the 8.3 kb transgene integration at GS94 resulted in functional knock-in, cells were cultured with parental K562 cells and K562 cells expressing the cognate priming antigen at a 1:1 E:T cell ratio. Cells were assayed by flow cytometry after 48 hours as previously described. K562 cells with priming antigen induced CAR expression, while control parental K562 cells did not (FIG. 45). Overall, this indicates that the Priming Receptor induced CAR expression after insertion of an 8.3 kb transgene circuit. Thus, insertion of the 8.3 kb transgene circuit resulted in expression of multiple functional genes.
Example 12: T cell differentiation post editing
[00291] Methods
[00292] T cells from two donors, edited as described above with a priming receptor and CAR synthetic circuit, were phenotypically profiled with cell surface T cell subset markers by flow cytometry. Resting T cells were taken from in vitro culture conditions and rinsed with PBS prior to staining with Zombie-Aqua viability dye, CD4, CD8, CD45RA, CCR7 and CD27, with FMOs used as controls for gating, and analyzed with an Attune NxT. In FlowJo, single, viable lymphocytes were selected by SSC and FSC, and subset profiling by a combination of CCR7, CD27, and CD45RA was used to identify naive or stem cell memory (Tn/Tscm: CD45RA+CCR7+CD27+), central memory (Tcm: CD45RA-CCR7+CD27+), effector memory (Tem: CD45RA-CCR7-CD27-), or terminal effector (Tte: CD45RA+CCR7-) T cells in the CD4+ and CD8+ subpopulations.
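The marker combinations above amount to a small decision table, which the following minimal Python sketch makes explicit. It assumes each cell has already been gated as positive or negative for CD45RA, CCR7 and CD27, and it takes the terminal effector definition to be CD45RA+CCR7- (an assumption, since that marker string is garbled in the source text). The function name `classify_subset` and the "unclassified" fallback are hypothetical.

```python
def classify_subset(cd45ra, ccr7, cd27):
    """Assign a T cell subset label from boolean marker calls.

    Each argument is True if the cell was gated positive for that marker.
    """
    if cd45ra and ccr7 and cd27:
        return "Tn/Tscm"  # naive or stem cell memory
    if not cd45ra and ccr7 and cd27:
        return "Tcm"      # central memory
    if not cd45ra and not ccr7 and not cd27:
        return "Tem"      # effector memory
    if cd45ra and not ccr7:
        return "Tte"      # terminal effector (assumed CD45RA+CCR7-)
    return "unclassified"  # marker combination not named in the text

# Example: tally subsets for a few hypothetical gated cells.
cells = [(True, True, True), (False, True, True), (True, False, False)]
counts = {}
for markers in cells:
    label = classify_subset(*markers)
    counts[label] = counts.get(label, 0) + 1
```

In practice such gating is done on populations in flow cytometry software rather than cell by cell; the sketch only spells out the logic of the four named subsets.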
[00293] Results
[00294] The non-viral editing generated a less differentiated T cell product (FIG. 46). In both donors, the non-viral editing did not contribute to an expansion of terminally differentiated T cells, as the major subset of T cells in both subpopulations retained positive expression of CD45RA and CCR7. This suggests that the edited T cells retain the capacity to expand, survive and persist in vivo.
References
[00295] Eyquem, J., Mansilla-Soto, J., Giavridis, T., van der Stegen, S. J., Hamieh, M., Cunanan, K. M., ... & Sadelain, M. (2017). Targeting a CAR to the TRAC locus with CRISPR/Cas9 enhances tumour rejection. Nature, 543(7643), 113-117.
[00296] Sadelain, M., Papapetrou, E. P., & Bushman, F. D. (2012). Safe harbours for the integration of new DNA in the human genome. Nature reviews Cancer, 12(1), 51-58.
[00297] Irion, S., Luche, H., Gadue, P., Fehling, H. J., Kennedy, M., & Keller, G. (2007). Identification and targeting of the ROSA26 locus in human embryonic stem cells. Nature biotechnology, 25(12), 1477-1482.
[00298] Pellenz, S., Phelps, M., Tang, W., Hovde, B. T., Sinit, R. B., Fu, W., ... & Monnat Jr, R. J. (2019). New human chromosomal sites with "safe harbor" potential for targeted transgene insertion. Human gene therapy, 30(7), 814-828.
[00299] Roth, T. L., Li, P. J., Nies, J. F., Yu, R., Nguyen, M. L., Lee, Y., ... & Nguyen, D. N. (2019). Rapid discovery of synthetic DNA sequences to rewrite endogenous T cell circuits. bioRxiv, 604561.
TABLE 4: SGRNA SEQUENCES USED FOR EVALUATION OF PREDICTED LOCI
sgRNA ID | sgRNA Sequence | sgRNA_start_coor GRCH38 | Target Loci | Integration Site | Median (% Modified), summarized from 2 donors, 2 primersets
sgRNA_1 | GCACCTGAATACCACGCCTG (SEQ ID NO:1) | chr16:88811818 | APRT | APRT | 79.28
sgRNA_2 | CGCCTGCGATGTAGTCGATG (SEQ ID NO:2) | chr16:88811551 | APRT | APRT | 78.60
sgRNA_3 | CAGGACGGGCGAGATGTCCC (SEQ ID NO:3) | chr16:88811640 | APRT | APRT | 85.25
sgRNA_4 | CTGAATCTTTGGAGTACCTG (SEQ ID NO:4) | chr15:44715425 | B2M | B2M | 78.51
sgRNA_5 | GGCCACGGAGCGAGACATCT (SEQ ID NO:5) | chr15:44711550 | B2M | B2M | 94.75
sgRNA_6 | AAGTCAACTTCAATGTCGGA (SEQ ID NO:6) | chr15:44715515 | B2M | B2M | 70.97
sgRNA_7 | GCTTGGAGGCCTGATCAGCG (SEQ ID NO:7) | chr19:36141111 | CAPNS1 | CAPNS1 | 89.34
sgRNA_8 | CTTATCTCTTCGCAGCGAGG (SEQ ID NO:8) | chr19:36142301 | CAPNS1 | CAPNS1 | 91.09
sgRNA_9 | CACACATTACTCCAACATTG (SEQ ID NO:9) | chr19:36142676 | CAPNS1 | CAPNS1 | 71.98
sgRNA_10 | TTCCGCAAAATAGAGCCCCA (SEQ ID NO:10) | chr3:105746019 | CBLB | CBLB | 91.55
sgRNA_11 | TGCACAGAACTATCGTACCA (SEQ ID NO:11) | chr3:105751622 | CBLB | CBLB | 91.43
sgRNA_12 | GCAATAAGACTCTTTAAAGA (SEQ ID NO:12) | chr3:105853470 | CBLB | CBLB | 76.18
sgRNA_13 | CAAAGAGATTACGAATGCCT (SEQ ID NO:13) | chr1:116754658 | CD2 | CD2 | 89.80
sgRNA_14 | CAAGGCACCCCAGGTTTCCA (SEQ ID NO:14) | chr1:116754663 | CD2 | CD2 | 92.70
sgRNA_15 | TTACGAATGCCTTGGAAACC (SEQ ID NO:15) | chr1:116754666 | CD2 | CD2 | 92.82
sgRNA_16 | CAGAGACGCATCTGACCCTC (SEQ ID NO:16) | chr11:118315540 | CD3E | CD3E | 90.96
sgRNA_17 | CATGCAGTTCTCACACACTG (SEQ ID NO:17) | chr11:118313715 | CD3E | CD3E | 87.47
sgRNA_18 | GTGTGAGAACTGCATGGAGA (SEQ ID NO:18) | chr11:118313715 | CD3E | CD3E | 86.65
sgRNA_19 | TCTCATTTCAGGAAACCACT (SEQ ID NO:19) | chr11:118349748 | CD3G | CD3G | 87.24
sgRNA_20 | AGTCATACACCTTAACCAAG (SEQ ID NO:20) | chr11:118349754 | CD3G | CD3G | 87.99
sgRNA_21 | TTCAAGGAAACCAGTTGAGG (SEQ ID NO:21) | chr11:118352458 | CD3G | CD3G | 86.55
sgRNA_22 | GAGCCTTGCCTGGAAATCTG (SEQ ID NO:22) | chr11:61118177 | CD5 | CD5 | 84.03
sgRNA_23 | AAGCGTCAAAAGTCTGCCAG (SEQ ID NO:23) | chr11:61118324 | CD5 | CD5 | 89.19
sgRNA_24 | CGTTCCAACTCGAAGTGCCA (SEQ ID NO:24) | chr11:61118121 | CD5 | CD5 | 83.11
sgRNA_25 | GAGCGACTGGGACACGGTGA (SEQ ID NO:25) | chr9:136866246 | EDF1 | EDF1 | 88.84
sgRNA_26 | GCTGCGCAAGAAGGGCCCTA (SEQ ID NO:26) | chr9:136866211 | EDF1 | EDF1 | 91.04
sgRNA_27 | TTGTTCTGGCCAGCAGCCCC (SEQ ID NO:27) | chr9:136863433 | EDF1 | EDF1 | 85.98
sgRNA_28 | CTTCCAGAGCCACATCATCG (SEQ ID NO:28) | chr19:48965791 | FTL | FTL | 93.10
sgRNA_29 | GGGACTCACCAGAGAGAGGT (SEQ ID NO:29) | chr19:48965601 | FTL | FTL | 88.86
sgRNA_30 | CGGTCGAAATAGAAGCCCTA (SEQ ID NO:30) | chr19:48965770 | FTL | FTL | 93.14
sgRNA_31 | AAAAGGATATTGTGCAACTG (SEQ ID NO:31) | chr10:87933015 | PTEN | PTEN | 92.37
sgRNA_32 | TGTGCATATTTATTACATCG (SEQ ID NO:32) | chr10:87933183 | PTEN | PTEN | 90.64
sgRNA_33 | TTTGTGAAGATCTTGACCAA (SEQ ID NO:33) | chr10:87933087 | PTEN | PTEN | 85.36
sgRNA_34 | TGTCATGCTGAACCGCATTG (SEQ ID NO:34) | chr18:12830972 | PTPN2 | PTPN2 | 87.94
sgRNA_35 | CCACTCTATGAGGATAGTCA (SEQ ID NO:35) | chr18:12859219 | PTPN2 | PTPN2 | 92.45
sgRNA_36 | TTGACATAGAAGAGGCACAA (SEQ ID NO:36) | chr18:12836828 | PTPN2 | PTPN2 | 93.96
sgRNA_37 | GAGTACTACACTCAGCAGCA (SEQ ID NO:37) | chr12:6952098 | PTPN6 | PTPN6 | 89.61
sgRNA_38 | TCACGCACAAGAAACGTCCA (SEQ ID NO:38) | chr12:6954872 | PTPN6 | PTPN6 | 82.74
sgRNA_39 | AGGTCTCGGTGAAACCACCT (SEQ ID NO:39) | chr12:6951610 | PTPN6 | PTPN6 | 91.27
sgRNA_40 | AGCATTATCCAAAGAGTCCG (SEQ ID NO:40) | chr1:198696873 | PTPRC | PTPRC | 88.88
sgRNA_41 | ATATTAATTCTTACCAGTGG (SEQ ID NO:41) | chr1:198692370 | PTPRC | PTPRC | 88.95
sgRNA_42 | AGCTTTAAATCAAGGTTCAT (SEQ ID NO:42) | chr1:198756176 | PTPRC | PTPRC | 96.89
sgRNA_43 | ATCCCGAGCCCTAAGGTGCA (SEQ ID NO:43) | chr11:67436325 | PTPRCAP | PTPRCAP | 84.08
sgRNA_44 | GGCAGCGCGGAGGACAGCGT (SEQ ID NO:44) | chr11:67436285 | PTPRCAP | PTPRCAP | 97.74
sgRNA_45 | CTCAGGGGGCTACTACCACC (SEQ ID NO:45) | chr11:67436170 | PTPRCAP | PTPRCAP | 91.50
sgRNA_46 | GTCACCGACGAGACCAGAAG (SEQ ID NO:46) | chr5:82277810 | RPS23 | RPS23 | 79.40
sgRNA_47 | GTCGTGGACTTCGTACTGCT (SEQ ID NO:47) | chr5:82277843 | RPS23 | RPS23 | 83.07
sgRNA_48 | TAATTTTTAGGCAAGTGTCG (SEQ ID NO:48) | chr5:82277860 | RPS23 | RPS23 | 61.94
sgRNA_49 | TTAGCTGTTAGACTTGAATA (SEQ ID NO:49) | chr14:51993810 | RTRAF | RTRAF | 85.50
sgRNA_50 | CGAGAGCCGTCAACTTGCGT (SEQ ID NO:50) | chr14:51989652 | RTRAF | RTRAF | 85.64
sgRNA_51 | CGGCTTCAACTGCAAAGGTG (SEQ ID NO:51) | chr14:51989700 | RTRAF | RTRAF | 88.77
sgRNA_52 | TATGAAAAAGCAGAGCGACT (SEQ ID NO:52) | chr15:43793025 | SERF2 | SERF2 | 89.61
sgRNA_53 | TCTGGCGGGCGAGCTCACGC (SEQ ID NO:53) | chr15:43792989 | SERF2 | SERF2 | 86.73
sgRNA_54 | CTCACGCTGGTTACCGCCTA (SEQ ID NO:54) | chr15:43792977 | SERF2 | SERF2 | 80.57
sgRNA_55 | AAAGATTACGAACTTCCCTG (SEQ ID NO:55) | chr12:46207559 | SLC38A1 | SLC38A1 | 92.24
sgRNA_56 | GTTAAAAACAGACATGCCTA (SEQ ID NO:56) | chr12:46229232 | SLC38A1 | SLC38A1 | 91.51
sgRNA_57 | ATGCCTAAGGAGGTTGTACC (SEQ ID NO:57) | chr12:46229246 | SLC38A1 | SLC38A1 | 79.48
sgRNA_58 | CTCCAGGTATCCCATCGAAA (SEQ ID NO:58) | chr18:47869418 | SMAD2 | SMAD2 | 79.53
sgRNA_59 | CACCAAATACGATAGATCAG (SEQ ID NO:59) | chr18:47870532 | SMAD2 | SMAD2 | 86.61
sgRNA_60 | TGGCGGCGTGAATGGCAAGA (SEQ ID NO:60) | chr18:47896729 | SMAD2 | SMAD2 | 82.91
sgRNA_61 | TAGGATGGTAGCACACAACC (SEQ ID NO:61) | chr16:11255478 | SOCS1 | SOCS1 | 92.25
sgRNA_62 | CAGCAGCAGAGCCCCGACGG (SEQ ID NO:62) | chr16:11255432 | SOCS1 | SOCS1 | 83.79
sgRNA_63 | CGGCGTGCGAACGGAATGTG (SEQ ID NO:63) | chr16:11255296 | SOCS1 | SOCS1 | 84.24
sgRNA_64 | TATAGACGCTGCCCGACGTC (SEQ ID NO:64) | chr15:40038895 | SRP14 | SRP14 | 95.12
sgRNA_65 | TCCAAAGAAGGGTACTGTGG (SEQ ID NO:65) | chr15:40038368 | SRP14 | SRP14 | 92.14
sgRNA_66 | ACAGTACCCTTCTTTGGAAT (SEQ ID NO:66) | chr15:40038358 | SRP14 | SRP14 | 65.82
sgRNA_67 | GCGACGGGCGCATCTACGTG (SEQ ID NO:67) | chr12:120469572 | SRSF9 | SRSF9 | 83.68
sgRNA_68 | CCCGACCTCCATAAGTCCTG (SEQ ID NO:68) | chr12:120465700 | SRSF9 | SRSF9 | 92.56
sgRNA_69 | GGGGTCCTCGAAGCGCACGA (SEQ ID NO:69) | chr12:120469426 | SRSF9 | SRSF9 | 89.94
sgRNA_70 | TGCTCTGTTTAGAAGATGAC (SEQ ID NO:70) | chr5:32591641 | SUB1 | SUB1 | 79.36
sgRNA_71 | ATATTCTTTTCTAGTTAAAG (SEQ ID NO:71) | chr5:32591566 | SUB1 | SUB1 | 70.93
sgRNA_72 | CCTGTAAAGAAACAAAAGAC (SEQ ID NO:72) | chr5:32591614 | SUB1 | SUB1 | 93.66
sgRNA_73 | TGGAGAAAGACGTAACTTCG (SEQ ID NO:73) | chr4:105234315 | TET2 | TET2 | 83.53
sgRNA_74 | TCTGCCCTGAGGTATGCGAT (SEQ ID NO:74) | chr4:105234747 | TET2 | TET2 | 90.97
sgRNA_75 | ATTCCGCTTGGTGAAAACGA (SEQ ID NO:75) | chr4:105235656 | TET2 | TET2 | 89.62
sgRNA_76 | CAGGCACAATAGAAACAACG (SEQ ID NO:76) | chr3:114295571 | TIGIT | TIGIT | 92.65
sgRNA_77 | CCATTTGTAATGCTGACTTG (SEQ ID NO:77) | chr3:114295700 | TIGIT | TIGIT | 60.75
sgRNA_78 | CTGGGTCACTTGTGCCGTGG (SEQ ID NO:78) | chr3:114295634 | TIGIT | TIGIT | 87.99
sgRNA_79 | GTCAGGGTTCTGGATATCTG (SEQ ID NO:79) | chr14:22547508 | TRAC | TRAC | 98.20
sgRNA_80 | TGGATTTAGAGTCTCTCAGC (SEQ ID NO:80) | chr14:22547541 | TRAC | TRAC | 88.15
sgRNA_81 | CTGCGGCTGTGGTCCAGCTG (SEQ ID NO:81) | chr14:22550661 | TRAC | TRAC | 94.77
sgRNA_82 | ACAAAACTGTGCTAGACATG (SEQ ID NO:82) | chr14:22547658 | TRAC | TRAC | 87.86
sgRNA_83 | TTCTTCCCCAGCCCAGGTAA (SEQ ID NO:83) | chr14:22547778 | TRAC | TRAC | 89.85
sgRNA_84 | CGTCATGAGCAGATTAAACC (SEQ ID NO:84) | chr14:22550625 | TRAC | TRAC | 95.81
sgRNA_85 | GAGAGCGCCTGCGACCCGAG (SEQ ID NO:85) | chr19:58544980 | TRIM28 | TRIM28 | 89.44
sgRNA_86 | CCAGCGGGTGAAGTACACCA (SEQ ID NO:86) | chr19:58544869 | TRIM28 | TRIM28 | 94.79
sgRNA_87 | GGAGCGCTTTTCGCCGCCAG (SEQ ID NO:87) | chr19:58544839 | TRIM28 | TRIM28 | 91.81
sgRNA_88 | TGAGGCCTGGACCTTATGCA (SEQ ID NO:88) | chr10:33134193 | chr10:3313[…]-33140000 | desert (GS88) | 69.44
sgRNA_89 | CCTGGTGGAGTGAACCATGA (SEQ ID NO:89) | chr10:33132917 | chr10:3313[…]-33140000 | desert (GS89) | 95.25
sgRNA_90 | CAAGCACTTAGGTTCCCCTG (SEQ ID NO:90) | chr10:33134633 | chr10:3313[…]-33140000 | desert (GS90) | 91.13
sgRNA_91 | GGTCTCCCTACAATTCAGCG (SEQ ID NO:91) | chr10:72294568 | chr10:7229[…]-72300000 | desert (GS91) | 92.02
sgRNA_92 | CACAGCGCGTGACTGCAATG (SEQ ID NO:92) | chr10:72298268 | chr10:7229[…]-72300000 | desert (GS92) | 90.22
sgRNA_93 | TCTGGGGCACCAATTCTAGG (SEQ ID NO:93) | chr10:72292786 | chr10:7229[…]-72300000 | desert (GS93) | 86.35
sgRNA_94 | GAGCCATGCTTGGCTTACGA (SEQ ID NO:94) | chr11:128342576 | chr11:1283[…]-128350000 | desert (GS94) | 91.24
sgRNA_95 | GTACAAGTACTTATCTCATG (SEQ ID NO:95) | chr11:128343592 | chr11:1283[…]-128350000 | desert (GS95) | 89.02
sgRNA_96 | GAGATAACAACATAACAACA (SEQ ID NO:96) | chr11:128347170 | chr11:1283[…]-128350000 | desert (GS96) | 96.47
sgRNA_97 | CATATTCCATAGTCTTTGGG (SEQ ID NO:97) | chr11:65425000 | chr11:65425000-65427000 (NEAT1) | desert_4 (GS97) | 88.54
sgRNA_98 | CTGCCCCTTAGCAACTTAGG (SEQ ID NO:98) | chr11:65425507 | chr11:65425000-65427000 (NEAT1) | desert_4 (GS98) | 92.76
sgRNA_99 | TGTTTAAAAATATGTTGACA (SEQ ID NO:99) | chr11:65426264 | chr11:65425000-65427000 (NEAT1) | desert_4 (GS99) | 90.76
sgRNA_100 | CCAGGAATGGAAACTCACGC (SEQ ID NO:100) | chr15:92830315 | chr15:92830000-92840000 | desert_5 (GS100) | 87.84
sgRNA_101 | GAGGCCGCTGAATTAACCCG (SEQ ID NO:101) | chr15:92831850 | chr15:92830000-92840000 | desert_5 (GS101) | 85.32
sgRNA_102 | ATACACGCACACTTGCAGAA (SEQ ID NO:102) | chr15:92831131 | chr15:92830000-92840000 | desert_5 (GS102) | 99.92
sgRNA_103 | GAGCAGACAGAAACCCAGGG (SEQ ID NO:103) | chr16:11225670 | chr16:11220000-11230000 | desert_6 (GS103) | 87.92
sgRNA_104 | TGAGTCTCCAAACAGAACAG (SEQ ID NO:104) | chr16:11226284 | chr16:11220000-11230000 | desert_6 (GS104) | 88.53
sgRNA_105 | TAATATCACTGACTTCACGG (SEQ ID NO:105) | chr16:11225029 | chr16:11220000-11230000 | desert_6 (GS105) | 87.65
sgRNA_106 | TACACACAATGTAAGCAGCA (SEQ ID NO:106) | chr2:87467461 | chr2:87460000-87470000 | desert_7 (GS106) | 71.79
sgRNA_107 | GGGAGCTCAATTCGAAACCA (SEQ ID NO:107) | chr2:87468809 | chr2:87460000-87470000 | desert_7 (GS107) | 65.89
sgRNA_108 | TTGGACAGGTGAGACAGTCG (SEQ ID NO:108) | chr2:87467001 | chr2:87460000-87470000 | desert_7 (GS108) | 72.64
sgRNA_109 | AAGCTCACTCAGATAGTGTG (SEQ ID NO:109) | chr3:186511316 | chr3:186510000-186520000 | desert_8 (GS109) | 76.89
sgRNA_110 | CAGGAGAACCACCTTACACG (SEQ ID NO:110) | chr3:186515260 | chr3:186510000-186520000 | desert_8 (GS110) | 86.31
sgRNA_111 | GGACAGACCCTGATTCACAA (SEQ ID NO:111) | chr3:186519655 | chr3:186510000-186520000 | desert_8 (GS111) | 85.47
sgRNA_112 | ACATGGCAGTCTATGAACAG (SEQ ID NO:112) | chr3:59451154 | chr3:59450000-59460000 | desert_9 (GS112) | 87.77
sgRNA_113 | CCTATAGAGAGTACTACTTG (SEQ ID NO:113) | chr3:59456416 | chr3:59450000-59460000 | desert_9 (GS113) | 79.33
sgRNA_114 | CCAACCGGGTCTTCATTACG (SEQ ID NO:114) | chr3:59457029 | chr3:59450000-59460000 | desert_9 (GS114) | 92.21
sgRNA_115 | TCAAGCGTAGAGTTCCGAGT (SEQ ID NO:115) | chr8:127993006 | chr8:127980000-128000000 | desert_10 (GS115) | 93.07
sgRNA_116 | TCATGCAATTATGGACCCAG (SEQ ID NO:116) | chr8:127994663 | chr8:127980000-128000000 | desert_10 (GS116) | 89.40
desert_ sgRNA_ CGGGAAAGTGACTGGCCATG chr8:12798 10 87.45
117 (SEQ ID NO:117) 0000- (G5117 ch r8:127996766 128000000 ) desert_ sgR NA_ TGAGATTGAAATCAAATCGG 11 84.84
118 (SEQ ID NO:118) chr9:79700 (GS118 ch r9:7974159 00-7980000 ) desert_ sgR NA_ TATGCAATATTCATCACGCG 11 85.44
119 (SEQ ID NO:119) chr9:79700 (GS119 ch r9:7977914 00-7980000 ) desert_ sgR NA_ AATGTGTTAAATCAAATG CA 11 83.48
120 (SEQ ID NO:120) chr9:79700 (GS120 ch r9:7976895 00-7980000 ) TABLE 5: CONSTRUCTS USED FOR EVALUATION OF PREDICTED LOCI
Construct name Sequence Length
pARBI-903: APRT_1_Endogenous_1
ATCTCGGCCAATAAAGGAGAAAGGGCGCGGCCCGTACGCG

ACCAGCTCACGCCCCTCCTCCAGCCGCCAAGGCCCCGGCCCACAGCTGCCTG
GCTGCAGTC
AGAAGCGTAGCCCGAGACAAGGAAGGGCGCCTTGACTCGCACTTTTGTCCGGTTCGAACGT
TCTGctcagtggtgcgtggaatgcgagcgcgtcttaaaatcgatggcgcctaggagtccatgaaatacggTACAGG
CTTCCG GCGACGGATGCCCCGCCCCTCACCCACGCTCCG CCCTCCG GGGATGCCCCACCCCT
CGTGGCGGTCCCGCCCGTCCCCGCGCAGGCGCGCTCGGGCTG CCGCTGGCTCTTCGCACGC
GGCCATGGCCGACTCCGAGCTGCAGCTGGTTGAGCAGCGGATCCGCAGCTTCCCCGACTTC
CCCACCCCAtccggatccgg agagggcaggggatctctcctta cttgtggcga cgtggaggaga a ccccggccccA
TG GTG AGCAAGG GCGAG GAG CTGTTCACCG G GGTGGTGCCCATCCTGGTCGAGCTGGACG
GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACG
GCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTG CCCGTGCCCTGG CCCACCCTC
GTGACCACCCTGACCTACGGCGTGCAGTGCTTCAG CCGCTACCCCGACCACATGAAGCAGC
ACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAG
GACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAAC
CGCATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAACATCCTGGGGCACAAGCTG
GAGTACAACTACAACAG CCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCA
AGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTA
CCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGC
ACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAAG CGCGATCACATGGTCCTG CTG GAG
TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagctt ataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggttt gtc ca a actcatca atgtatctta tGG CGTGGTATTCAG GTG CACGCACAGGCCGCCCTCGTGGCGCCC
CGACCTGCGGGCCTACGGATGGGAGCGCGTGGCCCGCGACCTCCGGGCGGGCGGGGCGG
GAACCCTCGTCTTTCGCCCCCGGGGCCCTGCCCTCCTTCGGCCCCGGCGTCACCAGGCCTGT

CCTTGGGTCCAGGGACATCTCGCCCGTCCTGAAGGACCCCGCCTCCTTCCGCGCCGCCATCG
GCCTCCTGGCGCGACACCTGAAGGCGACCCACGGGGGCCGCATCGACTACATCGCAGGCG
AGTGCCCAGTGGCCGCATCTAGGGCGCTTCCGCCTCTGCGCGCGCCGAGGGCAGCACGTG
GGCTCTGCGCGTCTGCTTGGGGGAGGGCCTTTGGGGTGCTTCAGGGGGCGCCGGGACGG
GCGCCGTGCTTGGGTCGCCCGGGAAGGGTTGTGAGATTGAGCCC
pARBI-904: APRT_2_Endogenous_2 (Length: 1806)
TGCCCCGCCCCTCACCCACGCTCCGCCCTCCGGGGATGCCCCACCCCTCGTGGCGGTCCCGC
CCGTCCCCGCGCAGGCGCGCTCGGGCTGCCGCTGGCTCTTCGCACGCGGCCATGGCCGACT
CCGAGCTGCAGCTGGTTGAGCAGCGGATCCGCAGCTTCCCCGACTTCCCCACCCCAGGCGT
GGTATTCAGGTGCACGCACAGGCCGCCCTCGTGGCGCCCCGACCTGCGGGCCTACGGATG
GGAGCGCGTGGCCCGCGACCTCCGGGCGGGCGGGGCGGGAACCCTCGTCTTTCGCCCCCG
GGGCCCTGCCCTCCTTCGGCCCCGGCGTCACCAGGCCTGTCCTTGGGTCCAGGGACATCTC
GCCCGTCCTGAAGGACCCCGCCTCCTTCCGCGCCGCCATCGGCCTCCTGGCGCGACACCTG
AAGGCGACCCACGGGGGCCGCATCtccggatccggagagggcaggggatctctccttacttgtggcgacgtg gaggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCC
TGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAG
GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCG
TGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCC
GACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC
GCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG
GCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACA
TCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAA
GCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGT
GCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCC
GACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGAT
CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGT
ACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcattttt tt cactgcattctagttgtggtttgtccaaactcatcaatgtatcttatGACTACATCGCAGGCGAGTGCCCAGT
GGCCGCATCTAGGGCGCTTCCGCCTCTGCGCGCGCCGAGGGCAGCACGTGGGCTCTGCGC
GTCTGCTTGGGGGAGGGCCTTTGGGGTGCTTCAGGGGGCGCCGGGACGGGCGCCGTGCTT
GGGTCGCCCGGGAAGGGTTGTGAGATTGAGCCCCCGAGGCCGCCGCGCTGTGCAGGCGTC
CTTCCCGCAGGTTCCGGGTCCCCAGCCCAGGACAGGCGTGACCGAGTTGCCGGGTCAGTTG
GTCTCCCTGGAGTGCCCAAGCTGAATCCACAGGGCCCAGCTGCCTTGCTTCTTGTTCCTTCT
GCGAGCTGGTATTGAGCGCCTGCCACGAGCCAGGCCTTCCCTGGTGAAGATCACGGAATG
CCCACCCAGGGAAGGGAGGCCTGGAGGCCTCCGGGAGAGCCCAAGAGGTGGCCCAGGGA
GA
pARBI-905: APRT_3_Endogenous_3
CGTTCTGctcagtggtgcgtggaatgcgagcgcgtcttaaaatcgatggcgcctaggagtccatgaaatacggTAC
AGGCTTCCGGCGACGGATGCCCCGCCCCTCACCCACGCTCCGCCCTCCGGGGATGCCCCAC
CCCTCGTGGCGGTCCCGCCCGTCCCCGCGCAGGCGCGCTCGGGCTGCCGCTGGCTCTTCGC
ACGCGGCCATGGCCGACTCCGAGCTGCAGCTGGTTGAGCAGCGGATCCGCAGCTTCCCCG
ACTTCCCCACCCCAGGCGTGGTATTCAGGTGCACGCACAGGCCGCCCTCGTGGCGCCCCGA
CCTGCGGGCCTACG GATGGGAGCGCGTGGCCCGCGACCTCCGGGCGGGCGGGGCGGGAA
CCCTCGTCTTTCGCCCCCGGGGCCCTGCCCTCCTTCGGCCCCGGCGTCACCAGGCCTGTCCTT
GGGTCCAGGtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggagaaccccggcccc ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGAC
GGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC
GGCAAGCTGACCCTGAAGTTCATCTG CACCACCGG CAAGCTGCCCGTGCCCTGGCCCACCC
TCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCA
GCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCA
AGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGA
ACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGC
TGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCAT
CAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCA
CTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTG
AGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTG
GAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgc agcttataatggtta caaataaagca atagcatcacaaatttcacaaataaagcatttttttca ctgcattctagttgtggt ttgtccaaactcatcaatgtatcttatGACATCTCGCCCGTCCTGAAGGACCCCGCCTCCTTCCGCGCC
GCCATCGGCCTCCTGGCGCGACACCTGAAGGCGACCCACGGGGGCCGCATCGACTACATC
GCAGGCGAGTGCCCAGTGGCCGCATCTAGGGCGCTTCCGCCTCTGCGCGCGCCGAGGGCA
GCACGTGGGCTCTGCGCGTCTGCTTGGGGGAGGGCCTTTGGGGTGCTTCAGGGGGCGCCG
GGACGGGCGCCGTGCTTGGGTCGCCCGGGAAGGGTTGTGAGATTGAGCCCCCGAGGCCG
CCGCGCTGTGCAGGCGTCCTTCCCGCAGGTTCCGGGTCCCCAGCCCAGGACAGGCGTGACC
GAGTTGCCGGGTCAGTTGGTCTCCCTGGAGTGCCCAAGCTGAATCCACAGGGCCCAGCTGC
CTTGCTTCTTGTTCCTTCTGCGAGCTGGTATTGAGCGCCTGCCACGA
pARBI-906: B2M_1_Endogenous_4 (Length: 1806)
AGTAATTTGATGGGGGCTATTATGAACTGAGAAATGAACTTTGAAAAGTATCTTGGGGCCA
AATCATGTAGACTCTTGAGTGATGTGTTAAGGAATGCTATGAGTGCTGAGAGGGCATCAGA
AGTCCTTGAGAGCCTCCAGAGAAAGGCTCTTAAAAATGCAGCGCAATCTCCAGTGACAGAA
GATACTGCTAGAAATCTGCTAGAAAAAAAACAAAAAAGGCATGTATAGAGGAATTATGAG
GGAAAGATACCAAGTCACGGTTTATTCTTCAAAATGGAGGTGGCTTGTTGGGAAGGTGGA
AGCTCATTTGGCCAGAGTGGAAATGGAATTGGGAGAAATCGATGACCAAATGTAAACACTT
GGTGCCTGATATAGCTTGACACCAAGTTAGCCCCAAGTGAAATACCCTGGCAATATTAATGT
GTCTTTTCCCGATATTCCTCAGGTtccggatccggagagggcaggggatctctccttacttgtggcgacgtgga ggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCT
GGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGG
GCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGT
GCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCG
ACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCG
CACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGC
GACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATC
CTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGC
AGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGC
AGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGA
CAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACA
Aatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttc act gcattctagttgtggtttgtccaaactcatcaatgtatcttatACTCCAAAGATTCAGGTTTACTCACGTCATC
CAGCAGAGAATGGAAAGTCAAATTTCCTGAATTGCTATGTGTCTGGGTTTCATCCATCCGAC
ATTGAAGTTGACTTACTGAAGAATGGAGAGAGAATTGAAAAAGTGGAGCATTCAGACTTGT
CTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTACACTGAATTCACCCCCACTGAAAAAG
ATGAGTATGCCTGCCGTGTGAACCATGTGACTTTGTCACAGCCCAAGATAGTTAAGTGGGG
TAAGTCTTACATTCTTTTGTAAGCTGCTGAAAGTTGTGTATGAGTAGTCATATCATAAAGCT
GCTTTGATATAAAAAAGGTCTATGGCCATACTACCCTGAATGAGTCCCATCCCATCTGATAT
AAACAATCTGCATATTGGGATTGTCAGGGAATGTTCTTAAAGATCAGA
pARBI-907: B2M_2_Endogenous_5 (Length: 1806)
AAGATCTTAATCTTCTGGGTTTCCGTTTTCTCGAATGAAAAATGCAGGTCCGAGCAGTTAAC
TGGCTGGGGCACCATTAGCAAGTCACTTAGCATCTCTGGGGCCAGTCTGCAAAGCGAGGG

GGCGCGCACCCCAGATCGGAGGGCGCCGATGTACAGACAGCAAACTCACCCAGTCTAGTG
CATGCCTTCTTAAACATCACGAGACTCTAAGAAAAGGAAACTGAAAACGGGAAAGTCCCTC
TCTCTAACCTGGCACTGCGTCGCTGGCTTGGAGACAGGTGACGGTCCCTGCGGGCCTTGTC
CTGATTGGCTGGGCACGCGTTTAATATAAGTGGAGGCGTCGCGCTGGCGGGCATTCCTGAA
GCTGACAGCATTCGGGCCGAGATGtccggatccggagagggcaggggatctctccttacttgtggcgacgtg gaggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCC
TGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAG
GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTG CACCACCGG CAAGCTGCCCG
TGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCC
GACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC
GCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG
GCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACA
TCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAA
GCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGT

GCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG GCGACGGCCCCGTGCTGCTGCCC
GACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGAT
CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGGCATGGACGAGCTGT
ACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatTCTCGCTCCGTGG
CCTTAGCTGTGCTCG
CGCTACTCTCTCTTTCTGGCCTGGAGGCTATCCAGCGTGAGTCTCTCCTACCCTCCCGCTCTG
GTCCTTCCTCTCCCGCTCTGCACCCTCTGTGGCCCTCGCTGTGCTCTCTCGCTCCGTGACTTCC
CTTCTCCAAGTTCTCCTTGGTGGCCCG CCGTGGGGCTAGTCCAGGGCTGGATCTCGGGGAA
GCGGCGGGGTGGCCTGGGAGTGGGGAAGGGGGTGCGCACCCGGGACGCGCGCTACTTGC
CCCTTTCGGCGGGG AGCAGGG GAGACCTTTGGCCTACGGCGACGGGAGGGTCGGGACAA
AGTTTAGGGCGTCGATAAGCGTCAGAGCGCCGAGGTTGGG GGAGGGTTTCTCTTCCGCTCT
TTCGCGGG GCCTCTGGCTCCCCCAGCGCAGCTGGAGTGGGGGACGGGTAGGCTCG
pARBI-908: B2M_3_Endogenous_6 (Length: 1806)
AGGAATGCTATGAGTGCTGAGAGGGCATCAGAAGTCCTTGAGAGCCTCCAGAGAAAGGCT
CTTAAAAATGCAGCGCAATCTCCAGTGACAGAAGATACTGCTAGAAATCTGCTAGAAAAAA
AACAAAAAAGGCATGTATAGAGGAATTATGAGGGAAAGATACCAAGTCACGGTTTATTCTT
CAAAATGGAGGTGGCTTGTTGGGAAGGTGGAAGCTCATTTGGCCAGAGTGGAAATGGAAT
TGGGAGAAATCGATGACCAAATGTAAACACTTGGTGCCTGATATAGCTTGACACCAAGTTA
GCCCCAAGTGAAATACCCTGGCAATATTAATGTGTCTTTTCCCGATATTCCTCAGGTACTCCA
AAGATTCAGGTTTACTCACGTCATCCAGCAGAGAATGGAAAGTCAAATTTCCTGAATTGCTA
TGTGTCTGGGTTTCATCCATCCtccggatccggagagggcaggggatctctccttacttgtggcgacgtggagg agaaccccggccccATGGTGAGCAAGG GCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGG
TCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGAGGGCGAG G GC
GATGCCACCTACGG CAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGC
CCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGAC
CACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCG AAGGCTACGTCCAGGAGCGCA
CCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG GCG
ACACCCTGGTGAACCGCATCGAGCTG AAGGGCATCGACTTCAAGG AGGACGGCAACATCC
TGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCA
GAAGAACGGCATCAAGGTGAACTTCAAGATCCG CCACAACATCGAGGACGGCAGCGTGCA
GCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCG ACGGCCCCGTGCTGCTGCCCGAC
AACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACA
TGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATG GACGAGCTGTACAA
atgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aatttca ca a ataa agcatttttttca ctg cattctagttgtggtttgtcca aa ctcatca atgtatcttatGACATTGAAGTTGACTTACTGAAGAATG GA
GAGAGAATTGAAAAAGTGGAGCATTCAGACTTGTCTTTCAGCAAGGACTGGTCTTTCTATCT
CTTGTACTACACTGAATTCACCCCCACTGAAAAAGATGAGTATGCCTGCCGTGTGAACCATG
TGACTTTGTCACAGCCCAAGATAGTTAAGTGGGGTAAGTCTTACATTCTTTTGTAAGCTGCT
GAAAGTTGTGTATGAGTAGTCATATCATAAAGCTGCTTTGATATAAAAAAGGTCTATGGCC
ATACTACCCTGAATGAGTCCCATCCCATCTGATATAAACAATCTGCATATTGGGATTGTCAG
GGAATGTTCTTAAAGATCAGATTAGTGGCACCTGCTGAGATACTGATGCACAGCATGGTTT
CTGAACCAGTAGTTTCCCTGCAGTTGAGCAGGGAGCAGCAGCAGCACTTG
pARBI-909: CAPNS1_1_Endogenous_7 (Length: 1806)
GATCTCACATTCTAGCGCCCAGACATGTCGGGAACGCCTTACGCCAACTTGGGGTCCCCACC
AAGAGAACCCCCACCAGATCTGCACCCTCCCCTTCACGCGTGCACCCAGTCCAGGCTCCCTC
AAGCCCCACGGGTGCCTTTTAGACCTGAGGAGGTTGCAAACCTGATCCCCCATACCTGCCCC
ACCCATCCGCGGACAACCCGCCCTCGCAAACTCAGACCCCCACCCGGAGGCTTCAGATTCCT
CCCAGGTCCAGCTGCCGGAAATGCGTGTTTGAAGGGAGGGTGTGGGCTCAGGGGCG
AAG
CACCCACTGGTCCCCTTTTTTCCCCCCAGCAGTGAGTCGCAGCCATGTTCCTGGTTAACTCGT
TCTTGAAGGGCGGCGGCG GCGGCGG CGGG GGAGGCGGGGG CCTGGGTGGG GGCCTGGG
AAATGTGCTTG GAG G CCTGATCtccggatccggagagggcaggggatctctc ctta cttgtggcga cgtggag gagaa ccccggccccATGGTGAGCAAGGG CGAG GAGCTGTTCACCGGGGTGGTGCCCATCCTG
GTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGG
CGATG CCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTG
CCCTGG CCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CCGCTACCCCGA
CCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATG CCCGAAGGCTACGTCCAGG AGCGC
ACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGC

GACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAG GAGGACGGCAACATC
CTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAG C
AGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAG CGTGC
AG CTCGCCGACCACTACCAGCAG AACACCCCCATCGGCGACGGCCCCGTGCTGCTG CCCGA
CAACCACTACCTG AGCACCCAGTCCAAGCTGAGCAAAGACCCCAACG AGAAGCGCGATCAC
ATG GTCCTGCTGGA GTTCGTGACCGCCGCCGGGATCACTCTCGG CATGGACGAG CTGTACA
Aatgattgtttattgcagcttata atggtta ca a a ta aagcaatagcatcacaa atttcaca a ata a agcatttttttca ct gcattctagttgtggtttgtccaa a ctcatca atgtatctta tAGCGG GGCCGG G GGCG GCG GCGG
CGGCG
GCGGCGGCGGCGGCGGTGGTGGAGGCGGCGGTGGCGGTGGAACGGCCATGCGCATCCTA
GGCGGAGTCATCAGCGCCATCAGGTAAGGCGGAGACTATCAGAGGGGCGGGGCCTGGGA
ATGGGAGGAGCCTCAGTGAGGCGTGGTCTGGGAGGGGCGTGGTCTAAAAATAGAATAGG
ATTAACCTGGAGGCTAACCTGGGTACATGAATTAGGCCGGGGAGGCCTGGTTTGAGAGTT
CTGCTGTAAGGGGGTGGACCCCAGTGAGGCGGATATCAGTCATTGGGGGCGGTGCTTGAT
ATGGGAGTAGTCTGATTGTGTGGGACCAGGACAATGTGGTCCTGAGGGGACTGATGTGGA
GTTTTGGCGGGTGGGGCTTATGGGTCTGGGCTCAGCCTGCATTGGCTAACCTGGAGATGAA
CGTT
pARBI-910: CAPNS1_2_Endogenous_8
CGGGCATTTAGGAAGCGGGCCATG

TAAGGCCTTGGGAAGCTGGGGCAGAGCCTCTATGAAGTGAGCCCTCCAAGGGGCGGGGTC
TTTTGATTCTACGTGGGATTTTTTAGGTTTAGGGTGGCCAAGATGACTGAAATCTGCCACTG
GGTAGGTGTGCCTGGCAGGAGGGGAGCCTCCCAGGGGACCGGTCTCTGGGTTTCCTCGAG
GGTGGGGTTGGCCTGAGGAAGGGAGAAGAGGGGCACGACCAGG
GCAGTGTGGATTGGG
ACAGATGAGGACAAGAACAAATGAAAGGCACAGCAACCAAGTAAGGAAGATAACGGCTG
GG GTCTGGAGCGTTGGGGCTGATGGTTCTGTAGTGCTGCCCGTTG GAGGCCCCGCCCCTG
GCACTAACCCCTCCCCCTTATCTCTTCGCAGCtccggatccggagagggcaggggatctctccttacttgtg gcgacgtggaggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTG
CCCATCCTGGTCG AG CTGG ACGG CGACGTAAACGG CCACAAG TTCAGCGTGTCCGGCG AG
GGCGAGGG CGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGG CAAG
CTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCG
CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCC
AG GAG CGCACCATCTTCTTCAAGGACG ACGGCAACTACAAGACCCGCG CCGAGGTGAAGT
TCGAG GGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACG
GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATG
GC
CGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGG
CAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTG
CTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAG C
GCGATCACATG GTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGA
GCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aa tttca caa ata a agc atttttttca ctgcattctagttgtggtttgtcca aactcatcaatgtatcttatGAG
GCGGCTGCGCAGTACAAC
CCG GAGCCCCCGGTAAGCCCCCTCTG CAACCAGACCCCCTTCTCCTGCCAAG GCCTCTTCGA
GGTCCCATCCCTGTTCCTGTAGAGAAGCCCCACCTTCCTCCCCTTCTTGTGAAATTCCTCTGC
CAGTTCCTCCCATGCCGTGTCTGCAGCTTCGCCATGGGTCTTAGCCATGCCCCACATACGTG
CACCCCATTCACTACCACCCCTCATTCTTTTCCTCATCAAGCTGCCCAGCCCACTCTGACTTCC
CCACCCAGGGTACCTGGGTTTGG GGAG CCGTCCTGGCCGGGTTCCCCTCCCCCTGCTCTGA
GCTCTCCTCCCTTTGCAG CCCCCACGCACACATTACTCCAACATTGAGGCCAACGAGAGTGA
GG AGGTCCGG CAGTTCCG GAG ACTCTTTGCCCAGCTGGCTGGAGATGTAAGTAAC
pARBI-911: CAPNS1_3_Endogenous_9
GCTGATGGTTCTGTAGTGCTGCCCGTTGGAGGCCCCGCCCCTGG

ATCTCTTCGCAGCGAGGCGGCTGCGCAGTACAACCCGGAGCCCCCGGTAAGCCCCCTCTGC
AACCAGACCCCCTTCTCCTGCCAAGGCCTCTTCGAGGTCCCATCCCTGTTCCTGTAGAGAAG
CCCCACCTTCCTCCCCTTCTTGTGAAATTCCTCTGCCAGTTCCTCCCATGCCGTGTCTGCAGCT
TCGCCATGGGTCTTAGCCATGCCCCACATACGTG
CACCCCATTCACTACCACCCCTCATTCTT
TTCCTCATCAAGCTGCCCAGCCCACTCTGACTTCCCCACCCAGGGTACCTGGGTTTGGGGAG
CCGTCCTGGCCGGGTTCCCCTCCCCCTGCTCTGAGCTCTCCTCCCTTTGCAGCCCCCACGCAC
ACATTACTCCAACtccgga tccggagagggca ggggatctctccttacttgtggcga cgtgga ggaga a ccccggc cccATG GTGAGCAAG GGCG AGGAG CTGTTCACCGGGGTG GTG CCCATCCTG GTCG AG CTG
GACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCAC

CTACGG CAAG CTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTG GCCC
ACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAA
GCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCT
TCAAG GACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGG
TGAACCGCATCGAG CTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGG GCACA
AG CTG GAGTACAACTACAACAGCCACAACGTCTATATCATGG CCGACAAG CAGAAGAACG
GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCG
ACCACTACCAGCAGAACACCCCCATCGGCGACCGCCCCGTG CTGCTGCCCGACAACCACTA
CCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTG
CTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAG CTGTACAAatgattgttta ttgcagcttataatggttaca aata aagcaatagcatcacaaatttca ca aataaagcatttttttcactgcattctagttg tggtttgtccaaa ctcatcaatgtatcttatATTGAGGCCAACGAGAGTGAGGAGGTCCGGCAGTTCCG
GAGACTCTTTGCCCAGCTGGCTGGAGATGTAAGTAACCTGGGGTCCCTGGCCCCGTCCTAA
CCGTTCCATCCCTTCCCTTGTGGCTGCCCTTGCACACACACCCTTGACCATGACAATCCCAGT
GTTCCCATTCTCCATGACATTCTCAGACCCCTTTCAGTCACCCCTGACCTGCCCCTAACTTCCG
CCCGCAGGACATGGAGGTCAGCGCCACAGAACTCATGAACATTCTCAATAAGGTTGTGACA
CGACGTAAGTGACCGGGGTTAAGGAATAGGGTAGATTCAGAGGCAGAGGGGTCAGAGAG
GATTTGACCTCTGGCCTCTGACTTTCAACCTGTTACCCACAGACCCTGATCTGAAGACTGAT
GGTTTTGGCATTGACACATGTCGCAGCATGGTGGCCGTGATG
pARBI-912: CBLB_1_Endogenous_10 (Length: 1806)
ggagagttatagcagatggagacaaaaaagggaactgctggggatcaaaccttgtaaagtccttctaagtgatatatga
ggactttgacttttattcaaagttagaaggctttgaacaaaagaattatttcatttgtgttttacaatggtcacaagtgccg
tagcgagaataggctgTGTAGAGAAGAGATTTTGTGGTTTTCTTTTTTGTTCTCTGTATTCTCTTCC
TGAAAAATCTGTTTTCTGCCTAATCATATTTCTGTAAAGAACAATAGACTTGAACGTCTGCTG
TTAAGTAACAAAGGTTGACCTAAACCACATAAGGTCAGATTAACTTTTAAAATATCCAAGAT
AATGTTAAGTGTATTAAATATGGTTCAGTATGTAATCAAAATATTTGATATTCTCTAGATACT
AATCTCttttaatttttttattttttAGCCTTGGtccggatccggagagggcaggggatctctccttacttgtggcgac gtggaggaga accccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCA
TCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCG
AG GGCGATG CCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCC
CGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACC
CCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAG GA
GCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAG
GGCGACACCCTGGTGAACCGCATCGAGCTGAAG GGCATCGACTTCAAGGAGGACGGCAAC
ATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACA
AG CAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCG AGGACGGCAGCG
TGCAG CTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTG CC
CGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGA
TCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACG AG CTG
TACAAatgattgtttattgcagcttataatggttacaa ataaagcaatagcatca caaatttcacaaataaagcattttt ttcactgcattctagttgtggtttgtccaa actcatca atgtatcttatGGCTCTATTTTGCGGAATTGGAATTT
CTTAGCTGTGACACATCCAGGTTACATGGCATTTCTCACATATGATGAAGTTAAAGCACGAC
TACAGAAATATAGCACCAAACCCGGAAGGTAAGACTTTTTAGCAGTAATATTATGTGATATA
TCCAAAGATGAAAAATTCTCCATGGTTTGTTCCTAAGTAAGTCAGCAATACCCGCTGATG GC
ATG CTTGGGAGGGG GCTAGAAAAG GTAAAAGACCAAAATGGTGCTTCTTAACATAATGTTT
TAAATTGATACATTGTGCTTACTCTGAAACTTG GTATTCATAAAGTGAAAG GGGATATGGTC
TCCTTTCTGTTTTCATGATATCCGTGTTTATGTGCAAAATTGAATGTATATATTTGAATACTTG
AATACTGTGCTAAAGAACAAATAAATAGAACAGTCTTATGTTCATTACTTT
pARBI-913: CBLB_2_Endogenous_11
CATATTACTCATTTGTGCAATGAATAAAGGATTTCTATTATTATATG

ACTGTTTTTAGAAAATGAGAATTAAATCTTGATTCATTATTAAATAACCTTGTATACACACAT
GAAGAGGTCATGACACATAAGACTTTGACCTAATACCTTTTTTCAGATATGTTTCTATAGCTA
TCTTGATAGCCTAGGACTGTTTGAGAGAAAATTAATTTAAATTTTTGTCATATTCTTAGCAAA
TGCAAAATAATTGAGTTAAACAGAACCCACTACATTTGTTCTGAAATATGAACAATATTCTC
AAATATTTGTCTGGTAGTACAAATACAATTATATTAATTTTATTATTGAATTTTGCTGCTTCAA
AG GGAG GTATTTCTTAAATGATACTTGATTCAATTATTTCCCTTTTTTTCCACCTTG CACAGA
ACTATCGTAtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggagaaccccggccccA

TGGTG AGCAAGG GCGAG GAG CTGTTCACCG G GGTGGTGCCCATCCTGGTCGAGCTGGACG
GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACG
GCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTG CCCGTGCCCTGG CCCACCCTC
GTGACCACCCTGACCTACGGCGTGCAGTG CTTCAGCCGCTACCCCGACCACATGAAGCAGC
ACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAG
GACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG G CGACACCCTGGTGAAC
CGCATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAACATCCTGGGGCACAAGCTG
GAGTACAACTACAACAG CCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCA
AGGTGAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTCGCCGACCACTA
CCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGC
ACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAAG CGCGATCACATGGTCCTG CTG GAG
TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagctt ata atggtta ca a ata a agca atagcatca ca a atttca ca a ata aagcatttttttca ctgcattctagttgtggtttgtc ca a actcatca atgtatctta tCCATGGAAAGTATTCAGACAGTGCCTTCATGAGGTCCACCAGATT
AGCTCTGGCCTGGAAGCAATGGCTCTAAAATCAACAATTGATTTAACTTGCAATGATTACAT
TTCAGTTTTTGAATTTGATATTTTTACCAGGCTGTTTCAGGTAAGTTTATACTTGCTTAGTCTA
TCCATTCCCTtcttctctctccctgtcttttctctctgtttctctctctctctgtctctctctcacacaca ca cacaTAAAC
AAACATATG GAAAATAGTGTCATTCCTATTAGTCTGTAATTTTATTGGTATGTTTG GTCATTG
TTAAAAGTTTAGAGTGAGCCATTTGGCTTAAGTTTGGAGTTATCAGATG CTGTGAGCCTGGT
GG CTGCTTGTTAGCTTTGAGAGGATCCATCCAAAAGAG CTTTGTTTTAAACTCTCTACTTAGT
TTCTTACTT
pARBI-914: CBLB_3_Endogenous_12 (Length: 1806)
TTAAACTTTAGAGGTTCAAAATAAGGATATGATATAGAAAAGATTGAATGGTAGGAGAACA
GAATAGGATTTATTTATATTATGACAAACGTGACTATCAGTTAATAAATTGGAGATAGTAAG
TTAAGATCATTATTTATATTATGGAAAAATGAAGAAATAGGACATGGTTCTCTAACTTTTTAA
TTTTTCTGTGAAATAAAATATTATGACTTATTTTATTTTTATCTTTGTTTTTAGGTAAGACTGT
GCCAAAATCCCAAACTTCAGTTGAAAAATAGCCCACCATATATACTTGATATTTTGCCTGATA
CATATCAGCATTTACGACTTATATTGAGTAAATATGATGACAACCAGAAACTTGCCCAACTC
AGTGAGAATGAGTACTTTAAAATCTACATTGATAGCCTTATGAAAAAGTCAAAACGGGCAA
TAAGACTCTTTAAAtccggatccggagagggcaggggatctctcctta cttgtggcga cgtggaggaga a ccccg gccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCT
GGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCA
CCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCC
CACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGA
AG CAGCACGACTTCTTCAAGTCCGCCATG CCCGAAGGCTACGTCCAGGAGCGCACCATCTT
CTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCT
GGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCA
CAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATG GCCGACAAGCAGAAGAAC
GGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCC
GACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACT
ACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCT
GCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgttt attgcagcttata atggtta ca a ataa agca a tagcatca ca a atttca ca a ata a agcatttttttcactgcattctagtt gtggtttgtcca a a ctcatca atgtatcttatGAAG GCAAGGAGAGAATGTATGAAGAACAGTCACAG
GACAGGTAAGAAGAATATTTCAGATGTTTTGGTGTAAA GGTCATTTATGTTGCTTTTTACTT
AATAGTTAACCTAAACGTCACCATGTAATTGTTTTGGGGTAAATGTGAGTCGCTTTGTATAT
agtcatgtggtacttaacgatggggatactttacaagaaatgcatttttagacaatttcattgttgtgtggacatgatagagtgtacttacacagagctagatggtattacctaccgcacacctaggctgtatgatgtagcctattgctcctaggctgcaaatctatacagtatgttactgtactgaatactgtgggcagttgtaacacaatggtaagtatgtgtgtatctaaatatacctaaacatagaaaaggtatggtaaaaatactgtataaaaggccgggcgtgg
pARBI-915: ... Endogenous_13_14 (Length: 1806)
AGATGAGAAAACCTATCCTTCCCAATTTTTTTGTGTGAGAATTAAAATGCAGCAAGAAAACA
CACACTCATAAACACATCTGCTTTGGCAAAGGAGCACATCAGAAGGGCTGGCTTGTGCGCG

CGTGGTTAAGCTCTCGGGGTGTGGACTCCACCAGTCTCACTTCAGTTCCTTTTGCATGAAGA
GCTCAGAATCAAAAGAGGAAACCAACCCCTAAGATGAGCTTTCCATGTAAATTTGTAGCCA

GCTTCCTTCTGATTTTCAATGTTTCTTCCAAAGGTAAGCATAAGAGTCAAAGAAGTCCCAAC

CCAGCTTTCCCTGAAAGTGACTCTCAGTAACTCTTTTGCTTTTTATAGGTGCAGTCTCCAAAG
AGATTACGAATGCCTTGtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGA
GCTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCGGCGAG GGCGAGG GCGATG
CCACCTACGGCAAG CTGACCCTGAAGTTCATCTG CACCACCGG CAAGCTGCCCGTGCCCTG
GCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAGCCGCTACCCCGACCACA
TGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAG GAGCGCACCAT
CTTCTTCAAGGACGACGGCAACTACAAGACCCGCG CCGAGGTGAAGTTCGAGG GCGACAC
CCTGGTGAACCGCATCGAGCTGAAGGG CATCGACTTCAAGGAGGACG GCAACATCCTGGG
GCACAAGCTGGAGTACAACTACAACAG CCACAACGTCTATATCATG G CCG ACAAG CAGAAG
AACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTG CAGCTC
GCCGACCACTACCAGCAGAACACCCCCATCG G CGACGGCCCCGTG CTG CTGCCCGACAACC
ACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGT
CCTGCTGGAGTTCGTGACCGCCG CCG GGATCACTCTCGG CATG GACGAGCTGTACAAatgat tgtttattgcagcttata atggttacaa ata a agcaatagcatca caa atttca ca aata a a gcatttttttc a ctg cattc tagttgtggtttgtccaa a ctcatca atgtatcttatGAAACCTGGGGTGCCTTG GGTCAGGACATCAACT
TGGACATTCCTAGTTTTCAAATGAGTGATGATATTGACGATATAAAATGGGAAAAAACTTCA
GACAAGAAAAAGATTGCACAATTCAGAAAAGAGAAAGAGACTTTCAAGGAAAAAGATACA
TATAAGCTATTTAAAAATGGAACTCTGAAAATTAAGCATCTGAAGACCGATGATCAGGATA
TCTACAAGGTATCAATATATGATACAAAAGGAAAAAATGTGTTGGAAAAAATATTTGATTT
GAAGATTCAAG GTAAGTGTTCATTCCCTTAATTGCTTTATTTCAGTGTGGGTGCTATTTGGC
AAGTTGGAAAATAGCATTTCTAATATTCCCCAGCGCTGACCTCTGCCTCCAGGGGG GCTATA
CAG GAAACCAGATGCATGTCTCCTGCCCCAGTGGAGACTGTGGTCCAA
pARBI-916: CD3E_1_Endogenous_16
AGGATTGTTCTAGTTGATTGGTATGTGTGCA

CTGGATATACTCAACAAATATTTGTTGAGCCAAATACTCAACACCAGCCAAACACGTAGTAT

TTACTTTAGCTTAAGCGAATTATTTAGCCCTGACAGAAGCCCTGGAATGTGG
GTCTTTAAGT
TCCTATTTTTGAGATGGGAAAGCTGAGGCTCACGGAAGGAGGTGACCAGCTCAAGTCTCCT
ACCGTCCATGCCAAATTAGAATTCCAGCCTG
CCTCCTGACTTCAAGTCCAAAGTTCTTCCCAC
GCACTAAAGCTAGCTCTTCAGTGTCCTTTCTTAG GAGGTACTTCCTCCCGCACCACTGACCG
CCCCCTCTCTATTTCACCCCCAG CCCATCCGGAAAGGCCAGCGGGACCTGTATTCTG G CCTG
AATCAGAG ACGCATCtccggatccggaga gggcaggggatctctcctta cttgtggcga cgtggagga ga a ccc cggccccATGGTGAGCAAGGG CGAG GAG CTGTTCACCGGG GTG GTGCCCATCCTG GTCGAG
CTGGACG GCGACGTAAACGG CCACAAGTTCAGCGTGTCCGGCGAG G GCGAGGG CGATGC
CACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGG
CCCACCCTCGTGACCACCCTGACCTACGG CGTG CAGTGCTTCAGCCGCTACCCCGACCACAT
GAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAG G CTACGTCCAGGAGCGCACCATC
TTCTTCAAGGACGACG G CAACTACAAGACCCG CGCCGAG GTGAAGTTCGAG G GCGACACC
CTGGTGAACCG CATCGAGCTGAAG GGCATCGACTTCAAGGAG GACG G CAACATCCTGG GG
CACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGA
ACG GCATCAAG GTGAACTTCAAGATCCG CCACAACATCGAGGACG GCAGCGTGCAG CTCG
CCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTG CTG CTG CCCGACAACCA
CTACCTG AG CACCCAGTCCAAG CTGAGCAAAGACCCCAACGAGAAGCG CGATCACATGGTC
CTGCTG GAGTTCGTGACCGCCGCCG GGATCACTCTCGG CATG GACGAG CTGTACAAatgattg tttattgcagcttata atggttaca a ata a a gcaatagcatca ca a atttca ca aa ta a agcatttttttca ctgcattcta gttgtggtttgtccaa a ctcatca atgtatcttatTGACCCTCTG GAGAACACTGCCTCCCGCTGG CCCAG
GTCTCCTCTCCAGTCCCCCTGCGACTCCCTGTTTCCTGGGCTAGTCTTGGACCCCACGAGAG
AGAATCGTTCCTCAGCCTCATGGTGAACTCGCGCCCTCCAGCCTGATCCCCCGCTCCCTCCTC
CCTGCCTTCTCTGCTGGTACCCAGTCCTAAAATATTGCTGCTTCCTCTTCCTTTGAAGCATCAT
CAGTAGTCACACCCTCACAGCTGGCCTGCCCTCTTGCCAGGATATTTATTTGTGCTATTCACT
CCCTTCCCTTTGGATGTAACTTCTCCGTTCAGTTCCCTCCTTTTCTTGCATGTAAGTTGTCCCC
CATCCCAAAGTATTCCATCTACTTTTCTATCGCCGTCCCCTTTTGCAGCCCTCTCTGGGGATG
GACTGGGTAAATGTTGACAGAGGCCCTGCCCCGTT
pARBI-917: CD3E_2_Endogenous_17_18
AATAATAAAATAATAACAATACTTAACATTTATTGAGTGCTTATTAAGTCTCA

GTACCCAACACTTATCAAGGATTCTTTTTCATGTAATCCTCTCAACAACTATATGGGTTAAGT
ATCATTTTATTCCCATGAGTAAAGGGATGAGGAAACAGAGGGTTTGTGAGTTGAAAACACA

TTTCACGCTTCTCACAGCTAGTGAGTAATAAAGCTGGGACTCAAACCCAGGGCTGTTTGACT
CCAGTGCCTCTACCCACGGCCACCACTCTTTGCTTGTCAATGTTGTTCTAAACATATTGAAGG
GG GGGCTCTGACCGTGGCAAGCGTGTGAGTAGTAAGGGGAGAATGGCCTTCATGCACTCC
CTCCTCACCTCCAGCGCCTTGTGTTTTCCTTGCTTAGTGATTTCCCCTCTCCCCACCCCACCCC
CCACAGTGTGTGAG tccggatccggagagggcagggga tctctcctta cttgtgg cg a cgtggaggag a a ccccg gccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCT
GGACGG CGA CGTAAA CGG CCACAAGTTCAGCGTGTCCGGCGAG GGCGAGG GCGATGCCA
CCTACGGCAAGCTG ACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCC
CACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGA
AG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTT
CTTCAAGGACG ACG GCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCT
GGTGAACCGCATCGAG CTGAAGGG CATCGACTTCAAGGAGGACGGCAACATCCTG G GGCA
CAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAAC
GGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG GACGGCAG CGTGCAGCTCGCC
GACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACT
ACCTGAGCACCCAGTCCAAG CTGAGCAAAGACCCCAACG AGAAGCGCGATCACATG GTCCT
GCTGG AGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgttt attgcagcttata atggtta ca a ataa agca a ta gc atca ca a atttca ca a ata a agc atttttttcactgcattctagtt gtggtttgtcca a a ctcatca atgtatcttatAACTGCATGGAGATGGATGTGATGTCGGTGGCCACAA
TTGTCATAGTGGACATCTG CATCACTGGGGGCTTGCTGCTGCTGGTTTACTACTGGAG CAA
GAATAGAAAGGCCAAGGCCAAGCCTGTGACACGAGGAGCGGGTGCTGGCGGCAG GCAAA
GGGGTAAGGCTGTGGAGTCCAGTCAGAG GAGATTCCTGCCAAGGGGGACGACCAGCCTG
GGCCAGGGTGGGTGGCAAGTCCACAGCTAGGTCAGAACAGCTTCTCTAGAGCTTCTATGCA
CAGCTTCTATTACTGTGATGACAAGATCTCAACAGACGGTTTCAAATCTCACATCACTCCCCT
CCTTCCCATCCTAGAAAAGTGCAAAAAAGTTTATGAAAGTGATGGGCTTCCTCACATACCTG
TCAATG CCTGCAGTCATCCGATTCCGCCCCTAAG CTGTGGG AAGAG AG
pARBI-918: CD3G_1_Endogenous_19_20
ATCCTTCTCTTTAGTTCATCTATTCTACCCAAAGTGATCTCATCATCTGGTATG
CTGTTAGCAG
TTTCTTACCTGTATAGTATCTTCCAAATAACATGCCCCAAAATCCCAAAGTTTTACCCCTACTA
ATTACAGCAATGTCTCTTTTATTCTTCACCCCCTGACGCAGATATTGGCGTCACCCGAGAGC
ATGTTAGTAATGCAGAATCTCCCCTCCCCAGAACTACTAAATAGCACCTGAAATTTTAACAA
GATCCCCATGTGATTCATGTGCACATCAAAGTTTGAG AAACACTACTCTAATGATCTCCTGG
TATG CAGAAG CAG G GAG AATTTCAGAG G CAAGATCCTTAATAGAACCACG G CTTTTCTCAT
TTCAGGAAACCACtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggagaaccccgg ccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTG
GACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCAC
CTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC
ACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAA
GCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCT
TCAAG GACGACGGCAACTACAAGACCCGCGCCG AGGTGAAGTTCGAGGGCGACACCCTGG
TGAACCGCATCGAG CTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGG GCACA
AGCTG GAGTACAACTACAACAGCCACAACGTCTATATCATGG CCGACAAG CAGAAGAACG
GCATCAAG GTG AACTTCAAG ATCCGCCACAACATCG AG GACGGCA GCGTGCAG CTCG CCG
ACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTG CTGCTGCCCGACAACCACTA
CCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACG AGAAGCGCGATCACATGGTCCTG
CTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatTTGGTTAAGGTGTATGACTATCAAGAAGATGGTTCGGTA
CTTCTGACTTGTGATGCAGAAGCCAAAAATATCACATGGTTTAAAGATGGGAAGATGATCG
GCTTCCTAACTGAAGATaaaaaaaaaTGGAATCTGGGAAGTAATGCCAAGGACCCTCGAGGG
ATGTATCAGTGTAAAGGATCACAGAACAAGTCAAAACCACTCCAAGTGTATTACAGAAGTA
TGTAATCCCCTTTGGTCTGTTTGTTGTGAAATTAATCAGTATTTGCTGTTCTGGTGAGCTTTT
TATCTGGGGTGAAAGTGGAAATAG ATCCTCAACAGTAATATTATCGCCTGTTCTCTTAATTT
CAGCTTGCCTCTTTTAAAATACTGTAAGATACTTCCCTCACCCTATTGAAAAACTACAGCCAG
TCCTGTAAAATTTTGTTTACCTTTGGGTGG GCTCCATGG

pARBI-919: CD3G_3_Endogenous_21
ACCAGGCGTGGAGGCGCGCGCCTGTAGTCCCAGCTACTCAGGAGGCTGAGGCAGGAGGA
CCACTTGAACCCAGGAGGTCGAGGTTGCAGTGAGCTGCGATTGTGCCACTGCACTCCAGCC
TGGGCAACAGAGAAAGACTCCGTCTCAAAAAAAAAAAAGAGAGAGAGAGAAAAAGAAAA
AAGACAGAGCCTCCATCTCCTTGTCCTCTTTCCATCCTCAGGACCATGAAGTACCCACTCCAA
ATTCTCACATATAAAAAACATTCAATAAACATGCATCAAATTAATTAATAGAGGATGGAAAA
AATGACTTATGACTGTGCTGTCCTTTCCAGCCCCTCAAGGATCGAGAAGATGACCAGTACAG
CCACCTTCAAGGAAACCAGTTGtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTG
GTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGG
CGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTG
CCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGA
CCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGC
ACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGC
GACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATC
CTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGC
AGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGC
AGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGA
CAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACA
Aatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttc act gcattctagttgtggtttgtccaaactcatcaatgtatcttatAGGAGGAATTGAACTCAGGACTCAGAGTA
GGTGGGTTCTTCAATGCCAATTCTAATAAAGGACCCTTGCATCAACTGCCCTCGCAATTG CT
TCTAAGTCTAGCTCCCTTCCCTAAGCGGCTATAAGCATCAGACTCTGGGGATCAGGGATTG
GGACGTGGTTTGGGGTACTCTTTTCTAAAAATTCTGGGGCCATACTGATTGTCTTGGCCTAG
GTAAATATGAATTTTATGTATCTGTAAATCCTGTCAGAGCAGGGCCTCAAGCCATAGAGATG
CTGAATATTAATCTTAACCTACATTTGAATTTCTCATTATCTACACTATTAACATTTTGGGCTA
ATTAATTATTTGTGATGAGGGGCTAGCCTGTGCATTGTAGGAGTTATGGAAGCATCCCTGG
CCTCTCTCCACCAGATGCTGGTAGATTGTCCAGTGTGACAATCAAAAAT
pARBI-920: CD5_1_Endogenous_22
GACCTCAGGTGATCTGCCCGCCTCAGCTTCCCAAAGTGCTGGGATTACAGGCATGAGCCAC
CACACCTGGCCTAAAATTAATTTTTAAAGATCTCTTAAAAAGCAGACACCAGCCCCAATCTC
AGACCCCTTGAGACAGAATTTCCAGGACAGGGGCCATCCTGCTGGACAGTGGGTGCCGAG
AACACCTTGCCCATTTATCTGAGCTCCCTTCTGACTCTGAAATCTGGAGCCCCACCCTCCTGG
GTCTAGCTTCGGGGCTGCCTGGGTCAGGGTCCTCTGGGAAGCCCCTGCAGTGCCCCAGAAG
GGACGAAGCTCACAAGGGGCAAGGCAGGCAGCCCACGGGGCAGGAGGGAGCTCAACTGG
GCGTCCTAGGGAGAG GGCAGTGAGGGGTGCCAGTG GGGAACCCCTCCCAGCCTGACCCCC
ACCACACCTTTCTGACCCCCAGATtccggatccggagagggcaggggatctctccttacttgtggcgacgtgg aggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCC
TGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAG
GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCG
TGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCC
GACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC
GCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG
GCGACACCCTG GTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACA
TCCTGGGG CACAAGCTG GAGTACAACTACAACAGCCACAACGTCTATATCATG GCCGACAA
GCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGT
GCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCC
GACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGAT
CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGT
ACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcattttt tt cactgcattctagttgtggtttgtccaaactcatcaatgtatcttatTTCCAGGCAAGGCTCACCCGTTCCAAC
TCGAAGTGCCAGGGCCAGCTGGAGGTCTACCTCAAGGACGGATGGCACATGGTTTGCAGC
CAGAGCTGGGGCCGG AGCTCCAAGCAGTGGGAGGACCCCAGTCAAGCGTCAAAAGTCTGC
CAGCGGCTGAACTGTGGGGTGCCCTTAAGCCTTGGCCCCTTCCTTGTCACCTACACACCTCA
GAGCTCAATCATCTGCTACGGACAACTGGGCTCCTTCTCCAACTGCAGCCACAGCAGAAAT

GACATGTGTCACTCTCTGG GCCTGACCTGCTTAG GTG G GTAACTAGCCAGCCACACG GG CA
CCCTGGGCCTGGGCGCCAGCCCCGAGGAGACTGCCCGAGGCCTGTGATCTAGGGTCTGAG
CAGGCTGGTGGAAGGGGTGGGGGGACCCCAGTTTATAACCACTCCCCAAGACACATACC
pARBI-921: CD5_2_Endogenous_23
GGACAGGGGCCATCCTGCTGG
CCCTTCTGACTCTGAAATCTGGAGCCCCACCCTCCTGGGTCTAGCTTCGGGGCTGCCTGGGT
CAGGGTCCTCTGGGAAGCCCCTGCAGTGCCCCAGAAGGGACGAAGCTCACAAGGGGCAAG
GCAGGCAGCCCACGGGGCAGGAGGGAGCTCAACTGGGCGTCCTAGGGAGAGGGCAGTGA
GGGGTGCCAGTGGGGAACCCCTCCCAGCCTGACCCCCACCACACCTTTCTGACCCCCAGATT
TCCAGGCAAGGCTCACCCGTTCCAACTCGAAGTGCCAGGGCCAGCTGGAGGTCTACCTCAA
GGACGGATG GCACATGGTTTGCAGCCAGAGCTGGGGCCGGAGCTCCAAGCAGTGGGAGG
ACCCCAGTCAAGCGTCAAAAGTCTGCtccggatccggagagggcaggggatctctccttacttgtggcgacg tggaggaga a ccccggccccATGGTGAGCAAGGGCGAGGAG CTGTTCACCGGG GTGGTGCCCAT
CCTGGTCGAGCTGGACGG CGACGTAAACG GCCACAAGTTCAG CGTGTCCG GCGAG GGCGA
GGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCC
GTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAGCCG CTACCC
CGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGG AG
CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAG G
GCGACACCCTG GTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACA
TCCTGGGG CACAAGCTG GAGTACAACTACAACAGCCACAACGTCTATATCATG GCCGACAA
GCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCG CCACAACATCGAGGACG GCAG CGT
GCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG GCGACGGCCCCGTGCTGCTGCCC
GACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAG CGCGAT
CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGGCATGGACGAGCTGT
ACAAatgattgtttattgcagcttata atggtta ca a a ta aagcaatagcatcaca a atttca ca a ata a agcattttttt ca ctgcattctagttgtggtttgtccaa a ctcatca atgtatctta tCAGCGGCTGAACTGTGG G
GTGCCCTTA
AG CCTTGG CCCCTTCCTTGTCACCTACACACCTCAGAGCTCAATCATCTG CTACGGACAACT
GG GCTCCTTCTCCAACTGCAGCCACAGCAGAAATGACATGTGTCACTCTCTGGGCCTGACCT
GCTTAGGTGGGTAACTAGCCAGCCACACGGGCACCCTGG GCCTGGGCGCCAGCCCCGAGG
AGACTGCCCGAGGCCTGTGATCTAGGGTCTGAGCAG GCTGGTGGAAGGGGTGGGGGGAC
CCCAGTTTATAACCACTCCCCAAGACACATACCCAGGAGGGGGACTGGAAGGGGCCAGCA
CCCATCTGTAG GATG G CAATG GAG GACCTAGTTCTG CCAATCACTGACTTCATCGTCG CCTC
TGAACCTCCATTCTCCCATCTGTGAAGTGGGGTGGTACTTCCCGCCTCGCAGGAG GCT
pARBI-922: CD5_3_Endogenous_24
tgctgggattacaggcatgagccaccacacctggcctaaaattaatttttaaagaTCTCTTAAAAAGCAGACAC
CAGCCCCAATCTCAGACCCCTTGAGACAGAATTTCCAGGACAGGGGCCATCCTGCTGGACA
GTGGGTGCCGAGAACACCTTGCCCATTTATCTGAGCTCCCTTCTGACTCTGAAATCTGGAGC
CCCACCCTCCTGGGTCTAGCTTCGGGGCTGCCTGGGTCAGGGTCCTCTGGGAAGCCCCTGC
AGTGCCCCAGAAGGGACGAAGCTCACAAGGGGCAAGGCAGGCAGCCCACGGGGCAGGAG
GGAGCTCAACTGGGCGTCCTAGGGAGAGGGCAGTGAGGGGTGCCAGTGGGGAACCCCTC
CCAGCCTGACCCCCACCACACCTTTCTGACCCCCAGATTTCCAGGCAAGGCTCACCCGTTCC
AACTCGAAGTG Ctccggatccggaga gggcaggggatctctcctta cttgtggcga cgtggagga ga a ccccggc cccATG GTGAGCAAG GGCGAGGAG CTGTTCACCGGGGTG GTG CCCATCCTGGTCGAGCTG
GACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCAC
CTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC
ACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAA
GCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCT
TCAAG GACGACGGCAACTACAAGACCCGCGCCG AGGTGAAGTTCGAGGGCGACACCCTGG
TGAACCGCATCGAG CTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGG GCACA
AGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACG
GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCG
ACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTG CTGCTGCCCGACAACCACTA
CCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTG
CTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAG CTGTACAAatgattgttta ttg ca gcttata a tggtta ca a ata aagcaatagcatcacaaatttca ca a ata a agca tttttttca ctgcattctagttg tggtttgtcca a a ctcatca a tgtatctta tCAGGG CCAGCTGGAG GTCTACCTCAAG GACGGATGGCA

CATGGTTTGCAGCCAGAGCTGGGGCCG GAGCTCCAAGCAGTGGGAGGACCCCAGTCAAGC

GTCAAAAGTCTG CCAGCGGCTGAACTGTGG GGTGCCCTTAAG CCTTGGCCCCTTCCTTGTCA
CCTACACACCTCAGAGCTCAATCATCTGCTACGGACAACTGGGCTCCTTCTCCAACTGCAGC
CACAGCAGAAATGACATGTGTCACTCTCTGGGCCTGACCTGCTTAGGTGGGTAACTAGCCA
GCCACACG GGCACCCTGGGCCTGGGCG CCAGCCCCG AGG AGACTGCCCG AG GCCTGTGAT
CTAGGGTCTGAGCAGGCTGGTGGAAGGGGTGGGGGGACCCCAGTTTATAACCACTCCCCA
AG ACACATACCCAG GAGGGG GACTGGAAG GGG CCAGCACCCATCTGT
pARBI-923: EDF1_1_Endogenous_25
GGGGGCGGCCGCACGGCTAGAGCGGAGACCCCGCGCCCCCTCCGCCCGCGTGG
GGGGTGCGG
GGCCCGGGGAGGCACGGGGGCTGCGCGTCGGGGCGCAGCCGCCGCCCGC
GTGTGCTCGGAGGCCGCGGGGCCCGGGCTCCGGGGTCCTCCCGACCTGCAGCCCCAGCGG
CTACCGCGCCTCGCCAGGCCAGGCCAGGCCCCGACGTCGCCTTCCCTACGTCGCCGGCGCC
CGGCCACGACGTCCCTCAGACGAGCCGAACGCCGAATGGCCCCGAGCACGGGAAGTGCCC
GCCCCCCGCGTGCAGCCAGCCAATGGGACGCCGAAAGCGGGGAGGTGCCGAGGGGACGT
AG CGTCG CCGCGCCAGGTCTCTAG CAGCTGCCGCTG AGCCGCCG G ACG G ACGCTCGTCTTC
GCCCG CCATGG CCGAGAGCGACTGGGACACGtccggatccggagagggcaggggatctctccttacttgt ggcga cgtggaggaga a ccccgg ccccATG GTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTG GT
GCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATG CCACCTACG GCAAG CTGACCCTG AAGTTCATCTG CACCACCG GCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTG ACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGA GCTGAAGGG CATCG ACTTCAAGGAG GA
CGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATG
GCCGACAAGCAGAAG AACGGCATCAAG GTGAACTTCAAGATCCGCCACAACATCGAGG AC
GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tGTGACGGTGCTGCG
CAAGA
AG GGCCCTACGGCCGCCCAGG CCAAATCCAAGCAGGTGCTTTG CTGACTGAG GAGCGG CC
GGGCCGGGCGGGGCGGACGCTgggccggggcca cgga ccgtggggcagggggctcgggga a gggcaggc ggggcctgggacagggcgcaggggacggggacaaggctggGGTGCCGGGGGCAGGACGGGGTTCGAA
GGGCGGGGCGAATCGCAGG GGTAGGAGGACCGGGCCAGGACaggggtggga aggtggggggcg gacccaggaggggatgga cggacccaggaggcagggggCGGGCGACTGG AGGAGGAGAGGAGCAGG
GAACCGGACGGG GCAAGGGGCAG GCGATAGGGCCGGGACCAGTGCCAGGGTGGGGCGG
GGAGGG GATAGGGGCAGGGACTGTTGGTG GGGACAGGTGTGAGGACAG
pARBI-924: EDF1_2_Endogenous_26
ACGGCTAGAGCGGAGACCCCGCGCCCCCTCCGCCCGCGTGGCCCG
CCGGGGAGGCACGGGGGCTGCGCGTCGGGGCGCAGCCGCCGCCCGCGTGTGCTCGGAGG
CCGCGGGGCCCGGGCTCCGGGGTCCTCCCGACCTGCAGCCCCAGCGGCTACCGCGCCTCGC
CAGGCCAGGCCAGGCCCCGACGTCGCCTTCCCTACGTCGCCGGCGCCCGGCCACGACGTCC
CTCAGACGAGCCGAACGCCGAATGGCCCCGAGCACGGGAAGTGCCCGCCCCCCGCGTGCA
GCCAGCCAATGGGACGCCGAAAGCGGGGAGGTGCCGAGGGGACGTAGCGTCGCCGCGCC
AG GTCTCTAGCAG CTGCCGCTGAGCCGCCG GACG GACGCTCG TCTTCGCCCGCCATGGCCG
AG AGCGACTGG G ACACG GTGACGGTGCTGt ccggatccggagagggcaggggatctctcctta cttgtgg cga cgtggagga ga a ccccggccccATG GTG AGCAAGGGCGAGGAGCTGTTCACCGG GGTGGTG C
CCATCCTGGTCGAGCTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGAGG
GCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCT
GCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCT
ACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCA
GGAGCG CACCATCTTCTTCAAG GACG ACG GCAACTACAAGACCCGCGCCGAG GTGAAGTTC
GAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGG GCATCG ACTTCAAGG AG GACGGC
AACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCG
ACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCG AG GACGG CA
GCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCT
GCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCG
CGATCACATGGTCCTGCTGGAGTTCGTGACCGCCG CCGGGATCACTCTCGGCATGGACGAG

CTGTACAAatgattgtttattgcagcttataatggttacaa ata a agca atagcatca ca a atttca ca a ata aagca tttttttca ctgca ttctagttgtggtttgtcca a actcatca atgtatcttatCGCAAGAAGGGCCCTACGGCCG
CCCAG GCCAAATCCAAGCAGGTGCTTTGCTGACTGAGGAGCGGCCGGGCCGG GCGGGGC
GGACGCTgggccggggccacggaccgtggggcagggggctcggggaagggcaggcggggcctgggacagggcgc agggga cggggacaaggctggGGTGCCGGGGGCAGGACGGGGTTCGAAGGGCGGGGCGAATCG
CAGGGGTAGGAGGACCGGG CCAGGACaggggtgggaaggtggggggcgga cccaggaggggatggac ggacccaggaggcagggggCGGGCGACTG GAGGAGGAGAGGAGCAGG GAACCGGACGGGGCA
AG GG GCAGGCGATAG G GCCGGGACCAGTG CCAGGGTGGG GCGGGGAGGGGATAGGGG
CAGGGACTGTTGGTGGGGACAGGTGTGAGGACAGAGGCAGGAGGTG
pARBI-925: EDF1_3_Endogenous_27
CCCTGGAGTCG
CCATTTCAAAGCGGTAACCAGACCCCAGAGGCTGCCTGAACTCAAGGGGACATGGGAACC
CAGGCTGTCCCAAGTTCACACACCTCAGGTCGTGGACTTCCAGAGTTTTCTCTCAGTTATGA
GGACAGGCAGCTGTACTCATGCCAACCAGAGCTGCTCTGGGAAGATGGCTGCCTCCCAGG
GCTCCGGCCAGCACCGGTGCAGGCAGGCACTCAGCTGACAACGTCCCCGGGGGCGCCGCA
CACACCACACCCACCAGCACATGGACCCCACAGCACAGCCTCATGTTGCAAGCGGAAACAC
AAGTACCTACATTTCTTGGAAGTCTCCACATCTTCTCCTCGTCTCTGTG CCG CTAAGATAG CC
TAGAAAATTAGAAAACATCAGTG Gtccggatccgga gagggcaggggatctctccttacttgtggcgacgtg gaggaga a ccccggccccATGGTGAGCAAGGG CGAGGAGCTGTTCACCGGGGTGGTGCCCATCC
TGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAG
GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTG CACCACCGG CAAGCTGCCCG
TGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCC
GACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC
GCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG
GCGACACCCTG GTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACA
TCCTGGGG CACAAGCTG GAGTACAACTACAACAGCCACAACGTCTATATCATG GCCGACAA
GCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGT
GCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG GCGACGGCCCCGTGCTGCTGCCC
GACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAG CGCGAT
CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGGCATGGACGAGCTGT
ACAAatgattgtttattgcagcttata atggtta ca a a ta aagcaatagcatcaca a atttca ca a ata a agcattttttt ca ctgcattctagttgtggtttgtccaa a ctcatca atgtatctta tTTTCTAAAGAGAAGATGTGCTTTATAC
ACATCAGCTCCACTAGGACTGCTGCAAAACCAGGCGCGAAGCGTCGCCTGAGAACAGCAG
CTTCTAGGGCCCCTGGGGTACGCACCCACACG CAGCTGGGTTTTGTGCGAGAGG CAGTGA
GAGCCGGGCTGACCTGGCTTCCCGAGGCATTCCCTGCAGGGGAACCAGGGGGGTGGAGCC
CACCCTCCCCGTGCTATCTGAACACCGGCCATCCCCCTCCCCAAACCCACAACCCCAGGAGT
GAGAGCCCCGGCGCAGCCTCACCGTGGCCAGGTCCTTCTGCGTAAGCCCCTTGCTCTGCCG
ACCTTGCTGGATCACCTTG CCCACCTCCAGGGTCACCCTGTCATGGTGCAGCTCCTCTGTCTC
CCGGTCCAGCTTGG CCGTGTTCTTGGTAATAGAATGTTGTTTGTTCTGGCCAGCAGC
pARBI-926: FTL_1_Endogenous_28_30
CTCTTGCTTCAACAGTGTTTGGACG
CCTCCGATTTCCTCTCCGCTTGCAACCTCCGGGACCATCTTCTCGGCCATCTCCTGCTTCTGG
GACCTGCCAGCACCGTTTTTGTGGTTAGCTCCTTCTTGCCAACCAACCATGAGCTCCCAGATT
CGTCAGAATTATTCCACCGACGTGGAGGCAGCCGTCAACAGCCTGGTCAATTTGTACCTGC
AGGCCTCCTACACCTACCTCTCTCTGGTGAGTCCCCAGGACGCCCCTGGCCCTAATTTCCTCC
AG CTG CGCACCTCCGGCCCTCACTGCACGCGCCAGCCTTCTTTGTGCGGTCG GGTAAACAG
AG GGCGGAGTCCCCTTG GCCTCGCCTCCCG CTAACCATTGTTG CCTCCATCTCTTCCCGTAG
GGCTTCTATTTCGACtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggagaacccc ggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGC
TG GACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGAGGGCGAG GGCGATG CC
ACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG GCAAGCTG CCCGTGCCCTG GC
CCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCG CTACCCCGACCACATG
AAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT
TCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCC
TGGTGAACCGCATCGAGCTGAAGG GCATCGACTTCAAGGAGGACGGCAACATCCTGGG GC
ACAAGCTG GAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAG CAGAAGAA
CG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGC

CGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCAC
TACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATG GTCC
TGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgt ttattgcag cttataatggtta ca a ata a agcaatagcatcaca aatttcacaa ata a agcatttttttca ctgcattcta g ttgtggtttgtccaaactcatcaatgtatcttatCGCGATGATGTGGCTCTG GAAG GCGTGAGCCACTTC
TTCCGCGAATTGGCCGAGGAGAAGCGCGAGGGCTACGAGCGTCTCCTGAAGATGCAAAAC
CAGCGTGGCGGCCGCGCTCTCTTCCAGGACATCAAGGTAACTAGTGTGTGGGTAATGGACT
ACATCTCCCAGCAGGCCGTGCGCGCGAGGAGCCTTGATTTGAGGGCGTAGGTGTCGCGTG
GG CTTCTGG GAGATTGAGTTCGGTCTTGTGAG CCCTCTTAACCGCTGGAAATAGAG GCG CA
CCTCGTGCAGTGCCCACAACACGCGGCAGTCCACACCG CTGCGTGGTCTTAGG GACGTATA
GCTGTAAGAGCTAGGACAGGGTGCGGAGAGTGATAAATACAAGCTGTCACATGTCTTTGT
GG CCTGGGCCTCTG ACCCCCAACGACTCTTGGGAAATGTAG GTTTAGTTCT
pARBI-927: FTL_2_Endogenous
TGGGGCGGGGGGCTGAGACTCCTATGTGCTCCGGATTGGTCAGGCACGG
GCCTCCTGCCACCGCAGATTGGCCGCTAGCCCTCCCCGAGCGCCCTGCCTCCGAGGGCCGG
CGCACCATAAAAGAAGCCGCCCTAGCCACGTCCCCTCGCAGTTCGGCGGTCCCGCGGGTCT
GTCTCTTGCTTCAACAGTGTTTGGACGGAACAGATCCGGGGACTCTCTTCCAGCCTCCGACC

GCCATCTCCTGCTTCT
GGGACCTGCCAGCACCGTTTTTGTGGTTAGCTCCTTCTTGCCAACCAACCATGAGCTCCCAG
ATTCGTCAGAATTATTCCACCGACGTGGAGGCAGCCGTCAACAGCCTGGTCAATTTGTACCT
GCAGGCCTCCTACACCTACtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggaga accccggccccATGGTGAGCAAGGGCGAG GAGCTGTTCACCGGGGTGGTGCCCATCCTGGTC
GAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGG CGAGGGCGA
TGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCT
GGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCAC
ATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCA
TCTTCTTCAAG GACGACGGCAACTACAAGACCCGCGCCGAG GTGAAGTTCGAGG GCGACA
CCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAGGAGGACGG CAACATCCTGG
GGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAA
GAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTG CAG CT
CGCCGACCACTACCAGCAGAACACCCCCATCGGCG ACGGCCCCGTGCTGCTGCCCGACAAC
CACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAAGCGCGATCACATG
GTCCTGCTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGGCATGGACGAGCTGTACAAatg attgtttattgcagcttata atggttacaa ata a agcaatagcatca ca a atttca ca aata a a gcatttttttc a ctg ca t tctagttgtggtttgtcca a a ctcatca atgtatctta tCTCTCTCTG GTGAGTCCCCAGGACG CCCCTGG
C
CCTAATTTCCTCCAGCTGCGCACCTCCGGCCCTCACTGCACGCGCCAGCCTTCTTTGTGCGGT
CG GGTAAACAGAGGGCGGAGTCCCCTTGGCCTCGCCTCCCGCTAACCATTGTTGCCTCCATC
TCTTCCCGTAGGGCTTCTATTTCGACCGCGATGATGTGG CTCTGGAAG G CGTGAGCCACTTC
TTCCGCGAATTGGCCGAGGAGAAGCGCGAGGGCTACGAGCGTCTCCTGAAGATGCAAAAC
CAGCGTGGCGGCCGCGCTCTCTTCCAGGACATCAAGGTAACTAGTGTGTGGGTAATGGACT
ACATCTCCCAGCAGG CCGTGCGCGCGAGGAG CCTTGATTTGAGGG CGTAGGTGTCGCGTG
GG CTTCTGGGAGATTGAGTTCGGTCTTGTGAGCCCTCTTAACCGCTGGA
pARBI-928: PTEN_1_Endogenous_31
CTTTGCATAGTTTATCCTATTAGTAATCTATTCTGTCTTTGGAATATGTTTTGTGATGATGAA
ATAAATACTATAAATAGTATTATTCCTTTTGCATTGAGAGTCCTGACGAAATGTCCATGTGA
CAGTTCATTTTGGGTTTAGCTCTACCTCTAATATGTGACCTATGCTACCAGTCCGTATAGCGT
AAATTCCCAGAATATATCCTCCTGAATAAAATGGGGGAAAATAATACCTGGCTTCCTTAATG
ATTATATTTAAGACTTATCAAGAGACTATTTTCTATTTAACAATTAGAAAGTTAAGCAATACA
TTATTTTTCTCTGGAATCCAGTGTTTCTTTTAAATACCTGTTAAGTTTGTATGCAACATTTCTA
AAGTTACCTACTTGTTAATTAAAAATTCAAG AGTTTTTTTTTCTTATTCTG AG GTTATCTTTTT
ACCACAGTTtccggatccggagagggcaggggatctctcctta cttgtggcga cgtggaggaga a ccccggccccA
TGGTG AGCAAGG GCGAG GAG CTGTTCACCG G GGTGGTGCCCATCCTGGTCGAGCTGGACG
GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACG
GCAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAGCTG CCCGTGCCCTGG CCCACCCTC
GTGACCACCCTGACCTACGGCGTGCAGTG CTTCAGCCGCTACCCCGACCACATGAAGCAGC
ACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAG
GACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAAC

CGCATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAACATCCTGGGGCACAAGCTG
GAGTACAACTACAACAG CCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCA
AGGTGAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTCGCCGACCACTA
CCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGC
ACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAAG CGCGATCACATGGTCCTG CTG GAG
TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagctt ata a tggtta ca a a ta a agca a tagca tca ca a a tttca ca a a ta aa gcatttttttca ctgcattctagttgtggtttgtc ca a actcatca a tgtatctta tGCACAATATCCTTTTGAAGACCATAACCCACCACAG CTAGAACTTA
TCAAAC CCTTTTGTG AAG ATCTTG A CCAATG G CTAAGTG AAG ATGACAATCATGTTG CAG CA
ATTCACTGTAAAGCTG GAAAGGGACGAACTGGTGTAATG ATATGTGCATATTTATTACATC
GGGGCAAATTTTTAAAGGCACAAGAGG CCCTAGATTTCTATGGGGAAGTAAGGACCAGAG
ACAAAAAGGTAAGTTATTTTTTGATGTTTTTCCTTTCCTCTTCCTGGATCTGAGAATTTATTG
GAAAACAGATTTTGGGTTTCTTTTTTTCCTTCAGTTTTATTGAGGTGTAATTGACAAGTAAAA
ATTATATATAAATACAATGTATAATATGATGTTTTGATGTATGTGTATATACATTGTGAAATG
ATTACTACAGTCAAACTACTTAACATATTCAT
pARBI-929: PTEN_2_Endogenous_32
TAATACCTGGCTTCCTTAATGATTATATTTAAGACTTATCAAGAGACTATTTTCTATTTAACA
ATTAGAAAGTTAAGCAATACATTATTTTTCTCTGGAATCCAGTGTTTCTTTTAAATACCTGTT
AAGTTTGTATGCAACATTTCTAAAGTTACCTACTTGTTAATTAAAAATTCAAGAGTTTTTTTTT
CTTATTCTGAGGTTATCTTTTTACCACAGTTGCACAATATCCTTTTGAAGACCATAACCCACC
ACAGCTAG AACTTATCAAACCCTTTTGTGAAGATCTTGACCAATGGCTAAGTGAAGATGACA
ATCATGTTGCAGCAATTCACTGTAAAGCTG GAAAG G G AC GAACTG GTGTAATG ATATGTG C
ATATTTATTACATtccggatccggagagggcaggggatctctcctta cttgtggcga cgtggaggaga a ccccgg c cccATGGTGAGCAAGGGCGAGGAG CTGTTCACCGGGGTG GTG CCCATCCTG GTCG AG CTG
GACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCAC
CTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC
ACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAA
GCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CTACGTCCAGGAGCGCACCATCTTCT
TCAAG GACGACGGCAACTACAAGACCCGCGCCG AGGTGAAGTTCGAGGGCGACACCCTGG
TGAACCGCATCGAG CTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGG GCACA
AGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACG
GCATCAAG GTG AACTTCAAG ATCCGC CACAACATCG AG GACGGCAGCGTGCAG CTCG CCG
ACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTG CTGCTGCCCGACAACCACTA
CCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACG AGAAGCGCGATCACATGGTCCTG
CTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAG CTGTACAAatgattgttta ttg ca gcttata a tggtta ca a a ta aagca a tagca tcacaa a tttca ca a a ta a agca tttttttca ctgcattctagttg tggtttgtcca a a ctcatca a tgtatctta tCG GGGCAAATTTTTAAAG GCACAAG AG
GCCCTAGATTTC
TATGGGGAAGTAAGGACCAGAGACAAAAAGGTAAGTTATTTTTTGATGTTTTTCCTTTCCTC
TTCCTGGATCTGAGAATTTATTGGAAAACAGATTTTGGGTTTCTTTTTTTCCTTCAGTTTTATT
GAGGTGTAATTGACAAGTAAAAATTATATATAAATACAATGTATAATATGATGTTTTGATGT
ATGTGTATATACATTGTGAAATGATTACTACAGTCAAACTACTTAACATATTCATCACCTCAC
ATAATTATTATTCTCCCCCCAGGGTGAAAGCATTTAAGATCTACAAGCTACAATTTTCAATTA
TACAATG TTATTATTAA CTATAGT CACTATG CTGTCCAGTA G AG CTTCAG ATCTTGTTCATCT
TGTGTTCCTCCCTCCCCACCCTCAGTCCCTGG AA
pARBI-930: PTEN_3_Endogenous_33
ATAAATAGTATTATTCCTTTTGCATTGAGAGTCCTGACGAAATGTCCATG
TGGGTTTAGCTCTACCTCTAATATGTGACCTATGCTACCAGTCCGTATAGCGTAAATTCCCA
GAATATATCCTCCTGAATAAAATGGGGGAAAATAATACCTGGCTTCCTTAATGATTATATTT
AAGACTTATCAAGAGACTATTTTCTATTTAACAATTAGAAAGTTAAGCAATACATTATTTTTC
TCTGGAATCCAGTGTTTCTTTTAAATACCTGTTAAGTTTGTATGCAACATTTCTAAAGTTACC
TACTTGTTAATTAAAAATTCAAGAGTTTTTTTTTCTTATTCTGAGGTTATCTTTTTACCACAGT
TGCACAATATCCTTTTGAAGACCATAACCCACCACAGCTAGAACTTATCAAACCCTTTTGTGA
AG ATCTTGACtccgga tccggagagggcagggga tctctcctta cttgtggcga cgtggagg aga a ccccggcccc ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGAC
GG CGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC
GGCAAGCTGACCCTGAAGTTCATCTG CACCACCGG CAAGCTGCCCGTGCCCTGGCCCACCC

TCGTGACCACCCTGACCTACGG CGTG CAGTGCTTCAGCCGCTACCCCGACCACATGAAGCA
GCACGACTTCTTCAAGTCCGCCATG CCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCA
AG GACGACGGCAACTACAAGACCCGCGCCG AGGTGAAGTTCG AGG GCGACACCCTGGTG A
ACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGC
TGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCAT
CAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGG CAG CGTGCAGCTCGCCGACCA
CTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTG
AG CACCCA GTCCAAGCTGAG CAAAGACCCCAACGAG AAGCGCGATCACATGGTCCTGCTG
GAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgc agcttataatggtta ca aata a agca atagcatcaca a atttca ca a ata a agcatttttttca ctgcattctagttgtggt ttgtcca a a ctca tca a tgta tcttatCAATG GCTAAGTGAAGATGACAATCATGTTGCAG CAATTCAC

TGTAAAG CTGG AAAGGGACGAACTGGTGTAATGATATGTG CATATTTATTACATCG GG GCA
AATTTTTAAAGG CACAAG AG G CCCTAGATTTCTATGG G GAAGTAAGGACCAG AG ACAAAA
AG GTAAGTTATTTTTTGATGTTTTTCCTTTCCTCTTCCTG GATCT GAGAATTTATTG GAAAAC
AGATTTTGGGTTTCTTTTTTTCCTTCAGTTTTATTGAGGTGTAATTGACAAGTAAAAATTATA
TATAAATACAATGTATAATATGATGTTTTG ATGTATGTGTATATACATTGTGAAATGATTA CT
ACAGTCAAACTACTTAACATATTCATCACCTCACATAATTATTATTCTCCCCCCAGGGTGAAA
GCATTTAAGATCTACAAGCTACAATTTTCAATTAT
pARBI-931: PTPN2_1_Endogenous_34
GTCTTAACCCTTGGTTCCTACccaagtttgtctctctagtcctcaactctgaaagccacttgatatcttacacaatt
ccttttttgcctgagaaaataaaagtcagttttgatcatttacaaccaaaaatcctaactaacacaGACCTGTTTTTG
AAATCAGGTAGATTGGAAGTCTTGGTTTACTTCTTATAAGCCCGTCCGCTGTCTGTCTTGTA
ACATCCCAGGAGCAGACTTAACAAGGCCTTCCTGGAGCCTTGCTCTTTCTGTCTGCTTCCCTC
ATCTGctctctctcctctctctTCACAGGGTCCACTTCCTAACACATGCTGCCATTTCTGGCTTATGG
TTTGGCAGCAG AAGACCAAAGCAGTTGTCATGCTGAACCGCtccggatccggagagggcaggggat ctctccttacttgtggcga cgtgga ggaga a ccccggc cc cATGGTGAG CAAGGGCGAGGAGCTGTTCAC
CG GGGTGGTGCCCATCCTGGTCGAGCTGGA CGGCGACGTAAACG GCCACAAGTTCAGCGT
GTCCGGCGAGGG CGAG GGCG ATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCAC
CACCG GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAG
TGCTTCAGCCGCTACCCCG ACCACATGAAGCAGCACGA CTTCTTCAAGTCCGCCATGCCCGA
AG GCTACGTCCAG GAGCG CACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCC
GAGGTGAAGTTCGAG GGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTC
AAGGAGGACGGCAACATCCTG GGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTC
TATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACA
TCGAG GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACG
GCCCCGTG CTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCC
CAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTC
GGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaaata a agca atagcatca ca a a tttcaca a a ta a a gcatttttttcactgcattctagttgtggtttgtcca a a ctcatca a tgtatctta tATTGTGGAG A
AA GAATCG GTGAGTAATATTTAATATTTACAACTTAGTATACTGTATGTGATCATTAGCATA
TAAAGATTTTCATTTTGGGGCCAATATTCATTCCTTAGTTTTGGCCTTTAATTGCTAAAGG CT
AG CG GTCAAAGTGTTTCTGTCCG GAAAAA CTCATGTATCCTTTTCCTTTCTTAAGTATTTTTA
TAACAATTG CAGA CCTTTCATCTCCAAGATAGAAATACAATG GAATAAATATG GA CTAG CA
GTGACATATAGATCTTTG GAATCCCTAG G AATCCTTATTGACCTATTGTCAATAAGTCCTCTT
CCAGCTTTAAGATCTTGTACTGTTCTACTTCAGATTTCTTACTTTATTCTACATTTCATAATTG
CTGGTTTTTAATGATGATGAAAATTTCCTAATCCACTTCTAATTAGATGTGGCAATGACATGC
pARBI-932: PTPN2_2_Endogenous_35
CAGAATTTGAATATGATGTGGCTGACCATAGATACCTCCATTGCATTAATTCTG
AGTGTTTGATGCTTGCTTGTAGGTGTTGAATAGAGCCTAATTAGCAGTAAGTAAACAGATCT
agagcacagactttggcatctgaaggaccttggttcaatgctgggactgccacttactagctatgggatctttggtggatt
tttaacctttgaaagcctttcatccctcacagatataatggggaagaaaatcgtttccattagtattgtgaggattaaatg
attaattgatgtaaacaatgtgctttccatagtacttggcatatcataagtgcttcatGTTATTTTACGCTGGCTGG
GAA GATAAGTTTTG CTGTG G AGAATTTAA GAG G GATATTAAAATATATTTTT GTTATTTTAA
GGAAATTCGAAATGAGTCCCATGACtccggatccggagagggcaggggatctctccttacttgtggcgacgt ggagga ga a ccccggccccATGGTGAGCAAGGG CG AGGAGCTGTTCACCGGGGTGGTGCCCATC

CTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCG AGGGCGA
GGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCC
GTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCC
CGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGG AG
CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAG G
GCGACACCCTG GTG AACCGCATCGAG CTGAAGG G CATCGACTTCAAGGAGGACGGCAACA
TCCTGGGG CAC AA G CTG GAGTACAACTACAACAGCCACAACGTCTATATCATG G CC G A CAA
GCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGT
GCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG GCGACGGCCCCGTGCTGCTGCCC
GACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAG CGCGAT
CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGGCATGGACGAGCTGT
ACAAatgattgtttattgcagcttata atggtta ca a a ta aagcaatagcatcaca a atttca ca a ata a agcattttttt ca ctgcattctagttgtggtttgtccaa a ctcatca atgtatctta tTATCCTCATAGAGTGG
CCAAGTTTCCA
GAAAACAGAAATCGAAACAGATACAGAGATGTAAGCCCATGTAAGTACTTGTGGGTTTGTG
TGCATGTGTATTTTTTGGTTTGTTTTAGGAGGCAGGATGTAGTGAAAAGGAGGGCTTGGGG
TCAGTGTTGATAAATACAGTAATTTTTTTCCTATTTACGTAAGACACGTTCTTTAAGAATTTA
AAGGTGTAGAATAGTGGAAGGAAAAATAATTACCCATAATTATTTCATCCTTCAAACATAG
GCACCATTATTTTGGTGTATTTCCCACAACCTGTTTTTCTTATGCACAGTTTTCCTTTCTTTAA
AATAATTGTAATTATAGATGTCTATATAATATCCATATGTATTTCCTTTTCATATCATTCCATC
CCCTTTGTAAGTTTTGCTTAAAATAGTTATTGGTGTAGTTTGGTTTTTT
pARBI-933: PTPN2_3_Endogenous_36
ACCAGTTATCGTTTTGTAGGTAGACCAATCAATTCTTAG
TTTACTTTTAGGAAGGTAAGTACTTTATTCATTTAACTATGTTTACTCCTGGTCGATTTTTCAA
GTCACAATGGCTAATGTGCTACAAACAGAAGTGCTTCCTTTCCAGCACTATAGTAATAAACA
AGATTGCATTTTATACTCCTTACAAAAAAAGTAGAGAATAGTTTAGGATTTGTCTCAACTCTA
TTATGATGCTAATTCATTTTTCTTATTTCTTCTGTCTTTTTATAAATACCTAGAATATTAATATG
AAGATAAAAACAGTAATTTTGAAATGACAGTTGTGGTTTATCATTCTCTATTTTCAGATGATC
ACAGTCGTGTTAAACTGCAAAATGCTGAGAATGATTATATTAATGCCAGTTTAGTTGACATA
GAAGAGGCAtccgga tccggaga ggg caggggatctctccttacttgtggcga cgtggaggaga a ccccggcccc ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGAC
GGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG GCGAGGGCGATGCCACCTAC
GGCAAGCTGACCCTGAAGTTCATCTG CACCACCGG CAAGCTGCCCGTGCCCTGGCCCACCC
TCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCA
GCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCA
AG GACGACGGCAACTACAAGACCCGCGCCG AGGTGAAGTTCG AGGGC GACACCCTG GTG A
ACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGC
TGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCAT
CAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGG CAG CGTGCAGCTCGCCGACCA
CTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTG
AG CACCCA GTCCAAGCTGAG CAAAGACCCCAACGAG AAGCGCGATCACATGGTCCTGCTG
GAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAG CTGTACAAatgattgtttattgc agcttataatggtta ca aata a agca atagcatcaca a atttca ca a ata a agcatttttttca ctgcattctagttgtggt ttgtcca a a ctca tca a tgta tcttatCAAAGGAGTTACATCTTAACACAGGCAAGTAATACGATACC
ACGCAAATGTCTGAAAATGATGTGTTTGTGCTTGTTCTGCTTTAAATCATAGGTAAAAGTAT
ATCTTGACTTCTTTTGGAAATGAAATGGTATTTCAGTTTTTTTCTGACGATGTAATGAATTAT
GGACATTATAAGGTTTTGAAGCTTTGAGTATTTAAGATAAAAGGCAAGTTATTTTTGATATT
ACACGTTCTGTGAAGGAAAATTCTTAGGAAATGGCTTAGGCCAGTTCTTTGGCAGATTGTG
TTCCTTACATTATATCTGACACAGAGTGCTGCTTATGCTTCTTAGTGTCTTCTTTTCTCTTCCC
ATACCCTGTGGCAAAACCAG AGGCCCTGGACAGCTCTTCTGTAGCTCCCCCTGCCTCGCATA
TATCTTCAGAGAGTACACCAAGCCCTGGATGGTGT
pARBI-934: _Endogenous_37
CCTGCTCAAGGGCCGAGGTGTCCACGGTAGCTTCCTGG
GGTGACTTCTCGCTCTCCGTCAGGTAGGTGGGCCCCCCGCAACCCCGGGCATTTTGGCCACT
GTCTGTTCCCTTGCCCCCAACCCCCACACTCCCCATCCCTGTCTGTGCCCACCCATGCCCATG
TGTGCCCCCACCCAGGACCTCAGCCGATCCCTGCCCTCCTGCCTCTACTCCTGCACCGACTG
GCCTCACCGCCTGGTGCCCTGCAGGGTGGGGGATCAGGTGACCCATATTCG GATCCAGAAC

TCAGG GGATTTCTATGACCTGTATG GAGG G GAGAAGTTTGCGACTCTGACAGAG CTG GTG
GAGTACTACACTCAGCAGtccggatccggaga gggcagggg atctctcctta cttgtggcga cgtggagga ga a ccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCG
AG CTG GACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCGGCGAGGGCGAG GGCGAT
GCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCT
GGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCAC
ATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCA
TCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACA
CCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAGGAGGACGG CAACATCCTGG
G G CACAAG CTG G AGTACAACTACAACAG CCACAACGTCTATATCATG G CC GACAAGCAGAA
GAACGGCATCAAG GTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTG CAG CT
CGCCGACCACTACCAGCAGAACACCCCCATCGGCG ACGGCCCCGTGCTGCTGCCCGACAAC
CACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAAGCGCGATCACATG
GTCCTGCTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGGCATGGACGAGCTGTACAAatg attgtttattgcagcttata atggttacaa ata a agcaatagcatca ca a atttca ca aata a a gcatttttttca ctg ca t tctagttgtggtttgtcca a a ctcatca atgtatctta tCAGGGTGTCCTGCAG GACCG CGACGGCACCAT

CATCCACCTCAAGTACCCG CTG AACTGCTCCGATCCCACTAGTGAGAGGTGAGGGCTCCGC
ACCCCCGCCATTCCCAAGCAGGGATGAGCCGGCTCCCACCCTGAACAGCCAGGGAGGCAG
GGAGACTGGCAGCCG GCGCTGCCTACCCTCCATCCCCTCCCCTCCCTGCACCAGCTGGGGCT
CTCAATGTCCCTCCTCCCTGCTGTCCTGGG ACCTGGTGTCTCAGAGCCTAACCTACCACCCTT
TCCACCTAACCCCGAGGAAG CCACAGAAAGCTGCCTCG CCCTACTCCGGGAGCCCTGGCCG
CTGCAACCCAGGTCCCACTGGAGACAGGGAGGCCACTGCTGGTGGCCAG CATGTCGTGCA
GG CCAGCTCTGTTGTTAG AAAG CTCTTCTTCCTCTG G AATCG AG CCTG CCT
pARBI-935: PTPN6_2_Endogenous_38
CTGGGTTCGAAGCCCGGTTAGAACTCTGGAGGCTAGGATGGCTTGAACCTGGGAG
GGCTGCAGAGAGCTGTAACCGCGCCACTGCACTCCAGCCTGGGCAACAGAGCTCTGGAAG
CTTGCCCTAGAGTCAGTCAAGGGCCCTAGGCCAGTGAGTAACAGCTCAGCGTCAGTTTCCT
CATCTATAAAATGGGGGTAATATCATACCTAGCTCTCAGCATGTTTGTGAGAGACCTAAATG
AGGTGGTGGATTTGGAAGCATGTAGCGCAGTGCCTGGCACACAGTAGGTGCTTGATTTCCG
GCCCCTCTCTGTGAATGTCTCTGCTCAGCGCCTTCCCCTGTGGCCTGGGTCTTACCTTCCCTG
ACGCTGCCTTCTCTAGGTGGTACCATGGCCACATGTCTGGCGGGCAGGCAGAGACGCTGCT
GCAGG CCAAG GG CGAGCCCTG Gtccgga tccggagagggca ggggatctctccttacttgtggcga cgtgg a ggaga a ccccggccccATGGTGAGCAAGG G CGAG GAG CTGTTCACCGGG GTG GTGCCCATCCT
GGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGG
GCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGT
GCCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTG CTTCAGCCGCTACCCCG
ACCACATGAAG CAG CACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAG CG
CACCATCTTCTTCAAG GACGACGG CAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGG C
GACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAG GAGGACGGCAACATC
CTGGG GCACAAG CTG GAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAG C
AGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGG CAG CGTGC
AG CTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTG CCCGA
CAACCACTACCTG AGCACCCAGTCCAAGCTGAGCAAAGACCCCAACG AGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACA
Aatgattgtttattgcagcttata atggtta ca a a ta aagcaatagcatcacaa atttcaca a ata a agcatttttttca ct gcattctagttgtggtttgtccaa a ctcatca atgtatctta tACGTTTCTTGTGCGTGAGAGCCTCAG CCAG

CCTGGAGACTTCGTGCTTTCTGTGCTCAGTGACCAGCCCAAGG CTGGCCCAGGCTCCCCGCT
CAGGGTCACCCACATCAAGGTCATGTGCGAGGTAAGGCAGCCAGGCGGCGGGGGAGCCTC
TGCTGAGGCTCCTGTCTGTGACCACAGTGTGGGTGGCAGGGAGGGTCTGCCTGGGCTTGA
ATTCAAGG CTG GGGACCCAG G GAG GGAGACTCAAGTCCTGTGAATG GCCTAATTTGGCTC
CCCCCAGGGTG GACGCTACACAGTGGGTGGTTTGGAGACCTTCGACAG CCTCACGGACCTG
GTGGAGCATTTCAAGAAGACGGGGATTGAGGAGGCCTCAG GCGCCTTTGTCTACCTGCGG
CAGGTCAGGGGTGGGCCCAGCTGCCTCCCCACTTCCCCTGAGCTGTCCCCCAGATGT
pARBI-936: PTPN6_3_Endogenous_39
TGTCACATATGTGCAATGCCATGCTCCTGAGCCTTTGATTGCAGACGTGTGG
CCCGTCCCCACCCCCAGTGCCACCCTGCTCTGCTTCTCTTCCCTTGCTGTGCTCTAAAACGAG
AAGTACAAGTGAGTTCCCCCAAGGGGTCGGCCGCGCCTCTTCCTGTCCCCGCCCTGCCGGC
TGCCCCAGGCCAGTGGAGTGGCAGCCCCAGAACTGGGACCACCGGGGGTGGTGAGGCGG
CCCGGCACTGGGAGCTGCATCTGAGGCTTAGTCCCTGAGCTCTCTGCCTGCCCAGACTAGCT
GCACCTCCTCATTCCCTGCGCCCCCTTCCTCTCCGGAAGCCCCCAGGATGGTGAGGTAAGGG
CCTGCCACCCACGGTAGACAGGAGGCAAGGGTGCCTGGTGCCCACGGGACCCCTCCTCACT
GCCCTGCCTGGGCCGCCCAGGtccggatccggagagggcaggggatctctccttacttgtggcgacgtggagg agaaccccggccccATGGTGAGCAAGG GCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGG
TCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGAGGGCGAG G GC
GATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGC
CCTGGCCCACCCTCGTGACCACCCTGACCTACG G CGTGCAGTGCTTCAG CCG CTACCCCGAC
CACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCG AAGGCTACGTCCAGGAGCGCA
CCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAG GTGAAGTTCGAGG GCG
ACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCC
TGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCA
GAAGAACGGCATCAAGGTGAACTTCAAGATCCG CCACAACATCGAGGACGGCAGCGTGCA
GCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCG ACGGCCCCGTGCTGCTGCCCGAC
AACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACA
TGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATG GACGAGCTGTACAA
atgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aatttca ca a ataa agcatttttttca ctg cattctagttgtggtttgtcca aa ctcatca atgtatcttatTGGTTTCACCGAGACCTCAGTG GG CTGGATG

CAGAGACCCTGCTCAAG GGCCGAGGTGTCCACGGTAGCTTCCTGGCTCGGCCCAGTCGCAA
GAACCAGGGTGACTTCTCGCTCTCCGTCAGGTAGGTGG GCCCCCCGCAACCCCGGGCATTT
TGGCCACTCTCTTGTG CCATCCAGG CCCTGAACCACTCATTCCTGGTTCCCCGTGGCAGTGC
TGACTCCCCGTCTGTTCCCTTGCCCCCAACCCCCACACTCCCCATCCCTGTCTGTGCCCACCC
ATGCCCATGTGTGCCCCCACCCAGGACCTCAGCCGATCCCTGCCCTCCTGCCTCTACTCCTGC
ACCGACTGGCCTCACCGCCTGGTGCCCTGCAGGGTGGGGGATCAGGTGACCCATATTCGG
ATCCAGAACTCAG G GGATTTCTATG ACCTGTATG GAG G G GAGAAGTTTG
pARBI-937: PTPRC_1_Endogenous_40
al I I AG CTAATGTTCTG TAG
GCATCAATATTTGTATATATGCATGCATATATGTGTATGTATATTTATGAATACATATACACA
CCATATACATACACTTATGTATGGTGTGTGCATATGTATGTGTAGATATATGTACATACACA
CTATAGTGGACTGGGGAGTTAGTATACTGGGAGGAGCATACATTTAGGGTATGATTCACAT
ATTTATTTTGTCCTTCTCCCATTTTCCATTAATTAACAGGATTGACTACAGCAAAGATGCCCA
GTGTTCCACTTTCAAGTGACCCCTTACCTACTCACACCACTGCATTCTCACCCGCAAGCACCT
TTGAAAGA GAAAATGACTTCTCAGAG ACCACAACTTCTCTTA GTCCAGACAATACTTCCA CC
CAAGTATCCCCGtccgg atccggagagggcaggggatctctcctta cttgtggcga cgtggaggaga a ccccggcc ccATGGTGAGCAAGGGCGAGG AGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGG
ACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCT
ACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCAC
CCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATG AAG
CAGCACGACTTCTTCAAGTCCG CCATGCCCGAAG GCTACGTCCAGGAG CG CACCATCTTCTT
CAAGGACGACGG CAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGT
GAACCGCATCGAGCTGAAGGG CATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAA
GCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGG
CATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG GACGG CAG CGTGCAGCTCGCCGAC
CACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACC
TGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCT
GGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattg cagcttata atggtta ca a ataa agca a ta gcatca ca a atttca ca a ata aagcatttttttca ctg ca ttctagttgtg gtttgtcca a a ctca tca a tgtatcttatGACTCTTTGGATAATGCTAGTGCTTTTAATACCACAGGTTG
GCACACAAAAGTTGTTAACTTAAATATCAGGGAATGTCATTTAGAAAATTCTACAGTTATCA
GTACAACTTGTCTTTAAATTATTTGCACAGTTTCTAAGTATGTGATTTTATTCAAGTG CAG AA
ATTGCAGGAAATTAGTATCTGTGAAATATAGATCGACTGAAGTAATTAATGCATGTTGTTAG
GGAGTGGAGAGAGAAAAGAAGGGAAGCAAGATCTCCAAGGACAATCAGGAGGGGAAATT
TGTTCTAGTATCCTCTGATCTATACACACTCGCCATGATTCTCCCGCTTGGCTTCCCGCCACCT
GGACAGATGAGAATTTCCCTAGTTCAGAGATTATCAGTTCTACTCTCCTTG GAAGGTGTCTT
AAATGGGAGTCTTCCCATTTCTTTGTTTCACTCTAGA

pARBI-938: PTPRC_2_Endogenous_41
AGGGAATTTTTATATGTTGGTCATATTATATCCCTAGCATGTGG
ATAGGTGCTCAATTAATCATCATTATCTAAATAAATAATGCATTTGGGAAAAAAAAGTTTCA
AAAGTTTTTCAAAAGTCTTTTGCAGGCTTGAAATTAATCCCAATAGTGATCCTTTAGTCTGTT
ATGTTTCTGATTTAGCCTGGGGATTCAAAAAATAAATAACATAATTTTGATATATTTGGGCU
TGTAAACATGGTACTAAGAGAGGAAATATAGTTTCATTAGGGTAAAAGCTACTGAAAATTG
CCACTTG GTGAATGTTCTATCATAGACTTGAGGTACATATAAAAATCTAATATATGTTTACAT
TAATATGAATGAAATTTGAAATTTTCTAAGAGATTTTTGTTTCTTCTTTGCAGGGCAAAGCCC
AACACCTTCCCCCtccgga tccggagagggca ggggatctctccttacttgtggcgacgtgga ggaga a cc ccggc cccATG GTGAGCAAG GGCGAGGAG CTGTTCACCGGG GTG GTG CCCATCCTG GTCG AG CTG
GACGGCGACGTAAACGG CCACAAGTTCAGCGTGTCCGGCGAG G GCGAGGG CGATGCCAC
CTACGGCAAG CTGACCCTGAAGTTCATCTGCACCACCGGCAAG CTGCCCGTGCCCTGGCCC
ACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CCGCTACCCCGACCACATGAA
GCAGCACGACTTCTTCAAGTCCGCCATG CCCGAAGGCTACGTCCAG GAGCGCACCATCTTCT
TCAAG GACGACG GCAACTACAAGACCCGCG CCG AGGTGAAGTTCGAGGGCGACACCCTGG
TGAACCGCATCGAG CTGAAGGGCATCGACTTCAAGGAGGACG G CAACATCCTGGG GCACA
AG CTG GAGTACAACTACAACAGCCACAACGTCTATATCATGG CCGACAAG CAGAAGAACG
GCATCAAG GTGAACTTCAAGATCCGCCACAACATCGAG GACG GCAGCGTGCAGCTCGCCG
ACCACTACCAGCAGAACACCCCCATCGGCGACG GCCCCGTG CTG CTG CCCGACAACCACTA
CCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTG
CTGGAGTTCGTGACCG CCGCCGGGATCACTCTCGG CATGGACGAG CTGTACAAatgattgttta ttg ca gcttata a tggtta ca a ata aagcaatagcatcacaaatttca ca a ata a agca tttttttca ctgcattctagttg tggtttgtcc a a a ctc a tca a tgta tctta tACTG G TAA G
AATTAATATTTATATTTTTACTAATTTTATTT
TCTTGTTGCAAAGTTTATATATTTAACTACAATTTTCTATTATTAACACTG AAATTATTTTTAA
GGATAAATTTTATAATCATGAGTGATTCTTGACATTCACTTGTTCTTAAACTTTCTGCTTATAC
GTTATAGAGTTTAATAACTACCTAAACATGTTATTAAATTTGTATATATATTTTGTGTATAAA
TAG TAACTTTTCCCAAA CTTG ACAG TAAATCACACAA CAG G TTTCTACTCTCTTTTAATATTTT
AAG A CTATAAAAAAATG CATTTAAATTAGATAACAAAATTTTATAGTCTGAAAGCAGGTTAA
CAG CTGTCTATGTATG TTATAG ATATG TAG ATAACAG ATTTG CATATGTCTATATTTCTTTAA
GA G TATG TTG CTTTTTTCAATG GTATG CA
pARBI-939: PTPRC_3_Endogenous_42
CCTTTAGCACCATAAAGAAACTAAATTATTTAGATGTTTTTATGAGAA
CI III CTGTCATCCAATACTTCCACAAATAAATCATTAGTTCTTGCTAATCTTCATCTGGCATA
AAAATAATGACATCAACTTTCTTCATGTAATTTCCCACTTAATTCCTTTACTAGGAGCAATAT
CAATTCCTATATGACGTCATTGCCAGCACCTACCCTGCTCAGAATGGACAAGTAAAGAAAAA
CAACCATCAAGAAGATAAAATTGAATTTGATAATGAAGTGGACAAAGTAAAGCAGGATGCT
AATTGTGTTAATCCACTTG GTG CCCCA G AAAAG CTCCCTG AA G CAAAGGAACAGGCTGAAG
GTTCTGAACCCACGAGTGGCACTGAG GGG CCAGAACATTCTGTCAATGGTCCTGCAAGTCC
AGCTTTAAATCAAGGTtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggagaacc ccggccccATGGTGAGCAAG GGCGAGGAGCTGTTCACCG G GGTGGTGCCCATCCTGGTCGAG
CTGGACG GCGACGTAAACGG CCACAAGTTCAGCGTGTCCGGCGAG G GCGAGGG CGATGC
CACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGG
CCCACCCTCGTGACCACCCTGACCTACGG CGTG CAGTGCTTCAGCCGCTACCCCGACCACAT
GAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAG G CTACGTCCAGGAGCGCACCATC
TTCTTCAAGGACGACG G CAACTACAAGACCCG CGCCGAG GTGAAGTTCGAG G GCGACACC
CTGGTGAACCG CATCGAGCTGAAG GGCATCGACTTCAAGGAG GACG G CAACATCCTGG GG
CACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATG GCCGACAAGCAGAAGA
ACG GCATCAAG GTGAACTTCAAGATCCG CCACAACATCGAGGACG GCAGCGTGCAG CTCG
CCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTG CTG CTG CCCGACAACCA
CTACCTG AG CACCCAGTCCAAG CTGAGCAAAGACCCCAACGAGAAGCG CGATCACATGGTC
CTGCTG GAGTTCGTGACCGCCGCCG GGATCACTCTCGG CATG GACGAG CTGTACAAatgattg tttattgcagcttata atggttaca a ata a a gcaatagcatca ca a atttca ca aa ta a agcatttttttca ctgcattcta gttgtggtttgtccaa a ctcatca atgtatcttatTCATAGGAAAA GACATAAATGAG GAAACTCCAAAC
CTCCTGTTAG CTGTTATTTCTATTTTTG TAG AAG TAG G AAGTGAAAATAGGTATACAGTG GA
TTAATTAAATGCAGCGAACCAATATTTGTAGAAGGGTTATATTTTACTACTGTGGAAAAATA
TTTAAGATAGTTTTGCCAGAACAGTTTGTACAGACGTATGCTTATTTTAAAATTTTATCTCTT
ATTCAGTAAAAAACAACTTCTTTGTAATCGTTATGTGTGTATATGTATGTGTGTATGGGTGT

GTGTTTGTGTGAGAGACAGAGAAAGAGAGAGAATTCTTTCAAGTGAATCTAAAAGCTTTTG
CTTTTCCTTTGTTTTTATGAAGAAAAAATACATTTTATATTAGAAGTGTTAACTTAGCTTGAA
GG ATCTGTTTTTAAAAATCATAAACTGTGTG CA G ACTCAATA
pA RB I- aa gga a a a aggca ctgagtgctggggggtgctggggtgggctgcagtg ataga catcagggtaga ggtta a ggtca g 1806 940: gttcagcctca ctggggtgaagtttgagcacggtgagcaggccatgcagcccgggggaggggaggatgggaggaggt PTP RCA P
ggagctttccgggcagagggaacagccagtgcgaaggccccaggcaggtggcttaatgcagctgttgggggaggtgag _1_En dog tggtagggaggaggctggagggatgggggctgatctcacagggccaga gcctggttga cca a ata aggc cttggccttt en ous_43 tctGCTTGGCTGTCCCAAGAGGATCCCAAAGAGAAAAAAACGAAAGTGGTCTTGGTCACCCA
GCCTGCCCCACACCAGGCCCCACCCCAGGTGCTGAGCCCTCTGAGCCCCTGCCTGTCTCCCA
CAGGCTCTGCCCTGCtccggatccggagagggcaggggatctctcctta cttgtggcga cgtggaggaga a cccc ggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGC
TG GACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGAGGGCGAG GGCGATG CC
ACCTACG G CAAGCTGACCCTGAAGTTCATCTG CACCACCGGCAAGCTGCCCGTG CCCTGGC
CCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCG CTACCCCGACCACATG
AAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT
TCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCC
TGGTGAACCGCATCGAGCTGAAGG GCATCG ACTTCAAGG AGGACGGCAACATCCTGGG GC
ACAAGCTG GAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAG CAGAAGAA
CG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGC
CGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCAC
TACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCC
TGCTG GAG TTCGTGACCGCCGCCGG GATCACTCTCGGCATG G ACGAGCTGTACAAatgattgt ttattgcag cttataatggtta ca a ata a agcaatagcatcaca aatttcacaa ata a agcatttttttca ctgcattcta g ttgtggtttgtccaaactcatcaatgtatcttatACCTTAGGGCTCGGGATGCTGCTGGCCCTGCCAGGG
GCCTTGGGCTCGGGTGGCAGCGCGGAGGACAGCGTGGG CTCCAGCTCTGTCACCGTTGTCc tgctgctgctgctgctcctactgctgGCCACTGGCCTAGCACTGGCCTGGCGCCGCCTCAGCCGTG AC
TCAGG GGGCTACTACCACCCGGCCCGCCTAGGTGCCGCGCTGTGG G GCCG CACGCGGCGC
CTGCTCTGGGCCAGCCCCCCAGGTCG CTGGCTGCAGGCCCGAGCTGAG CTGGGGTCCACA
GACAATGACCTTGAGCG ACAG GAG GATGAG CAGG ACACAGACTATGACCACGTCGCGGAT
GGTGGCCTGCAGGCTGACCCTGGGGAAGGCGAGCAGCAATGTGGAGAGGCGTCCAGCCC
AG AGCAGGTCCCCGTGCGGGCTGAGGAA GCCAG AGACAGTGACACG
pA RB I- aaggtcaggttcagcctcactggggtga agtttgagcacggtgagcaggccatgcagcccgggggaggggaggatggg 1806 941: aggaggtgga gctttccggg cagaggga a cagccagtgcga a ggccccaggcaggtggcttaatgcagctgttggggg PTP RCA P aggtgagtggtagggaggaggctgg agggatggggg ctga tctca cagggccagagcctggttga cca a ataaggcct _2_Endog tggccttttctGCTTGGCTGTCCCAAGAGGATCCCAAAGAG AAAAAAACG AAAGTGG TCTTG GT
en ous_44 CACCCAGCCTGCCCCACACCAG GCCCCACCCCAGGTGCTG AG CCCTCTG AGCCCCTGCCTGT
CTCCCACAGGCTCTGCCCTGCACCTTAGGGCTCGGGATGCTGCTGGCCCTGCCAGGGGCCT
TGGGCTCG GGTGG CA GCG CGGAGGACAGCtccggatccgga gagggcaggggatctctccttacttgtgg cga cgtggagga ga a ccccggccccATG GTG AGCAAGGGCGAGGAGCTGTTCACCGG GGTGGTG C
CCATCCTGGTCGAGCTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCGGCGAGG
GCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCT
GCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCT
ACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCA
GGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTC
GAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGG GCATCG ACTTCAAGG AG GACGGC
AACATCCTG G G GCACAAG CTG GAG TACAACTACAACAG CCACAACGTCTATATCATG G CCG
ACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCG AG GACGG CA
GCGTG CAGCTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACGG CCCCGTGCTGCT
GCCCG ACAACCACTACCTGAGCACCCAGTCCAAG CTGA GCAAAGACCCCAACGAGAAGCG
CGATCACATGGTCCTGCTGGAGTTCGTGACCGCCG CCGGGATCACTCTCGGCATGGACGAG
CTGTACAAatgattgtttattgcagcttataatggttacaa ata a agca atagc atca ca a atttca c a a ata aagca tttttttca ctgca ttctagttgtggtttgtcca a actcatca atgtatcttatGTGGGCTCCAGCTCTGTCACCGT
TGTCctgctgctgctgctgctcctactgctgGCCACTGGCCTAGCACTGGCCTGGCGCCGCCTCAGCC
GTGACTCAGGGGGCTACTACCACCCGGCCCGCCTAGGTGCCGCGCTGTGGGGCCGCACGC
GGCGCCTGCTCTGG GCCAGCCCCCCAGGTCGCTGGCTGCAGG CCCGAGCTG AGCTGGG GT

CCACAGACAATGACCTTGAG CGACAG GAGG ATGAGCAGGACACAGACTATGACCACGTCG
CGGATGGTGGCCTGCAGG CTG ACCCTGGGGAAGGCGAGCAGCAATGTGGAGAGGCGTCC
AG CCCAGAGCAG GTCCCCGTGCG G GCTGAGGAAG CCA GAGACAGTG ACACG GAGG GCGA
CCTGGTCCTCGGCTCCCCAGGACCAGCGAGCG CAGGGGGCAGTGCTGAGGCCCTGCTGAG
pARBI-942: PTPRCAP_3_Endogenous_45
cagtgcgaaggccccaggcaggtggcttaatgcagctgttgggggaggtgagtggtagggaggaggctggagggatg
ggggctgatctcacagggccagagcctggttgaccaaataaggccttggccttttctGCTTGGCTGTCCCAAGAG
GATCCCAAAGAGAAAAAAACGAAAGTGGTCTTGGTCACCCAGCCTGCCCCACACCAGGCCC
CACCCCAGGTGCTGAGCCCTCTGAGCCCCTGCCTGTCTCCCACAGGCTCTGCCCTGCACCTT
AGGGCTCGGGATGCTGCTGGCCCTGCCAGGGGCCTTGGGCTCGGGTGGCAGCGCGGAGG
ACAGCGTGGGCTCCAGCTCTGTCACCGTTGTCctgctgctgctgctgctcctactgctgGCCACTGGCC
TAGCACTGGCCTGGCGCCGCCTCAGCCGTGACTCAGGGGGCTACTACtccggatccggagagggc aggggatctctccttacttgtggcgacgtggaggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCT
GTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTT
CAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCAT
CTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGC
GTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCAT
GCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACC
CGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATC
GACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCAC
AACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCC
ACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG
GCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAA
AGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGAT
CACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatag catcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttat CAC
CCGGCCCGCCTAGGTGCCGCGCTGTGGGGCCGCACGCGGCGCCTGCTCTGGGCCAGCCCC
CCAGGTCGCTGGCTGCAGGCCCGAGCTGAGCTGGGGTCCACAGACAATGACCTTGAGCGA
CAGGAGGATGAGCAGGACACAGACTATGACCACGTCGCGGATGGTGGCCTGCAGGCTGAC
CCTGGGGAAGGCGAGCAGCAATGTGGAGAGGCGTCCAGCCCAGAGCAGGTCCCCGTGCG
GGCTGAGGAAGCCAGAGACAGTGACACGGAGGGCGACCTGGTCCTCGGCTCCCCAGGACC
AGCGAGCGCAGGGGGCAGTGCTGAGGCCCTGCTGAGTGACCTGCACGCCTTTGCTGGCAG
CGCAGCCTGGGATGACAGCGCCAGGGCAGCTGGGGGCCAGGGCCTCCATGTCACCGCACT
GTAGAGGCCGGTCTTGGTGTCCCATCCC
pARBI-943: RPS23_1_Endogenous_46
ATAGAGTAGGGCGGGGGATGCCATGGAGAGGCTCCATGGGGGAGGGCCGGGGAAGCGC
CGCTCCAGGAGGCACGTGGTCCGGCGCGGAAGGGGCCCATGAGGCGTGGAGGCCGCCGA
GGTCGGGGTACCGAGGGACGCAGGGAGGCCAGCGCTTCCTCCCGGGCATTCGAGCGGGG
CCTCGTCCTTCGGGAGAACACATTCTCCGGAGCCCTCTTCGAACGTTTATTAGTCGGTTCAG
GGCAACTTGAAGGCCAAATGTTTGGCCCACAGGCCAATAAATAGTACGAGAGCCAATCGG
CTTAAGGGTTTATTCCAGGTGAGGCGAGTGTCTTAGAAGATGGGAAACACGTAGATGGCG
TGTTTTTACGGAAGAACTAAAATATTTAATTTTTAGGCAAGTGTCGTGGACTTCGTACTGCT
AGGAAGCTCCGTAGTCACCGACGAGACCAGtccggatccggagagggcaggggatctctccttacttgtg gcgacgtggaggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTG
CCCATCCTGGTCGAGCTGGACGG CGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAG
GGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAG
CTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCG
CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCC
AGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG CCGAGGTGAAGT
TCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACG
GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGC
CGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGG
CAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTG
CTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGC
GCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGA
GCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagc atttttttca ctgcattctagttgtggtttgtcca aactcatcaatgtatcttatAAGTGGCATGATAAACAGTATA
AGAAAGCTCATTTG GG CACAG CCCTAAAG G CCAACCCTTTTGG AG GTG CTTCTCATG CAAA
AG GAATCGTG CTG GAAAAAGTGTAAGTCCATTG CTCCC GTCAAGTTTTAGTTTATTATAG GA
ATTCGAGACATGAACTTACGAATTCTTGTTTTGAAAGTAATTGCAGGTTTTTGTGTAGTAGT
ATTCATTTGG GCATTGTGGGGTAAAATTGCAAAGCGTTTGTTCTATTTAAAAGTTGGTAAAA
TTAGTTTTTG G GAATTAG GTAG TTAAG GTTTTAATTTAACGTTG G CCTG GAAG GAATTG GA
GAAGATACTAGCAATGATGAAGTAAAGGACACAAACACCTTTACTGTGGGAGTTGTTATAA
GTAAATGGCACGTGTCAGCTATTGAACTTTATCGACTTGATAAAACTAAGGTGAAGAGA
pARBI-944: RPS23_2_Endogenous_47_48
CAAGCGGGACTTGGGGTCTTGGGGACGGGCGGGCGGATGCGAATAGAGTAG
GATGCCATGGAGAGGCTCCATGGGGGAGGGCCGGGGAAGCGCCGCTCCAGGAGGCACGT
GGTCCGGCGCGGAAGGGGCCCATGAGGCGTGGAGGCCGCCGAGGTCGGGGTACCGAGG
GACGCAGGGAGGCCAGCGCTTCCTCCCGGGCATTCGAGCGGGGCCTCGTCCTTCGGGAGA
ACACATTCTCCGGAGCCCTCTTCGAACGTTTATTAGTCGGTTCAGGGCAACTTGAAGGCCAA
ATGTTTGGCCCACAGGCCAATAAATAGTACGAGAGCCAATCGGCTTAAGGGTTTATTCCAG
GTGAGGCGAGTGTCTTAGAAGATG GGAAACACGTAGATGGCGTGTTTTTACGGAAGAACT
AAAATATTTAATTTTTAGGCAAGTGCCGCGGAtccggatccggagagggcaggggatctctccttacttgt ggcga cgtggaggaga a ccccgg ccccATG GTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTG GT
GCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA
GCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCC
GCTACCCCGACCACATGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAGGG CATCGACTTCAAGGAG GA
CG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAG CCACAACGTCTATATCATG
GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGAC
GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC
TG CTGCCCGACAACCACTACCTGAG CACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAA
GCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tCTTCGTACTGCTAGGAAGCT
CCGTAGTCACCGACGAGACCAGAAGTGGCATGATAAACAGTATAAGAAAGCTCATTTGGG
CACAGCCCTAAAGGCCAACCCTTTTGGAGGTGCTTCTCATGCAAAAGGAATCGTGCTGGAA
AAAGTGTAAGTCCATTGCTCCCGTCAAGTTTTAGTTTATTATAGGAATTCGAGACATGAACT
TACGAATTCTTGTTTTGAAAGTAATTG CAG GTTTTTGTGTAGTAGTATTCATTTG G G CATTGT
GG GGTAAAATTG CAAAG CGTTTGTTCTATTTAAAAGTTGGTAAAATTAGTTTTTG G GA ATTA
GG TAGTTAAG GTTTTAATTTAA CGTTG G CCTG GAAGGAATTG G AGAAGATACTAG CAATGA
TGAAGTAAAGGACACAAACACCTTTACTGTGGGAGTTGTTATAAGTAAATGGCACGTGTCA
pARBI-945: RTRAF_1_Endogenous_49
TTGTATATCCTTTTTAAAGTTATTCTTTTTAGTTAATTG
CAGAAAGCTTAGATTCTAGTCCCAGGTGTAAGTTGTGTAACCCTTGGCAAGTGTCAATCTCT
AGGCCTCAGCTTTCTCATCTATAAAATGAGGAAGTTGTCGTATTCTATTTTTTTTCTTAAGAT
GATACACTTAAATGTTCCCTTCTGTTGGGTTATATAATTGCATCAAAAGTGTAGTAATGTTAT
TAAAAAATTGTTAGAGATCCAAACTAAGGTCTCTTTCAACTCTCCCATTCTTTTTTCTGTGACT
TTATGGTAATAATGAAACTG GTG GTTTTCTTTTCTTCCCCCTCACAGTATCTCAG AG ATGTTA
ACTGTCCTTTCAAGATTCAAGATCGACAAGAAGCTATTGACTG GCTTCTTG GTTTAGCTGTT
AGACTTGAAtccggatccggaga gggcaggggatctctcctta cttgtggcga cgtggagga ga a ccccggccccA
TGGTG AGCAAGG GCGAG GAG CTGTTCACCG G GGTGGTGCCCATCCTGGTCGAGCTGGACG
GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGG CGAGG GCGATGCCACCTACG
GCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTG CCCGTGCCCTGGCCCACCCTC
GTGACCACCCTGACCTACGGCGTGCAGTG CTTCAGCCGCTACCCCGACCACATGAAGCAGC
ACGACTTCTTCAAGTCCG CCATGCCCGAAG GCTACGTCCAGGAGCGCACCATCTTCTTCAAG
GACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAAC
CGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTG
GAGTACAACTACAACAG CCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCA
AG GTGAACTTCAAGATCCG CCACAACATCGAG GACGGCAGCGTGCAG CTCG CCGACCACTA
CCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGC

ACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTG GAG
TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagctt ataatggtta ca a ata a agca atagcatca ca a atttca ca a ata aa gcatttttttca ctgcattctagttgtggtttgtc ca a actcatca atgtatctta tTATG GAGATAATGGTACGTTTTGTG G GGAATGTGTATTTTAAAG A
GAGAGGAAAGATGGGAAAGGGAGTGTGAAAATGTAGGGAACTTTGCAGTTTGTTTTGTCT
AGTACTATTTTACCTTTGGTTTATTCTTATCACAAGTTAAAAGCACTTTTATTGTCTTTCATTG
GTGTTTATATATTTCTGTTAGAATTTGGAAATG GTG CCCTCTG G A G AA G G CTAATTG ACTGT
CTTCTCA CAG AG TAACA CTACTTTG ATAATATG GTCTGCACCTTAGCCTTTCAAATTAAATTG
TTTTTAGTGTCCCAGAATTGATGGGACTTTGAAGTGTTGTTGCAGTAGGTAATTTCTCAAAA
GA CTG AAA CATGTCTAATG CCAATATACATTAATACTCTATA G GCCAG AATATATTA CTTAT
GTTTACTGTCTTAGCACAGATACTTCTATG GT
pARBI-946: RTRAF_2_Endogenous_50
GCACCTGTTATGTAGGAGGAGTAATAAAATGAATGAATG
TTCTGTAATCCAACGGGAGGTACAGCGAATACCAAAAGCCTACATATACTATTCTCTGCATG
CTATAGCAAGAAAGAAAGGAAAGTGGCTTCCCGGTGGTTTTCTGCCTATTGTACAACCAGG
AAGCTGACAATAAAGTTTATTTGAGCGTCGACGTGCGCCGACGTGGCCCCGCCTCCCCAGC
CGGAGCCGCGATTGGTGGGCATTTGCCGGCGGCCACCGCTTTTAAGCCACGATTGGCGAA
GGCCGCCGTCATTTCG GAGCGACTCAGCGCCTGCCCGCCCTCTCGCCGCGTCGCCGGTG CC
TGCGCCTCCCGCTCCACCTCGCTTCTTCTCTCCCGGCCGAGGCCCGGGGGACCAGAGCGAG
AAGCGGGGACCATGTTCCGACGCtccggatccggagagggcaggggatctctccttacttgtggcgacgtgg aggaga a ccccggccccATGGTGAGCAAG GGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCC
TG GTCG AG CTG GACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG GCGAG
GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTG CACCACCGG CAAGCTGCCCG
TGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCC
GACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC
GCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG
GCGACACCCTG GTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACA
TCCTGGGG CAC AA G CTG GAGTACAACTACAACAGCCACAACGTCTATATCATG G CC G A CAA
GCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGT
GCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG GCGACGGCCCCGTGCTGCTGCCC
GACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAG CGCGAT
CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGGCATGGACGAGCTGT
ACAAatgattgtttattgcagcttataatggtta ca a a ta aagcaatagcatcaca a atttca ca a ata a agcattttttt ca ctgcattctagttgtggtttgtccaa a ctcatcaatgtatctta tAAGTTGACGGCTCTCGACTACCACAAC

CCCGCCG GCTTCAACTGCAAAGGTGAG GCGGCG GCCTCA GCCCGGCCGCGTGTCCCTG ACC
TGGGCGGAGGTCCCAGCCTCAGTGCCCGCACCCCACCTCCCCGTCGGGACCCTCGGCGGCC
TGGTTTCCGCCGGCAGCCTCCGGGCCCCTCTCCTCTGGGTCGCCACGTACCTCGGCTCTTCG
CCG CCCCTTCC CG CCTTTAAAG CCCTCTCACCTA CTCCTGTCTCG G CATG TTACTTTCTG CA CT
TGCTTAACTCCAAGCATCACGTAACTACCTTCTCTGTACATAAAAGG G AG A G CATTCGTCTT
TCTCACTCACTATTCAACTCCATGGTTCCCTG G GTAATTAG G CG ATAC CTTGAG CA CCTG CTA
ATTATGGGCCAGCGCGGTGCTGGATTCTGAGGAAGGTGCTGAGTAACTTG
pARBI-947: RTRAF_3_Endogenous_51
CACTAAACAGTAATTCTGTAATCCAACGGGAG
CTATTCTCTGCATGCTATAGCAAGAAAGAAAGGAAAGTGGCTTCCCGGTGGTTTTCTGCCTA
TTGTACAACCAGGAAGCTGACAATAAAGTTTATTTGAGCGTCGACGTGCGCCGACGTGGCC
CCGCCTCCCCAGCCGGAGCCGCGATTGGTGGGCATTTGCCGGCGGCCACCGCTTTTAAGCC
ACGATTGGCGAAGGCCGCCGTCATTTCGGAGCGACTCAGCGCCTGCCCGCCCTCTCGCCGC
GTCGCCGGTGCCTG CGCCTCCCGCTCCACCTCGCTTCTTCTCTCCCGGCCGAGG CCCGGGGG
ACCAGAGCGAGAAGCGG GGACCATGTTCCGACGCAAGTTGACG GCTCTCGACTACCACAA
CCCCGCCGGCTTCAACTGCAAAtccggatccggagagggcaggggatctctccttacttgtggcgacgtggag gagaaccccggccccATGGTGAGCAAGGG CGAG GAGCTGTTCACCGGGGTGGTGCCCATCCTG
GTCGAGCTGGACGG CGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG G CGAGG G
CGATG CCACCTACGGCAAG CTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTG
CCCTGG CCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CCGCTACCCCGA
CCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATG CCCGAAGGCTACGTCCAGG AGCG C
ACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCG CCGAGGTGAAGTTCGAG GGC
GACACCCTGGTGAACCG CATCGAG CTGAAGGG CATCGACTTCAAG GAGGACG GCAACATC

CTGGG GCACAAG CTG GAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAG C
AGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGG CAGCGTGC
AG CTCGCCGACCACTACCAGCAG AACACCCCCATCGGCGACGGCCCCGTGCTGCTG CCCGA
CAACCACTACCTG AGCACCCAGTCCAAGCTGAGCAAAGACCCCAACG AGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACA
Aatgattgtttattgcagcttata atggtta ca a a ta aagcaatagcatcacaa atttcaca a ata a agcatttttttca ct gcattctagttgtggtttgtccaa a ctcatca atgtatctta tGGTGAG GCGG CG G CCTCA
GCCCGGCCGCG
TGTCCCTGACCTGGGCGGAGGTCCCAGCCTCAGTGCCCGCACCCCACCTCCCCGTCGGGAC
CCTCGGCGGCCTGGTTTCCGCCGGCAGCCTCCGGG CCCCTCTCCTCTGGGTCGCCACGTACC
TCG G CTCTTC GC CG CC CCTTCC CG CCTTTAAA G CCCTCTCA CCTACTCCTGTCTCG GCATG TT
ACTTTCTGCACTTGCTTAACTCCAAGCATCACGTAACTACCTTCTCTGTACATAAAAG G G AG A
G CATTCG TCTTTCTCACTCACTATTCAA CTCCATG G TTCC CTG G GTAATTAG GCGATACCTTG
AG CACCTG CTAATTATGGGCCAG CGCG GTGCTGGATTCTGAGG AAGGTGCTG AGTAACTTG
AAGACTAGTTCACTGCCTGCCAGGAGCTAAAGGGACGAG GGGTGGAAG
pARBI-948: SERF2_1_Endogenous_52
CCGGGTTAGGCATAGGCCCTCCCGGATCTTCCGCGGTGTAAGGAGAAG
GTGATAGGCCCAGAGCCCCCACTCCACAAGCCAGCCCATCCCCCACGGAGACCCAAACCGT
CCACACACACCTTGCCAGCTGTTTGGGCCCCACGCCGCCCCAAAACACGCCTCCAGCTGGCC
CCTTGGGACCTCCCTTCTCTAGTCCGTATTTTGACCTGGCCCGTGGCAGATTCGCCACTCCCC
CCTACCCCAAGCAGCCTGGGCCTCGATGGGCCGTTGTCGGGGCCCGGAGATTGAAGTGGT
GTTGG ATCCTGCTGCTGGCCGCGCTGGGGTAGAAGGGTCGCCGGTGTGTGGGCAGAGCG
GCCCCCGCGTCTCACCTTTAATTTTCTTTCCTTAGGCGGTAACCAGCGTGAGCTCGCCCGCCA
GAAGAATATGAAAAAGCAGAGCtccggatccggagagggcaggggatctctccttacttgtggcgacgtgga ggaga a ccccggccccATGGTGAGCAAGG G CGAG GAG CTGTTCACCGGG GTG GTGCCCATCCT
GGTCG AGCTG GACG G CGACGTAAACGGCCACAAGTTCAG CGTGTCCGGCGAGGG CG AG G
GCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGT
GCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCG
ACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAG CG
CACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGG C
GACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAG GAGGACGGCAACATC
CTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAG C
AGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAG CGTGC
AG CTCGCCGACCACTACCAGCAG AACACCCCCATCGGCGACGGCCCCGTGCTGCTG CCCGA
CAACCACTACCTG AGCACCCAGTCCAAGCTGAGCAAAGACCCCAACG AGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACA
Aatgattgtttattgcagcttata atggtta ca a a ta aagcaatagcatcacaa atttcaca a ata a agcatttttttca ct gcattctagttgtggtttgtccaa a ctcatca atgtatctta tGACTCGGTTAAGGG AAAGCG CCGA
GATGA
CG GGCTTTCTGCTGCCG CCCGCAAGCAGA GGTAGCCCCAGG G AGG G GAG GGAAAG GG AC
GGTGGAGACCTGGGTTAGACCAAGGGTTATAGAAGGAAAGAGAG CTACCTCAGGGCTTGA
ATGTGGACTAG TCGTGAGG AG CAG AGTG CATTG CTTCCTCTAGGGTTTTATTTCCTCCCCAC
CCTCCAAATTGTTAGCTCACAGCCTTACAGGAAAGGACGGG GGCG GGCGCCTGCCCTCAGT
CTGATTTCTG AG CGTCCCTGG G TCTGACCTTAAG G G CAAG G G CAG G GA G CTTCACATTTCA
AATA CAGTTGTG GTTACG G CAG C C CA GTA CTTTTG GCCCT CCTTG CTGTTC G GTTCTC CTCC
C
TTCTCCCAACCTCCTCACTGGTGTTGCTGGGTGTGGTCCTCAATACAGAATAGAG
pARBI-949: SERF2_2_Endogenous_53_54
AATCGAGCCCTTTGCCCACGGCTACTTCACGGGACCACCCTCCCGG
CCCGGATCTTCCGCGGTGTAAGGAGAAGGCCAGCGCCCCTGTGATAGGCCCAGAGCCCCC
ACTCCACAAGCCAGCCCATCCCCCACGGAGACCCAAACCGTCCACACACACCTTGCCAGCTG
TTTGGGCCCCACGCCGCCCCAAAACACGCCTCCAGCTGGCCCCTTGGGACCTCCCTTCTCTA
GTCCGTATTTTGACCTGGCCCGTGGCAGATTCGCCACTCCCCCCTACCCCAAGCAGCCTGGG
CCTCGATGGGCCGTTGTCGGGGCCCGGAGATTGAAGTGGTGTTGGATCCTGCTG CTG G CC
GCGCTGGGGTAGAAG GGTCGCCGGTGTGTGGGCAGAGCGGCCCCCGCGTCTCACCTTTAA
TTTTCTTTCCTTAGGCGGTAACtccggatccggagagggcaggggatctctccttacttgtggcgacgtggagg agaaccccggccccATGGTGAGCAAGG GCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGG
TCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGA GGGCGAG G GC
GATGCCACCTACGG CAA GCTG ACCCTGAAGTTCATCTGCACCACCGGCAAG CTG CCCGTGC
CCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGAC

CACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCG AAGGCTACGTCCAGGAGCGCA
CCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG GCG
ACACCCTGGTGAACCGCATCGAGCTG AAGGGCATCGACTTCAAGG AGGACGGCAACATCC
TGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCA
GAAGAACGGCATCAAGGTGAACTTCAAGATCCG CCACAACATCGAGGACGGCAGCGTGCA
GCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCG ACGGCCCCGTGCTGCTGCCCGAC
AACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACA
TGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATG GACGAGCTGTACAA
atgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aatttca ca a ataa agcatttttttca ctg cattctagttgtggtttgtccaaactcatcaatgtatcttatCAGCGCGAGCTTGCCCGCCAGAAGAATATG
AAAAAGCAGAGCGACTCGGTTAAG GGAAAG CGCCGAGATGACGGGCTTTCTGCTGCCG CC
CGCAAGCAGAGGTAG CCCCAG GGAG GGGAGG GAAAG G GACGGTGGAGACCTGG GTTAG
ACCAAGGGTTATAGAAGGAAAGAGAGCTACCTCAGGG CTTGAATGTGGACTAGTCGTGAG
GAGCAGAGTGCATTGCTTCCTCTAGGGTTTTATTTCCTCCCCACCCTCCAAATTGTTAGCTCA
CAGCCTTACAGGAAAGGACGG GGGCGGGCGCCTGCCCTCAGTCTGATTTCTGAGCGTCCCT
GG GTCTGACCTTAAG G G CAAG GG CAG G G AG CTTCACATTTCAAATACAGTTGTG GTTACGG
CAG CC CAG TACTTTTG GCC CTCCTTG CTGTTCG GTTCTCCTCCCTTCTCCCAACCTC
pARBI- TTCAAAAGGAATCTTTTTGTTTCAACTATTAGGATCTTTTTAAATCAAATATGTATTTTAATG 1806 950: GTGTACACCAG ACCTGAAGAAAAAGATCACAGAAG GAATTTCCCCTTTGTAAG ATA GAG GT

1 End oge AAGGAGAAATGATGGTTGCATCTCACATTTTAAAATGTTAAGAATTTGTTTTGTAATGAGGA
no u s_55 TACCTTAACCCCTGAAGGCCAAAATGACTATTTGTGTGAGTTTGAGAAAGGCACATAGTAA
CTTGGGGAACAGTCAATAGAATGACAAAATCTTGACTATTTTAACTTTCTGAACCCTGTCAT
TTCTTGTGTTCTCAAATTTGATTTTTAAATAGGCTG CATGGTGTATG AAAAGCTG GG G GAAC
AAGTCTTTGGCACCACAtccggatccggagagggcaggggatctctcctta cttgtggcgacgtggaggagaac cccggccccATGGTGAGCAAGGGCGAGGAG CTGTTCACCGGG GTGGTGCCCATCCTGGTCGA
GCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATG
CCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTG
GCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACA
TGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCAT
CTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACAC
CCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGG
GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCG ACAAGCAGAAG
AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTC
GCCGACCACTACCAGCAGAACACCCCCATCGG CGACGGCCCCGTGCTGCTGCCCGACAACC
ACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGT
CCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgat tgtttattgcagcttata atggttacaa ata a agcaatagcatca ca a atttca ca aata a a gcatttttttc a ctg cattc tagttgtggtttgtccaa a ctcatca atgtatcttatGG GAAGTTCGTAATCTTTGGAGCCACCTCTCTACA
GAACACTG GAG GTAAAAA GAACATG CTTTTCTTTACATAACTTG AAACTAATTCTG GTG GAT
GAG CG G CCACATGAATTTTAACATAATTCAAG CAGTTATCATCCTCATTGCTAAAATG G CAC
AGGGAAAGTAAAGCAGAGACAGCAGTCACTTATTTAAAGCCACAAATCCTGCTAGAGTAGC
TGAAGTTGCCTTTGTGTCTTACTGACAGTTGGTCTAAGAACGG GCTGATAACTTTTTATGTA
GCTTG CAACATAAG CCTTCGTATCTTCTTTCTAAAAATGTTCTCATTTTCCTTCAGCAATG CTG
AG CTACCTCTTCATCGTAAAAAATGAACTACCCTCTG CCATAAAGTTTCTAATG G GAAAG GA
AGAGACATTTTCGTAAGTTATAGACCATTTTTTTATGATATA
pARBI-951: SLC38A1_2_Endogenous_56_
TAAAGATGTCAAGCTTCAAATCATTATCATAAGGTTAAATATATTACG
AGTTTATTTCTGATCGTGAAAGTAGAAGAAGTCTCACAAACAGCCATTTGGAAAAAAAGAA
GTGTGATGAGTATGTAAGTATCATTATAATGAAGTATGCTTATTTTTTTAATCATATAAATTG
GGACAAATCTTTTTAATTCAAGTAGCATAGTACCAAAAGCATAACTACCTGTATTCACCAGG
ATATTCCTCATACCTTATGTATTTTCTCTTGAAATCCATTCAAGGAAATAACTAGATAACTAA
CTTTGGCT
TGATGAACTTTCTTGGGGATTATTTTGCTTAGTGTCAATGTTTGTTTCTTTGTTCAGATTCCA
GGTACAACCTCCtccgg atccggagagggcaggggatctctcctta cttgtggcga cgtggaggaga a ccccggcc ccATGGTGAGCAAGGGCGAGG AGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGG

ACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCT
ACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCAC
CCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATG AAG
CAGCACGACTTCTTCAAGTCCG CCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTT
CAAGGACGACGG CAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGT
GAACCGCATCGAGCTGAAGGG CATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAA
GCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGG
CATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGAC
CACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACC
TGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCT
GGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattg cagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttg tg gtttgtcca a a ctca tca a tgtatcttatTTAG GCATGTCAGTTTTTAACCTAAGCAACGCCATTATG GG

CAGTGGGATTTTGG GACTCGCCTTTGCCCTGGCAAACACTGGAATCCTACTTTTTCTGTGAG
TATTGAC GTG C CG G TACTTTCCATTTTAAAACTG AA CTTTTTGTATGTTTCTGTTATTACATAG
AAGAAATGTGAAATATTTATAAATAGTTTTTTTATTAACTTG G CTAAGTTA CAAG TACATA CT
TTG ATATTTATTG CTTG A GTG ATTTTCCAAACTGTA CCTAATG CTTACAA CAAAG A G G AAG G
AG AAAATCTTTTTATTAATCAAAAATACTG AAATAAG CATTTG TG AAAA G G ATTAAAACATT
TGAAAATAACTTTTTTTGGTATTTTTGGAGTCTACCTGTGACTTTAAATCTCAGTTAAAATAT
TAAG A G G TGTTTAACC CCA G CCAGTCACCACTT
pARBI- ATCATTCTTCACACTAGTTC CTTTTCTG ACTCTTCA GTCA CCTC

952: GCATCTCTAAAATGAGTTTGATTTACTAAACATAGAAGTTGACAAAGTTTTTCAAGTATAAC
SMAD2_1 TGCATAAACTCTCTTTTTTAAAAAATCAGTTTTTCCCAAGTTGTTAGCCTTTCATTG GCATTTG
Endogen AGTCATTTATG TG ACATATTTATAA G AAACATCTG CTAG TG CTG CTG CATA CTTTTATCA
GAT
ous_58 CCATTAAATAGTACATTTTTGTTCTTGATTCTCACTAAAACTAACTAAATGGCTGTCATTCTTT
TCTTTATG CTTTTATTG TACTAATTTAG C CCATTTG A CTG CA CttttttttttttttttttAAAGTTCC
CT
CCTTTCTTTTCCCCTTG CTTCCCAACAGGTCTCTTGATGGTCGTCTCCAG GTATCCCATCGAtc cggatccggagagggcaggggatctctccttacttgtggcga cgtggagga ga a ccccggccccATG GTG
AGCAA
GGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTG GACGGCGACGTAAA
CGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAG GGCGATGCCACCTACGGCAAGCTGAC
CCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCC
TGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAG CACGACTTCTTC
AAGTCCGCCATGCCCGAAG GCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCA
ACTACAAGACCCGCGCCGAGGTGAAGTTCGAG GGCGACACCCTG GTGAA CCG CATCGA GC
TGAAGGGCATCGACTTCAAG GAGG ACGGCAACATCCTGGGGCACAAGCTGGAGTACAACT
ACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTT
CAAGATCCGCCACAACATCGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAA
CACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCC
AAGCTGAGCAAAGACCCCAACGAG AAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACC
GCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttac aaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactca tca atgtatcttatAAAGGATTGCCACATGTTATATATTGCCGATTATGGCGCTGGCCTGATCTTCAC
AGTCATCATGAACTCAAG G CAATTG AAAA CTG CG AATATG CTTTTAATCTTAAAAAG G ATG A
AGTATGTGTAAACCCTTACCACTATCAGAG AGTTGAGACACCAGGTAGGAATATTG CAAGT
TTTTTTCTTG GTTTTG AATTAAATG CCTG AACTTCAGTATATTTAAGTA CTCTTG TG ACC CAG
GAAAATTTTAGTGTAAATTCTAATAAATTACCTTAAATTGTGCATATTATTTGGTGGTAATTA
AAAttttttta ttttta a aa atttttAATG AG AAAACATATCTGTTTCTTGGAATAGCATATGATTTGG
CATG AAAG TG CATATCCA G G AAATCCTGTAG ATTG G CA G TCTG CTCAG CG CTATGCTGATG
TG CTTCTACATGTCC CT
pARBI- TTCTATATTAAAAATCAGTTG

953: ATGAAGCACTTAGTAAAGCTGATGGCAGAG GTTTTAGATCTTTTGAATACAG GAAA GTG GT

CA
_Endogen TAATCATGACTTATCTGGAGTCACATGTTCCATTTTATTGG CTTCCTTCATATAGAAAATATA
ous_59 GTAACCAG CACTACATGCCTGTGAAAAAGTGAAGCAATTGTATATTTTCTGG GTG
AAG G AA
GTATTCTGTACATTCTCCATGTTTTACATCATGGTATTTTGAATAACCATGCTTCCATGTTCAC

ATCAATTTTTTGTTTTTCAATTTATG CACAG CACTTG CTCTG AAATTTG G GG ACT G AGTACAC
CAAATACGATAGATtccgga tccggagagggca ggggatctctccttacttgtggcga cgtggaggaga a cc ccg gccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCT
GGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCA
CCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCC
CACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGA
AG CAGCACGACTTCTTCAAGTCCGCCATG CCCGAAGGCTACGTCCAGGAGCGCACCATCTT
CTTCAAGGACG ACG GCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCT
GGTGAACCGCATCGAG CTGAAGGG CATCGACTTCAAGGAGG ACG GCAACATCCTG GG GCA
CAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAAC
GGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCC
GACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACT
ACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCT
GCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgttt attgcagcttata atggtta ca a ataa agca a ta gc atca ca a atttca ca a ata a agc atttttttcactgcattctagtt gtggtttgtcca a a ctcatca atgtatcttatCAGTGGG ATACAACAGGCCTTTACAGCTTCTCTGAACA
AACCAGGTGAATATTCTGCCCTCTGTCGAATCTTAGAGATCTTGTGGGAGGGGGGTATATT
TTG AAAG A CCTATATG G GGTTG CTTAGTATAATTTTG CCTAG G ATGTTTCCTTAATGTAAAA
TAAGGCATAGCATTTAAAATATCTGCTTGCCAGTATAAAAAATATATTAAATGTTTCAGCTG
ATGTTTTAATCAATGCAATTCTAGTTTGAATCTTCTTTAAATTACTGTATCCCCTAAAGAATA
ATAATTTTG ATAA CTATATATTTATTTTAG CTTG AG ATCAG ATTATAATCTG TTG TTG G CCTA
ATTTTTAAGTAGATACATGATGAGTTTTGCTAAATTTCTTGTTATCAGAATCTCATCTTTATAC
TAATAAATACATACTTAATGAACCCCTTATATCACAA
pARBI- TCAATGTATTTTTG TATA GTCTTTTG AG ATTA G AGTGAA GTTCTAAAAAG

954: TTGTGAAAATATTTTAATTGACACTACTAAGAGTTTAAAATAGTATTGGTGGTGAAAAATCG
SMAD2_3 TTAAAAGTATCTG ATTTTACTTG CAAAATTACTGTATTTTCCCACAAG AG G A GTCCTTACAA
C
Endogen ATTCTTGTTTTTA G AAG G GTTTGTTTCCATCATTCATTTAAATTCATAAAG ATAACGTTTTCAT
ous_60 GG GTG G AG AAG TCTATTG GG AAAG TCATAATTCG AATTTCTATCTTG
CTTTG CAGTTTGCTT
TCTATG ATAAGTTGAAATTATTACTTG ATG TTCAAG G TA GTCTCTACATCATCCTTTCAATAT
TTCTG CTAG G TTCGATACAAG AG G CTGTTTTCCTAG CGTG G CTTG CTG CCTTTG GTAAGAAC
ATGTCGTCCATCtccggatccgg agagggcaggggatctctcctta cttgtggcga cgtgga ggaga a ccccggcc ccATGGTGAGCAAGGGCGAGG AGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGG
ACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCT
ACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCAC
CCTCGTGACCACCCTG ACCTACGG CGTG CAGTGCTTCAGCCGCTACCCCGACCACATG AAG
CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTT
CAAGGACGACGG CAACTACAAGACCCG CGCCGAGGTG AAGTTCGAGGGCG ACACCCTGGT
GAACCGCATCGAGCTGAAGGG CATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAA
GCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGG
CATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGAC
CACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACC
TGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCT
GGAGTTCG TGACCGCCG CCG GGATCACTCTCGGCATGGACGA GCTGTACAAatgattgtttattg cagcttata atggtta ca a ataa agca a ta gcatca ca a atttca ca a ata aagcatttttttca ctg ca ttctagttgtg gtttgtcca a a ctca tca a tgtatcttatTTGCCATTCACGCCGCCAGTTG TGAAGAGACTGCTGGGAT
G GAAGAAGTCAG CTG GTG G GTCTG GAG G AGCAGG CG GAG GAGAG CAGAATGGG CAG GA
AG AAAAGTG GTGTGAGAAAGCAGTGAAAAGTCTG GTGAAG AAGCTAAAGAAAACAGG AC
GATTAG ATG AG CTTG A G AAAG CCATCA CCACTCAAAACTGTAATACTAAATGTG TTACCATA
CCAA G GTA AGTTTTG TTAG ATCCCA G GTTTG ATCAAATTATG TCAAG G AATCTG AAG G AAA
GTTACTGAATTTGTGTTCCTTTCAAGTTGCCTGTAAAAAGTGATGATTGAAATATGACTGTTT
TTAACCTTGTATAAATTGTTTTTG CTAG CTG A CTTGTTTTATAAATTATTTTCTTG AATAG TG A
GGTTTAATCAAGCTAAATAAGATACTTAGATTTATTATCTTCA
pARBI-955: CCCAGCCTGGCCAGCAGGCGGCGGGCGCGGGGCGGCGAGCCGGGGCCGGACGGCTGGA
SOCS1_1 GCCAGAACCGGCTGCTCTCCACGCCCCCCTCTCGGTGCTG CCCGGAGGCCGGACTCCGCCT

Endogen CCACCGAG CCCCCACCCGCCGGGAAGAGCTCCGCGGAGTACAGAGCCCATTTTCTAGCTGT
ous_61 GTCCACTGAGGCTGAACGGATCCGCGCGGACTTGGTGCTCCGTGCTCGCCCCCTAGGGCCG
GGTCCGCCGGGAGCGCCGCCCTCCGGAGTTGTCCGGCCGGCGCACACCTGCCCGGCCCCG
CAGCGCCCCAGCTCACCTCTTTGTCTCTCCCGCAGCGCACCCCCGGACGCTATGGCCCACCC
CTCCGGCTGGCCCCTTCTGTAGGATGtccggatccggagagggcaggggatctctccttacttgtggcgacg tggaggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCAT
CCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGA
GGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCC
GTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCC
CGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAG
CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG
GCGACACCCTG GTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACA
TCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAA
GCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGT
GCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCC
GACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGAT
CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGGCATGGACGAGCTGT
ACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcattttt tt cactgcattctagttgtggtttgtccaaactcatcaatgtatcttatGTAGCACACAACCAGGTGGCAGCCGA
CAATGCAGTCTCCACAGCAGCAGAGCCCCGACGGCGGCCAGAACCTTCCTCCTCTTCCTCCT
CCTcgcccgcggcccccgcgcgcccgcggccgtgccccgcggtcccggccccggCCCCCGGCGACACGCACTT
CCGCACATTCCGTTCGCACGCCGATTACCGGCGCATCACGCGCGCCAGCGCGCTCCTGGAC
GCCTGCGGATTCTACTGGGGGCCCCTGAGCGTGCACGGGGCGCACGAGCGGCTGCGCGCC
GAGCCCGTGGGCACCTTCCTGGTGCGCGACAGCCGCCAGCGGAACTGCTTTTTCGCCCTTA
GCGTGAAGATGGCCTCGGGACCCACGAGCATCCGCGTGCACTTTCAGGCCGGCCGCTTTCA
CCTGGATGGCAGCCGCGAGAGCTTCGACTGCCTCTTCGAGCTGCTG
pARBI- GGATCCCAGCCTGGCCAGCAGGCGGCGGGCGCGGGGCGGCGAGCCGGGGCCGGACGGC 1806 956:
TGGAGCCAGAACCGGCTGCTCTCCACGCCCCCCTCTCGGTGCTGCCCGGAGGCCGGACTCC
SOCS1_2 GCCTCCACCGAGCCCCCACCCGCCGGGAAGAGCTCCGCGGAGTACAGAGCCCATTTTCTAG
Endogen CTGTGTCCACTGAGGCTGAACGGATCCGCGCGGACTTGGTGCTCCGTGCTCGCCCCCTAGG
ous_62 GCCGGGTCCGCCGGGAGCGCCGCCCTCCGGAGTTGTCCGGCCGGCGCACACCTGCCCGGC
CCCGCAGCGCCCCAGCTCACCTCTTTGTCTCTCCCGCAGCGCACCCCCGGACGCTATGGCCC
ACCCCTCCGGCTGGCCCCTTCTGTAGGATGGTAGCACACAACCAGGTGGCAGCCGACAATG
CAGTCTCCACAGCAGCAGAGCCCCGAtccggatccggagagggcaggggatctctccttacttgtggcgacg tggaggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCAT
CCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGA
GGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCC
GTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCC
CGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAG
CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG
GCGACACCCTG GTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACA
TCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAA
GCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGT
GCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG GCGACGGCCCCGTGCTGCTGCCC
GACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGAT
CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGT
ACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcattttt tt cactgcattctagttgtggtttgtccaaactcatcaatgtatcttatCGGCGGCCAGAACCTTCCTCCTCTTCCT
CCTCCTcgcccgcggcccccgcgcgcccgcggccgtgccccgcggtcccggccccggCCCCCGGCGACACGCA
CTTCCGCACATTCCGTTCGCACGCCGATTACCGGCGCATCACGCGCGCCAGCGCGCTCCTGG
ACGCCTGCGGATTCTACTGGGGGCCCCTGAGCGTGCACGGGGCGCACGAGCGGCTGCGCG
CCGAGCCCGTGGGCACCTTCCTGGTGCGCGACAGCCGCCAGCGGAACTGCTTTTTCGCCCT
TAGCGTGAAGATGGCCTCGGGACCCACGAGCATCCGCGTGCACTTTCAGGCCGGCCGCTTT
CACCTGGATGGCAGCCGCGAGAGCTTCGACTGCCTCTTCGAGCTGCTGGAGCACTACGTGG
CGGCGCCGCGCCGCATGCTGGGGGCCCCGCTGCGCCAGCGCCGC

pARBI- CG

957: ATTTTCTAGCTGTGTCCACTGAGGCTGAACG GATCCGCGCGGACTTGGTGCTCCGTGCTCGC
SOCS1_3 CCCCTAGGGCCGGGTCCGCCGGGAGCGCCGCCCTCCGGAGTTGTCCGGCCGGCGCACACC
Endogen TGCCCGGCCCCGCAGCG CCCCAGCTCACCTCTTTGTCTCTCCCGCAGCGCACCCCCGGACGC
ous_63 TATG G CCCACCCCTCCGGCTGG CCCCTTCTGTAG GATG GTAGCACACAACCAG
GTG GCAGC
CGACAATGCAGTCTCCACAGCAGCAGAGCCCCGACGGCG GCCAGAACCTTCCTCCTCTTCCT
CCTCCTcgcccgcggcccccgcgcgcccgcggccgtgccccgcggtcccggccccggCCCCCGGCGACACGCA
CTTCCGCACAtccggatccggagagggcaggggatctctccttacttgtggcga cgtggaggaga a ccccggcccc ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGAC
GGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC
GGCAAGCTGACCCTGAAGTTCATCTG CACCACCGG CAAGCTGCCCGTGCCCTGGCCCACCC
TCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCA
GCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCA
AG GACGACGGCAACTACAAGACCCGCGCCG AGGTGAAGTTCG AGG GCGACACCCTGGTG A
ACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGC
TGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCAT
CAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGG CAG CGTGCAGCTCGCCGACCA
CTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTG
AG CACCCA GTCCAAGCTGAG CAAAGACCCCAACGAG AAGCGCGATCACATGGTCCTGCTG
GAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAG CTGTACAAatgattgtttattgc agcttataatggtta ca aata a agca atagcatcaca a atttca ca a ata a agcatttttttca ctgcattctagttgtggt ttgtcca a a ctca tca a tgta tcttatTTCCGTTCGCACG CCGATTACCGGCGCATCACGCGCGCCAG C

G CG CTCCTG GA CG CCTG CG GATTCTACTG GG GC CCCCTG AG CGTG CACGGGG CG CACGAG
CG GCTGCGCGCCGAGCCCGTGGGCACCTTCCTGGTGCGCGACAGCCGCCAGCGGAACTGC
TTTTTCGCCCTTAGCGTGAAGATGGCCTCGGGACCCACGAGCATCCGCGTGCACTTTCAGGC
CGGCCGCTTTCACCTG GATGGCAGCCGCGAGAGCTTCGACTGCCTCTTCGAGCTGCTGG AG
CACTACGTGGCGGCG CCGCGCCGCATGCTGG GGGCCCCG CTG CGCCAGCGCCGCG TGCG G
CCGCTGCAGGAGCTGTGCCGCCAGCGCATCGTGGCCACCGTGGGCCGCGAGAACCTGG CT
CGCATCCCCCTCAACCCCGTCCTCCGCGACTACCTGAGCTCCTTC
pARBI- GG CAGACAAACAGTTAAAACAGAG

958: CTCTAGTTTTGTTAAACACCTCATCTGCTATCACGGTCTGG CGAAACCCTG G AG AAACTATT
SRP14_1_ ATTTCCACGACGGAAACATGTAATATCGAAGCACGTTAAATTCCACTCAAAGTGGCGGCGT
Endogeno CTGGGATCCCTGACAAAGAGCCCTGGGCTGCTTGGGGCCCATTGTTACCAGAAGCAACCTA
us_64 GGCGGCTCAGTGCTGGGGACTGAGCCAGCTACAATCCCTACTATTTTCCGGGCCCGAAGCC
CCGAATGTGCTCAGTACAGG GTGGGGAAAGGTAGAGGAAGG CGGCGGTCCGCGGCAGAC
AG ACTCCG GTG GCTCCCAGACACCGG GAACCCAGG GAGCATCGCCCG GCTCCCCTCCCGCT
GCTTACACTTCTTCAAGGTGATATAtccggatccggagagggcaggggatctctccttacttgtggcgacgtg gaggaga a ccccggccccATGG TGAGCAAGGG CGAGGAGCTGTTCACCGGGGTGGTGCCCATCC
TG GTCG AG CTG GACGGCG ACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG GCGAG
GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTG CACCACCGG CAAGCTGCCCG
TGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCC
GACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGC
GCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCG CGCCGAGGTGAAGTTCGAGG
GCGACACCCTG GTG AACCGCATCGAG CTGAAGG G CATCGACTTCAAGGAGGACGGCAACA
TCCTGGGG CAC AA G CTG GAGTACAACTACAACAGCCACAACGTCTATATCATG G CC G A CAA
GCAGAAGAACG GCATCAA GGTGAACTTCAAGATCCG CCACAACATCGAGGACGGCAG CGT
GCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG GCGACGGCCCCGTGCTGCTGCCC
GACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAG CGCGAT
CACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGGCATGGACGAGCTGT
ACAAatgattgtttattgcagcttata atggtta ca a a ta aagcaatagcatcaca a atttca ca a ata a agcattttttt ca ctgcattctagttgtggtttgtccaa a ctcatca atgtatctta tGACGCTGCCCG
ACGTCCGGCACTTCTG
GAAAAGTCTGGTCAGCTCCGTCAGGAACTGGAAAACAACAGGCCCAGCCCCGTTAGCCAG
CCCCAAATCCTCGTCCTGCCGCGTCAAG G CCCTG GTCCTCCTGCAGGAG GCACAGGTCTCG
AGTAACGCCTGAGCCG CCCCCTTCCCTCGGCCGGCCAGGCCTAGCCATACCTGCTCGCTCTC
CAACAACACCATCGCGGCGACGCTGGCTCGACTCCCTCCGCTTAAGCCCCTAGCAGTGAGA

GCCGGAAGTTCGG CCTAGGCTGGG CGGGACTTCCGCTACTAG ACTTCCATTGTCTTCCACCA
ATCTCTTCTCTTCCACCAATCCCGG CCTG CGCCCTCCCCCCTCCCCGCCCGCCTAG CG CCCGC
GCCCTGGGACGTCCGGGG GCCTGTGACCCG GAGG CGCTGGGGCTGTCCTGGGTC
pARBI- GTCTGTCTGCCGCGGACCGC

959: GG CTTCGGGCCCGGAAAATAGTAG GGATTGTAGCTGG CTCAGTCCCCAGCACTG AG CCGC
SRP14_2_ CTAGGTTGCTTCTGGTAACAATGGGCCCCAAGCAGCCCAGGGCTCTTTGTCAGGGATCCCA
Endogeno GACGCCGCCACTTTGAGTGGAATTTAACGTGCTTCGATATTACATGTTTCCGTCGTGGAAAT
us_65_66 AATAGTTTCTCCAGGGTTTCGCCAGACCGTGATAGCAGATGAGGTGTTTAACAAAACTAGA
GCAGCGGGGCTGGGTTAGGTCATCACTTACCCCACTCCTCTGTTTTAACTGTTTGTCTGCCG
CTCAGCCACGTACGCGGCCGTGTTCACCAGGATGCATTTTTTCCTTCAGATGACGGTCGAAC
CAAACCCATTCCAAAGAAGtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggag aaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTC
GAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGA
TGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCT
GGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCAC
ATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCA
TCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACA
CCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAGGAGGACGG CAACATCCTGG
GGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAA
GAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCT
CGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAAC
CACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATG
GTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatg attgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactg cat tctagttgtggtttgtccaaactcatcaatgtatcttatGGTACTGTGGAGGGCTTTGAGCCCGCAGACAA
CAAGTGTCTGTTAAGAGCTACCGATGGGAAGAAGAAGATCAGCACTGTGGTGAGCTTAATT
TCTAAGGCTGTCTTTTGAAATGTAAAGACTTGAACTTAACAGAGGATGGGGCGTTTCTGAA
CCAGGCTTTTATTTGTTTTTCCCTTTTGCCCTGTGTGGCTATTTTTGAGACCAGGACCTTCCTA
TACTTTAGTAGTGGAAACCTCAAGAATAAAATAAGAAGGTAGAGGTCAGACAGTCGTTAGT
TCTGCTAAAGCTCTTGTGGAAATGAAAGTAGCATATTGGTCCTTATTCGTAGTACTGTAAGG
AGCAGCTGGCATAAATATTTGATTCCTTAGCCCCTTACTGTGCTGGGCCTTCAATAAGCAGT
TTCTTGGTACCAGGTGATGCTATTATTTCCTTATCTTTGTGGTTCAC
pARBI- CCCACGTCAGTGAGTGCTCACAAGTCTGTGAGGTAGGCGGCGCCACAGAGAGAAACTGAG 1806 960: GCGCGGGAGCCCAAGCCACTTGTCCAGCGCCTCAGCCGAGCGCGAACCGCGCTCGGGGAT
SRSF9_1_ GGCATCCACACGGCCCGGCCCCAGGCTCTCCCTGTCAGCGCCCGAAGGCCCTTCGCACCTC
Endogeno CAGGGGGCGCCGGCCTGCGCGCACG CGCAATGGTCGCAGCCGCGTTCTCTTTAAGAGGAC
us_67 TCCTTTTGCCTCCGCCGACCCCTTCGCTTCCGCTCCGCGTTCCCACAATGCAGTGCGGCTGAG
CGCCTCGGAGCCCGCGGGGACGCTGCGGGGGGACCCGTGCTGAggcggcggcggcgacgtgggc tgcggcgggcccgcggcgtcgggcggtgcggatgtcgggctgggcggacgagcgcggcggcgAGGGCGACGGG
CGCATCTACtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggagaaccccggccccA
TGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACG
GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACG
GCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTC
GTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGC
ACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAG
GACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAAC
CGCATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAACATCCTGGGGCACAAGCTG
GAGTACAACTACAACAG CCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCA
AGGTGAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTCGCCGACCACTA
CCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGC
ACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAG
TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagctt ataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggttt gtc caaactcatcaatgtatcttatGTGGGGAACCTTCCGACCGACGTGCGCGAGAAGGACTTGGAGGA
CCTGTTCTACAAGTACGGCCGCATCCGCGAGATCGAGCTCAAGAACCGGCACGGCCTCGTG

CCCTTCGCCTTCGTGCGCTTCGAGG ACCCCCGGTGAGG cccccgcgcccctgccctctcctcctcggtgc ctgaggcccccgccttccctgctgtcccctcccccagggctcccctccc cc cggcctcccctc cc cccgcctc cg cgcaga cccctca CGGCGCCCCCTCACGGGTGGAGGATGAGG CAGCCTCTCCTCGCAGGCCCGGGCCG
TCCTTCGCGCCGTCGTCACTTCCTTTATTTTTATTATTCCAATATTTTACTTAGAAACCCAAAA
GCTGAGCCTTTGGAGGGCCCAAGCCCGCCTGCACCGGCCTCGGGGGGTCCAAATGAGCCT
TGTCCG CC
pARBI- TGTGGTGTGCTTTTCGAGAAAATGCTCACGAAGATTAGAAAAACACTGATGGTTTATTGGC 1806 961:
AAAATCAATAATCTGTCAGCAGTTGATTCttttttttCTCTGAATTAGCCCTCTTATGAGTAATCT
SRSF9_2_ GTTTGCTCATTCATTATAATTTCATACCTCTAAAACTGCGTGTGACAGCTGTAAAGGTTAATT
Endogeno CCAGTATCATGAAACGTCTCCAAACCAAAGCAGAAGTGCTTCAGGATCCTGATTTGTGTGTT
us_68 TTTTTCTTCACTCTAGGTTTCCCTTTTATGCTTACTACTCATGCCCCTCACTTGGAAAGTCACT
TGGCCTCCTGAACAGCACTAACTCCAAACGttttttttgttgttgttgttttttttAGAGATGCAGAGGA
TGCTATTTATGGAAGAAATGGTTATGATTATGGCCAGTGTCGGCTTCGTGTGGAGTTCCCCA
GGtccggatccggagagggcagggga tctctccttacttgtgg cga cgtgga ggag a a ccccggccccATGGTGA
GCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGAC
GTAAACGGCCACAAGTTCAGCGTGTCCGGCGAG GGCGAGGGCGATGCCACCTACGGCAAG
CTGACCCTGAAGTTCATCTGCACCACCG G CAAGCTGCCCGTGCCCTGGCCCACCCTCGTGAC
CACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACG AC
TTCTTCAAGTCCGCCATGCCCGAAG GCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACG A
CGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCAT
CGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTA
CAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTG
AACTTCAAGATCCGCCACAACATCGAGGACG GCAGCGTG CAG CTCGCCGACCACTACCAGC
AGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCA
GTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGT
GACCG CCGCCG GGATCACTCTCG GCATGGACGAGCTGTACAAatgattgtttattgcagcttata atg gtta ca a ataa agca a ta gcatca ca a atttcaca a ata a agcatttttttcactgca ttctagttgtggtttgtcca a act catca atgtatcttatACTTATG GAG GTCGGGGTGGGTGGCCCCGTGGTGGGAGGAATGGGCCT
CCTACAAGAAGATCTGATTTCCGAGTTCTTGTTTCAGGTATGTTCCTTTCAAACAG a a tg aga t gata catgtaa a ata ctta a ca ca gagtctgtcttcca ag aa atgatagctgttattcttCAGTG
CATGGGACACG
GGGGCTTTCTTTTCAATAGCCTGTGTGAAGCCTTGCCCTGGATTGCCAATGAGGAAAGTATC
CTGCAAATGAAATTGCG CTGGGAGTGCAGCCTTGGAAGAACATAACCATATTTCTTGTAAA
GGAGTTTTCTAGTGGTGAGAAGGAAAGATGATGGGAAAACTTGAGCTACAATTCTAAAGA
TG CTTCTTTTG GAATATACTTG G CATCAGACATG GTAGAAAG G CATTCAAG GAG CCAGATTT
GAACAACTTACCCAGCC
pARBI- GATGGCATCCACACGGCCCGGCCCCAGGCTCTCCCTGTCAGCGCCCGAAGGCCCTTCGCAC 1806 962:
CTCCAGGGGGCGCCGGCCTGCGCGCACGCGCAATGGTCGCAGCCGCGTTCTCTTTAAGAG
SRSF9_3_ GACTCCTTTTGCCTCCGCCGACCCCTTCGCTTCCGCTCCGCGTTCCCACAATGCAGTGCGGCT
Endogeno GAGCGCCTCGGAGCCCGCGGGGACGCTG CGG GGGG ACCCGTGCTGAggcggcggcggcga cgt us_69 gggctgcggcgggcccgcggcgtcgggcggtgcggatgtcgggctgggcgga cgagcgcggcggcgAG GGCGAC
GGGCGCATCTACGTGGGGAACCTTCCGACCGACGTG CGCGAGAAGGACTTGGAGGACCTG
TTCTACAAGTACGGCCGCATCCGCGAGATCGAGCTCAAGAACCGGCACGGCCTCGTGCCCT
TCGCCTTCtccggatccggagagggcaggggatctctccttacttgtggcga cgtggagga ga a ccccggccccAT
G GTGAG CAAG G G CGAG GAG CTGTTCACCGGG GTG GTG CCCATCCTG GTCGAGCTGG ACG
GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACG
GCAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAGCTG CCCGTGCCCTGG CCCACCCTC
GTGACCACCCTGACCTACGGCGTGCAGTG CTTCAGCCGCTACCCCGACCACATGAAGCAGC
ACGACTTCTTCAAGTCCGCCATGCCCGAAG GCTACGTCCAGGAGCGCACCATCTTCTTCAAG
GACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG G CGACACCCTGGTGAAC
CGCATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAACATCCTGGGGCACAAGCTG
GAGTACAACTACAACAG CCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCA
AGGTGAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTCGCCGACCACTA
CCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGC
ACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAAG CGCGATCACATGGTCCTG CTG GAG
TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagctt ataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggttt gtc ca a actcatca atgtatctta tGTGCGCTTCGAGGACCCCCG GTGAGGcccccgcgcccctgccctctcctcc tcggtgcctgaggcccccgccttccctgctgtcccctcccccagggctcccctccccccggcctcccctccccccgcct ccg cgcagacccctcaCGGCGCCCCCTCACGGGTGGAGGATGAGGCAGCCTCTCCTCGCAGGCCCG
GG CCGTCCTTCGCGCCGTCGTCACTTCCTTTATTTTTATTATTCCAATATTTTACTTAGAAACC
CAAAAGCTGAGCCTTTGGAGGGCCCAAGCCCGCCTGCACCGG CCTCGG GGGGTCCAAATG
AG CCTTGTCCGCCTCCTG CCTG GGGCAGCACCGTAGGG G GAAGCGGCCGCGG GGCAGCGC
GG GGGTCGCCGTTCGCCCTTCCCGCTCGCCTCTCCCCCGGCCCGTGCTCGCCGTGGCTGGA
GAGCAAGCT
pARBI- GTAACTGACTAGCCTTAGTAGACTGTGGTTTCCAGGTTTATTCAGAAGTAGCAAGATCCCTC 1806 963:
CAtttttttttCTACCAAAGAAATCGTATGTGGGATCCCAAACCACAAAATAACCGTTCCTGTGG
SUB1_1_ TTAATACTACTATAATGCCTGAAGTGTCTTTTGGGATCCTGAGAACAGAGTTTGAAAACATT
Endogeno ACTAGACAGAAGGATTGGTTAGATTCATAGTTTTGTTGTTGAG TGAAACTTG CTTATGTATA
us_70 TATTTATGATATTTTGGATGTAGTCTTTTGATTGTTTAAATCTTAAAAAGTAATGGGATCTTT
TGACACTGGGGTATGTTTTATTTTTATGTG TGCAAATTTTAACCATATTCTTTTCTA GTTAAA
GAGGAAAAAGCAAGTTGCTCCAGAAAAACCTGTAAAGAAACAAAAGACAGGTGAGACTTC
GAGAGCCCTGTCAtccggatccgg agagggcaggggatctctcctta cttgtggcga cgtggaggaga a ccccgg ccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTG
GACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCAC
CTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC
ACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAA
GCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CTACGTCCAGGAGCGCACCATCTTCT
TCAAG GACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGG
TGAACCGCATCGAG CTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGG GCACA
AGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACG
GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCG
ACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTG CTGCTGCCCGACAACCACTA
CCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTG
CTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAG CTGTACAAatgattgttta ttg cagcttata a tggtta ca a ata aagcaatagcatcacaaatttca ca a ata a agca tttttttca ctgcattctagttg tggtttgtcca a a ctcatca a tgtatctta tTCTTCTAAACAGAGCAGCAG CAGCAGAGATGATAACAT
GTTTCAGGTAAAGTTGGCTAttttttttttttttttttttga catggagtcatgctctgtca cc ca ggctggagtgca gtggcgccatctcggctca ctgca a cctcagcctcctgagttca agcagttctctgcctcagcctcccgagtagctagga t tacaggcatccgccaccaga cctggcta atttttgtatttttagtagagatggggtttcaccatcttggccaggctggtctt ga a ctcctgaccttgtgatccaa ctgcctcagcctcca a a agtgctgggttta caggtgtgagcca ccatgccttg ccAA
AGTTGGCTGTTTCTTTAGATTCAGAGGAATTATTATCTGGCTTGATCTGAAGAATGTTAAAA
GTACTATGATCTGATAATTGCCTAATATG
pARBI- ATAAATAAACATATCCTGGAAATGGAATAAGTTGGTTATATTCTTTTTAATTAGTATATCTGC 1806 964:
TTCGTAAAATAAGTAACTGACTAGCCTTAGTAGACTGTGGTTTCCAGGTTTATTCAGAAGTA
SUB1_2_ GCAAGATCCCTCCATTTTTTTTTCTACCAAAGAAATCGTATGTGGGATCCCAAACCACAAAAT
Endogeno AACCGTTCCTGTGGTTAATACTACTATAATGCCTGAAGTGTCTTTTGGGATCCTGAGAACAG
us_71 AGTTTGAAAACATTACTAGACAGAAGGATTGGTTAGATTCATAGTTTTGTTGTTGAGTGAAA
CTTGCTTATGTATATATTTATGATATTTTGGATGTAGTCTTTTGATTGTTTAAATCTTAAAAAG
TAATGGGATCTTTTGACACTGG GGTATGTTTTATTTTTATGTGTGCAAATTTTAACCATATTC
TTTTCTAGTTAtccggatccgg agagggcaggggatctctcctta cttgtggcga cgtggaggaga a ccccggcccc ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGAC
GG CGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAC
GG CAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCC
TCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCA
GCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCA
AG GACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG GCGACACCCTGGTGA
ACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGC
TGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCAT
CAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGG CAG CGTGCAGCTCGCCGACCA
CTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTG

AG CACCCAGTCCAAGCTGAG CAAAGACCCCAACGAG AAGCGCGATCACATGGTCCTGCTG
GAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgc agcttataatggtta ca aata a agca atagcatcaca a atttca ca a ata a agcatttttttca ctgcattctagttgtggt ttgtcca a a ctca tca a tgta tcttatAAGAGGAAAAAGCAAGTTGCTCCAGAAAAACCTGTAAAGAA
ACAAAAGACAGGTGAGACTTCGAG AGCCCTGTCATCTTCTAAACAGAGCAGCAGCAGCAG
AG ATG ATAACATGTTTCAG GTAAAGTTG G CTATTTTTTTTTTTTTTTTTTTTGACATGGAGTCA
TGCTCTGTCACCCAGGCTG GAGTGCAGTG GCGCCATCTCGG CTCACTGCAACCTCAGCCTCC
TGAGTTCAAGCAGTTCTCTGCCTCAGCCTCCCGAGTAGCTAGGATTACAGGCATCCGCCACC
AGACCTG G CTAATTTTTGTATTTTTAGTAGAGATG G G GTTTCACCATCTTG G CCAGG CTG GT
CTTGAACTCCTGACCTTGTGATCCAACTGCCTCAGCCTCCAAAAGTGCTGGGTTTACAGGTG
TGAGCCACCATGCCTTGCCAAAGTTGGCTGTTTCTTT
pARBI- AATTAGTATATCTGCTTCGTAAAATAAGTAACTGACTAGCCTTAGTAGACTGTGGTTTCCAG 1806 965: GTTTATTCAG AAGTAG
CAAGATCCCTCCATTTTTTTTTCTACCAAAGAAATCGTATGTG G GAT
SUB1_3_ CCCAAACCACAAAATAACCGTTCCTGTG GTTAATACTACTATAATGCCTGAAGTGTCTTTTG
Endogeno GGATCCTGAGAACAGAGTTTGAAAACATTACTAGACAGAAGGATTGGTTAGATTCATAGTT
us_72 TTGTTGTTGAGTG
AAACTTGCTTATGTATATATTTATGATATTTTGGATGTAGTCTTTTGATT
GTTTAAATCTTAAAAAGTAATG GGATCTTTTGACACTGGGGTATGTTTTATTTTTATGTGTGC
AAATTTTAACCATATTCTTTTCTAGTTAAAGAGGAAAAAGCAAGTTGCTCCAGAAAAACCTG
TAAAGAAACAAAAGtccggatccggag agggcaggggatctctcctta cttgtggcga cgtggaggaga acccc ggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGC
TGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCG GCGAGGGCGAGGGCGATGCC
ACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG GCAAGCTG CCCGTGCCCTG GC
CCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCG CTACCCCGACCACATG
AAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT
TCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCC
TGGTGAACCGCATCGAGCTGAAGG GCATCGACTTCAAGGAGGACGGCAACATCCTGGG GC
ACAAGCTG GAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAG CAGAAGAA
CG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGC
CGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCAC
TACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCC
TGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgt ttattgcag cttataatggtta ca a ata a agcaatagcatcaca aatttcacaa ata a agcatttttttca ctgcattcta g ttgtggtttgtccaaactcatcaatgtatcttatACAGGTGAGACTTCGAGAGCCCTGTCATCTTCTAAAC
AG AG CAG CAG CAG CAGAG ATGATAACATGTTTCAG GTAAAGTTG G CTATTTTTTTTTTTTTT
TTTTTTGACATGGAGTCATGCTCTGTCACCCAGGCTGGAGTGCAGTGGCGCCATCTCGGCTC
ACTGCAACCTCAG CCTCCTGAGTTCAAG CAGTTCTCTGCCTCAGCCTCCCGAGTAGCTAG GA
TTACAGGCATCCGCCACCAGACCTGGCTAATTTTTGTATTTTTAGTAGAGATGG GGTTTCAC
CATCTTGGCCAGGCTGGTCTTG AACTCCTGACCTTGTGATCCAACTGCCTCAGCCTCCAAAA
GTGCTG GGTTTACAGGTGTGAGCCACCATGCCTTG CCAAAGTTG GCTGTTTCTTTAGATTCA
GAG GAATTATTATCTG G CTTGATCTGAAGAATGTTAAAAGT
pARBI- CACATTTTAATTTTTGTTTCCATGCTCTTTAGAATTCAACTAGAGGGCAGCCTTGTGGATGGC 1806 966:
CCCGAAGCAAGCCTGATGGAACAGGATAGAACCAACCATGTTGAGGGCAACAGACTAAGT
TET2_1_E CCATTCCTGATACCATCACCTCCCATTTGCCAGACAGAACCTCTGGCTACAAAGCTCCAGAA
ndogeno TGGAAGCCCACTGCCTGAGAGAGCTCATCCAGAAGTAAATGG AGACACCAAGTGGCACTCT
us_73 TTCAAAAGTTATTATGGAATACCCTGTATGAAGGGAAGCCAGAATAGTCGTGTGAGTCCTG
ACTTTACACAAGAAAGTAG AG G GTATTCCAAG TGTTTG CAAAATG GAG GAATAAAACG CAC
AGTTAGTGAACCTTCTCTCTCTGGGCTCCTTCAGATCAAGAAATTGAAACAAGACCAAAAGG
CTAATGGAGAAAGACGTAACtccggatccgga g agggcaggggatctctcctta cttgtggcga cgtggagg a ga a ccccggccccATGGTGAG CAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGT
CGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCG
ATGCCACCTACGGCAAG CTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCC
CTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACC
ACATGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCAC
CATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGA
CACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCT

GGGG CACAAGCTGGAGTACAACTACAACAG CCACAACGTCTATATCATGGCCGACAAG CA
GAAGAACG G CATCAAGGTGAACTTCAAGATCCG CCACAACATCGAGGACGGCAGCGTGCA
GCTCG CCGACCACTACCAG CAGAACACCCCCATCGGCG ACG G CCCCGTGCTGCTGCCCG AC
AACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAAG CGCGATCACA
TG GTCCTGCTGGAGTTCGTGACCGCCGCCG GGATCACTCTCG GCATG GACGAGCTGTACAA
atgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttca ctg cattctagttgtggtttgtccaaactcatcaatgtatcttatTTCGG GGTAAGCCAAGAAAGAAATCCAG GT
GAAAGCAGTCAACCAAATGTCTCCGATTTGAGTGATAAGAAAGAATCTGTGAGTTCTGTAG
CCCAAGAAAATGCAGTTAAAGATTTCACCAGTTTTTCAACACATAACTGCAGTGG GCCTGAA
AATCCAGAGCTTCAGATTCTGAATGAGCAGGAGGG GAAAAGTGCTAATTACCATGACAAG
AACATTGTATTA CTTAAAAACAAG G CA GTGCTAATG CCTAATG GTGCTACAGTTTCTGCCTC
TTCCGTGGAACACACACATGGTGAACTCCTGGAAAAAACACTGTCTCAATATTATCCAGATT
GTGTTTCCATTGCGGTGCAGAAAACCACATCTCACATAAATGCCATTAACAGTCAG GCTACT
AATGAGTTGTCCTGTGAGATCACTCACCCATCGCATACCTCAG GGCAGATC
pARBI- AATGGAGAAAGACGTAACTTCGGG GTAAGCCAAGAAAGAAATCCAGGTGAAAG

967: CCAAATGTCTCCGATTTGAGTGATAAGAAAGAATCTGTGAGTTCTGTAGCCCAAGAAAATG
TET2_2_E CAGTTAAAGATTTCACCAGTTTTTCAACACATAACTGCAGTG G GCCTGAAAATCCAGAGCTT
ndogeno CAGATTCTGAATGAGCAGGAGGG GAAAAGTG CTAATTACCATGACAAGAACATTGTATTAC
us_74 TTAAAAACAAG GCAGTGCTAATG
CCTAATGGTGCTACAGTTTCTGCCTCTTCCGTGGAACAC
ACACATGGTGAACTCCTGGAAAAAACACTGTCTCAATATTATCCAGATTGTGTTTCCATTG C
GGTGCAGAAAACCACATCTCACATAAATG CCATTAACAGTCAGG CTACTAATGAGTTGTCCT
GTGAGATCACTCACCCATCGtccggatccggagagggcaggggatctctccttacttgtggcgacgtggagga ga a ccccggccccATGGTGAG CAAGG GCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGT
CGAGCTG GACG G CGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG GCG AG GGCG
ATGCCACCTACG GCAAG CTGACCCTGAAGTTCATCTG CACCACCGGCAAGCTGCCCGTGCC
CTGG CCCACCCTCGTGACCACCCTGACCTACGG CGTG CAGTGCTTCAGCCGCTACCCCGACC
ACATGAAG CAG CACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAG GAGCGCAC
CATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGA
CACCCTG GTGAACCGCATCGAGCTGAAGGG CATCGACTTCAAGGAGGACGGCAACATCCT
GGGG CACAAGCTGGAGTACAACTACAACAG CCACAACGTCTATATCATGG CCGACAAGCA
GAAGAACGG CATCAAGGTGAACTTCAAGATCCG CCACAACATCGAGGACGGCAGCGTGCA
GCTCG CCGACCACTACCAG CAGAACACCCCCATCGGCG ACG G CCCCGTGCTGCTGCCCG AC
AACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAAG CGCGATCACA
TG GTCCTGCTGGAGTTCGTGACCGCCGCCG GGATCACTCTCG GCATG GACGAGCTGTACAA
atgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aatttca ca a ataa agcatttttttca ctg cattctagttgtggtttgtcca aa ctcatca atgtatcttatCATACCTCAGG GCAGATCAATTCCG CACAGA

CCTCTAACTCTGAGCTGCCTCCAAAGCCAG CTGCAGTG GTGAGTGAG GCCTGTGATG CTGA
TGATGCTGATAATGCCAGTAAACTAG CTGCAATGCTAAATACCTGTTCCTTTCAGAAACCAG
AACAA CTACAA CAACAAAAATCAGTTTTTG A GATATG CCCATCTCCTG CAGAAAATAACATC
CAG GGAACCACAAAG CTAGCGTCTGGTGAAGAATTCTGTTCAG GTTCCAGCAGCAATTTGC
AAGCTCCTGGTGGCAGCTCTGAACG GTATTTAAAACAAAATGAAATGAATG GTG CTTACTT
CAA G CAAAGCTCA GTGTTCACTAAG G ATTCCTTTTCTGCCACTA CCACACCACCACCACCATC
ACAATTGCTTCTTTCTCCCCCTCCTCCTCTTCCACAGGTTCCTCAGCTT
pARBI-968: ACACAACACTTTTAAG GGAAGTGAAAATAGAGG GTAAACCTG AGGCACCACCTTCCCAGAG
TET2_3_E TCCTAATCCATCTACACATGTATG CAG CCCTTCTCCGATGCTTTCTGAAAGGCCTCAGAATAA
ndogeno TTGTGTGAACAGGAATGACATACAGACTG CAGG GACAATGACTGTTCCATTGTGTTCTGAG
us_75 AAAACAAGACCAATGTCAGAACACCTCAAGCATAACCCACCAATTTTTG GTAGCAGTG
GAG
AGCTACAG GACAACTGCCAGCAGTTGATGAGAAACAAAGAGCAAGAGATTCTGAAGG GTC
GAGACAAGG AG CAAACACGAGATCTTGTGCCCCCAACACAGCACTATCTGAAACCAG GAT
GGATTGAATTG AAGGCCCCTCGTtccggatccggagagggcaggggatctctccttacttgtggcgacgtgga ggaga a ccccggccccATGGTGAGCAAGG G CGAG GAG CTGTTCACCGGG GTG GTGCCCATC CT
GGTCGAGCTG GACG G CGACGTAAACGGCCACAAGTTCAG CGTGTCCGGCGAGGG CG AG G
GCGATGCCACCTACGG CAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAG CTG CCC GT
GCCCTG GC CCACCCTCGTG ACCACCCTG ACCTACGGCGTGCAGTG CTTCAGCCGCTACCCCG

ACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAG CG
CACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGG C
GACACCCTGGTGAACCG CATCGAG CTGAAGGG CATCGACTTCAAG GAG GACG GCAACATC
CTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAG C
AGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAG CGTGC
AG CTCGCCGACCACTACCAGCAG AACACCCCCATCGGCGAC GGCCCCGTGCTGCTG CCCGA
CAACCACTACCTG AGCACCCAGTCCAAGCTGAGCAAAGACCCCAACG AGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACA
Aatgattgtttattgcagcttata atggtta ca a a ta aagcaatagcatcacaa atttcaca a ata a agcatttttttca ct gcattctagttgtggtttgtccaa a ctcatca atgtatctta tTTTCACCAAGCG G
AATCCCATCTAAAACGT
AATG A GG CATCACTGCCATCAATTCTTCAG TATCAACCCAATCTCTCCAATCAAAT G ACCTCC
AAACAATACACTGGAAATTCCAACATGCCTGGGGGGCTCCCAAGGCAAGCTTACACCCAGA
AAACAACACAG CTG G AG CA CAAGTCACAAATGTA CCAAG TTG AAATG AATCAAG G G CA GT
CCCAAGGTACAGTGGACCAACATCTCCAGTTCCAAAAACCCTCACACCAGGTGCACTTCTCC
AAAACAG A CCATTTACCAAAAG CTCATG TGCA GTCACTGTGTG G CACTAG ATTTCATTTTCA
ACAAA G AG CA G ATTCCCAAA CTG AAAAACTTATG TCCC CAGTGTTG AAACA GCACTTG AAT
CAA CAG G CTTCAG A G ACTG A G CCATTTTCAAACTCACAC CTTTTG CAA CAT
pARBI-969: [...]ndogenous_76 1806
AAAGGGAGGTTGAGATGGGCTGAGGTCTTCTAGGAGGCGGGGAGGGAGTGCAGCCTTGA
CAAGCCTCCCTGTGGGCGAGGTGTAAAGAGGGGCAGAAGTCAGCTCTGGAAGCATAGGGC

GAGAAGACTGGAAAGACAACCTGAATGGGGGACTGGGAGCCTTGAATAACAGGCATGGA
GAGGAGCGTCTCTTGAAATGGAAGAAACAGGAAATAATTACAGCCTCTATGGAGGAGCAA
CAGGATGGACTGGAGAAACTATCATTCCAAAATCCAGTTGGGGCCTCAAAGGCCCTTAGAA
TTTTTCTAG G AAG GTTGAA G GC CAG CTG CTG ACCCAG G ACTCACATGTG CTTC GTCCTCTTC
CCTAGGAATGATGACAGGCACAATAGAAACAtccggatccggaga gggcaggggatctctccttacttgt ggcga cgtggaggaga a ccccgg ccccATG GTGAGCAAGG GCGAG GAGCTGTTCACCGGGGTG GT
GCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTG ACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGA GCTGAAGGG CATCG ACTTCAAGGAG GA
CG G CAACATC CTG G G G CA CAA GCTGG A GTACAA CTA CAA CAG CCACAACGTCTATATCATG
GCCGACAAGCAGAAG AACGGCATCAAG GTGAACTTCAAGATCCGCCACAACATCGAGG AC
GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATG GTCCTGCTGG AGTTCG TGACCG CCG CCG GGATCACTCTCG GCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tACGGG
GAACATTTCTGCAG
AG AAAGGTGGCTCTATCATCTTACAATGTCACCTCTCCTCCACCACGG CACAAGTGACCCAG
GTCAACTGGGAGCAGCAGGACCAGCTTCTGGCCATTTGTAATGCTGACTTGGGGTGGCACA
TCTCCCCATCCTTCAAGGATCGAGTGGCCCCAGGTCCCGGCCTGGGCCTCACCCTCCAGTCG
CTGACCGTGAACGATACAGGG GAGTACTTCTGCATCTATCACACCTACCCTGATG G GACGT
ACACTGG G AGAATCTTCCTG GAG GTCCTAGAAAGCTCAGGTATTCCTGCTGGAG CAAGTTG
GTGGATAAACCTCTCCCTCTAG CATAG A AAATG CAATC CTG AA ACACTG CACAG CAG G G CT
TCTCAATTCG G G ATCA CATTTG AATCACCT G AG G A G ATTTTAAATCATA CTGATG CC G AG G
C
pARBI-970: TIGIT_2_Endogenous_77 1806
GAGGAGATGGGCTGGGCTGGGCTGGAGTAGAGATGTGTGGAGAAGGGTGAGAAGACTG
GAAAGACAACCTGAATGGGGGACTGGGAGCCTTGAATAACAGGCATGGAGAGGAGCGTC
TCTTGAAATGGAAGAAACAGGAAATAATTACAGCCTCTATGGAGGAGCAACAGGATGGAC
TGGAGAAACTATCATTCCAAAATCCAGTTGGGGCCTCAAAGGCCCTTAGAATTTTTCTAG
GA
AGGTTGAAGGCCAGCTGCTGACCCAGG
ACTCACATGTGCTTCGTCCTCTTCCCTAGGAATG
ATGACAGGCACAATAGAAACAACG G G G AA CATTTCTG CA G AG AAAG GTG GCTCTATCATCT
TACAATGTCACCTCTCCTCCA CCACGG CACAAGTGACCCAGGTCAACTGG GAGCAGCAG GA
CCAGCTTCTGGCCATTTGTAATGCTGACtccggatccggagagggcaggggatctctccttacttgtggcga cgtggaggaga a ccccggccccATGGTGAGCAAGG GCGAG GAGCTGTTCACCGGGGTGGTGCCC
ATCCTGGTCGAGCTGGACG GCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGAGG GC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAGCTG C
CCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGG
AG CGCACCATCTTCTTCAAGGACGACGG CAACTACAAG ACCCGCGCCGAGGTGAAGTTCGA
GGGCGACACCCTGGTGAACCG CATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAA
CATCCTG GGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGAC
AAGCAGAAGAACGG CATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACG GCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aa tttca caa ata a agcattt ttttca ctgcattcta gttgtggtttgtcca a a ctcatca atgtatcttatTTGG GGTG
GCACATCTCCCCATCCT
TCAAG GATCGAGTGGCCCCAGGTCCCGGCCTGG GCCTCACCCTCCAGTCGCTGACCGTGAA
CGATACAGGGGAGTACTTCTGCATCTATCACACCTACCCTGATGGGACGTACACTGGGAGA
ATCTTCCTGGAGGTCCTAG AAAGCTCAGGTATTCCTGCTG GAGCAAGTTGGTGGATAAACC
TCTCCCTCTAGCATAGAAAATGCAATCCTGAAACACTGCACAGCAGGGCTTCTCAATTCGGG
ATCACATTTGAATCACCTGAGGAGATTTTAAATCATACTGATGCCGAGGCCTCACCCAGACC
AATTCAATCAGAATCCCTAATAGCAGAGCTAAACAAGG GTAAGGTCTAAAAGCATTTCCAG
GTGATTCTAATGGGCAGCCAATACTGAGAACCACTGTTCTTATGTAAGAAGCACATC
pARBI-971: TIGIT_3_Endogenous_78
CCTCCCTGTGGGCGAGGTGTAAAGAGG

GGGTGGGGAGGAGATGGGCTGGGCTGGGCTGGAGTAGAGATGTGTGGAG
AAGGGTGAG
AAGACTGGAAAGACAACCTGAATGGGGGACTGGGAGCCTTGAATAACAGGCATGGAGAG
GAGCGTCTCTTGAAATGGAAGAAACAGGAAATAATTACAGCCTCTATGGAGGAGCAACAG
GATGGACTGGAGAAACTATCATTCCAAAATCCAGTTGGGGCCTCAAAGGCCCTTAGAATTT
TTCTAG GAAGGTTGAAG GCCAGCTGCTGACCCAGGACTCACATGTGCTTCGTCCTCTTCCCT
AGGAATGATGACAGG CACAATAGAAACAACGG GGAACATTTCTGCAGAGAAAG GTG GCTC
TATCATCTTACAATGTCACCTCTCCTCCACCtccggatccggagagggcaggggatctctccttacttgtggc gacgtggaggagaaccccggccccATGGTGAG CAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCC
CATCCTG GTCGAGCTGGACGGCGACGTAAACGG CCACAAGTTCAGCGTGTCCGGCGAGGG
CGAGGGCGATGCCACCTACGG CAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTG
CCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTA
CCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAG
GAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCG
AG GGCGACACCCTG GTGAACCGCATCGAGCTGAAGGG CATCGACTTCAAGGAGGACGGCA
ACATCCTGGGG CACAAGCTG GAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGA
CAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAG
CGTGCAGCTCG CCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTG
CCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGC
GATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAG C
TGTACAAatgattgtttattgcagcttata atggtta ca a ata aa gca atagcatcacaa atttca ca a a ta a agcatt tttttca ctgcattctagttgtggtttgtccaa a ctcatca atgtatctta tACGGCACAAGTGACCCAGGTCAAC
TGGGAGCAG CAG GACCAG CTTCTG GCCATTTGTAATGCTGACTTG GGGTGG CACATCTCCC
CATCCTTCAAGGATCGAGTG GCCCCAGGTCCCGGCCTG GGCCTCACCCTCCAGTCGCTGAC
CGTGAACGATACAGGGGAGTACTTCTGCATCTATCACACCTACCCTGATGGGACGTACACT
GGGAGAATCTTCCTGGAGGTCCTAGAAAGCTCAGGTATTCCTGCTGGAGCAAGTTGGTGG
ATAAACCTCTCCCTCTAGCATAGAAAATGCAATCCTGAAACACTGCACAGCAGGGCTTCTCA
ATTCGG GATCACATTTGAATCACCTGAG G AGATTTTAAATCATACTGATG CCG AG G CCTCAC
CCAGACCAATTCAATCAGAATCCCTAATAGCAGAGCTAAACAAGGGTAAGGTCTAAAAG
pARBI-972: TRAC_1_Endogenous_79
GG
CCTTTTTCCCATGCCTGCCTTTACTCTGCCAGAGTTATATTGCTGGGGTTTTGAAGAAGAT
CCTATTAAATAAAAGAATAAGCAGTATTATTAAGTAGCCCTGCATTTCAGGTTTCCTTGAGT
GGCAGGCCAGGCCTGGCCGTGAACGTTCACTGAAATCATGGCCTCTTGGCCAAGATTGATA
GCTTGTGCCTGTCCCTGAGTCCCAGTCCATCACGAGCAGCTGGTTTCTAAGATGCTATTTCC

CGTATAAAGCATGAGACCGTGACTTG CCAGCCCCACAG AGCCCCGCCCTTGTCCATCACTG
GCATCTG GACTCCAGCCTG GGTTGGG GCAAAGAGGGAAATGAGATCATGTCCTAACCCTG
ATCCTCTTGTCCCACAGATtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggaga accccggccccATGGTGAGCAAGGG CGAG GAGCTGTTCACCG GGGTGGTGCCCATCCTGGTC
GAGCTGGACGG CGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGG CGAGG GCGA
TGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTG CACCACCGGCAAGCTGCCCGTG CCCT
GGCCCACCCTCGTGACCACCCTGACCTACGG CGTG CAGTGCTTCAGCCGCTACCCCGACCAC
ATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAG G CTACGTCCAGGAGCGCACCA
TCTTCTTCAAG GACGA CGGCAACTACAAGACCCGCGCCG AG GTGAAGTTCGAGG GCGACA
CCCTGGTGAACCGCATCGAG CTGAAGGG CATCGACTTCAAG GAGGACGG CAACATCCTG G
GGCACAAG CTGGAGTACAACTACAACAGCCACAACGTCTATATCATG G CC GACAAGCAGAA
GAACGGCATCAAG GTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAG CGTG CAG CT
CG CCGACCACTACCAGCAGAACACCCCCATCGGCG ACGG CCCCGTGCTGCTGCCCGACAAC
CACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAAG CGCGATCACATG
GTCCTG CTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGG CATGGACGAGCTGTACAAatg attgtttattgcagcttata atggttacaa ata a agcaatagcatca ca a atttca ca aata a a gcatttttttc a ctg cat tctagttgtggtttgtcca a a ctcatca atgtatctta tATCCAGAACCCTG ACCCTG
CCGTGTACCAGCTG
AGA GACTCTAAATCCAGTGA CAA GTCTGTCTG CCTATTCA CCG ATTTTGATTCTCAAACAAAT
GTGTCACAAAGTAAGGATTCTGATGTGTATATCACAGACAAAACTGTGCTAGACATGAG GT
CTATGGACTTCAAGAG CAACAGTGCTGTG G CCTG GAG CAACAAATCTGACTTTG CATGTG C
AAACG CCTTCAACAACAGCATTATTCCAGAAGACACCTTCTTCCCCAG CCCAGGTAAGGGCA
GCTTTG GTG CCTTCGCAGGCTGTTTCCTTGCTTCAG GAATGGCCAGGTTCTGCCCAGAG CTC
TGGTCAATGATGTCTAAAACTCCTCTGATTGGTGGTCTCGGCCTTATCCATTGCCACCAAAA
CCCTCTTTTTACTAA GAAACA GTGA GCCTTGTTCTG G CA GTCCA G A
pARBI-973: TRAC_2_Endogenous_80
GATTCCAAGATGTACAGTTTGCTTTGCTGGG

AGTTATATTGCTGGGGTTTTGAAGAAGATCCTATTAAATAAAAGAATAAGCAGTATTATTAA
GTAGCCCTGCATTTCAGGTTTCCTTGAGTGGCAGGCCAGGCCTGGCCGTGAACGTTCACTG
AAATCATGGCCTCTTGGCCAAGATTGATAGCTTGTGCCTGTCCCTGAGTCCCAGTCCATCAC
GAGCAGCTGGTTTCTAAGATGCTATTTCCCGTATAAAGCATGAGACCGTGACTTG
CCAGCCC
CACAGAGCCCCGCCCTTGTCCATCACTG G CATCTGGACTCCAG CCTGGGTTG G GGCAAA GA
GGGAAATGAGATCATGTCCTAACCCTGATCCTCTTGTCCCACAGATATCCAGAACCCTGACC
CTGCCGTGTACCAGCTGtccggatccgga gagggcaggggatctctccttacttgtggcgacgtgg aggaga a c cccggccccATGG TGAG CAAGG G CGAGGAG CTGTTCACCGGG GTGGTGCCCATCCTG GTCGA
GCTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCGGCGAG GGCGAGG GCGATG
CCACCTACGGCAAG CTGACCCTGAAGTTCATCTG CACCACCGG CAAGCTGCCCGTGCCCTG
GCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAGCCGCTACCCCGACCACA
TGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CTACGTCCAG GAGCGCACCAT
CTTCTTCAAGGACGACGGCAACTACAAGACCCGCG CCGAGGTGAAGTTCGAGG GCGACAC
CCTGGTGAACCGCATCGAGCTGAAGGG CATCGACTTCAAGGAGGACG GCAACATCCTGGG
GCACAAGCTGGAGTACAACTACAACAG CCACAACGTCTATATCATG G CCG ACAAG CAGAAG
AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTC
GCCGACCACTACCAGCAGAACACCCCCATCGG CGACGGCCCCGTGCTGCTGCCCGACAACC
ACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATG GT
CCTGCTGGAGTTCGTGACCGCCG CCG GGATCACTCTCGG CATG GACGAGCTGTACAAatgat tgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgca ttc tagttgtggtttgtccaa a ctcatca atgtatcttatAGAGACTCTAAATCCAGTGACAAGTCTGTCTGCCT
ATTCACCGATTTTGATTCTCAAACAAATGTGTCACAAAGTAAGGATTCTGATGTGTATATCA
CAGACAAAACTGTG CTAGACATGAGGTCTATG GACTTCAAGAGCAACAGTG CTGTGG CCTG
GAGCAACAAATCTGACTTTGCATGTGCAAACG CCTTCAACAACAGCATTATTCCAGAAGACA
CCTTCTTCCCCAGCCCAGGTAAGG GCAGCTTTGGTGCCTTCGCAGG CTGTTTCCTTG CTTCA
GGAATGG CCAG GTTCTGCCCAGAG CTCTGGTCAATGATGTCTAAAACTCCTCTGATTG GTG
GTCTCG G CCTTATCCATTG CCACCAAAACCCTCTTTTTA CTAAG AAA CAG TGAG CCTTGTTCT
GG CAGTCCAGAGAATGACACG GGAAAAAAGCAGATGAAGAG AAG
pARBI-974: TRAC_3_Endogenous_81
TTCCTGTGGTAGCG

TTCTGAGCACCTACCCCATCCCCAGAAGGGCTCAGAAATAAAATAAGAGCCAAGTCTAGTC

GGTGTTTCCTGTCTTGAAACACAATACTGTTGGCCCTGGAAGAATGCACAGAATCTG
TTTGT
AAGGGGATATGCACAGAAGCTGCAAGGGACAGGAGGTGCAGGAGCTGCAGGCCTCCCCC
ACCCAGCCTGCTCTGCCTTGGGGAAAACCGTG
GGTGTGTCCTGCAGGCCATGCAGGCCTGG
G ACAT G CA AG CCCATAA CCG CTG TG G C CTCTTG G TTTTACAG ATAC G AA CCTAAA
CTTTCAA
AACCTGTCAGTGATTGGGTTCCGAATCCTCCTCCTGAAAGTGGCCGGGTTTAATCTGCTCAT
GACGCTGCGGCTGTGGTCCAGCtccggatccggagagggcaggggatctctcctta cttgtggcga cgtgga ggaga a ccccggccccATGGTGAGCAAGG G CGAG GAG CTGTTCACCGGG GTG GTGCCCATCCT
GGTCG AGCTG GACG G CGACGTAAACGGCCACAAGTTCAG CGTGTCCGGCGAGGG CG AG G
GCGATGCCACCTACGGCAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAG CTG CCC GT
GCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCG
ACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAG CG
CACCATCTTCTTCAAG GACGACGG CAACTACAAGACCCGCGCCGAGGTGAAGTTCG AGGG C
GACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAG GAGGACGGCAACATC
CTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAG C
AGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAG CGTGC
AG CTCGCCGACCACTACCAGCAG AACACCCCCATCGGCGAC GGCCCCGTGCTGCTG CCCGA
CAACCACTACCTG AGCACCCAGTCCAAGCTGAGCAAAGACCCCAACG AGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACA
Aatgattgtttattgcagcttata atggtta ca a a ta aagcaatagcatcacaa atttcaca a ata a agcatttttttca ct gcattctagttgtggtttgtccaa a ctcatca atgtatctta tTGAGGTGAGGGGCCTTGAAGCTGGGAGTG
GGGTTTAGGGACGCGGGTCTCTGGGTGCATCCTAAGCTCTGAGAGCAAACCTCCCTGCAGG
GTCTTG CTTTTAAG TCCAAA G CCTG AG CCCACCAAA CTCTCCTA CTTCTTCCTGTTACAAATT
CCTCTTGTG CAATAATAATG G C CTG AAA CG CTGTAAAATATC CTCATTTCA G CCG CCTCA GTT
GCACTTCTCCCCTATGAGG TAG GAAGAACAGTTGTTTAGAAACGAAG AAACTGAGGCCCCA
CAG CTAATGAGTG G AG GAAGAGAG ACACTTGTGTACACCACATGCCTTGTGTTGTACTTCT
CTCACCGTGTAACCTCCTCATGTCCTCTCTCCCCAGTACGG CTCTCTTA G CT CAG TAG AAAG A
AG ACATTACACTCATATTACACCCCAATCCTGG CTAGAGTCTCCG CACC
pARBI-975: TRAC_4_Endogenous_82
ATTATTAAGTAGCCCTGCATTTCAGGTTTCCTTGAGTGGCAGGCCAGGCCTG

GTTCACTGAAATCATG
GCCTCTTGGCCAAGATTGATAGCTTGTGCCTGTCCCTGAGTCCCAG
TCCATCACGAGCAGCTGGTTTCTAAGATGCTATTTCCCGTATAAAGCATGAG
ACCGTGACTT
GCCAGCCCCACAGAGCCCCGCCCTTGTCCATCACTGGCATCTGGACTCCAGCCTGGGTTGG
GGCAAAGAGG
GAAATGAGATCATGTCCTAACCCTGATCCTCTTGTCCCACAGATATCCAGA
ACCCTGACCCTGCCGTGTACCAGCTGAGAGACTCTAAATCCAGTGACAAGTCTGTCTGCCTA
TTCACCG ATTTTGATTCTCAAACAAATGTGTCACAAAGTAAGGATTCTGATGTGTATATCAC
AG ACAAAACTGTG CTAG ACtccggatccggaga gggcaggggatctctc ctta cttgtggcga cgtgga ggag aaccccggccccATGGTG AGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTC
GAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGG CGAGGGCGA
TGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCT
GGCCCACCCTCGTGACCACCCTGACCTACGG CGTG CAGTGCTTCAGCCGCTACCCCG ACCAC
ATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCA
TCTTCTTCAAG GACGA CGGCAACTACAAGACCCGCGCCG AG GTGAAGTTCGAGG GCGACA
CCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAGGAGGACGG CAACATCCTGG
G G CACAAG CTG G AGTACAACTACAACAG CCACAACGTCTATATCATG G CC GACAAGCAGA A
GAACGGCATCAAG GTGAACTTCAAGATCCGCCACAACATCGAGGA CGGCAGCGTG CAG CT
CGCCGACCACTACCAGCAGAACACCCCCATCGGCG ACGGCCCCGTGCTGCTGCCCGACAAC
CACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAAGCGCGATCACATG
GTCCTGCTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGGCATGGACGAGCTGTACAAatg attgtttattgcagcttata atggttacaa ata a agcaatagcatca ca a atttca ca aata a a gcatttttttc a ctg ca t tctagttgtggtttgtcca a a ctcatca atgtatctta tATGAGGTCTATGGACTTCAAGAGCAACAGTGCT
GTGGCCTG GAG CAACAAATCTG ACTTTG CATG TG CAAA CGC CTTCAA CAACA G CATTATTC C
AG AAGACACCTTCTTCCCCAGCCCAGGTAAGG GCAGCTTTGGTGCCTTCGCAGGCTGTTTCC
TTG CTTCAG G AATG G CCAG G TTCTG CC CAGA G CTCTG GTCAATG ATGTCTAAAACTCCTCTG
ATTG GTG G TCTCG G CCTTATCCATTG C CACCAAAACCCTCTTTTTACTAAG AAACA G TG AG C
CTTGTTCTGGCAGTCCAGAGAATGACACG GGAAAAAAGCAGATGAAGAGAAGGTGGCAG

GAGAGGGCACGTGGCCCAGCCTCAGTCTCTCCAACTGAGTTCCTGCCTGCCTGCCTTTGCTC
AGACTGTTTGCCCCTTACTGCTCTTCTAGGCCTCATTCTAAGCCCCTT
pARBI-976: TRAC_6_Endogenous_84 1806
TTCATTTCCATTTGAGTTGTTCTTATTGAGTCATCCTTCCTGTGGTAGCGGAACTCACTAAGG
GGCCCATCTGGACCCGAGGTATTGTGATGATAAATTCTGAGCACCTACCCCATCCCCAGAA
GGGCTCAGAAATAAAATAAGAGCCAAGTCTAGTCGGTGTTTCCTGTCTTGAAACACAATAC
TGTTGGCCCTGGAAGAATGCACAGAATCTGTTTGTAAGGGGATATGCACAGAAGCTGCAA
GGGACAGGAGGTGCAGGAGCTGCAGGCCTCCCCCACCCAGCCTGCTCTGCCTTGGGGAAA
ACCGTGGGTGTGTCCTGCAGGCCATGCAGGCCTGGGACATGCAAGCCCATAACCGCTGTG
GCCTCTTGGTTTTACAGATACGAACCTAAACTTTCAAAACCTGTCAGTGATTGGGTTCCGAA
TCCTCCTCCTGAAAGTGGCCGGGtccggatccggagagggcaggggatctctccttacttgtggcgacgtgga ggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCT
GGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGG
GCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGT
GCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCG
ACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAG CG
CACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGC
GACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAG GAGGACGGCAACATC
CTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGC
AGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGC
AGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGA
CAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACA
Aatgattgtttattgcagcttataatggttacaaata aagcaatagcatcacaa atttcaca aataaagcatttttttcact gcattctagttgtggtttgtccaaactcatcaatgtatcttatTTTAATCTGCTCATGACGCTGCGGCTGTGG
TCCAGCTGAGGTGAGGGGCCTTGAAGCTGGGAGTGGGGTTTAGGGACGCGGGTCTCTGG
GTGCATCCTAAGCTCTGAGAGCAAACCTCCCTGCAGGGTCTTGCTTTTAAGTCCAAAGCCTG
AGCCCACCAAACTCTCCTACTTCTTCCTGTTACAAATTCCTCTTGTGCAATAATAATGGCCTG
AAACGCTGTAAAATATCCTCATTTCAGCCGCCTCAGTTGCACTTCTCCCCTATGAGGTAGGA
AGAACAGTTGTTTAGAAACGAAGAAACTGAGGCCCCACAGCTAATGAGTGGAGGAAGAGA
GACACTTGTGTACACCACATGCCTTGTGTTGTACTTCTCTCACCGTGTAACCTCCTCATGTCC
TCTCTCCCCAGTACGGCTCTCTTAGCTCAGTAGAAAGAAGACATTACACTC
pARBI-977: TRAC_Endogenous 1806
CAGTCCATCACGAGCAGCTGGTTTCTAAGATGCTATTTCCCGTATAAAGCATGAGACCGTGA
CTTGCCAGCCCCACAGAGCCCCGCCCTTGTCCATCACTGGCATCTGGACTCCAGCCTGGGTT
GGGGCAAAGAGGGAAATGAGATCATGTCCTAACCCTGATCCTCTTGTCCCACAGATATCCA
GAACCCTGACCCTGCCGTGTACCAGCTGAGAGACTCTAAATCCAGTGACAAGTCTGTCTGC

CTATTCACCGATTTTGATTCTCAAACAAATGTGTCACAAAGTAAGGATTCTGATGTGTATATC
ACAGACAAAACTGTGCTAGACATGAGGTCTATGGACTTCAAGAGCAACAGTGCTGTGGCCT
GGAGCAACAAATCTGACTTTGCATGTGCAAACGCCTTCAACAACAGCATTATTCCAGAAGA
CACCTTCTTCCCCAGCCCAtccggatccggagagggcaggggatctctccttacttgtggcgacgtggaggaga accccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTC
GAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGA
TGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCT
GGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCAC
ATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCA
TCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACA
CCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAGGAGGACGG CAACATCCTGG
GGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAA
GAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCT
CGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAAC
CACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATG
GTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatg attgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactg cat tctagttgtggtttgtccaaactcatcaatgtatcttatGGTAAGGGCAGCTTTGGTGCCTTCGCAGGCTGT
TTCCTTGCTTCAGGAATGGCCAGGTTCTGCCCAGAGCTCTGGTCAATGATGTCTAAAACTCC
TCTGATTGGTGGTCTCGGCCTTATCCATTGCCACCAAAACCCTCTTTTTACTAAGAAACAGTG

AGCCTTGTTCTGGCAGTCCAGAGAATGACACG GGAAAAAAGCAGATGAAGAGAAGGTGGC
AGGAGAGGGCACGTGGCCCAGCCTCAGTCTCTCCAACTGAGTTCCTGCCTGCCTGCCTTTGC
TCAGACTGTTTGCCCCTTACTGCTCTTCTAGGCCTCATTCTAAGCCCCTTCTCCAAGTTGCCTC
TCCTTATTTCTCCCTGTCTGCCAAAAAATCTTTCCCAGCTCACTAAGTCAGTCTCACGCAGTC
ACTCATTAACCCACCAATCACTGATTGTGCCGGCACATGAATG
pARBI-978: TRIM28_1_Endogenous_85 1806
TGAGGAAGAAGCGCGGGCGGCGCCTTCGGGAGGCGAGCAGGCAGCAGTTGGCCGTGCCG
TAGCAGCGTCCCGCGCGCGGCGGGCAGCGGCCCAGGAGGCGCGTGGCGGCGCTCGGCCT
CGCGGCGGCGGCGGCGGCAGCGGCCCAGCAGTTGGCGGCGAGCGCGTCTGCGCCTGCGC
GGCGGGCCCCGCGCCCCTCCTCCCCCCCTGGGCGCCCCCGGCGGCGTGTGAATGGCGGCCT
CCGCGGCGGCAGCCTCGGCAGCAGCGGCCTCGGCCGCCTCTGGCAGCCCGGGCCCGGGCG
AGGGCTCCGCTGGCGGCGAAAAGCGCTCCACCGCCCCTTCGGCCGCAGCCTCGGCCTCTGC
CTCAGCCGCGGCGTCGTCGCCCGCGGGGGGCGGCGCCGAGGCGCTGGAGCTGCTGGAGC
ACTGCGGCGTGTGCAGAGAGCGCCTGCGACCCtccggatccggagagggcaggggatctctccttactt gtggcgacgtggaggagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGG
TGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCG
AGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCA
AGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGC
CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACG
TCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGA
CGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATG
GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGAC
GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC
GAGCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataa agcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatGAGAGGGAGCCCCGCCTGC
TGCCCTGTTTGCACTCGGCCTGTAGTGCCTGCTTAGGGCCCGCGGCCCCCGCCGCCGCCAAC
AGCTCGGGGGACGGCGGGGCGGCGGGCGACGGCACCGGTAAGTACGAAGTGATCGGTGC
CACCCCTCCCCCTACTCTCTGCCTTTGATTCCGACTGGGTGCAGAGATGAGGATGCCACCTG
GGCGAGAGGATGGGGGCCCGGACAGGGCACGGGAAATACTTTCTGGGTCCTGCATACGA
ACGTGGGTTTGTGCTGGCCGCTGAGATGGGACATCTGACTAAAGTTGGAGAAAAGAAGGC
TCGGGGAGGGGAGGGGCTGGTTCGCTGCGGGATAATGGTCGGGGGCCCACCCAGCAGGG
GAATGGTGGGGGCCATAACCTGGGTGGGAACTTGTAACAGTCTCCCACATCCCTGCTTCTC
GAAGTGGTG
pARBI-979: TRIM28_2_Endogenous_86 1806
TCGAAGTGGTGGACTGTCCCGTGTGCAAGCAACAGTGCTTCTCCAAAGACATCGTGGAGAA
TTATTTCATGCGTGATAGTGGCAGCAAGGCTGCCACCGACGCCCAGGATGCGAACCAGGTG
CGTCCTATCTCAGCAACCACAAGGAGGTTTCTGGGGAGGGGGCATCTGCGCAGGAGGAGC
TTGGCACCAGCTCCAGGCTGTTACTCCACTTTCCCAAGGCTCTGGGTGGGCTGCCTAGGTTG
GGTCAAGGGACCAATCTTAAATCTCCGGTTGTATTTTCTGGGATGTAAACGTGGATCTATCA
AGTTGTCTTGCCTTCTCTGACCCTGCCTTTGTCTGGCAGTGCTGCACTAGCTGTGAGGATAA
TGCCCCAGCCACCAGCTACTGTGTGGAGTGCTCGG AGCCTCTGTGTGAGACCTGTGTAGAG
GCGCACCAGCGGGTGAAGTACtccggatccggagagggcaggggatctctccttacttgtggcgacgtggag gagaaccccggccccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTG
GTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGG
CGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTG
CCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGA
CCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGC
ACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGC
GACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATC
CTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGC
AGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGC
AGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGA
CAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACA

Aatgattgtttattgcagcttata atggtta ca a a ta aagcaatagcatcacaa atttcaca a ata a agcatttttttca ct gcattctagttgtggtttgtccaaactcatcaatgtatcttatACCAAGGACCATACTGTGCGCTCTACTG GT
ACATGAGGCTGAGGGGGGCTGTTGGAGTTGTTCTCCCATGTGTGCCCTCAGTTGCTTTTATG
ATGTTGGTTGCATCTGGTGGATGGGTCCTAGAGTTCTCTAGGGGGTGCCGCCCGAAGGGCC
CGAGGGCAGAACTCCAGAAGCAGAAAAACTGGGGTTGTGGTGTGTAGTCTCTAGGGCCTA
GGTGGGAGAGGGTGGGAAGGGGAAATAAGGGAGACCTTAATGTGCTGCAGGGAGGTAC
AAAGGGTTTGAGAAGGCTTATCAGGGAGTTGTCAGACCTGGTGGTGAAG GGCTCAGCATA
TGCAAACAGGGAAAGGCATGGTGTGAAGGGCTTTCTGGGTTTGTGGTTCCCTGGCACACAT
CTGGTATAAGATGCTGCTAGGGAAAGTAGACTGTGGGTCCATGGATTATAAGTGATGA
pARBI-980: TRIM28_3_Endogenous_87
GGCGCCCAATGCGCGTGCGCGGCGGCGTCGG

TCGGCTCTTTCTGCGAGCG
GGCGCGCGGGCGAGCGGTTGTGCTTGTGCTTGTGGCGCGTG
GTGCGGGTTTCGGCGGCGGCTGAGGAAGAAGCGCGGGCGGCGCCTTCGGGAGGCGAGCA
GGCAGCAGTTGGCCGTGCCGTAGCAGCGTCCCGCGCGCGGCGGGCAGCGGCCCAGGAGG
CGCGTGGCGGCGCTCGGCCTCGCGGCGGCGGCGGCGGCAGCGGCCCAGCAGTTGGCGGC
GAGCGCGTCTGCGCCTGCGCGGCGGGCCCCGCGCCCCTCCTCCCCCCCTG GGCGCCCCCGG
CGGCGTGTGAATGGCGGCCTCCGCGGCGGCAGCCTCGGCAGCAGCGGCCTCGGCCGCCTC
TGGCAGCCCGGGCCCGGGCGAGGGCTCCGCTtccggatccggagagggcaggggatctctccttacttgt ggcga cgtggaggaga a ccccgg ccccATG GTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTG GT
GCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAGGG CATCGACTTCAAGGAG GA
CGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATG
GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGAC
GGCAG CGTGCAG CTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACG GCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC
GAGCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataa agcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatGGCGGCGAAAAGCGCTCCA
CCGCCCCTTCGGCCGCAGCCTCGGCCTCTGCCTCAGCCGCGGCGTCGTCGCCCGCGGGG GG
CGGCGCCGAGGCGCTGGAGCTGCTGGAGCACTGCGGCGTGTGCAGAGAGCGCCTGCGAC
CCGAGAG GGAGCCCCGCCTGCTGCCCTGTTTG CACTCG G CCTGTAGTGCCTGCTTAG GG CC
CGCGGCCCCCG CCGCCGCCAACAGCTCGGGGGACGGCGGGGCGGCGGGCGACGGCACCG
GTAAGTACGAAGTGATCGGTGCCACCCCTCCCCCTACTCTCTGCCTTTGATTCCGACTGGGT
GCAGAGATGAGGATGCCACCTGGGCGAGAGGATGGGGGCCCGGACAGGGCACGGGAAA
TACTTTCTGG GTCCTGCATACGAACGTGGGTTTGTGCTGGCCGCTGAGATGGGACATCTGA
CTAAAGTTGG
pARBI-596: APRT_1_Exogenous
ATCTCGGCCAATAAAGGAGAAAGGGCGCGGCCCGTACGCG

ACCAGCTCACGCCCCTCCTCCAGCCGCCAAGGCCCCGGCCCACAGCTGCCTG
GCTGCAGTC
AGAAGCGTAGCCCGAGACAAGGAAGGGCGCCTTGACTCGCACTTTTGTCCGGTTCGAACGT
TCTGctcagtggtgcgtggaatgcgagcgcgtcttaaaatcgatggcgcctaggagtccatgaaatacggTACAGG

CTTCCGGCGACGGATGCCCCGCCCCTCACCCACGCTCCGCCCTCCGGGGATGCCCCACCCCT
CGTGGCGGTCCCGCCCGTCCCCGCGCAGGCGCGCTCGGGCTG CCGCTGGCTCTTCGCACGC
GGCCATGGCCGACTCCGAGCTGCAGCTGGTTGAGCAGCGGATCCGCAGCTTCCCCGACTTC
CCCACCCCAtga cg a ctgtg ccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtgtcattcta ttctggggg gtggggtggggcagga cagca agggggagg attgggaaga ca atagca ggcatgctggggatgcggtgggctctatg gGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCCG
AGAAGTTG GGG G GAG G GGTCGG CAATTGAACG GGTGCCTAGAGAAGGTGGCG CGG G GT
AAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACC
GTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACAC
AG CTGAAG CTTCGAGG GG CTCGCATCTCTCCTTCACGCGCCCG CCGCCCTACCTGAG GCCG
CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTGCGT

CCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGA
GCTAG CGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC
ATCCTGGTCGAGCTGGACG GCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGAGG GC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTG C
CCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGG
AG CGCACCATCTTCTTCAAGGACGACGG CAACTACAAG ACCCGCGCCGAGGTGAAGTTCGA
GGGCGACACCCTGGTGAACCG CATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAA
CATCCTG G G G CACAAG CTG GAGTACAACTACAACAG CCACAACGTCTATATCATG G CCG AC
AAGCAGAAGAACGG CATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACG GCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aa tttca caa ata a agcattt ttttca ctgcattcta gttgtggtttgtcca a a ctcatca atgtatcttatGGCGTGGTATTCAGGTGCACGCAC
AG GCCGCCCTCGTG GCGCCCCGACCTG CGGGCCTACGGATGG GAG CGCGTGG CCCGCGAC
CTCCGGGCGGGCGGGGCGGGAACCCTCGTCTTTCGCCCCCGG GGCCCTGCCCTCCTTCGGC
CCCGGCGTCACCAGGCCTGTCCTTGGGTCCAGGGACATCTCGCCCGTCCTGAAGGACCCCG
CCTCCTTCCGCGCCGCCATCGG CCTCCTGGCGCGACACCTGAAGGCGACCCACGGG GGCCG
CATCGACTACATCGCAGGCGAGTGCCCAGTGGCCGCATCTAGGGCGCTTCCGCCTCTGCGC
GCGCCGAGGGCAGCACGTGGGCTCTGCGCGTCTGCTTGGGGGAGGGCCTTTGGGGTGCTT
CAGGGGGCGCCGGGACGGGCGCCGTGCTTGGGTCGCCCGGGAAGGGTTGTGAGATTGAG
CCC
pARBI-597: APRT_2_Exogenous 2541
TGCCCCGCCCCTCACCCACGCTCCGCCCTCCGGGGATGCCCCACCCCTCGTGGCGGTCCCGC
CCGTCCCCGCGCAGGCGCGCTCG
GGCTGCCGCTGGCTCTTCGCACGCGGCCATGGCCGACT
CCGAGCTGCAGCTGGTTGAGCAGCGGATCCGCAGCTTCCCCGACTTCCCCACCCCAGGCGT
GGTATTCAGGTGCACGCACAGGCCGCCCTCGTGGCGCCCCGACCTGCGGGCCTACGGATG

GGCGGGCGGGGCGGGAACCCTCGTCTTTCGCCCCCG
GGGCCCTG CCCTCCTTCGGCCCCGGCGTCACCAGG CCTGTCCTTGG GTCCAGGGACATCTC
GCCCGTCCTGAAGGACCCCGCCTCCTTCCGCGCCGCCATCGGCCTCCTGGCGCGACACCTG
AAGGCGACCCACGGGGGCCGCATCtgacgactgtgccttctagttgccagccatctgttgtttgcccctccccc gtgccttccttga ccctgga a ggtgccactccca ctgtcctttccta ata a aa tg agga a attgcatcgcattgtctgagta ggtgtcattctattctggggggtggggtggggcagg acagca agggggaggattggga agacaatagcaggcatgctg gggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACA
TCGCCCACAGTCCCCG AGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAG
AAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAG
GGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTGGTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACCGGGCCTTTGT
CCG GCGCTCCCTTGGAG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CT
TG CTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG C CGTTACAGATCCAAG CTGTGACC
GG CGCCTACTCTAGAGCTAG CGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTC
ACCGG GGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGC
GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC
ACCACCGG CAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACG GCGTGC
AGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC
GAAGGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG
CCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACT
TCAAGGAGGACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACG
TCTATATCATG GCCGACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAA
CATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGA
CG GCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGAC

CCCAACGAGAAGCGCGATCACATG GTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC
TCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcaca aatttca ca a ata aagcatttttttca ctgcattctagttgtggtttgtcca a a ctcatcaatgtatcttatGACTACATC
GCAGGCGAGTGCCCAGTGGCCGCATCTAGGGCGCTTCCGCCTCTGCGCGCG CCGAG GG CA
GCACGTGGGCTCTG CGCGTCTGCTTGGGGGAGGGCCTTTGGGGTGCTTCAGGGGGCGCCG
GGACGGGCGCCGTGCTTG GGTCGCCCGGGAAGGGTTGTGAGATTGAGCCCCCGAGGCCG
CCGCGCTGTGCAGGCGTCCTTCCCGCAGGTTCCGGGTCCCCAGCCCAGGACAGGCGTGACC
GAGTTGCCGGGTCAGTTGGTCTCCCTGGAGTGCCCAAGCTGAATCCACAGGGCCCAG CTGC
CTTGCTTCTTGTTCCTTCTG CGAG CTG GTATTG AG CG CCTG CCACGAG CCAG G CCTTCCCTG
GTGAAGATCACGGAATGCCCACCCAGGGAAGGGAGGCCTGGAGG CCTCCGGGAGAGCCC
AAGAGGTGG CCCAGG GAGA
pARBI-598: APRT_3_Exogenous 2541
CGTTCTGctcagtggtgcgtggaatgcgagcgcgtcttaaaatcgatggcgcctaggagtccatgaaatacggTAC
AGGCTTCCGGCGACGGATGCCCCGCCCCTCACCCACGCTCCGCCCTCCGGGGATGCCCCAC
CCCTCGTGGCGGTCCCGCCCGTCCCCGCGCAGGCGCGCTCGGGCTGCCGCTGGCTCTTCGC
ACGCGGCCATGGCCGACTCCGAGCTGCAGCTGGTTGAGCAGCGGATCCGCAGCTTCCCCG

ACTTCCCCACCCCAGGCGTGGTATTCAGGTGCACGCACAGGCCGCCCTCGTGGCGCCCCGA
CCTGCGGGCCTACG GATGGGAGCGCGTGGCCCGCGACCTCCGGGCGGGCGGGGCGGGAA
CCCTCGTCTTTCGCCCCCGG GG CCCTG CCCTCCTTCGG CCCCG GCGTCACCAGGCCTGTCCTT
GGGTCCAGGtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttg a ccctgga aggtgccactccca ctgtcctttccta ata a aatgagga a attgcatcgcattgtctgagtaggtgtcattctattctgggg ggtggggtggggca ggacagca a gggggaggattggga aga ca atagcaggcatgctggggatgcggtgggctctat ggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGG G CAGAGCGCACATCGCCCACAGTCCCC
GAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGGGG
TAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGAGAACC
GTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACAC
AG CTGAAG CTTCGAGG GG CTCGCATCTCTCCTTCACGCGCCCG CCGCCCTACCTGAG GCCG
CCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCG CCTGTGGTGCCTCCTGAACTG CGT
CCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGA
GCTAG CGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC
ATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG GC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTG C
CCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGG
AG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAG ACCCGCGCCGAG GTGAAGTTCGA
GGGCGACACCCTGGTGAACCG CATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAA
CATCCTG GGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGG CCGAC
AAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aa tttca caa ata a agcattt ttttca ctgcattcta gttgtggtttgtcca a a ctcatca atgtatcttatGACATCTCG
CCCGTCCTGAAGGACC
CCGCCTCCTTCCGCGCCGCCATCGGCCTCCTGGCGCGACACCTGAAGGCGACCCACGGGGG
CCG CATCGACTACATCGCAGGCGAGTGCCCAGTGGCCGCATCTAG GGCGCTTCCGCCTCTG
CGCGCGCCGAGGGCAGCACGTGGGCTCTGCGCGTCTGCTTGGGGGAGGGCCTTTGGGGTG
CTTCAGGG GGCGCCGGGACGGGCGCCGTGCTTGGGTCGCCCGGGAAGGGTTGTGAGATT
GAGCCCCCGAGGCCGCCG CGCTGTGCAGGCGTCCTTCCCGCAGGTTCCGGGTCCCCAGCCC
AG GACAGGCGTGACCGAGTTG CCGGGTCAGTTGGTCTCCCTG GAGTGCCCAAGCTGAATC
CACAGGGCCCAGCTGCCTTG CTTCTTGTTCCTTCTGCGAGCTGGTATTGAGCGCCTGCCACG
A
pARBI-599: B2M_1_Exogenous
AGTAATTTGATGGGGGCTATTATGAACTGAGAAATGAACTTTGAAAAGTATCTTGG
AATCATGTAGACTCTTGAGTGATGTGTTAAGGAATGCTATGAGTGCTGAGAGGGCATCAGA
AGTCCTTGAGAGCCTCCAGAGAAAGGCTCTTAAAAATGCAGCGCAATCTCCAGTGACAGAA
GATACTGCTAGAAATCTGCTAGAAAAAAAACAAAAAAGGCATGTATAGAGGAATTATGAG

GGAAGGTGGA
AG CTCATTTG G CCAG A GTG GAAATGGAATTGG G AG AAATCGATG ACCAAATG TAAACA CTT
GGTGCCTGATATAG CTTG A CACCAAG TTAG CCCCAAGTG AAATACCCTG G CAATATTAATG T
GTCTTTTCCCGATATTCCTCAGGTtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgt gccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a aa tg agga a attgcatcgcattgtctgagtag gtgtcattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctgg ggatgcggtgggctctatggGGATCTGCGATCGCTCCG GTG CCCGTCAGTGGG CAGAG CG CACAT
CGCCCACAGTCCCCGAGAAGTTG GGGG GAGG G GTCGGCAATTGAACGGGTGCCTAG AGA
AG GTGGCGCG GGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCG CCTTTTTCCCGAG
GGTG GGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCG CCAGAACACAG CTGAAGCTTCGAG GGG CTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTG GTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAG GTCGAGACCGGGCCTTTGT
CCG GCGCTCCCTTGGAG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CT
TG CTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAG CTGTG ACC
GG CGCCTACTCTAGAGCTAG CGAATTgccgccaccATG GTGAGCAAG GGCGAGG AG CTGTTC
ACCGG GGTGGTG CCCATCCTG GTCGAGCTGGACG GCGACGTAAACG GCCACAAGTTCA GC
GTGTCCGG CGAGGG CGAG GGCGATGCCACCTACG GCAAGCTGACCCTGAAGTTCATCTG C
ACCACCGG CAAGCTGCCCGTG CCCTG GCCCACCCTCGTGACCACCCTGACCTACG GCGTG C
AGTGCTTCAGCCG CTACCCCGACCACATGAAGCAG CAC GACTTCTTCAAGTCCGCCATG CCC
GAAGG CTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCG
CCG AG GTGAAGTTCGAGG G CGACACCCTG GTGAACCGCATCGAGCTGAAGG GCATCGACT
TCAAGGAG GACG GCAACATCCTGG G GCACAAGCTGGAGTACAACTACAACAG CCACAACG
TCTATATCATG GCCGACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAA
CATCGAG GACGG CAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGG CG A
CG GCCCCGTG CTGCTGCCCGACAACCACTACCTG AGCACCCAGTCCAAGCTGAGCAAAGAC
CCCAACGAGAAGCGCGATCACATG GTCCTGCTGGAGTTCGTGACCGCCG CCG GGATCACTC
TCGGCATG GACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcaca aatttca ca a a ta aagcatttttttca ctgcattctagttgtggtttgtcca a a ctcatcaatgtatcttatACTCCAAAG
ATTCAGGTTTACTCACGTCATCCAG CAG AG A ATG GAAAGTCAAATTTCCTGAATTGCTATGT
GTCTGG GTTTCATCCATCCGACATTGAAGTTGACTTACTGAAGAATGGAGAGAGAATTGAA
AAAG TGG A G CATTCAG A CTTG TCTTTCA G CAAGG A CTG GTCTTTCTATCTCTTGTACTACACT
GAATTCACCCCCACTG AAAAAGATGAGTATGC CTGCCGTGTG AACCATGTGACTTTGTCACA
GCCCAAGATAGTTAAGTGGG GTAAGTCTTACATTCTTTTGTAAGCTG CTG AAA GTTGTG TAT
GA G TAGTCATATCATAAA G CTG CTTTG ATATAAAAAA G GTCTATG G CCATACTACCCTG AAT
GA G TCCCATCCCATCTGATATAAACAATCTG CATATTG G GATTGTCAGGGAATGTTCTTAAA
GATCA GA
pARBI-600: B2M_2_Exogenous
AAGATCTTAATCTTCTGGGTTTCCGTTTTCTCGAATGAAAAATG
TGGCTGGGGCACCATTAGCAAGTCACTTAGCATCTCTGGGG
CCAGTCTGCAAAGCGAGGG
GGCAGCCTTAATGTGCCTCCAGCCTGAAGTCCTAGAATGAGCGCCCGGTGTCCCAAGCTGG
GGCGCGCACCCCAGATCGGAGGGCGCCGATGTACAGACAGCAAACTCACCCAGTCTAGTG
CATGCCTTCTTAAACATCACGAGACTCTAAGAAAAGGAAACTGAAAACGG GAAAGTCCCTC
TCTCTAACCTGGCACTGCGTCG CTGGCTTGGAGACAGGTGACGGTCCCTGCGGGCCTTGTC
CTGATTGGCTGGGCACGCGTTTAATATAAGTGGAGGCGTCGCGCTGGCGGGCATTCCTGAA
GCTGACAGCATTCGGGCCGAGATGtgacgactgtgccttctagttgccagccatctgttgtttgcccctccccc gtgccttccttga ccctgga a ggtgccactccca ctgtcctttccta ata a aa tg agga a attgcatcgcattgtctgagta ggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattggga agacaatagcaggcatgctg gggatgcggtgggctctatggGGATCTG CGATCGCTCCGGTGCCCGTCAGTG G GCAGAGCG CACA
TCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAG
AAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAG
GGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTG GTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACCGGGCCTTTGT

CCG GCGCTCCCTTGGAG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CT
TGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACC
GG CGCCTACTCTAGAGCTAG CGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTC
ACCGGGGTGGTG CCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGC
GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC
ACCACCGG CAAGCTGCCCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGC
AGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC
GAAGGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG
CCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACT
TCAAGGAGGACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACG
TCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAA
CATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGA
CG GCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGAC
CCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC
TCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcaca aatttca ca a a ta aa gcatttttttca ctgcattctagttgtggtttgtcca a a ctcatcaatgtatcttatTCTCGCTCC
GTGGCCTTAGCTGTGCTCGCGCTACTCTCTCTTTCTGGCCTGGAGGCTATCCAGCGTGAGTC
TCTCCTACCCTCCCG CTCTGGTCCTTCCTCTCCCGCTCTGCACCCTCTGTGGCCCTCGCTGTG C
TCTCTCGCTCCGTGACTTCCCTTCTCCAAGTTCTCCTTGGTGGCCCGCCGTGGGGCTAGTCCA
GGGCTGGATCTCGGGGAAGCGGCGGGGTGGCCTGGGAGTGGGGAAGGGGGTGCGCACC
CG GGACGCG CGCTACTTGCCCCTTTCGGCGGGGAGCAGGGGAGACCTTTGGCCTACGGCG
ACGGGAGGGTCGGGACAAAGTTTAGGGCGTCGATAAGCGTCAGAGCGCCGAGGTTGGGG
GAGGGTTTCTCTTCCGCTCTTTCGCGGG GCCTCTGGCTCCCCCAGCGCAGCTGGAGTGGGG
GACGGGTAGGCTCG
pARBI-601: B2M_3_Exogenous
CTTAAAAATGCAGCGCAATCTCCAGTGACAGAAGATACTGCTAGAAATCTGCTAGAAAAAA
AACAAAAAAGGCATGTATAGAGGAATTATGAGGGAAAGATACCAAGTCACGGTTTATTCTT
CAAAATGGAGGTGGCTTGTTGGGAAGGTGGAAGCTCATTTGGCCAGAGTGGAAATGGAAT

TGGGAGAAATCGATGACCAAATGTAAACACTTGGTGCCTGATATAGCTTGACACCAAGTTA
GCCCCAAGTGAAATACCCTGGCAATATTAATGTGTCTTTTCCCGATATTCCTCAGGTACTCCA
AAGATTCAGGTTTACTCACGTCATCCAGCAGAGAATGGAAAGTCAAATTTCCTGAATTGCTA
TGTGTCTGGGTTTCATCCATCCtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgc cttccttgaccctgga a ggtgccactcccactgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtg tca ttctattctggggggtggggtggggcagga cagca agggggaggattggga ag a ca atagcaggca tgctgggga tgcggtgggctctatggGGATCTGCGATCG CTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGC
CCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGG
TGGCGCGGG GTAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACGGGTTTGC
CGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCT
ACCTGAGGCCG CCATCCACG CCG GTTGAGTCG CGTTCTGCCGCCTCCCG CCTGTG GTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCG
GCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACC
GG GGTGGTG CCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTG
TCCGGCGAGGGCGAG GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCG
AG GTGAAGTTCGAG G GCGACACCCTG GTGAACCGCATCGAG CTGAAGGG CATCGACTTCA
AG GAG GACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGG CAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG

GCATG GACGAGCTGTACAAa tg attgtttattgcagcttata atggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttca ctgcattctagttgtggtttgtcca aa ctcatcaatgtatcttatGACATTGAAGTT
GACTTACTGAAGAATGGAGAGAGAATTGAAAAAGTGGAGCATTCAGACTTGTCTTTCAGCA
AG GACTG GTCTTTCTATCTCTTGTACTACACTGAATTCACC CCCACTGAAAAA GATGAGTAT
GCCTGCCGTGTGAACCATGTGACTTTGTCACAGCCCAAGATAGTTAAGTGGGGTAAGTCTT
ACATTCTTTTGTAAGCTGCTGAAAGTTGTGTATGAGTAGTCATATCATAAAGCTGCTTTGAT
ATAAAAAAGGTCTATG G CCATACTACCCTGAATGAGTCCCATC CCATCTGATATAAACAATC
TGCATATTGGGATTGTCAGGGAATGTTCTTAAAGATCAGATTAGTGGCACCTGCTGAGATA
CTGATGCACAGCATGGTTTCTG AACCAGTAGTTTCCCTGCAGTTGAGCAGGGAGCAGCAGC
AGCACTTG
pARBI-602: CAPNS1_1_Exogenous_7
AAGAGAACCCCCACCAGATCTGCACCCTCCCCTTCACGCGTGCACCCAGTCCAGGCTCCCTC
AAGCCCCACGGGTGCCTTTTAGACCTGAGGAGGTTGCAAACCTGATCCCCCATACCTGCCCC
ACCCATCCGCGGACAACCCGCCCTCGCAAACTCAGACCCCCACCCGGAGGCTTCAGATTCCT
CCCAGGTCCAGCTGCCGGAAATGCGTGTTTGAAGGGAGGGTGTGGGCTCAGGGGCGAAG
CACCCACTGGTCCCCTTTTTTCCCCCCAGCAGTGAGTCGCAGCCATGTTCCTGGTTAACTCGT
TCTTGAAGGGCGGCGGCG GCGGCGG CGGG GGAGGCGGGGG CCTGGGTGGG GGCCTGGG
AAATGTGCTTG GAG G CCTGATCtga cga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtg ccttccttga ccctgga aggtgcca ctccca ctgtcctttccta ata a a atga gga a attgcatcgcattgtctgagtaggt gtcattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctgggg atgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCG
CCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAG
GTGGCGCGG GGTAAACTG GGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGT
GGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTG
CCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCC
TACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCT
CCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCG GGCCTTTGTCC
GGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTG
CTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGG
CGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACC
GG GGTGGTG CCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTG
TCCGGCGAGGGCGAG GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTG CCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCG
AG GTGAAGTTCGAG G GCGACACCCTG GTGAACCGCATCGAG CTGAAGGG CATCGACTTCA
AG GAG GACG GCAACATCCTGG GGCACAAG CTGGAGTACAACTACAACAG CCACAACGTCT
ATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttca ctgcattctagttgtggtttgtcca aa ctcatcaatgtatcttatAGCGGGGCCGG
GGGCGG CGGCGGCGGCGGCGGCGGCGGCGGCGGTGGTGGAGGCGGCGGTGGCGGTGG
AACGGCCATGCGCATCCTAGGCGGAGTCATCAGCGCCATCAGGTAAGGCGGAGACTATCA
GAGGGG CGGGGCCTGGGAATGGGAGGAGCCTCAGTGAGG CGTGGTCTGG GAG G GGCGT
GGTCTAAAAATAGAATAGGATTAACCTGGAGGCTAACCTGGGTACATGAATTAGGCCGGG
GAGGCCTGGTTTGAGAGTTCTGCTGTAAGGGGGTGGACCCCAGTGAGGCGGATATCAGTC
ATTGGGGGCGGTGCTTGATATGGGAGTAGTCTGATTGTGTGGGACCAGGACAATGTGGTC
CTGAG GGGACTGATGTGGAGTTTTGGCG GGTGGG G CTTATGGGTCTGGGCTCAG CCTG CA
TTGGCTAACCTGGAGATGAACGTT
pARBI-603: CAPNS1_2_Exogenous_8
CGGGCATTTAGGAAGCGGGCCATG
TAAGGCCTTGGGAAGCTGGGGCAGAGCCTCTATGAAGTGAGCCCTCCAAGGGGCGGGGTC
TTTTGATTCTACGTGGGATTTTTTAGGTTTAGGGTGGCCAAGATGACTGAAATCTGCCACTG
GGTAGGTGTGCCTGGCAGGAGGGGAGCCTCCCAGGGGACCGGTCTCTGGGTTTCCTCGAG
GGTGGGGTTGGCCTGAGGAAGGGAGAAGAGGGGCACGACCAGGGCAGTGTGGATTGGG
ACAGATGAGGACAAGAACAAATGAAAGGCACAGCAACCAAGTAAGGAAGATAACGGCTG

GG GTCTGGAGCGTTGGGGCTGATGGTTCTGTAGTGCTGCCCGTTG GAGGCCCCGCCCCTG
GCACTAACCCCTCCCCCTTATCTCTTCGCAGCtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAG
CGCACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCC
TAGAGAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTC
CCGAGGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAA
CG GGTTTGCCGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCG CC
CGCCGCCCTACCTGAGGCCG CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTG
TGGTGCCTCCTGAACTGCGTCCGCCGTCTAG GTAAGTTTAAAG CTCAGGTCGAGACCGGGC
CTTTGTCCGG CGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGAC
CCTG CTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATC CAAG CT
GTGACCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAG GGCGAGGAG
CTGTTCACCG GGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACG GCCACAAG
TTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTC
ATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGG
CGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCA
TGCCCGAAGGCTACGTCCAG GAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGAC
CCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCAT
CGACTTCAAGGAGGACGG CAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCA
CAACGTCTATATCATG GCCGACAAGCAGAAGAACGG CATCAAGGTGAACTTCAAGATCCGC
CACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG
GCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAGCAA
AGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGAT
CACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttata atggtta caa ata a agcaatag catca ca aatttca caa ata a agcatttttttca ctgca ttctagttgtggtttgtccaa a ctcatca atgtatcttatGAG
GCGG CTGCGCAGTACAACCCG GAG CCCCCG GTAAGCCCCCTCTGCAACCAGACCCCCTTCT
CCTGCCAAGGCCTCTTCGAGGTCCCATCCCTGTTCCTGTAGAGAAGCCCCACCTTCCTCCCCT
TCTTGTG AAATTCCTCTG CCAGTTCCTCCCATG CC GTGTCTG CA G CTTCG CCATG G GTCTTAG
CCATG CCCCACATACGTG CACCCCATTCACTACCACCCCTCATTCTTTTCCTCATCAAG CTG CC
CAG CCCACTCTGACTTCCCCACCCAGGGTACCTGGGTTTG GGGAGCCGTCCTGG CCGGGTT
CCCCTCCCCCTGCTCTGAGCTCTCCTCCCTTTGCAGCCCCCACGCACACATTACTCCAACATT
GAGGCCAACGAGAGTGAGGAGGTCCGGCAGTTCCGGAGACTCTTTG CCCAGCTGGCTG GA
GATGTAAGTAAC
pARBI-604: CAPNS1_3_Exogenous_9
GCTGATGGTTCTGTAGTGCTGCCCGTTGGAGGCCCCGCCCCTGGCACTAACCCCTCCCCCTT
ATCTCTTCGCAGCGAGGCGGCTGCGCAGTACAACCCGGAGCCCCCGGTAAG
CCCCCTCTGC
AACCAGACCCCCTTCTCCTGCCAAGGCCTCTTCGAGGTCCCATCCCTGTTCCTGTAGAGAAG
CCCCACCTTCCTCCCCTTCTTGTGAAATTCCTCTGCCAGTTCCTCCCATGCCGTGTCTGCAGCT
TCGCCATGGGTCTTAGCCATGCCCCACATACGTGCACCCCATTCACTACCACCCCTCATTCTT
TTCCTCATCAAGCTGCCCAGCCCACTCTGACTTCCCCACCCAGG GTACCTGG GTTTGGGGAG
CCGTCCTGGCCGGGTTCCCCTCCCCCTGCTCTGAGCTCTCCTCCCTTTGCAGCCCCCACGCAC
ACATTACTCCAACtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccc t ggaaggtgcca ctccca ctgtcctttcctaata a a atgagga a attgcatcgcattgtctga gtaggtgtca ttctattctg gggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggct ctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTC
CCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGG
GGTAAACTGGGAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCGAGG GTG GGGG AGA
ACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAA
CACAGCTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACG CGCCCGCCGCCCTACCTGAGG
CCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTG
CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCC
TTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTA

CGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCT
AGAGCTAGCGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTCACCGG GGTG GT
GCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAG CACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAGGG CATCGACTTCAAGGAG GA
CG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAG CCACAACGTCTATATCATG
GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGAC
GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tATTGAGGCCAACGAGAGTG
AG GAGGTCCGG CAGTTCCGGAGACTCTTTGCCCAGCTGGCTGGAGATGTAAGTAACCTGG
GGTCCCTGGCCCCGTCCTAACCGTTCCATCCCTTCCCTTGTG GCTGCCCTTGCACACACACCC
TTGACCATGACAATCCCAGTGTTCCCATTCTCCATGACATTCTCAGACCCCTTTCAGTCACCC
CTGACCTGCCCCTAACTTCCGCCCGCAGGACATG GAGGTCAGCG CCACAGAACTCATGAAC
ATTCTCAATAAG GTTGTGACACGACGTAAGTGACCGGG GTTAAGGAATAGGGTAGATTCA
GAGGCAGAGGGGTCAGAGAGGATTTGACCTCTGGCCTCTGACTTTCAACCTGTTACCCACA
GACCCTGATCTGAAGACTGATGGTTTTGGCATTGACACATGTCGCAGCATGGTGGCCGTGA
TG
pARBI-605: CBLB_1_Exogenous
ggagagttatagcagatggagacaaaaaagggaactgctggggatcaaaccttgtaaagtccttctaagtgatatatgaggactttgacttttattcaaagttagaaggctttgaacaaaagaattatttcatttgtgttttacaatggtcacaagtgccgtagcgagaataggctgTGTAGAGAAGAG
ATTTTGTGGTTTTCTTTTTTGTTCTCTGTATTCTCTTCC
TGAAAAATCTGTTTTCTGCCTAATCATATTTCTGTAAAGAACAATAGACTTGAACGTCTGCTG
TTAAGTAACAAAGGTTGACCTAAACCACATAAGGTCAGATTAACTTTTAAAATATCCAAGAT
AATGTTAAGTGTATTAAATATGGTTCAGTATGTAATCAAAATATTTGATATTCTCTAGATACT
AATCTCtttt a a ttttttta ttttttAG CCTTG Gtg a cga ctgtgc cttcta gttg cc a gcca tctgttg tttgcccctccc ccgtgccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a aa tgagga a attgcatcgcattgtctgag taggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattggga agacaatagcaggcatgc tggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCAC
ATCGCCCACAGTCCCCGAGAAGTTGG GGGGAGGGGTCGGCAATTGAACGGGTGCCTAGA
GAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCG A
GGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGG
GTTTGCCGCCAGAACACAGCTGAAGCTTCGAG GGGCTCGCATCTCTCCTTCACGCGCCCGC
CGCCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGG
TGCCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGGGCCTTT
GTCCGGCG CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTG CCTGACCCTG
CTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGA
CCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGT
TCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGG CCACAAGTTCA
GCGTGTCCGGCGAGGGCGAGGG CGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCT
GCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGT
GCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGC
CCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCG
CGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGA
CTTCAAG GAG GACG GCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAA
CGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCAC
AACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGC
GACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG
ACCCCAACGAGAAGCGCGATCACATGGTCCTG CTG GAGTTCGTGACCG CCGCCG GGATCAC
TCTCGG CATG GACGAGCTGTACAAatgattgtttattgcagcttata atggtta caa ata a agcaatagcatc aca aatttca caa ata a a gcatttttttca ctgca ttctagttgtggtttgtccaa a ctcatca atgtatcttatGGCTCT

ATTTTGCGGAATTGGAATTTCTTAGCTGTGACACATCCAGGTTACATGGCATTTCTCACATAT
GATGAAGTTAAAGCACGACTACAGAAATATAGCACCAAACCCGGAAGGTAAGACTTTTTAG
CAGTAATATTATGTGATATATCCAAAGATGAAAAATTCTCCATGGTTTGTTCCTAAGTAAGT
CAGCAATACCCGCTGATGGCATGCTTGGGAGGGGGCTAGAAAAGGTAAAAGACCAAAATG
GTGCTTCTTAACATAATGTTTTAAATTGATACATTGTGCTTACTCTGAAACTTGGTATTCATA
AAGTG AAAG G G GATATG GTCTCCTTTCTGTTTTCATGATATCCGTGTTTATGTG CAAAATTG
AATGTATATATTTGAATACTTGAATACTGTGCTAAAGAACAAATAAATAGAACAGTCTTATG
TTCATTACTTT
pARBI-606: CBLB_2_Exogenous
CATATTACTCATTTGTGCAATGAATAAAGGATTTCTATTATTATATG
ACTGTTTTTAGAAAATGAGAATTAAATCTTGATTCATTATTAAATAACCTTGTATACACACAT
GAAGAGGTCATGACACATAAGACTTTGACCTAATACCTTTTTTCAGATATGTTTCTATAGCTA
TCTTGATAGCCTAGGACTGTTTGAGAGAAAATTAATTTAAATTTTTGTCATATTCTTAGCAAA

TGCAAAATAATTGAGTTAAACAGAACCCACTACATTTGTTCTGAAATATGAACAATATTCTC
AAATATTTGTCTGGTAGTACAAATACAATTATATTAATTTTATTATTGAATTTTGCTGCTTCAA
AG G GAG GTATTTCTTAAATGATACTTGATTCAATTATTTCCCTTTTTTTCCACCTTG CACAGA
ACTATCGTAtga cg a ctgtg ccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtgtcattcta ttctggggg gtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatg gGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCCG
AGAAGTTG GGG G GAG G GGTCGG CAATTGAACG GGTGCCTAGAGAAGGTGGCG CGG G GT
AAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACC
GTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACAC
AG CTG AAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAGGCCG
CCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCG CCTGTGGTGCCTCCTGAACTG CGT
CCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGA
GCTAG CGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC
ATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG GC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAGCTG C
CCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCG AAGGCTACGTCCAGG
AG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAG ACCCGCGCCGAGGTGAAGTTCGA
GGGCGACACCCTGGTGAACCG CATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAA
CATCCTG GGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGAC
AAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aa tttca caa ata a agcattt ttttca ctgcattcta gttgtggtttgtcca a a ctcatca atgtatcttatCCATG
GAAAGTATTCAGACAGTG C
CTTCATG AG GTCCACCAGATTAG CTCTG G CCTG G AAG CAATG G CTCTAAAATCAACAATTG A
TTTAACTTGCAATGATTACATTTCAGTTTTTGAATTTGATATTTTTACCAGGCTGTTTCAGGTA
AGTTTATACTTGCTTAGTCTATCCATTCCCTtcttctctctccctgtcttttctctctgtttctctctctctctgtct ctctctca ca ca c a ca cacaTAAACAAACATATGGAAAATAGTGTCATTCCTATTAGTCTGTAATTT
TATTGGTATGTTTGGTCATTGTTAAAAGTTTAGAGTGAGCCATTTGGCTTAAGTTTGGAGTT
ATCAGATG CTGTGAGCCTGGTGGCTGCTTGTTAGCTTTGAGAGGATCCATCCAAAAGAG CT
TTGTTTTAAACTCTCTACTTAGTTTCTTACTT
pARBI-607: CBLB_3_Exogenous
TTAAACTTTAGAGGTTCAAAATAAGGATATGATATAGAAAAGATTGAATG
GAATAGGATTTATTTATATTATGACAAACGTGACTATCAGTTAATAAATTGGAGATAGTAAG
TTAAGATCATTATTTATATTATGGAAAAATGAAGAAATAGGACATGGTTCTCTAACTTTTTAA
TTTTTCTGTGAAATAAAATATTATGACTTATTTTATTTTTATCTTTGTTTTTAGGTAAGACTGT

GCCAAAATCCCAAACTTCAGTTGAAAAATAGCCCACCATATATACTTGATATTTTGCCTGATA
CATATCAGCATTTACGACTTATATTGAGTAAATATGATGACAACCAGAAACTTGCCCAACTC
AGTGAGAATGAGTACTTTAAAATCTACATTGATAGCCTTATGAAAAAGTCAAAACGGGCAA

TAAGACTCTTTAAAtga cga ctgtgccttctagttgcca gccatctgttgtttgcccctcccccgtgccttccttga cc ctgga aggtgcca ctccca ctgtcctttccta a ta aaatgagga aattgcatcgcattgtctgagtaggtgtcattctattc tggggggtggggtggggcaggacagca agggggaggattgggaagacaatagcaggcatgctggggatgcggtggg ctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGG GCAGAGCGCACATCGCCCACAGT
CCCCGAGAAGTTGGGGGGAGGG GTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCG
GGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAG GGTGGGG GAG
AACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGA
ACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAG
GCCGCCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACT
GCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACCGGGCCTTTGTCCGGCGCTCC
CTTG GAG CCTACCTAGACTCAG CCG G CTCTCCACG CTTTG CCTGACCCTGCTTG CTCAACTCT
ACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCG GCGCCTACTC
TAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGT
GCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAGGG CATCGACTTCAAGGAG GA
CG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAG CCACAACGTCTATATCATG
GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGAC
GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tGAAG G CAAG
GAGAGAATG
TATGAAGAACAGTCACAGGACAGGTAAGAAGAATATTTCAGATGTTTTGGTGTAAAGGTCA
TTTATGTTGCTTTTTACTTAATAGTTAACCTAAACGTCACCATGTAATTGTTTTGGGGTAAAT
GTGAGTCGCTTTGTATATagtcatgtggtacttaacgatggggatactttacaagaaatgcatttttagacaatttcattgttgtgtggacatgatagagtgtacttacacagagctagatggtattacctaccgcacacctaggctgtatgatgtagcctattgctcctaggctgcaaatctatacagtatgttactgtactgaatactgtgggcagttgtaacacaatggtaagtatgtgtgtatctaaatatacctaaacatagaaaaggtatggtaaaaatactgtataaaaggccgggcgtgg
pARBI-608: CD2_1_Exogenous_13_14_15
CACACTCATAAACACATCTGCTTTGGCAAAGGAGCACATCAGAAGGGCTGG
CTTGTGCGCG
CTCTTGCTCTCTGTGTATGTGTATTATGTTTTATGTTACTGTAAAAGATGTAAAGAGAGGCA
CGTGGTTAAGCTCTCGGGGTGTGGACTCCACCAGTCTCACTTCAGTTCCTTTTGCATGAAGA
GCTCAGAATCAAAAGAGGAAACCAACCCCTAAGATGAGCTTTCCATGTAAATTTGTAGCCA
GCTTCCTTCTGATTTTCAATGTTTCTTCCAAAGGTAAGCATAAGAGTCAAAGAAGTCCCAAC
CCAGCTTTCCCTGAAAGTGACTCTCAGTAACTCTTTTGCTTTTTATAGGTGCAGTCTCCAAAG
AGATTACGAATGCCTTGtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcctt ga ccctgga aggtg cca ctccca ctgtcctttcctaata a a atgagga a attgcatcgcattgtctga gtaggtgtca ttc tattctggggggtggggtggggca ggacagca a gggggagga ttggga aga ca atag ca ggcatgctggggatgcgg tgggctctatggGG ATCTGCGATCGCTCCGGTGCCCGTCAGTGG G CAGAGCGCACATCGCCCAC
AGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGG GTGCCTAGAGAAG GTG GC
GCGGGGTAAACTGGG AAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGG
GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGC
CAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCT
GAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCG CCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGG CG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC

AG CCG CTACCCCGA CCACATG AAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CT
ACGTCCAG GAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGG G CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACG GCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATG GCCGACAAG CAGAAG AACGGCATCAAGGTGAACTTCAAGATCCG CCACAACATCGAG
GACGG CAGCGTG CAG CTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACG GCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAG CAAAGACCCCAACG
AG AAGCGCGATCACATGGTCCTG CTGGAGTTCGTGACCG CCG CCG G GATCACTCTCGGCAT
GGACGAG CTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttc a ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatGAAACCTGGG GTGCC
TTG G G TCAG G ACATCAA CTTGG A CATTCCTAG TTTTCAAATG AGTG ATG ATATTG ACG ATAT
AAAATG G G AAAAAACTTCAG ACAAG AAAAAG ATTG CACAATTCAG AAAAG AG AAAG A GA C
TTTCAAG GAAAAAGATACATATAAGCTATTTAAAAATG GAACTCTGAAAATTAAG CATCTG A
AG ACCG AT GATCA G G ATATCTACAAG GTATCAATATATGATACAAAAG GAAAAAATGTGTT
GGAAAAAATATTTGATTTGAAGATTCAAGGTAAGTGTTCATTCCCTTAATTG CTTTATTTCAG
TGTG G GTG CTATTTGGCAAGTTGGAAAATAGCATTTCTAATATTCCCCAGCGCTGACCTCTG
CCTCCAG GGGG GCTATACAGGAAACCAGATGCATGTCTCCTGCCCCAGTG GAGACTGTGGT
CCAA
pARBI-609: CD3E_1_Exogenous_16
AGGATTGTTCTAGTTGATTGGTATGTGTGCA
CTGGATATACTCAACAAATATTTGTTGAGCCAAATACTCAACACCAG
CCAAACACGTAGTAT
TTACTTTAGCTTAAGCGAATTATTTAGCCCTGACAGAAGCCCTGGAATGTG
GGTCTTTAAGT
TCCTATTTTTGAGATGGGAAAGCTGAGGCTCACGGAAGGAGGTGACCAGCTCAAGTCTCCT
ACCGTCCATGCCAAATTAGAATTCCAGCCTG
CCTCCTGACTTCAAGTCCAAAGTTCTTCCCAC
GCACTAAAGCTAGCTCTTCAGTGTCCTTTCTTAG GAGGTACTTCCTCCCGCACCACTGACCG
CCCCCTCTCTATTTCACCCCCAG CCCATCCGGAAAGGCCAGCGGGACCTGTATTCTG G CCTG
AATCAGAGACGCATCtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttga ccctgga aggtgcca ctccca ctgtcctttccta a ta aaatgagga aattgcatcgcattgtctgagtaggtgtcattctat tctggggggtggggtggggcaggacagca aggggga ggattggga agacaa ta gcaggcatgctggggatgcggtgg gctctatggGGATCTGCGATCGCTCCG GTGCCCGTCAGTG GGCAGAGCGCACATCGCCCACAG
TCCCCGAGAAGTTGGGGGGAGGGGTCG GCAATTGAACGGGTGCCTAGAGAAGGTGGCG C
GGGGTAAACTG G GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGA
GAACCGTATATAAGTGCAGTAGTCG CCGTGAACGTTCTTTTTCGCAACG G GTTTG CCG CCA
GAACACAGCTGAAGCTTCGAG GGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTG
AG GCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAA
CTGCGTCCGCCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCG GGCCTTTGTCCGGCG CT
CCCTTG G AG CCTACCTAG ACTCAG CCG G CTCTCCACG CTTTGCCTGACCCTGCTTG CTCAA CT
CTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTAC
TCTAGAG CTAG CGAATTgccgccaccATGGTGAG CAAGG GCGAGGAGCTGTTCACCGGG GTG
GTGCCCATCCTGGTCGAGCTGGACGG CGACGTAAACGGCCACAAGTTCAGCGTGTCCGGC
GAGG GCGAG GGCG ATG CCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGG C
AAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCA
GCCGCTACCCCGACCACATGAAG CAG CACGACTTCTTCAAGTCCGCCATGCCCGAAG GCTA
CGTCCAG GAGCGCACCATCTTCTTCAAGGACGACG G CAACTACAAGACCCG CGCCGAGGTG
AAGTTCGAGGG CGACACCCTGGTGAACCGCATCGAG CTGAAG GGCATCG ACTTCAAGG AG
GACGGCAACATCCTGG GGCACAAG CTG GAGTACAACTACAACAGCCACAACGTCTATATCA
TG GCCGACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG G
ACG GCAGCGTGCAG CTCGCCGACCACTACCAG CAGAACACCCCCATCG GCGACG GCCCCGT
GCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG ACCCCAACG AG
AAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG GCATGG
ACGAGCTGTACAAatgattgtttattgca gcttata a tggtta ca aata a agca atagca tca c a a atttca ca a at aa agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatcttatTGACCCTCTG
GAG AACAC
TGCCTCCCGCTG GCCCAGGTCTCCTCTCCAGTCCCCCTGCGACTCCCTGTTTCCTG G GCTAGT
CTTGG ACCCCACGAGAG AG AATCGTTCCTCAG CCTCATGGTGAACTCG CG CCCTCCAGCCT
GATCCCCCG CTCCCTCCTCCCTGCCTTCTCTG CTGGTACCCAGTCCTAAAATATTGCTGCTTCC
TCTTCCTTTGAAGCATCATCAGTAGTCACACCCTCACAGCTGGCCTGCCCTCTTGCCAG GATA

TTTATTTGTGCTATTCACTCCCTTCCCTTTGGATGTAACTTCTCCGTTCAGTTCCCTCCTTTTCT
TG CATGTAA GTTG TCCCCCATCCCAAAGTATTCCATCTA CTTTTCTATC G CC GTCC CCTTTTG C
AG CCCTCTCTGGGGATG GACTGG GTAAATGTTGACAGAG GCCCTGCCCCGTT
pARBI-610: CD3E_2_Exogenous_17_18
AATAATAAAATAATAACAATACTTAACATTTATTGAGTGCTTATTAAGTCTCA
GTACCCAACACTTATCAAGGATTCTTTTTCATGTAATCCTCTCAACAACTATATGGG
TTAAGT
ATCATTTTATTCCCATGAGTAAAGGGATGAGGAAACAGAGGGTTTGTGAGTTGAAAACACA
TTTCACGCTTCTCACAGCTAGTGAGTAATAAAGCTGGGACTCAAACCCAGGGCTGTTTGACT
CCAGTGCCTCTACCCACGGCCACCACTCTTTGCTTGTCAATGTTGTTCTAAACATATTGAAGG
GG GGGCTCTGACCGTGGCAAGCGTGTGAGTAGTAAGGGGAGAATGGCCTTCATGCACTCC
CTCCTCACCTCCAGCGCCTTGTGTTTTCCTTGCTTAGTGATTTCCCCTCTCCCCACCCCACCCC
CCACAGTGTGTGAG tga cga ctgtgccttctagttgccag cc atctgttgtttgcccctcc cc cgtg ccttccttga c cctggaaggtgccactcccactgtcctttccta ataa a atgaggaa attgcatcgcattgtctgagtaggtgtcattctatt ctggggggtggggtggggcaggacagcaagggggaggattgggaaga caatagcaggcatgctggggatgcggtgg gctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAG
TCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGC
GGGGTAAACTG G GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGA GGGTGGG GGA
GAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG CCA
GAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTG
AG GCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAA
CTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCG GGCCTTTGTCCGGCG CT
CCCTTG G AG CCTACCTAG ACTCAG C CG G CTCTC CACG CTTTG CCTG ACC CTG CTTG CTCAA
CT
CTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTAC
TCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTG
GTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGC
GAGGGCGAGGGCG ATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGC
AAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCA
GCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTA
CGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTG
AAGTTCGA GGG CGACACCCTGGTGAACCGCATCGAG CTG AAG GGCATCG ACTTCAAGG AG
GACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCA
TGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGG
ACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGT
GCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG ACCCCAACG AG
AAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGG
ACGAGCTGTACAAatgattgtttattgca gcttata a tggtta ca aata a agca atagca tca c a a atttca ca a at aa agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatcttatAACTG
CATGGAGATGGA
TGTGATGTCGGTGGCCACAATTGTCATAGTGGACATCTGCATCACTGGGGGCTTGCTGCTG
CTGGTTTACTACTGGAGCAAGAATAGAAAGGCCAAGGCCAAGCCTGTGACACGAGGAGCG
GG TG CTGG CGG CAG G CAAAGG G GTAAGG CTGTG GAGTCCAG TCA GAG GAGATTCCTG CC
AAGGGGGACGACCAGCCTGGGCCAG GGTGGGTGGCAAGTCCACAGCTAGGTCAGAACAG
CTTCTCTAG AG CTTCTATG CACAG CTT CTATTACTGTG ATG ACA AG ATCTCAA CAG ACG GTTT
CAAATCTCACATCACTCCCCTCCTTCCCATCCTAGAAAAGTGCAAAAAAGTTTATGAAAGTG
ATG G G CTTCCTCACATACCTGTCAATG CCTG CAGTCATCC GATTCCG CC CCTAA G CTGTG G G
AAGAGAG
pARBI-611: CD3G_1_Exogenous_19_20
ATCCTTCTCTTTAG TTCATCTATTCTAC CCAAAG TG ATCTCATCATCTG GTATG CTG TTAG
CA G
TTTCTTACCTGTATA GTATCTTC CAAATAACATG C CC CAAAATCCCAAA GTTTTA CC
CCTACTA
ATTACAGCAATGTCTCTTTTATTCTTCACCCCCTGACGCAGATATTGGCGTCACCCGAGAGC
ATGTTAGTAATG CAG AATCTC C CCTCCCCA G AACTACTAAATAG CACCTGAAATTTTAA CAA
GATCCCCATGTGATTCATGTGCACATCAAAGTTTGAG AAACACTACTCTAATGATCTCCTGG
TATG CAG AAG CAG G G AG AATTTCAGA G G CAA GATCCTTAATAG AAC CACG G CTTTTCTCAT
TTCAGGAAACCACtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttga cc c tggaaggtgccactcccactgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtgtcattctattct ggggggtggggtggggcagga cagca agggggaggattggga a ga ca atagcaggcatgctggggatgcggtgggc tctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTC

CCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCG G
GGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGG GTG GGGG AGA
ACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAA
CACAGCTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACG CGCCCGCCGCCCTACCTGAGG
CCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTG
CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCC
TTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTA
CGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCT
AGAGCTAGCGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTCACCGG GGTG GT
GCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGA GCTGAAGGG CATCGACTTCAAGGAG GA
CG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAG CCACAACGTCTATATCATG
GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGAC
GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tTTGGTTAAGGTGTATGACTA
TCAAGAAGATGGTTCGGTACTTCTGACTTGTGATGCAGAAGCCAAAAATATCACATGGTTTA
AAGATGG GAAGATGATCG GCTTCCTAACTGAAGATa a a aa a a a aTG GAATCTGG GAAGTAAT
GCCAAGGACCCTCGAGGGATGTATCAGTGTAAAGGATCACAGAACAAGTCAAAACCACTCC
AAGTGTATTACAGAAGTATGTAATCCCCTTTGGTCTGTTTGTTGTGAAATTAATCAGTATTTG
CTGTTCTGGTGAGCTTTTTATCTG G G GTGAAAGTG GAAATAGATCCTCAACAGTAATATTAT
CGCCTGTTCTCTTAATTTCAGCTTGCCTCTTTTAAAATACTGTAAGATACTTCCCTCACCCTAT
TGAAAAACTACAG C CAGTCCTGTAAAATTTTGTTTACCTTTG G GTG GG CTCCATG G
pARBI-612: CD3G_3_Exogenous_21
TTCGAGACAAG CCTGG
ACCAGGCGTGGAGGCGCGCGCCTGTAGTCCCAGCTACTCAGGAGGCTGAGGCAGGAGGA
CCACTTGAACCCAGGAG GTCGAGGTTGCAGTGAGCTGCGATTGTGCCACTGCACTCCAGCC
TGGGCAACAGAGAAAGACTCCGTCTCAAAAAAAAAAAAGAGAGAGAGAGAAAAAGAAAA
AAGACAG AG CCTCCATCTCCTTGTCCTCTTTCCATCCTCAG
GACCATGAAGTACCCACTCCAA
ATTCTCACATATAAAAAACATTCAATAAACATGCATCAAATTAATTAATAGAGGATGGAAAA
AATGACTTATGACTGTGCTGTCCTTTCCAGCCCCTCAAGGATCGAGAAGATGACCAGTACAG
CCACCTTCAAGGAAACCAGTTGtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgc cttccttgaccctgga a ggtgccactcccactgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtg tcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaag a ca atagcaggca tgctgggga tgcggtgggctctatggGGATCTGCGATCG CTCCGGTGCCCGTCAGTGG GCAGAG CGCACATCG C
CCACAGTCCCCGAGAAGTTG G GGG GAGGG GTCG GCAATTGAACGGGTGCCTAGAGAAGG
TG GCG CGGG GTAAACTGG GAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACG GGTTTGC
CG CCAGAACACAG CTGAAGCTTCGAGG GGCTCG CATCTCTCCTTCACG CG CCCG CCGCCCT
ACCTGAG G CCG CCATCCACG CCG GTTGAGTCG CGTTCTGCCG CCTCCCG CCTGTG GTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGG GCCTTTGTCCG
GCGCTCCCTTGG AG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATCCAAG CTG TG A CCG GC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAG CAAGGGCGAGGAGCTGTTCACC
GG GGTG GTG CCCATCCTG GTCGAG CTGGACGGCGACGTAAACG G CCACAAGTTCAGCGTG
TCCGGCGAGG GCGAG GGCGATG CCACCTACG GCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTG CCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAG CCGCTACCCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATG CCCGAA
GGCTACGTCCAGGAGCG CACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG CCG
AG GTGAAGTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAG GGCATCGACTTCA

AG GAG GACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAGCAG AAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGG CAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca caaataaagcatttttttcactgcattctagttgtggtttgtccaaa ctcatcaatgtatcttatAGGAGGAATTG
AACTCAGG ACTCAGAGTAGGTGGGTTCTTCAATGCCAATTCTAATAAAGGACCCTTGCATCA
ACTGCCCTCG CAATTG CTTCTAAGTCTAG CTCCCTTCCCTAAGCGGCTATAAG CATCAGACTC
TGGGGATCAGGGATTGGGACGTGGTTTGGGGTACTCTTTTCTAAAAATTCTGG GGCCATAC
TGATTGTCTTGGCCTAGGTAAATATGAATTTTATGTATCTGTAAATCCTGTCAGAG CAGG GC
CTCAAGCCATAGAGATG CTGAATATTAATCTTAACCTACATTTGAATTTCTCATTATCTACAC
TATTAACATTTTGGGCTAATTAATTATTTGTGATGAGGGGCTAGCCTGTGCATTGTAGGAGT
TATGGAAGCATCCCTGGCCTCTCTCCACCAGATGCTGGTAGATTGTCCAGTGTGACAATCAA
AAAT
pARBI-613: CD5_1_Exogenous
CACACCTGGCCTAAAATTAATTTTTAAAGATCTCTTAAAAAG
CAGACACCAGCCCCAATCTC
AGACCCCTTGAGACAGAATTTCCAGGACAGGGGCCATCCTGCTGGACAGTGGGTGCCGAG
AACACCTTG CCCATTTATCTG AG CTCCCTTCTGACTCTGAAATCTG GAGCCCCACC CTCCTG G

CTGCCTGGGTCAGGGTCCTCTGGGAAGCCCCTGCAGTGCCCCAGAAG
GGACGAAGCTCACAAGGGGCAAGGCAGGCAGCCCACGGGGCAGGAGGGAGCTCAACTGG
GCGTCCTAGGGAGAG GGCAGTGAGGGGTGCCAGTG GGGAACCCCTCCCAGCCTGACCCCC
ACCACACCTTTCTGACCCCCAGATtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgt gccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a aa tg agga a attgcatcgcattgtctgagtag gtgtcattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctgg ggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACAT
CGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGA
AG GTGGCGCG GGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCG CCTTTTTCCCGAG
GGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTGGTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACCGGGCCTTTGT
CCG GCGCTCCCTTGGAG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CT
TGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACC
GG CGCCTACTCTAGAGCTAG CGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTC
ACCGG GGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGC
GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC
ACCACCGG CAAGCTGCCCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGC
AGTGCTTCAGCCG CTACCCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATGCCC
GAAGGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG
CCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACT
TCAAGGAGGACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACG
TCTATATCATG GCCGACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCG CCACAA
CATCGAG GACGGCAG CGTGCAGCTCGCCGACCACTACCAG CAGAACACCCCCATCGG CGA
CG GCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGAC
CCCAACGAGAAGCGCGATCACATG GTCCTGCTGGAGTTCGTGACCGCCG CCGGGATCACTC
TCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcaca aatttca ca a ata aagcatttttttca ctgcattctagttgtggtttgtcca a a ctcatcaatgtatcttatTTCCAGGCA
AG GCTCACCCGTTCCAACTCGAAGTGCCAGGG CCAGCTGGAG GTCTACCTCAAGGACGGAT
GGCACATG GTTTGCAGCCAGAGCTGGG GCCG GAG CTCCAAGCAGTG GGAG GACCCCAGTC
AAGCGTCAAAAGTCTGCCAGCGGCTGAACTGTGGGGTGCCCTTAAGCCTTGGCCCCTTCCT
TGTCACCTACACACCTCAGAGCTCAATCATCTGCTACGGACAACTGGGCTCCTTCTCCAACT
GCAGCCACAGCAGAAATGACATGTGTCACTCTCTGGGCCTGACCTGCTTAGGTGGGTAACT
AG CCAGCCACACG GGCACCCTGGG CCTGG GCGCCAGCCCCGAGGAGACTGCCCGAG GCCT

GTGATCTAGGGTCTGAGCAGGCTGGTG GAAGGGGTGGG GGGACCCCAGTTTATAACCACT
CCCCAAGACACATACC
pARBI-614: CD5_2_Exogenous_
GGACAGGGGCCATCCTGCTGG
CCCTTCTGACTCTGAAATCTGGAGCCCCACCCTCCTGGGTCTAGCTTCGGGGCTGCCTGGGT
CAGGGTCCTCTGGGAAGCCCCTG CAGTGCCCCAGAAGGGACGAAGCTCACAAGGGGCAAG
GCAGGCAGCCCACGG GGCAGGAGGGAGCTCAACTGGGCGTCCTAGGGAGAGGGCAGTGA

GGGAACCCCTCCCAGCCTGACCCCCACCACACCTTTCTGACCCCCAGATT
TCCAGGCAAGGCTCACCCGTTCCAACTCGAAGTGCCAGGGCCAGCTGGAGGTCTACCTCAA
GGACGGATG GCACATGGTTTGCAGCCAGAGCTGGGGCCGGAGCTCCAAGCAGTGGGAGG
ACCCCAGTCAAGCGTCAAAAGTCTGCtgacga ctgtgccttctagttgccagccatctgttgtttgcccctccc ccgtgccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a aa tgagga a attgcatcgcattgtctgag taggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattggga agacaatagcaggcatgc tggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCAC
ATCGCCCACAGTCCCCGAGAAGTTGG GGGGAGGGGTCGGCAATTGAACGGGTGCCTAGA
GAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCG A
GGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGG
GTTTGCCGCCAGAACACAGCTGAAGCTTCGAG GGGCTCGCATCTCTCCTTCACGCGCCCGC
CGCCCTACCTGAGGCCG CCATCCACGCCG GTTGAGTCG CGTTCTGCCGCCTCCCG CCTGTG G
TGCCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGGGCCTTT
GTCCGGCG CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTG
CTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGA
CCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGT
TCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGG CCACAAGTTCA
GCGTGTCCGGCGAGGGCGAGGG CGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCT
GCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGT
GCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGC
CCGAAGGCTACGTCCAG GAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCG
CGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGA
CTTCAAG GAG GACG GCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAA
CGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCAC
AACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGC
GACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG
ACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCAC
TCTCGG CATG GACGAGCTGTACAAatgattgtttattgcagcttata atggtta caa ata a agcaatagcatc aca aatttca caa ata a a gcatttttttca ctgca ttctagttgtggtttgtccaa a ctc atca atgtatcttatCAGCGG
CTGAACTGTGGGGTGCCCTTAAGCCTTGGCCCCTTCCTTGTCACCTACACACCTCAGAGCTC
AATCATCTGCTACGGACAACTGGGCTCCTTCTCCAACTGCAGCCACAGCAGAAATGACATGT
GTCACTCTCTGGGCCTGACCTG CTTAGGTGGGTAACTAGCCAGCCACACGGGCACCCTG GG
CCTGGGCGCCAGCCCCGAGGAGACTGCCCGAGGCCTGTGATCTAGGGTCTGAGCAGGCTG
GTGGAAGG G GTG GGG GGACCCCAGTTTATAACCACTCCCCAAGACACATACCCAG GAG G G
GGACTGGAAGGGGCCAGCACCCATCTGTAGGATGGCAATG GAGGACCTAGTTCTGCCAAT
CACTGACTTCATCGTCGCCTCTGAACCTCCATTCTCCCATCTGTGAAGTGGGGTGGTACTTCC
CGCCTCGCAGGAGGCT
pARBI-615: CD5_3_Exogenous_
tgctgggatta caggcatgagcca ccacacctggccta a a atta atttttaa agaTCTCTTAAAAAGCAGACAC
CAGCCCCAATCTCAGACCCCTTGAGACAGAATTTCCAGGACAGGGGCCATCCTGCTGGACA
GTGGGTGCCGAGAACACCTTGCCCATTTATCTGAGCTCCCTTCTGACTCTGAAATCTGGAGC
CCCACCCTCCTGGGTCTAG CTTCGGGGCTGCCTG GGTCAGG GTCCTCTGGGAAGCCCCTGC

GAG
GGAGCTCAACTGGGCGTCCTAGGGAGAGGGCAGTGAGG GGTGCCAGTGGGGAACCCCTC
CCAGCCTGACCCCCACCACACCTTTCTGACCCCCAGATTTCCAG GCAAG GCTCACCCGTTCC
AACTCGAAGTGCtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccct ggaaggtgcca ctccca ctgtcctttcctaata a a atgagga a attgcatcgcattgtctga gtaggtgtca ttctattctg gggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggct ctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTC
CCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCG G

GGTAAACTGGGAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCGAGG GTG GGGG AGA
ACCGTATATAAGTGCAGTAGTCG CCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG CCAGAA
CACAG CTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACG CGCCCGCCGCCCTACCTGAGG
CCG CCATCCACG CCGGTTGAGTCGCGTTCTGCCG CCTCCCGCCTGTGGTGCCTCCTGAACTG
CGTCCG CCGTCTAG GTAAGTTTAAAG CTCAGGTCGAGACCG GG CCTTTGTCCGGCG CTCCC
TTGGAG CCTACCTAGACTCAGCCGGCTCTCCACG CTTTGCCTGACCCTG CTTGCTCAACTCTA
CGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGG CGCCTACTCT
AG AGCTAGCGAATTgccgcca ccATG GTGAGCAAG GGCGAGGAGCTGTTCACCGG GGTG GT
GCCCATCCTGGTCGAGCTG GACG GCGACGTAAACGG CCACAAGTTCAG CGTGTCCGGCGA
GGGCGAGGGCGATG CCACCTACG GCAAG CTGACCCTGAAGTTCATCTG CACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAG CACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAG GACGACG GCAACTACAAGACCCGCG CCGAG GTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAGGG CATCGACTTCAAGGAG GA
CG GCAACATCCTGG G G CA CAA GCTGG A GTACAA CTA CAA CAG CCACAACGTCTATATCATG
GCCGACAAGCAGAAGAACGGCATCAAG GTGAACTTCAAGATCCGCCACAACATCGAGG AC
GGCAGCGTGCAG CTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACG GCCCCGTGC
TG CTGCCCGACAACCACTACCTGAG CACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAA
GCGCGATCACATG GTCCTGCTGGAGTTCGTGACCG CCG CCG GGATCACTCTCG GCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tCAGG G CCAGCTG
GAGGTCT
ACCTCAAG GACGGATG GCACATGGTTTGCAGCCAGAGCTGGGGCCG GAG CTCCAAGCAGT
GGGAGGACCCCAGTCAAG CGTCAAAAGTCTGCCAGCGG CTGAACTGTGG G GTGCCCTTAA
G CCTTG G CCCCTTCCTTG TCACCTACACACCTCAG AG CTCAATCATCTGCTACGGACAACTG
G G CTCCTTCT CCAACTG CA GCCACAG CA G AAATG ACATGTGTCACTCTCTG GGCCTGACCTG
CTTAGGTG GGTAACTAG CCAG CCACACGGGCACCCTGGG CCTGGGCGCCAG CCCCGAG GA
GACTG CCCGAG GCCTGTGATCTAG GGTCTGAGCAGG CTG GTG GAAGG GGTGGG G GGACC
CCAGTTTATAACCACTCCCCAAGACACATACCCAGGAGGGG GACTGGAAGG G GCCAGCAC
CCATCTGT
pARBI-616: chr2_1_Exogenous
TGTCCAAAGTGTGTGTCACCTGCTCTGTGTTCACCCACTGGGTTCATCAGAGGGG
ACAAAACAGGAATGTCACTG GAGA GGGG
AACAAATGCTGTCCCCAAGTCCTTCCCTGCCGT
TTAGAGTAACG CAG CCCCTGTCCGCACATCCTCACCCCCATCTGCCCTCTGCTTGCCAG AATT
ATCAGAATCTCTCAATTCTAATTTCAG CCATTCTCTGTGAAAATGAAAAGTTAGTG GCAGGC

AAAAGTCCCAG A GTAATCT
GG GAGTAATCTG AG CAAACTG ATTCAAAG G ATTATTACAATTTCTTAGTCTTCTACTG CTG C
AAAAGTGG CACTCTTTG ATTCTCATGTTTG AG GTTCTAATTACTACCTCATGCA GG ACTTACA
AG AGATCAGCCATGCtga cga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttga ccctgga aggtgcca ctccca ctgtcctttccta a ta aaatgagga aattgcatcg cattgtctgagtaggtgtcattctat tctggggggtggggtggggcaggacag ca agggggaggattgggaagacaatagcaggcatgctggggatgcggtgg gctctatggGGATCTGCGATCG CTCCG GTG CCCGTCAGTG GG CAGAGCGCACATCGCCCACAG
TCCCCGAGAAGTTGGGG GGAG GGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCG C
GGGGTAAACTG G GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGA
GAACCGTATATAAGTGCAGTAGTCG CCGTGAACGTTCTTTTTCGCAACG G GTTTG CCG CCA
GAACACAGCTGAAGCTTCGAG GGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTG
AG GCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAA
CTGCGTCCGCCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCG GGCCTTTGTCCGGCG CT
CCCTTG G AG CCTACCTAG ACTCAG CCG G CTCTCCACG CTTTGCCTGACCCTGCTTG CTCAA CT
CTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTAC
TCTAGAG CTAG CGAATTgccgccaccATGGTGAG CAAGGGCGAGGAGCTGTTCACCGGG GTG
GTGCCCATCCTGGTCGAGCTGGACGG CGACGTAAACGGCCACAAGTTCAGCGTGTCCGGC
GAGG GCGAG GGCG ATG CCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGG C
AAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCA
GCCGCTACCCCGACCACATGAAG CAG CACGACTTCTTCAAGTCCGCCATGCCCGAAG GCTA
CGTCCAG GAGCGCACCATCTTCTTCAAGGACGACG G CAACTACAAGACCCG CGCCGAGGTG
AAGTTCGAGGG CGACACCCTGGTGAACCGCATCGAG CTGAAG GGCATCG ACTTCAAGG AG

GACGGCAACATCCTGG GGCACAAG CTG GAGTACAACTACAACAGCCACAACGTCTATATCA
TG GCCGACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG G
ACG GCAGCGTG CAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACG GCCCCGT
GCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG ACCCCAACG AG
AAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG GCATGG
ACGAGCTGTACAAatgattgtttattgca gcttata a tggtta ca aata a agca atagca tca c a a atttca ca a at aa agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatcttatTGCTTACATTGTGTGTAG
TGAATCAATGTTATTCTGAAGG G CCTCCTG G AA G CAG TAG G G ATTGG AG AATGTATGTCTT
TTAACATGATCATTCCCATCCTATGGGG GTTGGG CA CTCTTAACTCATG TTG CA G ATG AG TA
AACTGAGGCTTTTGAGAGTTTGAGGGTGCACAGCTTGAG GCAGAGGTGGAATTCCCAGCTT
CATCCAAGAAATCACGAATCTCCAATG GCTTGCACACATCCAG GTTTGCTGGATCCTATTTG
TCAA G A CCA CATG CAG ATTAG G AG CCTTCAAA CAA ATTATTTCCAAG G G AG G G
CATCTTTCT
CTGTAACCCCTACCCTGGCAATTCTCCATG GCCTCAG GGAGTCCAG CCG GAAGCTCTG CAAT
GAGTCCTG GGGAACTCTCATCTGGAAAAAGG CG CTCATGACAG GTGGTGCCAACCTTCCAC
pARBI-617: chr2_2_Exogenous
CTACAAGCATGCATCACCAGGCCTG G CTAATTTTTCTATTTTTTGTAG GG ACA G
GTGTTGCCCAGGCTGGTCTCAATTTCCTGGACACAAGCAATCTGCCTGCCTCAGCCTCATAA
TGTTTTTTTTTAATGTTAAACCCCTTCTTGTGTTGAG AGG CCTG G CTTGTGTGTTTGA G CG
CA
GCTGCCTACCTTTGCAGCTGTAAATGTCAGGGCTCCAGGAGGGCTGGCTGGGTCTCCCCTC

GGCCCTGAGCTTATCCAGGCATCCCTAGGTGGATGGATGGG
TGGATGGGTAGG CCCTGGCCTTCAGGGAATGCACTGTGTGTGGTCTTCCAGCCCCAGCAAA
GTCTCAGGGGGCCCGAGCGGGAAGTCGCCATCCTCAGGGGGTCCGAGTGG GGAGTCACC
ATCCTCACGGTCAGCCGTG Gtga cga ctgtgccttctagttgccagccatctgttgtttgc cc ctcccccgtgcctt ccttga ccctgga aggtgcca ctccca ctgtcctttccta a ta aaatgaggaaattgcatcgcattgtctgagtaggtgtc attctattctggggggtggggtggggcaggacagcaagggggaggattggga a ga ca atagcaggcatgctggggat gcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCG CACATCGC
CCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGG
TGGCGCGGG GTAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACGGGTTTGC
CGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCT
ACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCG
GCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACC
GG GGTGGTG CCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTG
TCCGGCGAGGGCGAG GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAGCCGCTACCCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATGCCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCG
AG GTGAAGTTCGAG G GCGACACCCTG GTGAACCGCATCGAG CTGAAGGG CATCGACTTCA
AG GAG GACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAG CAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGG CAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCG CGATCACATG GTCCTG CTGGAGTTCGTGACCGCCGCCGG GATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttca ctgcattctagttgtggtttgtcca aa ctcatcaatgtatcttatTTTCGAATTGAG
CTCCCAGGGGCGGAGACTCCATTCCCCCGGGGAACAATCTGATAGCCAGAAATGAGAGCA
GTGAGTGACTTGGACGCTGTGTGATTCAGGAGGGTCCCAGTCCCTCAGAGTCGCCTGGACA
CCAGATTGTATTTACAAAACAGATGGATGAGGAAATGGAGATTGG GTCTTCAAAAGAATTA
ACCTCAGATTTTATTTATTTATTTTAAAAATTTTAATTTTTCCCAAGGACCCTTTCCAAGTCAA
GAATTGTTTTAAAAAGAAGCCTGGTG AAGGCAAATGCTGTCTGTGGCCGAAGGCACCGTG
AG GAG CTGAGGCTGACTTATGTTTCTCCTCTG GGCTATGTGCG CCTCTAAGGAGTTCACACA

CTTTAACCCCCTTAGGAACCCTACATGATCATCCTCGTTTCTTAAAGAGGAAG CAGAG CCAG
CAAACAAG
pARBI-618: chr2_3_Exogenous
TCTTGATGGATACCCATCCTTCTCTGCTGTTCTCTCTTTCCTCCATCCTCAAGCACAGGGAGC
TTTTGGCGAGGTTAGCTAACTTCGGGGCAGCTGTGAATTAGATACTGGCTGTAAGTTACCTC
CAACATCCAGAGCATTTAAACG CTCG GCTGTCAATGATTGTCATTTTCATTGTTTACATATAG
TCACATCCTCAGTTTGTGG GGTGGATGCATTTGATTCAATTCAGATATG CCTTTCAAG CTG G

GCCTGAAAGGCACATGGCTCATCTCATCTGCATGTGACAGTCTCTTTG
AAGTCCCCAGAAGTGCCTGCTCAGAGTCATGCTCGGGAGGGAAGAG GTAAGGCACAGAAT
GAGTACAGACTGCCACCCAG CCGCAGGAAAGGTGACTGCACACCAGAGGTTCCTGGTAAG
GAGGTGACTCACCCGCGAtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcc ttga ccctgga aggtgcca ctccca ctgtcctttccta a ta aaatgaggaaattgcatcgcattgtctgagtaggtgtcatt ctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcg gtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCA
CAGTCCCCGAG AAGTTGGGGG GAGG GGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGG
CGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG
GGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG
CCAGAACACAGCTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACC
TGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CT
ACGTCCAG GAGCGCACCATCTTCTTCAAG GACGACG GCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGG G CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCAT
GGACGAGCTGTACAAatgattgtttattgcagctta ta atggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatCTGTCTCACCTGTCCA
AAGTGTGTGTCACCTGCTCTGTGTTCACCCACTGGGTTCATCAGAGGGGCACTTGAACAAA
ACAGGAATGTCACTGGAGAGG GGAACAAATG CTGTCCCCAAGTCCTTCCCTG CCGTTTAGA
GTAACGCAGCCCCTGTCCGCACATCCTCACCCCCATCTG CCCTCTGCTTGCCAGAATTATCAG
AATCTCTCAATTCTAATTTCAGCCATTCTCTGTGAAAATGAAAAGTTAGTGGCAGGCAATGG
ACACCCTATTGAAATAGAAATTGTTCTTCATATGTCAGAAAAGTCCCAGAGTAATCTG G GAG
TAATCTG AG CAAACTGATTCAAAG GATTATTACAATTTCTTAGTCTTCTACTG CTG CAAAAG T
GG CACTCTTTGATTCTCATGTTTG AG GTTCTAATTACTACCTCATG CAG G ACTTACAAG AGA
pARBI-619: chr3_1_Exogenous
TGAGCAAATAGTTCTATCCTTTTTGACACATTTTCCACAGTTTGTATAATTTATTCATAGAATC
CTTCTTAGACTACTAATAAATTCCTATG G G ACATAGTCCTTTTATATG CG GAG GAATCCACC
ACCATGAAATGTTCACAAGAAATGACTTTTAAAATTTTATTTGTG G CCATG CTGATTATGAA
TTTCTG AATTATGTCTGG GCTG CTCTTTTTCTCGGAATATG CAA AGTG GGATACTGG
CATCTC

GGAAGTAGCAGGAGAGG GA
GAGATCAG AAATCTCCACTCCCTGAAACACAGACACTTCACATACCCAG GAG GTAAATTCCT
GG CATGAATTG TAACAA GTATACACAG G CATTTGTACATTGTTTG CTTATGTG CTAGATGAC
ATGGCAGTCTATGAAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttga ccctgga aggtgcca ctccca ctgtcctttccta a ta aaatgagga aattgcatcgcattgtctgagtaggtgtcattctat tctggggggtggggtggggcaggacagca agggggaggattgggaagacaatagcaggcatgctggggatgcggtgg gctctatggGGATCTGCGATCGCTCCG GTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAG
TCCCCGAGAAGTTGGGGG GAG GG GTCG G CAATTG AACG G GTG CCTAGAGAAG GTG GCG C
GGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGA

GAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG CCA
GAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTG
AG GCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAA
CTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCG GGCCTTTGTCCGGCG CT
CCCTTG GAG CCTACCTAGACTCAG CCG G CTCTCCACG CTTTG CCTGACCCTG CTTG CTCAACT
CTACGTCTTTGTTTCGTTTTCTG TTCTG CGCCGTTACAGATCCAAG CTGTGACCGGCGCCTAC
TCTAGAG CTAG CGAATTgccgccaccATGGTGAG CAAGG GCGAGGAGCTGTTCACCG GG GTG
GTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGC
GAGGGCGAGGGCG ATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGC
AAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCA
GCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTA
CGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTG
AAGTTCGA GGG CGACACCCTGGTGAACCGCATCGAG CTG AAG GGCATCG ACTTCAAGG AG
GACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCA
TG GCCGACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG G
ACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACG GCCCCGT
GCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTG AGCAAAG ACCCCAACG AG
AAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGG
ACGAGCTGTACAAatgattgtttattgca gcttata a tggtta ca aata a agca atagca tca c a a atttca ca a at aa agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatcttatCAGGG
GCCCCACATCTG C
AG G CAGGACATTCAG CCCTG TATCTTG CCACTACCATGAAATCTGA G GTTCTTTCATTTTTG T
TTTTCTATTCTGATGAGATTG A G GTACACTGACTCA CTG GTAAAGAA GACAATTTATTTGTG
AACTTGTAGTCACTCCATTTGACGGCCTCCCTCTACTCACTGAGGCACAGCACGTGTGGAAC
AG GAAAAA G GCCTCTG G A GAG CTTGTGTTCCACTG GTTTCTTTGACTTTTG CCAACCAACTT
TG CAAATGTTCTCAAAA G G CATTCATTACCTCTG CA G GTGTGAAATAACTCTGCAGTTCAAG
GGTTTGTGAAATCACGTGAAACAGCTGAAAGCTGTTCTGGCAAAGCGACCAGAGTCAGAG
TTA G CTGTCACCAAG CCACATCCTG G CTTAAACGATGTTATTTATATCTAACTG CA CTCG A
pARBI-620: chr3_2_Exogenous
AG ATG
TAACTAAAATG CCAAATGTGTG ACTTG AGTAATTAACTAG A GACCTCCA GAA
G G CTTTCCTT
CTTATTCTGG GTACATTGATATTTACTTTAGTATTTGTTTTGCCTTTAAAGTCATGTTCCCTTT
TGATTCTTTAATATCATATCCAAAACG AAGTCCTTTTAACCTATAACAATGCAAATCAAAAGT

GCTGTCTTTCAGCTACCTCAAGGATTTAGCAAGCGTTCCTGGAAATCTAACCAACCTTTTAA
AAAGTATTTCTATGG GAATGTATTTCTG AG CCCTAAATTATG GATTTTGAAACTAAAATAGA
AGAATACTGTAAGTGCAAAACTTCCAGTAGATAACAGAATTTG CAAAATCTCTAATCTCTTG
TATGTGCCCCAAtga cga ctgtg ccttctagttgccagccatctgttgtttg cc cctcccccgtgccttccttga ccctg gaaggtgcca ctcccactgtcctttccta ataa a atgaggaa attgcatcgcattgtctgagtaggtgtcattctattctgg ggggtggggtggggcaggacagcaagggggaggattgggaaga ca atagcaggcatgctggggatgcggtgggctct atggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCC
CGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGGG
GTAAACTG GGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCG AGGGTGG GGGAGAA
CCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAAC
ACAG CTGAAGCTTCGAG GGG CTCG CATCTCTCCTTCACGCGCCCGCCG CCCTACCTGAGGC
CGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTG GTGCCTCCTGAACTGC
GTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAG ACCGGGCCTTTGTCCGG CG CTCCCTT
G GAG CCTACCTAGACTCAG CCG G CTCTCCACG CTTTG CCTGACCCTGCTTGCTCAACTCTAC
GTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTA
GAGCTAGCGAATTgccgcca ccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGC
CCATCCTGGTCGAGCTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGAGG
GCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCT
GCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCT
ACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCA
GGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTC
GAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGG AGGACGGC
AACATCCTGGG GCACAAG CTG GAG TACAACTACAACAG CCACAACGTCTATATCATG G CCG
ACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCG AG GACGG CA

GC GTG CAGCTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACGG CCCCGTGCTGCT
GCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCG
CGATCACATGGTCCTGCTG GAGTTCGTGACCGCCG CCG G GATCACTCTCGG CATGGACGAG
CTGTACAAatgattgtttattgcag cttataatggttacaa ata a agca atagc atca ca a atttca c a a ata aagca tttttttca ctgca ttctagttgtggtttgtcca a actcatca atgtatcttatGTAGTACTCTCTATAGGCCATTG
GAACAGAACATCTG G CTTAAAATAATCTCAG CACCAAAAAC G GT CTAACTATG GTCTCCACA
ACATAACCCGATG GTTCTCATATTTTAGATGCTTGATGCACAAGTTGAACTGGCTAACAATT
CAGACTTCAG GACTTGACTCCCCTG G GAATCTGCAGTTGTTTCCCTCCCCTTGAGTGAGACT
GACTGTTGGGAGTAGAATAACCGCATGCTCTGGGTCAGAATG CCTGGCCCTTCAGAGTAAC
GGCAG CCATGATCGCTGAGCTCCTACTGTGAG GCACATACTTCGCCACACTAACACATG GA
ATCCTTG CAAGCTCCTTACATAATCA GTATTTTATG CA CTTTG CAAATG A G AATACTAAG G CT
CAGAAGAGG CAAGCTCCTCTCCCACTGTTAAACAGCTAGTGATGAACAGAGCTGG G
pARBI-621: chr3_3_Exogenous
ATAACCGCATGCTCTGGGTCAGAATG CCTGGCCCTTCAGAGTAACGG CAGCCATGATCG CT
GAGCTCCTACTGTGAGG CACATACTTCGCCACACTAACACATG GAATCCTTG CAA G CTCCTT
ACATAATCA GTATTTTATG CA CTTTG CAAATGAG AATACTAAG GCTCAGAAGAGGCAAG CT

ACCTAAACTTG TCTG ATTTCC
CACACTGCTGCTCCATCAGATTGTTTGCCAGGTTAAG GATTCATGCCAG GACCTTAAATCCC
TCTGAATCGACAGAGCTAACCTAAAG GGCACCTTGATTTCAGGTTTTGTTCCCAAACCTATG
ACTGTG GCCTGGCCACGTtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcc ttga ccctgga aggtgcca ctccca ctgtcctttccta a ta aaatgaggaaattgcatcgcattgtctgagtaggtgtcatt ctattctggggggtggggtggggcaggacagcaagggggagg attgggaagacaatagcaggcatgctggggatgcg gtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGG GCAGAGCGCACATCGCCCA
CAGTCCCCGAG AAGTTG G GGG GAGG GGTCGG CAATTGAACG GGTGCCTAGAGAAGGTGG
CGCGG GGTAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG
GGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGG GTTTGCCG
CCAGAACACAG CTGAAGCTTCGAG GGG CTCG CATCTCTCCTTCACG CGCCCGCCGCCCTACC
TGAGG CCGCCATCCACG CCG GTTGAGTCG CGTTCTG CCGCCTCCCG CCTGTGGTGCCTCCTG
AACTG CGTCCG CCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGG CG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACG CTTTG CCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCG CCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGG GCGAGGAGCTGTTCACCGG GG
TGGTG CCCATCCTGGTCGAG CTGGACG GCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GC GAGG GCG AG GG CGATGCCACCTACG G CAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTG CCCGTG CCCTG G CCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCG CCATGCCCGAAGG CT
ACGTCCAG GAGCGCACCATCTTCTTCAAG GACGACG GCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGG G CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACG GCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATG GCCGACAAG CAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGG CAGCGTG CAG CTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACG GCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAG CAAAGACCCCAACG
AG AAGCGCGATCACATGGTCCTG CTGGAGTTCGTGACCG CCG CCG G GATCACTCTCGGCAT
GGACGAG CTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagcatca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatAATGAAG
ACCCGGTT
GGTTCTAG CTTCACAATGAGCTCATTG G CATTGTCCCCACAACCACCACCCCCTCTCCACCTC
TGATCATCCACTATCATTTCTATATGGCCTTTGCCAGGCTTGTCCTCATATCAAAGCTTCTGC
CCTCTATGTTTAAATG CATAA G CTTG ACAAATGTTTTCATTTG A G AA G G CAA GCTG CCTCCC
CG TG CTTG G TCTAATGTG AG CTTG GTAG AG TG ACA CTG CTGGATTCAGTTTTAGTCATCAAC
ACTTCCAGCTGATGCTGATCTG AGTAATTGTCCTATTTAGACTCTGACAGGTGAAAGTCATA
GATTCAG GAAACGGAGACTCCCCTCACCCTGCTATGAAG GGGAGTTTGAGAGGGTTGG GC
AG CTCTTCCCG G CAG TG GCAGG CA GTG TG G G GACACACGTATGGATAAATTGTTAATG CTC
TT
pARBI-622: chr3:1865 Exogenous_109
CTGCAGGTGGCAGACTTACTCATTCTCATG CTTCAGCTCCAG GGGTTCCCCTGTATG
TCTCTGCCCG CTCCCACACCCCAGGTAG CATTGCCCCTG CTTGCTTTGTG CCCTCACTGTCTC
CACATATATACACATCTCTTACAATAATATTAAATACCAGCCAACATTTATATAGTACTTCTTA
TGTAC CAAG TACTG TTGTCTATCTATCTATCATCTATCTATCTATCTATGTA CA CACA CA
CTCA
TATTATCTTCA CAATG CTATG AAATA G GTATTATTG CTGTC CATTTTACAG
ATG G G G AAA CC
AATCTATAATTG G TTTAGTG AG ATTTACTG AGCATATCTACA G GTATTTG TCCTTG G TTATA G
GAAACCAATCTGTCACTGGTTTCCCCTTCTGTAAAATAAGTTAAATAATTTGCTTAAGCTCAC
TCAGATAGTtga cgactgtgccttctagttgccagcca tctgttgtttgcccctccc ccgtgc cttccttgaccctgga a ggtgcca ctccca ctgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtgtcattcta ttctggggg gtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatg gGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCCG
AG AAGTTG GGG G GAG G GGTCGG CAATTGAACG GGTGCCTAG AGAAGGTGGCG CGG G GT
AAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACC
GTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACAC
AG CTG AAG CTTCGAGG GG CTCGCATCTCTCCTTCACGCGCCCG CCGCCCTACCTGAG GCCG
CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTGCGT
CCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGG CGCCTACTCTAG A
GCTAG CGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC
ATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG GC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAGCTG C
CCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCG AAGGCTACGTCCAGG
AG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAG ACCCGCGCCGAG GTGAAGTTCG A
GGGCGACACCCTGGTGAACCG CATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAA
CATCCTG G G G CACA AG CTG GAGTACAACTACAACAG CCACAACGTCTATATCATG G CCG AC
AAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGC
GTGCA GCTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACG GCCCCGTG CTGCTGC
CCG ACAACCACTACCTGAGCACCCAGTCCAAG CTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGG ATCACTCTCGGCATGGAC GAGCT
GTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aa tttca caa ata a agcattt ttttca ctgcattcta gttgtggtttgtcca a a ctcatca atgtatcttatGTGTGGTTGAGTCATAATCTGAGCT
ACTCTCTGGCCCAAACTGCTTATTGTAACCACCACATCATATGGCCACTTCAGCACCAGTCAC
ACCTCATTTTACTTG TTG ATGTCTCA GTTTTCC CCTCTATTCTGC GGG CTACTTG AG AAG ACTT
CTGTCTTCTTCCTCTAG G TATCTCTG TG CTTG A CTCA GTATCG AG ACCATA G G TGTGTG G G T
GG G GTTAATCC ACA CTTG CTG AACACATGTGTCC GTG CCCAATGTC C CTG TG CATAGTG G CT
ACTTCCCATGGTTCACACGCATCTTCATGGCTCTTGTTTCTTCCATTCACCCATCCCATGTCCC
TGGCCCTGTTCTCAGTGCTGAG GATGGAACCTCATTGCTGCTG CTTCCAGCCCTGGCCCTGC
CCCTGTTTGG GGCTCAGTGCTTCATCCCGTGGAGCTCAAGGGGTGCTG
pARBI- CTTCTCATCCTCACA CCCATTCATTTCTTAG TTTCCCATACTCTG A CTTCTACTA

623: ATAAAG A C CTCATG AA CACAACCTG G AG AAC G CCCTCA G
CCTCATCTCACCTC CCTG CAACA
chr3:1865 CATG G CTCACTTG TCTGCTTCTTCCTTCTTAG AG TCCTCTCTCTG GTTG ACCTGTCAAA
G ATG
Exogeno GGATTCTCCTGTTAACGACCCTGACTCCTTTGTCTCCTTCTACAAAGATAATGCACTGACTTT
us_110 CCAAATGTACTAACTA G TATG ATCATCACTG TCTACTGTTG CATCG AA
CCACTACTAA C CTCA
ATAGG GAAGGCACCAGGTTCAAGAGGCCAAAG AAGAGA CCCAGAGCCAGCAAATGAGAC
ATG G GGTTTTATGAGGG GCTCACATAG AGG G GAGAG AGTCCAGTGGCAGTG GGCTGG GC
AG GAGAACCACCTTACtga cga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttg accctggaaggtgcca ctccca ctgtcctttccta ataa a atgagg aa attgcatcgcattgtctgagtaggtgtcattct attctggggggtggggtggggcaggacagca agggggaggattgggaagacaatagcaggcatgctggggatgcggt gggctctatggGGATCTGCGATCG CTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCAC
AGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGG GTGCCTAGAGAAG GTG GC
GCGGGGTAAACTGGG AAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGG
GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGC
CAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCT
GAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CT
ACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGGG CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCAT
GGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatACGTG
GTCCAGTG GT
GGGAGG CTGGAAAAGAGATCTGTCTTGCCTACAGTTCAGTGGTGGCGGGTTGGGCAGGAA
AACCACAACTGCCTGCAAACATCATGCAGTTTACATAACATTGTCACTTAG CAG CCTCCCCTA
ACGACCTCCACCTGG CAATCTTCATTTAAC CCAAAA CTCAAG G CCTCAATCCCCTGTATG GA
CCATGTTTCATGGGACAGGCCAGGGGCTCAGAAGTTTATCATAGATAAGGAATGAATCTCC
GGGTTGGCCACTCCTGGATTCCCTAGCTTGGAACACACATTCAAGTGTATCTGCCATTCTGA
GTGTATTCTTCAATTATTGCTGTCAGGTGTGTTGACCCTCTATCTACATAAGTATTTCTCAAA
CCAGGGCCCTACGATCAGCATCAGCATCACCTGGG GTATATGTGAAAAATGCAGCTCCTGG
ACC
pARBI-624:
CAGCTCTGCCCTCGCCTCCACCTACATCATGCCTTCCTGCATGCCAACTTCCCTCTGAGGTTC
chr3:1865 CCTTCACCCACTTACAACCACAGCCTCCTTCCGGACCTCCCCTACACGATGAACAGTCTGCCA
Exogeno GTTAATATACATCAAGCTGCCACCATGAAACAGCTCG GAAAGCCCCATTTGTTCTCAATAGC
us_111 ATTCCAGCTAGAAAGACCATCCACCACCCATGGCCACTGAAGAAGGTCCTCCCCTGAGTGC
CAGGCAGTGGGGCTCCTAACTCTGGGCCCCAATTAGCTTAG CTTGGGGGAGCAAAGGACA
GGAGCACTGTTCACCCAGTTGGACTAACAGCTCCACACTGTGCCTAATAGGAACCTCCTTCT
CACAGGTGGCACCTTTGtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcctt ga ccctgga aggtg cca ctccca ctgtcctttcctaata a a atgagga a attgcatcgcattgtctga gtaggtgtca ttc tattctggggggtggggtggggca ggacagca a gggggagga ttggga aga ca atag ca ggcatgctggggatgcgg tgggctctatggGG ATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCAC
AGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGG GTGCCTAGAGAAG GTG GC
GCGGGGTAAACTGGG AAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGG
GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGC
CAGAACACAGCTGAAGCTTCGAGGGGCTCG CATCTCTCCTTCACGCGCCCGCCG CCCTACCT
GAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCA GCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CT
ACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGG G CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGGCAGCGTGCAG CTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCAT
GGACGAG CTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatTGAATCAGGGTCTGT
CCTGAGCCACCCTCCCTTG GAG AGATCTGAGCACTGAGAACCACAGAGCCTGGGACACCAG
CACAG GGTTTG AAAAACTCCACTCCCAGTCAGCTGCGGTG G CTCATGCCTGTAATCCCAGC
ACTTTG GGAG GCTGAGGTGGGTGGATCACAAGGTCAG GAGTTCGAGACAGCTTGACCAGC
ATG GTAAAACCCCATCTCTACTAAAAGTACAAAAATTAGCCAG GCATGGTGTTG GG CACCT
ATAATCCCAGCTACTCGGGAGGCTGAGGCAGGG GAATTGTTTGAACCTG GGAGCTGGAGG
TTGCAGTGAGCCGAGATCGCACCACTGCACTCCAG CCTGGG CGG CTGAGCG AG ATTCTGTC
TCAAAAAAATG GGAAAAACATCATTCCCATGTGTGTCCTG GTTGCTTGTCTCTTTCAAAGTA
CTGCATACT
pARBI- CCACATCATGGACCAGAAGCCACTTG

625: AAATG GGCAATGAACAGG AG CCTGTTTGTTTCTCCTTCCA G CTCCTG ACCTG TCTAG CTCCT
chr8_1_E GACCCTCTCAGTGTCATTCTTG CTCACCCCTGGCCCTGCTCCCTGGATATG CAGACAGGATG
xogenous GG CTTCTTCCCTTTAGACCTTCACTTTGCTCTTTTGGACTCCTCTTCTGGCCTTGCCTTCCCCT

CCTCCTG GCCTT
TGGTGACACCG CATCAG CAGG CTCCTGCTTCCCTGTCAAG GGATACAAGCCAG CACCGGTC
TTTCGCCACAAG GTATGTGTGTAAAGATTGCTTCCTGTACTTG GTGTCCACAGGATTTTAAA
GAACTCCTGGCCCACTtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttg accctggaaggtgcca ctccca ctgtcctttccta ataa a atgagg aa attgcatcgcattgtctgagtaggtgtcattct attctggggggtggggtggggcaggacagca agggggaggattgggaagacaatagcaggcatgctggggatgcggt gggctctatggGGATCTGCGATCG CTCCGGTGCCCGTCAGTG GGCAGAG CGCACATCG CCCAC
AGTCCCCGAGAAGTTG GGGGGAGG GGTCGGCAATTGAACGG GTG CCTAGAGAAG GTG GC
GCGGGGTAAACTGGG AAAGTGATGTCGTGTACTGGCTCCG CCTTTTTCCCGAGGGTGGG G
GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCG CAACGGGTTTG CCGC
CAGAACACAGCTGAAGCTTCGAGGGGCTCG CATCTCTCCTTCACGCGCCCGCCG CCCTACCT
GAGGCCGCCATCCACG CCGGTTGAGTCGCGTTCTG CCG CCTCCCGCCTGTG GTGCCTCCTG
AACTG CGTCCG CCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGG CG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACG CTTTG CCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCG CCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGG GCGAGGAGCTGTTCACCGG GG
TGGTG CCCATCCTGGTCGAG CTGGACG GCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GC GAGG GCG AG GG CGATGCCACCTACG G CAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTG CCCGTG CCCTG G CCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCG CCATG CCCGAAGG CT
ACGTCCAG GAGCGCACCATCTTCTTCAAG GACGACG GCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGG G CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACG GCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATG GCCGACAAG CAGAAG AACGGCATCAAGGTGAACTTCAAGATCCG CCACAACATCGAG
GACGG CAGCGTG CAG CTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACG GCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAG CAAAGACCCCAACG
AG AAGCGCGATCACATGGTCCTG CTGGAGTTCGTGACCG CCG CCG G GATCACTCTCGGCAT
GGACGAG CTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatCGGAACTCTACGCTTG
AAACAGAAACATCTTCTCAAAACACCTCTGGTATTG G CCCATTTCTCTATTACTGTTTCTTGT
GG TTTCATACTG AG G TTTTTG GTG CATTGCAATTGCCCTGGAGTTTATTTTTAGTAATAAAG
ACAGAGTTCAATTAATGTCAAAACATGAGGGTAATTGCAGAGGAAG GATTAGTAAGTCTAG
GG GAAAATGTG CCCAACTTTTTTTCTTCTG ATAACATCTTT CACTG ATCATACACCGTCA CAC
CCACAGACACG GAGCTTCCTCCGTGATGTCTGCAATGCACTAG GCCCAGTCGG GGAGCAGT
GGCTG CGCCACCACCTGAAAACGAAAG CATTTCTGAGTCTCTTTCAGTCCAG CTCATCTAGC
AG CCAGCCACACAGGTGACCACCCACCGTAGCTGTCACTATGGGTGATGATTATCATTATG
pARBI- AG GATCTTTCTCTCTACAGTGCCTCTTGTG GGACGGGTTCTGCTGATG GGGG

626: GATCTCG CCTGGAAACACGATCTCTCCAACTTTCCCTG G GAG CAGAGCCTTG GGTCTCACTG
chr8_2_E AACCCACCTCCCCAGCACCTTGTGTGGTGACTG G CAGG AG GTAGCCATTCCCTCCTGTCCTC
TCTG CCCTTGTCCACTTTCCTGTTATGAGTCAGTTG CTG A GATAG G CTATG AG G A GTTAA CA
xogenous GAGTGGATAAAAAAGAGCTTTGTCCTTCTTTAAAGCCTTCAAAATATACAGATGTATTAGTC

CGTATCCATGTCTATA
TCAATATCTACCTTGATATTTATATCTATATCTAG GAG ATTTATTTATTATAAG GTATTG G CTC
ATGCAATTATGGACCtga cgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttga ccctgga aggtgcca ctccca ctgtcctttccta a ta aaatgagga aattgcatcgcattgtctgagtaggtgtcattctat tctggggggtggggtggggcaggacagca agggggaggattgggaagacaatagcaggcatgctggggatgcggtgg gctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAG
TCCCCGAGAAGTTGGGGG GAG GG GTCG G CAATTG AACGG GTG CCTAGAGAAG GTG GCG C
GGGGTAAACTG G GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGA GGGTGGG GGA
GAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG CCA
GAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTG
AG GCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAA
CTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCG GGCCTTTGTCCGGCG CT
CCCTTG GAG CCTACCTAGACTCAG CCG G CTCTCCACG CTTTG CCTGACCCTG CTTG CTCAACT
CTACGTCTTTGTTTCGTTTTCTG TTCTG CGCCGTTACAGATCCAAG CTGTGACCGGCGCCTAC
TCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTG
GTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGC
GAGGGCGAGGGCG ATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGC
AAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCA
GCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTA
CGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTG
AAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAG
GACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCA
TGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGG
ACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGT
GCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG ACCCCAACGAG
AAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGG
ACGAGCTGTACAAatgattgtttattgca gcttata a tggtta ca aata a agca atagca tca c a a atttca ca a at aa agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatcttatCAGAG
GGGTCCTGTGATT
TGCCACCTGCAGGCTGGAGACCTAGGAAACAGCTGTCGTGTCACAAAGGGCAGAG GGCTG
GAGAACCTGTAGTGTAGATTCCAGTCCAGATTCGAAGGCCTGAGGACCAGGAATGCCAAG
GGCAGGAGAAGATCGATGTCCCAGAACAAGCAATCAGGCAGAGATGAAATTCAACCTTCC
TGTG CCTTTCTGTTCTATTCAG G CCTTCAGTG G GTTGAATGAG GCCCAACC CACATTGAAG A
GG ACCATCTG CTTTACTCAGTCTCCTG GTTGAAGTGATAATCTCATCTG GA CATACTCAGAA
ATGCTGTTTAACCAGCTCTCTGGAAATCCTGTGGCCCAGTCAAGTTGACACACAAAATTAAC
CATCACACTGCAGAGACACAGTCTATCCCAATACCTGTATTTACTGGGTAACGTGCATCTCA
GCTC
pARBI-627: ACCCTCCCAGAGCTGGTTTCAATGGG
GGCATACCCATTATGGGATGCAGGGCATCCTGCAT
chr8_3_E CCTGAGGAATTTTTTTTCCTCCAAAAATGAAACCTTGAAATGAGGACATTGTCCTGTCCACG
xogenous GACTGCACAACAACACTGAGCCTCAAGGACTCATACTGGCATTTTTCTTCTTTTGCAGAGTG

GCTTCAAGCTCACGAGAAACCAGGTCGGGATTTAAACAATGTTGGGTTAA
AG CAAAGTTTCATAAAGACAGAATCAAGAAAAAAAG AAAGAG AAACCAATCTAAGTGCCA
TCCTCCCTGAGTTG CATCTTACCTGAGTCTTCAGCCGCCGCCCCCTGCTG CTGTGGGAGGAA
ACG GGAAAGTGACTG GCCtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttc cttga ccctgga aggtgcca ctccca ctgtcctttccta ata a aatgagga a attgcatcgcattgtctgagtaggtgtca ttctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctggggatgc ggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCC
ACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTG
GCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGG
GG GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACG G GTTTG CC
GCCAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTA
CCTGAGG CCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCG CCTCCCGCCTGTGGTGCCTCC
TGAACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGG GCCTTTGTCCG G
CGCTCCCTTG GAGCCTACCTAG ACTCAGCCG GCTCTCCACGCTTTGCCTGACCCTGCTTGCTC
AACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CGCCGTTACAGATCCAAG CTGTGACCG GCGC
CTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGG
GGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTC
CGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCAC
CG GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGC
TTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAG
GCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGA
GGTGAAGTTCGAGG G CGACACCCTGGTGAACCGCATCGAGCTGAAG GGCATCGACTTCAA
GGAGGACGGCAACATCCTGGGGCACAAGCTG GAGTACAACTACAACAGCCACAACGTCTA
TATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATC
GAGGACGGCAGCGTGCAG CTCGCCGACCACTACCAG CAGAACACCCCCATCG GCGACG GC
CCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCA
ACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGG
CATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaa ata a agca atagcatca ca a attt ca ca a ataa a gcatttttttca ctg ca ttctagttgtggtttgtccaa a ctcatca atgtatcttatATGGGGACAGG
GGTCG GGGGTGTGAGGGTTAGTGGTAGCAGGGAGGTAGCACTTCTGTGAAGCCGGAAAG
GAGACCTGAACCTG GCTCTCTGCTTTGCTGCTTGCCAGCCCTCTG GCTGGGAAGAAG GCTCT
TG G CTCTTGTGAG CCTCG GTTTCCTCTG CAGAGTG G C CTGTTCTGAG GA CAG G CTGACATTG
CTGTCCATCAGTCTTCCAGGACCCTGAGGATTTCAGCTCTACCCATGG G CACTG AACTG G AA
CATTCACTG GTCATTTG CTTTCGTTTTTTTTTTTTG TTTGTTTGTTTTTTGATACAGAGTTTCAC
TCTTGTTGCCTAGGGTGGAGTACAATGGCGTGATCTCAGCTCACCACAAGCTCTGCCTCCCG
GGGTCAAGCGATTCTCCCCAGTAGCTGAGATTACAGGCATGCGCCACCACGTCCGGCTAAT
TTTG TA
pARBI- AAGTCAACAACAGGCTCCAGAAAAAAATGGAAATATAAATGGTCCCCAATTTG

628: GAATTTG AG G CTTATATAGTATGAG CAAAATAAACTCCTCCTG CTTCTATG
AATTCTTTG CAC
chr9_1_E TTTTTCTCTATAATTCCTTATCAAGCAATGGGTTTGTCTGTTAACCAAG ACTATACATAAAAT
xogenous GTATTCCCGTAGTTCCTTTTATTACATTTTTTCTTTGTTGTTACTAGTGGAGAAGCTAACAAG

CTTTGATACA
GCCAGTGTAGATACAGCTAG GACGTCATGAATCAGAGCTCCCACACACGATGATAATGGAA
ACATCTTATCCATATCTGATTTGCTCACATATAGTTATTTAGACAACAAATTTTACATTCCTG G
ATAGG GAGCCTCCGtgacga ctgtgccttcta gttgccagccatctgttgtttgcccctcccccgtgccttccttga c cctggaaggtgccactcccactgtcctttccta ataa a atgaggaa attgcatcgcattgtctgagtaggtgtcattctatt ctggggggtggggtggggcaggacagcaagggggaggattgggaaga caatagcaggcatgctggggatgcggtgg gctctatggGGATCTGCGATCGCTCCG GTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAG
TCCCCGAGAAGTTGGGGG GAG GG GTCG G CAATTG AACG G GTG CCTAGAGAAG GTG GCG C
GGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGA
GAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG CCA
GAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTG
AG GCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAA
CTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCG GGCCTTTGTCCGGCG CT
CCCTTG GAG CCTACCTAGACTCAG CCG G CTCTCCACG CTTTG CCTGACCCTG CTTG CTCAACT
CTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTAC
TCTAGAG CTAG CGAATTgccgccaccATGGTGAG CAAGG GCGAGGAGCTGTTCACCG GG GTG
GTGCCCATCCTGGTCGAGCTGGACGG CGACGTAAACGGCCACAAGTTCAGCGTGTCCGGC
GAGGGCGAGGGCG ATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGC
AAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCA
GCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTA
CGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTG
AAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAG
GACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCA
TGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGG
ACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACG GCCCCGT
GCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG ACCCCAACGAG
AAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGG
ACGAGCTGTACAAatgattgtttattgca gcttata a tggtta ca aata a agca atagca tca c a a atttca ca a at
a a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatcttatATTTGATTTCAATCTCATA
GTCCAGGTAGGTGTAAAATGACTGCCAAGCCAG CAGACATAAATCTGTATTATCTCATTTTC
ATCACAGTCTCTGCATTGG GAAAAAAATGTGAGTG AAAAGACATTTTCTCCCACCTTGTTAA
GTACACAATTACATAGTTTAAAGAAGATAAGAATTTCAAATTGTTAGAATGATATCATTATT
CAAATGATTCCCTGAGAACTTCAAGTATTCCCCTTCCCCCAAATGTAGGCAGGGCAAAATAG
AACCAGTAGGAATGAAAAATGTTAATTGTTTCAGGTTCTGTAATTTTGGGTTTAATGACTCA
GG CTATGTGCTACTGAGTAAAATACTCATCTG GAGAAAAAGACTGTTG CATGACATATTTCT
GATTAAGGACAATTTAAACTAATTCTAAGAAAACTCTTACATACTGCACATGCCATAAA
pARBI- GATTAGGTCTGGAATAAGATTGCACCATGACTAAAATGACTTCTTTAGCCAATGACATTCCT 2541 629: GAAGATTG GATCAGAGTAAAGTTAAACTAACCTACTCAAGGAAGACTTTCCTAAATCAG CC
chr9_2_E CCTTTGCTGTATAATTTAGTATGGAGAGCTGATTGTGGGTTTTTCTTGTTTCATTGTTTAATA
xogenous CATAGAGGTTTCACATTTCATTTTGAAACAGAAAGAGGGACTGGAAGAGATAAGTTAGTAT

TTGATAATCTCAG GAAGGACTAGACTGCTAAGAAAGAGGAGCCCAGATTGGAATAAAAGA
GACGATGCCAGGATCAGCCTGTGAGAGGATGTCTCACTTAGTACACCTCAAGGATCATGTC
AG CTGAG CATCTATACCCCGCtga cgactgtg ccttctagttgccagccatctgttgtttg cc cctcccccgtgcc ttccttgaccctggaaggtgccactccca ctgtcctttccta ata a a atga gga a attgcatcgcattgtctgagtaggtgt cattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctggggat gcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCG CACATCGC
CCACAGTCCCCGAGAAGTTG GGG G GAG G GGTCGGCAATTGAACG GGTGCCTAGAGAAGG
TGGCGCGGG GTAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACGGGTTTGC
CGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCT
ACCTGAGGCCGCCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCG
GCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATCCAAG CTGTGACCG GC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACC
GG GGTGGTG CCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTG
TCCGGCGAGGGCGAG GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCG
AG GTGAAGTTCGAG G GCGACACCCTG GTGAACCGCATCGAG CTGAAGGG CATCGACTTCA
AG GAG GACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAGCAG AAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca caaataaagcatttttttcactgcattctagttgtggtttgtccaaa ctcatcaatgtatcttatGTGATGAATATT
GCATACAG ACTCTGCCTTCAAGTCATCCCCCATTCGACAAGCATTTATTTAAG GTTCTTACAT
TCCAGGACTTGTACTAGATGCTAGAAACAAACAGATAAGGAAAACGTAGGAAAATGCTTCT
CTCAG GAA GCAATATTGTATATGAGTGGTGAAG AG CTTCAAATCTGG G G CCAAACTGCTGG
GTTTGAAATTTGGCTTGGCCACCTACTGGCTGTG AAATCTTGGGCAACTTTCTTAATCACTCT
GTGCCTCAGTTTCTTCATCTGTAAAATGAAATAGCCATAGTACTTATTACAAATAGGTAATA
ATTAGTGAGAATTAATTTGAGTCAATACATAAAAATG ACTCAGAATAAAAGACCAGCTAAG
AATATATACACACATACTTAGCTACTTATAATCTAGTAGGAAAGATGGGTATGTTATCAGAG
GACTT
pARBI- CAGATAAAAAAAGCTGAAGCTCAG GGATATTGTCATTTACCCATGAG CTCACAG

630: GTGGTGAAACTATCATTAAACTACGGTTGGCCTTGTTCTTTTGTGGTTTCATTTCACTAAGCC
chr9_3_E TCTGACACCCTGACCTCAACTTCCTTCTGTAGATATGTCACCTG GCCTCCTTCAGTCTAAGAC
xogenous ACTACACACTGGCCACACACTTTGTCATCCTCATAGACTGACAACCTGCTCTCACATACCCTG

ACTCCTAGTCATCCTACAAGAACCACATTCACTTTAAGTTTCTTTTGAAGCCTTTTCAGTGTCC
CATACCCTGATCCCATTACCTCTTCTATGTATG CAGGG CTTGCTTCATGCTTGTGTAACCCAA
GGGCCCTG
Ctgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctgga a ggtgcca ctccca ctgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtgtcattcta ttctggggg gtggggtggggcaggacagcaagggggagg attgggaagacaatagcaggcatgctggggatgcggtgggctctatg gGGATCTGCGATCGCTCCGGTGCCCGTCAGTGG GCAGAGCGCACATCG CCCACAGTCCCCG
AG AAGTTG GGG G GAG G GGTCGG CAATTGAACG GGTGCCTAGAGAAGGTGGCG CGG G GT
AAACTG GGAAAGTGATGTCGTGTACTG GCTCCGCCTTTTTCCCGAGG GTG G GGGAGAACC
GTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCG CAACGGGTTTG CCGCCAGAACAC
AG CTGAAG CTTCGAGG GG CTCGCATCTCTCCTTCACGCGCCCG CCGCCCTACCTGAG GCCG
CCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCG CCTGTGGTGCCTCCTGAACTG CGT
CCG CCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCG G GCCTTTGTCCG G CGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGG CGCCTACTCTAG A
GCTAG CGAATTgccgccaccATGGTGAGCAAG GGCGAG GAGCTGTTCACCG G GGTGGTGCCC
ATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG GC
GAGGGCGATGCCACCTACGG CAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAG CTG C
CCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGG CGTGCAGTGCTTCAG CCG CTAC
CCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATGCCCG AAGG CTACGTCCAGG
AG CGCACCATCTTCTTCAAGGACGACGG CAACTACAAG ACCCGCGCCGAG GTGAAGTTCG A
GGGCGACACCCTGGTGAACCG CATCGAGCTGAAGGGCATCGACTTCAAGGAG GACG GCAA
CATCCTG GGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGG CCG AC
AAGCAGAAGAACGG CATCAAGGTGAACTTCAAG ATCCGCCACAACATC GAG GACG GCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACG GCCCCGTG CTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aa tttca caa ata a agcattt ttttca ctgcattcta gttg tggtttgtcca a a ctcatca atgta tctta tATTTG ATTTAA CAC
ATTTTAAAATTC
TTAACAATTTATGAAAAAG GAAACCCATATGTTCATTCTGCCCTG GACCCTGCAAATTATGC
AG CCAATCCTGTGTG TGTTTTTTCATG TTG CGTA G A CTCTATTA CAG TCA G TAACATATTG AG
AAAG A GTG TGTGTCTCCTCTG ATAGTCTATG AG CTCCTTG AAG CAA G AG G CTTGTCTTAGTC
ATTGCTGTATCCTCACACCAGACTCTACG GGATG CTCAGGATTCACCTGTTGATCTGTGAGG
AAGCCCAAGGCTCAGTTAATGCCCCCCACCCTCCTTCCATAACAGTCTCCTGATTAGGAAAC
AGTGCTTGACCTCTAGACAATCTGAGTATCCACTCCAGGAAGTGGAACATTGCCTTCCCAAT
GTTG AAACTGTTAG CA CTCAATCAT GTTTCA GG AACATATAG AA GGTATGT
pARBI- TG GACCTTATGCAGGGTAATAAAAACCAACACGTG GGGCCTGTGCTGTAAG GTG

631: GGCGATGACTCACCCCACTCATGCAGGTTGAGTCATTTCAAGGCAGAGTTGTG
CCAGTTCA
chr10_1_ GTACCCAGTAATATTTTCCAGTCGACGAGTATCAGTGAACAGG AGATAACCAGTCATTTCTA
Exogenou GATTCTGCTCAGAGTCCCAG CTTAG AGGCTCCACCAGCTCAAAGAGACGGGATGGCAAAAC
s_90 AG
CCACCTTAATTTCCAGATCATCTGCCAGCTCATCTGCCAGCTGACACACAGCCCATGG CT
CCACTATTGCCAGTGCTGTAGCTG CACCAGACCAATCTGGTTCAACTTTTATGTAACAAAGT
TGTGATTTGTTTTTCAGTTG CCCG G G ACTCCCA G GTTG AAG ATCATG AG CATG CCCTG ATG A
GCCAAGGCATGCAACCACAGtgacgactgtgccttctagttg cc agccatctgttgtttgcccctcccccgtgcct tccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggt gtc attctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggat gcggtgggctctatggGG ATCTGCG ATCGCTCCG GTG CCCGTCAGTG GG CAGAGCG CACATCGC
CCACAGTCCCCGAGAAGTTG G GGG GAGGG GTCG GCAATTGAACGGGTGCCTAGAGAAGG
TG GCG CGGG GTAAACTGG GAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACG GGTTTGC
CG CCAGAACACAG CTGAAGCTTCGAGG GGCTCG CATCTCTCCTTCACG CG CCCG CCGCCCT
ACCTGAGG CCGCCATCCACG CCG GTTGAGTCGCGTTCTGCCG CCTCCCG CCTGTG GTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGG GCCTTTGTCCG
GCGCTCCCTTGG AG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATCCAAG CTG TG A CCG GC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAG CAAGG GCGAG GAGCTGTTCACC
GG GGTG GTG CCCATCCTG GTCGAG CTGGACGGCGACGTAAACG G CCACAAGTTCAGCGTG
TCCGGCGAGGGCGAG GGCGATG CCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCG
AG GTGAAGTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAG GGCATCGACTTCA
AG GAG GACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAG CAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGG CAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTG CTGCCCGACAACCACTACCTGAG CACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttca ctgcattctagttgtggtttgtcca aa ctcatcaatgtatcttatGGGAACCTAAG
TGCTTGGACTGAGGAGTGGGG ACTGAAATAAGAAGCG GACACTGCAGGGCAGGATCAGG
ATCCAATCAGATCGAGCCCTGGCATCACCTCATGGCAGGATCTAGTCAGATCGTGGCTTTTG
ACATCACTTCATTGTG AAATCCAATCAG ATCACACCTTATTACC CTATCCTTATAAAACCG GA
CCTAGCCCCCAGCTCG AGGAGGCACTGCTTTGGGAGTTATCCCAGGTGTTCTTGTTACTAGT
TGCAAGGAATAAAATCCTCTTGCTAAATCCTCCTTG GTTGTGCAGATGTTGGCATGGATGTG
GAGAGAAGGGAACACTTACGCTTTGTTGGTGAGGATGTAAATTAGCTCAACTCTGTGATAA
ACAGTATG GAGAGTTTTCAAAGAACTAAAAATAGAACTTCTACTCAACCCAGCAATCCTATG
AGTGCATG
pARBI-632:
TAACAGAAGCAGATCCAAGCCTTTTAGCTAATTAAATAAAAATGCACCATACAGCAACCATC
chr10_2_ CTGCCTGCTTGGGTGTAAAATAGAGAG GAGAAAGATG GCATGCTGGTCATTGAGAGATAA
Exogenou TTACCTG CAATTTCAG AG CTGGAATCTAGTG ACAACATAAGAAAAATAATAACCTCTCTG GC
s_88 TGTTAGGATTCTCAGTTCTAGCTGCGAGCAAGGAGCAAAGGCACCCACAGACTTGCCAAAC
CTATGTCAGGAATGACGAATAGAAGGCAGAAATCCACATCCCCAGGTGGTAAAATTGTTAT
CCACCTTCGTCACCTCTTTCCTAATGGAGGAAGTGAGGACAGGCAGCCTTGGAGTCCTACTT
GAATGAG GCCTGGACCTTATtg a cga ctgtgccttctagttgccagcc atctgttgtttgc cc ctcc cc cgtg cct tccttga ccctgg a aggtgcca ctccca ctgtcctttccta ata a aa tgagga a attgcatcgcattgtctgagtaggtgtc attctattctggggggtggggtggggcaggacagcaagggggaggattggga a ga ca atagcaggcatgctggggat gcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCG CACATCGC
CCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGG
TGGCGCGGG GTAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACGGGTTTGC
CGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCT
ACCTGAGGCCGCCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGG GCCTTTGTCCG
GCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAG ATCCAAG CTGTGACCG CC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACC
GG GGTGGTG CCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTG
TCCGGCGAGGGCGAG GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAG CCGCTACCCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATG CCCGAA
GGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG CCG
AG GTGAAGTTCGAG G GCGACACCCTG GTGAACCGCATCGAG CTGAAGGG CATCGACTTCA
AG GAG GACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAGCAG AAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttca ctgcattctagttgtggtttgtcca aa ctcatcaatgtatcttatGCAGGGTAATA
AAAACCAACACGTG GGGCCTGTGCTGTAAGGTG GCAGAGGGCGATGACTCACCCCACTCA
TGCAGGTTGAGTCATTTCAAGGCAGAGTTGTGCCAGTTCAGTACCCAGTAATATTTTCCAGT
CGACGAGTATCAGTGAACAGGAGATAACCAGTCATTTCTAGATTCTGCTCAGAGTCCCAGC
TTAGAGGCTCCACCAGCTCAAAGAGACGGGATGGCAAAACAGCCACCTTAATTTCCAGATC
ATCTG CCAG CTCATCTGCCAG CTG A CACACA G CC CATG G CTCCA CTATTGCCAGTG CTGTA G
CTG CACCA G ACCAATCTG GTTCAACTTTTATGTAACAAAG TTG TG ATTTG TTTTTCA GTTG CC
CGGGACTCCCAGGTTGAAGATCATGAGCATGCCCTGATGAGCCAAGGCATGCAACCACAG
GGGAACCTAA
pARBI-633: TCCACCTCCCAAAGCGCTGAGATTATAGGCATGAGACACAGAGCCCAGCCACTGTCCGCAT
chr10_3_ TTCTTTCCTGGCTTCCTTACTGGGTTCTCTACTTTCACTCTTGCCTTCTTCCTCTCTTCCACCAG
Exogenou CATAAGTTTTAGAAACTGCACTTCTGATCATGCTGCTCTGCTGCTTAAAATCCTTCATGCAGA
s_89 TCATGTAAG A CCTTCCTG ATCCA GTGCTGCCTTG CC CTCTA G AA CCCAG
GG TG CCTTTCTCC C
CTCTGCA GTT CTACTG CG C CAC CTG CA GTTTCTCTCTC CTG TGTTCATG TGTCTACTC CCTCTG
AATG G TA G GCTCACACC CTCCACCT G G C CAACTCTTACTAAATCTTG G G CTTCAA CTCAA G G
ATCACCTTCAtga cga ctgtgccttctagttgccagccatctgttgtttgc cc ctcccccgtgccttccttga ccctgga aggtgccactccca ctgtcctttccta ata a aatgagga a attgcatcgcattgtctgagtaggtgtcattctattctgggg ggtggggtggggca ggacagca a gggggaggattggga aga ca atagcaggcatgctggggatgcggtgggctctat ggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCC
GAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGGGG
TAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGAGAACC
GTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACAC
AG CTG AAG CTTCGAGG GG CTCGCATCTCTCCTTCACGCGCCCG CCGCCCTACCTGAG GCCG
CCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCG CCTGTGGTGCCTCCTG AACTG CGT
CCG CCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCG G GCCTTTGTCCG G CGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGG CGCCTACTCTAG A
GCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC
ATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCC GGCGAGG GC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAGCTG C
CCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCG AAGGCTACGTCCAGG
AG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAG ACCCGCGCCGAG GTGAAGTTCG A
GGGCGACACCCTGGTGAACCG CATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAA
CATCCTG GGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGG CCG AC
AAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGG ATCACTCTCGGCATGGAC GAGCT
GTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aa tttca caa ata a agcattt ttttca ctgcattcta gttgtggtttgtcca a a ctcatca atgtatcttatTG
GTTCACTCCACCAGGCTGGG CTG
TGTGTCCTGCCCAGTTCCTATAGTGCCCTAACTCACTTTTTTAAAAAAATACTTTAAGTTCTG
GG GTACATGTG CAG AA CATG CAG G TTTGTTA CATA G GTATACACTTG CCATG G TGG TTTG C
TGAAG CCATCAATGTCATCTA CATTAG G CATTTCTCCTAATG CTGTCCCTCCC CTAG CCTC CC
ACCCCCCAACAGG CCCCAGTGTGTGATGTTCCCCTTCCTGTGTCCATGTGTTCTCG TTG TTCA
GCTTCCA CTTATG AG TGAG AACATG CGG TGTTTG G TTTTCTG TTC CTGTG TTAG TTTG CTG A
GAATGATGCTTTCCAGCTTCATCCATGTCCCTG CAAAGGACATGAACTCATTCTTTTTTATGG
CTGTCCATTATTCCATG GTGTATGTG TG CC ACATTTTATTTATCCAGTCT
pARBI- CCTTCCA CCCCCTTCCTTAATAGTTTTCCTCCTCCGCCTCCTG CACGGCCTCCG

634: ACCGTCCTCGG AAACCTCCCAG CCG CAGCCCGG CCCTCCCCAGCGG CCTGAGTG GCCTCCT
chr10_1_ CCCGCGGCGCCCCCCAGCTCCGCACCTCCCCAGCCCCGCCCGAGGCGCGCCCAGCGAGCGT
Exogenou GACCCCGACCCAGGGCCACGG CTGCGCCCGTAG GATCGCGG GCGCGCG GGTCTCCATG AG
s_92 GGACTGGGAGAGATGCGTCACCCGCTAATCCCTCAACTGGGTCGCACCCGGTCTGCCTTGG
TTGGTCCCTTCTGATGGCGGAGGCGGGACCTCGACTTG GTGGCCAATGAGGAGCCTCGTG
GGGGTGCTCTCGGCTGCCATGGCAACGGAGGGGCCGCCCGCTTCCCCGCGGGGCGAGGCC
GGCTCTCCCAGGACTGGCCACATtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgt gccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a aa tg agga a attgcatcgcattgtctgagtag
gtgtcattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctgg ggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACAT
CG CCCACAGTCCCCGAGAAGTTG GGGG GAGG GGTCGGCAATTGAACGGGTGCCTAGAGA
AG GTGGCGCG GGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCG CCTTTTTCCCGAG
GGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTGGTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACCGGGCCTTTGT
CCG GCGCTCCCTTGGAG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CT
TGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACC
GG CGCCTACTCTAGAGCTAG CGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTC
ACCGGGGTGGTG CCCATCCTGGTCGAGCTGGACG GCGACGTAAACG GCCACAAGTTCAGC
GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC
ACCACCGG CAAGCTGCCCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGC
AGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC
GAAGGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG
CCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACT
TCAAGGAGGACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACG
TCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAA
CATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGA
CG GCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGAC
CCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC
TCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcaca aatttca ca a ata aagcatttttttca ctgcattctagttgtggtttgtcca a a ctcatcaatgtatcttatTGCAGTCAC
GCGCTGTGGCCCCAGTCCTGTTAGCATGAGGTGGGGGTCAGAAGTTGAGGAACAACTCGG
ACCTAAAGCCTTGACAATTGCCAACAGTGTAGACAAGTCCCAGCATGCCCAAAATTTATAAG
AGAAGGCTGCCTGGAAGAG GGACACGATGAGGAGGGTCTGCTGAAACGGACAAAGAAAC
CCAGTTCAGTGCAGTGGAGTGGACATGGCACGGGCAG GGCACGGAGTG GGGTCGGTG CT
TAGTAAGCGGATTAGGGTTTCCAAGCCAAGACCCACCCAGGGCAG GATGGGGAACCAAGG
TGAGCAGGTGCGCTGAATTGTGG GTCTCACACTGTCCTTCAGTGTCCCGCAG GAAGAGAGA
GCTAAGGATCACCAGGGTCATTCCCCTGCCTCAGGCAG CCTGCACTCCACCTTAG GTACAGT
CATGTCCGGTCTTCAG CA
pARBI-635: chr10_2_Exogenous_91
CAGTTTTTAG
CCTGTCAGCTCTTTGAGGGACACTTGTTATGGTCATACCCACAGCTTTGACCC
TTACCAGGTGGTCAGGTGGATGAATGAATGAGTAGAAAGAAGGAAGAGAAGGCAGGAGG
GAGAAGAGGAAGAATGAAG GAAG GGTTGAGTTTGTTCTGGAAGCTCCCAGCTTCCATCCG
CACCACACTCTCCCTCTGTCCCCAGGACACCAGGGAACTGACTCAAG
GCTCTCCTCTTACCA
GGTGCTGTGTGGGGTCCAGGGAGAAGAGCCCATTTGAAATGGGGCACTGGTCTAAACTCT
GG ACCTG CTG CTCACTATTGACTGAACTTAG CCA CTTAAATATTCTCCTCTATAAAATG G GA
GTGACCATAG CAACTACCTCGCtgacgactgtgccttctagttg cc agccatctgttgtttgcccctcccccgtgc cttccttgaccctgga a ggtgccactcccactgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtg tca ttctattctggggggtggggtggggcagga cagca agggggaggattggga ag a ca atagcaggca tgctgggga tgcggtgggctctatggGGATCTGCGATCG CTCCGGTGCCCGTCAGTGG GCAGAG CGCACATCG C
CCACAGTCCCCGAGAAGTTG G GGG GAGGG GTCG GCAATTGAACGGGTGCCTAGAGAAGG
TG GCG CGGG GTAAACTGG GAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACG GGTTTGC
CG CCAGAACACAGCTGAAGCTTCGAGG GGCTCG CATCTCTCCTTCACG CG CCCG CCGCCCT
ACCTGAGG CCGCCATCCACG CCG GTTGAGTCGCGTTCTGCCG CCTCCCG CCTGTG GTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGG GCCTTTGTCCG
GCGCTCCCTTGG AG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATCCAAG CTG TG A CCG GC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAG CAAGG GCGAG GAGCTGTTCACC
GG GGTG GTG CCCATCCTG GTCGAG CTGGACGGCGACGTAAACG G CCACAAGTTCAGCGTG
TCCGGCGAGG GCGAG GGCGATG CCACCTACG GCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTG CCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAG CCGCTACCCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATG CCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG CCG
AG GTGAAGTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAG GGCATCGACTTCA
AG GAG GACG GCAACATCCTGG GGCACAAG CTGGAGTACAACTACAACAG CCACAACGTCT
ATATCATGGCCGACAAG CAGAAGAACG GCATCAAGGTGAACTTCAAGATCCG CCACAACAT
CGAGGACGG CAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTG CTGCCCGACAACCACTAC CTG AG CACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCG CGATCACATG GTCCTG CTGGAGTTCGTGACCGCCGCCGG GATCACTCTCG
GCATG GACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttca ctgcattctagttgtggtttgtcca aa ctcatcaatgtatcttatTGAATTGTAGG
GA GACCCAAAAAATGTATATGAATA CATTTGTCAACTAAAGAGTGTTAG G G ATTATCTCA GT
GGAGGGATAGG CAGTGCAG GCTCTG GAACCAGACTTCTTGTGTTCACATCCCTGCTCCACT
G CTTG G AG ACCTTTG G CAA G GTCCTTAACTCCTCTG GACCTCAGTTTTCTCATCTGTAAAACA
GG GATTATACTA GTTCCTA CCTCACA GA GTTGTGAAG ATTAAATGAGATAATG CATGAAAG
GCTTTTCAATAA G AG CA G G G CACACATTAGGCATGATGTCCATGATCAAGTGCCTAATATGT
GCCTGCTCTCAGGAAGGATGCTGTGAACTACTTCACACTAAAGTGTAAATGAATGATAAGA
TCTACTATGCTCCACTCATG CCTGTAATCCTAGCACTTTG G GAG G CCG AAG CAA GTG GATCA
CCTGAGG
pARBI-636: chr10_3_Exogenous_93
GGTGAGGTCACCCTGGGTACCTG
AGTGATG GATGAGCTAGG CTCCAG GAAAGCAGGTGGG GCTG
GAAACCACCCAAGACAAG
GAAATTGTTGG GTTTGAAATTCCAGGTAAGCAGAGTATGGCCTGAGAG CAGCG
GCAGAGG
CTGAAAGTGGG CTTGGTCTCGGTATTTTTGAGGCTGAG GTAGCTCCAACTCTCTCCCTCTGA
GTCAG
CACCCCACTTAAGACAGAAATACTGAGATCTGCGACCCCAGCCTCCCCCACAGCAA
CTCCACAGTCCCCAAACCTTTGCTCCG CTG GGTCCTCCCAAGTATGGTAGCTGACACAGCCC
TGAGCAG CCCACTGTCCTCTGCCCTGCCAAGTCTTCTCACTCTTGTCACTTTGTCCTACATCAT
TTCTTTGTCTGGCACCCCTCCTtga cgactgtgccttctagttgccagcca tctgttgtttg cc cctcccccgtgc c ttccttgaccctggaaggtgccactccca ctgtcctttccta ata a a atga gga a attgcatcgcattgtctgagtaggtgt cattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctggggat gcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTG GG CAGAGCG CACATCGC
CCACAGTCCCCGAGAAGTTG G GGG GAGGG GTCG GCAATTGAACGG GTGCCTAGAGAAGG
TG GCG CGGG GTAAACTGG GAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACG GGTTTGC
CG CCAGAACACAG CTGAAGCTTCGAGG GGCTCG CATCTCTCCTTCACG CG CCCG CCGCCCT
ACCTGAGG CCGCCATCCACG CCG GTTGAGTCGCGTTCTGCCG CCTCCCG CCTGTG GTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGG GCCTTTGTCCG
GCGCTCCCTTGG AG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATCCAAG CTGTGACCG GC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAG CAAGGGCGAGGAGCTGTTCACC
GG GGTG GTG CCCATCCTG GTCGAG CTGGACGGCGACGTAAACG G CCACAAGTTCAGCGTG
TCCGGCGAGG GCGAG GGCGATG CCACCTACG GCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTG CCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAG CCGCTACCCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATG CCCGAA
GGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG CCG
AG GTGAAGTTC GAG G GCGACACCCTG GTGAACCGCATCGAG CTGAAGGG CATCGACTTCA
AG GAG GACG GCAACATCCTGG GGCACAAG CTGGAGTACAACTACAACAG CCACAACGTCT
ATATCATGG CCGACAAG CAG AAGAACG GCATCAAGGTGAACTTCAAGATCCG CCACAACAT
CGAGGACGG CAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACGG
CCCCGTGCTG CTGCCCGACAACCACTAC CTG AG CACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCG CGATCACATG GTCCTG CTGGAGTTCGTGACCGCCGCCGG GATCACTCTCG
GCATG GACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttca ctgcattctagttgtggtttgtcca aa ctcatcaatgtatcttatAGAATTGGTGC
CCCAGAGGTGG GAAAAACCTACCCTG CTCACTGCTCTATCTCCTGG GCTTAG CTCAGACTCT
GGCATACGTGAGTGCTTAGTAGAAACTTG ATG G GGTATTAAAATTGAGACCTGGGTG G GTT
CTCACATTAAATTTCTAA CA CTCCTTAG GTGGCTTGGG CCCACCCCTCTG AG CCCTGTTTCCA
CATCTGTAAAGTAGCAAACCATCGCCATGAAGAACAGGGAAGAACCAGGTCAGCAGGCAG
GGCGGATGAAG CATCCAG CCA GTTCTAGGAAGTTTCCGATG G CTAATCCTTCAGATGCA GC
TTGTAAG CAG G TCAAATACAG AATAG G AAACTG AG G CGCAACGAAGTCATGCTTAGAAGT
CTGTCCAGGGAGACCAGTCAAGGCCTCCCCCACACCCTGCAAGCCCTTCTAGGAGCTCTGG
TTAAGTCACTC
pARBI-637: chr11_1_Exogenous_99
ATCTTCAAATTG TAGTCTTTGTAA CAA CCAAATAAC CTTTTGTG GTCACTG
TTG GTA G A CAG AATCCATGTACCTTTG CTAAG G TTA G AATG
AATAATTTATTGTATTTTTAAT
TTGAATGTTTGTGCTTTTTAAATGAGCCAAGACTAGAGGGGAAACTATCACCTAAAATCAGT
TTGGAAAACAAGACCTAAAAAGGGAAGGGGATGGG GATTGTGGGGAGAGAGTGGGCGA
G G TG CCTTTA CTA CATGTG TG ATCTGAAAAC CCTG CTTG G TTCTG AG
CTGC GTCTATTG AAT
TGGTAAAGTAATACCAATGGCTTTTTATCATTTCCTTCTTCCCTTTAAGTTTCACTTGAAATTT
TAAAAATCATGGTTATTTTTATCGTTGGGATCTTTCTGTCTTCTGGGTTCCATTTTTTAAATGT
TTAAAAATATGTTGtga cga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttga cc ctgga aggtgcca ctccca ctgtcctttccta a ta aaatgagga aattgcatcgcattgtctgagtaggtgtcattctattc tggggggtggggtggggcaggacagca agggggaggattgggaagacaatagcaggcatgctggggatgcggtggg ctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGG GCAGAGCGCACATCGCCCACAGT
CCCCGAGAAGTTGGGGGGAGGG GTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCG
GGGTAAACTGG GAAAGTGATGTCGTGTACTG GCTCCGCCTTTTTCCCGAG GGTGGGG GAG
AACCGTATATAAGTGCAGTAGTCGCCGTG AACGTTCTTTTTCGCAACGG GTTTGCCGCCAG A
ACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAG
GCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACT
GCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCG AG ACC GGGCCTTTGTCCGGCGCTCC
CTTG GAG CCTACCTAGACTCAG CCG G CTCTCCACG CTTTG CCTGACCCTGCTTG CTCAACTCT
ACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCG GCGCCTACTC
TAG AG CTA GCG AATTgccgcca ccATGGTGAGCAAGGGCGAG GAGCTGTTCACC GGGGTGG T
GCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTG ACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGA GCTGAAGGG CATCG ACTTCAAGGAG GA
CG GCAACATCCTGG G G CA CAAGCTG G A GTACAA CTA CAA CAG CCACAACGTCTATATCATG
G CCGACAAGCAGAAG AACG G CATCAAG GTGAACTTCAAGATCCG CCACAACATCGAGG AC
GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tACATGGTAGTTCAGTTCTTA
ACCAATGACTTGGGGATGATGCAAACAATTACTGTCGTTGG G ATTTAG AG TGTATTAG TCA
CGCATGTATGG G G AA GTAGTCTC G G GTATG CTGTTGTG AAATTG AAACTGTAAAA GTAG AT
G G TTG AAAGTACTG G TATG TTG CTCTG TATG GTAAG AA CTAATTCTGTTACGTCATG TACAT
AATTACTAATCACTTTTCTTCCC CTTTA CAG CACAAATAAAGTTTG AG TTCTAAA CTCATTA G
AATTGTTGTATTG CTATG TTACATTTCTC G A CCCCTATCA CATTGC CTTCATAA CG ACTTTG G
ATGTATCTTCATATTGTAGATTTAGGTCTAGATTTGCTAGCTCCAAGTAATTAAGGCCATGTA
GGAGAGCATGGTAACCACAGATAGAACTGGTATTATCCCAAGTGGTCTGCAGACTGC
pARBI-638: chr11_2_Exogenous_97
TTACAG CA CAAATAAAGTTTG A GTTCTAAACTCATTAGAATTGTTGTATTG
TCTCGACCCCTATCACATTGCCTTCATAACGACTTTGGATGTATCTTCATATTGTAGATTTAG
GTCTAGATTTGCTAG CTCCAAGTAATTAAG G CCATGTAG GAGAG CATG GTAAC CACAG ATA
GAACTGGTATTATCCCAAGTGGTCTGCAGACTGCTGAGTGG G GATG GGATCTGCTCTCTGT
TGAGAGTTGGTAATCATTGGTTTGAAATGTGATGAAACCACTCAAGCCAATGAAGGTGGGT
GTGTAG GTG G G G AG TACTTTG CCATAATATTTTAAAA CATTA C CTG GTTAG A G TTCTAAGTG
GTACTTATTTTTGTTTGGTTAGGGGAAAGCCTGAATAAAAACAGAAATGGACACATAATAT
GCATATTCCATAGTCTTTtgacga ctgtgccttctagttgccag cc atctgttgtttgcccctcc cccgtg ccttcctt ga ccctgga aggtg cca ctccca ctgtcctttcctaata a a atgagga a attgcatcgcattgtctga gtaggtgtca ttc tattctggggggtggggtggggca ggacagca a gggggagga ttggga aga ca atag ca ggcatgctggggatgcgg tgggctctatggGG ATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCAC
AGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGG GTGCCTAGAGAAG GTG GC
GCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGG
GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGC
CAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCT
GAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CT
ACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGGG CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGGCAGCGTGCAG CTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCAT
GGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatGGGAG
GCTG GAATGT
GCCTGGGATTTGGGTCTAAGTGTATGCGTAATTCTTACCTCACTAAAGAATTTGCCTTGTTTT
TTTCCTTTTGGTGAGTGACTAAAACGTCTGGGCTTCCCTGTGTGCGTGCTACAGTAAGCAAG
CAGAGGCTGTGCAAAGGTGTGAGCAGGATCACGTGGAATCTGGAGGATACATCTTGGCTT
GCAAACTG CCTCTGTCTCCTGG GTGGGACTGTTCTGTCCTTGCACTGCTGTTCTGTGTTACCT
CTTGGGGTGTAAGGTTTTGCTTACAGGAGACAAACTTTGGGCGTAGAATGGAAGCCACTGC
CAGCCTCTGTGCTGAGAAGGAAGGTGCTTGTTTCAAAGGGAGCAGCAAGGGAGGCTTGTT
CTACTCACCTGGGCCTGTTTGCCTGAGAAGGGGAGATAAGGGCTGAACTGGGACTAGCCA
GGGGGA
pARBI-639: chr11_3_Exogenous_98
TTGTTAGTGTTGGTTAAGTTGTTG CTTG GAAG TG AG AA GTTGCTTA G
AAACTTTCCAAAGTG
CTTAGAACTTTAAGTG CAAACAGACAAACTAACAAACAAAAATTGTTTTGCTTTGCTACAAG
GTGGGGAAGACTGAAGAAGTGTTAACTGAAAACAGGTGACACAGAGTCACCAGTTTTCCG
AGAACCAAAGGGAGGGGTGTGTGATGCCATCTCACAGGCAGGGGAAATGTCTTTACCAGC
TTCCTCCTGGTGG CCAAGACAGCCTGTTTCAGAGGGTTGTTTTGTTTGGGGTGTGGGTGTTA
TCAAGTGAATTAGTCACTTGAAAGATGGG CGTCAGACTTGCATACGCAGCAGATCAGCATC
CTTCGCTGCCCCTTAGCAACTTtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgcc ttccttgaccctggaaggtgccactccca ctgtcctttccta ata a a atga gga a attgcatcgcattgtctgagtaggtgt cattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctggggat gcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTG GG CAGAGCG CACATCGC
CCACAGTCCCCGAGAAGTTG GGG G GAG G GGTCGGCAATTGAACG GGTGCCTAGAGAAGG
TG GCG CGGG GTAAACTGG GAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACG GGTTTGC
CG CCAGAACACAG CTGAAGCTTCGAGG GGCTCG CATCTCTCCTTCACG CG CCCG CCGCCCT
ACCTGAGG CCGCCATCCACG CCG GTTGAGTCGCGTTCTGCCG CCTCCCGCCTGTGGTG CCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGG GCCTTTGTCCG
GCGCTCCCTTGG AG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATCCAAG CTG TG A CCG GC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAG CAAGG GCGAG GAGCTGTTCACC
GG GGTG GTG CCCATCCTG GTCGAG CTGGACGGCGACGTAAACG G CCACAAGTTCAGCGTG
TCCGGCGAGG GCGAG GGCGATG CCACCTACG GCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTG CCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAG CCGCTACCCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATG CCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG CCG
AG GTGAAGTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAG GGCATCGACTTCA
AG GAG GACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAGCAG AAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca caaataaagcatttttttcactgcattctagttgtggtttgtccaaa ctcatcaatgtatcttatAGGTGGTTGATT
TGAAACTGTGAAG GTGTGATTTTTTCAG GAG CTG G AAGTCTTAGAAAAG CCTTGTAAATG C
CTATATTGTG GGCTTTTAACGTATTTAAGGGACCACTTAAGACGAG ATTAGATGGGCTCTTC
TGGATTTGTTCCTCATTTGTCACAGGTGTCTTGTGATTGAAAATCATGAGCGAAGTGAAATT
GCATTGAATTTCAAGGGAATTTAGTATGTAAATCGTGCCTTAGAAACACATCTGTTGTCTTTT
CTGTGTTTGGTCG ATATTAATAATGGCAAAATTTTTGCCTATCTAGTATCTTCAAATTGTAGT
CTTTGTAACAACCAAATAA CCTTTTGTG GTCACTGTAAAATTAATATTTG GTAGACAGAATCC
ATGTACCTTTGCTAAGGTTAGAATG AATAATTTATTGTATTTTTAATTTGAATGTTTGTGCTTT
pARBI-640: chr11_1_Exogenous_94
CAGTATCTTTGGCACAGTGCATGAGCACGACTAAAGTAAAACATCGCAGAAAACATAGCTT
TAGTCTACCCTTCGTGTCCTAAAAGGAAAACCAGTAGCTTCCCAGGCCACCGGAAGGGCAA
CACATGTCCTCTGCAGTTTCTGCACACGGGAAGGTAAAGACAGAGAGAGGACCTACTCCTC
AACACAGAAACATTTCAAAATCTTTCCTCGCCTGCAACCCAAGCTGAAGTCATTCTCCCCAG
AAATAACAAAAGTTGGAAGAGAAGCCGGAGACAGGATAGGTGCAGGAAGCCCACACTTTG
AG GGCAG CACTCAGACACCCTCTCCTGTGTGCAGGACGTGCCGAATGTTCAGGTGCAATGA
GAATGAGCCATGCTTGGCTTAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgcc ttccttgaccctggaaggtgccactccca ctgtcctttccta ata a a atga gga a attgcatcgcattgtctgagtaggtgt cattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctggggat gcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCG CACATCGC
CCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGG
TGGCGCGGG GTAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACGGGTTTGC
CGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCT
ACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCG
GCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACC
GG GGTGGTG CCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTG
TCCGGCGAGGGCGAG GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAGCCGCTACCCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATGCCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCG
AG GTGAAGTTCGAG G GCGACACCCTG GTGAACCGCATCGAG CTGAAGGG CATCGACTTCA
AG GAG GACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAG CAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGG CAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCG CGATCACATG GTCCTG CTGGAGTTCGTGACCGCCGCCGG GATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttca ctgcattctagttgtggtttgtcca aa ctcatcaatgtatcttatCGAGGGCAATC
TGGCCCATCAAGTGGCCTTCGCCTCTGGGAGTAACAAAAATGCACTTCAAAATAGCTTCTGT
AATCAAGCTGCATG GGTG GAGTACTCCCCAG CTGACTCCAG GAAGTTCTCTATCCAAAG CT
ATTCATTAGGCCAGAGCTGTGCAAATAATTAGTCACCCACTTGCTCCATAACCCTCCATGAC
AG CCCAGGCATTGAGTCCAG GTGGGACCATCAAGCCATGCTCTGGTGGCTCATGCATTATC
ATAGAAATG G GAG G CTTTATTTATTTTACTAAAAAGAACAAAAACAACAGACTG CTGTCCTT
TAGACAATAGG ATCACGTCATCTGAGCCCTCTGTGCCCCAGGTGACAAGCCCAGCCCCAAG
TTCTCTTTCCTCAG CCTCCCCACACATGTTCTGGAG GAGATG GGCCCAGCAG GCTGCTCTGA
GGCCTG GC
pARBI-641: chr11_2_Exogenous_96
TCA CAG AATCTTG TTTG AG CTAACAACCAGTTATTCTACCCAAA GTAAAG
TTTCCCATTGTTCTTCCAGGCTGGAAGATGTAAGAAACACACACAGTATAGTGTAGGTTTCC
TCCATAGTTAAGTACCCCAGAGCTAGGCTGAAATCCAGTGTTGGATTGTTTCCAACTTATAG
TAG GGAACCGCCAATGACATG AAAGAGCATTACCTTACCGG ACTGATTCATATTCTACTTCC
AGTCATAG TA CAAATG ATCA CATG TCCACACCCA CATG TG CTCTG ATAA
GTA GTCAATTG A G
AGGAGTGAGTCAGGGAGCCGTAACTTTGCAGCCAACCTGCCAAGAAGAACTGATGCTTTTA
GAAATTCA G CA GCTTTCATTG G CATA G AAGTTTCAG CCTTATA G G CCCAG G GTTTCTCTG AG
GCAGAGGCAAACCCTGTtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcctt ga ccctgga aggtg cca ctccca ctgtcctttcctaata a a atgagga a attgcatcgcattgtctga gtaggtgtca ttc tattctggggggtggggtggggca ggacagca a gggggagga ttggga aga ca atag ca ggcatgctggggatgcgg tgggctctatggGG ATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCAC
AGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGG GTGCCTAGAGAAG GTG GC
GCGGGGTAAACTGGG AAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGG
GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCG CAACGGGTTTG CCGC
CAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCT
GAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GC GAGG GCG AG GG CGATGC CACCTACG G CAAGCTGACCCTG AAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGA CCACATG AAGCA GCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CT
ACGTCCAG GAGCGCACCATCTTCTTCAAG GACGACGGCAACTACAAGACCCGCGCCGAGGT
GAAGTTCG AGG G CGACACCCTGGTG AACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGGCAGCGTG CAG CTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AG AAGCGCGATCACATGGTCCTG CTGG AGTTCGTGACCG CCG CCG G GATCACTCTCGGCAT
GGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatTGTTATGTTGTTATCT
CAG G ATTCTTTTTTTTTTTTTTG AG ACAG A GTCTCACTCTG TTG CCCAG G CTG G AGTG CAGTG
GCACAATCTCGGCTCACTGCAAGCTCCGCCTCCCGGGTTCACG CCATTCTCCTGCCTCAGCC
TCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCACGCCTTTCTAAATTTTGTATTTTTAGT
AG AGACGG G ATTTCACCATG TTAG CCAG GATG GTCTCGATCTCCTGACCTCGTGATCCGCC
GCCCTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACTGCGCCCAGCCTATCTCAG
GATTCTTAATG TAAAAACAGTT CAG TATG G AAAG CG TACTTATG ACA G AG CTGTTG CAG AA
TGTGAAATCACCCATGTGCCTAAGTGTCTCCCTACTACCTGGAACACTAGACCACATGCAGC
AG
pARBI-642: chr11_3_Exogenous_95
CAG CC CTGAACTCAGCCTTCACGG CCCTCCCTAAACACATGGG
AAAATGCCCTGTAGCACAGGAATCTAAAACATTGCTGTTGTTCTAGTTAATAAAAGGACTTA
GAGATTGAAAAGAAGTACCATCTGTATGGCTGGCAACAGACAGCATGATTTTGATGACTGA
CTCTGG GG AG ATCT GAG AAAA CAGTA CATAATATGTCCTTACCTCTG CTCCATCATTCATG
A
CCCAAGCTCCTGCCCTGTCTTCTCAGGAAGATTCTTACAGTAACTCCAGTTTGGAATTCCCCT
TTCATATTTGAAATAACAACAG TAATAG TG A CCACTAATG CTTACCGAG TGTCCACTCTGTG
CCAAGCATCGTGTTTTTATCATCTTTTTTGTTAGGCACTGTACATGGATTAATACAATTACAA
CCCACCACAACCCCATtga cga ctgtgccttctagttgccagccatctgttgtttgc cc ctcccccgtg ccttccttga ccctgga aggtgcca ctccca ctgtcctttccta a ta aaatgagga aattgcatcgcattgtctgagtaggtgtcattctat tctggggggtggggtggggcaggacagca agggggaggattgggaagacaatagcaggcatgctggggatgcggtgg gctctatggGGATCTGCGATCGCTCCG GTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAG
TCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGC
GGGGTAAACTG G GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGA
GAACCGTATATAAGTGCAGTAGTCG CCGTGAACGTTCTTTTTCGCAACG G GTTTG CCG CCA
GAACACAGCTGAAGCTTCGAG GGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTG
AG GCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAA
CTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCG GGCCTTTGTCCGGCG CT
CCCTTG G AG CCTACCTAG ACTCAG CCG G CTCTCCACG CTTTGCCTGACCCTGCTTG CTCAA CT
CTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTAC
TCTAGAG CTAG CGAATTgccgccaccATGGTGAG CAAGGGCGAGGAGCTGTTCACCGGG GTG
GTGCCCATCCTGGTCGAGCTGGACGG CGACGTAAACGGCCACAAGTTCAGCGTGTCCGGC
GAGG GCGAG GGCG ATG CCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGG C
AAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCA
GCCGCTACCCCGACCACATGAAG CAG CACGACTTCTTCAAGTCCGCCATGCCCGAAG GCTA
CGTCCAG GAGCGCACCATCTTCTTCAAGGACGACG G CAACTACAAGACCCG CGCCGAGGTG
AAGTTCGAGGG CGACACCCTGGTGAACCGCATCGAG CTGAAG GGCATCG ACTTCAAGG AG
GACGGCAACATCCTGG GGCACAAG CTG GAGTACAACTACAACAGCCACAACGTCTATATCA
TG GCCGACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG G
ACG GCAGCGTG CAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACG GCCCCGT
GCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG ACCCCAACG AG
AAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG GCATGG
ACGAGCTGTACAAatgattgtttattgca gcttata a tggtta ca aata a agca atagca tca c a a atttca ca a at aa agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatcttatGAGATAAGTACTTGTACT
ATCCTCATTTTATGG GTGATGATACAAG AAACAG AAAG G TTCA G CAA CATG CCTAAAGCCA
CACAG CTA CTATG AG G CAAA G TCTAG GTTTGAACTCAGAAATTCTGACTTCAGACTCACTTA
TCTTATTATACTTCACTTAGTATCTCCCATTAATTAATTCATTCATTCTTCCTTTTATTTGTTAAT
TCACCAAATATTCTGG GAAGCTACTTGTCTAGACACTGTATGAAAAATGAGAATGGATACA
AAG CAACCTCAG ACATG AA CTCAAG GTCCTTACAAGCTTAAACGTATATGATACATTTCCAA
ATA G CAATA CTATG AG G A CA G CTG ATACTCA CAACATAAGTCTTG GTG GTCACA GG AG ATC
TATCATAGGATTTCACGAAGAACCTGACACTGAAAACCAGAACTTGAGGGAAATAAAAC
pARBI-643: chr15_1_Exogenous_100
TTG G CACAAG G A CCATG AAATCTTCA GTAG CTAAG G AG ACTCTTTG
ACAAATA G ATTTATTG
CATATG ATGTAAAAGG G TCTCATTCA G AG AACTACACATTTACTACA CATTTACTA G
CTGTT
AG ATG TG AGAAATTG AAGCTTTG AGAATTGTCATTTTGTAAG CACG CTGTAATAATTGTGTT
CTCTACAAAAATG G AG AAATACTTAATCTTTGCAG G TTTTTATTAG CTTC G
AAAAAG GGAAT
ACAAATATTTTGATACCTGACTTTATTTTAAAGATACTGGCAAGCAAG GAAAGATACTCAGT
TTG CTAAATTATAGCATAG ATTTG AAAG AC G G AAATTTTTTTTTGTTATTTTG ATTCTTTCTG A
TTCAGAACCTGCGtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccc tggaaggtgccactcccactgtcctttccta ata a a atgagga a attg catcgcattgtctgagtaggtgtcattctattct ggggggtggggtggggcagga cagca agggggaggattggga a ga ca atagcaggcatgctggggatgcggtgggc tctatggGGATCTGCGATCGCTCCG GTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTC
CCCGAGAAGTTGG GG GGAGGGGTCGGCAATTGAACGG GTGCCTAGAGAAGGTGGCGCGG
GGTAAACTGGGAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCGAGG GTG GGGG AGA
ACCGTATATAAGTGCAGTAGTCG CCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG CCAGAA
CACAG CTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACG CGCCCGCCGCCCTACCTGAGG
CCG CCATCCACG CCGGTTGAGTCGCGTTCTGCCG CCTCCCGCCTGTGGTGCCTCCTGAACTG
CGTCCG CCGTCTAG GTAAGTTTAAAG CTCAGGTCGAGACCG GG CCTTTGTCCGGCG CTCCC
TTGGAG CCTACCTAGACTCAGCCGGCTCTCCACG CTTTGCCTGACCCTG CTTGCTCAACTCTA
CGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGG CGCCTACTCT
AG AGCTAGCGAATTgccgcca ccATG GTGAGCAAG GGCGAGGAGCTGTTCACCGG GGTG GT
GCCCATCCTGGTCGAGCTG GACG GCGACGTAAACGG CCACAAGTTCAG CGTGTCCGGCGA
GGGCGAGGGCGATG CCACCTACG GCAAG CTGACCCTGAAGTTCATCTG CACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAG CACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCG CGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAGGG CATCGACTTCAAGGAG GA
CG GCAACATCCTGG G G CA CAA GCTGG A GTACAA CTA CAA CAG CCACAACGTCTATATCATG
GCCGACAAGCAGAAGAACGGCATCAAG GTGAACTTCAAGATCCGCCACAACATCGAGG AC
GGCAG CGTGCAG CTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACG GCCCCGTGC
TG CTGCCCGACAACCACTACCTGAG CACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAA
GCGCGATCACATG GTCCTGCTGGAGTTCGTGACCG CCG CCG GGATCACTCTCG GCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tTGAGTTTCCATTCCTGGGAT
AG A G AG CCCTG CCAAAG CCAACTCAGTCTCCTGTTATTG CAGAAATTTAAAG G TTACTTG A G
CTGATTGATACCAATTAAAATAAAAGCATCTACTGCCATTTTATTAAACG CCCTATTTAATCT
CCACATCCCAAGTTCATTCTCTCTCTCTTTTAAGGCTATACATGGGCCTCAACATTCTCTCTTC
TTCAGTATCA G CTGTCCTCATCCCTATTCAAA GTCAG AATG A CTTG G AATCTATTCAGG G CT
GCTACTTAATTGGATGAAGAAAGTACTCTGTTG CATG CTAACCTCCCA G ATTTG AATTAG AA
CCAAG GTCCTGAATCTACACAAGACCTTTTTCCCTTTCCTTTCTGCATTTATTTCTTTTTTTTCT
ACATCTTTCTTTTTG AATTTTTTTTTTTTTTTTTTTCTG AG ATA GAGTCTTG
pARBI-644: chr15_2_Exogenous_101
ATCATCAAAATCTTCATTTATACAAATCACGGGTATTAGAAATAAAATCAG
AGTCAG AT CCCA G G TTTACACAAACTAG AG
GTTTCAATTAAGTTTCCTTCATTTTATAAAACC
ATG ACACAA G CTG TAAATAAAG G A GCTCTG G CTTTAG G CCTTCTTCTGTTTATA G
ACAGTAA
ATTTTAATCCTCTGTCCCTGCAGACCAACTCAAAGCAGTAACCTTGGCTACCGTTGCCGCAG
AAG AAACACACCCTG CA G ATTTCCAGTTCTATATTCAGTCTTA G AAAATTG G
TCATGTTG AA
AG G G AAAAAACAATTTTACTTTG CCTCA GTTATTGACACCAAG AAG CACTCCTCAG CCTGTT
CAG AG GGGAAACAATGAGTCATTCATG CTGTCAGCCCAGGTG GTGCATGAATTCCAG ACA
GAGGCCGCTGAATTAACtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcctt gaccctggaaggtg cca ctccca ctgtcctttcctaata a a atgagga a attgcatcgcattgtctga gtaggtgtca ttc tattctggggggtggggtggggca ggacagca a gggggagga ttggga aga ca atag caggcatgctggggatgcgg tgggctctatggGG ATCTGCGATCGCTCCGGTGCCCGTCAGTGG GCAGAGCGCACATCGCCCAC
AGTCCCCGAGAAGTTG GGGGGAGG GGTCGGCAATTGAACGG GTG CCTAGAGAAG GTG GC
GCGGGGTAAACTGGG AAAGTGATGTCGTGTACTGGCTCCG CCTTTTTCCCGAGGGTGGG G
GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCG CAACGGGTTTG CCGC
CAGAACACAGCTGAAGCTTCGAGGGGCTCG CATCTCTCCTTCACGCGCCCGCCGCCCTACCT
GAGGCCGCCATCCACG CCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCG CCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGG CG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACG CTTTG CCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCG CCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGG GCGAGGAGCTGTTCACCGG GG
TGGTG CCCATCCTGGTCGAG CTGGACG GCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GC GAGG GCG AG GG CGATGCCACCTACG G CAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTG CCCGTG CCCTG G CCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCG CCATGCCCGAAGG CT
ACGTCCAG GAGCGCACCATCTTCTTCAAG GACGACG GCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGA
GGACG GCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATG GCCGACAAG CAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGG CAGCGTG CAG CTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACG GCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAG CAAAGACCCCAACG
AG AAGCGCGATCACATGGTCCTG CTGGAGTTCGTGACCG CCG CCG G GATCACTCTCGGCAT
GGACGAG CTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatCCGTG
GAGGCGTCTC
TCTGAGCAGAG CCCGCAATG CGCCTGCTTGGG GCTCCCTGCAGCCTCTGG GG GAGG CAGG
GCGG CCCAGAGCAG GCCTGTG CTG GAAAGGAACGCGAAGCCCTGTAACCAAGCCTGTACC
TCTG CAGTGCTAGTCCCAAG GGG CCTCCGAGCTGTTTGTCACCATGTGATTG GCTCAG GAG
AGGGGTGGAGAAATGAAAACACTCTGCCCAG GATATATTTAGTTGAAGTGCAG CTGG GGA
AG TG CTTAAACAAG GGAGCTTTTGTCCTTATG TTG AA GTG TTTTTCTTAACTCCTCAAG GGT
GAAAAACTTG A GCCACGTACTG ATCCCATTCCCCCCCACCACCCCCAATATATTTTCTCTTCT
TTAG G AAATG CTCTTATTCTG AAACTTTA G AATTTTCTAGG G TTTGTTGCATAAGAG GAAAC
TGAATAA
pARBI-645: chr15_3_Exogenous_102
CTGAG ATA GAGTCTTG CTCTGTCG CCAG G CTG GAG TG CAGTG GTGTG
ATCTTG G CTCACTG
CAGTCTCCAACTCTGGTTCAAGCAATTCTCCTG CCTCAGCCTCCGGGTAGCTGG GACTACAG
GCATACACCACCATGCCCAGCTAATTTTTGTATTTTTAGTAGAGATGGGGTTTCACCATAGT
GGCCAGGATGGTCTCCATCTCTTGACCTCGTGACCTGACCCCCTCGGCCTCCCAAAGTGCTG
AG ATTACA G G CATG AG C CACC G CG CC CGG C CTCTTTTTG AATTTTTTTAAAAAAACA CCTAA
AGTTTAGGAAAGTATAAGAGG CCAAAGAAACAAGAGTTCAAAGAAACAAGAGTGTTATAC
ACGCACACTTGCAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccc tggaaggtgccactcccactgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtgtcattctattct ggggggtggggtggggcagga cagca agggggaggattggga a ga ca atagcaggcatgctggggatgcggtgggc tctatggGGATCTGCGATCGCTCCG GTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTC
CCCGA GAAGTTGG GG GG AGGGGTCGGCAATTGAACG GGTGCCTA GAGAAGGTGGCGCGG
GGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGG GTG GGGG AGA
ACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAA
CACAGCTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACG CGCCCGCCGCCCTACCTGAGG
CCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTG
CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCC
TTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTA
CG TCTTTGTTTCG TTTTCTG TTCTG CG CCGTTACAG ATCCAA G CTGTG AC CG G C G CCTA CTCT

AG AGCTAGCGAATTgccgcca ccATG GTGAGCAAG GGCG AGGAGCTGTTCACCGG GGTG GT
GCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTG ACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGA GCTGAAGGG CATCG ACTTCAAGGAG GA
CG GCAACATCCTGG G G CA CAAGCTGG A GTACAA CTA CAA CAG CCACAACGTCTATATCATG
GCCGACAAGCAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGG AC
GGCAGCGTGCAG CTCGCCG ACCACTACCAGCAGAACACCCCCATCGG CGACG GCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATG GTCCTGCTGG AGTTCG TGACCG CCG CCG GGATCACTCTCG GCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tGAAAG
GTAGTAAAATATCT
GTATA GTTCT GG CTGTCAA G CTTTTG G A CTG AG ATATG C G TAA G CAAG GCAAAAA G ATCC
C
ATA GTCCA G G AATATCAACCAGTCCAGTTTC CCAAG G AAACTAAAAG TAACTG AAAATG AG
AAGGGTGCCATATAGGAAAATTGAAGGTGGGGCATAATTATTAATACACTGCTTTG GAAAT
G CTG G TATG G AAG ATTCATATATG G ACTTCAAAG CATACATGTCTTATACTTAG CTATGG GA
AAATTATCATCAAAATCTTCATTTATACAAATCACGGGTATTAGAAATAAAATCAGGTATTG
TGTACAG TCAG ATCCCA G GTTTACACAAACTA G AG G TTTC AATTAAGTTTCCTTCATTTTATA
AAAC CATG A CA CAA G CTGTAAATAAA G G AG CTCTG G CTTTA G GCCTTCTTCTGTTTATAG A
pARBI-646: chr16_1_Exogenous_103
GATCTTATAAGCTCATGGTGATTGATCCAAATGATGCAG AGGTCGGCCTAAAG TTAG
AAGT
GGGCCCCTCTCTGCCCCAAGACAGCCCTTCACCCCAATTCCATTCCCACAGTTTGGGCATCC
ACCCAGGCTGCCAAGCCAAGCGGGGGCTGCCCGGGTTAGCAGGGACCTGGCCATGGGCCT
CCTCAGCTAGGGGCCGCCTCTCTGAG
GAGTGGGTGCCCCGCCCCTCCGTGGGCCGCCTGGT
TTCTCTTATTG G CA G CGTCTG TAGTCC CCTGG CTCTGTCAC CCG CAG CTATTCTAG G CCTCTG
GTTCCTTTCACTTTCTGCTCTGCGTCTTCCTGCCTCG GTG GTTCCCCACCATCCAG CCTG AG A
CCTCCCCGTTGCCCGCCCtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcct tgaccctggaaggtgcca ctcccactgtcctttccta ataa a atgaggaa attgcatcgcattgtctgagtaggtgtcatt ctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcg gtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCA
CAGTCCCCGAGAAGTTG GG GG GAG G GGTCGG CAATTGAACG GGTGCCTAG AGAAGGTGG
CGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG
GGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG
CCAGAACACAGCTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACC
TGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCT
ACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGGG CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCAT
GGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatTGGGTTTCTGTCTGCT
CCTCTCCGTGGTGGCCGGCCCTGGCTCTGCCCCAAAGCCATGGAGAAGGCCAAATGCAGGC
TGTAGAAAGGTCTGGGATCAGATCCCCCCATCCTTCCATGGCCACGTGACCGGGCCCGCCT
CTTCCCTCTGAGCCTCACATCCTCATCTGAAGAGGGAATGCTGCTCATGCCCACCTTGCGGG
GATGTGGGGAGGGGGTGCTGCCAGTGGAGACTCCCAGGGGACCCTCCAGGAGTCTCATCC
TGACTCGAGACTGGGCACCTGGTGGGGGCACAGGCCCCCTCTCCCGCCATGGGACCGCCA
CATCCAGGCTCTCGTGGGTTGAGATGCTGAGGAAAGGAAAGAACAAAACCTGCTGGAGGA
GCAGCGAACACGGCCCTCAGCTGGGTGACCTTGGGCAAGTCACTTCCCCTCTCTGGGCCTC
AGTTTCCAT
pARBI-647: chr16_2_Exogenous_104
CTGAAGAGGGAATGCTGCTCATGCCCACCTTGCGGGGATGTGGGGAGGGG

TGGAGACTCCCAGGGGACCCTCCAGGAGTCTCATCCTGACTCGAGACTGGGCACCTGGTGG
GGGCACAGGCCCCCTCTCCCGCCATGGGACCGCCACATCCAGGCTCTCGTGGGTTGAGATG
CTGAGGAAAGGAAAGAACAAAACCTGCTGGAGGAGCAGCGAACACGGCCCTCAGCTGGG
TGACCTTGGGCAAGTCACTTCCCCTCTCTG
GGCCTCAGTTTCCATATCTGTGAATGAGGCAG
CTGGACTGAAGAATAGAGAGTCACAAAGCAGAGTGACACAAACTGGGCTACAGGAAACCC
TGGAGGCCCTCGACCCAGGCTGGGGCCGGGGATGGTTCTGCAAAGCCTCTTGGGAGAGGG
TTGCCTGAGCTGAGTCTCCAAACAGAAtga cgactgtgccttctagttgccagccatctgttgtttgcccctcc cccgtgccttccttgaccctgga aggtgcca ctccca ctgtcctttccta ata a a atga gga aattgcatcgcattgtctga gtaggtgtcattctattctggggggtggggtggggcagga cagca agggggaggattggg a aga caatagcaggcatg ctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCG CA
CATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGG CAATTGAACGGGTGCCTAGA
GAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCG A
GGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGG
GTTTGCCGCCAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGC
CG CCCTACCTGAG G CCG CCATCCACGCCG GTTGAGTCG CGTTCTGCCGCCTCCCG CCTGTG G
TGCCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGGGCCTTT
GTCCGGCG CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTG
CTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGA
CCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGT
TCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGG CCACAAGTTCA
GCGTGTCCGGCGAGGGCGAGGG CGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCT
GCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGT
GCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGC
CCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCG
CGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGA
CTTCAAG GAG GACG GCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAA
CGTCTATATCATG GCCGACAAGCAGAAGAACGG CATCAAGGTGAACTTCAAGATCCGCCAC
AACATCGAGGACGGCAGCGTG CAG CTCG CCGACCACTACCAG CAGAACACCCCCATCG GC
GACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG
ACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCAC
TCTCGG CATG GACGAGCTGTACAAatgattgtttattgcagcttata atggtta caa ata a agcaatagcatc aca aatttca caa ata a a gcatttttttca ctgca ttctagttgtggtttgtccaa a ctc atca atgtatcttatCAGTGG
CCTTAAGTCACGGGAGATGAGCCAGAG CAGTCCAGAGGGCACGGTCAGCACATGCTAAGG
TCCTGGGGCACAGAGTACAGGGCTTGAGAAGGGTTCCTCTTGCTCCATGTGACTTTCATCAT
CCAACAGCCTGTGGTGGCACTTGCCTGTATCCCAGCTATCTAGGAGGCTGAGGTGGGAGG
ATTGCTTGAGTCTGAGAGGTCGGGCTTGTTCTCCCGGTGTTGGCAGAGTCCAG AGGGAGCA
AGTGGAAGCCACAGGTGTCTTGAGGCCTAGGCTGAGATCTAG CACATCACCGCGTCACTTC
ATCCCTTCATTTCATCCTATTGG CCAAAGCAGCCCAATTCTAGGGATGGGGATCTAGACTCA
ACTTCTTGATGGGAGAAGTAGCAAAGTCACACTGCCAAGGGCAGGGACACAG GGAGGGAT
GAAGAGTTAGGAATGGTT
pARBI-648: chr16_3_Exogenous_105
CGGTGCATGGCAATGTGATAGGGGAATGAAACACAGCAACAGAATCAATGCCCCACGCTG
GGCAATAGCCGACTTTCTGTTCGCTCCCATCCCTTCCTTCCCTCCCCGCCTCCTTCCCCTGCTC
CTTCCATTCCACAAACATTTATTGAGCACCCGCTGTGTGCCAGCCACTGTTCTAGGCCCTGA
AGACACAGAAGTGAACAAAAAAAAAGAGTCCCTGTGCACATCCTGGAGG
GACTTTCCCCGT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTCTATACAGTGTATGGAGACAGTGGATAATAAAA
GCTGTTTATGAGGCACTTTCTCTGAGCTGTTCCATGTGCTTCACTTATACTCATTCAGCTAAT
CCTCAGGACACCCCCGtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttg accctggaaggtgcca ctccca ctgtcctttccta ataa a atgagg aa attgcatcgcattgtctgagtaggtgtcattct attctggggggtggggtggggcaggacagca agggggaggattgggaagacaatagcaggcatgctggggatgcggt gggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTG GGCAGAGCGCACATCGCCCAC
AGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGG GTGCCTAGAGAAG GTG GC
GCGGGGTAAACTGGG AAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGG
GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGC
CAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCT
GAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACG CTTTG CCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CT
ACGTCCAG GAGCGCACCATCTTCTTCAAG GACGACG GCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGG G CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGG CAGCGTG CAG CTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACG GCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCAT
GGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatTGAAGTCAGTGATATT
AGTCACAG AGGCCCAGAGAAGTGAAGTGACTTGCCCAAG GTCACACAG CCAG CAAGAGG C
CAAGCCCG GATTGAACCCCAGCCGCCTGGCTCTGGAGCCTG CAGCATAACCACAGCAGGG
AACTG CCACCGGAGACAGACATCGACAGCCACTAGGAGATGTTAACCAACAGGCTTGTCTT
CACGG CACGGCCCCCGCTTCACCAGCTGCACTGTTTGATGAGCTTTGCAGGTCCCAGATCTT
ATAAGCTCATGGTGATTGATCCAAATGATGCAGAG GTCGGCCTAAAGTTAGAAGTGGGCCC
CTCTCTGCCCCAAGACAGCCCTTCACCCCAATTCCATTCCCACAGTTTGGGCATCCACCCAGG
CTGCCAAGCCAAGCGGGG GCTGCCCGGGTTAGCAGGGACCTGGCCATGGGCCTCCTCAGC
TAG GGGC
pARBI-649: EDF1_1_Exogenous
GGGGGCGGCCGCACGGCTAGAGCGGAGACCCCGCGCCCCCTCCGCCCGCGTGG

GGGGTGCGGGGCCCGGGGAGGCACGGGGGCTGCGCGTCGGGGCGCAGCCGCCGCCCGC
GTGTGCTCGGAGGCCGCGGGGCCCGGGCTCCGGGGTCCTCCCGACCTGCAGCCCCAGCGG
CTACCGCGCCTCGCCAGGCCAGGCCAGGCCCCGACGTCGCCTTCCCTACGTCGCCGGCGCC

GCCACGACGTCCCTCAGACGAGCCGAACGCCGAATGGCCCCGAGCACGGGAAGTGCCC
GCCCCCCGCGTGCAGCCAGCCAATGGGACGCCGAAAGCGGGGAGGTGCCGAGGGGACGT
AG CGTCG CCGCGCCAGGTCTCTAG CAGCTGCCGCTGAGCCGCCG GACG GACGCTCGTCTTC
GCCCG CCATGG CCGAGAGCGACTGGGACACGtgacga ctgtgccttctagttgccagccatctgttgtttg cccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcg cat tgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagc aggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGA
GCGCACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGG GTCGGCAATTGAACGGGTG
CCTAGAGAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTT
TTCCCGAGGGTGGG GGAGAACCGTATATAAGTG CAGTAGTCGCCGTGAACGTTCTTTTTCG
CAACGGGTTTG CCGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGC
GCCCGCCGCCCTACCTGAGG CCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGC
CTGTGGTGCCTCCTGAACTGCGTCCGCCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCG
GGCCTTTGTCCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCT
GACCCTGCTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAA
GCTGTGACCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAG
GAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCAC
AAGTTCAG CGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAG
TTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTA
CG GCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCC
GCCATGCCCGAAGGCTACGTCCAG GAG CGCACCATCTTCTTCAAGGACGACGGCAACTACA
AGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGG
GCATCGACTTCAAG GAG GACGGCAACATCCTGG GGCACAAGCTGGAGTACAACTACAACA
GCCACAACGTCTATATCATGG CCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGAT
CCGCCACAACATCGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCC
ATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGA
GCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTG CTGG AGTTCGTGACCGCCGCCG
GGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca aataaagc aatagcatcacaaatttcacaaataa agcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatctta tGTGACGGTGCTGCGCAAGAAGGGCCCTACGGCCGCCCAGGCCAAATCCAAGCAGGTGCTT
TGCTGACTGAG GAGCGGCCGGGCCGGGCGGGGCGGACGCTgggccggggcca cggaccgtgggg cagggggctcggggaagggcaggcggggcctgggacagggcgcaggggacggggacaaggctggGGTGCCGG
GGGCAGGACGGGGTTCGAAGGGCGGG GCGAATCGCAGGGGTAG GAGGACCGGGCCAGG
ACaggggtgggaaggtggggggcgga cccaggaggggatggacggaccca ggaggcagggggCGGGCGACTG
GAGGAGGAGAGGAGCAGGGAACCGGACGGGGCAAGGGGCAGGCGATAGGGCCGGGAC
CAGTG CCAGGGTGGGGCGGGGAGGGGATAGGGGCAGGGACTGTTGGTGGGGACAGGTG
TGAGGACAG
pARBI-650: EDF1_2_Exogenous
ACGGCTAGAGCGGAGACCCCGCGCCCCCTCCGCCCGCGTGGCCCG

CCGGGGAGGCACGGGGGCTGCGCGTCGGGGCGCAGCCGCCGCCCGCGTGTGCTCGGAGG
CCGCGGGGCCCGGGCTCCGGGGTCCTCCCGACCTGCAGCCCCAGCGGCTACCGCGCCTCGC
CAGGCCAGGCCAGGCCCCGACGTCGCCTTCCCTACGTCGCCGGCGCCCGGCCACGACGTCC

CCCCCCGCGTGCA
GCCAGCCAATGGGACGCCGAAAGCGGGGAGGTGCCGAG GGGACGTAGCGTCGCCGCGCC
AG GTCTCTAGCAG CTGCCGCTGAGCCGCCG GACG GACGCTCGTCTTCGCCCGCCATGGCCG
AGAGCGACTGGGACACGGTGACGGTGCTGtgacgactgtgccttctagttgccagccatctgttgtttgccc ctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcat tgt ctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcagg catgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCG GTG CCCGTCAGTGGGCAGAGC
GCACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCT
AGAGAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTC
CCGAG GGTGGGG GAGAACCGTATATAAGTG CAGTAGTCGCCGTGAACGTTCTTTTTCGCAA
CG GGTTTGCCGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCG CC
CGCCGCCCTACCTGAGGCCG CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCG CCTG
TGGTGCCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGC
CTTTGTCCGG CGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGAC
CCTG CTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATC CAAG CT
GTGACCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAG
CTGTTCACCG GGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAG
TTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTC
ATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGG
CGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCA
TGCCCGAAGGCTACGTCCAG GAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGAC
CCGCG CCGAGGTGAAGTTCGAGGGCGACACCCTG GTGAACCG CATCGAG CTGAAGG GCAT
CGACTTCAAGGAGGACGG CAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCA
CAACGTCTATATCATG GCCGACAAGCAGAAGAACGG CATCAAGGTGAACTTCAAGATCCGC
CACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG
GCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAGCAA
AGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGAT
CACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttata atggtta caa ata a agcaatag catca ca aatttca caa ata a agcatttttttca ctgca ttctagttgtggtttgtccaa a ctcatca atgtatcttatCGC
AAGAAGGGCCCTACGGCCGCCCAGGCCAAATCCAAGCAGGTGCTTTGCTGACTGAGGAGC
GGCCGGGCCGGGCGGGGCGGACGCTgggccggggccacgga ccgtggggcagggggctcggggaaggg caggcggggcctgggacagggcgcaggggacggggacaaggctggGGTGCCGGGGGCAG GACGGGGTT
CGAAGGGCG GGGCGAATCGCAGGGGTAGGAGGACCGGGCCAGGACaggggtgggaaggtggg gggcggacccaggaggggatggacggacccaggaggcagggggCGGGCGACTGGAGGAGGAGAGGAG
CAGGGAACCGGACGGGGCAAGGGGCAGGCGATAGGGCCGGGACCAGTGCCAGGGTGGG
GCGGGGAGGGGATAGGGGCAGGGACTGTTGGTGGGGACAGGTGTGAGGACAGAGGCAG
GAG GTG
pARBI-651: EDF1_3_Exogenous
CCCTGGAGTCG

CCATTTCAAAGCGGTAACCAGACCCCAGAGGCTGCCTGAACTCAAGGGGACATG
GGAACC
CAGGCTGTCCCAAGTTCACACACCTCAGGTCGTGGACTTCCAGAGTTTTCTCTCAGTTATGA
GGACAGGCAGCTGTACTCATGCCAACCAGAGCTGCTCTGGGAAGATGGCTGCCTCCCAGG

GCGCCGCA
CACACCACACCCACCAGCACATGGACCCCACAGCACAGCCTCATGTTGCAAGCGGAAACAC
AAGTACCTACATTTCTTGGAAGTCTCCACATCTTCTCCTCGTCTCTGTG CCG CTAAGATAG CC
TAGAAAATTAGAAAACATCAGTG Gtgacga ctgtgccttctagttgccagccatctgttgtttgcccctccccc gtgccttccttga ccctgga a ggtgccactccca ctgtcctttccta ata a aa tg agga a attgcatcgcattgtctgagta ggtgtcattctattctggggggtggggtggggcagg acagca agggggaggattggga agacaatagcaggcatgctg gggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACA
TCGCCCACAGTCCCCG AGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAG
AAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAG
GGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTGGTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACCGGGCCTTTGT
CCG GCGCTCCCTTGGAG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CT
TG CTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG C CGTTACAGATCCAAG CTGTGACC
GG CGCCTACTCTAGAGCTAG CGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTC
ACCGG GGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGC
GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC
ACCACCGG CAAGCTGCCCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGC
AGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC
GAAGGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG
CCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACT
TCAAGGAGGACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACG
TCTATATCATG GCCGACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCG CCACAA
CATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGA
CG GCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGAC
CCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC
TCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcaca aatttca ca a ata aagcatttttttca ctgcattctagttgtggtttgtcca a a ctcatcaatgtatcttatTTTCTAAAG
AGAAGATGTGCTTTATACACATCAGCTCCACTAG GACTGCTGCAAAACCAGGCGCGAAGCG
TCGCCTGAGAACAG CAGCTTCTAGGGCCCCTGGGGTACGCACCCACACGCAGCTGGGTTTT
GTGCGAGAGGCAGTGAGAGCCGGGCTGACCTGGCTTCCCGAGGCATTCCCTGCAGGGGAA
CCAGGGGGGTGGAGCCCACCCTCCCCGTGCTATCTGAACACCGGCCATCCCCCTCCCCAAA
CCCACAACCCCAGGAGTGAGAGCCCCGGCG CAGCCTCACCGTGGCCAGGTCCTTCTGCGTA
AG CCCCTTG CTCTGCCGACCTTGCTGGATCACCTTG CCCACCTCCAG G GTCACCCTGTCATG
GTGCAGCTCCTCTGTCTCCCGGTCCAGCTTGGCCGTGTTCTTGGTAATAGAATGTTGTTTGTT
CTGGCCAGCAGC
pARBI-652: FTL_1_Exogenous_28_30
CCTCCGATTTCCTCTCCGCTTGCAACCTCCGGGACCATCTTCTCGGCCATCTCCTGCTTCTGG
GACCTGCCAGCACCGTTTTTGTGGTTAGCTCCTTCTTGCCAACCAACCATGAGCTCCCAGATT
CGTCAGAATTATTCCACCGACGTGGAGGCAGCCGTCAACAGCCTGGTCAATTTGTACCTGC
AGGCCTCCTACACCTACCTCTCTCTGGTGAGTCCCCAGGACGCCCCTGGCCCTAATTTCCTCC
AG CTG CGCACCTCCGGCCCTCACTGCACGCGCCAGCCTTCTTTGTGCGGTCG GGTAAACAG
AGGGCGGAGTCCCCTTGGCCTCGCCTCCCGCTAACCATTGTTGCCTCCATCTCTTCCCGTAG
GGCTTCTATTTCGACtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttga c cctggaaggtgccactcccactgtcctttccta ataa a atgaggaa attgcatcgcattgtctgagtaggtgtcattctatt ctggggggtggggtggggcaggacagcaagggggaggattgggaaga caatagcaggcatgctggggatgcggtgg gctctatggGGATCTGCGATCGCTCCG GTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAG
TCCCCGAGAAGTTGGGGGGAGGGGTCG GCAATTGAACG GGTGCCTAGAGAAGGTGGCG C
GGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGA
GAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG CCA
GAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTG
AG GCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAA
CTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCG GGCCTTTGTCCGGCG CT
CCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACT
CTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTAC
TCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTG
GTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGC
GAGGGCGAGGGCG ATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGC
AAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCA
GCCGCTACCCCGACCACATGAAG CAG CACGACTTCTTCAAGTCCGCCATGCCCGAAG GCTA
CGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTG
AAGTTCGAGGG CGACACCCTGGTGAACCGCATCGAG CTGAAG GGCATCGACTTCAAGGAG
GACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCA
TGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGG
ACG GCAGCGTGCAG CTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACG GCCCCGT
GCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG ACCCCAACGAG
AAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGG
ACGAGCTGTACAAatgattgtttattgca gcttata a tggtta ca aata a agca atagca tca c a a atttca ca a at aa agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatcttatCG
CGATGATGTG GCTCTG
GAAGGCGTGAG CCACTTCTTCCG CGAATTGGCCGAGGAGAAGCGCGAGGGCTACGAGCGT
CTCCTGAAGATGCAAAACCAGCGTGGCGGCCGCGCTCTCTTCCAGGACATCAAGGTAACTA
GTGTGTGGGTAATGGACTACATCTCCCAGCAG GCCGTGCGCGCGAGGAG CCTTGATTTGA
GGGCGTAGGTGTCGCGTGGGCTTCTGGGAGATTGAGTTCGGTCTTGTGAGCCCTCTTAACC
GCTGGAAATAGAGGCGCACCTCGTGCAGTGCCCACAACACGCGGCAGTCCACACCGCTGC
GTGGTCTTAGGGACGTATAGCTGTAAGAGCTAGGACAGGGTGCGGAGAGTGATAAATACA
AGCTGTCACATGTCTTTGTGGCCTGGGCCTCTGACCCCCAACGACTCTTGGGAAATGTAGGT
TTAGTTCT
pARBI-653: FTL_2_Exogenous_
TGGGGCGGG

GCCTCCTGCCACCGCAGATTGGCCGCTAGCCCTCCCCGAGCGCCCTGCCTCCGAGGGCCGG
CGCACCATAAAAGAAGCCGCCCTAGCCACGTCCCCTCGCAGTTCGGCGGTCCCGCGGGTCT
GTCTCTTGCTTCAACAGTGTTTGGACGGAACAGATCCGGGGACTCTCTTCCAGCCTCCGACC

GCCATCTCCTGCTTCT
GGGACCTGCCAGCACCGTTTTTGTGGTTAGCTCCTTCTTGCCAACCAACCATGAGCTCCCAG
ATTCGTCAGAATTATTCCACCGACGTGGAGGCAGCCGTCAACAGCCTGGTCAATTTGTACCT
GCAGG CCTCCTACACCTACtg acga ctgtgccttctagttgccag cc atctgttgtttgcccctcc cccgtg ccttc cttga ccctgga aggtgcca ctccca ctgtcctttccta ata a aatgagga a attgcatcgcattgtctgagtaggtgtca ttctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctggggatgc ggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCC
ACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTG
GCGCG GGGTAAACTG GGAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCGAG GGTGG
GG GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACG G GTTTG CC
GCCAGAACACAGCTGAAG CTTCGAGG G GCTCGCATCTCTCCTTCACGCGCCCG CCG CCCTA
CCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCC
TGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGG
CGCTCCCTTG GAGCCTACCTAGACTCAGCCG GCTCTCCACGCTTTGCCTGACCCTGCTTGCTC
AACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGC
CTACTCTAGAGCTAGCGAATTg ccgcca ccATG GTGAGCAAGGGCGAGGAG CTGTTCACCGG
GGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTC
CGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCAC
CG GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGC
TTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAG
GCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGA
GGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAA
GGAGGACGGCAACATCCTGGGGCACAAGCTG GAGTACAACTACAACAGCCACAACGTCTA
TATCATG GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATC
GAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACG GC
CCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCA
ACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGG
CATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaa ata a agca atagcatca ca a attt ca ca a ataa a gcatttttttca ctgcattctagttgtggtttgtccaa a ctcatca atgtatcttatCTCTCTCTGGTGA
GTCCCCAGGACGCCCCTGGCCCTAATTTCCTCCAGCTGCGCACCTCCGGCCCTCACTGCACG
CGCCAGCCTTCTTTGTGCGGTCGGGTAAACAGAGGGCGGAGTCCCCTTGGCCTCGCCTCCC
G CTAA CCATTGTTGCCTCCATCTCTTCCCGTAG G G CTTCTATTTCGACCG CGATGATGTG G CT
CTGGAAGGCGTGAGCCACTTCTTCCGCGAATTGGCCGAGGAGAAGCGCGAGGGCTACGAG
CGTCTCCTGAAGATGCAAAACCAGCGTGGCGGCCGCGCTCTCTTCCAGGACATCAAGGTAA
CTAGTGTGTGGGTAATGGACTACATCTCCCAGCAGGCCGTGCGCGCGAGGAGCCTTGATTT
GAGGGCGTAGGTGTCGCGTGGGCTTCTGGGAGATTGAGTTCGGTCTTGTGAG CCCTCTTAA
CCG CTG GA
pARBI-654: PTEN_1_Exogenous_31
CTTTGCATAG

ATAAATACTATAAATAGTATTATTCCTTTTGCATTGAG
AGTCCTGACGAAATGTCCATGTGA
CAGTTCATTTTGGGTTTAGCTCTACCTCTAATATGTGACCTATGCTACCAGTCCGTATAGCGT
AAATTCCCAGAATATATCCTCCTGAATAAAATGGGGGAAAATAATACCTGGCTTCCTTAATG
ATTATATTTAAGACTTATCAAGAGACTATTTTCTATTTAACAATTAGAAAGTTAAG
CAATACA
TTATTTTTCTCTGGAATCCAGTGTTTCTTTTAAATACCTGTTAAGTTTGTATGCAACATTTCTA
AAGTTACCTACTTGTTAATTAAAAATTCAAGAGTTTTTTTTTCTTATTCTGAGGTTATCTTTTT
ACCACAGTTtga cgactgtgccttctagttgccagcca tctgttgtttg cc cctcccccgtgc cttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtgtcattcta ttctggggg gtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatg gGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCCG
AGAAGTTG GGG G GAG G GGTCGG CAATTGAACG GGTGCCTAGAGAAGGTGGCG CGG G GT
AAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACC
GTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACAC
AG CTGAAG CTTCGAGG GG CTCGCATCTCTCCTTCACGCGCCCG CCGCCCTACCTGAG GCCG
CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTGCGT
CCG CCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCG G GCCTTTGTCCG G CGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGA
GCTAGCGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTCACCG GGGTGGTGCCC
ATCCTGGTCGAGCTGGACG GCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGAGG GC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAGCTG C
CCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGG CGTGCAGTGCTTCAG CCG CTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCG AAGGCTACGTCCAGG
AG CGCACCATCTTCTTCAAGGACGACGG CAACTACAAG ACCCGCGCCGAGGTGAAGTTCGA
GGGCGACACCCTGGTGAACCG CATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAA
CATCCTG GGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGAC
AAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aa tttca caa ata a agcattt ttttca ctgcattcta gttgtggtttgtcca a a ctcatca atgtatcttatGCACAATATCCTTTTGAAGACCATA
ACCCACCACAGCTAGAACTTATCAAACCCTTTTGTGAAGATCTTGACCAATGGCTAAGTGAA
GATGACAATCATGTTGCAGCAATTCACTGTAAAGCTGGAAAGGGACGAACTGGTGTAATGA
TATGTGCATATTTATTACATCGGGGCAAATTTTTAAAGGCACAAGAGGCCCTAGATTTCTAT
GGGGAAGTAAGGACCAGAGACAAAAAGGTAAGTTATTTTTTGATGTTTTTCCTTTCCTCTTC
CTGGATCTGAGAATTTATTGGAAAACAGATTTTGGGTTTCTTTTTTTCCTTCAGTTTTATTGA
GGTGTAATTGACAAGTAAAAATTATATATAAATACAATGTATAATATGATGTTTTGATGTAT
GTGTATATACATTGTGAAATGATTACTACAGTCAAACTACTTAACATATTCAT
pARBI-655: PTEN_2_Exogenous_32
TAATACCTGGCTTCCTTAATGATTATATTTAAGACTTATCAAGAGACTATTTTCTATTTAACA
ATTAGAAAGTTAAGCAATACATTATTTTTCTCTGGAATCCAGTGTTTCTTTTAAATACCTGTT
AAGTTTGTATGCAACATTTCTAAAGTTACCTACTTGTTAATTAAAAATTCAAGAGTTTTTTTTT
CTTATTCTGAGGTTATCTTTTTACCACAGTTGCACAATATCCTTTTGAAGACCATAACCCACC
ACAGCTAG AACTTATCAAACCCTTTTGTGAAGATCTTGACCAATGGCTAAGTGAAGATGACA
ATCATGTTGCAGCAATTCACTGTAAAGCTGGAAAGGGACGAACTGGTGTAATGATATGTGC
ATATTTATTACATtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccc t ggaaggtgcca ctccca ctgtcctttcctaata a a atgagga a attgcatcgcattgtctga gtaggtgtca ttctattctg gggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggct ctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTC
CCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGG
GGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGG GTG GGGG AGA
ACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAA
CACAGCTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACG CGCCCGCCGCCCTACCTGAGG
CCG CCATCCACG CCGGTTGAGTCGCGTTCTGCCG CCTCCCGCCTGTGGTGCCTCCTGAACTG
CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCC
TTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTA
CGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCT
AGAGCTAG CGAATTgccgcca ccATG GTGAGCAAG GGCGAGGAGCTGTTCACCGG GGTG GT
GCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGA GCTGAAGGG CATCGACTTCAAGGAG GA
CG GCAACATCCTGG GGCACAAGCTGGAGTACAACTACAACAG CCACAACGTCTATATCATG
GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGAC
GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca atagcatca ca a atttca ca aata a
agcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatCGGGG
CAAATTTTTAAAGG
CACAAGAGGCCCTAGATTTCTATGGGGAAGTAAGGACCAGAGACAAAAAGGTAAGTTATT
TTTTGATGTTTTTCCTTTCCTCTTCCTGGATCTGAGAATTTATTGGAAAACAGATTTTGGGTTT
TTCCTTCAGTTTTATTGAGGTGTAATTGACAAGTAAAAATTATATATAAATACAATG
TATAATATGATGTTTTGATGTATGTGTATATACATTGTGAAATGATTACTACAGTCAAACTAC
TTAACATATTCATCACCTCACATAATTATTATTCTCCCCCCAGGGTGAAAGCATTTAAGATCT
ACAAGCTACAATTTTCAATTATACAATGTTATTATTAACTATAGTCACTATGCTGTCCAGTAG
AG CTTCAGATCTTGTTCATCTTGTGTTCCTCCCTCCCCACCCTCAGTCCCTG GAA
pARBI-656: PTEN_3_Exogenous_33 2541
ATAAATAGTATTATTCCTTTTGCATTGAGAGTCCTGACGAAATGTCCATGTGACAGTTCATTT
TGGGTTTAGCTCTACCTCTAATATGTGACCTATGCTACCAGTCCGTATAGCGTAAATTCCCA
GAATATATCCTCCTGAATAAAATGGGGGAAAATAATACCTGGCTTCCTTAATGATTATATTT
AAGACTTATCAAGAGACTATTTTCTATTTAACAATTAGAAAGTTAAGCAATACATTATTTTTC
TCTGGAATCCAGTGTTTCTTTTAAATACCTGTTAAGTTTGTATGCAACATTTCTAAAGTTACC
TACTTGTTAATTAAAAATTCAAGAGTTTTTTTTTCTTATTCTGAGGTTATCTTTTTACCACAGT
TGCACAATATCCTTTTGAAGACCATAACCCACCACAGCTAGAACTTATCAAACCCTTTTGTGA
AGATCTTGACtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctgg a aggtgccactccca ctgtcctttccta ata a aatgagga a attgcatcgcattgtctgagtaggtgtcattctattctgggg ggtggggtggggca ggacagca a gggggaggattggga aga ca atagcaggcatgctggggatgcggtgggctctat ggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGG G CAGAGCGCACATCGCCCACAGTCCCC
GAGAAGTTGGG G G GAG G GGTCGG CAATTGAACGGGTGCCTAGAGAAGGTGGCGCG G GG
TAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGAGAACC
GTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACAC
AG CTGAAG CTTCGAGG GG CTCGCATCTCTCCTTCACGCGCCCG CCG CCCTACCTGAG GCCG
CCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCG CCTGTGGTGCCTCCTGAACTG CGT
CCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGA
GCTAG CGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC
ATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG GC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAGCTG C
CCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCG AAGGCTACGTCCAGG
AG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAG ACCCGCGCCGAGGTGAAGTTCGA
GGGCGACACCCTGGTGAACCG CATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAA
CATCCTG GGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGAC
AAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACG GCCCCGTG CTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataa tggtta ca a ata a agca atagcatca ca aa tttca caa ata a agcattt ttttca ctgcattcta gttgtggtttgtcca a a ctcatca atgtatcttatCAATG
GCTAAGTGAAGATGACAAT
CATGTTGCAGCAATTCACTGTAAAGCTGGAAAGGGACGAACTGGTGTAATGATATGTGCAT
ATTTATTACATCGGGGCAAATTTTTAAAGGCACAAGAGGCCCTAGATTTCTATGGGGAAGT
AAGGACCAGAGACAAAAAGGTAAGTTATTTTTTGATGTTTTTCCTTTCCTCTTCCTGGATCTG
AGAATTTATTGGAAAACAGATTTTGGGTTTCTTTTTTTCCTTCAGTTTTATTGAG GTGTAATT
GACAAGTAAAAATTATATATAAATACAATGTATAATATGATGTTTTGATGTATGTGTATATA
CATTGTGAAATGATTACTACAGTCAAACTACTTAACATATTCATCACCTCACATAATTATTAT
TCTCCCCCCAGGGTGAAAGCATTTAAGATCTACAAGCTACAATTTTCAATTAT
pARBI-657: PTPN2_1_Exogenous_34 2541
TAGTGACGGTATCAAGAGCAGCATCCCAGGTTGACTTTCTTTGGGGCAGGACCTTGCTTGT
GTCTTAACCCTTGGTTCCTACccaagtttgtctctctagtcctcaactctgaaagccacttgatatcttacacaatt
ccttttttgcctgagaaaataaaagtcagttttgatcatttacaaccaaaaatcctaactaacacaGACCTGTTTTTG
AAATCAGGTAGATTGGAAGTCTTGGTTTACTTCTTATAAGCCCGTCCGCTGTCTGTCTTGTA
ACATCCCAGGAGCAGACTTAACAAGGCCTTCCTGGAGCCTTG
CTCTTTCTGTCTGCTTCCCTC
ATCTGctctctctcctctctctTCACAGGGTCCACTTCCTAACACATGCTGCCATTTCTGGCTTATGG
TTTGGCAGCAG AAGACCAAAGCAGTTGTCATGCTGAACCGCtgacgactgtgccttctagttgccagc
catctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTC
AGTGGGCAGAGCGCACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAAT
TGAACGGGTGCCTAGAGAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTG
GCTCCGCCTTTTTCCCGAGGGTGGGG GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAAC
GTTCTTTTTCGCAACGGGTTTGCCGCCAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTC
TCCTTCACGCGCCCGCCGCCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGC
CGCCTCCCGCCTGTGGTGCCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGG
TCGAGACCGGGCCTTTGTCCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCAC
GCTTTGCCTGACCCTGCTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTAC
AGATCCAAGCTGTGACCGGCGCCTACTCTAGAGCTAGCGAATTgccgcca ccATGGTGAGCAA
GGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTG GACGGCGACGTAAA
CGGCCACAAGTTCAGCGTGTCCG GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGAC
CCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCC
TGACCTACGGCGTG CAGTGCTTCAG CCGCTACCCCGACCACATGAAG CAG CACGACTTCTTC
AAGTCCGCCATGCCCGAAG GCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCA
ACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGC
TGAAGGGCATCGACTTCAAG GAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACT
ACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTT
CAAGATCCGCCACAACATCGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAA
CACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCC
AAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACC
GCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttac aaataaagca atagcatcaca a atttca ca a ata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatc a atgtatcttatATTGTGGAGAAAGAATCGGTGAGTAATATTTAATATTTACAACTTAGTATACTG
TATGTGATCATTAGCATATAAAGATTTTCATTTTGGGGCCAATATTCATTCCTTAGTTTTGGC
CTTTAATTGCTAAAGGCTAGCGGTCAAAGTGTTTCTGTCCGGAAAAACTCATGTATCCTTTTC
CTTTCTTAAGTATTTTTATAACAATTGCAGACCTTTCATCTCCAAGATAGAAATACAATGGAA
TAAATATGGACTAGCAGTGACATATAGATCTTTGGAATCCCTAGGAATCCTTATTGACCTAT
TGTCAATAAGTCCTCTTCCAGCTTTAAGATCTTGTACTGTTCTACTTCAGATTTCTTACTTTAT
TCTACATTTCATAATTGCTGGTTTTTAATGATGATGAAAATTTCCTAATCCACTTCTAATTAGA
TGTGGCAATGACATGCC
pARBI-658: PTPN2_2_Exogenous_35
CAGAATTTGAATATGATGTGGCTGACCATAGATACCTCCATTGCATTAATTCTG

AGTGTTTGATGCTTGCTTGTAGGTGTTGAATAGAGCCTAATTAGCA
GTAAGTAAACAGATCT
agagcacagactttggcatctgaaggaccttggttcaatgctgggactgccacttactagctatgggatctttggtggatt
tttaacctttgaaagcctttcatccctcacagatataatggggaagaaaatcgtttccattagtattgtgaggattaaatg
attaattgatgtaaacaatgtgctttccatagtacttggcatatcataagtgcttcatGTTATTTTACGCTGGCTGG
GAAGATAAGTTTTG CTGTG G AGAATTTAA GAG G GATATTAAAATATATTTTTGTTATTTTAA
GGAAATTCGAAATGAGTCCCATGACtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccc cgtgccttccttgaccctgga aggtgcca ctccca ctgtcctttccta ata a a atga gga a attgcatcgcattgtctgagt aggtgtcattcta ttctggggggtggggtggggcagga cagca agggggaggattggg a aga caatagcaggcatgct ggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCAC
ATCGCCCACAGTCCCCGAGAAGTTGG GGGGAGGGGTCGGCAATTGAACGGGTGCCTAGA
GAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCG A
GGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGG
GTTTGCCGCCAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGC
CGCCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGG
TGCCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGGGCCTTT
GTCCGGCG CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTG
CTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGA
CCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGT
TCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCA
GCGTGTCCGGCGAGGGCGAGGG CGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCT
GCACCACCGGCAAGCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTGACCTACGGCGT
GCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACG ACTTCTTCAAGTCCG CCATG C
CCGAAGGCTACGTCCAG GAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCG
CGCCG AGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAG CTGAAG GGCATCGA
CTTCAAG GAG GACG GCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAA
CGTCTATATCATG GCCGACAAG CAGAAGAACGG CATCAAGGTGAACTTCAAGATCCG CCAC
AACATCGAGGACGGCAGCGTG CAG CTCG CCGACCACTACCAG CAGAACACCCCCATCGGC
GACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAG CACCCAGTCCAAGCTGAG CAAAG
ACCCCAACG AG AAGCG CGATCACATGGTCCTG CTG GAGTTCGTGACCG CCGCCG GGATCAC
TCTCGG CATG GACG AGCTGTACAAatgattgtttattgcagcttata atggtta caa ata a agcaatagcatc aca aatttca caa ata a a gcatttttttca ctgca ttctagttgtggtttgtccaa a ctcatca atgtatcttatTATCCTC
ATAGAGTG GCCAAGTTTCCAGAAAACAGAAATCGAAACAGATACAGAGATGTAAGCCCAT
GTAAGTACTTGTGG GTTTGTGTGCATGTGTATTTTTTGGTTTGTTTTAGGAGGCAGGATGTA
GTGAAAAG G AG G GCTTGGG GTCAGTGTTGATAAATACAGTAATTTTTTTCCTATTTACGTAA
GA CA C GTTCTTTAAG AATTTAAAG G TGTA G AATA GTG GAAG GAAAAATAATTACCCATAAT
TATTTCATCCTTCAAACATAG GCACCATTATTTTG GTGTATTTCCCACAACCTGTTTTTCTTAT
GCA CA GTTTTCCTTTCTTTAAAATAATTGTAATTATA G ATG TCTATATAATATCCATATG TATT
TCCTTTTCATATCATTCCATCCCCTTTGTAAGTTTTGCTTAAAATAGTTATTGGTGTAGTTTGG
TTTTTT
pARBI- ACCAGTTATCGTTTTGTAG GTAGACCAATCAATTCTTAG

659: TTTACTTTTAGGAAG
GTAAGTACTTTATTCATTTAACTATGTTTACTCCTGGTCGATTTTTCAA

_Exogeno AG ATTG CATTTTATACTCCTTACAAAAAAAG TAG AG AATAG TTTAG G
ATTTGTCTCAACTCTA
us_36 TTATGATGCTAATTCATTTTTCTTATTTCTTCTGTCTTTTTATAAATACCTAGAATATTAATATG
AAGATAAAAACAGTAATTTTGAAATGACAGTTGTG GTTTATCATTCTCTATTTTCAGATGATC
ACAGTCGTGTTAAACTGCAAAATGCTGAGAATGATTATATTAATGCCAGTTTAGTTGACATA
GAAGAGGCAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctgga aggtgccactccca ctgtcctttccta ata a aatgagga a attgcatcgcattgtctgagtaggtgtcattctattctgggg ggtggggtggggca ggacagca a gggggaggattggga aga ca atagcaggcatgctggggatgcggtgggctctat ggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGG G CAGAGCGCACATCGCCCACAGTCCCC
GAGAAGTTGGG G GGAG GG GTCG G CAATTGAACGGGTGCCTAGAGAAGGTGGCGCG G GG
TAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGAGAACC
GTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCG CAACGGGTTTG CCGCCAGAACAC
AG CTGAAG CTTCGAGG GG CTCGCATCTCTCCTTCACGCGCCCG CCGCCCTACCTGAG GCCG
CCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCG CCTGTGGTGCCTCCTGAACTG CGT
CCG CCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCG G GCCTTTGTCCG G CGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGG CGCCTACTCTAG A
GCTAG CG AATTgccg ccaccATGGTGAGCAAG GGCGAG GAGCTGTTCACCG G GGTGGTGCCC
ATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG GC
GAGGGCGATGCCACCTACGG CAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAG CTG C
CCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGG CGTGCAGTGCTTCAG CCG CTAC
CCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATGCCCG AAGG CTACGTCCAGG
AG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAG ACCCGCGCCGAG GTGAAGTTCG A
GGGCGACACCCTGGTGAACCG CATCGAGCTGAAGGGCATCGACTTCAAGGAG GACG GCAA
CATCCTG GGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGG CCG AC
AAGCAGAAGAACGG CATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACG GCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACG GCCCCGTG CTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatca ca aa tttca caa ata a agcattt ttttca ctgcattcta gttgtggtttgtcca a a ctcatca atgtatcttatCAAAGG
AGTTACATCTTAACACAG
GCAAGTAATACGATACCACG CAAATGTCTGAAAATGATGTGTTTGTG CTTGTTCTGCTTTAA
ATCATAG G TAAAA G TATATCTT G ACTTCTTTTG G AAATG AAATGGTATTTCAGTTTTTTTCTG
ACG ATGTAATG AATTATG G A CATTATAAG GTTTT G AAG CTTTG A GTATTTAAG ATAAAAGG C
AAG TTATTTTTG ATATTA CA C GTTCTGTG AAG G AAAATTCTTAG G AAATGG CTTAG G CCA GT
TCTTTG G CAGATTGTGTTCCTTACATTATATCTGACACAGAGTG CTGCTTATG CTTCTTAGTG
TCTT CTTTTCTCTTCCCATACC CTG TG G CAAAACCAG A G GCCCTG GACA G CTCTTCTGTAG CT
CCCCCTG CCTCGCATATATCTTCAGAGAGTACACCAAGCCCTG GATG GTGT
pARBI- CCTGCTCAAGG GCCGAGGTGTCCACGGTAGCTTCCTGG

660: GGTGACTTCTCGCTCTCCGTCAGGTAGGTGGG
CCCCCCGCAACCCCGGGCATTTTGGCCACT
PTP N6_1 CTCTTGTGCCATCCAGGCCCTGAACCACTCATTCCTGGTTCCCCGTGGCAGTGCTGACTCCCC
Exogeno GTCTGTTCCCTTGCCCCCAACCCCCACACTCCCCATCCCTGTCTGTGCCCACCCATGCCCATG
us_37 TGTGCCCCCACCCAGGACCTCAGCCGATCCCTGCCCTCCTGCCTCTACTCCTG
CACCGACTG
GCCTCACCGCCTGGTGCCCTGCAGGGTGGGGGATCAGGTGACCCATATTCG GATCCAGAAC
TCAGG GGATTTCTATGACCTGTATGGAGGGGAGAAGTTTGCGACTCTGACAGAGCTGGTG
GAGTACTACACTCAGCAGtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcc ttga ccctgga aggtgcca ctccca ctgtcctttccta a ta aaatgaggaaattgcatcgcattgtctgagtaggtgtcatt ctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcg gtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCA
CAGTCCCCGAGAAGTTGGGGG GAGG GGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGG
CGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG
GGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG
CCAGAACACAGCTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACC
TGAGG CCGCCATCCACG CCG GTTGAGTCG CGTTCTG CCGCCTCCCG CCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCG CCATGCCCGAAGG CT
ACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGG G CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCAT
GGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagcatca ca a atttca c a aa ta a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatCAGGGTGTCCTGCAG
GACCG CGACG G CACCATCATCCACCTCAAGTACCCG CTGAACTG CTCCGATC CCA CTAGTGA
GAGGTGAGGGCTCCGCACCCCCGCCATTCCCAAGCAGGGATGAGCCGGCTCCCACCCTGAA
CAGCCAGGGAGG CAGG GAG ACTGGCAGCCGGCGCTGCCTACCCTCCATCCCCTCCCCTCCC
TGCACCAGCTGGGGCTCTCAATGTCCCTCCTCCCTGCTGTCCTGGGACCTGGTGTCTCAGAG
CCTAACCTACCACCCTTTCCACCTAACCCCGAGGAAGCCACAGAAAGCTGCCTCGCCCTACT
CCG G GAG CCCTGG CCG CTG CAACCCAG GTCCCACTG GAG ACAGG GAGG CCACTGCTGGTG
GCCAG CATGTCGTG CAG GCCAGCTCTGTTGTTAGAAAG CTCTTCTTCCTCTGGAATCGAGCC
TGCCT
pARBI- CTGGGTTCGAAGCCCGGTTAGAACTCTGGAGGCTAGGATGGCTTGAACCTG GGAG

661: GGCTGCAGAGAGCTGTAACCG
CGCCACTGCACTCCAGCCTGGGCAACAGAGCTCTGGAAG
PTP N6_2 CTTGCCCTAGAGTCAGTCAAGGGCCCTAGGCCAGTGAGTAACAGCTCAGCGTCAGTTTCCT
Exogeno CATCTATAAAATG GG G GTAATATCATACCTAG CTCTCAG CATG TTTGTG AG AG AC
CTAAATG
us_38 AG GTG GTG GATTTGGAAG CATGTAGCGCAGTGCCTGGCACACAGTAG
GTGCTTGATTTCCG
GCCCCTCTCTGTGAATGTCTCTGCTCAGCGCCTTCCCCTGTGGCCTGGGTCTTACCTTCCCTG
ACGCTGCCTTCTCTAGGTGGTACCATGGCCACATGTCTGGCGGGCAGGCAGAGACGCTGCT
GCAGGCCAAGGGCGAGCCCTGGtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgt gccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a aa tg agga a attgcatcgcattgtctgagtag gtgtcattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctgg ggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCG CACAT
CGCCCACAGTCCCCGAGAAGTTG GGGG GAGGGGTCGGCAATTGAACGGGTGCCTAGAGA
AG GTGGCGCG GGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCG CCTTTTTCCCGAG
GGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTGGTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACCGGGCCTTTGT
CCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTG CT
TGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACC
GGCGCCTACTCTAGAGCTAG CGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTC
ACCGGGGTGGTG CCCATCCTGGTCGAGCTGGACG GCGACGTAAACG GCCACAAGTTCAGC
GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC
ACCACCGG CAAGCTGCCCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGC
AGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC
GAAGGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG
CCGAG GTGAAGTTCGAGG G CGACACCCTG GTGAACCGCATCGAGCTGAAGG GCATCGACT
TCAAGGAGGACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACG
TCTATATCATG GCCGACAAGCAGAAGAACG GCATCAAG GTGAACTTCAAGATCCG CCACAA
CATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGA
CG GCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGAC
CCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC
TCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcaca aatttca ca a ata aagcatttttttca ctgcattctagttgtggtttgtcca a a ctcatcaatgtatcttatACGTTTCTT
GTGCGTGAGAGCCTCAGCCAGCCTGGAGACTTCGTGCTTTCTGTGCTCAGTGACCAGCCCA
AG GCTGGCCCAGG CTCCCCGCTCAGGGTCACCCACATCAAG GTCATGTGCGAGGTAAGG C
AG CCAGG CGG CG GGG G AG CCTCTG CTGAG GCTCCTGTCTGTGACCACAGTGTG GGTG G CA
GGGAGG GTCTGCCTG GGCTTGAATTCAAGGCTGGGGACCCAGGGAGGGAGACTCAAGTC
CTGTGAATGGCCTAATTTG GCTCCCCCCAGG GTGGACGCTACACAGTGGGTGGTTTGGAGA
CCTTCGACAGCCTCACGGACCTGGTGGAGCATTTCAAGAAGACGGGGATTGAGGAGGCCT
CAGGCGCCTTTGTCTACCTGCG GCAGGTCAGGGGTGGGCCCAGCTGCCTCCCCACTTCCCCT
GAGCTGTCCCCCAGATGT
pARB I- TGTCACATATGTGCAATGCCATGCTCCTGAGCCTTTGATTGCAGACGTGTGG

662: CCCGTCCCCACCCCCAGTG CCACCCTG CTCTG CTTCTCTTCCCTTG CTGTG
CTCTAAAACG AG
PTP N6_3 AAGTACAAGTGAGTTCCCCCAAGGGGTCGGCCGCGCCTCTTCCTGTCCCCGCCCTGCCGGC
Exogeno TGCCCCAGGCCAGTGGAGTGGCAG CCCCAGAACTGGGACCACCGGG GGTGGTGAGGCGG
us_39 CCCGGCACTGG
GAGCTGCATCTGAGGCTTAGTCCCTGAGCTCTCTGCCTGCCCAGACTAGCT
GCACCTCCTCATTCCCTG CGCCCCCTTCCTCTCCGGAAGCCCCCAGGATGGTGAG GTAAGGG
CCTGCCACCCACGGTAGACAGGAGGCAAGG GTGCCTGGTGCCCACGG GACCCCTCCTCACT
GCCCTGCCTGGGCCGCCCAGGtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgc cttccttgaccctgga a ggtgccactcccactgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtg tca ttctattctggggggtggggtggggcagga cagca agggggaggattggga ag a ca atagcaggca tgctgggga tgcggtgggctctatggGGATCTGCGATCG CTCCGGTGCCCGTCAGTGG GCAGAG CGCACATCG C
CCACAGTCCCCGAGAAGTTG G GGG GAGGG GTCG GCAATTGAACGG GTGCCTAGAGAAGG
TG GCG CGGG GTAAACTGG GAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACG GGTTTGC
CG CCAGAACACAG CTGAAGCTTCGAGG GGCTCG CATCTCTCCTTCACG CG CCCG CCGCCCT
ACCTGAGG CCGCCATCCACG CCG GTTGAGTCGCGTTCTGCCG CCTCCCG CCTGTG GTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGG GCCTTTGTCCG
GCGCTCCCTTGG AG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATCCAAG CTG TG A CCG GC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAG CAAGGGCGAGGAGCTGTTCACC
GG GGTG GTG CCCATCCTG GTCGAG CTGGACGGCGACGTAAACG G CCACAAGTTCAGCGTG
TCCGGCGAGG GCGAG GGCGATG CCACCTACG GCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTG CCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAG CCGCTACCCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATG CCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG CCG
AG GTGAAGTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAG GGCATCGACTTCA
AG GAG GACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAGCAG AAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttca ctgcattctagttgtggtttgtcca aa ctcatcaatgtatcttatTGGTTTCACCGA
GACCTCAGTGG GCTGGATGCAGAG ACCCTGCTCAAGGGCCGAGGTGTCCACGGTAGCTTC
CTGG CTCGGCCCAGTCG CAA GAACCAG GGTGACTTCTCG CTCTCCGTCAG GTAG GTG GGCC
CCCCGCAACCCCGGGCATTTTGGCCACTCTCTTGTGCCATCCAGGCCCTGAACCACTCATTCC
TGGTTCCCCGTGGCAGTGCTGACTCCCCGTCTGTTCCCTTGCCCCCAACCCCCACACTCCCCA
TCCCTGTCTGTGCCCACCCATGCCCATGTGTGCCCCCACCCAGGACCTCAGCCG ATCCCTGC
CCTCCTGCCTCTACTCCTGCACCGACTGGCCTCACCGCCTGGTGCCCTGCAGGGTGGGGGAT
CAGGTGACCCATATTCGGATCCAGAACTCAGGGGATTTCTATGACCTGTATGGAGGGGAGA
AG TTTG
pARB I- TCTTTAG CTAATGTTCTG TAG

663:
GCATCAATATTTGTATATATGCATGCATATATGTGTATGTATATTTATGAATACATATACACA

Exogenou CTATAGTGGACTGG GGAGTTAGTATACTG GGAG GAG CATACATTTAGG GTATGATTCACAT
s40 ATTTATTTTGTCCTTCTCCCATTTTCCATTAATTAACAGGATTGACTACAGCAAAGATGCCCA
GTGTTCCACTTTCAAGTGACCCCTTACCTACTCACACCACTGCATTCTCACCCGCAAGCACCT
TTGAAAGA GAAAATGACTTCTCAGAG ACCACAACTTCTCTTAGTCCAGACAATACTTCCA CC
CAAGTATCCCCGtga cgactgtgccttctagttgccagcca tctgttgtttg cc cctc cc ccgtgc cttccttga ccctg gaaggtgcca ctcccactgtcctttccta ataa a atgaggaa attgcatcgcattgtctgagtaggtgtcattctattctgg ggggtggggtggggcaggacagcaagggggaggattgggaaga ca atagcaggcatgctggggatgcggtgggctct atggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCC
CGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGGG
GTAAACTG GGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCG AGGGTGG GGGAGAA
CCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAAC
ACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAGGC
CGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTG GTGCCTCCTGAACTGC
GTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTT
G GAG CCTACCTAGACTCAG CCG G CTCTCCACG CTTTG CCTGACCCTGCTTGCTCAACTCTAC
GTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTA
GAGCTAGCGAATTgccgcca ccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGC
CCATCCTGGTCGAGCTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGAGG
GCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCT
GCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCT
ACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCA
GGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTC
GAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGG AGGACGGC
AACATCCTGGG GCACAAG CTG GAG TACAACTACAACAG CCACAACGTCTATATCATG G CCG
ACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG GACGG CA
GCGTG CAGCTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACGG CCCCGTGCTGCT
GCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCG
CGATCACATGGTCCTGCTGGAGTTCGTGACCGCCG CCGGGATCACTCTCGGCATGGACGAG
CTGTACAAatgattgtttattgcagcttataatggttacaa ata a agca atagc atca ca a atttca c a a ata aagca tttttttca ctgca ttctagttgtggtttgtcca a actcatca atgtatcttatGACTCTTTGGATAATGCTAGTGC
TTTTAATACCACAGGTTGGCACACAAAAGTTGTTAACTTAAATATCAGGGAATGTCATTTAG
AAAATTCTACAGTTATCAGTACAACTTGTCTTTAAATTATTTGCACAGTTTCTAAGTATGTGA
TTTTATTCAAGTGCAGAAATTGCAG GAAATTAGTATCTGTGAAATATAGATCGACTGAAGTA
ATTAATGCATGTTGTTAGG GAGTGGAGAGAGAAAAGAAGGGAAGCAAGATCTCCAAGGAC
AATCAGGAGGGG AAATTTGTTCTAGTATCCTCTGATCTATACACACTCGCCATGATTCTCCC
GCTTGG CTTCCCG CCACCTGGACAGATGAGAATTTCCCTAGTTCAGAGATTATCAGTTCTAC
TCTCCTTGGAAGGTGTCTTAAATGGGAGTCTTCCCATTTCTTTGTTTCACTCTAG A
pARBI- AG G GAATTTTTATATGTTG

664: ATAG GTG CTCAATTAATCATCATTATCTAAATAAATAATG CATTTGG G
AAAAAAAAGTTTCA
PTP RC_2_ AAAGTTTTTCAAAAGTCTTTTGCAGGCTTGAAATTAATCCCAATAGTGATCCTTTAGTCTGTT
Exogenou ATGTTTCTGATTTAGCCTGGGGATTCAAAAAATAAATAACATAATTTTGATATATTTGGGCTT
s_41 TGTAAACATG GTACTAAGAG AG GAAATATAGTTTCATTAG G GTAAAAG
CTACTG AAAATTG
CCACTTG GTGAATGTTCTATCATAGACTTGAGGTACATATAAAAATCTAATATATGTTTACAT
TAATATGAATGAAATTTGAAATTTTCTAAGAGATTTTTGTTTCTTCTTTGCAGGGCAAAGCCC
AACACCTTCCCCCtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccct ggaaggtgcca ctccca ctgtcctttcctaata a a atgagga a attgcatcgcattgtctga gtaggtgtca ttctattctg gggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggct ctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTC
CCCGAGAAGTTGG GG GGAGGGGTCGGCAATTGAACGG GTGCCTAGAGAAGGTGGCGCGG
GGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGG GTG GGGG AGA
ACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAA
CACAGCTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACG CGCCCGCCGCCCTACCTGAGG
CCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTG
CGTCCG CCGTCTAG GTAAGTTTAAAG CTCAGGTCGAGACCG GG CCTTTGTCCGGCG CTCCC
TTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTA
CGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCT
AGAGCTAGCGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTCACCGG GGTG GT
GCCCATCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAGGG CATCGACTTCAAGGAG GA
CG GCAACATCCTGG GGCACAAGCTGGAGTACAACTACAACAG CCACAACGTCTATATCATG
GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGAC
GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tACTG
GTAAGAATTAATATTT
ATATTTTTACTAATTTTATTTTCTTGTTGCAAAGTTTATATATTTAACTACAATTTTCTATTATT
AACACTGAAATTATTTTTAAGGATAAATTTTATAATCATGAGTGATTCTTGACATTCACTTGT
TCTTAAACTTTCTGCTTATACGTTATAGAGTTTAATAACTACCTAAACATGTTATTAAATTTGT
ATATATATTTTGTGTATAAATAGTAACTTTTCCCAAACTTGACAGTAAATCACACAACAG GTT
TCTACTCTCTTTTAATATTTTAAGACTATAAAAAAATGCATTTAAATTAGATAACAAAATTTTA
TAGTCTGAAAG CAG GTTAACAG CTGTCTATGTATGTTATAGATATG TAG ATAACAG ATTTG C
ATATGTCTATATTTCTTTAAGAGTATGTTG CTTTTTTCAATG G TATG CA
pARBI- CCTTTAGCACCATAAAGAAACTAAATTATTTAGATGTTTTTATGAGAACATATCAAAAAGTA 2541 665: CTTTT CTGTCATCCAATACTTCCACAAATAAATCATTAGTTCTTG
CTAATCTTCATCTGG CATA
PTP RC_3_ AAAATAATGACATCAACTTTCTTCATGTAATTTCCCACTTAATTCCTTTACTAGGAGCAATAT
Exogenou CAATTCCTATATGACGTCATTGCCAGCACCTACCCTGCTCAGAATGGACAAGTAAAGAAAAA
s_42 CAACCATCAAGAAG ATAAAATTG AATTTGATAATGAAGTG G ACAAAGTAAAG CA
G GATG CT
AATTGTGTTAATCCACTTGGTGCCCCAGAAAAGCTCCCTGAAGCAAAGGAACAGGCTGAAG
GTTCTGAACCCACGAGTGGCACTGAGGGGCCAGAACATTCTGTCAATGGTCCTGCAAGTCC
AGCTTTAAATCAAGGTtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttg accctggaaggtgcca ctccca ctgtcctttccta ataa a atgagg aa attgcatcgcattgtctgagtaggtgtcattct attctggggggtggggtggggcaggacagca agggggaggattgggaagacaatagcaggcatgctggggatgcggt gggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCAC
AGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGG GTGCCTAGAGAAG GTG GC
GCGGGGTAAACTGGG AAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGG
GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGC
CAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCT
GAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GCGAGG GCGAG GG CGATGCCACCTACG G CAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CT
ACGTCCAG GAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGGG CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCAT
GGACGAGCTGTACAAatgattgtttattgcagctta ta atggttacaa ata a agcaatagc atca ca a atttca c a aa ta a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatTCATAG
GAAAAGACA
TAAATGAG GAAACTCCAAACCTCCTGTTAGCTGTTATTTCTATTTTTGTAGAAGTAGGAAGT
GAAAATAGGTATACAGTGGATTAATTAAATGCAGCGAACCAATATTTGTAGAAGGGTTATA
TTTTACTACTGTGGAAAAATATTTAAGATAGTTTTGCCAGAACAGTTTGTACAGACGTATGC
TTATTTTAAAATTTTATCTCTTATTCAGTAAAAAACAACTTCTTTGTAATCGTTATGTGTGTAT
ATGTATGTGTGTATGGGTGTGTGTTTGTGTGAGAGACAGAGAAAGAGAGAGAATTCTTTCA
AGTGAATCTAAAAGCTTTTGCTTTTCCTTTGTTTTTATGAAGAAAAAATACATTTTATATTAG
AAGTGTTAACTTAG CTTGAAG GATCTGTTTTTAAAAATCATA AACTGTGTG CAG ACTCAATA
pARB I- aa gga a a a aggca ctgagtgctggggggtgctggggtgggctgcagtg ataga catcagggtaga ggtta a ggtca g .. 2541 666: gttcagcctca ctggggtgaagtttgagcacggtgagcaggccatgcagcccgggggaggggaggatgggaggaggt PTP RCAP
ggagctttccgggcagagggaacagccagtgcgaaggccccaggcaggtggcttaatgcagctgttgggggaggtgag _1_Exoge tggtagggaggaggctggagggatgggggctgatctcacagggccaga gcctggttga cca a ata aggc cttggccttt nous_43 tctGCTTGGCTGTCCCAAGAGGATCCCAAAGAGAAAAAAACGAAAGTGGTCTTGGTCACCCA
GCCTGCCCCACACCAGGCCCCACCCCAGGTGCTGAGCCCTCTGAGCCCCTGCCTGTCTCCCA
CAGGCTCTGCCCTGCtga cgactgtgccttctagttgccagcca tctgttgtttg cc cctcccccgtgc cttccttga c cctggaaggtgccactcccactgtcctttccta ataa a atgaggaa a ttg ca tcgc attgtctgagtaggtgtcattctatt ctggggggtggggtggggcaggacagcaagggggaggattgggaaga caatagcaggcatgctggggatgcggtgg gctctatggGGATCTGCGATCGCTCCG GTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAG
TCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGC
GGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGA
GAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG CCA
GAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTG
AG GCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAA
CTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCG GGCCTTTGTCCGGCG CT
CCCTTG GAG CCTACCTAGACTCAG CCG G CTCTCCACG CTTTG CCTGACCCTG CTTG CTCAACT
CTACGTCTTTGTTTCGTTTTCTG TTCTG CGCCGTTACAGATCCAAG CTGTGACCGGCGCCTAC
TCTAGAG CTAG CGAATTgccgccaccATGGTGAG CAAGG GCGAGGAGCTGTTCACCG GG GTG
GTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGC
GAGGGCGAGGGCG ATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGC
AAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCA
GCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTA
CGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTG
AAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAG
GACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCA
TG GCCGACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG G
ACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACG GCCCCGT
GCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAG
AAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGG
ACGAGCTGTACAAatgattgtttattgca gcttata a tggtta ca aata a agca atagca tca ca a atttca ca a at aa agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatcttatACCTTAGGG
CTCGG GAT
GCTGCTGG CCCTGCCAG GGGCCTTG GGCTCGG GTGG CAGCGCG GAGGACAGCGTGGG CT
CCAGCTCTGTCACCGTTGTCctgctgctgctgctgctcctactgctgGCCACTGGCCTAGCACTGGCCT
GGCGCCGCCTCAGCCGTGACTCAGGGG GCTACTACCACCCGGCCCGCCTAGGTGCCG CGCT
GTGGGG CCGCACGCG GCG CCTGCTCTG GGCCAG CCCCCCAGGTCGCTGGCTGCAG GCCCG
AG CTG AG CTGG G GTCCACAGACAATGACCTTGAG CG ACAG GAG GATGAG CAGG ACACAG
ACTATGACCACGTCGCGGATGGTGGCCTGCAGG CTGACCCTGGGGAAGG CGAGCAGCAAT
GTGGAGAGGCGTCCAGCCCAGAGCAGGTCCCCGTGCGGGCTGAGGAAGCCAGAGACAGT
GACACG
pARB I- aaggtcaggttcagcctcactggggtga agtttgagcacggtgagcaggccatgcagcccgggggaggggaggatggg 2541 667: aggaggtgga gctttccggg cagaggga a cagccagtgcga a ggccccaggcaggtggcttaatgcagctgttggggg PTP RCAP aggtgagtggtagggaggaggctgg agggatggggg ctga tctca cagggccagagcctggttga cca a ataaggcct _2_Exoge tggccttttctGCTTGGCTGTCCCAAGAGGATCCCAAAGAGAAAAAAACGAAAGTGGTCTTG GT
nous_44 CACCCAGCCTGCCCCACACCAGGCCCCACCCCAGGTGCTGAGCCCTCTGAGCCCCTGCCTGT
CTCCCACAGGCTCTGCCCTGCACCTTAGGGCTCGGGATGCTGCTGGCCCTGCCAGGGGCCT
TGGGCTCG GGTGGCAGCGCGGAGGACAGCtga cga ctgtgccttctagttgccagccatctgttgtttgcc cctcccccgtg ccttccttga ccctgga aggtgccactc cca ctgtc ctttccta ata a aatgagga a attgcatcgcattg tctga gtaggtgtcattctattctggggggtggggtggggcagg acagca agggggaggattggga aga ca atagca g gcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGG GCAGAG
CGCACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCC
TAGAGAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTC
CCGAGGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAA
CG GGTTTGCCGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCG CC
CGCCGCCCTACCTGAGGCCG CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTG
TGGTGCCTCCTGAACTGCGTCCGCCGTCTAG GTAAGTTTAAAG CTCAGGTCGAGACCGGGC
CTTTGTCCGG CGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGAC
CCTG CTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATC CAAG CT
GTGACCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAG
CTGTTCACCG GGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAG
TTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTC
ATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGG
CGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCA
TGCCCGAAGGCTACGTCCAG GAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGAC
CCGCG CCGAGGTGAAGTTCGAGGGCGACACCCTG GTGAACCG CATCGAG CTGAAGG GCAT
CGACTTCAAGGAGGACGG CAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCA
CAACGTCTATATCATG GCCGACAAGCAGAAGAACGG CATCAAGGTGAACTTCAAGATCCGC
CACAACATCGAGGACGGCAGCGTG CAGCTCGCCGACCACTACCAG CAGAACACCCCCATCG
GCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAGCAA
AGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGAT
CACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttata atggtta caa ata a agcaatag catca ca aatttca caa ata a agcatttttttca ctgca ttctagttgtggtttgtccaa a ctcatca atgtatcttatGTG
GGCTCCAG CTCTGTCACCGTTGTCctgctgctgctgctgctcctactgctgGCCACTGGCCTAG CACTG
GCCTGGCG CCGCCTCAGCCGTGACTCAGGGGGCTACTACCACCCGGCCCGCCTAGGTGCCG
CGCTGTGGGGCCGCACGCGGCGCCTGCTCTGGG CCAGCCCCCCAGGTCGCTGGCTGCAGG
CCCGAGCTGAG CTGGGGTCCACAGACAATGACCTTGAGCGACAGGAGGATGAGCAG GACA
CAGACTATGACCACGTCGCG GATGGTGGCCTGCAGG CTGACCCTGG GGAAGG CGAGCAG C
AATGTGGAGAGGCGTCCAGCCCAGAGCAGGTCCCCGTGCGGGCTGAGGAAGCCAGAGAC
AGTGACACGGAGGGCGACCTGGTCCTCGGCTCCCCAGGACCAGCGAGCG CAGGGGGCAG
TGCTGAGGCCCTGCTGAGT
pARB I-cagtgcgaaggccccaggcaggtggcttaatgcagctgttgggggaggtgagtggtagggaggaggctggagggatg .. 2541 668: ggggctgatctca caggg ccagagcctggttga cca a ata a ggccttggccttttctGCTTGG
CTGTCCCAAGAG
PTP RCAP GATCCCAAAGAGAAAAAAACGAAAGTGGTCTTGGTCACCCAG CCTGCCCCACACCAGGCCC
_3_Exoge CACCCCAGGTGCTGAGCCCTCTGAGCCCCTGCCTGTCTCCCACAGG CTCTGCCCTGCACCTT
nous_45 AG GGCTCGGGATGCTGCTGGCCCTGCCAGG GGCCTTGGGCTCGGGTGGCAGCGCGGAGG
ACAGCGTGGGCTCCAGCTCTGTCACCGTTGTCctgctgctgctgctgctccta ctgctgGCCACTGGCC
TAGCACTGGCCTGGCGCCGCCTCAGCCGTGACTCAGGGGGCTACTACtgacgactgtgccttctag ttgccagccatctgttgtttgcccctcccccgtgccttccttga ccctggaa ggtgcca ctccca ctgtcctttccta ataa a atgagga a attgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcagga cagca a ggggga ggattgggaaga caatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGC
CCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCCGAGAAGTTGGGG GGAGGGGTCG
GCAATTGAACG GGTGCCTAGAGAAGGTG GCG CGGG GTAAACTGG GAAAGTGATGTCGTG
TACTGGCTCCGCCTTTTTCCCGAGGGTGGG GGAGAACCGTATATAAGTGCAGTAGTCGCCG
TGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACACAGCTGAAGCTTCGAGGGGCTCG
CATCTCTCCTTCACG CGCCCGCCGCCCTACCTGAGGCCGCCATCCACGCCG GTTGAGTCGCG
TTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG
CTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTTGGAG CCTACCTAGACTCAGCCG GC
TCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCG
CCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGG
TGAGCAAGGGCGAGGAGCTGTTCACCGG GGTGGTG CCCATCCTGGTCGAGCTGGACGGCG
ACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCA
AG CTG ACCCTGAAGTTCATCTG CACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTG
ACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACG
ACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGAC
GACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGG GCGACACCCTGGTGAACCGC
ATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGG CAACATCCTGGGGCACAAG CTG GAG
TACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGG
TGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCA
GCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACC
CAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTG GAGTTC
GTGACCGCCGCCGGGATCACTCTCGGCATGGACGAG CTGTACAAatgattgtttattgcagcttata atggtta ca aata a agca atagcatcaca a atttca ca a ata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatcaatgtatcttatCACCCGGCCCGCCTAGGTGCCGCGCTGTGGGGCCGCACGCGGCGCCT
GCTCTGGGCCAGCCCCCCAG GTCGCTGGCTGCAGGCCCGAGCTGAGCTGGGGTCCACAGA
CAATGACCTTGAGCGACAGGAGGATGAGCAGGACACAGACTATGACCACGTCGCGGATGG
TGGCCTGCAGGCTGACCCTGGGGAAGGCGAGCAGCAATGTGGAGAGGCGTCCAGCCCAG
AG CAGGTCCCCGTG CGG G CTGAGGAAGCCAGAGACAGTGACACGGAG GGCGACCTGGTC
CTCGGCTCCCCAGGACCAGCGAGCGCAGGGGGCAGTGCTGAGGCCCTGCTGAGTGACCTG
CACGCCTTTGCTGGCAGCGCAGCCTG GGATGACAGCGCCAGGGCAGCTGGGGGCCAGGG
CCTCCATGTCACCGCACTGTAGAGGCCGGTCTTGGTGTCCCATCCC
pARBI- ATAGAGTAGGGCGGGGGATGCCATG GAGAGGCTCCATGGGGGAGGGCCGGGGAAGCGC

669: CGCTCCAGGAG GCACGTGGTCCGGCGCGGAAGGGGCCCATGAGG CGTGGAG
GCCGCCGA
R PS23_1_ GGTCG GGGTACCGAG GGACGCAG GGAG GCCAG CGCTTCCTCCCG GGCATTCGAGCG GGG
Exogenou CCTCGTCCTTCGGGAGAACACATTCTCCGGAGCCCTCTTCGAACGTTTATTAGTCGGTTCAG
s_46 GG
CAACTTGAAGGCCAAATGTTTGGCCCACAGGCCAATAAATAGTACGAGAGCCAATCGG
CTTAAGG GTTTATTCCAGGTGAGGCGAGTGTCTTAGAAGATGGGAAACACGTAG ATGGCG
TGTTTTTACG GAAGAACTAAAATATTTAATTTTTAG G CAAGTGTCGTG GACTTCGTACTG CT
AGGAAGCTCCGTAGTCACCGACGAGACCAGtgacgactgtgccttctagttgccagccatctgttgtttgcc cctcccccgtg ccttccttga ccctgga aggtgccactccca ctgtcctttccta ata a aatgagga a attgcatcgcattg tctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcag gcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAG
CGCACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCC
TAGAGAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTC
CCGAGGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAA
CG GGTTTGCCGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCG CC
CGCCGCCCTACCTGAGGCCG CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTG
TGGTGCCTCCTGAACTGCGTCCGCCGTCTAG GTAAGTTTAAAG CTCAGGTCGAGACCGGGC
CTTTGTCCGG CGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGAC
CCTG CTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATC CAAG CT
GTGACCGGCGCCTACTCTAGAGCTAG CGAATTgccgccaccATGGTGAGCAAGGGCGAGGAG
CTGTTCACCG GGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAG
TTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTC
ATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGG
CGTG CAGTGCTTCAGCCGCTACCCCGACCACATGAAG CAGCACGACTTCTTCAAGTCCGCCA
TGCCCGAAGGCTACGTCCAG GAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGAC
CCGCG CCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCAT
CGACTTCAAGGAGGACGG CAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCA
CAACGTCTATATCATG GCCGACAAGCAGAAGAACGG CATCAAGGTGAACTTCAAGATCCGC
CACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG
GCGACGGCCCCGTGCTGCTG CCCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAGCAA
AGACCCCAACGAGAAGCG CGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGAT
CACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttata atggtta caa ata a agcaatag catca ca aatttca caa ata a agcatttttttca ctgca ttctagttgtggtttgtcca a a ctc atca atgtatcttatAAG
TGGCATGATAAACAGTATAAGAAAGCTCATTTGGGCACAGCCCTAAAGGCCAACCCTTTTG
GAG GTG CTTCTCATG CAAAAGGAATCGTG CTG GAAAAAGTGTAAGTCCATTGCTCCCGTCA
AGTTTTAGTTTATTATAG GAATTCGAGACATGAACTTACGAATTCTTGTTTTGAAAGTAATTG
CAGGTTTTTGTGTAGTAGTATTCATTTGGGCATTGTGGGGTAAAATTGCAAAGCGTTTGTTC
TATTTAAAAGTTGGTAAAATTAGTTTTTGGGAATTAGGTAGTTAAGGTTTTAATTTAACGTT
GG CCTGGAAGGAATTGGAGAAGATACTAGCAATGATGAAGTAAAGGACACAAACACCTTT
ACTGTGGGAGTTGTTATAAGTAAATGGCACGTGTCAGCTATTGAACTTTATCGACTTGATAA
AACTAAGGTGAAGAGA
pA R B I- CAAGCGGGACTTGGG GTCTTGGGGACGGGCGGGCGGATGCGAATAGAGTAG

670: GATGCCATGGAGAGG CTCCATGGG G GAG G GCCGGG GAAGCG
CCGCTCCAGGAGGCACGT
R PS23_2_ GGTCCGGCGCGGAAG GGGCCCATGAGGCGTGGAGGCCGCCGAGGTCGGGGTACCGAGG
Exogenou GACGCAGGGAGGCCAGCGCTTCCTCCCGGGCATTCG AGCG GGG CCTCGTCCTTCGG GAGA
s_47_48 ACACATTCTCCGGAGCCCTCTTCGAACGTTTATTAGTCGGTTCAGGGCAACTTGAAGGCCAA
ATGTTTGGCCCACAGGCCAATAAATAGTACGAGAGCCAATCGGCTTAAGGGTTTATTCCAG
GTGAGGCGAGTGTCTTAGAAGATG GGAAACACGTAGATGGCGTGTTTTTACGGAAGAACT
AAAATATTTAATTTTTAG G CAAGTG CCG CG GAtga cg a ctgtgcctt cta gttgc ca gc cat ctgttgtttg cccctcccccgtg ccttccttga ccctgga a ggtgcca ctc cca ctgtc ctttccta ata a aatgagga a attgcatcgcat tgtctga gtaggtgtca ttctattctggggggtggggtggggcaggac agca agggggaggattggga ag a ca atagc aggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGA
GCGCACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGG GTCGGCAATTGAACGGGTG
CCTAGAGAAGGTGGCGCGGGGTAAACTG GGAAAGTGATGTCGTGTACTGGCTCCGCCTTT
TTCCCGAGGGTGGG GGAGAACCGTATATAAGTG CAGTAGTCGCCGTGAACGTTCTTTTTCG
CAACGGGTTTG CCGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGC
GCCCGCCGCCCTACCTGAGG CCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGC
CTGTGGTGCCTCCTGAACTGCGTCCGCCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCG
GGCCTTTGTCCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCT
GACCCTGCTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAA
GCTGTGACCG GCGCCTACTCTAGAGCTAG CGAATTgc cgcca ccATGGTGAGCAAG GG CGAG
GAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCAC
AAGTTCAG CGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAG
TTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTA
CG GCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCC
GCCATGCCCGAAGGCTACGTCCAG GAG CGCACCATCTTCTTCAAGGACGACGGCAACTACA
AGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGG
GCATCGACTTCAAG GAG GACGGCAACATCCTGG GGCACAAGCTGGAGTACAACTACAACA
GCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGAT
CCGCCACAACATCGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCC
ATCGG CGACGG CCCCGTGCTGCTGCCCGACAACCACTACCTGAG CACCCAGTCCAAGCTGA
GCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCG
GGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agc
aatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtat ctta tCTTCGTACTGCTAGGAAGCTCCGTAGTCACCGACGAGACCAG AAGTGGCATGATAAACAG
TATAAGAAAG CTCATTTG G G CACAG CCCTAAAG G CCAACCCTTTTG GAG GTGCTTCTCATG C
AAAAG GAATCGTGCTGGAAAAAGTGTAAGTCCATTG CTCCCGTCAAGTTTTAGTTTATTATA
GGAATTCGAGACATGAACTTACGAATTCTTGTTTTGAAAGTAATTGCAGGTTTTTGTGTAGT
AGTATTCATTTGGGCATTGTGG GGTAAAATTGCAAAGCGTTTGTTCTATTTAAAAGTTGGTA
AAATTAGTTTTTGGGAATTAGGTAGTTAAGGTTTTAATTTAACGTTGGCCTGGAAGGAATTG
GAGAAGATACTAGCAATGATGAAGTAAAGGACACAAACACCTTTACTGTGGGAGTTGTTAT
AAGTAAATGGCACGTGTCA
pARBI- TTGTATATCCTTTTTAAAGTTATTCTTTTTAGTTAATTGCTCTATGTGTATTGAGACTAGGAAT 2541 671:
CAGAAAGCTTAGATTCTAGTCCCAGGTGTAAGTTGTGTAACCCTTGGCAAGTGTCAATCTCT
RTRAF_1_ AGGCCTCAGCTTTCTCATCTATAAAATGAGGAAGTTGTCGTATTCTATTTTTTTTCTTAAGAT
Exogenou GATACACTTAAATGTTCCCTTCTGTTGGGTTATATAATTGCATCAAAAGTGTAGTAATGTTAT
s_49 TAAAAAATTGTTAGAGATCCAAACTAAGGTCTCTTTCAACTCTCCCATTCTTTTTTCTGTGACT
TTATGGTAATAATGAAACTGGTGGTTTTCTTTTCTTCCCCCTCACAGTATCTCAGAGATGTTA
ACTGTCCTTTCAAGATTCAAGATCGACAAGAAGCTATTGACTG GCTTCTTG GTTTAGCTGTT
AGACTTGAAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctgga a ggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctgg ggg gtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatg gGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCCG
AGAAGTTG GGGG GAG G G GTCGG CAATTGAACG GGTGCCTAGAGAAGGTGGCG CGG G GT
AAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACC
GTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACAC
AG CTGAAG CTTCGAGG GG CTCGCATCTCTCCTTCACGCGCCCG CCGCCCTACCTGAG GCCG
CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTGCGT
CCG CCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCG G GCCTTTGTCCG G CGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGA
GCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC
ATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGG GC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTG AAGTTCATCTGCACCACCGGCAAGCTG C
CCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCG AAGGCTACGTCCAGG
AG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAG ACCCGCGCCGAGGTGAAGTTCGA
GGGCGACACCCTGGTGAACCG CATCGAGCTGAAGGGCATCGACTTCAAGGAG GACGGCAA
CATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGAC
AAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGC
GTGCA GCTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACG GCCCCGTG CTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcattt ttttca ctgcattcta gttgtggtttgtcca a a ctcatca atgtatcttatTATGG AGATAATG
GTACGTTTTGTG
GGGAATGTGTATTTTAAAGAGAGAGGAAAGATGGGAAAGGG AGTGTGAAAATGTAGG GA
ACTTTGCAGTTTGTTTTGTCTAGTACTATTTTACCTTTGGTTTATTCTTATCACAAGTTAAAAG
CACTTTTATTGTCTTTCATTGGTGTTTATATATTTCTGTTAGAATTTG GAAATGGTGCCCTCTG
GAGAAGGCTAATTGACTGTCTTCTCACAGAGTAACACTACTTTGATAATATGGTCTGCACCT
TAG CCTTTCAAATTAAATTGTTTTTAGTGTCCCAG AATTGATGG GACTTTGAAGTGTTGTTG C
AGTAGGTAATTTCTCAAAAGACTGAAACATGTCTAATGCCAATATACATTAATACTCTATAG
GCCAGAATATATTACTTATGTTTACTGTCTTAGCACAGATACTTCTATGGT
pARBI- GCACCTGTTATGTAGGAGGAGTAATAAAATGAATGAATGCACTCAAAACACTAAACAGTAA 2541 672: TTCTGTAATCCAACGG GAG GTACAG CGAATACCAAAAG
CCTACATATACTATTCTCTG CATG

Exogenou AAGCTGACAATAAAGTTTATTTGAGCGTCGACGTGCGCCGACGTGGCCCCGCCTCCCCAGC
s_50 CGGAGCCGCGATTGGTGG
GCATTTGCCGGCGGCCACCGCTTTTAAGCCACGATTGGCGAA
GGCCGCCGTCATTTCGGAGCGACTCAGCGCCTGCCCGCCCTCTCGCCGCGTCGCCGGTG CC
TGCGCCTCCCGCTCCACCTCGCTTCTTCTCTCCCGGCCGAGGCCCGGGGGACCAGAGCGAG
AAGCGGGGACCATGTTCCGACGCtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgt gccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a aa tg agga a attgcatcgcattgtctgagtag gtgtcattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctgg ggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACAT
CG CCCACAGTCCCCGAGAAGTTG GGGG GAGG GGTCGGCAATTGAACGGGTGCCTAGAGA
AG GTGGCGCG GGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCG CCTTTTTCCCGAG
GGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTGGTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACCGGGCCTTTGT
CCG GCGCTCCCTTGGAG CCTACCTAGACTCAG CCGGCTCTCCACGCTTTGCCTGACCCTG CT
TGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACC
GG CGCCTACTCTAGAGCTAG CGAATTgccgcca ccATG GTGAGCAAG GGCGAGGAGCTGTTC
ACCGGGGTGGTG CCCATCCTGGTCGAGCTGGACG GCGACGTAAACG GCCACAAGTTCAGC
GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC
ACCACCGG CAAGCTGCCCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGC
AGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC
GAAGGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG
CCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACT
TCAAGGAGGACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACG
TCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAA
CATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGA
CG GCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGAC
CCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC
TCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcaca aatttca ca a ata aagcatttttttca ctgcattctagttgtggtttgtcca a a ctcatcaatgtatcttatAAGTTGACG
GCTCTCGACTACCACAACCCCG CCGGCTTCAACTGCAAAGGTGAGGCGGCGGCCTCAGCCC
GGCCGCGTGTCCCTGACCTGGGCGGAGGTCCCAGCCTCAGTG CCCGCACCCCACCTCCCCG
TCGGGACCCTCGGCGGCCTGGTTTCCGCCGGCAGCCTCCGGGCCCCTCTCCTCTGGGTCGC
CACGTACCTCGGCTCTTCG CCG CCCCTTCCCGCCTTTAAAGCCCTCTCACCTACTCCTGTCTC
GGCATGTTACTTTCTG CACTTGCTTAACTCCAAGCATCACGTAACTACCTTCTCTGTACATAA
AAGGGAGAGCATTCGTCTTTCTCACTCACTATTCAACTCCATGGTTCCCTGGGTAATTAGGC
GATACCTTGAG CACCTGCTAATTATGGG CCAGCGCGGTGCTGGATTCTGAGGAAGGTGCTG
AGTAACTTG
pARBI- CACTAAACAGTAATTCTGTAATCCAACG G GAG GTACAG CGAATACCA AAAG

673: CTATTCTCTGCATGCTATAGCAAGAAAGAAAGGAAAGTGGCTTCCCGGTGGTTTTCTG
CCTA
RTRAF_3_ TTGTACAACCAGGAAGCTGACAATAAAGTTTATTTGAGCGTCGACGTGCGCCGACGTGGCC
Exogenou CCGCCTCCCCAGCCGGAGCCGCGATTGGTGGGCATTTGCCGGCGGCCACCG CTTTTAAG CC
s_51 ACGATTGGCGAAGGCCGCCGTCATTTCGGAGCGACTCAG
CGCCTGCCCGCCCTCTCGCCGC
GTCGCCGGTGCCTG CGCCTCCCGCTCCACCTCGCTTCTTCTCTCCCGGCCGAGG CCCGGGGG
ACCAGAGCGAGAAGCGGGGACCATGTTCCGACGCAAGTTGACGGCTCTCGACTACCACAA
CCCCGCCGGCTTCAACTGCAAAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgc cttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtag gtg tcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctgggga tgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGC
CCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGG
TGGCGCGGG GTAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACGGGTTTGC
CGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCT
ACCTGAGGCCGCCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCG
GCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACC
GG GGTGGTG CCCATCCTG GTCGAG CTGGACGGCGACGTAAACG G CCACAAGTTCAGCGTG
TCCGGCGAGGGCGAG GGCGATG CCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCG
AG GTGAAGTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAG GGCATCGACTTCA
AG GAG GACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAG CAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGG CAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttca ctgcattctagttgtggtttgtcca aa ctca tcaatgta tcttatGGTGAGGCG GC
GGCCTCAGCCCGGCCGCGTGTCCCTGACCTGG GCGGAGGTCCCAG CCTCAGTGCCCGCACC
CCACCTCCCCGTCGG GACCCTCGG CGGCCTGGTTTCCGCCGGCAGCCTCCGG G CCCCTCTCC
TCTGGGTCGCCACGTACCTCGGCTCTTCGCCGCCCCTTCCCGCCTTTAAAGCCCTCTCACCTA
CTCCTGTCTCG G CATGTTACTTTCTG CACTTG CTTAACTCCAAG CATCACGTAACTACCTTCTC
TGTACATAAAAGGGAGAGCATTCGTCTTTCTCACTCACTATTCAACTCCATGGTTCCCTGGG
TAATTAGGCGATACCTTGAGCACCTGCTAATTATGGG CCAGCGCGGTGCTGGATTCTGAGG
AAGGTGCTGAGTAACTTGAAGACTAGTTCACTGCCTGCCAG GAGCTAAAGG GACGAGGGG
TGGAAG
pARBI- CCGGGTTAGGCATAGGCCCTCCCGGATCTTCCGCG GTGTAAGGAGAAG

674:
GTGATAGGCCCAGAGCCCCCACTCCACAAGCCAGCCCATCCCCCACGGAGACCCAAACCGT
SERF2_1_ CCACACACACCTTGCCAGCTGTTTGGGCCCCACGCCG CCCCAAAACACGCCTCCAGCTGGCC
Exogenou CCTTGGGACCTCCCTTCTCTAGTCCGTATTTTGACCTG GCCCGTGGCAGATTCGCCACTCCCC
s_52 CCTACCCCAAGCAGCCTGGGCCTCGATGG GCCGTTGTCGGGGCCCGGAGATTGAAGTG
GT
GTTGG ATCCTGCTGCTGGCCGCG CTG GG GTAGAAGG GTCGCCGGTGTGTGGGCAGAGCG
GCCCCCG CGTCTCACCTTTAATTTTCTTTCCTTAG GCGGTAACCAGCGTGAGCTCG CCCGCCA
GAAGAATATGAAAAAGCAGAGCtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgt gccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a aa tg agga a attgcatcgcattgtctgagtag gtgtcattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctgg ggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACAT
CGCCCACAGTCCCCGAGAAGTTG GGGG GAGGGGTCGGCAATTGAACGGGTGCCTAGAGA
AG GTGGCGCG GGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCG CCTTTTTCCCGAG
GGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTGGTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACCGGGCCTTTGT
CCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTG CT
TGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACC
GG CGCCTACTCTAGAGCTAG CGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTC
ACCGG GGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGC
GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC
ACCACCGG CAAGCTGCCCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGC
AGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC
GAAGGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG
CCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACT
TCAAGGAGGACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACG
TCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAA
CATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGA
CG GCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGAC
CCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC
TCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcaca aatttca ca a ata aagcatttttttca ctgcattctagttgtggtttgtcca a a ctcatcaatgtatcttatGACTCGGTT
AAGGGAAAGCGCCGAGATGACG GGCTTTCTGCTGCCGCCCGCAAGCAGAGGTAGCCCCAG
GGAGGG GAGG GAAAGGGACGGTGG AGACCTGGGTTAGACCAAGGGTTATAGAAGG AAA
GAGAGCTACCTCAGGGCTTGAATGTGGACTAGTCGTGAGGAGCAGAGTG CATTGCTTCCTC
TAG G GTTTTATTTCCTCC CCACC CTCCAAATTG TTAG CTCA CAG C CTTACAG G AAA G G AC G G

GGGCGG GCG CCTGCCCTCAGTCTGATTTCTGAG CGTCCCTG GGTCTGACCTTAAGG GCAAG
GGCAGGGAGCTTCACATTTCAAATACAGTTGTGGTTACGGCAGCCCAGTACTTTTGGCCCTC
CTTGCTGTTCGGTTCTCCTCCCTTCTCCCAACCTCCTCACTGGTGTTGCTGGGTGTGGTCCTC
AATACAGAATAGAG
pARBI- AATCGAGCCCTTTGCCCACGGCTACTTCACGGGACCACCCTCCCGG

675: CCCGGATCTTCCGCGGTGTAAGGAGAAGGCCAGCGCCCCTGTGATAGGCCCAGAGCCCCC
SERF2_2_ ACTCCACAAGCCAGCCCATCCCCCACGGAGACCCAAACCGTCCACACACACCTTGCCAGCTG
Exogenou TTTGGGCCCCACGCCGCCCCAAAACACGCCTCCAGCTGGCCCCTTG GGACCTCCCTTCTCTA
s_53_54 GTCCGTATTTTGACCTGGCCCGTGGCAGATTCGCCACTCCCCCCTACCCCAAGCAGCCTGGG
CCTCGATGGGCCGTTGTCGG G GCCCG GAGATTGAAGTGGTGTTGGATCCTG CTG CTG G CC
GC GCTG GGGTAGAAG GGTCGCCGGTGTGTGGGCAGAGCGGCCCCCGCGTCTCACCTTTAA
TTTTCTTTCCTTAGGCGGTAACtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgcc ttccttgaccctggaaggtgccactccca ctgtcctttccta ata a a atga gga a attgcatcgcattgtctgagtaggtgt cattctattctggggggtggggtggggcagga cagca agggggaggattgggaa ga ca atagcaggcatgctggggat gcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCG CACATCGC
CCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGG
TGGCGCGGG GTAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACGGGTTTGC
CGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCT
ACCTGAGGCCGCCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTC
CTGAA CTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGG GCCTTTGTCCG
GCGCTCCCTTGG AG CCTACCTAGACTCAG CCGGCTCTCCA CGCTTTGCCTG ACCCTG CTTGC
TCAACTCTACGTCTTTGTTTCG TTTTCTG TTCTG CG C CGTTACAG ATCCAA G CTG TG A C CG GC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACC
GG GGTGGTG CCCATCCTG GTCGAG CTGGAC GGCG ACGTAAACG G CCACAAGTTCAGCGTG
TCCGGCGAGGGCGAG GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTG CCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCG
AG GTGAAGTTC GAG G GCGACACCCTG GTGAACCGCATCGAG CTGAAGGG CATCGACTTCA
AG GAG G ACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAGCAG AAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTG CTGCCCGACAACCACTACCTG AG CACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCG CGATCACATG GTCCTG CTGGAGTTCGTGACCGCCGCCGG GATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttcactgcattctagttgtggtttgtccaaa ctcatcaatgtatcttatCAGCGCGAGCT
TG CCCG CCAGAAGAATATGAAAAAGCAG AGCGACTCGGTTAAGG GAAAGCGCCG AG ATG
ACGGGCTTTCTGCTGCCGCCCGCAAGCAGAGGTAGCCCCAGGGAGGGGAGGGAAAGGGA
CG GTG GAGACCTGG GTTAGACCAAG GGTTATAG AAGG AAAGAGAGCTACCTCAGGG CTG
AATGTGG ACTAGTCGTG AG GAGCA GAGTGCATTGCTTCCTCTAG GGTTTTATTTCCTCCCCA
CCCTCCAAATTGTTAGCTCACAGCCTTACAGGAAAGGACGGGGGCGGGCGCCTGCCCTCAG
TCTGATTTCTGAG CGTCCCTG G GTCTGACCTTAAG G G CAA G GG CAGG G AGCTTCACATTTC
AAATACAGTTGTGGTTACGG CAGCCCAGTACTTTTGGCCCTCCTTGCTGTTCGGTTCTCCTCC
CTTCTCCCAACCTC
pARBI-676: GTGTACACCAGACCTGAAGAAAAAGATCACAGAAGGAATTTCCCCTTTGTAAGATAGAGGT
SLC38A1_ AGTTAAAAATAAG C CTTATG AC CTAATA G GTTAATTGTAATG CTCTGG TAG CCAACTTG
AAA
1_Exogen AAG GAG AAATG ATGGTTG CATCTCACATTTTAAAATGTTA AGAATTTGTTTTGTAATGAGGA
ous_55 TAC CTTAA CCCCTG AAG G CCAAAATG ACTATTTGTGTGA GTTTGAG AAA G
G CA CATAG TAA
CTTGGGGAACAGTCAATAGAATGACAAAATCTTGACTATTTTAACTTTCTGAACCCTGTCAT
TTCTTGTGTTCTCAAATTTGATTTTTAAATAGGCTGCATGGTGTATGAAAAGCTGGGGGAAC
AAGTCTTTGGCACCACAtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcctt ga ccctgga aggtg cca ctccca ctgtcctttcctaata a a atgagga a attgcatcgcattgtctga gtaggtgtca ttc tattctggggggtggggtggggca ggacagca a gggggagga ttggga aga ca atag ca ggcatgctggggatgcgg tgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCAC
AGTCCCCGAGAAGTTG GGGGGAGG GGTCGGCAATTGAACGG GTG CCTAGAGAAG GTG GC
GCGGGGTAAACTGGG AAAGTGATGTCGTGTACTGGCTCCG CCTTTTTCCCGAGGGTGGG G
GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCG CAACGGGTTTG CCGC
CAGAACACAGCTGAAGCTTCGAGGGGCTCG CATCTCTCCTTCACGCGCCCGCCG CCCTACCT
GAGGCCGCCATCCACG CCGGTTGAGTCGCGTTCTG CCG CCTCCCGCCTGTG GTGCCTCCTG
AACTG CGTCCG CCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGG CG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACG CTTTG CCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCG CCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGG GCGAGGAGCTGTTCACCGG GG
TGGTG CCCATCCTGGTCGAG CTGGACG GCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GC GAGG GCG AG GG CGATGCCACCTACG G CAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTG CCCGTG CCCTG G CCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGA CCACATG AAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CT
ACGTCCAG GAGCGCACCATCTTCTTCAAG GACGACGGCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGGG CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACG GCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATG GCCGACAAG CAGAAG AACGGCATCAAGGTGAACTTCAAGATCCG CCACAACATCGAG
GACGG CAGCGTG CAG CTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACG GCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAG CAAAGACCCCAACG
AG AAGCGCGATCACATGGTCCTG CTGGAGTTCGTGACCG CCG CCG G GATCACTCTCGGCAT
GGACGAG CTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttca c a aa ta a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatGGGAAGTTCGTAATC
TTTG GAGCCACCTCTCTACAGAACACTG GAGGTAAAAAGAACATGCTTTTCTTTACATAACT
TGAAA CTAATTCTGGTG G ATG AG CGGCCACATGAATTTTAACATAATTCAAG CA GTTATCAT
CCTCATTGCTAAAATG GCACAG G GAAAGTAAAG CAG AGACAG CAGTCACTTATTTAAAG CC
ACAAATCCTG CTA G A GTAG CTG AA GTTG CCTTTGTGTCTTACTGACAGTTGGTCTAAGAACG
GG CTG ATAACTTTTTATG TAG CTTG CAACATAA G CCTTC G TATCTTCTTTCTAAAAAT GTTCT
CATTTTCCTTCAGCAATG CTG AG CTACCTCTTCATCG TAAAAAATGAACTACCCTCTG CCATA
AAG TTTCTAATG G G AAA GG AA GAG ACATTTTC GTAAGTTATAG ACCATTTTTTTATG ATATA
pARBI- TAAAGATGTCAAG CTTCAAATCATTATCATAAGGTTAAATATATTACG

677: AGTTTATTTCTG ATCGTG AAAG TAG AAG AAGTCTCACAAACAG CCATTTG
GAAAAAAAG AA
SLC38A1_ GTGTG ATG AG TATG TAA GTATCATTATAATG AAG TATG
CTTATTTTTTTAATCATATAAATTG
2_Exogen GGACAAATCTTTTTAATTCAAGTAG CATAGTACCAAAAGCATAACTACCTGTATTCACCAGG
ous_56_5 ATATTCCTCATACCTTATGTATTTTCTCTTGAAATCCATTCAAGGAAATAACTAGATAACTAA

CATTAAA G G CTTTG G CT
TGATGAACTTTCTTGG GG ATTATTTTG CTTAGT GTCAATGTTTG TTTCTTTG TTCAG ATTCCA
GGTACAACCTCCtga cgactgtgccttctagttgccagccatctgttgtttg cc cctc cc ccgtgc cttccttga ccctg gaaggtgcca ctcccactgtcctttccta at a atgaggaa attgcatcgcattgtctgagtaggtgtcattctattctgg ggggtggggtggggcaggacagcaagggggaggattgggaaga ca atagcaggcatgctggggatgcggtgggctct atggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCC
CGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGGG
GTAAACTG GGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGG GGGAGAA
CCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAAC
ACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAGGC
CGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTG GTGCCTCCTGAACTGC
GTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTT
G GAG CCTACCTAGACTCAG CCG G CTCTCCACG CTTTG CCTGACCCTGCTTGCTCAACTCTAC
GTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTA
GAGCTAGCGAATTgccgcca ccATGGTGAGCAAGGG CGAG GAG CTGTTCACCGGG GTGGTGC
CCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCG GCGAGG
GCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCT
GCCCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAG CCGCT
ACCCCGACCACATGAAG CAG CACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCA
GGAGCG CACCATCTTCTTCAAG GACGACG GCAACTACAAGACCCGCGCCGAG GTGAAGTTC
GAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGG GCATCG ACTTCAAGG AG GACGGC
AACATCCTGG G GCACAAG CTG GAG TACAACTACAACAG CCACAACGTCTATATCATG G CCG
ACAAG CAGAAGAACG GCATCAAGGTGAACTTCAAGATCCG CCACAACATCG AGGACGG CA
GC GTG CAGCTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACGG CCCCGTGCTGCT
GCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCG
CGATCACATGGTCCTGCTG GAGTTCGTGACCGCCG CCG G GATCACTCTCGG CATGGACGAG
CTGTACAAatgattgtttattgcagcttataatggttacaa ata a agca atagc atca ca a atttca c a a ata aagca tttttttca ctgca ttctagttgtggtttgtcca a actcatca atgtatcttatTTAG
GCATGTCAGTTTTTAACCTA
AG CAACGCCATTATG GG CAGTGG GATTTTG GGACTCGCCTTTGCCCTG GCAAACACTGG AA
TCCTACTTTTTCTG TG AG TATTG ACG TGCC GGTA CTTTCCATTTTAAAACTG AACTTTTT GTAT
GTTTCTG TTATTACATAG AA G AAATGTG AAATATTTATAAATAGTTTTTTTATTAA CTTG G CT
AAG TTACAA GTACATACTTTG ATATTTATTG CTTG AGTG ATTTTCCAAACTGTACCTAATG CT
TACAACAAAG A GG AA GG AG AAAATCTTTTTATTAATCAAAAATACTG AAATAAG CATTTGT
GAAAAGGATTAAAACATTTGAAAATAACTTTTTTTG GTATTTTTG G AG TCTACCTG TG ACTTT
AAATCTCAGTTAAAATATTAAG AG G TGTTTAACCCCAG CCAGTCACCACTT
pARBI- ATCATTCTTCACACTAGTTCCTTTTCTG ACTCTTCA GTCA

678: GCATCTCTAAAAT GAGTTTGATTTA CTAAA CATAG
AAGTTGACAAAGTTTTTCAAGTATAAC

_Exogeno AGTCATTTATGTGACATATTTATAAGAAACATCTG CTAG TGCTGCTGCATA CTTTTATCA
GAT
us_58 CCATTAAATAGTACATTTTTGTTCTTGATTCTCACTAAAACTAACTAAATGGCTGTCATTCTTT
TCTTTATG CTTTTATTG TACTAATTTAG CCCATTTG A CTG CA CttttttttttttttttttAAAGTTCCCT

CCTTTCTTTTCCCCTTG CTTCCCAACAGGTCTCTTGATGGTCGTCTCCAG GTATCCCATCGAtg acgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctgga aggtgccactcccac tgtcctttccta ataa a atgagg aa attgcatcgcattgtctg agta ggtgtc attctattctggggggtggggtggggca ggacagca agggggaggattgggaaga ca a ta gcaggcatgctgggg atgcggtgggctctatggGGATCTGCG
ATCGCTCCGGTGCCCGTCAGTG G GCAGAGCG CACATCGCCCACAGTCCCCGAGAAGTTGG
GGGGAG GGGTCGG CAATTGAACG GGTGCCTAGAGAAGGTGGCGCG GG GTAAACTGG GA
AAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCGAGG GTG GGG GAGAACCGTATATAAGT
GCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACG G GTTTG CCG CCAGAACACAGCTGAAGC
TTCGAGGG GCTCGCATCTCTCCTTCACGCGCCCG CCG CCCTACCTGAG G CCG CCATCCACGC
CG GTTGAGTCG CGTTCTGCCGCCTCCCGCCTGTG GTG CCTCCTGAACTGCGTCCGCCGTCTA
GGTAAGTTTAAAGCTCAGGTCGAGACCGGG CCTTTGTCCGGCGCTCCCTTGGAG CCTAC CT
AG ACTCAG CCG GCTCTCCACGCTTTG CCTGACCCTGCTTGCTCAACTCTACGTCTTTGTTTCG
TTTTCTGTTCTG CGCCGTTACAGATCCAAG CTGTGACCG GCG CCTACTCTAGAG CTAGCGAA
TTgccgccaccATGGTGAG CAAGGG CGAG GAGCTGTTCACCG GGGTGGTGCCCATCCTGGTC
GAGCTGGACGG CGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGG CGAGG GCGA
TGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTG CACCACCGGCAAGCTGCCCGTG CCCT
GGCCCACCCTCGTGACCACCCTGACCTACGG CGTG CAGTGCTTCAGCCGCTACCCCGACCAC
ATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAG G CTACGTCCAGGAGCGCACCA
TCTTCTTCAAG GACGA CGGCAACTACAAGACCCGCGCCG AG GTGAAGTTCGAGG GCGACA
CCCTGGTGAACCGCATCGAG CTGAAGGG CATCGACTTCAAG GAGGACGG CAACATCCTG G
GGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATG G CCGACAAGCAGAA
GAACGGCATCAAG GTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAG CGTG CAG CT
CG CCGACCACTACCAGCAGAACACCCCCATCG GCG ACGG CCCCGTGCTGCTGCCCGACAAC
CACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAG CGCGATCACATG
GTCCTG CTGGAGTTCGTGACCGCCGCCGG GATCACTCTCGG CATGGACGAGCTGTACAAatg attgtttattgcagcttata atggttacaa ata a agcaatagcatca ca a atttca ca aata a a gcatttttttc a ctg ca t tctagttgtggtttgtcca a a ctcatca atgtatctta tAAAG GATTGCCACATGTTATATATTG
CCGATTAT
GGCG CTGGCCTGATCTTCACAGTCATCATGAACTCAAGGCAATTGAAAACTGCGAATATG C
TTTTAATCTTAAAAAG G ATGAA GTATGTGTAAACCCTTACCACTATCAG AG A GTTG AG A CAC
CAG GTAG GAATATTGCAAGTTTTTTTCTTG GTTTTG AATTAAATG CCTG AA CTTCA GTATATT
TAAGTACTCTTGTG A CCCAG G AAAATTTTA GTG TAAATTCTAATAAATTACCTTAAATTGTG C
ATATTATTTGGTG GTAATTAAAAttttttta tttttaa a a atttttAATG AG AAAA CATATCTG
TTTCTT
GGAATAGCATATGATTTGGCATGAAAGTGCATATCCAG GAAATCCTGTAGATTG GCAGTCT
GCTCAGCG CTATGCTGATGTGCTTCTACATGTCCCT
pARBI- TTCTATATTAAAAATCAGTTG

679: ATGAAGCACTTAGTAAAGCTGATGGCAGAG GTTTTAGATCTTTTGAATACAG GAAA
GTG GT
SMAD2_2 AAG GGCATTATAG GTATG ATTCATTG AG CTAGAGGGCGTTG GTG TTCA CATTTTAAAAA
CA
Exogeno TAATCATGACTTATCTGGAGTCACATGTTCCATTTTATTGG CTTCCTTCATATAGAAAATATA
us_59 GTAACCAG CACTACATG CCTGTGAAAAAGTGAAGCAATTGTATATTTTCTGG GTG
AAGG AA
GTATTCTGTACATTCTCCATGTTTTACATCATGGTATTTTGAATAACCATGCTTCCATGTTCAC
ATCAATTTTTTGTTTTTCAATTTATG CACAG CACTTG CTCTGAAATTTGG GG ACT G AGTACAC
CAAATACGATAGATtgacga ctgtgccttctagttgcca gccatctgttgtttgcccctcccccgtgccttccttga cc ctgga aggtgcca ctccca ctgtcctttccta a ta aaatgagga aattgcatcgcattgtctgagtaggtgtcattctattc tggggggtggggtggggcaggacagca agggggaggattgggaagacaatagcaggcatgctggggatgcggtggg ctctatggG GATCTGCGATCG CTCCGGTGCCCGTCAGTGG GCAGAGCGCACATCGCCCACAGT
CCCCGAGAAGTTGGG GGGAGGG GTCG GCAATTGAACG GGTG CCTAGAGAAG GTG GCG CG
GGGTAAACTGG GAAAGTGATGTCGTGTACTG GCTCCGCCTTTTTCCCGAG GGTGGGG GAG
AACCGTATATAAGTGCAGTAGTCGCCGTG AACGTTCTTTTTCGCAACGG GTTTGCCGCCAG A
ACACAGCTGAAGCTTCGAGGG GCTCGCATCTCTCCTTCACGCGCCCGCCG CCCTACCTGAG
GCCGCCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACT
GCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCG AG ACC GGGCCTTTGTCCGGCGCTCC
CTTG GAG CCTACCTAGACTCAG CCGGCTCTCCACG CTTTGCCTGACCCTGCTTGCTCAACTCT
ACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCG GCGCCTACTC
TAG AG CTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAG GAGCTGTTCACCGG GGTGGT
GCCCATCCTGGTCGAGCTG GACG GCGACGTAAACGG CCACAAGTTCAG CGTGTCCGGCGA
GGGCGAGGGCGATG CCACCTACG GCAAG CTGACCCTGAAGTTCATCTG CACCACCGGCAA
GCTG CCCGTGCCCTGG CCCACCCTCGTGACCACCCTGACCTACG GCGTGCAGTGCTTCAG CC
GCTACCCCGACCACATGAAG CAG CACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAG GACGACG GCAACTACAAGACCCGCG CCGAG GTGAA
GTTCGAGG GCGACACCCTG GTGAACCG CATCGAGCTGAAGGG CATCGACTTCAAGGAG GA
CG GCAACATCCTGG G G CA CAAGCTGG A GTACAA CTA CAA CAG CCACAACGTCTATATCATG
GCCGACAAGCAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGG AC
GGCAGCGTGCAG CTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACG GCCCCGTGC
TG CTGCCCGACAACCACTACCTGAG CACCCAGTCCAAGCTGAG CAAAGACCCCAACGAGAA
GCGCGATCACATG GTCCTGCTGGAGTTCGTGACCG CCG CCG GGATCACTCTCG GCATGGAC
GAGCTGTACAAatgattgtttattgcagcttata atggttaca a ata a agca ata gcatca ca a atttca ca aata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tCAGTGGGATACAACAGGCC
TTTA CA G CTTCTCTG AACAAA CCAG GTGAATATTCTG CCCTCTGTCGAATCTTAGAGATCTTG
TGGGAGGGGG GTATATTTTGAAAGACCTATATGGG GTTG CTTAGTATAATTTTG CCTAGG A
TGTTTCCTTAATGTAAAATAAGGCATAG CATTTAAAATATCTG CTTGCCAGTATAAAAAATAT
ATTAAATGTTTCAGCTGATGTTTTAATCAATG CAATTCTAGTTTGAATCTTCTTTAAATTACTG
TATCCCCTAAAGAATAATAATTTTGATAACTATATATTTATTTTAG CTTG AG ATCA G ATTATA
ATCTGTTGTTG G C CTAATTTTTAAGTA G ATA CATG ATG A GTTTTG CTAAATTTCTTGTTATCA
GAATCTCATCTTTATA CTAATAAATA CATACTTAATG AACC CCTTATATCACAA
pARBI- TCAATGTATTTTTG TATA GTCTTTTG AG ATTA G AGTGAA GTTCTAAAAAG

680: TTGTGAAAATATTTTAATTGACACTACTAAGAGTTTAAAATAGTATTG GTG
GTGAAAAATCG
SMAD2_3 TTAAAAGTATCTG ATTTTACTTG CAAAATTACTGTATTTTCCCACAAG AG G A GTCCTTACAA
C
_Exogeno ATTCTTGTTTTTAGAAGG GTTTGTTTCCATCATTCATTTAAATTCATAAAGATAACGTTTTCAT
us_60 GG GTG G AG AAG TCTATTG GGAAAGTCATAATTCGAATTTCTATCTTGCTTTG
CAGTTTGCTT
TCTATGATAAGTTGAAATTATTACTTGATGTTCAAG G TA GTCTCTACATCATCCTTTCAATAT
TTCTG CTAG G TTC GATACAAG AG G CTGTTTTCCTAG CGTG G CTTG CTG CCTTTG GTAAGAAC
ATGTCGTCCATCtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttga ccctg gaaggtgcca ctcccactgtcctttccta ataa a atgaggaa attgcatcgcattgtctgagtaggtgtcattctattctgg ggggtggggtggggcaggacagcaagggggaggattgggaaga ca atagcaggcatgctggggatgcggtgggctct atggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCC
CGAGAAGTTGGG GG GAGGG GTCGGCAATTGAACGG GTGCCTAGAGAAG GTGGCGCG GG
GTAAACTG GGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGG GGGAGAA
CCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAAC
ACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAGGC
CGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTG GTGCCTCCTGAACTGC
GTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTT
G GAG CCTACCTAGACTCAG CCG G CTCTCCACG CTTTG CCTGACCCTGCTTGCTCAACTCTAC
GTCTTTGTTTCGTTTTCTGTTCTGCG CCGTTACAGATCCAAG CTGTGACCG GCGCCTACTCTA
GAGCTAGCGAATTgccgcca ccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGC
CCATCCTGGTCGAGCTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG GCGAGG
GCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCT
GCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCT
ACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCA
GGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTC
GAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGG GCATCGACTTCAAGGAGGACGGC
AACATCCTG G G GCACAAG CTG GAG TACAACTACAACAG CCACAACGTCTATATCATG G CCG
ACAAGCAGAAGAACG GCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG GACGG CA
GCGTG CAGCTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACGG CCCCGTGCTGCT
GCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCG
CGATCACATGGTCCTGCTGGAGTTCGTGACCGCCG CCGGGATCACTCTCGGCATGGACGAG
CTGTACAAatgattgtttattgcagcttataatggttacaa ata a agca atagc atca ca a atttca c a a ata aagca tttttttca ctgca ttctagttgtggtttgtcca a actcatca atgtatcttatTTGCCATTCACG
CCGCCAGTTGT
GAAGAGACTG CTG GG ATG GAAGAAGTCAG CTG GTG GGTCTG GAG G AG CAGG CG GAG GA
GAGCAGAATGGG CAGGAAGAAAAGTGGTGTGAGAAAGCAGTGAAAAGTCTG GTGAAGAA
GCTAAAGAAAACAGGACGATTAGATGAGCTTGAGAAAGCCATCACCACTCAAAACTGTAAT
ACTAAATGTGTTACCATACCAAGGTAAGTTTTGTTAGATCCCAGGTTTGATCAAATTATGTC
AAG GAATCTGAAG GAAAG TTA CTGAATTTGTGTTCCTTTCAAGTTG CCTGTAAAAAGTG AT
GATTGAAATATGACTGTTTTTAACCTTGTATAAATTGTTTTTGCTAGCTGACTTGTTTTATAA
ATTATTTTCTTGAATAGTGAGGTTTAATCAAGCTAAATAAGATACTTAGATTTATTATCTTCA
pARBI-681: CCCAGCCTGGCCAGCAGGCGGCGGGCGCGGGGCGGCGAGCCGGGGCCGGACGGCTGGA
SOCS1_1 GCCAGAACCGGCTGCTCTCCACGCCCCCCTCTCGGTGCTG CCCGGAGGCCGGACTCCGCCT
Exogeno CCACCGAG CCCCCACCCGCCGGGAAGAGCTCCGCGGAGTACAGAGCCCATTTTCTAGCTGT
us_61 GTCCACTGAGG
CTGAACGGATCCGCGCGGACTTGGTGCTCCGTGCTCGCCCCCTAGGGCCG
GGTCCGCCGGGAGCGCCGCCCTCCGGAGTTGTCCGGCCG GCGCACACCTGCCCGGCCCCG
CAGCG CCCCAGCTCACCTCTTTGTCTCTCCCGCAGCGCACCCCCGGACGCTATGGCCCACCC
CTCCGGCTGGCCCCTTCTGTAG GATGtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccc cgtgccttccttgaccctgga aggtgcca ctccca ctgtcctttccta ata a a atga gga a attgcatcgcattgtctgagt aggtgtcattcta ttctggggggtggggtggggcagga cagca agggggaggattggg a aga caatagcaggcatgct ggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCAC
ATCGCCCACAGTCCCCGAGAAGTTG G GGGGAGGG GTCGGCAATTGAACG GGTG CCTAGA
GAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCCG A
GGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGG
GTTTGCCGCCAGAACACAGCTGAAGCTTCGAG GGGCTCGCATCTCTCCTTCACGCGCCCGC
CGCCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGG
TGCCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAG CTCAGGTCGAGACCGGGCCTTT
GTCCGGCG CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTG CCTGACCCTG
CTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGA
CCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGT
TCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGG CCACAAGTTCA
GCGTGTCCGGCGAGGGCGAGGG CGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCT
GCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGT
GCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGC
CCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCG
CGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGA
CTTCAAG GAG GACG GCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAA
CGTCTATATCATG GCCGACAAGCAGAAGAACGG CATCAAGGTGAACTTCAAGATCCGCCAC
AACATCGAGGACGGCAGCGTG CAG CTCG CCGACCACTACCAG CAGAACACCCCCATCG GC
GACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG
ACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCAC
TCTCGG CATG GACGAGCTGTACAAatgattgtttattgcagcttata atggtta caa ata a agcaatagcatc aca aatttca caa ata a a gcatttttttca ctgca ttctagttgtggtttgtccaa a ctcatca atgtatcttatGTAG CA
CACAACCAGGTGGCAGCCGACAATGCAGTCTCCACAGCAGCAGAGCCCCGACGGCGGCCA
GAACCTTCCTCCTCTTCCTCCTCCTcgcccgcgg cccccgcgcgcccgcggccgtgc cc cg cggtcc cggcccc ggCCCCCGGCGACACGCACTTCCGCACATTCCGTTCGCACGCCGATTACCGGCGCATCACGC
GCGCCAGCGCGCTCCTG GACGCCTGCGGATTCTACTGGGGGCCCCTGAGCGTGCACGGGG
CGCACGAGCGGCTGCGCGCCGAGCCCGTGGGCACCTTCCTGGTGCGCGACAGCCG CCAGC
GGAACTGCTTTTTCGCCCTTAGCGTGAAGATGGCCTCGGGACCCACGAGCATCCGCGTG CA
CTTTCAGGCCGGCCGCTTTCACCTGGATGGCAGCCGCGAGAGCTTCGACTGCCTCTTCGAG
CTG CTG
pARBI- GGATCCCAGCCTGGCCAGCAGGCGGCGGGCG CGGGGCG

682:
TGGAGCCAGAACCGGCTGCTCTCCACGCCCCCCTCTCGGTGCTGCCCGGAGGCCGGACTCC
SOCS1_2 GCCTCCACCGAG CCCCCACCCGCCG GGAAGAG CTCCGCGGAGTACAGAG CCCATTTTCTAG
_Exogeno CTGTGTCCACTGAGGCTGAACGGATCCGCGCGGACTTGGTGCTCCGTGCTCGCCCCCTAGG
us_62 GCCGGGTCCGCCGGGAGCGCCGCCCTCCGGAGTTGTCCG GCCGGCG
CACACCTGCCCG GC
CCCGCAGCGCCCCAGCTCACCTCTTTGTCTCTCCCGCAGCGCACCCCCGGACGCTATGGCCC
ACCCCTCCGGCTGGCCCCTTCTGTAGGATGGTAGCACACAACCAGGTGGCAGCCGACAATG
CAGTCTCCACAGCAGCAGAGCCCCGAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCAC
ATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGA
GAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGA
GGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGG
GTTTGCCGCCAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGC
CGCCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGG
TGCCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTT
GTCCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTG
CTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGA
CCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGT
TCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCA
GCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCT
GCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGT
GCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGC
CCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCG
CGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGA
CTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAA
CGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCAC
AACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGC
GACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAG
ACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCAC
TCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatCGGC
GG
CCAGAACCTTCCTCCTCTTCCTCCTCCTcgcccgcggcccccgcgcgcccgcggccgtgccccgcggtcccggccccggCCCCCGGCGACACGCACTTCCGCACATTCCGTTCGCACGCCGATTACCGGCGCATCA
CGCGCGCCAGCGCGCTCCTGGACGCCTGCGGATTCTACTGGGGGCCCCTGAGCGTGCACG
GGGCGCACGAGCGGCTGCGCGCCGAGCCCGTGGGCACCTTCCTGGTGCGCGACAGCCGCC
AGCGGAACTGCTTTTTCGCCCTTAGCGTGAAGATGGCCTCGGGACCCACGAGCATCCGCGT
GCACTTTCAGGCCGGCCGCTTTCACCTGGATGGCAGCCGCGAGAGCTTCGACTGCCTCTTC
GAGCTGCTGGAGCACTACGTGGCGGCGCCGCGCCGCATGCTGGGGGCCCCGCTGCGCCAG
CGCCGC
pARBI-683: SOCS1_3_Exogenous_63
CG
ATTTTCTAGCTGTGTCCACTGAGGCTGAACGGATCCGCGCGGACTTGGTGCTCCGTGCTCGC
CCCCTAGGGCCGGGTCCGCCGGGAGCGCCGCCCTCCGGAGTTGTCCGGCCGGCGCACACC
TGCCCGGCCCCGCAGCGCCCCAGCTCACCTCTTTGTCTCTCCCGCAGCGCACCCCCGGACGC
TATGGCCCACCCCTCCGGCTGGCCCCTTCTGTAGGATGGTAGCACACAACCAG
GTGGCAGC
CGACAATGCAGTCTCCACAGCAGCAGAGCCCCGACGGCGGCCAGAACCTTCCTCCTCTTCCT
CCTCCTcgcccgcggcccccgcgcgcccgcggccgtgccccgcggtcccggccccggCCCCCGGCGACACGCA
CTTCCGCACAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCC
GAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGGGG
TAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACC
GTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACAC
AGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAGGCCG
CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTGCGT
CCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGA
GCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC
ATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGC
CCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGG
AGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGA
GGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAA
CATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGAC
AAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatTTCCGTTCGCACGCCGATTACCGGC
GCATCACGCGCGCCAGCGCGCTCCTGGACGCCTGCGGATTCTACTGGGGGCCCCTGAGCGT
GCACGGGGCGCACGAGCGGCTGCGCGCCGAGCCCGTGGGCACCTTCCTGGTGCGCGACAG
CCGCCAGCGGAACTGCTTTTTCGCCCTTAGCGTGAAGATGGCCTCGGGACCCACGAGCATC
CGCGTGCACTTTCAGGCCGGCCGCTTTCACCTGGATGGCAGCCGCGAGAGCTTCGACTGCC
TCTTCGAGCTGCTGGAGCACTACGTGGCGGCGCCGCGCCGCATGCTGGGGGCCCCGCTGC
GCCAGCGCCGCGTGCGGCCGCTGCAGGAGCTGTGCCGCCAGCGCATCGTGGCCACCGTGG
GCCGCGAGAACCTGGCTCGCATCCCCCTCAACCCCGTCCTCCGCGACTACCTGAGCTCCTTC
pARBI-684: SRP14_1_Exogenous_64
GGCAGACAAACAGTTAAAACAGAG
CTCTAGTTTTGTTAAACACCTCATCTGCTATCACGGTCTGGCGAAACCCTGGAGAAACTATT
ATTTCCACGACGGAAACATGTAATATCGAAGCACGTTAAATTCCACTCAAAGTGGCGGCGT
CTGGGATCCCTGACAAAGAGCCCTGGGCTGCTTGGGGCCCATTGTTACCAGAAGCAACCTA
GGCGGCTCAGTGCTGGGGACTGAGCCAGCTACAATCCCTACTATTTTCCGGGCCCG
AAGCC
CCGAATGTGCTCAGTACAGGGTGGGGAAAGGTAGAGGAAGGCGGCGGTCCGCGGCAGAC
AGACTCCGGTGGCTCCCAGACACCGGGAACCCAGGGAGCATCGCCCGGCTCCCCTCCCGCT
GCTTACACTTCTTCAAGGTGATATAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACA
TCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAG
AAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAG
GGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCGCCAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGT
CCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCT
TGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACC
GGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTC
ACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGC
GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC
ACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGC
AGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC
GAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG
CCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACT
TCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACG
TCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAA
CATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGA
CGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGAC
CCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC
TCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatGACGCTGCC
CGACGTCCGGCACTTCTGGAAAAGTCTGGTCAGCTCCGTCAGGAACTGGAAAACAACAGGC
CCAGCCCCGTTAGCCAGCCCCAAATCCTCGTCCTGCCGCGTCAAGGCCCTGGTCCTCCTGCA
GGAGGCACAGGTCTCGAGTAACGCCTGAGCCGCCCCCTTCCCTCGGCCGGCCAGGCCTAGC
CATACCTGCTCGCTCTCCAACAACACCATCGCGGCGACGCTGGCTCGACTCCCTCCGCTTAA
GCCCCTAGCAGTGAGAGCCGGAAGTTCGGCCTAGGCTGGGCGGGACTTCCGCTACTAGAC
TTCCATTGTCTTCCACCAATCTCTTCTCTTCCACCAATCCCGGCCTGCGCCCTCCCCCCTCCCC
GCCCGCCTAGCGCCCGCGCCCTGGGACGTCCGGGGGCCTGTGACCCGGAGGCGCTGGGGC
TGTCCTGGGTC
pARBI-685: SRP14_2_Exogenous_65_66
GGCTTCGGGCCCGGAAAATAGTAG
GGATTGTAGCTGGCTCAGTCCCCAGCACTGAGCCGC
CTAGGTTGCTTCTGGTAACAATGGGCCCCAAGCAGCCCAGGGCTCTTTGTCAGGGATCCCA
GACGCCGCCACTTTGAGTGGAATTTAACGTGCTTCGATATTACATGTTTCCGTCGTGGAAAT
AATAGTTTCTCCAGGGTTTCGCCAGACCGTGATAGCAGATGAGGTGTTTAACAAAACTAGA
GCAGCGGGGCTGGGTTAGGTCATCACTTACCCCACTCCTCTGTTTTAACTGTTTGTCTGCCG
CTCAGCCACGTACGCGGCCGTGTTCACCAGGATGCATTTTTTCCTTCAGATGACGGTCGAAC
CAAACCCATTCCAAAGAAGtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCC
ACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTG
GCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGG
GGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCC
GCCAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTA
CCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCC
TGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGG
CGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTC
AACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGC
CTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGG
GGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTC
CGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCAC
CGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGC
TTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAG
GCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGA
GGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAA
GGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTA
TATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATC
GAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGC
CCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCA
ACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGG
CATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatGGTACTGTGGAG
GGCTTTGAGCCCGCAGACAACAAGTGTCTGTTAAGAGCTACCGATGGGAAGAAGAAGATC
AGCACTGTGGTGAGCTTAATTTCTAAGGCTGTCTTTTGAAATGTAAAGACTTGAACTTAACA
GAGGATGGGGCGTTTCTGAACCAGGCTTTTATTTGTTTTTCCCTTTTGCCCTGTGTGGCTATT
TTTGAGACCAGGACCTTCCTATACTTTAGTAGTGGAAACCTCAAGAATAAAATAAGAAGGT
AGAGGTCAGACAGTCGTTAGTTCTGCTAAAGCTCTTGTGGAAATGAAAGTAGCATATTGGT
CCTTATTCGTAGTACTGTAAGGAGCAGCTGGCATAAATATTTGATTCCTTAGCCCCTTACTGT
GCTGGGCCTTCAATAAGCAGTTTCTTGGTACCAGGTGATGCTATTATTTCCTTATCTTTGTGG
TTCAC
pARBI-686: SRSF9_1_Exogenous_67
CCCACGTCAGTGAGTGCTCACAAGTCTGTGAG
GCGCGGGAGCCCAAGCCACTTGTCCAGCGCCTCAGCCGAGCGCGAACCGCGCTCGGGGAT
GGCATCCACACGGCCCGGCCCCAGGCTCTCCCTGTCAGCGCCCGAAGGCCCTTCGCACCTC
CAGGGGGCGCCGGCCTGCGCGCACGCGCAATGGTCGCAGCCGCGTTCTCTTTAAGAGGAC
TCCTTTTGCCTCCGCCGACCCCTTCGCTTCCG
CTCCGCGTTCCCACAATGCAGTGCGGCTGAG
CGCCTCGGAGCCCGCGGGGACGCTGCGGGGGGACCCGTGCTGAggcggcggcggcgacgtgggctgcggcgggcccgcggcgtcgggcggtgcggatgtcgggctgggcggacgagcgcggcggcgAGGGCGACGGG
CGCATCTACtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCCG
AGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGGGGT
AAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACC
GTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACAC
AGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAGGCCG
CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTGCGT
CCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGA
GCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC
ATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGC
CCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGG
AGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGA
GGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAA
CATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGAC
AAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatGTGGGGAACCTTCCGACCGACGTG
CGCGAGAAGGACTTGGAGGACCTGTTCTACAAGTACGGCCGCATCCGCGAGATCGAGCTC
AAGAACCGGCACGGCCTCGTGCCCTTCGCCTTCGTGCGCTTCGAGGACCCCCGGTGAGGcccccgcgcccctgccctctcctcctcggtgcctgaggcccccgccttccctgctgtcccctcccccagggctcccctccccccggcctcccctccccccgcctccgcgcagacccctcaCGGCGCCCCCTCACGGGTGGAGGATGAGGCAGCC
TCTCCTCGCAGGCCCGGGCCGTCCTTCGCGCCGTCGTCACTTCCTTTATTTTTATTATTCCAAT
ATTTTACTTAGAAACCCAAAAGCTGAGCCTTTGGAGGGCCCAAGCCCGCCTGCACCGGCCT
CGGGGGGTCCAAATGAGCCTTGTCCGCC
pARBI-687: SRSF9_2_Exogenous_68
AAAATCAATAATCTGTCAGCAGTTGATTCttttttttCTCTGAATTAGCCCTCTTATGAGTAATCT
GTTTGCTCATTCATTATAATTTCATACCTCTAAAACTGCGTGTGACAGCTGTAAAGGTTAATT
CCAGTATCATGAAACGTCTCCAAACCAAAGCAGAAGTGCTTCAGGATCCTGATTTGTGTGTT
TTTTTCTTCACTCTAGGTTTCCCTTTTATGCTTACTACTCATGCCCCTCACTTGGAAAGTCACT
TGGCCTCCTGAACAGCACTAACTCCAAACGttttttttgttgttgttgttttttttAGAGATGCAGAGGA
TGCTATTTATGGAAGAAATGGTTATGATTATGGCCAGTGTCGGCTTCGTGTGGAGTTCCCCA
GGtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCT
GCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCCGAGAAGTT
GGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGGGGTAAACTGG
GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACCGTATATAA
GTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACACAGCTGAA
GCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAGGCCGCCATCCAC
GCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTGCGTCCGCCGTC
TAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTTGGAGCCTAC
CTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGTCTTTGTTT
CGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGAGCTAGCG
AATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGG
TCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGC
GATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGC
CCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGAC
CACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCA
CCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCG
ACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCC
TGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCA
GAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCA
GCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC
AACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACA
TGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAA
atgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatACTTATGGAGGTCGGGGTGGGTGGCCCCGT
GGTGGGAGGAATGGGCCTCCTACAAGAAGATCTGATTTCCGAGTTCTTGTTTCAGGTATGT
TCCTTTCAAACAGaatgagatgatacatgtaaaatacttaacacagagtctgtcttccaagaaatgatagctgttattcttCAGTGCATGGGACACGGGGGCTTTCTTTTCAATAGCCTGTGTGAAGCCTTGCCCTGGAT
TGCCAATGAGGAAAGTATCCTGCAAATGAAATTGCGCTGGGAGTGCAGCCTTGGAAGAAC
ATAACCATATTTCTTGTAAAGGAGTTTTCTAGTGGTGAGAAGGAAAGATGATGGGAAAACT
TGAGCTACAATTCTAAAGATGCTTCTTTTGGAATATACTTGGCATCAGACATGGTAGAAAGG
CATTCAAGGAGCCAGATTTGAACAACTTACCCAGCC
pARBI-688: SRSF9_3_Exogenous_69 2541
GATGGCATCCACACGGCCCGGCCCCAGGCTCTCCCTGTCAGCGCCCGAAGGCCCTTCGCAC
CTCCAGGGGGCGCCGGCCTGCGCGCACGCGCAATGGTCGCAGCCGCGTTCTCTTTAAGAG
GACTCCTTTTGCCTCCGCCGACCCCTTCGCTTCCGCTCCGCGTTCCCACAATGCAGTGCGGCT
GAGCGCCTCGGAGCCCGCGGGGACGCTGCGGGGGGACCCGTGCTGAggcggcggcggcgacgtgggctgcggcgggcccgcggcgtcgggcggtgcggatgtcgggctgggcggacgagcgcggcggcgAGGGCGAC
GGGCGCATCTACGTGGGGAACCTTCCGACCGACGTGCGCGAGAAGGACTTGGAGGACCTG
TTCTACAAGTACGGCCGCATCCGCGAGATCGAGCTCAAGAACCGGCACGGCCTCGTGCCCT
TCGCCTTCtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCCGA
GAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGGGGTA
AACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACCGT
ATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACACA
GCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAGGCCGC
CATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTGCGTC
CGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTTGG
AGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGTC
TTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGAG
CTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCA
TCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCG
AGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCC
CGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACC
CCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGA
GCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAG
GGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAAC
ATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACA
AGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCG
TGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCC
CGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGA
TCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTG
TACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatGTGCGCTTCGAGGACCCCCGGTGAG
GcccccgcgcccctgccctctcctcctcggtgcctgaggcccccgccttccctgctgtcccctcccccagggctcccctccccccggcctcccctccccccgcctccgcgcagacccctcaCGGCGCCCCCTCACGGGTGGAGGATGAGGCA
GCCTCTCCTCGCAGGCCCGGGCCGTCCTTCGCGCCGTCGTCACTTCCTTTATTTTTATTATTC
CAATATTTTACTTAGAAACCCAAAAGCTGAGCCTTTGGAGGGCCCAAGCCCGCCTGCACCG
GCCTCGGGGGGTCCAAATGAGCCTTGTCCGCCTCCTGCCTGGGGCAGCACCGTAGGGGGA
AGCGGCCGCGGGGCAGCGCGGGGGTCGCCGTTCGCCCTTCCCGCTCGCCTCTCCCCCGGCC
CGTGCTCGCCGTGGCTGGAGAGCAAGCT
pARBI-689: SUB1_1_Exogenous_70
CAtttttttttCTACCAAAGAAATCGTATGTGGGATCCCAAACCACAAAATAACCGTTCCTGTGG
TTAATACTACTATAATGCCTGAAGTGTCTTTTGGGATCCTGAGAACAGAGTTTGAAAACATT
ACTAGACAGAAGGATTGGTTAGATTCATAGTTTTGTTGTTGAGTGAAACTTGCTTATGTATA
TATTTATGATATTTTGGATGTAGTCTTTTGATTGTTTAAATCTTAAAAAGTAATGGGATCTTT
TGACACTGGGGTATGTTTTATTTTTATGTGTGCAAATTTTAACCATATTCTTTTCTAGTTAAA
GAGGAAAAAGCAAGTTGCTCCAGAAAAACCTGTAAAGAAACAAAAGACAGGTGAGACTTC
GAGAGCCCTGTCAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTC
CCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGG
GGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGA
ACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAA
CACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAGG
CCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTG
CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCC
TTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTA
CGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCT
AGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGT
GCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGA
GGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA
GCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCC
GCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT
CCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAA
GTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGA
CGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATG
GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGAC
GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGC
TGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAA
GCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC
GAGCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatTCTTCTAAACAGAG
CAGCA
GCAGCAGAGATGATAACATGTTTCAGGTAAAGTTGGCTAttttttttttttttttttttgacatggagtcatgctctgtcacccaggctggagtgcagtggcgccatctcggctcactgcaacctcagcctcctgagttcaagcagttctctgcctcagcctcccgagtagctaggattacaggcatccgccaccagacctggctaatttttgtatttttagtagagatggggtttcaccatcttggccaggctggtcttgaactcctgaccttgtgatccaactgcctcagcctccaaaagtgctgggtttacaggtgtgagccaccatgccttgccAAAGTTGGCTGTTTCTTTAGATTCAGAGGAATTATTATCTGGCTT
GATCTGAAGAATGTTAAAAGTACTATGATCTGATAATTGCCTAATATG
pARBI-690: SUB1_2_Exogenous_71 2541
ATAAATAAACATATCCTGGAAATGGAATAAGTTGGTTATATTCTTTTTAATTAGTATATCTGC
TTCGTAAAATAAGTAACTGACTAGCCTTAGTAGACTGTGGTTTCCAGGTTTATTCAGAAGTA
GCAAGATCCCTCCATTTTTTTTTCTACCAAAGAAATCGTATGTGGGATCCCAAACCACAAAAT
AACCGTTCCTGTGGTTAATACTACTATAATGCCTGAAGTGTCTTTTGGGATCCTGAGAACAG
AGTTTGAAAACATTACTAGACAGAAGGATTGGTTAGATTCATAGTTTTGTTGTTGAGTGAAA
CTTGCTTATGTATATATTTATGATATTTTGGATGTAGTCTTTTGATTGTTTAAATCTTAAAAAG
TAATGGGATCTTTTGACACTGGGGTATGTTTTATTTTTATGTGTGCAAATTTTAACCATATTC
TTTTCTAGTTAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCC
GAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGCGGGG
TAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACC
GTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAACAC
AGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTGAGGCCG
CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAACTGCGT
CCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCTCCCTTG
GAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACTCTACGT
CTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTACTCTAGA
GCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC
ATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGC
GAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGC
CCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC
CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGG
AGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGA
GGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAA
CATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGAC
AAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGC
CCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCG
ATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCT
GTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatAAGAGGAAAAAG
CAAGTTGCTCCA
GAAAAACCTGTAAAGAAACAAAAGACAGGTGAGACTTCGAGAGCCCTGTCATCTTCTAAAC
AGAGCAGCAGCAGCAGAGATGATAACATGTTTCAGGTAAAGTTGGCTATTTTTTTTTTTTTT
TTTTTTGACATGGAGTCATGCTCTGTCACCCAGGCTGGAGTGCAGTGGCGCCATCTCGGCTC
ACTGCAACCTCAGCCTCCTGAGTTCAAGCAGTTCTCTGCCTCAGCCTCCCGAGTAGCTAGGA
TTACAGGCATCCGCCACCAGACCTGGCTAATTTTTGTATTTTTAGTAGAGATGGGGTTTCAC
CATCTTGGCCAGGCTGGTCTTGAACTCCTGACCTTGTGATCCAACTGCCTCAGCCTCCAAAA
GTGCTGGGTTTACAGGTGTGAGCCACCATGCCTTGCCAAAGTTGGCTGTTTCTTT
pARBI-691: SUB1_3_Exogenous_72
GTTTATTCAGAAGTAG
CAAGATCCCTCCATTTTTTTTTCTACCAAAGAAATCGTATGTGGGAT
CCCAAACCACAAAATAACCGTTCCTGTGGTTAATACTACTATAATGCCTGAAGTGTCTTTTG
GGATCCTGAGAACAGAGTTTGAAAACATTACTAGACAGAAGGATTGGTTAGATTCATAGTT
TTGTTGTTGAGTG
AAACTTGCTTATGTATATATTTATGATATTTTGGATGTAGTCTTTTGATT
GTTTAAATCTTAAAAAGTAATGGGATCTTTTGACACTGGGGTATGTTTTATTTTTATGTGTGC
AAATTTTAACCATATTCTTTTCTAGTTAAAGAGGAAAAAGCAAGTTGCTCCAGAAAAACCTG
TAAAGAAACAAAAGtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctatt
ctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAG
TCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGGCGC
GGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGA
GAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCA
GAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCTG
AGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTGAA
CTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCGCT
CCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAACT
CTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCTAC
TCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTG
GTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGC
GAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGC
AAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCA
GCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTA
CGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTG
AAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAG
GACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCA
TGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGG
ACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGT
GCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAG
AAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGG
ACGAGCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatACAGG
TGAGACTTCGAG
AGCCCTGTCATCTTCTAAACAGAGCAGCAGCAGCAGAGATGATAACATGTTTCAGGTAAAG
TTGGCTATTTTTTTTTTTTTTTTTTTTGACATGGAGTCATGCTCTGTCACCCAGGCTGGAGTGC
AGTGGCGCCATCTCGGCTCACTGCAACCTCAGCCTCCTGAGTTCAAGCAGTTCTCTGCCTCA
GCCTCCCGAGTAGCTAGGATTACAGGCATCCGCCACCAGACCTGGCTAATTTTTGTATTTTT
AGTAGAGATGGGGTTTCACCATCTTGGCCAGGCTGGTCTTGAACTCCTGACCTTGTGATCCA
ACTGCCTCAGCCTCCAAAAGTGCTGGGTTTACAGGTGTGAGCCACCATGCCTTGCCAAAGTT
GGCTGTTTCTTTAGATTCAGAGGAATTATTATCTGGCTTGATCTGAAGAATGTTAAAAGT
pARBI-692: TET2_1_Exogenous
CACATTTTAATTTTTGTTTCCATGCTCTTTAGAATTCAACTAGAGGGCAG
CCCGAAGCAAGCCTGATGGAACAGGATAGAACCAACCATGTTGAG
GGCAACAGACTAAGT
CCATTCCTGATACCATCACCTCCCATTTGCCAGACAGAACCTCTGGCTACAAAGCTCCAGAA
TGGAAGCCCACTGCCTGAGAGAGCTCATCCAGAAGTAAATGGAGACACCAAGTGGCACTCT

GAAGCCAGAATAGTCGTGTGAGTCCTG
ACTTTACACAAGAAAGTAGAGGGTATTCCAAGTGTTTGCAAAATGGAGGAATAAAACGCAC
AGTTAGTGAACCTTCTCTCTCTGGGCTCCTTCAGATCAAGAAATTGAAACAAGACCAAAAGG
CTAATGGAGAAAGACGTAACtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGC
CCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGG
TGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTG
GGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGC
CGCCAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCT
ACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCG
GCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACC
GGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTG
TCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCG
AGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCA
AGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatTTCGGGGTAAG
CCAAGAAAGAAATCCAGGTGAAAGCAGTCAACCAAATGTCTCCGATTTGAGTGATAAGAAA
GAATCTGTGAGTTCTGTAGCCCAAGAAAATGCAGTTAAAGATTTCACCAGTTTTTCAACACA
TAACTGCAGTGGGCCTGAAAATCCAGAGCTTCAGATTCTGAATGAGCAGGAGGGGAAAAG
TGCTAATTACCATGACAAGAACATTGTATTACTTAAAAACAAGGCAGTGCTAATGCCTAATG
GTGCTACAGTTTCTGCCTCTTCCGTGGAACACACACATGGTGAACTCCTGGAAAAAACACTG
TCTCAATATTATCCAGATTGTGTTTCCATTGCGGTGCAGAAAACCACATCTCACATAAATGCC
ATTAACAGTCAGGCTACTAATGAGTTGTCCTGTGAGATCACTCACCCATCGCATACCTCAGG
GCAGATC
pARBI- AATGGAGAAAGACGTAACTTCGGGGTAAGCCAAGAAAGAAATCCAGGTGAAAGCAGTCAA 2541 693:
CCAAATGTCTCCGATTTGAGTGATAAGAAAGAATCTGTGAGTTCTGTAGCCCAAGAAAATG

xogenous CAGATTCTGAATGAGCAGGAGGGGAAAAGTG CTAATTACCATGACAAGAACATTGTATTAC

CCTAATGGTGCTACAGTTTCTGCCTCTTCCGTGGAACAC
ACACATGGTGAACTCCTGGAAAAAACACTGTCTCAATATTATCCAGATTGTGTTTCCATTGC
GGTGCAGAAAACCACATCTCACATAAATGCCATTAACAGTCAGGCTACTAATGAGTTGTCCT
GTGAGATCACTCACCCATCGtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGC
CCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGG
TGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTG
GGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGC
CGCCAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCT
ACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCG
GCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACC
GGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTG
TCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCG
AGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCA
AGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaaataaagcaatagcatcacaaatttcacaaataaagcatttttttcactgcattctagttgtggtttgtccaaactcatcaatgtatcttatCATACCTCAGGG
CAGATCAATTCCGCACAGACCTCTAACTCTGAGCTGCCTCCAAAGCCAGCTGCAGTGGTGA
GTGAGGCCTGTGATGCTGATGATGCTGATAATGCCAGTAAACTAGCTGCAATGCTAAATAC
CTGTTCCTTTCAGAAACCAGAACAACTACAACAACAAAAATCAGTTTTTGAGATATGCCCAT
CTCCTGCAGAAAATAACATCCAGGGAACCACAAAGCTAGCGTCTGGTGAAGAATTCTGTTC
AGGTTCCAGCAGCAATTTGCAAGCTCCTGGTGGCAGCTCTGAACGGTATTTAAAACAAAAT
GAAATGAATGGTGCTTACTTCAAGCAAAGCTCAGTGTTCACTAAGGATTCCTTTTCTGCCAC
TACCACACCACCACCACCATCACAATTGCTTCTTTCTCCCCCTCCTCCTCTTCCACAGGTTCCT
CAGCTT
pARBI-694: TET2_3_Exogenous
GGAAAAAGCACTCTGAATGGTGGAGTTTTAGAAGAACACCACCACTACCCCAACCAAAGTA
ACACAACACTTTTAAGGGAAGTGAAAATAGAGGGTAAACCTGAGGCACCACCTTCCCAGAG
TCCTAATCCATCTACACATGTATGCAGCCCTTCTCCGATGCTTTCTGAAAGGCCTCAGAATAA
TTGTGTGAACAGGAATGACATACAGACTGCAGGGACAATGACTGTTCCATTGTGTTCTGAG

AGCTACAGGACAACTGCCAGCAGTTGATGAGAAACAAAGAGCAAGAGATTCTGAAGGGTC
GAGACAAGGAGCAAACACGAGATCTTGTGCCCCCAACACAGCACTATCTGAAACCAGGAT
GGATTGAATTGAAGGCCCCTCGTtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACAT
CGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAGAGA
AGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAG
GGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACCGGGCCTTTGT
CCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTG CT
TGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACC
GG CGCCTACTCTAGAGCTAG CGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTC
ACCGG GGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGC
GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC
ACCACCGG CAAGCTGCCCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGC
AGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC
GAAGGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG
CCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACT
TCAAGGAGGACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACG
TCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAA
CATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGA
CG GCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGAC
CCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC
TCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcaca aatttca ca a ata aagcatttttttca ctgcattctagttgtggtttgtcca a a ctcatcaatgtatcttatTTTCACCAA
GCGGAATCCCATCTAAAACGTAATGAGGCATCACTGCCATCAATTCTTCAGTATCAACCCAA
TCTCTCCAATCAAATGACCTCCAAACAATACACTGGAAATTCCAACATGCCTGGGGGGCTCC
CAAGGCAAGCTTACACCCAGAAAACAACACAGCTGGAGCACAAGTCACAAATGTACCAAGT
TGAAATGAATCAAGGGCAGTCCCAAGGTACAGTGGACCAACATCTCCAGTTCCAAAAACCC
TCACACCAGGTGCACTTCTCCAAAACAGACCATTTACCAAAAG CTCATGTGCAGTCACTGTG
TG G CA CTAGATTTCATTTTCAACAAAGAG CAGATTCCCAAACTGAAAAACTTATGTCCCCAG
TGTTGAAACAG CACTTGAATCAACAG G CTTCAGAG ACTGAG CCATTTTCAAACTCACA CCTT
TTGCAACAT
pARBI- AAAGGGAGGTTGAGATGGGCTGAGGTCTTCTAGGAGGCGGGGAGGGAGTGCAGCCTTGA 2541 695: CAAGCCTCCCTGTGG GCGAGGTGTAAAGAG GGGCAGAAGTCAGCTCTGGAAGCATAGGG C
TIGIT_LE AGTGGGTGGGGAGGAGATGGGCTGGG CTG GGCTGGAGTAGAGATGTGTG GAGAAGG GT
xogenous GAGAAGACTG GAAAGACAACCTGAATGGG GGACTGGGAGCCTTGAATAACAGG CATG GA

CAGGATGGACTGGAGAAACTATCATTCCAAAATCCAGTTGGGGCCTCAAAGGCCCTTAGAA
TTTTTCTAGGAAGGTTGAAGGCCAGCTGCTGACCCAGGACTCACATGTGCTTCGTCCTCTTC
CCTAGGAATGATGACAGGCACAATAGAAACAtgacgactgtgccttctagttgccagccatctgttgtttgc ccctcccccgtgccttccttgaccctgga aggtgcca ctccca ctgtcctttccta ata a a atga gga a attgcatcgcatt gtctgagtaggtgtcattctattctggggggtggggtggggcagga cagca agggggaggattggga a ga caatagca ggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGA
GCGCACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGG GTCGGCAATTGAACGGGTG
CCTAGAGAAGGTGGCGCGGGGTAAACTG GGAAAGTGATGTCGTGTACTGGCTCCGCCTTT
TTCCCGAGGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCG
CAACGGGTTTG CCGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGC
GCCCGCCGCCCTACCTGAGG CCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGC
CTGTGGTGCCTCCTGAACTGCGTCCGCCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCG
GGCCTTTGTCCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCT
GACCCTGCTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAA
GCTGTGACCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAG
GAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCAC
AAGTTCAG CGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAG
TTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTA
CGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCC
GCCATGCCCGAAGGCTACGTCCAG GAG CGCACCATCTTCTTCAAGGACGACGGCAACTACA
AGACCCGCGCCGAGGTGAAGTTCGAGG G CGACACCCTGGTGAACCGCATCGAGCTGAAG G
GCATCGACTTCAAG GAG GACGGCAACATCCTGG GGCACAAGCTGGAGTACAACTACAACA
GCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGAT
CCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCC
ATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGA
GCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCG
GGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agc aatagcatcaca a a tttca ca a ata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tACGGGGAACATTTCTGCAGAGAAAGGTGGCTCTATCATCTTACAATGTCACCTCTCCTCCA
CCACGGCACAAGTGACCCAGGTCAACTGGGAGCAGCAGGACCAGCTTCTGGCCATTTGTAA
TGCTGACTTGGGGTGGCACATCTCCCCATCCTTCAAG GATCGAGTGGCCCCAGGTCCCGGC
CTGGGCCTCACCCTCCAGTCGCTGACCGTGAACGATACAGGGGAGTACTTCTGCATCTATCA
CACCTACCCTGATGGGACGTACACTGGGAGAATCTTCCTGGAGGTCCTAG AAAGCTCAG CT
ATTCCTG CTG GAG CAAGTTG GTG GATAAACCTCTCCCTCTAG CATAG AAAATG CAATC CTGA
AACACTGCACAGCAGGGCTTCTCAATTCGGGATCACATTTGAATCACCTGAG GAGATTTTA A
ATCATACTGATGCCGAGGCC
pARBI-696: TIGIT_2_Exogenous
GAGGAGATGGGCTGGGCTGGGCTGGAGTAGAGATGTGTGGAGAAGGGTGAGAAGACTG
GAAAGACAACCTG AATGGGG GACTGGGAGCCTTG AATAACAGGCATG GAG
AGGAGCGTC
TCTTGAAATGGAAGAAACAGGAAATAATTACAGCCTCTATG GAGGAGCAACAGGATGGAC
TG GAGAAACTATCATTCCAAAATCCAGTTG GGGCCTCAAAG GCCCTTAGAATTTTTCTAG GA

GTTGAAGGCCAGCTGCTGACCCAGGACTCACATGTGCTTCGTCCTCTTCCCTAGGAATG
ATGACAGGCACAATAGAAACAACG GGGAACATTTCTGCAGAGAAAG GTG GCTCTATCATCT
TACAATGTCACCTCTCCTCCACCACGG CACAAGTGACCCAGGTCAACTGG GAGCAGCAG GA
CCAGCTTCTGGCCATTTGTAATGCTGACtgacga ctgtgccttctagttgccagccatctgttgtttgcccctc ccccgtgccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a aa tgagga a attgcatcgcattgtctg agtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattggga agacaatagcaggcat gctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGG GCAGAGCGC
ACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGGGTGCCTAG
AG AAGGTGG CG CGGG GTAAACTGG GAAAGTGATGTCGTGTACTGG CTCCGCCTTTTTCCC G
AG GGTGGGGG AGAACCGTATATAAGTGCAGTAGTCG CCGTGAACGTTCTTTTTCG CAACG
GGTTTGCCGCCAGAACACAGCTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACGCGCCCG
CCGCCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTG
GTGCCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCT
TTGTCCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCC
TGCTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGT
GACCGGCGCCTACTCTAGAG CTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCT
GTTCACCGGG GTGGTGCCCATCCTG GTCG AGCTGGACG G CGACGTAAACGGCCACAAG TT
CAGCGTGTCCG GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCAT
CTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGC
GTGCAGTG CTTCAG CCG CTACCCCG ACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCAT
GCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGG ACGACGG CAA CTACAAGACC
CGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATC
GACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCAC
AACGTCTATATCATGG CCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCC
ACAACATCGAG GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG
GCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAGCAA
AGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGAT
CACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttata atggtta caa ata a agcaatag catca ca aatttca caa ata a agcatttttttca ctg ca ttctagttgtggtttgtccaa a ctcatca atgtatcttatTTG
GGGTGGCACATCTCCCCATCCTTCAAG GATCGAGTGGCCCCAG GTCCCGG CCTGG GCCTCA
CCCTCCAGTCGCTGACCGTGAACGATACAGGGGAGTACTTCTGCATCTATCACACCTACCCT
GATGGGACGTACACTGGGAGAATCTTCCTGGAGGTCCTAGAAAGCTCAGGTATTCCTGCTG
GAG CAAGTTG GTG GATAAACCTCTCCCTCTAG CATAGAAAATG CAATCCTGAAACACTG CA
CAG CAGG G CTTCTCAATTCG GGATCACATTTGAATCACCTGAG GAG ATTTTAAATCATACTG
ATGCCGAGGCCTCACCCAGACCAATTCAATCAGAATCCCTAATAGCAGAGCTAAACAAGGG
TAAG GTCTAAAAG CATTTCCAG GTGATTCTA ATG G GCAG CCAATACTG AGAAC CACTGTTCT
TATGTAAGAAGCACATC
pARBI- CCTCCCTGTG GGCGAGGTGTAAAGAGG

697: GGGTGGGGAGGAGATGGGCTGGGCTG GGCTGGAGTAGAGATGTGTGGAGAAGGGTGAG

xogenous GAGCGTCTCTTGAAATGGAAGAAACAG GAAATAATTACAGCCTCTATG GAG GAGCAACAG

GATGGACTGGAGAAACTATCATTCCAAAATCCAGTTGGGGCCTCAAAGGCCCTTAGAATTT
TTCTAGGAAGGTTGAAGGCCAGCTGCTGACCCAGGACTCACATGTGCTTCGTCCTCTTCCCT
AGGAATGATGACAGGCACAATAGAAACAACGGGGAACATTTCTGCAGAGAAAGGTG GCTC
TATCATCTTACAATGTCACCTCTCCTCCACCtgacgactgtgccttctagttgccagccatctgttgtttgccc ctcccccgtgccttccttga ccctgga aggtgcca ctcc ca ctgtcctttccta ata a a atga gga a attgcatcgcattgt ctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcagg catgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCG GTG CCCGTCAGTGGGCAGAGC
GCACATCGCCCACAGTCCCCGAGAAGTTG GGG G GAG G GGTCGGCAATTGAACGGGTGCCT
AGAGAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTC
CCGAGGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAA
CG GGTTTGCCGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCG CC
CGCCGCCCTACCTGAGGCCG CCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTG
TGGTGCCTCCTGAACTGCGTCCGCCGTCTAG GTAAGTTTAAAG CTCAGGTCGAGACCGGGC
CTTTGTCCGG CGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGAC
CCTG CTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTG CG CCGTTACAGATC CAAG CT
GTGACCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAG GGCGAGGAG
CTGTTCACCG GGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACG GCCACAAG
TTCAGCGTGTCCGG CGAGGGCGAG G GCGATG CCACCTACGGCAAGCTGACCCTGAAGTTC
ATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGG
CGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCA
TGCCCGAAGGCTACGTCCAG GAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGAC
CCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAG CTGAAGGGCAT
CGACTTCAAGGAGGACGG CAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCA
CAACGTCTATATCATG GCCGACAAGCAGAAGAACGG CATCAAGGTGAACTTCAAGATCCGC
CACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG
GCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAG CTGAGCAA
AGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGAT
CACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttata atggtta caa ata a agcaatag catca ca aatttca caa ata a agcatttttttca ctgca ttctagttgtggtttgtccaa a ctc atca atgtatcttatACG
GCACAAGTGACCCAGGTCAACTGGGAG CAGCAGGACCAGCTTCTGGCCATTTGTAATGCTG
ACTTGGGGTGGCACATCTCCCCATCCTTCAAGGATCGAGTGGCCCCAGGTCCCGGCCTGGG
CCTCACCCTCCAGTCGCTGACCGTGAACGATACAGGGGAGTACTTCTGCATCTATCACACCT
ACCCTGATGGGACGTACACTGGGAGAATCTTCCTGGAGGTCCTAGAAAGCTCAG GTATTCC
TG CTG G AG CAAGTTG GTG GATAAACCTCTCCCTCTAG CATAGAAAATG CAATCCTG AAACA
CTGCACAGCAGGGCTTCTCAATTCGGGATCACATTTGAATCACCTGAGGAGATTTTAAATCA
TACTGATGCCG AGG CCTCACCCAGACCAATTCAATCAGAATCCCTAATAG CAGAG CTAAAC
AAGGGTAAGGTCTAAAAG
pARBI-698: TRAC_1_Exogenous_79
GG CCTTTTTCCCATGCCTGCCTTTACTCTG
CCAGAGTTATATTGCTGGGGTTTTGAAGAAGAT
CCTATTAAATAAAAGAATAA G CAGTATTATTAAG TAG CCCTG CATTTCAG GTTT CCTTGAGT
GG CAGGCCAGGCCTG GCCGTGAACGTTCACTGAAATCATGGCCTCTTGGCCAAGATTGATA
GCTTGTGCCTGTCCCTGAGTCCCAGTCCATCACGAGCAGCTGGTTTCTAAGATGCTATTTCC
CGTATAAAGCATGAGACCGTGACTTGCCAGCCCCACAG AGCCCCGCCCTTGTCCATCACTG
GCATCTGGACTCCAGCCTGGGTTGGGGCAAAGAGGGAAATGAGATCATGTCCTAACCCTG
ATCCTCTTGTCCCACAGATtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcc ttga ccctgga aggtgcca ctccca ctgtcctttccta a ta aaatgaggaaattgcatcgcattgtctgagtaggtgtcatt ctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcg gtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCA
CAGTCCCCGAG AAGTTGGGGG GAGG GGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGG
CGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG
GGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG
CCAGAACACAGCTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACC
TGAGGCCGCCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GC GAGG GCG AG GG CGATGCCACCTACG G CAAGCTGACCCTG AAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CT
ACGTCCAG GAGCGCACCATCTTCTTCAAG GACGACG GCAACTACAAGACCCGCGCCGAGGT
GAAGTTCG AGG G CGACACCCTGGTG AACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AG AAGCGCGATCACATGGTCCTG CTGG AGTTCGTGACCG CCG CCG G GATCACTCTCGGCAT
GGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatATCCAG
AACCCTGACC
CTGCCGTGTACCAGCTGAGA GACTCTAAATCCAGTGACAAGTCTGTCTGCCTATTCACCG AT
TTTG ATTCTCAAACAAATGTGTCACAAAGTAAG G ATTCTGATGTGTATATCA CA GACAAAAC
TGTGCTAGACATGAGGTCTATGGACTTCAAGAGCAACAGTGCTGTGGCCTGGAG CAACAAA
TCTGACTTTG CATGTG CAAACG CCTTCAACAACAG CATTATTCCAGAA GACACCTTCTT CCCC
AG CCCAGGTAAG GGCAG CTTTGGTGCCTTCGCAGG CTGTTTCCTTGCTTCAG GAATGGCCA
GGTTCTGCCCAGAGCTCTGGTCAATGATGTCTAAAACTCCTCTGATTGGTGGTCTCGGCCTT
ATCCATT GCCACCAAAACC CTCTTTTTACTAAGAAACAGTG A G CCTTGTTCTG GCAGTCCAG
A
pARBI-699: TRAC_2_Exogenous_80
GATTCCAAGATGTACAGTTTGCTTTGCTGGG
AGTTATATTGCTGGGGTTTTGAAGAAGATCCTATTAAATAAAAGAATAAGCAGTATTATTAA
GTAGCCCTGCATTTCAGGTTTCCTTGAGTGGCAG GCCAGGCCTGGCCGTGAACGTTCACTG
AAATCATGGCCTCTTGGCCAAGATTGATAGCTTGTGCCTGTCCCTGAGTCCCAGTCCATCAC
GAGCAGCTGGTTTCTAAGATGCTATTTCCCG TATAAAGCATGAGACCGTG ACTTG
CCA GCCC
CACAG AGCCCCGCCCTTGTCCATCA CTG G CATCTGGA CTCCAG CCTGGGTTG G GGCAAA GA
GGGAAATGAGATCATGTCCTAACCCTGATCCTCTTGTCCCACAGATATCCAGAACCCTGACC
CTGCCGTGTACCAGCTGtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcctt ga ccctgga aggtg cca ctccca ctgtcctttcctaata a a atgagga a attgcatcgcattgtctga gtaggtgtca ttc tattctggggggtggggtggggca ggacagca a gggggagga ttggga aga ca atag ca ggcatgctggggatgcgg tgggctctatggGG ATCTGCGATCGCTCCGGTGCCCGTCAGTGG GCAGAGCGCACATCGCCCAC
AGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACGG GTGCCTAGAGAAG GTG GC
GCGGGGTAAACTGGG AAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGG
GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGC
CAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACCT
GAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTG AGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GCGAGG GCG AG GG CGATGCCACCTACG G CAAGCTGACCCTG AAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGA CCACATG AAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG CT
ACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGT
GAAGTTCG AGGG CGACACCCTGGTG AACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AG AAGCGCGATCACATGGTCCTG CTGG AGTTCGTGACCG CCG CCG G GATCACTCTCGGCAT
GGACGAGCTGTACAAatgattgtttattgcagcttata atggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatAGAG
ACTCTAAATCC
AGTGACAAGTCTGTCTGCCTATTCACCGATTTTGATTCTCAAACAAATGTGTCACAAAGTAA
G GATTCTGATGTGTATATCACAGACAAAACTGTG CTAGACATG AG GTCTATG GACTTCAAG
AG CAACAGTGCTGTGG CCTGGAGCAACAAATCTGACTTTGCATGTGCAAACGCCTTCAACA
ACAGCATTATTCCAGAAGACACCTTCTTCCCCAGCCCAGGTAAGGGCAGCTTTGGTGCCTTC
GCAGGCTGTTTCCTTGCTTCAGGAATGGCCAGGTTCTGCCCAG AGCTCTGGTCAATGATGTC
TAAAACTCCTCTGATTG GTG GT CTCG G CCTTATCCATTG CCACCAAAACCCTCTTTTTACTAA
GAAACAGTGAGCCTTGTTCTGGCAGTCCAGAGAATGACACGG GAAAAAAGCAGATGAAGA
GAAG
pARBI-700: TRAC_3_Exogenous_81
TTCTGAGCACCTACCCCATCCCCAGAAGGGCTCAGAAATAAAATAAGAGCCAAGTCTAGTC
GGTGTTTCCTGTCTTGAAACACAATACTGTTGGCCCTGGAAGAATGCACAGAATCTGTTTGT
AAGGGGATATGCACAGAAGCTGCAAGGGACAGGAG GTGCAGGAGCTGCAGGCCTCCCCC
ACCCAGCCTGCTCTGCCTTGGG GAAAACCGTG
GGTGTGTCCTGCAGGCCATGCAGGCCTGG
GACAT G CA AG CCCATAACCG CTGTGGCCTCTTGGTTTTACAGATACGAACCTAAACTTTCAA
AACCTGTCAGTGATTGGGTTCCGAATCCTCCTCCTGAAAGTGGCCGGGTTTAATCTGCTCAT
GACGCTGCGGCTGTGGTCCAGCtga cga ctgtgccttctagttgccagc ca tctgttgtttg cc cctc cc ccgtg ccttccttga ccctgga aggtgcca ctccca ctgtcctttccta ata a a atga gga a attgcatcgcattgtctgagtaggt gtcattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctgggg atgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCG
CCCACAGTCCCCGAGAA GTTG GGGG GAG GG GTCG G CAATTG AACGG GTGC CTAG AG AAG
GTGGCGCGG GGTAAACTG GGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGT
GGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTG
CCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCC
TACCTGAGGCCGCCATCCACGCCGGTTG AGTCGCGTTCTGCCG CCTCCCGCCTGTGGTG CCT
CCTGAACTG CGTCCG CCGTCTA GGTAAGTTTAAAGCTCAGGTCGAG ACCG GGCCTTTGTCC
GGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTG
CTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGG
CGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACC
GG GGTGGTG CCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTG
TCCGGCGAGGGCGAG GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAG CCGCTACCCCGACCACATGAAGCAG CACGACTTCTTCAAGTCCGCCATG CCCGAA
GGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCG
AG GTG AAGTTCGAGG GCGACACCCTG GTGAACCG CATCG AGCTGAAG GGCATCGACTTCA
AG GAG GACG GCAACATCCTGG GGCACAAG CTGGAGTACAACTACAACAG CCACAACGTCT
ATATCATGGCCGACAAGCAG AAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTG AGCAAAGACCCC
AACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca ca a ata aag ca tttttttca ctgcattctagttgtggtttgtcca aa ctcatcaatgtatcttatTGAGGTGAG GG
GCCTTGAAGCTGGGAGTGGGGTTTAGGGACGCGGGTCTCTGGGTGCATCCTAAGCTCTGA
GAGCAAACCTCCCTGCAGGGTCTTGCTTTTAAGTCCAAAGCCTGAGCCCACCAAACTCTCCT
ACTTCTTCCTGTTACAAATTCCTCTTGTGCAATAATAATGG CCTGAAACGCTGTAAAATATCC
TCATTTCAGCCGCCTCAGTTGCACTTCTCCCCTATGAGGTAGGAAGAACAGTTGTTTAGAAA
CGAAGAAACTGAG GCCCCACAGCTAATGAGTG GAGGAAGAGAGACACTTGTGTACACCAC
ATGCCTTGTGTTGTACTTCTCTCACCGTGTAACCTCCTCATGTCCTCTCTCCCCAGTACG GCTC
TCTTAGCTCAGTAGAAAGAAGACATTACACTCATATTACACCCCAATCCTGGCTAGAGTCTC
CGCACC
pARBI-701: TRAC_4_Exogenous_82
ATTATTAAGTAGCCCTGCATTTCAGGTTTCCTTGAGTGGCAGGCCAGGCCTG
GTTCACTGAAATCATG
GCCTCTTGGCCAAGATTGATAGCTTGTGCCTGTCCCTGAGTCCCAG
TCCATCACGAGCAGCTGGTTTCTAAGATGCTATTTCCCGTATAAAGCATGAGACCGTGACTT
GCCAG CCCCACAGAGCCCCGCCCTTGTCCATCACTGGCATCTGGACTCCAGCCTGGGTTGG
G G CAAAGAG G
GAAATGAGATCATGTCCTAACCCTGATCCTCTTGTCCCACAGATATCCAG A
ACCCTGACCCTGCCGTGTACCAGCTGAGAGACTCTAAATCCAGTGACAAGTCTGTCTGCCTA
TTCACCG ATTTTGATTCTCAAACAAATGTGTCACAAAGTAAGGATTCTGATGTGTATATCAC
AGACAAAACTGTGCTAGACtga cg actgtgccttcta gttgccagccatctgttgtttgcccctcccccgtgccttc cttga ccctgga aggtgcca ctccca ctgtcctttccta ata a aatgagga a a ttgcatcgc attgtctgagtaggtgtca ttctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctggggatgc ggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCC
ACAGTCCCCGAGAAGTTGGGG GGAGGGGTCGGCAATTGAACGGGTGCCTAGAGAAGGTG
GCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGG
GG GAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACG G GTTTG CC
GCCAGAACACAGCTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCTA
CCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTCC
TGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGG
CGCTCCCTTG GAGCCTACCTAG ACTCAGCCG GCTCTCCACGCTTTGCCTGACCCTGCTTGCTC
AACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGC
CTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGG CGAG GAG CTGTTCACCGG
GGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTC
CGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCAC
CG GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGC
TTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCG CCATGCCCGAAG
GCTACGTCCAG GAGCGCACCATCTTCTTCAAGGACGACG GCAACTACAAGACCCGCGCCGA
GGTGAAGTTCGAGG G CGACACCCTG GTGAACCGCATCGAGCTGAAG GGCATCGACTTCAA
GGAGGACGGCAACATCCTGGGGCACAAGCTG GAGTACAACTACAACAGCCACAACGTCTA
TATCATG GCCGACAAGCAGAAGAACG G CATCAAGGTGAACTTCAAGATCCG CCACAACATC
GAGGACG GCAGCGTG CAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACG GC
CCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCA
ACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGG
CATGGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaa ata a agca atagcatca ca a attt ca caaataa a gcatttttttca ctg ca ttctagttgtggtttgtccaa a ctcatca atgtatcttatATGAGGTCTATG
GACTTCAAGAGCAACAGTGCTGTGGCCTGGAGCAACAAATCTGACTTTGCATGTGCAAACG
CCTTCAACAACAGCATTATTCCAGAAGACACCTTCTTCCCCAGCCCAGGTAAGGGCAGCTTT
GGTGCCTTCGCAGG CTGTTTCCTTGCTTCAGGAATGG CCAGGTTCTGCCCAGAGCTCTGGTC
AATGATGTCTAAAACTCCTCTGATTGGTGGTCTCGGCCTTATCCATTGCCACCAAAACCCTCT
TTTTACTAAGAAACAGTGAGCCTTGTTCTGGCAGTCCAGAGAATGACACGGGAAAAAAGCA
GATGAAGAGAAGGTGGCAGGAGAGGGCACGTGGCCCAGCCTCAGTCTCTCCAACTGAGTT
CCTGCCTGCCTGCCTTTGCTCAGACTGTTTGCCCCTTACTGCTCTTCTAGGCCTCATTCTAAG C
CCCTT
pARBI-702: TRAC_6_Exogenous_84
TTCATTTCCATTTGA GTTGTTCTTATTGAGTCATCCTTCCTGTGGTAGCG
GGCCCATCTGGACCCGAGGTATTGTGATGATAAATTCTGAGCACCTACCCCATCCCCAGAA
GG GCTCAGAAATAAAATAAGAGCCAAGTCTAGTCGG TGTTTCCTGTCTTGAAACACAATAC
TGTTGGCCCTGGAAGAATGCACAGAATCTGTTTGTAAG GGGATATGCACAGAAG CTG CAA
GGGACAGGAGGTGCAGGAGCTGCAGGCCTCCCCCACCCAGCCTGCTCTGCCTTGGGGAAA
ACCGTGGGTGTGTCCTGCAGGCCATGCAGG CCTGGGACATGCAAGCCCATAACCGCTGTG
GCCTCTTGGTTTTACAGATACGAACCTAAACTTTCAAAACCTGTCAGTGATTGG GTTCCGAA
TCCTCCTCCTGAAAGTGGCCGGGtgacga ctgtgccttctagttgccagccatctgttgtttgcccctcccccgt gccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagt ag gtgtcattctattctggggggtggggtggggcagga cagcaagggggaggattgggaagacaatagcaggcatgctgg ggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACAT
CGCCCACAGTCCCCGAGAAGTTG GGGG GAGG GGTCGGCAATTGAACGGGTGCCTAGAGA
AG GTGGCGCG GGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCG CCTTTTTCCCGAG
GGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGT
TTGCCGCCAGAACACAG CTGAAGCTTCGAGGGGCTCGCATCTCTCCTTCACGCGCCCGCCG
CCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTG CCGCCTCCCGCCTGTGGTG
CCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACCGGGCCTTTGT
CCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTG CT
TGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACC
GG CGCCTACTCTAGAGCTAG CGAATTgccgccaccATGGTGAGCAAG GGCGAGGAGCTGTTC
ACCGG GGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGC
GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC
ACCACCGG CAAGCTGCCCGTGCCCTG GCCCACCCTCGTGACCACCCTGACCTACGGCGTGC
AGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC
GAAGGCTACGTCCAGGAG CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG
CCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACT
TCAAGGAGGACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACG
TCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAA
CATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGA
CG GCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGAC
CCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTC
TCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta caaataaagca atagcatcaca aatttca ca aataaagcatttttttcactgcattctagttgtggtttgtccaaa ctcatcaatgtatcttatTTTAATCTG
CTCATGACGCTGCGGCTGTGGTCCAG CTGAGGTGAGGGGCCTTGAAGCTGGGAGTGGGGT
TTAGGGACGCGGGTCTCTGGGTGCATCCTAAGCTCTGAGAGCAAACCTCCCTGCAGGGTCT
TGCTTTTAAGTCCAAAGCCTGAGCCCACCAAACTCTCCTACTTCTTCCTGTTACAAATTCCTCT
TGTGCAATAATAATGGCCTGAAACGCTGTAAAATATCCTCATTTCAGCCGCCTCAGTTGCAC
TTCTCCCCTATGAGGTAGG AAGAACAGTTGTTTAGAAACGAAGAAACTGAGGCCCCACAGC
TAATGAGTGGAGGAAGAGAGACACTTGTGTACACCACATGCCTTGTGTTGTACTTCTCTCAC
CGTGTAACCTCCTCATGTCCTCTCTCCCCAGTACG GCTCTCTTAGCTCAGTAGAAAGAAGAC
ATTACACTC
pARBI- CAGTCCATCACGAGCAG CTG GTTTCTAAGATG CTATTTCCCGTATAAAG

703: CTTGCCAGCCCCACAGAGCCCCGCCCTTGTCCATCACTGGCATCTGGACTCCAGCCTGGGTT
TRAC_Exo GGGGCAAAGAGGGAAATGAGATCATGTCCTAACCCTGATCCTCTTGTCCCACAGATATCCA
gen ous_8 GAACCCTGACCCTGCCGTGTACCAG CTGAGAGACTCTAAATCCAGTGACAAGTCTGTCTG C

CTATTCACCGATTTTGATTCTCAAACAAATGTGTCACAAAGTAAGGATTCTGATGTGTATATC
ACAGACAAAACTGTGCTAGACATGAGGTCTATGGACTTCAAGAGCAACAGTGCTGTGGCCT
GG AGCAACAAATCTGACTTTGCATGTGCAAACGCCTTCAACAACAG CATTATTCCAGAAGA
CACCTTCTTCCCCAGCCCAtgacgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttcc ttgaccctggaaggtgccactcccactgtcctttcctaata aaatgaggaaattgcatcgcattgtctgagtaggtgtcatt ctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcg gtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCA
CAGTCCCCGAGAAGTTGGGGG GAGG GGTCGGCAATTGAACGGGTGCCTAGAGAAGGTGG
CGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGG
GGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCG
CCAGAACACAGCTGAAGCTTCGAGGGG CTCGCATCTCTCCTTCACGCGCCCGCCGCCCTACC
TGAG GCCGCCATCCACG CCG GTTGAGTCG CGTTCTG CCGCCTCCCG CCTGTGGTGCCTCCTG
AACTG CGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCGGCG
CTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGCTCAA
CTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGCGCCT
ACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGG
TGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACG GCCACAAGTTCAGCGTGTCCG
GCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG
GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTC
AG CCG CTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCG CCATGCCCGAAGG CT
ACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGT
GAAGTTCGAGGG CGACACCCTGGTGAACCGCATCGAG CTGAAGGGCATCGACTTCAAG GA
GGACGGCAACATCCTGGG GCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATC
ATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG
GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAG CAAAGACCCCAACG
AGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCAT
GGACGAGCTGTACAAatgattgtttattgcagcttataatggttacaa ata a agcaatagc atca ca a atttca c a aata a agcatttttttca ctgcattctagttgtggtttgtcca a actcatca atgtatcttatGGTAAGG
GCAGCTTT
GGTGCCTTCGCAGG CTGTTTCCTTGCTTCAGGAATGG CCAGGTTCTGCCCAGAGCTCTGGTC
AATGATGTCTAAAACTCCTCTGATTGGTGGTCTCGGCCTTATCCATTGCCACCAAAACCCTCT
TTTTACTAAGAAACAGTGAGCCTTGTTCTGGCAGTCCAGAGAATGACACGGGAAAAAAGCA
GATGAAGAGAAGGTGGCAGGAGAGGGCACGTGGCCCAGCCTCAGTCTCTCCAACTGAGTT
CCTGCCTGCCTGCCTTTGCTCAGACTGTTTGCCCCTTACTGCTCTTCTAGGCCTCATTCTAAG C
CCCTTCTCCAAGTTGCCTCTCCTTATTTCTCCCTGTCTGCCAAAAAATCTTTCCCAGCTCACTA
AGTCAGTCTCACGCAGTCACTCATTAACCCACCAATCACTGATTGTGCCGGCACATGAATG
pARBI- TGAGGAAGAAGCGCGGGCGGCGCCTTCGGGAGGCGAGCAGGCAGCAGTTGGCCGTGCCG 2541 704: TAG CAGCGTCCCGCGCG CG GCGG GCAGCGGCCCAGGAGGCG CGTGGCGGCG
CTCGGCCT
TRI M28_ CGCGGCGGCGGCGGCGGCAGCGGCCCAGCAGTTGG CGGCGAGCGCGTCTGCGCCTGCGC
1_Exogen GGCGGGCCCCGCGCCCCTCCTCCCCCCCTGGGCGCCCCCGGCGGCGTGTGAATGGCGGCCT
ous_85 CCGCGGCGGCAGCCTCGG
CAGCAGCGGCCTCGGCCGCCTCTGGCAGCCCGGGCCCGGGCG
AG GGCTCCGCTGGCGGCGAAAAGCGCTCCACCG CCCCTTCG G CCG CAG CCTCGG CCTCTGC
CTCAGCCGCGGCGTCGTCGCCCGCGGGGGGCGGCGCCGAGGCGCTGGAGCTGCTGGAGC
ACTGCGGCGTGTGCAGAGAGCGCCTGCGACCCtgacgactgtgccttctagttgccagccatctgttgttt gcccctcccccgtgccttccttgaccctgga aggtgccactcccactgtcctttccta ataaa atgaggaaattgcatcgc attgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagacaatag caggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAG
AG CGCACATCG CCCACAGTCCCCGAGAAGTTG GGG G GAG G GGTCGG CAATTGAACG GGT
GCCTAGAGAAGGTGG CGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTG GCTCCGCCTT
TTTCCCGAGGGTGG GGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTC
GCAACGGGTTTGCCGCCAGAACACAGCTGAAGCTTCGAG GGGCTCGCATCTCTCCTTCACG
CGCCCGCCGCCCTACCTGAGGCCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCG
CCTGTGGTGCCTCCTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAG GTCGAGACC
GGGCCTTTGTCCGG CGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACG CTTTG CC
TGACCCTGCTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAA
GCTGTGACCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAG
GAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCAC
AAGTTCAG CGTGTCCGG CGAGGG CGAG G GCGATGCCACCTACG G CAAGCTGACCCTGAAG
TTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTA
CG GCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCC
GCCATGCCCGAAGGCTACGTCCAG GAG CGCACCATCTTCTTCAAGGACGACGGCAACTACA
AGACCCGCGCCGAG GTGAAGTTCGAG GG CGACACCCTG GTGAACCGCATCGAGCTGAAGG
GCATCGACTTCAAG GAG GACGGCAACATCCTGG GGCACAAGCTGGAGTACAACTACAACA
GCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGAT
CCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCC
ATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGA
GCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCG
GGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agc aatagcatcaca a a tttca ca a ata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tGAGAGGGAGCCCCGCCTGCTGCCCTGTTTGCACTCGGCCTGTAGTGCCTGCTTAGGGCCC
GCGGCCCCCGCCGCCGCCAACAGCTCGGGGGACGGCGGGGCGGCGGGCGACGGCACCGG
TAAGTACGAAGTGATCG GTG CCACCCCTCCCCCTACTCTCTGCCTTTGATTCCGACTGGGTG
CAGAGATGAGGATGCCACCTGGGCGAGAGGATGGGGGCCCGGACAGGGCACGGGAAATA
CTTTCTGGGTCCTGCATACGAACGTGGGTTTGTGCTGGCCGCTGAGATGGGACATCTGACT
AAAGTTGGAGAAAAGAAGGCTCGGG GAGG GGAGGGGCTGGTTCGCTGCGGGATAATGGT
CGGGGG CCCACCCAGCAGGGGAATG GTG GGGG CCATAACCTGGGTGGGAACTTGTAACA
GTCTCCCACATCCCTGCTTCTCGAAGTGGTG
pARBI-705: TRIM28_2_Exogenous_86
TCGAAGTG GTGGACTGTCCCGTGTG CAAGCAACAGTGCTTCTCCAAAGACATCGTG
TTATTTCATGCGTGATAGTGGCAGCAAGGCTGCCACCGACG
CCCAGGATGCGAACCAGGTG
CGTCCTATCTCAGCAACCACAAGGAGGTTTCTGGGGAGGGGGCATCTGCGCAGGAGGAGC
TTGGCACCAGCTCCAGGCTGTTACTCCACTTTCCCAAGGCTCTGGGTGGGCTGCCTAGGTTG
GGTCAAGGGACCAATCTTAAATCTCCGGTTGTATTTTCTGGGATGTAAACGTGGATCTATCA
AGTTGTCTTG CCTTCTCTGACCCTGCCTTTGTCTG G CA GTG CTG CACTAGCTGTGAGGATAA
TGCCCCAGCCACCAGCTACTGTGTGGAGTGCTCGGAGCCTCTGTGTGAGACCTGTGTAGAG
GCGCACCAGCGGGTGAAGTACtgacga ctgtgccttctagttgccag cc atctgttgtttgcccctcc cc cgtg c cttccttgaccctgga a ggtgccactcccactgtcctttccta ata a a atgagga a attgcatcgcattgtctgagtaggtg tca ttctattctggggggtggggtggggcagga cagca agggggaggattggga ag a ca atagcaggca tgctgggga tgcggtgggctctatggGGATCTGCGATCG CTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGC
CCACAGTCCCCGAGAAGTTG GGG G GAG G GGTCGGCAATTGAACGGGTGCCTAGAGAAGG
TGGCGCGGG GTAAACTGG GAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTG
GG GGAGAACCGTATATAAGTG CAGTAGTCG CCGTGAACGTTCTTTTTCGCAACGGGTTTGC
CGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGCGCCCGCCGCCCT
ACCTGAGGCCGCCATCCACGCCG GTTGAGTCGCGTTCTGCCGCCTCCCGCCTGTGGTGCCTC
CTGAACTGCGTCCGCCGTCTAGGTAAGTTTAAAGCTCAGGTCGAGACCGGGCCTTTGTCCG
GCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCTGACCCTGCTTGC
TCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAAGCTGTGACCGGC
GCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAG CAAGGGCGAGGAGCTGTTCACC
GG GGTG GTG CCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTG
TCCGGCGAGGGCGAG GGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACC
ACCGG CAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT
GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAA
GGCTACGTCCAGGAGCG CACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCG CCG
AG GTGAAGTTCGAG G GCGACACCCTG GTGAACCGCATCGAG CTGAAGGG CATCGACTTCA
AG GAG GACG GCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCT
ATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACAT
CGAGGACGG CAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGG
CCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCC
AACGAGAAGCG CGATCACATG GTCCTG CTGGAGTTCGTGACCGCCGCCGG GATCACTCTCG
GCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agca atagcatcacaaatt tca caaataaagcatttttttcactgcattctagttgtggtttgtccaaa ctcatcaatgtatcttatACCAAGGACCA
TACTGTG CG CTCTACTGGTACATGAG GCTGAGGGGGGCTGTTGGAGTTGTTCTCCCATGTG
TGCCCTCAGTTGCTTTTATGATGTTGGTTGCATCTGGTGGATGGGTCCTAGAGTTCTCTAGG
GG GTGCCGCCCGAAG GGCCCGAGGGCAGAACTCCAGAAGCAGAAAAACTGGGGTTGTGG
TGTGTAGTCTCTAGGGCCTAGGTGGGAGAGGGTGGGAAGGG GAAATAAGGGAGACCTTA
ATGTGCTGCAG GGAGGTACAAAGGGTTTGAGAAGGCTTATCAGGGAGTTGTCAGACCTGG
TGGTGAAGGGCTCAGCATATGCAAACAGGGAAAGGCATGGTGTGAAG GGCTTTCTGGGTT
TGTGGTTCCCTGGCACACATCTGGTATAAGATGCTGCTAG GGAAAGTAGACTGTGGGTCCA
TGGATTATAAGTGATGA
pARBI-706: TRIM28_3_Exogenous_87
GGCGCCCAATGCGCGTGCGCGGCGGCGTCGG
TCGGCTCTTTCTGCGAGCG
GGCGCGCGGGCGAGCGGTTGTGCTTGTGCTTGTGGCGCGTG
GTGCGGGTTTCGGCGGCGGCTGAGGAAGAAGCGCGGGCGGCGCCTTCGGGAGGCGAGCA
GGCAGCAGTTGGCCGTGCCGTAGCAGCGTCCCGCGCGCGGCGGGCAGCGGCCCAGGAGG
CGCGTGGCGGCGCTCGGCCTCGCGGCGGCGG CGGCGGCAGCGG CCCAG
CAGTTGGCG GC
GAGCGCGTCTGCGCCTGCGCGGCGGGCCCCGCGCCCCTCCTCCCCCCCTG GGCGCCCCCGG
CG GCGTGTGAATGGCGGCCTCCGCGGCGGCAGCCTCGG CAGCAGCGG CCTCGGCCGCCTC
TGGCAGCCCGGGCCCGGGCGAGGGCTCCGCTtga cga ctgtgccttctagttgccagccatctgttgtttg cccctcccccgtg ccttccttga ccctgga a ggtgcca ctccca ctgtcctttccta ata a aatgagga a attgcatcgcat tgtctga gtaggtgtca ttctattctggggggtggggtggggcaggacagca agggggaggattggga ag a ca atagc aggcatgctggggatgcggtgggctctatggGGATCTGCGATCGCTCCGGTGCCCGTCAGTGGGCAGA
GCGCACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGG GTCGGCAATTGAACGGGTG
CCTAGAGAAGGTGGCGCGGGGTAAACTG GGAAAGTGATGTCGTGTACTGGCTCCGCCTTT
TTCCCGAGGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCG
CAACGGGTTTG CCGCCAGAACACAGCTGAAGCTTCGAGG GGCTCGCATCTCTCCTTCACGC
GCCCGCCGCCCTACCTGAGG CCGCCATCCACGCCGGTTGAGTCGCGTTCTGCCGCCTCCCGC
CTGTGGTGCCTCCTGAACTGCGTCCGCCGTCTAG GTAAGTTTAAAGCTCAGGTCGAGACCG
GGCCTTTGTCCGGCGCTCCCTTGGAGCCTACCTAGACTCAGCCGGCTCTCCACGCTTTGCCT
GACCCTGCTTGCTCAACTCTACGTCTTTGTTTCGTTTTCTGTTCTGCGCCGTTACAGATCCAA
GCTGTGACCGGCGCCTACTCTAGAGCTAGCGAATTgccgccaccATGGTGAGCAAGGGCGAG
GAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCAC
AAGTTCAG CGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAG
TTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTA
CG GCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCC
GCCATGCCCGAAGGCTACGTCCAG GAG CGCACCATCTTCTTCAAGGACGACGGCAACTACA
AGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGG
GCATCGACTTCAAG GAG GACGGCAACATCCTGG GGCACAAGCTGGAGTACAACTACAACA
GCCACAACGTCTATATCATGG CCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGAT
CCGCCACAACATCGAGGACGGCAG CGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCC
ATCGG CGACGG CCCCGTGCTGCTGCCCGACAACCACTACCTGAG CACCCAGTCCAAGCTGA
GCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTG CTGG AGTTCGTGACCGCCGCCG
GGATCACTCTCGGCATGGACGAGCTGTACAAatgattgtttattgcagcttataatggtta ca a ata a agc aatagcatcaca a a tttca ca a ata a agcatttttttca ctgcattctagttgtggtttgtcca a a ctcatca atgtatctta tGGCGGCGAAAAGCGCTCCACCGCCCCTTCGGCCGCAGCCTCGGCCTCTGCCTCAGCCGCG
GCGTCGTCGCCCGCGGGG GGCGGCG CCGAGGCGCTGGAGCTGCTGGAGCACTGCGGCGT
GTGCAGAGAGCGCCTGCGACCCGAGAGG GAGCCCCGCCTGCTGCCCTGTTTGCACTCGGC
CTGTAGTGCCTGCTTAGGGCCCGCGGCCCCCGCCG CCGCCAACAGCTCGGGGGACGGCGG
GGCGGCGGGCGACGGCACCGGTAAGTACGAAGTGATCG GTG CCACCCCTCCCCCTACTCTC
TG CCTTTGATTCCGACTG G GTG CAGAGATGAG GATGCCACCTGGGCGAGAGGATGG GGGC
CCG GACAG GGCACGG GAAATACTTTCTGGGTCCTGCATACGAACGTGGGTTTGTGCTGGCC
GCTGAGATGGGACATCTGACTAAAGTTGG

Claims (148)

1. An engineered cell, comprising at least one sequence encoding a transgene, wherein the at least one sequence is inserted within a safe harbor locus, wherein the safe harbor locus is at any one or more of the sgRNA target loci provided in Table 4; and wherein expression of the at least one sequence encoding the transgene is operatively linked to an endogenous promoter.
2. An engineered cell, comprising at least one sequence encoding a transgene, wherein the at least one sequence is inserted within a safe harbor locus, wherein the safe harbor locus is at any one or more of the sgRNA target loci provided in Table 4; and wherein expression of the at least one sequence encoding the transgene is operatively linked to an exogenous promoter.
3. The engineered cell of claim 1 or 2, wherein the sgRNA target locus is selected from: chr11:128340000-128350000, chr10:33130000-33140000, chr10:72290000-72300000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, or chr9:7970000-7980000.
4. The engineered cell of any one of claims 1-3, wherein the sgRNA target locus is selected from: chr11:128340000-128350000, chr10:72290000-72300000, chr15:92830000-92840000, or chr16:11220000-11230000.
5. The engineered cell of any one of claims 1-4, wherein the sgRNA target locus is chr11:128340000-128350000.
6. The engineered cell of any one of claims 1-4, wherein the sgRNA target locus is chr15:92830000-92840000.
7. The engineered cell of claim 1 or 2, wherein the sgRNA target locus is a gene selected from: APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, or TRIM28.
8. The engineered cell of any one of claims 1-3, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS88, GS89, GS90, GS91, GS92, GS93, GS95, GS96, GS97, GS98, GS99, GS100, GS101, GS102, GS103, GS104, GS105, GS106, GS107, GS108, GS109, GS110, GS111, GS112, GS113, GS114, GS115, GS116, GS117, GS118, GS119, or GS120.
9. The engineered cell of any one of claims 1-4, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS91, GS92, GS93, GS95, GS96, GS100, GS101, GS102, GS103, GS104, and GS105.
10. The engineered cell of claim 8 or 9, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS103, GS104, or GS105.
11. The engineered cell of claim 8 or 9, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS95, or GS96.
12. The engineered cell of claim 8 or 9, wherein the safe harbor locus is a GS94 integration site in Table 4.
13. The engineered cell of claim 8 or 9, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS100, GS101, or GS102.
14. The engineered cell of claim 8 or 9, wherein the safe harbor locus is a GS102 integration site in Table 4.
15. The engineered cell of claim 8 or 9, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS91, GS92, or GS93.
16. The engineered cell of any one of claims 2-15, wherein the exogenous promoter is an EF1a promoter.
17. The engineered cell of any one of claims 1-16, wherein the engineered cell is a stem cell, a human cell, a primary cell, an hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or a T cell progenitor.
18. The engineered cell of claim 17, wherein the cell is a T cell or a T cell progenitor.
19. The engineered cell of any one of claims 1-18, wherein the engineered cell is undifferentiated.
20. The engineered cell of any one of claims 1-18, wherein the engineered cell is CD45RA+ and CCR7+.
21. The engineered cell of any one of claims 1-20, wherein the transgene encodes a recombinant protein, optionally a therapeutic agent.
22. The engineered cell of any one of claims 1-21, wherein the transgene encodes a chimeric antigen receptor (CAR).
23. A composition comprising the engineered cell of any one of claims 1-22 and a pharmaceutical excipient.
24. A guide ribonucleic acid (gRNA) for editing a cell at a safe harbor locus, wherein the gRNA comprises any one of the sgRNA sequences in Table 4.
25. The gRNA of claim 24, wherein the gRNA comprises any one of SEQ ID NOS: 1-120.
26. The gRNA of claim 24 or 25, wherein the gRNA comprises any one of SEQ ID NOS: 91-96 and 100-105.
27. The gRNA of any one of claims 24-26, wherein the gRNA comprises SEQ ID NO:94 or SEQ ID NO:102.
28. The gRNA of any one of claims 24-26, wherein the gRNA comprises SEQ ID NO:94.
29. The gRNA of any one of claims 24-26, wherein the gRNA comprises SEQ ID NO:102.
30. The gRNA of any one of claims 24-29, wherein the cell is a stem cell, a human cell, a primary cell, an hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or a T cell progenitor.
31. A method of editing a cell having chromosomal DNA, comprising inserting at least one sequence encoding a transgene within a safe harbor locus in the chromosomal DNA of the cell, wherein the safe harbor locus is any one or more of the sgRNA target loci provided in Table 4.
32. The method of claim 31, wherein the sgRNA target locus is selected from: chr11:128340000-128350000, chr10:33130000-33140000, chr10:72290000-72300000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, or chr9:7970000-7980000.
33. The method of claim 31 or 32, wherein the sgRNA target locus is selected from: chr11:128340000-128350000, chr10:72290000-72300000, chr15:92830000-92840000, or chr16:11220000-11230000.
34. The method of any one of claims 31-33, wherein the sgRNA target locus is chr11:128340000-128350000.
35. The method of any one of claims 31-33, wherein the sgRNA target locus is chr15:92830000-92840000.
36. The method of claim 31, wherein the sgRNA target locus is a gene selected from: APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, or TRIM28.
37. The method of claim 31 or 32, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS88, GS89, GS90, GS91, GS92, GS93, GS95, GS96, GS97, GS98, GS99, GS100, GS101, GS102, GS103, GS104, GS105, GS106, GS107, GS108, GS109, GS110, GS111, GS112, GS113, GS114, GS115, GS116, GS117, GS118, GS119, or GS120.
38. The method of any one of claims 31-33, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS91, GS92, GS93, GS95, GS96, GS100, GS101, GS102, GS103, GS104, or GS105.
39. The method of any one of claims 31-33, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS103, GS104, or GS105.
40. The method of any one of claims 31-33, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS95, or GS96.
41. The method of any one of claims 31-33, wherein the safe harbor locus is the GS94 integration site in Table 4.
42. The method of any one of claims 31-33, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS100, GS101, or GS102.
43. The method of any one of claims 31-33, wherein the safe harbor locus is the GS102 integration site in Table 4.
44. The method of any one of claims 31-33, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS91, GS92, or GS93.
45. The method of any one of claims 31-44, wherein the transgene encodes a recombinant protein, optionally a therapeutic agent.
46. The method of any one of claims 31-45, wherein the transgene encodes a chimeric antigen receptor (CAR).
47. The method of any one of claims 31-46, wherein the at least one sequence comprises an exogenous promoter and the exogenous promoter is operably linked to the transgene.
48. The method of claim 47, wherein the exogenous promoter is an EF1a promoter.
49. The method of any one of claims 31-48, wherein the cell is a stem cell, a human cell, a primary cell, an hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or T cell progenitor.
50. The method of claim 49, wherein the cell is a T cell or a T cell progenitor.
51. The method of any one of claims 31-50, wherein the engineered cell is undifferentiated.
52. The method of any one of claims 31-51, wherein the engineered cell is CD45RA+ and CCR7+.
53. The method of any one of claims 31-52, wherein the at least one sequence is inserted using a homology-directed repair.
54. The method of any one of claims 31-52, wherein the at least one sequence is inserted using a homology independent targeted insertion.
55. The method of any one of claims 31-54, wherein the at least one sequence is inserted using one or more guide ribonucleic acids (gRNAs) and one or more Cas9 endonucleases.
56. The method of claim 55, wherein the one or more gRNAs comprises any one of SEQ ID NOS: 1-120.
57. The method of claim 55 or 56, wherein the one or more gRNAs comprises any one of SEQ ID NOS: 91-96 and 100-105.
58. The method of any one of claims 55-57, wherein the gRNA comprises SEQ ID NO:94 or SEQ ID NO:102.
59. The method of any one of claims 55-58, wherein the gRNA comprises SEQ ID NO:94.
60. The method of any one of claims 55-58, wherein the gRNA comprises SEQ ID NO:102.
61. A method of editing a T cell, comprising contacting a T cell with one or more guide ribonucleic acids (gRNAs), at least one sequence encoding a transgene, and one or more Cas9 endonucleases, wherein the one or more gRNAs and Cas9 endonucleases facilitate the insertion of the at least one sequence into chromosomal DNA within a safe harbor locus, wherein the safe harbor locus is selected from any one or more of the sgRNA target loci in Table 4.
62. The method of claim 61, wherein the one or more gRNAs comprises a sequence selected from any one of the sgRNA sequences in Table 4.
63. The method of claim 61 or 62, wherein the one or more gRNAs comprises any one of SEQ ID NOS: 1-120.
64. The method of any one of claims 61-63, wherein the one or more gRNAs comprises any one of SEQ ID NOS: 91-96 and 100-105.
65. The method of any one of claims 61-64, wherein the gRNA comprises SEQ ID NO:94 or SEQ ID NO:102.
66. The method of any one of claims 61-65, wherein the gRNA comprises SEQ ID NO:94.
67. The method of any one of claims 61-65, wherein the gRNA comprises SEQ ID NO:102.
68. The method of any one of claims 61-67, wherein the sgRNA target locus is selected from: chr11:128340000-128350000, chr10:33130000-33140000, chr10:72290000-72300000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, or chr9:7970000-7980000.
69. The method of any one of claims 61-68, wherein the sgRNA target locus is selected from: chr11:128340000-128350000, chr10:72290000-72300000, chr15:92830000-92840000, or chr16:11220000-11230000.
70. The method of any one of claims 61-69, wherein the sgRNA target locus is chr11:128340000-128350000.
71. The method of any one of claims 61-69, wherein the sgRNA target locus is chr15:92830000-92840000.
72. The method of any one of claims 61-67, wherein the sgRNA target locus is a gene selected from: APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, or TRIM28.
73. The method of any one of claims 61-68, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS88, GS89, GS90, GS91, GS92, GS93, GS95, GS96, GS97, GS98, GS99, GS100, GS101, GS102, GS103, GS104, GS105, GS106, GS107, GS108, GS109, GS110, GS111, GS112, GS113, GS114, GS115, GS116, GS117, GS118, GS119, or GS120.
74. The method of any one of claims 61-68 and 73, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS91, GS92, GS93, GS95, GS96, GS100, GS101, GS102, GS103, GS104, or GS105.
75. The method of any one of claims 61-68 and 73-74, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS103, GS104, or GS105.
76. The method of any one of claims 61-68 and 73-74, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS94, GS95, and GS96.
77. The method of claim 76, wherein the safe harbor locus is the GS94 integration site in Table 4.
78. The method of any one of claims 61-68 and 73-74, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS100, GS101, or GS102.
79. The method of claim 78, wherein the safe harbor locus is the GS102 integration site in Table 4.
80. The method of any one of claims 61-68 and 73-74, wherein the safe harbor locus is selected from any one of the integration sites in Table 4 designated: GS91, GS92, or GS93.
81. The method of any one of claims 61-80, wherein the engineered cell is undifferentiated.
82. The method of any one of claims 61-81, wherein the engineered cell is CD45RA+ and CCR7+.
83. An ex vivo method of obtaining an engineered cell or population thereof, comprising:
a. obtaining a cell;
b. genetically modifying the cell by inserting at least one sequence encoding a transgene within a safe harbor locus, wherein the safe harbor locus is selected from any one of the sgRNA target loci in Table 4.
84. The method of claim 83, wherein obtaining the cell comprises: (i) collecting a tissue sample from a subject, (ii) isolating the cells from the tissue samples, and (iii) culturing the cells in vitro.
85. The method of claim 84, wherein the tissue sample is a blood sample.
86. The method of any one of claims 83-85, wherein the cell is a stem cell, a human cell, a primary cell, an hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell, or T cell progenitor.
87. The method of claim 86, wherein the cell is a T cell or a T cell progenitor.
88. The method of any one of claims 83-87, wherein the engineered cell is undifferentiated.
89. The method of any one of claims 83-88, wherein the engineered cell is CD45RA+ and CCR7+.
90. The method of any one of claims 83-89, wherein the at least one sequence is inserted using a homology-directed repair.
91. The method of any one of claims 83-89, wherein the at least one sequence is inserted using a homology independent targeted insertion.
92. The method of any one of claims 83-91, wherein the genetically modifying in step (b) comprises contacting the cell with one or more guide ribonucleic acids (gRNAs), the at least one sequence, and one or more Cas9 endonucleases, wherein the one or more gRNAs and Cas9 endonucleases facilitate the insertion of the at least one sequence into chromosomal DNA within the safe harbor locus.
93. The method of claim 92, wherein the one or more gRNAs comprises a sequence selected from any one of the sgRNA sequences in Table 4.
94. The method of any one of claims 83-93, wherein the transgene encodes a recombinant protein, optionally a therapeutic agent.
95. The method of any one of claims 83-93, wherein the transgene encodes a chimeric antigen receptor (CAR).
96. The method of any one of claims 83-95, wherein the at least one sequence comprises an exogenous promoter and the exogenous promoter is operably linked to the transgene.
97. The method of claim 96, wherein the exogenous promoter is an EF1a promoter.
98. A method of treating a subject having or at risk of having a disease, comprising administering to the subject an effective amount of the cell of any one of claims 1-22, a population thereof, or the composition of claim 23.
99. The method of claim 98, wherein the cell, the population thereof, or the composition is administered to the subject by infusion.
100. A method of treating a subject having or at risk of having a disease, comprising:
a. conducting the method of any one of claims 83-97, and
b. administering to the subject an effective amount of a composition comprising the cell or a population thereof.
101. The method of claim 100, wherein the composition is administered to the subject by infusion.
102. The method of claim 100 or 101, wherein the disease is cancer.
103. The method of any one of claims 100-102, wherein the disease is blood cancer.
104. A method of identifying a safe harbor locus, comprising:
a. identifying genes or non-coding regions in a chromosome that are above a threshold level for expression across developmental cell states and/or a threshold level for accessibility of chromatin;
b. generating a linear model that correlates the gene or non-coding region from step (a) with knock-in (KI) efficiency and estimates the KI efficiency of any gene or coding region on the chromosome; and
c. selecting the safe harbor locus based on threshold parameters;
wherein the safe harbor locus is selected for insertion of at least one sequence encoding a transgene within a cell.
105. The method of claim 104, wherein the threshold parameters include one or more of: stable expression of a transgene, knockout of the gene confers benefit to the function of the cell, no known function within the cell, stable transgene expression in vitro with or without CD3/CD28 stimulation, negligible off-target cleavage as detected by iGuide-Seq or CRISPR-Seq, less off-target cleavage relative to other loci as detected by iGuide-Seq or CRISPR-Seq, negligible transgene-independent cytotoxicity, negligible transgene-independent cytokine expression, negligible transgene-independent chimeric antigen receptor expression, negligible deregulation or silencing of nearby genes, and positioned outside of a cancer-related gene.
106. The method of claim 105, wherein the stable expression of a transgene at the safe harbor locus is less than or equal to 2-fold expression change over the course of at least 1, 2, 3, 4, 5, 6, or 7 days, and wherein expression change is measured by mean fluorescence intensity of a reporter gene encoded by the at least one sequence.
107. The method of any one of claims 104-106, wherein the accessibility of chromatin is measured using an assay for transposase-accessible chromatin using sequencing (ATAC-seq).
108. The method of any one of claims 104-107, wherein the level of expression across developmental cell states is measured using RNA sequencing (RNA-seq).
109. The method of any one of claims 104-108, wherein the cell is a stem cell, a human cell, a primary cell, an hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or T cell progenitor.
110. The method of any one of claims 104-109, wherein the linear model has a coefficient of determination (R² value) of at least 30%.
111. The engineered cell, composition, gRNA or method of any one of the preceding claims, wherein insertion within the safe harbor locus increases cell cytotoxicity of diseased cells.
112. The engineered cell, composition, gRNA or method of any one of the preceding claims wherein knock-in efficiency at the safe harbor locus is increased relative to other locations along the chromosome.
113. An engineered cell, comprising at least one sequence encoding a transgene, wherein the at least one sequence is inserted within a safe harbor locus, wherein the safe harbor locus is at any one or more of an sgRNA target loci; and wherein expression of the at least one sequence encoding the transgene is operatively linked to an endogenous promoter or an exogenous promoter, and wherein the engineered cell is undifferentiated.
114. The engineered cell of claim 113, wherein the safe harbor locus is selected from any one of the integration sites designated: GS94, GS88, GS89, GS90, GS91, GS92, GS93, GS95, GS96, GS97, GS98, GS99, GS100, GS101, GS102, GS103, GS104, GS105, GS106, GS107, GS108, GS109, GS110, GS111, GS112, GS113, GS114, GS115, GS116, GS117, GS118, GS119, or GS120.
115. The engineered cell of claim 113 or 114, wherein the safe harbor locus is the GS94 integration site.
116. The engineered cell of claim 113, wherein the sgRNA target locus is selected from: chr11:128340000-128350000, chr10:33130000-33140000, chr10:72290000-72300000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, or chr9:7970000-7980000.
117. The engineered cell of claim 113, wherein the sgRNA target locus is a gene selected from: APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, or TRIM28.
118. The engineered cell of any one of claims 113-117, wherein the one or more gRNAs comprises any one of SEQ ID NOS: 1-120.
119. The engineered cell of any one of claims 113-118, wherein the engineered cell is a stem cell, a human cell, a primary cell, an hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or a T cell progenitor.
120. The engineered cell of claim 119, wherein the cell is a T cell or a T cell progenitor.
121. The engineered cell of any one of claims 113-120, wherein the engineered cell is CD45RA+ and CCR7+.
122. The engineered cell of any one of claims 113-121, wherein the transgene encodes a recombinant protein, optionally a therapeutic agent.
123. The engineered cell of any one of claims 113-122, wherein the transgene encodes a chimeric antigen receptor (CAR).
124. A composition comprising the engineered cell of any one of claims 113-123 and a pharmaceutical excipient.
125. A method of editing a cell having chromosomal DNA, comprising inserting at least one sequence encoding a transgene within a safe harbor locus in the chromosomal DNA of the cell, wherein the safe harbor locus is at any one or more of the sgRNA target loci; and wherein the engineered cell is undifferentiated.
126. The method of claim 125, wherein the engineered cell is a stem cell, a human cell, a primary cell, an hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or a T cell progenitor.
127. The method of claim 125 or 126, wherein the cell is a T cell or a T cell progenitor.
128. A method of editing a T cell, comprising contacting a T cell with one or more guide ribonucleic acids (gRNAs), at least one sequence encoding a transgene, and one or more Cas9 endonucleases, wherein the one or more gRNAs and Cas9 endonucleases facilitate the insertion of the at least one sequence into chromosomal DNA within a safe harbor locus.
129. The method of any one of claims 125-128, wherein the safe harbor locus is selected from any one of the integration sites designated: GS88, GS89, GS90, GS91, GS92, GS93, GS94, GS95, GS96, GS97, GS98, GS99, GS100, GS101, GS102, GS103, GS104, GS105, GS106, GS107, GS108, GS109, GS110, GS111, GS112, GS113, GS114, GS115, GS116, GS117, GS118, GS119, or GS120.
130. The method of any one of claims 125-128, wherein the safe harbor locus is the GS94 integration site.
131. The method of any one of claims 125-128, wherein the sgRNA target locus is selected from: chr11:128340000-128350000, chr10:33130000-33140000, chr10:72290000-72300000, chr11:65425000-65427000 (NEAT1), chr15:92830000-92840000, chr16:11220000-11230000, chr2:87460000-87470000, chr3:186510000-186520000, chr3:59450000-59460000, chr8:127980000-128000000, or chr9:7970000-7980000.
132. The method of any one of claims 125-128, wherein the sgRNA target locus is a gene selected from: APRT, B2M, CAPNS1, CBLB, CD2, CD3E, CD3G, CD5, EDF1, FTL, PTEN, PTPN2, PTPN6, PTPRC, PTPRCAP, RPS23, RTRAF, SERF2, SLC38A1, SMAD2, SOCS1, SRP14, SRSF9, SUB1, TET2, TIGIT, TRAC, or TRIM28.
133. The method of any one of claims 125-132, wherein the one or more gRNAs comprises any one of SEQ ID NOS: 1-120.
134. The method of any one of claims 125-133, wherein the engineered cell is CD45RA+ and CCR7+ after insertion of the at least one sequence into the safe harbor locus.
135. The method of any one of claims 125-134, wherein the transgene encodes a recombinant protein, optionally a therapeutic agent.
136. The method of any one of claims 125-135, wherein the transgene encodes a chimeric antigen receptor (CAR).
137. An ex vivo method of obtaining an undifferentiated engineered cell or population thereof, comprising:
c. obtaining a cell;
d. genetically modifying the cell by inserting at least one sequence encoding a transgene within a safe harbor locus, wherein the engineered cell is undifferentiated.
138. The method of claim 137, wherein obtaining the cell comprises: (i) collecting a tissue sample from a subject, (ii) isolating the cells from the tissue samples, and (iii) culturing the cells in vitro.
139. The method of claim 138, wherein the tissue sample is a blood sample.
140. The method of any one of claims 137-139, wherein the cell is a stem cell, a human cell, a primary cell, an hematopoietic cell, an adaptive immune cell, an innate immune cell, a T cell or T cell progenitor.
141. The method of claim 140, wherein the cell is a T cell or a T cell progenitor.
142. The method of any one of claims 137-142, wherein the engineered cell is CD45RA+ and CCR7+.
143. The method of any one of claims 137-143, wherein the transgene encodes a recombinant protein, optionally a therapeutic agent.
144. The method of any one of claims 137-143, wherein the transgene encodes a chimeric antigen receptor (CAR).
145. A method of treating a subject having or at risk of having a disease, comprising administering to the subject an effective amount of the cell of any one of claims 113-123, a population thereof, or the composition of claim 124.
146. A method of treating a subject having or at risk of having a disease, comprising:
c. conducting the method of any one of claims 125-144, and
d. administering to the subject an effective amount of a composition comprising the cell or a population thereof.
147. The method of claim 146, wherein the composition is administered to the subject by infusion.
148. The method of claim 146 or 147, wherein the disease is cancer.
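Illustrative note (not part of the claims): the locus-identification method of claims 104-110 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the applicant's implementation. The function names, the example data, the choice of ordinary least squares, and the reading of "2-fold expression change" as the ratio of the highest to the lowest mean fluorescence intensity (MFI) are all hypothetical. Only the 30% R² floor (claim 110) and the 2-fold limit (claim 106) come from the claim text.

```python
from typing import List, Sequence

MIN_R2 = 0.30          # claim 110: R^2 of at least 30%
MAX_FOLD_CHANGE = 2.0  # claim 106: at most 2-fold expression change


def fit_linear_model(features: Sequence[Sequence[float]],
                     ki_efficiency: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (an assumed choice).

    Each feature row might hold, e.g., an RNA-seq expression level and an
    ATAC-seq accessibility score. Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0, *row] for row in features]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, ki_efficiency))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit the model")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coefs: Sequence[float], row: Sequence[float]) -> float:
    """Estimated KI efficiency for one locus."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


def r_squared(coefs: Sequence[float],
              features: Sequence[Sequence[float]],
              ki_efficiency: Sequence[float]) -> float:
    """Coefficient of determination of the fitted model."""
    mean_y = sum(ki_efficiency) / len(ki_efficiency)
    ss_res = sum((y - predict(coefs, row)) ** 2
                 for row, y in zip(features, ki_efficiency))
    ss_tot = sum((y - mean_y) ** 2 for y in ki_efficiency)
    if ss_tot == 0:
        raise ValueError("KI efficiency has no variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def is_stably_expressed(mfi_by_day: Sequence[float],
                        max_fold: float = MAX_FOLD_CHANGE) -> bool:
    """True if the highest and lowest MFI differ by at most max_fold."""
    if min(mfi_by_day) <= 0:
        raise ValueError("MFI values must be positive")
    return max(mfi_by_day) / min(mfi_by_day) <= max_fold


# Made-up data: (expression, accessibility) per locus and measured KI efficiency.
EXAMPLE_FEATURES = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
EXAMPLE_KI = [1.0, 3.0, 4.0, 6.0, 8.0]
```

A candidate locus would then be retained only if the fitted model clears `MIN_R2` and the reporter MFI series passes `is_stably_expressed`; the remaining threshold parameters of claim 105 are not modeled here.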
CA3196269A 2020-10-26 2021-10-26 Safe harbor loci Pending CA3196269A1 (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US202063105834P 2020-10-26 2020-10-26
US63/105,834 2020-10-26
US202163141926P 2021-01-26 2021-01-26
US63/141,926 2021-01-26
US202163179143P 2021-04-23 2021-04-23
US63/179,143 2021-04-23
PCT/US2021/056689 WO2022093846A1 (en) 2020-10-26 2021-10-26 Safe harbor loci

Publications (1)

Publication Number Publication Date
CA3196269A1 (en) 2022-05-05

Family

ID=81383221

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3196269A Pending CA3196269A1 (en) 2020-10-26 2021-10-26 Safe harbor loci

Country Status (14)

Country Link
EP (1) EP4232049A1 (en)
JP (1) JP2023547887A (en)
KR (1) KR20230101839A (en)
AU (1) AU2021369494A1 (en)
BR (1) BR112023007874A2 (en)
CA (1) CA3196269A1 (en)
CL (1) CL2023001176A1 (en)
CO (1) CO2023006809A2 (en)
CR (1) CR20230220A (en)
DO (1) DOP2023000080A (en)
IL (1) IL302315A (en)
MX (1) MX2023004822A (en)
PE (1) PE20231514A1 (en)
WO (1) WO2022093846A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20240099259A (en) 2021-10-14 2024-06-28 아스널 바이오사이언시스, 인크. Immune cells with co-expressed SHRNA and logic gate system
CN115896227A (en) * 2022-12-20 2023-04-04 西北农林科技大学 Method for identifying exogenous gene integration site to enhance transgene expression

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040142325A1 (en) * 2001-09-14 2004-07-22 Liat Mintz Methods and systems for annotating biomolecular sequences
US20070161031A1 (en) * 2005-12-16 2007-07-12 The Board Of Trustees Of The Leland Stanford Junior University Functional arrays for high throughput characterization of gene expression regulatory elements
JP2016500254A (en) * 2012-12-05 2016-01-12 サンガモ バイオサイエンシーズ, インコーポレイテッド Methods and compositions for the regulation of metabolic diseases
US20210054405A1 (en) * 2018-03-02 2021-02-25 Generation Bio Co. Closed-ended dna (cedna) vectors for insertion of transgenes at genomic safe harbors (gsh) in humans and murine genomes

Also Published As

Publication number Publication date
PE20231514A1 (en) 2023-09-28
WO2022093846A1 (en) 2022-05-05
KR20230101839A (en) 2023-07-06
JP2023547887A (en) 2023-11-14
IL302315A (en) 2023-06-01
MX2023004822A (en) 2023-05-10
CL2023001176A1 (en) 2024-03-08
EP4232049A1 (en) 2023-08-30
BR112023007874A2 (en) 2023-10-24
AU2021369494A1 (en) 2023-06-22
CR20230220A (en) 2023-08-24
DOP2023000080A (en) 2023-08-31
CO2023006809A2 (en) 2023-08-09
