WO2020181180A1 - A:T to C:G base editors and uses thereof

Publication number
WO2020181180A1
Authority
WO
WIPO (PCT)
Prior art keywords
fusion protein
cas9
oxidase
sequence
seq
Prior art date
Application number
PCT/US2020/021362
Other languages
French (fr)
Inventor
David R. Liu
Jordan Leigh DOMAN
Jaron August McClure MERCER
Original Assignee
The Broad Institute, Inc.
President And Fellows Of Harvard College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Broad Institute, Inc. and President And Fellows Of Harvard College

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004 Oxidoreductases (1.)
    • C12N9/0071 Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y114/00 Oxidoreductases acting on paired donors, with incorporation or reduction of molecular oxygen (1.14)
    • C12Y114/11 Oxidoreductases acting on paired donors, with incorporation or reduction of molecular oxygen (1.14) with 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors (1.14.11)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y114/00 Oxidoreductases acting on paired donors, with incorporation or reduction of molecular oxygen (1.14)
    • C12Y114/11 Oxidoreductases acting on paired donors, with incorporation or reduction of molecular oxygen (1.14) with 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors (1.14.11)
    • C12Y114/11033 DNA oxidative demethylase (1.14.11.33)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/80 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor

Definitions

  • Targeted editing of nucleic acid sequences is a highly promising approach for the study of gene function and also has the potential to provide new therapies for genetic diseases, including those caused by point mutations.
  • Point mutations represent the majority of known human genetic variants associated with disease. Developing robust methods to introduce and correct point mutations is therefore important in understanding and treating diseases with a genetic component.
  • Base editing involves the conversion of a specific nucleic acid base into another at a targeted genomic locus. For certain approaches, this can be achieved without requiring double-stranded DNA breaks (DSB). Since many genetic diseases arise from point mutations, this technology has important implications in the study of human health and disease. Engineered base editors are capable of editing many targets with high efficiency, often achieving editing of 30-70% of cells following a single treatment, without selective enrichment of the cell population for editing events.
  • DSB double-stranded DNA breaks
  • Base editors are typically fusions of a Cas (“CRISPR-associated”) domain and a nucleobase modification domain (e.g., a natural or evolved deaminase, such as a cytidine deaminase, examples of which include APOBEC1 (“apolipoprotein B mRNA editing enzyme, catalytic polypeptide 1”), CDA (“cytidine deaminase”), and AID (“activation-induced cytidine deaminase”) domains).
  • base editors may also include proteins or domains that alter cellular DNA repair processes to increase the efficiency and/or stability of the resulting single-nucleotide change.
  • C-to-T base editors use a cytidine deaminase to convert cytidine to uridine in the single-stranded DNA loop created by the Cas9 (“CRISPR-associated protein 9”) domain.
  • the opposite strand is nicked by Cas9 to stimulate DNA repair mechanisms that use the edited strand as a template, while a fused uracil glycosylase inhibitor slows excision of the edited base.
  • DNA repair leads to a C:G to T:A base pair conversion.
  • This class of base editor is described in U.S. Patent Publication No. 2017/0121693, published May 4, 2017, which issued on January 1, 2019 as U.S. Patent No. 10,167,457, which is incorporated by reference in its entirety herein.
  • a major limitation of base editing is the inability to generate transversion (purine ↔ pyrimidine) changes, which are needed to correct ~38% of known human pathogenic SNPs. See Komor, A.C. et al., Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage, Nature 533, 420-424 (2016) and Landrum, M.J. et al., ClinVar: public archive of relationships among sequence variation and human phenotype, Nucleic Acids Res. 42, D980-985 (2014), each of which is incorporated by reference. Of this ~38% of known pathogenic SNPs, about 15% arise from C:G to A:T mutations. Many C:G to A:T point mutations introduce premature stop codons (UAA, UAG, UGA), resulting in nonsense mutations in protein coding regions.
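To make the stop-codon point concrete, here is a minimal Python sketch (not taken from the patent) that enumerates which sense codons become a stop codon through a single C:G to A:T change. It assumes the codon is written on the DNA coding strand, so the change appears as either C->A or G->T; the function name `nonsense_via_cg_to_at` is hypothetical.

```python
from itertools import product

# DNA coding-strand spelling of the stop codons UAA, UAG, UGA.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# A C:G to A:T change shows up on the coding strand as C->A or G->T,
# depending on which strand carried the original C.
CG_TO_AT = {"C": "A", "G": "T"}


def nonsense_via_cg_to_at():
    """Map each sense codon to the stop codons reachable by one C->A or G->T change."""
    hits = {}
    for codon in ("".join(p) for p in product("ACGT", repeat=3)):
        if codon in STOP_CODONS:
            continue  # only sense codons can acquire a premature stop
        for i, base in enumerate(codon):
            if base in CG_TO_AT:
                mutated = codon[:i] + CG_TO_AT[base] + codon[i + 1:]
                if mutated in STOP_CODONS:
                    hits.setdefault(codon, []).append(mutated)
    return {codon: sorted(stops) for codon, stops in hits.items()}


if __name__ == "__main__":
    for codon, stops in sorted(nonsense_via_cg_to_at().items()):
        print(codon, "->", ", ".join(stops))
```

Under these assumptions seven sense codons qualify (GAA, GAG, GGA, TAC, TCA, TCG, TGC). Each such nonsense mutation is, in principle, the kind of site an A:T to C:G editor would revert.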
  • transversions can only be repaired by nuclease-mediated formation of a double-stranded break (DSB) followed by homology directed repair (HDR), which is typically inefficient, especially in non-mitotic cells, and leads to undesired by-products, such as indels (insertions and deletions) and translocations.
  • DSB double-stranded break
  • HDR homology directed repair
  • developing transversion base editors requires a new editing strategy, such as the manipulation of endogenous DNA repair pathways or a different nucleobase chemical transformation.
  • the present disclosure describes novel transversion base editors using an innovative adenine oxidation strategy.
  • the present invention greatly expands the capabilities of base editing.
  • the present disclosure provides transversion base editors which add to the repertoire of base editors that have already been developed.
  • the present disclosure provides for adenine-to-cytosine or “ACBE” (or thymine-to-guanine or “TGBE”) transversion base editors, which satisfy the need in the art for the installation of targeted single-base transversion nucleobase changes in a target nucleotide sequence, e.g., a genome.
  • the present disclosure provides for nucleic acid molecules encoding and/or expressing these transversion base editors, as well as expression vectors or constructs for expressing the transversion base editors described herein, host cells comprising said nucleic acid molecules and expression vectors, and compositions for delivering and/or administering nucleic acid-based embodiments described herein.
  • the disclosure provides for compositions comprising these transversion base editors.
  • the present disclosure provides for methods of making adenine-to-cytosine transversion base editors, as well as methods of using adenine-to-cytosine transversion base editors or nucleic acid molecules encoding such transversion base editors in applications including editing a nucleic acid molecule, e.g., a genome.
  • the present inventors have discovered strategies to develop novel transversion base editors. Specifically, the inventors have developed a novel adenine oxidation strategy to install transversion A-to-C and T-to-G nucleobase changes in a targeted manner. This new strategy allows for the efficient and specific transversion of A-to-C or T-to-G using the inventive base editors described herein.
  • 8-oxoA is read by a polymerase as a cytosine and the cell’s mismatch repair machinery converts the base-paired thymine of the non-edited strand to a guanine to correct the apparent mismatch.
  • the resulting base pairing features two three-center hydrogen bonding systems.
  • the cell’s mismatch repair machinery converts the 8-oxoA lesion to a cytosine. A desired A-to-C transversion is thus achieved.
  • Adenine oxidation is achieved by the targeted use of a fusion protein comprising a napDNAbp (e.g., a catalytically dead Cas9 (“dCas9”) or Cas9 nickase (“nCas9”)) domain, an adenine oxidase domain, and optionally a linker connecting these two domains (see FIG. 1).
  • the base editor fusion protein comprises (i) a nucleic acid programmable DNA binding protein (napDNAbp), and (ii) an adenine oxidase. The napDNAbp may be a Cas9 domain.
  • the napDNAbp may be a CasX, a CasY, a C2c1, a C2c2, a C2c3, a GeoCas9, a CjCas9, a Cas12a (formerly known as Cpf1), a Cas12b, a Cas12g, a Cas12h, a Cas12i, a Cas13b, a Cas13c, a Cas13d, a Cas14, a Csn2, an xCas9, an SpCas9-NG, an LbCas12a, an AsCas12a, a Cas9-KKH, a circularly permuted Cas9, an Ar
  • the adenine oxidase is a wild-type oxidase, or a variant thereof, that oxidizes an adenine in DNA to 8-oxoA.
  • the adenine oxidase comprises any one of the amino acid sequences of SEQ ID NOs: 5-8, 10, 15-20, 22-31, and 35-41. In particular embodiments, the adenine oxidase comprises the amino acid sequence of SEQ ID NO: 24.
  • a variant of the wild-type oxidase is produced by evolving an adenine oxidase enzyme using a directed evolution methodology.
  • the directed evolution methodology comprises phage assisted continuous evolution (PACE). In other embodiments, the evolution methodology comprises phage assisted non-continuous evolution (PANCE). In still other embodiments, the evolution methodology comprises other non-continuous evolutions, such as antibiotic or other discrete plate-based selections.
  • the fusion protein further comprises an inhibitor of base excision repair (“iBER”).
  • the iBER is a thymine-DNA glycosylase (TDG) inhibitor (“TDG inhibitor”), uracil-DNA glycosylase (UDG) inhibitor (“UDG inhibitor”), or an 8-oxo-guanine glycosylase (OGG or OGG1) inhibitor (“OGG inhibitor”).
  • TDG thymine-DNA glycosylase
  • UDG uracil-DNA glycosylase
  • OGG or OGG1 8-oxo-guanine glycosylase
  • the iBER comprises a catalytically inactive TDG that binds 8-oxoA to prevent its excision during subsequent mismatch repair.
  • the fusion proteins described herein may comprise any of the following structures: NH2-[adenine oxidase]-[napDNAbp]-COOH; or NH2-[napDNAbp]-[adenine oxidase]-COOH, wherein each instance of “]-[” comprises an optional linker.
  • the base editor fusion proteins described herein may comprise any of the following structures: NH2-[iBER]-[adenine oxidase]-[napDNAbp]-COOH; NH2-[adenine oxidase]-[iBER]-[napDNAbp]-COOH; NH2-[adenine oxidase]-[napDNAbp]-[iBER]-COOH; NH2-[iBER]-[adenine oxidase]-[napDNAbp]-COOH; NH2-[adenine oxidase]-[iBER]-[napDNAbp]-COOH; or NH2-[iBER]-[napDNAbp]-[adenine oxidase]-COOH, wherein each instance of “]-[” comprises an optional linker.
  • the linker fusing the napDNAbp, oxidase, and optional iBER may be any suitable amino acid linker sequence, polymer, or covalent bond.
  • exemplary linkers include any of the following amino acid sequences: SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 11); SGGSGGSGGS (SEQ ID NO: 12); GGG; GGGS (SEQ ID NO: 1); SGGGS (SEQ ID NO: 2); SGSETPGTSESATPES (SEQ ID NO: 48); or SGGS (SEQ ID NO: 14).
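The bracket notation and linker options above can be illustrated with a short Python sketch. It is only an illustration of the N-terminus to C-terminus ordering: the function names are hypothetical, and the two "domain" sequences in the usage lines are made-up placeholders, not sequences from the disclosure. Only the SGGS linker is taken from the list above.

```python
def render_architecture(domains):
    """Render an N-to-C domain order in the NH2-[...]-[...]-COOH notation."""
    return "NH2-" + "-".join(f"[{name}]" for name in domains) + "-COOH"


def join_with_linker(domain_seqs, linker=""):
    """Concatenate domain amino acid sequences N-to-C with `linker` at each junction.

    An empty linker models a direct covalent bond between adjacent domains.
    """
    return linker.join(domain_seqs)


if __name__ == "__main__":
    print(render_architecture(["adenine oxidase", "napDNAbp"]))
    # Placeholder peptide fragments (hypothetical), joined by the SGGS linker.
    print(join_with_linker(["MAAA", "MKKK"], linker="SGGS"))
```

Swapping the order of the list passed to `render_architecture` yields the other architectures, and passing a different linker string changes only the junctions.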
  • the disclosure provides nucleic acid molecules or constructs encoding any of the base editor fusion proteins, or domains thereof.
  • the nucleic acid sequences may be codon-optimized for expression in the cells of any organism of interest. In certain embodiments, the nucleic acid sequence is codon-optimized for expression in human cells.
  • the disclosure provides polynucleotides and/or vectors encoding any of the base editor fusion proteins described herein, or domains thereof.
  • These nucleic acid sequences are typically engineered or modified experimentally.
  • these nucleic acid sequences may be codon-optimized for expression in an organism of interest, e.g., mammalian cells.
  • the nucleic acid sequences are codon-optimized for expression in human cells.
  • cells containing such polynucleotides or constructs are provided.
  • complexes comprising any of the fusion proteins described herein and a guide RNA bound to the napDNAbp domain of the fusion protein are provided.
  • the disclosure provides a pharmaceutical composition comprising any of the fusion proteins described herein and a pharmaceutically acceptable excipient.
  • the pharmaceutical composition further comprises a gRNA.
  • the disclosure provides a kit comprising a nucleic acid construct that includes (i) a nucleic acid sequence encoding any of the fusion proteins described herein; (ii) a heterologous promoter that drives expression of the sequence of (i); and optionally an expression construct encoding a guide RNA backbone and the target sequence.
  • methods for targeted nucleic acid editing typically comprise i) contacting a nucleic acid sequence with a complex comprising any of the fusion proteins described herein and a guide nucleic acid, wherein the double-stranded DNA comprises a target A:T (or T:A) nucleobase pair, and ii) editing the thymine (or adenine) of the A:T (or T:A) nucleobase pair.
  • the methods may further comprise iii) cutting or nicking the non-edited strand of the double-stranded DNA.
  • methods of treatment using the inventive base editors are provided.
  • the methods described herein may comprise treating a subject having or at risk of developing a disease, disorder, or condition, comprising administering to the subject a fusion protein as described herein, a polynucleotide as described herein, a vector as described herein, or a pharmaceutical composition as described herein.
  • FIG. 1 is a schematic illustration showing an exemplary fusion protein of the invention.
  • a fusion protein comprising an nCas9 domain linked to an adenine oxidase enzyme is targeted to the correct adenine nucleobase through the hybridization of a single guide RNA (“sgRNA”) to a complementary sequence of nucleic acid.
  • sgRNA single guide RNA
  • the adenine oxidase oxidizes the adenine to an 8-oxoadenine, and subsequently, the cell’s native replication/repair machinery recognizes the mutated base and effects the desired change to a cytosine nucleobase.
  • 8oA, 8-oxoadenine; iBER, inhibitor of base excision repair; sgRNA, single-guide RNA; PAM, protospacer adjacent motif.
  • FIG. 2 depicts the chemical conversion of adenine to 8-oxoadenine, which disrupts existing hydrogen bonding with the thymine of the unmutated strand.
  • Steric rotation of the 8-oxoA around the glycosidic bond is induced, presenting the Hoogsteen edge for base pairing.
  • 8-oxoA is read by a polymerase as a cytosine, and the cell’s mismatch repair machinery converts the base-paired thymine of the non-edited strand to a guanine to correct the apparent mismatch.
  • the resulting base pairing features two three-center hydrogen bonding systems.
  • the cell’s mismatch repair machinery converts the 8-oxoA to a cytosine, thereby completing the desired A:T to C:G mutation.
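The net sequence outcome described for FIG. 2 can be sketched in a few lines of Python. This models only the end result (A becomes C on the edited strand, and the paired T becomes G on the other strand), not the 8-oxoA intermediate or the repair steps; the function names and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def apply_a_to_c_edit(edited_strand, position):
    """Apply an A-to-C transversion at `position` (0-based) of the edited strand.

    Returns (edited strand, complementary strand), both written 5'->3'.
    """
    if edited_strand[position] != "A":
        raise ValueError("target position does not hold an adenine")
    new_top = edited_strand[:position] + "C" + edited_strand[position + 1:]
    return new_top, reverse_complement(new_top)


if __name__ == "__main__":
    top = "GATTACA"  # arbitrary example sequence
    print("before:", top, reverse_complement(top))
    print("after: ", *apply_a_to_c_edit(top, 1))
```

For the example, the top strand changes from GATTACA to GCTTACA and the complementary strand from TGTAATC to TGTAAGC, which is the A:T to C:G conversion seen from both strands.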
  • FIG. 3 depicts a possible chemical mechanism for the α-ketoglutarate-dependent iron oxidase-mediated conversion of adenine to 8-oxoadenine.
  • An oxo group is transferred from a non-heme Fe(IV) center to the 8 position of adenine.
  • Formation of a 7,8-oxaziridine intermediate is induced, which rearranges spontaneously to the desired 8-oxoadenine.
  • FIG. 4 depicts an exemplary assay for selection of evolved variants of human AlkBH3 α-ketoglutarate-dependent iron oxidase that are highly effective at oxidizing adenine.
  • Plasmids containing mutagenized AlkBH3-dCas9 fusion proteins and targeting guide RNAs (sgRNAs), and selection plasmids containing an inactivated spectinomycin resistance gene with a mutation at the active site that requires A:T to C:G editing to correct, are transformed into E. coli cells, which are plated onto agar media containing spectinomycin and sucrose.
  • Cells harboring plasmids with AlkBH3 mutants that restore antibiotic resistance are isolated and subjected to further rounds of mutation and selection under varying selection stringencies.
  • AlkBH3 variants emerging from each round of selection are then expressed within a fusion construct comprising a Cas9 nickase (nCas9). The resulting fusion proteins are tested for base editing activity in mammalian cells.
  • FIG. 5 depicts the operation of an inhibitor of base excision repair (iBER) domain in exemplary base editor fusion proteins disclosed herein.
  • iBER inhibitor of base excision repair
  • competitive base excision repair may interfere with 8-oxoadenine-mediated base editing.
  • an iBER is fused to a fusion protein comprising an nCas9 domain and an adenine oxidase domain.
  • the iBER domain competes for binding of the 8-oxoadenine lesion with active, endogenous excision repair enzymes, preventing or slowing base excision repair.
  • oA oxoadenine
  • TDG thymine-DNA glycosylase
  • Pol δ, RCA and RCNF are types of mammalian DNA polymerases.
  • the term “accessory plasmid,” as used herein, refers to a plasmid comprising a gene required for the generation of infectious viral particles under the control of a conditional promoter.
  • transcription from the conditional promoter of the accessory plasmid is typically activated, directly or indirectly, by a function of the gene to be evolved.
  • the accessory plasmid serves the function of conveying a competitive advantage to those viral vectors in a given population of viral vectors that carry a version of the gene to be evolved able to activate the conditional promoter or able to activate the conditional promoter more strongly than other versions of the gene to be evolved.
  • only viral vectors carrying an “activating” version of the gene to be evolved will be able to induce expression of the gene required to generate infectious viral particles in the host cell, and, thus, allow for packaging and propagation of the viral genome in the flow of host cells.
  • Vectors carrying non-activating versions of the gene to be evolved will not induce expression of the gene required to generate infectious viral vectors, and, thus, will not be packaged into viral particles that can infect fresh host cells.
  • Exemplary accessory plasmids have been described, for example in U.S. Application No. 15/567,312, published as U.S. Pub. No. 2018/0087046, filed on April 15, 2016, the entire contents of which is incorporated by reference herein.
  • “Base editing” is a genome editing technology that involves the conversion of a specific nucleic acid base into another at a targeted genomic locus. In certain embodiments, this can be achieved without requiring double-stranded DNA breaks (DSB).
  • DSB double-stranded DNA breaks
  • CRISPR-based systems begin with the introduction of a DSB at a locus of interest. Subsequently, cellular DNA repair enzymes mend the break, commonly resulting in random insertions or deletions (indels) of bases at the site of the DSB.
  • there are 12 possible base-to-base changes that may occur via individual or sequential use of transition (i.e., a purine-to-purine change or pyrimidine-to-pyrimidine change) or transversion (i.e., a purine-to-pyrimidine or pyrimidine-to-purine) editors. These include:
  • C-to-T base editor (or “CTBE”). This type of editor converts a C:G Watson-Crick nucleobase pair to a T:A Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a G-to-A base editor (or “GABE”).
  • A-to-G base editor (or “AGBE”). This type of editor converts an A:T Watson-Crick nucleobase pair to a G:C Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a T-to-C base editor (or “TCBE”).
  • C-to-G base editor (or “CGBE”). This type of editor converts a C:G Watson-Crick nucleobase pair to a G:C Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a G-to-C base editor (or “GCBE”).
  • G-to-T base editor (or “GTBE”). This type of editor converts a G:C Watson-Crick nucleobase pair to a T:A Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a C-to-A base editor (or “CABE”).
  • A-to-T base editor (or “ATBE”). This type of editor converts an A:T Watson-Crick nucleobase pair to a T:A Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a T-to-A base editor (or “TABE”).
  • A-to-C base editor (or “ACBE”). This type of editor converts an A:T Watson-Crick nucleobase pair to a C:G Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a T-to-G base editor (or “TGBE”).
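The naming pattern in the list above follows mechanically from Watson-Crick complementarity, which a small Python sketch can show. It enumerates the 12 possible base-to-base changes, classifies each as a transition or transversion, and derives the equivalent name for the change as read on the complementary strand. The function names are hypothetical; the "XYBE" abbreviation scheme is extrapolated from the names in the list.

```python
from itertools import permutations

PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(src, dst):
    """Transition if both bases are purines or both pyrimidines, else transversion."""
    same_class = (src in PURINES) == (dst in PURINES)
    return "transition" if same_class else "transversion"


def editor_names(src, dst):
    """Return the editor name for src-to-dst and its complementary-strand equivalent."""
    return f"{src}{dst}BE", f"{COMPLEMENT[src]}{COMPLEMENT[dst]}BE"


# Every ordered pair of distinct bases: 4 * 3 = 12 possible changes.
CHANGES = list(permutations("ACGT", 2))

if __name__ == "__main__":
    for src, dst in CHANGES:
        name, alias = editor_names(src, dst)
        print(f"{src}-to-{dst}: {classify(src, dst)}, {name} (equivalently {alias})")
```

Of the 12 changes, 4 are transitions and 8 are transversions. Because each editor name has a complementary-strand alias, the 12 changes collapse into the six editor categories listed above.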
  • the term “base editors (BEs)”, as used herein, refers to the Cas-fusion proteins described herein.
  • the fusion protein comprises a nuclease-inactive Cas9 (dCas9) fused to an adenine oxidase which binds a nucleic acid in a guide RNA-programmed manner via the formation of an R-loop but does not cleave the nucleic acid.
  • dCas9 nuclease-inactive Cas9
  • the dCas9 domain of the fusion protein may include a D10A and a H840A mutation (which renders Cas9 capable of cleaving only one strand of a nucleic acid duplex) as described in PCT/US2016/058344 (filed on October 22, 2016 and published as WO
  • the DNA cleavage domain of S. pyogenes Cas9 includes two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain.
  • the HNH subdomain cleaves the strand
  • the RuvC1 mutant D10A generates a nick on the targeted strand
  • the HNH mutant H840A generates a nick on the non-targeted strand
  • the fusion protein comprises a Cas9 nickase fused to an adenine oxidase, e.g., an adenine oxidase which converts an adenine nucleobase to 8-oxoadenine.
  • base editors encompasses the base editors described herein as well as any base editor known or described in the art at the time of this filing or developed in the future. Reference is made to Rees & Liu, Base editing: precision chemistry on the genome and transcriptome of living cells, Nat Rev Genet. 2018;19(12):770-788; as well as U.S. Patent Publication No. 2018/0073012, published March 15, 2018, which issued as U.S.
  • Cas9 or “Cas9 nuclease” or “Cas9 domain” refers to a CRISPR associated protein 9, or variant thereof, and embraces any naturally occurring Cas9 from any organism, any naturally-occurring Cas9, any Cas9 homolog, ortholog, or paralog from any organism, and any variant of a Cas9, naturally-occurring or engineered. More broadly, a Cas9 protein or domain is a type of “nucleic acid programmable DNA binding protein
  • Cas9 is not meant to be limiting and may be referred to as a “Cas9 or variant thereof.” Exemplary Cas9 proteins are described herein and also described in the art. The present disclosure is unlimited with regard to the particular Cas9 that is employed in the base editors of the invention.
  • proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.”
  • a Cas9 variant shares homology to Cas9, or a fragment thereof.
  • Cas9 variants include functional fragments of Cas9.
  • a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to wild type Cas9.
  • the Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
  • the Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9.
  • the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9.
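The percent-identity and fragment-length thresholds above can be made concrete with a minimal Python sketch. The text does not specify an alignment method or how gaps are counted, so this sketch makes two labeled assumptions: the sequences are already aligned (with '-' for gaps), and identity is computed over all aligned columns including gapped ones. The function names and the short example peptides are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns; '-' marks a gap.

    Assumption: gapped columns count toward the denominator, and columns
    where both sequences have a gap are ignored. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def length_fraction(fragment, reference):
    """Fragment length as a percentage of the reference length (gaps excluded)."""
    return 100.0 * len(fragment.replace("-", "")) / len(reference.replace("-", ""))


if __name__ == "__main__":
    ref = "MDKKYSIGLD"      # illustrative 10-residue reference (placeholder)
    variant = "MDKKYSIGLA"  # one substitution relative to the reference
    print(percent_identity(ref, variant))   # 9 of 10 columns match
    print(length_fraction("MDKKY", ref))    # 5 of 10 residues
```

A variant would then be compared against a chosen threshold, for example `percent_identity(ref, variant) >= 90`.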
  • dCas9 refers to a nuclease-inactive Cas9 or nuclease-dead Cas9, or a functional fragment or variant thereof, and embraces any naturally occurring dCas9 from any organism, any naturally-occurring dCas9 equivalent or functional fragment thereof, any dCas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of a dCas9, naturally-occurring or engineered.
  • dCas9 is not meant to be particularly limiting and may be referred to as a “dCas9 or equivalent.”
  • Exemplary dCas9 proteins and methods for making dCas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference.
  • nCas9 or “Cas9 nickase” refers to a Cas9 or a functional fragment or variant thereof, which cleaves or nicks only one of the strands of a target cut site, thereby introducing a nick in a double strand DNA molecule rather than creating a double strand break. This can be achieved by introducing appropriate mutations in a wild-type Cas9 which inactivate one of the two endonuclease activities of the Cas9.
  • Any suitable mutation which inactivates one Cas9 endonuclease activity but leaves the other intact is contemplated; for example, one of the D10A or H840A mutations in the wild-type Cas9 amino acid sequence (e.g., SEQ ID NO: 9) may be used to form the nCas9.
  • the term “continuous evolution,” as used herein, refers to an evolution procedure (e.g., PACE) in which a population of nucleic acids is subjected to multiple rounds of (a) replication, (b) mutation, and (c) selection to produce a desired evolved product, for example, a nucleic acid encoding a protein with a desired activity, wherein the multiple rounds can be performed without investigator interaction and wherein the processes under (a)-(c) can be carried out simultaneously.
  • the evolution procedure is carried out in vitro, for example, using cells in culture as host cells.
  • a continuous evolution process relies on a system in which a gene of interest is provided in a nucleic acid vector that undergoes a life-cycle including replication in a host cell and transfer to another host cell, wherein a critical component of the life-cycle is deactivated and reactivation of the component is dependent upon a desired mutation in the gene of interest.
  • the nucleic acid vector comprising the gene of interest is a phage, a viral vector, or naked DNA (e.g., a mobilization plasmid).
  • transfer of the gene of interest from cell to cell is via infection, transfection, transduction, conjugation, or uptake of naked DNA, and efficiency of cell-to-cell transfer (e.g., transfer rate) is dependent on the activity of a product encoded by the gene of interest.
  • the nucleic acid vector is a phage harboring the gene of interest and the efficiency of phage transfer (via infection) is dependent on an activity of the gene of interest in that a protein required for the generation of phage particles (e.g., pIII for M13 phage) is expressed in the host cells only in the presence of the desired activity of the gene of interest.
  • the nucleic acid vector is a retroviral vector, for example, a lentiviral or vesicular stomatitis virus vector harboring the gene of interest, and the efficiency of viral transfer from cell to cell is dependent on an activity of the gene of interest in that a protein required for the generation of viral particles (e.g., an envelope protein, such as VSV-G) is expressed in the host cells only in the presence of the desired activity of the gene of interest.
  • the nucleic acid vector is a DNA vector, for example, in the form of a mobilizable plasmid DNA, comprising the gene of interest, that is transferred between bacterial host cells via conjugation and the efficiency of conjugation-mediated transfer from cell to cell is dependent on the activity of the gene of interest in that a protein required for conjugation-mediated transfer (e.g., traA or traQ) is expressed in the host cells only in the presence of the desired activity of the gene of interest.
  • Host cells contain F plasmid lacking one or both of those genes.
  • some embodiments provide a continuous evolution system, in which a population of viral vectors comprising a gene of interest to be evolved replicates in a flow of host cells, e.g., a flow through a lagoon, wherein the viral vectors are deficient in a gene encoding a protein that is essential for the generation of infectious viral particles, and wherein that gene is comprised in the host cell under the control of a conditional promoter that can be activated by a gene product encoded by the gene of interest, or a mutated version thereof.
  • the activity of the conditional promoter depends on a desired function of a gene product encoded by the gene of interest.
  • Viral vectors in which the gene of interest has not acquired a mutation conferring the desired function will not activate the conditional promoter, or will achieve only minimal activation, while any mutation in the gene of interest that confers the desired function will result in activation of the conditional promoter. Since the conditional promoter controls an essential protein for the viral life cycle, activation of this promoter directly corresponds to an advantage in viral spread and replication for those vectors that have acquired an advantageous mutation.
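  • As a rough illustration of the selection principle described above, the following Python sketch tracks the fraction of viral vectors that carry the desired activity over successive generations in a lagoon. The function name `simulate_lagoon` and the two per-generation replication factors are hypothetical and chosen only for illustration; they are not values taken from this disclosure.

```python
def simulate_lagoon(initial_active_fraction, generations,
                    fold_active=10.0, fold_inactive=0.5):
    """Toy model: fraction of vectors carrying the desired activity per generation.

    fold_active and fold_inactive are hypothetical per-generation replication
    factors (assumptions, not measured values). Vectors that activate the
    conditional promoter replicate well; the others are diluted by the flow.
    """
    if not 0.0 <= initial_active_fraction <= 1.0:
        raise ValueError("initial_active_fraction must be between 0 and 1")
    active = initial_active_fraction
    inactive = 1.0 - initial_active_fraction
    history = []
    for _ in range(generations):
        active *= fold_active
        inactive *= fold_inactive
        total = active + inactive
        # Renormalize so the two sub-populations always sum to 1.
        active, inactive = active / total, inactive / total
        history.append(active)
    return history


if __name__ == "__main__":
    # A rare advantageous variant (1 in a million) after ten generations.
    for generation, fraction in enumerate(simulate_lagoon(1e-6, 10), start=1):
        print(generation, fraction)
```

  • In this simplified model a vector that cannot activate the conditional promoter is progressively washed out, while a rare vector with the desired activity comes to dominate the population.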
  • CRISPR is a family of DNA sequences (i.e., CRISPR clusters) in bacteria and archaea that represent snippets of prior infections by a virus that have invaded the prokaryote.
  • the snippets of DNA are used by the prokaryotic cell to detect and destroy DNA from subsequent attacks by similar viruses and effectively constitute, along with an array of CRISPR-associated proteins (including Cas9 and homologs thereof) and CRISPR-associated RNA, a prokaryotic immune defense system.
  • CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA).
  • in type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3, and a Cas9 protein.
  • the tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA.
  • Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular nucleic acid target complementary to the RNA. Specifically, the target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3'-5' exonucleolytically.
  • DNA-binding and cleavage typically requires protein and both RNAs.
  • single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate embodiments of both the crRNA and tracrRNA into a single RNA species, the guide RNA. See, e.g., Jinek M., et al., Science 337:816-821 (2012), the entire contents of which is herein incorporated by reference.
  • Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self.
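  • The role of the PAM can be illustrated with a short Python sketch that scans a DNA sequence for candidate protospacers. It assumes the canonical 5'-NGG-3' PAM of Streptococcus pyogenes Cas9 located immediately 3' of a 20-nucleotide protospacer, and it scans only the strand that is supplied; the function name `find_protospacers` is hypothetical, and other Cas9 variants recognize different PAMs.

```python
import re


def find_protospacers(sequence, protospacer_len=20):
    """Return (start, protospacer, pam) for each protospacer followed by NGG.

    Assumes an SpCas9-style 5'-NGG-3' PAM directly 3' of the protospacer.
    Only the given strand is scanned; start is a 0-based index.
    """
    sequence = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    example = "A" * 20 + "TGG"  # hypothetical sequence: 20-nt protospacer + PAM
    print(find_protospacers(example))
```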
  • CRISPR biology, as well as Cas9 nuclease sequences and structures, are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., et al., Proc. Natl. Acad. Sci. U.S.A.
  • Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.
  • an effective amount refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response.
  • an effective amount of a base editor may refer to the amount of the base editor that is sufficient to edit a target site nucleotide sequence, e.g., a genome.
  • an effective amount of a base editor provided herein e.g., of a fusion protein comprising a nuclease-inactive Cas9 domain and a nucleobase modification domain (e.g., an adenine oxidase domain) may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein.
  • an effective amount of a base editor provided herein may refer to the amount of the fusion protein sufficient to induce editing having the following characteristics: >50% product purity, <5% indels, and an editing window of 2-8 nucleotides.
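  • The three editing characteristics listed above can be expressed as a simple check. The sketch below is illustrative only: the function name `meets_editing_criteria` is hypothetical, product purity and indel frequency are assumed to be supplied as fractions between 0 and 1, and the editing window is assumed to be given as a width in nucleotides.

```python
def meets_editing_criteria(product_purity, indel_fraction, window_width_nt):
    """Check >50% product purity, <5% indels, and a 2-8 nucleotide editing window.

    product_purity and indel_fraction are fractions in [0, 1] (an assumption
    about units); window_width_nt is the editing window width in nucleotides.
    """
    for name, value in (("product_purity", product_purity),
                        ("indel_fraction", indel_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    return (product_purity > 0.50
            and indel_fraction < 0.05
            and 2 <= window_width_nt <= 8)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    print(meets_editing_criteria(0.72, 0.01, 5))
    print(meets_editing_criteria(0.72, 0.08, 5))
```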
  • an agent e.g., a fusion protein, a nuclease, an adenine oxidase, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide
  • the desired biological response e.g., on the specific allele, genome, or target site to be edited, on the target cell or tissue (i.e., the cell or tissue to be edited)
  • the term “evolved base editor” or “evolved base editor variant” refers to a base editor formed as a result of mutagenizing a reference or starting-point base editor.
  • the term refers to embodiments in which the nucleobase modification domain is evolved or a separate domain is evolved.
  • Mutagenizing a reference or starting-point base editor may comprise mutagenizing an adenine oxidase by a continuous evolution method (e.g., PACE), wherein the evolved adenine oxidase has one or more amino acid variations introduced into its amino acid sequence relative to the amino acid sequence of the starting adenine oxidase.
  • Amino acid sequence variations may include one or more mutated residues within the amino acid sequence of a reference base editor, e.g., as a result of a change in the nucleotide sequence encoding the base editor that results in a change in the codon at any particular position in the coding sequence, the deletion of one or more amino acids (e.g., a truncated protein), the insertion of one or more amino acids, or any combination of the foregoing.
  • the evolved base editor may include variants in one or more components or domains of the base editor (e.g., variants introduced into an adenine oxidase domain, an iBER domain, or a variant introduced into combinations of these domains).
  • fusion protein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins.
  • One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively.
  • a protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein.
  • any of the proteins provided herein may be produced by any method known in the art.
  • the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker.
  • Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4 th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
  • a suitable host cell refers to a cell that can host, replicate, and transfer a phage vector useful for a continuous evolution process as provided herein.
  • a suitable host cell is a cell that can be infected by the viral vector, can replicate it, and can package it into viral particles that can infect fresh host cells.
  • a cell can host a viral vector if it supports expression of genes of viral vector, replication of the viral genome, and/or the generation of viral particles.
  • One criterion to determine whether a cell is a suitable host cell for a given viral vector is to determine whether the cell can support the viral life cycle of a wild-type viral genome that the viral vector is derived from.
  • a suitable host cell would be any cell that can support the wild-type M13 phage life cycle.
  • Suitable host cells for viral vectors useful in continuous evolution processes are well known to those of skill in the art, and the disclosure is not limited in this respect.
  • the viral vector is a phage and the host cell is a bacterial cell.
  • the host cell is an E. coli cell. Suitable E. coli host strains will be apparent to those of skill in the art, and include, but are not limited to, New England Biolabs (NEB) Turbo, Top10F’, DH12S, ER2738, ER2267, and XL1-Blue MRF’. These strain names are art recognized and the genotype of these strains has been well characterized. It should be understood that the above strains are exemplary only and that the invention is not limited in this respect.
  • the host cell is a prokaryotic cell, for example, a bacterial cell.
  • the host cell is an E. coli cell.
  • the host cell is a eukaryotic cell, for example, a yeast cell, an insect cell, or a mammalian cell.
  • the type of host cell will, of course, depend on the viral vector employed, and suitable host cell/viral vector combinations will be readily apparent to those of skill in the art.
  • the host cells are E. coli cells expressing the Fertility factor, also commonly referred to as the F factor, sex factor, or F-plasmid.
  • the F-factor is a bacterial DNA sequence that allows a bacterium to produce a sex pilus necessary for conjugation and is essential for the infection of E. coli cells with certain phage, for example, with M13 phage.
  • the host cells for M13-PACE are of the genotype F' proA+B+
  • ΔlacIZYA araD139 Δ(ara,leu)7697 mcrA
  • linker refers to a chemical group or a molecule linking two molecules or domains, e.g., nCas9 and an adenine oxidase.
  • a linker joins a dCas9 and a modification domain (e.g., an adenine oxidase).
  • the linker is positioned between, or flanked by, two groups, molecules, or other domains and connected to each one via a covalent bond, thus connecting the two.
  • the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein).
  • the linker is an organic molecule, group, polymer, or chemical domain. Chemical domains include, but are not limited to, disulfide, hydrazone, thiol and azo domains.
  • the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
  • mutation refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue; a deletion or insertion of one or more residues within a sequence; or a substitution of a residue within a sequence of a genome in a subject to be corrected. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue.
  • Mutations can include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions, and this list is not meant to be limiting in any way. Mutations can include “loss-of-function” mutations, which are the normal result of a mutation that reduces or abolishes a protein activity.
  • loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein whose presence compensates for the effect of the mutation. There are some exceptions where a loss-of-function mutation is dominant, one example being haploinsufficiency, where the organism is unable to tolerate the approximately 50% reduction in protein activity suffered by the heterozygote.
  • Mutations also embrace “gain-of-function” mutations, which confer an abnormal activity on a protein or cell that is otherwise not present in a normal condition.
  • Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, these tissues gaining functions that they normally lack. Alternatively the mutation could lead to overexpression of one or more genes involved in control of the cell cycle, thus leading to uncontrolled cell division and hence to cancer. Because of their nature, gain-of-function mutations are usually dominant.
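  • The notation for describing substitutions set out above (the original residue, followed by its position, followed by the newly substituted residue) can be parsed and applied mechanically. The following Python sketch assumes single-letter residue codes and 1-based positions; the function names `parse_mutation` and `apply_mutation` and the example sequence are hypothetical.

```python
import re

# Original residue, 1-based position, new residue, e.g. "A123T".
_MUTATION = re.compile(r"^([A-Za-z*])(\d+)([A-Za-z*])$")


def parse_mutation(label):
    """Split a label such as 'A123T' into ('A', 123, 'T')."""
    match = _MUTATION.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1).upper(), int(match.group(2)), match.group(3).upper()


def apply_mutation(sequence, label):
    """Return sequence with the substitution applied, checking the original residue."""
    original, position, new = parse_mutation(label)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1].upper() != original:
        raise ValueError(
            f"expected {original} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + new + sequence[position:]


if __name__ == "__main__":
    print(parse_mutation("A123T"))
    print(apply_mutation("MKT", "K2R"))  # hypothetical three-residue sequence
```

  • Checking the original residue before substituting guards against applying a label to the wrong reference sequence or numbering scheme.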
  • when referring to nucleic acid molecules or polypeptides (e.g., Cas9 or adenine oxidases), the term means that the nucleic acid molecule or the polypeptide is at least substantially free from at least one other component with which it is naturally associated in nature and/or as found in nature (e.g., an amino acid sequence not found in nature).
  • nucleic acid refers to RNA as well as single and/or double-stranded DNA. Nucleic acids may be naturally occurring, for example, in the context of a genome, a transcript, an mRNA, tRNA, rRNA, siRNA, snRNA, a plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule.
  • a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a recombinant DNA or RNA, an artificial chromosome, an engineered genome, or fragment thereof, or a synthetic DNA, RNA, DNA/RNA hybrid, or including non-naturally occurring nucleotides or nucleosides.
  • the terms“nucleic acid,”“DNA,”“RNA,” and/or similar terms include nucleic acid analogs, e.g., analogs having other than a phosphodiester backbone. Nucleic acids can be purified from natural sources, produced using recombinant expression systems and optionally purified, chemically synthesized, etc.
  • nucleic acids can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, and backbone modifications.
  • a nucleic acid sequence is presented in the 5' to 3' direction unless otherwise indicated.
  • a nucleic acid is or comprises natural nucleosides (e.g.
  • nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine);
  • biologically modified bases (e.g., methylated bases);
  • intercalated bases; modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages).
  • nucleic acid programmable D/RNA binding protein refers to any protein that may associate (e.g., form a complex) with one or more nucleic acid molecules (i.e., which may broadly be referred to as a“napR/DNAbp-programming nucleic acid molecule” and includes, for example, guide RNA in the case of Cas systems) which direct or otherwise program the protein to localize to a specific target nucleotide sequence (e.g., a gene locus of a genome) that is complementary to the one or more nucleic acid molecules (or a portion or region thereof) associated with the protein, thereby causing the protein to bind to the nucleotide sequence at the specific target site.
  • napR/DNAbp embraces CRISPR Cas9 proteins, as well as Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or modified), and may include a Cas9 equivalent from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), C2c3 (a type V CRISPR-Cas system), dCas9, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, Argonaute (Ago), and
  • the term also embraces Cas homologs and variants such as an xCas9, an SpCas9-NG, an LbCas12a, an AsCas12a, a Cas9-KKH, a circularly permuted Cas9, a SmacCas9, and a Spy-macCas9. Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component
  • nucleic acid programmable DNA binding proteins that may be used in connection with this invention are not limited to CRISPR-Cas systems.
  • the invention embraces any such programmable protein, such as the Argonaute protein from Natronobacterium gregoryi (NgAgo) which may also be used for DNA-guided genome editing.
  • NgAgo-guide DNA system does not require a PAM sequence or guide RNA molecules, which means genome editing can be performed simply by the expression of generic NgAgo protein and
  • the napR/DNAbp is an RNA-programmable nuclease that, when in a complex with an RNA, may be referred to as a nuclease:RNA complex.
  • the bound RNA(s) is referred to as a guide RNA (gRNA).
  • gRNAs can exist as a complex of two or more RNAs, or as a single RNA molecule.
  • gRNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules.
  • gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and directs binding of a Cas9 (or equivalent) complex to the target); and (2) a domain that binds a Cas9 protein.
  • domain (2) corresponds to a sequence known as a tracrRNA, and comprises a stem-loop structure.
  • domain (2) is homologous to a tracrRNA as depicted in Figure 1E of Jinek et al., Science 337:816-821 (2012), the entire contents of which is incorporated herein by reference.
  • examples of gRNAs (e.g., those including domain 2) can be found in U.S. Patent No. 9,340,799, entitled “mRNA-Sensing Switchable gRNAs,” and International Patent Application No. PCT/US2014/054247, filed September 6, 2013, published as WO 2015/035136 and entitled “Delivery System For Functional Nucleases,” the entire contents of each are herein incorporated by reference.
  • a gRNA comprises two or more of domains (1) and (2), and may be referred to as an“extended gRNA.”
  • an extended gRNA will, e.g., bind two or more Cas9 proteins and bind a target nucleic acid at two or more distinct regions, as described herein.
  • the gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease/RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex.
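  • The complementarity between a guide sequence and its target can be sketched in a few lines of Python. The sketch assumes that both sequences are written 5' to 3', that the DNA strand supplied is the one the guide base-pairs with, and that pairing is perfect Watson-Crick pairing with no mismatches tolerated; the function names `complementary_rna` and `guide_complements_target` are hypothetical.

```python
# Watson-Crick partner (as RNA) for each DNA base.
_DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def complementary_rna(dna_5_to_3):
    """Return the RNA (written 5'->3') that fully pairs with a DNA strand (5'->3')."""
    # The strands pair antiparallel, so the DNA is read in reverse.
    return "".join(_DNA_TO_RNA_COMPLEMENT[base] for base in reversed(dna_5_to_3.upper()))


def guide_complements_target(guide_rna, target_strand_dna):
    """True if the guide RNA is perfectly complementary to the DNA target strand."""
    return guide_rna.upper() == complementary_rna(target_strand_dna)


if __name__ == "__main__":
    # Hypothetical four-nucleotide example, far shorter than a real guide.
    print(complementary_rna("ATGC"))
    print(guide_complements_target("GCAU", "ATGC"))
```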
  • the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example Cas9 (Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J. et al., Proc. Natl. Acad. Sci. U.S.A.
  • Because the napR/DNAbp nucleases (e.g., Cas9) use RNA:DNA hybridization to target DNA cleavage sites, these proteins are able to be targeted, in principle, to any sequence specified by the guide RNA.
  • Methods of using napR/DNAbp nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see e.g., Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W.Y.
  • the term “napR/DNAbp-programming nucleic acid molecule” or equivalently “guide sequence” refers to the one or more nucleic acid molecules which associate with and direct or otherwise program a napR/DNAbp protein to localize to a specific target nucleotide sequence (e.g., a gene locus of a genome) that is complementary to the one or more nucleic acid molecules (or a portion or region thereof) associated with the protein, thereby causing the napR/DNAbp protein to bind to the nucleotide sequence at the specific target site.
  • a non-limiting example is a guide RNA of a Cas protein of a CRISPR-Cas genome editing system.
  • a nuclear localization signal or sequence is an amino acid sequence that tags, designates, or otherwise marks a protein for import into the cell nucleus by nuclear transport. Typically, this signal consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface. Different nuclear localized proteins may share the same NLS. An NLS has the opposite function of a nuclear export signal (NES), which targets proteins out of the nucleus. Thus, a single nuclear localization signal can direct the entity with which it is associated to the nucleus of a cell.
  • sequences can be of any size and composition, for example more than 25, 25, 15, 12, 10, 8, 7, 6, 5 or 4 amino acids, but will preferably comprise at least a four to eight amino acid sequence known to function as a nuclear localization signal (NLS).
  • nucleobase modification domain or“modification domain” embraces any protein, enzyme, or polypeptide (or functional fragment thereof) which is capable of modifying a DNA or RNA molecule. Nucleobase modification domains may be naturally occurring, or may be engineered.
  • a nucleobase modification domain can include one or more DNA repair enzymes, for example, an enzyme or protein involved in base excision repair (BER), nucleotide excision repair (NER), homology-dependent recombinational repair (HR), non-homologous end-joining repair (NHEJ), microhomology-mediated end-joining repair (MMEJ), mismatch repair (MMR), direct reversal repair, or other known DNA repair pathway.
  • a nucleobase modification domain can have one or more types of enzymatic activities, including, but not limited to, endonuclease activity, polymerase activity, ligase activity, replication activity, and proofreading activity.
  • Nucleobase modification domains can also include DNA or RNA-modifying enzymes and/or mutagenic enzymes, such as DNA oxidizing enzymes (i.e., adenine oxidases), which covalently modify nucleobases leading in some cases to mutagenic corrections by way of normal cellular DNA repair and replication processes.
  • nucleobase modification domains include, but are not limited to, an adenine oxidase, a nuclease, a nickase, a recombinase, a methyltransferase, a methylase, an acetylase, an acetyltransferase, a transcriptional activator, or a transcriptional repressor domain.
  • the nucleobase modification domain is an adenine oxidase (e.g., AlkBH1).
  • the terms“oligonucleotide” and“polynucleotide” can be used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three
  • phage-assisted continuous evolution (PACE) refers to continuous evolution that employs phage as viral vectors.
  • PCT/US 2009/056194 filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application, PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; U.S. Patent No. 9,023,594, issued May 5, 2015; U.S. Patent No. 9,771,574, issued September 26, 2017; U.S. Patent No. 9,394,537, issued July 19, 2016; International PCT Application, PCT/US2015/012022, filed January 20, 2015, published as WO 2015/134121 on September 11, 2015; U.S. Patent No. 10,179,911, issued January 15, 2019; U.S. Patent No.
  • PANCE phage-assisted non-continuous evolution
  • SP evolving‘selection phage’
  • promoter refers to a nucleic acid molecule with a sequence recognized by the cellular transcription machinery and able to initiate transcription of a downstream gene.
  • a promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is only active in the presence of a specific condition.
  • conditional promoter may only be active in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule.
  • a subclass of conditionally active promoters are inducible promoters that require the presence of a small molecule“inducer” for activity.
  • inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters.
  • constitutive, conditional, and inducible promoters are well known to the skilled artisan, and the skilled artisan will be able to ascertain a variety of such promoters useful in carrying out the instant invention, which is not limited in this respect.
  • the specification provides vectors with appropriate promoters for driving expression of the nucleic acid sequences encoding the base editor fusion proteins (or one or more individual components thereof).
  • phage refers to a virus that infects bacterial cells.
  • phages consist of an outer protein capsid enclosing genetic material.
  • the genetic material may be ssRNA, dsRNA, ssDNA, or dsDNA, in either linear or circular form.
  • Phages and phage vectors are well known to those of skill in the art and non-limiting examples of phages that are useful for carrying out the methods provided herein are λ, T2, T4, T7, T12, R17, M13, MS2, G4, P1, P2, P4, Phi X174, N4, Φ6, and Φ29.
  • the phage utilized in the present invention is M13. Additional suitable phages and host cells will be apparent to those of skill in the art and the invention is not limited in this aspect. For an exemplary description of additional suitable phages and host cells, see Elizabeth Kutter and Alexander Sulakvelidze: Bacteriophages: Biology and Applications. CRC Press; 1st edition (December 2004), ISBN: 0849313368; Martha R. J. Clokie and Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume 1: Isolation, Characterization, and Interactions (Methods in Molecular Biology) Humana Press; 1st edition (December, 2008), ISBN: 1588296822; Martha R. J. Clokie and Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume 2:
  • protein refers to a polymer of amino acid residues linked together by peptide (amide) bonds.
  • the terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long.
  • a protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins.
  • One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc.
  • a protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex.
  • a protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide.
  • a protein, peptide, or polypeptide may be naturally occurring, engineered, or synthetic, or any combination thereof.
  • a protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a recombinase.
  • a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent.
  • a protein is in a complex with, or is in association with, a nucleic acid, e.g., RNA.
  • recombinant refers to proteins or nucleic acids that do not occur in nature, but are the product of human engineering.
  • a recombinant protein or nucleic acid molecule comprises an amino acid or nucleotide sequence that comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations as compared to any naturally occurring sequence.
  • subject refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal.
  • the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cow, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is an experimental organism. In some embodiments, the subject is a plant. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development.
  • target site refers to a sequence within a nucleic acid molecule that is edited by a base editor (e.g., a dCas9-adenine oxidase fusion protein provided herein).
  • the target site further refers to the sequence within a nucleic acid molecule to which a complex of the base editor and gRNA binds.
  • the term“vector,” as used herein, may refer to a nucleic acid that has been modified to encode a gene of interest and that is able to enter into a host cell, mutate and replicate within the host cell, and then transfer a replicated form of the vector into another host cell.
  • the term“vector” as used herein may refer to a nucleic acid that has been modified to encode the base editor.
  • Exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phage, and conjugative plasmids.
  • viral particle refers to a viral genome, for example, a DNA or RNA genome, that is associated with a coat of a viral protein or proteins, and, in some cases, with an envelope of lipids.
  • a phage particle comprises a phage genome packaged into a protein encoded by the wild type phage genome.
  • viral vector refers to a nucleic acid comprising a viral genome that, when introduced into a suitable host cell, can be replicated and packaged into viral particles able to transfer the viral genome into another host cell.
  • the term“viral vector” extends to vectors comprising truncated or partial viral genomes.
  • a viral vector is provided that lacks a gene encoding a protein essential for the generation of infectious viral particles.
  • In suitable host cells, for example, host cells comprising the lacking gene under the control of a conditional promoter, however, such truncated viral vectors can replicate and generate viral particles able to transfer the truncated viral genome into another host cell.
  • the viral vector is an adeno-associated virus (AAV) vector.
  • the terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease, disorder, or condition, or one or more symptoms thereof, as described herein.
  • treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed.
  • treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease.
  • treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.
  • the term “variant” refers to a protein having characteristics that deviate from what occurs in nature and that retains at least one functional (i.e., binding, interaction, or enzymatic) activity and/or therapeutic property thereof.
  • A“variant” is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the wild type protein.
  • a variant of Cas9 may comprise a Cas9 that has one or more changes in amino acid residues as compared to a wild type Cas9 amino acid sequence.
  • a variant of a deaminase may comprise a deaminase that has one or more changes in amino acid residues as compared to a wild type deaminase amino acid sequence, e.g. following ancestral sequence reconstruction of the deaminase.
  • changes include chemical modifications, substitutions of different amino acid residues, truncations, covalent additions (e.g., of a tag), and any other changes.
  • This term also embraces fragments of a wild type protein.
  • the level or degree to which the property is retained may be reduced relative to the wild type protein but is typically the same or similar in kind. Generally, variants are overall very similar, and in many regions, identical to the amino acid sequence of the protein described herein. A skilled artisan will appreciate how to make and use variants that maintain all, or at least some, of a functional ability or property.
  • the variant proteins may comprise, or alternatively consist of, an amino acid sequence which is at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%, identical to, for example, the amino acid sequence of a wild-type protein, or any protein provided herein.
  • Further polypeptides encompassed by the invention are polypeptides encoded by polynucleotides which hybridize to the complement of a nucleic acid molecule encoding a protein such as a napDNAbp under stringent hybridization conditions (e.g., hybridization to filter-bound DNA in 6x sodium chloride/sodium citrate (SSC) at about 45 degrees Celsius, followed by one or more washes in 0.2x SSC, 0.1% SDS at about 50-65 degrees Celsius), under highly stringent conditions (e.g., hybridization to filter-bound DNA in 6x sodium chloride/sodium citrate (SSC) at about 45 degrees Celsius, followed by one or more washes in 0.1x SSC, 0.2% SDS at about 68 degrees Celsius), or under other stringent hybridization conditions which are known to those of skill in the art (see, for example, Ausubel, F. M. et al., eds., 1989, Current Protocols in Molecular Biology, Greene Publishing Associates, Inc., and John Wiley & Sons Inc., New York, at pp. 6.3.1-6.3.6 and 2.10.3).
  • for a polypeptide having an amino acid sequence at least, for example, 95% identical to a query amino acid sequence, the amino acid sequence of the subject polypeptide is identical to the query sequence except that the subject polypeptide sequence may include up to five amino acid alterations per each 100 amino acids of the query amino acid sequence.
  • up to 5% of the amino acid residues in the subject sequence may be inserted, deleted, or substituted with another amino acid.
  • These alterations of the reference sequence may occur at the amino- or carboxy-terminal positions of the reference amino acid sequence or anywhere between those terminal positions, interspersed either individually among residues in the reference sequence or in one or more contiguous groups within the reference sequence.
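The "up to five alterations per 100 amino acids" reading of 95% identity can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned (with `-` marking a gap) and counts every substituted, inserted, or deleted position as one alteration relative to the query length. The function names and the toy sequences are hypothetical, and this is not the FASTDB calculation itself.

```python
def count_alterations(query_aln: str, subject_aln: str) -> int:
    """Count alignment columns where the subject differs from the query.

    A column counts as one alteration whether it is a substitution,
    an insertion (gap in the query), or a deletion (gap in the subject).
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for q, s in zip(query_aln, subject_aln) if q != s)


def percent_identity(query_aln: str, subject_aln: str) -> float:
    """Percent identity relative to the number of query residues (illustrative convention)."""
    query_len = len(query_aln.replace("-", ""))
    if query_len == 0:
        raise ValueError("query contains no residues")
    alterations = count_alterations(query_aln, subject_aln)
    return max(0.0, 100.0 * (query_len - alterations) / query_len)


# Toy 100-residue query and a subject carrying five substitutions.
query = "ACDEFGHIKLMNPQRSTVWY" * 5
subject = query.replace("A", "X")  # one 'A' per 20-residue repeat -> 5 changes

print(count_alterations(query, subject))  # 5
print(percent_identity(query, subject))   # 95.0
```

Under this convention, a subject with five alterations per 100 query residues sits exactly at the 95% threshold.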
  • whether any particular polypeptide is at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to, for instance, the amino acid sequence of a protein such as a napDNAbp, can be determined conventionally using known computer programs.
  • a preferred method for determining the best overall match between a query sequence (a sequence of the present invention) and a subject sequence, also referred to as a global sequence alignment, uses the FASTDB computer program based on the algorithm of Brutlag et al. (Comp. App. Biosci. 6:237-245 (1990)).
  • the query and subject sequences are either both nucleotide sequences or both amino acid sequences.
  • the result of said global sequence alignment is expressed as percent identity.
  • the percent identity is corrected by calculating the number of residues of the query sequence that are N- and C-terminal of the subject sequence, which are not matched/aligned with a corresponding subject residue, as a percent of the total bases of the query sequence. Whether a residue is matched/aligned is determined by results of the FASTDB sequence alignment.
  • This percentage is then subtracted from the percent identity, calculated by the above FASTDB program using the specified parameters, to arrive at a final percent identity score.
  • This final percent identity score is what is used for the purposes of the present invention. Only residues to the N- and C-termini of the subject sequence, which are not matched/aligned with the query sequence, are considered for the purposes of manually adjusting the percent identity score. That is, only query residue positions outside the farthest N- and C-terminal residues of the reference sequence are considered.
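The manual correction described above is simple arithmetic, sketched below in Python. The sketch assumes the FASTDB percent identity and the number of unmatched terminal query residues are already known; the function and parameter names are hypothetical, and clamping the result at zero is an added assumption.

```python
def corrected_percent_identity(
    fastdb_identity: float,
    query_length: int,
    unmatched_terminal_residues: int,
) -> float:
    """Subtract the terminal-overhang penalty from a FASTDB percent identity.

    unmatched_terminal_residues: query residues lying N- or C-terminal of the
    subject sequence that have no matched/aligned subject residue.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    if not 0 <= unmatched_terminal_residues <= query_length:
        raise ValueError("unmatched residue count must lie within the query length")
    penalty = 100.0 * unmatched_terminal_residues / query_length
    return max(0.0, fastdb_identity - penalty)


# A 100-residue query whose first 10 residues extend past the subject's
# N-terminus; the remaining 90 residues align perfectly.
print(corrected_percent_identity(100.0, 100, 10))  # 90.0
```

In this example the 10 unmatched terminal residues are 10% of the query, so a reported identity of 100% becomes a final score of 90%.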
  • wild type is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.
  • the present disclosure provides adenine-to-cytosine or “ACBE” (or thymine-to-guanine or “TGBE”) transversion base editors which comprise a napDNAbp (e.g., a nCas9 domain) fused to a nucleobase modification domain.
  • the nucleobase modification domain may comprise an adenine oxidase.
  • the disclosed ACBE transversion base editors are capable of converting an A:T nucleobase pair to a C:G nucleobase pair in a target nucleotide sequence of interest, e.g., the genome of a cell.
  • the disclosed base editors comprise an engineered oxidase variant that catalyzes the conversion of a target adenine to a cytosine via an oxidation reaction.
  • the disclosed base editors also comprise TGBE transversion base editors that comprise an engineered oxidase variant that catalyzes the conversion of a target adenine to a cytosine via an oxidation reaction, wherein the base-paired thymine of the non-edited (i.e. non-oxidized) strand is subsequently converted to a guanine by the concerted action of the cell’s mismatch repair factors.
  • in the 8-oxoadenine oxidation strategy, enzyme-catalyzed oxidation of a targeted A in a nucleic acid of interest results in 8-oxoadenine (8-oxoA) formation.
  • Steric rotation of the 8-oxoA around the glycosidic bond is induced, presenting the Hoogsteen edge for base pairing.
  • 8-oxoA is read by a polymerase as a cytosine and the cell’s mismatch repair machinery converts the base-paired thymine of the non-edited strand to a guanine to correct the apparent mismatch.
  • the cell’s mismatch repair machinery converts the 8-oxoA lesion to a cytosine, thereby completing the desired A:T to C:G mutation.
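The net sequence outcome of the strategy above (A to C on the edited strand, T to G on the opposite strand) can be shown with a short Python sketch. It models only the final base-pair change, not the 8-oxoA intermediate or the repair steps; the function names and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, kept in the same left-to-right register."""
    return "".join(COMPLEMENT[base] for base in strand)


def apply_a_to_c_edit(edited_strand: str, position: int) -> tuple:
    """Return (edited strand, opposite strand) after an A:T -> C:G change.

    position is a 0-based index into edited_strand and must point at an A.
    The opposite strand is written so that base i pairs with base i.
    """
    if edited_strand[position] != "A":
        raise ValueError("target position must be an adenine")
    new_strand = edited_strand[:position] + "C" + edited_strand[position + 1:]
    return new_strand, complement(new_strand)


before = "GATTACA"
print(before, complement(before))  # GATTACA CTAATGT
after, opposite = apply_a_to_c_edit(before, 1)
print(after, opposite)             # GCTTACA CGAATGT
```

The A at index 1 becomes C, and the T that paired with it becomes G, which is the A:T to C:G transversion.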
  • Adenine oxidation is achieved by the targeted use of a fusion protein comprising a napDNAbp (e.g., a Cas9 nickase (“nCas9”)) domain, an adenine oxidase domain, and optionally a linker connecting these two domains (see FIG. 1).
  • the adenine oxidase domains of the disclosed base editors may comprise variants of wild-type oxidase enzymes. These variants may comprise an amino acid sequence that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the wild type enzyme.
  • the adenine oxidase domains may comprise an amino acid sequence having 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, or more than 30 amino acids that differ relative to the amino acid sequence of the wild type enzyme.
  • the adenine oxidase domains contain stretches of about 50, about 75, about 100, about 125, about 150, about 175, about 200, about 300, about 400, about 500, or more than 500 consecutive amino acids in common with the wild type enzyme.
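One way to check how long a stretch of consecutive amino acids a variant shares with the wild type enzyme is a longest-common-substring search. The Python sketch below uses standard dynamic programming; the function name and the toy peptides are hypothetical and do not correspond to any sequence in this disclosure.

```python
def longest_shared_stretch(variant: str, wild_type: str) -> int:
    """Length of the longest run of consecutive residues common to both sequences."""
    best = 0
    prev = [0] * (len(wild_type) + 1)
    for i in range(1, len(variant) + 1):
        curr = [0] * (len(wild_type) + 1)
        for j in range(1, len(wild_type) + 1):
            if variant[i - 1] == wild_type[j - 1]:
                # Extend the run that ended at the previous residue pair.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Toy peptides differing at a single position (I vs. L).
print(longest_shared_stretch("MKTAYIAKQR", "MKTAYLAKQR"))  # 5
```

The single substitution splits the shared sequence into runs of 5 and 4 residues, so the longest shared stretch is 5.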
  • the adenine oxidase domains comprise truncations at the N-terminus or C-terminus relative to the wild-type enzyme.
  • the adenine oxidase domains comprise truncations of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, or more than 30 amino acids at the N-terminus or C-terminus relative to the wild-type or base sequence.
  • the adenine oxidase is an AlkBH3, or a variant thereof. In certain embodiments, the adenine oxidase is a bacterial AlkB, or a variant thereof. In other embodiments, the adenine oxidase is a human AlkBH, or a variant thereof. In certain embodiments, the adenine oxidase is a human AlkBH1, AlkBH2, AlkBH3, AlkBH4,
  • the adenine oxidase is a TET-oxidase, or a variant thereof.
  • the oxidase is a human TET1, TET2, TET3, the catalytic domain of a human TET1 (TET1-CD), or other effector domains of human TET1, TET2, or TET3, or a variant thereof.
  • the adenine oxidase is a xanthine dehydrogenase, or a variant thereof.
  • the xanthine dehydrogenase is a human xanthine
  • the xanthine dehydrogenase is a Streptomyces cyanogenus xanthine dehydrogenase (ScXDH), or a variant thereof.
  • the xanthine dehydrogenase or variant thereof is derived from C. capitata, N. crassa, M. hansupus, E. cloacae, S. noursei, S. albulus, S. himastatinicus, or S. lividans.
  • the adenine oxidase is a cytochrome P450 enzyme, or a variant thereof.
  • the oxidase is a human CYP1A2, CYP2A4, or CYP3A6, or a variant thereof.
  • the oxidase is a molybdopterin-dependent aldehyde oxidase (e.g., human AOX1).
  • the oxidase is a flavin monooxygenase.
  • the adenine oxidase is a human FTO, or a variant thereof.
  • the instant specification provides for A:T to C:G transversion base editors which overcome a need in the art for installation of targeted transversions into a target or desired nucleotide sequence, e.g., a genome.
  • A:T to C:G base editors e.g., fusion proteins comprising an nCas9 domain and an adenine oxidase domain
  • A:T to C:G transversions e.g., fusion proteins comprising an nCas9 domain and an adenine oxidase domain
  • compositions comprising the transversion base editors as described herein, e.g., fusion proteins comprising an nCas9 domain and an adenine oxidase domain, and one or more guide RNAs, e.g., a single-guide RNA (“sgRNA”).
  • the instant specification provides for nucleic acid molecules encoding and/or expressing the transversion base editors as described herein, as well as expression vectors or constructs for expressing the transversion base editors described herein and a gRNA, host cells comprising said nucleic acid molecules and expression vectors, and optionally one or more gRNAs, and compositions for delivering and/or administering nucleic acid-based embodiments described herein.
  • the present disclosure provides for methods of making the transversion base editors described herein, as well as methods of using the transversion base editors or nucleic acid molecules encoding the transversion base editors in applications including editing a nucleic acid molecule, e.g., a genome.
  • methods of engineering the transversion base editors provided herein involve a phage-assisted continuous evolution (PACE) system or non-continuous system (e.g., PANCE), which may be utilized to evolve one or more components of a base editor (e.g., an adenine oxidase domain).
  • methods of making the base editors comprise recombinant protein expression methodologies known to one of ordinary skill in the art.
  • the specification also provides methods for editing a target nucleic acid molecule, e.g., a single nucleobase within a genome, with a base editing system described herein (e.g., in the form of an evolved base editor as described herein, or a vector or construct encoding same).
  • Such methods involve transducing (e.g., via transfection) cells with a plurality of complexes each comprising a fusion protein (e.g., a fusion protein comprising a Cas9 nickase (nCas9) domain and an adenine oxidase domain) and a gRNA molecule.
  • the gRNA is bound to the napDNAbp domain (e.g., nCas9 domain) of the fusion protein.
  • each gRNA comprises a guide sequence of at least 10 contiguous nucleotides (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 contiguous nucleotides) that is complementary to a target sequence.
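The guide-sequence requirement above (at least 10 contiguous nucleotides complementary to a target sequence) can be checked with a small Python sketch. It assumes both sequences are written 5' to 3', that complementarity is antiparallel Watson-Crick pairing with no mismatches, and that the guide is RNA (U) while the target strand is DNA (T). The function names and example sequences are hypothetical.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def guide_matches_target(guide_rna: str, target_strand: str, min_length: int = 10) -> bool:
    """True if the guide is long enough and fully complementary to part of the target strand."""
    guide_dna = guide_rna.upper().replace("U", "T")
    if len(guide_dna) < min_length:
        return False
    if set(guide_dna) - set("ACGT"):
        return False  # unexpected characters in the guide
    return reverse_complement(guide_dna) in target_strand.upper()


target_strand = "TTGACCTGAATCGGATCCAT"  # made-up target strand, 5'->3'
guide = "AUCCGAUUCAGG"                  # 12-nt guide complementary to CCTGAATCGGAT

print(guide_matches_target(guide, target_strand))      # True
print(guide_matches_target(guide[:8], target_strand))  # False (shorter than 10 nt)
```

The default `min_length` of 10 reflects the lower bound stated above; a real design workflow would also consider PAM placement and mismatch tolerance, which this sketch ignores.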
  • the methods involve the transfection of nucleic acid constructs (e.g., plasmids) that each (or together) encode the components of a complex of fusion protein and gRNA molecule.
  • a nucleic acid construct that encodes the fusion protein is transfected into the cell separately from the plasmid that encodes the gRNA molecule. In certain embodiments, these components are encoded on a single construct and transfected together.
  • the methods disclosed herein involve the introduction into cells of a complex comprising a fusion protein and gRNA molecule that has been expressed and cloned outside of these cells.
  • any fusion protein e.g., any of the fusion proteins provided herein, may be introduced into the cell in any suitable way, either stably or transiently.
  • a fusion protein may be transfected into the cell.
  • the cell may be transduced or transfected with a nucleic acid construct that encodes a fusion protein.
  • a cell may be transduced (e.g., with a virus encoding a fusion protein), or transfected (e.g., with a plasmid encoding a fusion protein) with a nucleic acid that encodes a fusion protein, or the translated fusion protein.
  • transduction may be a stable or transient transduction.
  • cells expressing a fusion protein or containing a fusion protein may be transduced or transfected with one or more gRNA molecules, for example when the fusion protein comprises a Cas9 (e.g., nCas9) domain.
  • a plasmid expressing a fusion protein may be introduced into cells through electroporation, transient transfection (e.g., lipofection), stable genome integration (e.g., piggyBac), viral transduction, or other methods known to those of skill in the art.
  • the methods described above result in the cutting (or nicking) of one strand of the double-stranded DNA, for example, the strand that includes the thymine (T) of the target A:T nucleobase pair opposite the strand containing the target adenine (A) that is being oxidized.
  • This nick serves to direct mismatch repair machinery to the non-edited strand, ensuring that the chemically modified nucleobase is not interpreted as a lesion by the machinery.
  • This nick may be created by the use of an nCas9.
  • the present specification provides a complex comprising the base editor fusion proteins described herein and an RNA bound to the Cas9 domain of the fusion protein, such as a guide RNA (gRNA), e.g., a single guide RNA.
  • the target nucleotide sequence may comprise a target sequence (e.g., a point mutation) associated with a disease, disorder, or condition, such as congenital deafness, spastic paraplegia, nonsyndromic hearing loss, spinal muscular atrophy, or hypohidrotic ectodermal dysplasia.
  • the target sequence may comprise a C to A point mutation associated with a disease, disorder, or condition, and wherein the oxidation of the mutant A base results in mismatch repair-mediated correction to a sequence that is not associated with a disease, disorder, or condition.
  • the target sequence may comprise a G to T point mutation associated with a disease, disorder, or condition, and wherein the oxidation of the mutant A base results in mismatch repair-mediated correction to a sequence that is not associated with a disease, disorder, or condition.
  • the target sequence may encode a protein, and where the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to a wild-type codon.
  • the target sequence may also be at a splice site, and the point mutation results in a change in the splicing of an mRNA transcript as compared to a wild-type transcript.
  • the target may be at a non-coding sequence of a gene, such as a promoter, and the point mutation results in increased or decreased expression of the gene.
  • Exemplary target genes include GJB2, in which a G to T point mutation at residue 139 results in a congenital deafness phenotype; and SPG11, in which a C to A point mutation at residue 2877 results in a spastic paraplegia phenotype.
  • Additional target genes include OTOF (associated with nonsyndromic hearing loss), IGHMBP2 (associated with spinal muscular atrophy), and EDAR (associated with hypohidrotic ectodermal dysplasia), for which the disease phenotype is frequently caused by C:G to A:T point mutations.
  • C:G to A:T point mutations introduce premature stop codons (UAA, UAG, UGA), resulting in nonsense mutations in protein coding regions.
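To see which sense codons a single C:G to A:T mutation can turn into a stop codon, the Python sketch below enumerates all codons using the standard stop codons named above (written with T in place of U, i.e., on the coding DNA strand). It assumes the mutation appears as C to A when the C lies on the coding strand and as G to T when the G does. The names are hypothetical.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}      # UAA, UAG, UGA on the coding DNA strand
CG_TO_AT = {"C": "A", "G": "T"}          # C:G -> A:T as read on the coding strand


def nonsense_precursors() -> dict:
    """Map each sense codon to the stop codons reachable by one C->A or G->T change."""
    result = {}
    for bases in product("ACGT", repeat=3):
        codon = "".join(bases)
        if codon in STOP_CODONS:
            continue
        for i, base in enumerate(codon):
            if base in CG_TO_AT:
                mutant = codon[:i] + CG_TO_AT[base] + codon[i + 1:]
                if mutant in STOP_CODONS:
                    result.setdefault(codon, []).append(mutant)
    return result


for codon, stops in sorted(nonsense_precursors().items()):
    print(codon, "->", ", ".join(stops))
```

Each codon listed is a site where an A:T to C:G edit at the mutated base would, in principle, restore the original sense codon.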
  • exemplary ACBEs disclosed herein correct these disease alleles in somatic cells, reducing or removing morbidity.
  • exemplary ACBEs disclosed herein may install disease-suppressing alleles in somatic cells.
  • the oxidation of a mutant A results in a change of the amino acid encoded by the mutant codon, which in some cases can result in the expression of a wild-type amino acid.
  • the application of the base editors can also result in a change of the mRNA transcript, and even restoring the mRNA transcript to a wild-type state.
  • the methods described herein involving contacting a base editor with a target nucleotide sequence can occur in vitro, ex vivo, or in vivo.
  • the step of contacting occurs in a subject.
  • the subject has been diagnosed with a disease, disorder, or condition, such as, but not limited to, a disease, disorder, or condition associated with a point mutation in the GJB2 gene, the IGHMBP2 gene, the OTOF gene, the EDAR gene, or the SPG11 gene.
  • the specification discloses a pharmaceutical composition comprising any one of the presently disclosed base editor fusion proteins. In one aspect, the specification discloses a pharmaceutical composition comprising any one of the presently disclosed complexes of fusion proteins and gRNA. In one aspect, the specification discloses a pharmaceutical composition comprising polynucleotides encoding the fusion proteins disclosed herein and polynucleotides encoding a gRNA, or polynucleotides encoding both.
  • the specification discloses a pharmaceutical composition comprising any one of the presently disclosed vectors.
  • the pharmaceutical composition further comprises a pharmaceutically acceptable excipient.
  • the pharmaceutical composition further comprises a lipid and/or polymer.
  • the lipid and/or polymer is cationic. The preparation of such lipid particles is well known. See, e.g. U.S. Patent Nos. 4,880,635; 4,906,477;
  • the present disclosure provides A-to-C (or T-to-G) transversion base editor fusion proteins comprising (i) a nucleic acid programmable DNA binding protein (napDNAbp), and (ii) a nucleobase modification domain capable of facilitating the conversion of a A:T nucleobase pair to a C:G nucleobase pair in a target nucleotide sequence, e.g., a genome.
  • the nucleobase modification domain is an adenine oxidase, which enzymatically converts an adenine nucleobase of an A:T nucleobase pair to an 8-oxoadenine, which is subsequently converted by the cell’s DNA repair and replication machinery to a cytosine, ultimately converting the A:T nucleobase pair to a C:G nucleobase pair.
  • the various domains of the transversion fusion proteins described herein may be obtained as a result of mutagenizing a reference or starting-point base editor (or a component or domain thereof) by a directed evolution process, e.g., a continuous evolution method (e.g., PACE) or a non-continuous evolution method (e.g., PANCE or other discrete plate-based selections).
  • the disclosure provides a base editor that has one or more amino acid variations introduced into its amino acid sequence relative to the amino acid sequence of the reference or starting-point base editor.
  • the base editor may include variants in one or more components or domains of the base editor (e.g., variants introduced into a Cas9 domain, an adenine oxidase domain, an inhibitor of base excision repair (iBER) domain, or a variant introduced into combinations of these domains).
  • the nucleobase modification domain may be evolved from a reference protein that is an RNA modifying enzyme (e.g., an N1-methyladenosine modification enzyme or a 5-methylcytosine modification enzyme) and evolved using PACE, PANCE, or other plate-based evolution methods to obtain a DNA modifying version of the nucleobase modification domain, which can then be used in the fusion proteins described herein.
  • the base editors described herein comprise a nucleic acid programmable DNA binding (napDNAbp) domain.
  • the napDNAbp is associated with at least one guide nucleic acid (e.g., guide RNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the protospacer of a guide RNA).
  • the guide nucleic acid “programs” the napDNAbp domain to localize and bind to a complementary sequence of the target strand. Binding of the napDNAbp domain to a complementary sequence enables the nucleobase modification domain of the base editor to access and enzymatically oxidize a target adenine base in the target strand.
  • the napDNAbp can be a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease.
  • CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements and conjugative plasmids).
  • CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids.
  • CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA).
  • in type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein.
  • the tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3'-5'
  • RNA-binding and cleavage typically requires protein and both RNAs.
  • single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek et al., Science 337:816-821 (2012), the entire contents of which is hereby incorporated by reference.
  • the binding mechanism of a napDNAbp - guide RNA complex includes the step of forming an R-loop whereby the napDNAbp induces the unwinding of a double-strand DNA target, thereby separating the strands in the region bound by the napDNAbp.
  • the guide RNA protospacer then hybridizes to the “target strand.” This displaces a “non-target strand” that is
  • the napDNAbp includes one or more nuclease activities, which cut the DNA leaving various types of lesions (e.g., a nick in one strand of the DNA).
  • the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location, and/or cuts the target strand at a second location.
  • the target DNA can be cut to form a “double-stranded break” whereby both strands are cut.
  • the target DNA can be cut at only a single site, i.e., the DNA is “nicked” on one strand.
  • the base editors may comprise the canonical SpCas9, or any ortholog Cas9 protein, or any variant Cas9 protein— including any naturally occurring variant, mutant, or otherwise engineered version of Cas9— that is known or which can be made or evolved through a directed evolution or otherwise mutagenic process.
  • the napDNAbp has a nickase activity, i.e., it cleaves only one strand of the target DNA sequence.
  • the napDNAbp has an inactive nuclease, e.g., it is a “dead” protein.
  • Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having modified or rearranged primary amino acid sequence (e.g., the circular permutant forms).
  • the base editors described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins.
  • the napDNAbps used herein e.g., an SpCas9 or SpCas9 variant
  • the disclosure contemplates any Cas9, Cas9 variant, or Cas9 equivalent which has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference SpCas9 canonical sequence (set forth in SEQ ID NO: 9), a reference SaCas9 canonical sequence (set forth in SEQ ID NO: 92) or a reference Cas9 equivalent (e.g., Cas12a/Cpf1).
  • the napDNAbp directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the
  • the napDNAbp directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence.
  • an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand).
  • mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A in reference to the canonical SpCas9 sequence, or to equivalent amino acid positions in other Cas9 variants or Cas9 equivalents.
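Substitution labels such as D10A or H840A follow the pattern "wild type residue, 1-based position, replacement residue". The Python sketch below parses such a label and applies it to a sequence, checking that the expected wild type residue is actually present. The function name and the short peptide are made up for illustration; the peptide is not the SpCas9 sequence.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a point substitution written like 'D10A' to a protein sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # labels are 1-based
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


toy_peptide = "MSTNPKRGLDIGT"  # hypothetical peptide with D at position 10
print(apply_substitution(toy_peptide, "D10A"))  # MSTNPKRGLAIGT
```

The wild type check matters when mapping a label to "equivalent amino acid positions" in another Cas9 variant, where the residue numbering may be shifted.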
  • Cas protein refers to a full-length Cas protein obtained from nature, a recombinant Cas protein having a sequence that differs from a naturally occurring Cas protein, or any fragment of a Cas protein that nevertheless retains all or a significant amount of the requisite basic functions needed for the disclosed methods, i.e., (i) possession of nucleic-acid programmable binding of the Cas protein to a target DNA, and (ii) ability to nick the target DNA sequence on one strand.
  • the Cas proteins contemplated herein embrace CRISPR Cas9 proteins, as well as Cas9 equivalents, variants (e.g., Cas9 nickase (nCas9) or nuclease inactive Cas9 (dCas9)), homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and may include a Cas9 equivalent from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system) and C2c3 (a type V CRISPR-Cas system).
  • C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), the contents of which are incorporated herein by reference.
  • Cas9 or “Cas9 domain” embraces any naturally occurring Cas9 from any organism, any naturally-occurring Cas9 equivalent or functional fragment thereof, any Cas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of a Cas9, naturally-occurring or engineered.
  • the term Cas9 is not meant to be particularly limiting and may be referred to as a “Cas9 or equivalent.”
  • Exemplary Cas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference. The present disclosure is unlimited with regard to the particular napDNAbp that is employed in the base editors of the disclosure.
  • Cas9 and Cas9 equivalents are provided as follows; however, these specific examples are not meant to be limiting.
  • the base editors of the present disclosure may use any suitable napDNAbp, including any suitable Cas9 or Cas9 equivalent.
  • the base editor constructs described herein may comprise the “canonical SpCas9” nuclease from S. pyogenes, which has been widely used as a tool for genome engineering.
  • This Cas9 protein is a large, multi-domain protein containing two distinct nuclease domains. Point mutations can be introduced into Cas9 to abolish one or both nuclease activities, resulting in a nickase Cas9 (nCas9) or dead Cas9 (dCas9), respectively, that still retains its ability to bind DNA in a sgRNA-programmed manner.
  • when fused to another protein or domain, Cas9 or a variant thereof (e.g., nCas9) can target that protein to virtually any DNA sequence simply by co-expression with an appropriate sgRNA.
  • the canonical SpCas9 protein refers to the wild type protein from Streptococcus pyogenes having the following amino acid sequence:
  • the base editors described herein may include canonical SpCas9, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with a wild type Cas9 sequence provided above.
  • These variants may include SpCas9 variants containing one or more mutations, including any known mutation reported with the
  • the base editors described herein may include any of the above SpCas9 sequences, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
  • the Cas9 protein can be a wild type Cas9 ortholog from another bacterial species.
  • the following Cas9 orthologs can be used in connection with the base editor constructs described in this disclosure.
  • any variant Cas9 orthologs having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to any of the below orthologs may also be used with the disclosed base editors.
  • the base editors described herein may include any of the above Cas9 ortholog sequences, or any variants thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
  • the napDNAbp may include any suitable homologs and/or orthologs or naturally occurring enzymes, such as Cas9.
  • Cas9 homologs and/or orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus.
  • the Cas moiety is configured (e.g., mutagenized, recombinantly engineered, or otherwise obtained from nature) as a nickase, i.e., capable of cleaving only a single strand of the target double-stranded DNA. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.
  • a Cas9 nuclease has an inactive (e.g., an inactivated) DNA cleavage domain, that is, the Cas9 is a nickase.
  • the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 3.
  • the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the Cas9 orthologs in the above tables.
  • the disclosed base editors may comprise a catalytically inactive, or“dead,” napDNAbp domain.
  • exemplary catalytically inactive domains in the disclosed base editors are dead S. pyogenes Cas9 (dSpCas9) and S. pyogenes Cas9 nickase (SpCas9n).
  • the base editors described herein may include a dead Cas9, e.g., dead SpCas9, which has no nuclease activity due to one or more mutations that inactivate both nuclease domains of SpCas9, namely the RuvC domain (which cleaves the non-protospacer DNA strand) and HNH domain (which cleaves the protospacer DNA strand).
  • the nuclease inactivation may be due to one or more mutations that result in one or more substitutions and/or deletions in the amino acid sequence of the encoded protein, or any variants thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
  • the D10A and N580A mutations in the wild-type S. aureus Cas9 amino acid sequence may be used to form a dSaCas9.
  • the napDNAbp domain of the base editors provided herein comprises a dSaCas9 that has D10A and N580A mutations relative to the wild-type SaCas9 sequence (SEQ ID NO: 92).
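  • Substitution notation such as D10A (wild-type residue, 1-based position, replacement residue) can be applied to a sequence programmatically. The following is a minimal sketch under stated assumptions: the helper name is hypothetical, the 20-residue sequence is a toy stand-in, and N14A is used only as a short-sequence surrogate for a distant substitution such as N580A.

```python
import re

MUTATION_RE = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(sequence: str, mutations: list[str]) -> str:
    """Apply substitutions written like 'D10A' (1-based positions)."""
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION_RE.fullmatch(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation!r}")
        wild_type, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3))
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {mutation!r}")
        # Guard against numbering mistakes: the stated wild-type residue
        # must actually be present at that position.
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{mutation!r}: found {residues[position - 1]} at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


toy_wild_type = "MDKKYSIGLDIGTNSVGWAV"  # toy sequence, not from this disclosure
toy_dead = apply_substitutions(toy_wild_type, ["D10A", "N14A"])
print(toy_dead)
```

  • The wild-type check is a design choice that catches off-by-one numbering errors, which matter when a mutation is defined relative to a specific reference sequence.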
  • dCas9 refers to a nuclease-inactive Cas9 or nuclease-dead Cas9, or a functional fragment thereof, and embraces any naturally occurring dCas9 from any organism, any naturally-occurring dCas9 equivalent or functional fragment thereof, any dCas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of a dCas9, naturally-occurring or engineered.
  • dCas9 is not meant to be particularly limiting and may be referred to as a“dCas9 or equivalent.”
  • Exemplary dCas9 proteins and methods for making dCas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference.
  • dCas9 corresponds to, or comprises in part or in whole, a Cas9 amino acid sequence having one or more mutations that inactivate the Cas9 nuclease activity.
  • Cas9 variants having mutations other than D10A and H840A are provided which may result in the full or partial inactivation of the endogenous Cas9 nuclease activity (e.g., dCas9 or nCas9, respectively).
  • Such mutations include other amino acid substitutions at D10 and H840, or other substitutions within the nuclease domains of Cas9 (e.g., substitutions in the HNH nuclease subdomain and/or the RuvC1 subdomain) with reference to a wild type sequence such as Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1).
  • variants or homologues of Cas9 are provided which are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to NCBI Reference Sequence: NC_017053.1.
  • variants of dCas9 are provided having amino acid sequences which are shorter, or longer than NC_017053.1 by about 5 amino acids, by about 10 amino acids, by about 15 amino acids, by about 20 amino acids, by about 25 amino acids, by about 30 amino acids, by about 40 amino acids, by about 50 amino acids, by about 75 amino acids, by about 100 amino acids or more.
  • the napDNAbp domain of any of the disclosed base editors comprises a dead S. pyogenes Cas9 (dSpCas9).
  • the napDNAbp domain of any of the disclosed base editors comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to SEQ ID NO: 106.
  • the napDNAbp domain of any of the disclosed base editors comprises the amino acid sequence of SEQ ID NO: 106.
  • the dead Cas9 may be based on the canonical SpCas9 sequence of Q99ZW2 and may have the following sequence, which comprises a D10A and an H840A substitution (underlined and bolded), or a variant of SEQ ID NO: 106 having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto:
  • the disclosed base editors may comprise a napDNAbp domain that comprises a nickase.
  • the base editors described herein comprise a Cas9 nickase.
  • the term “Cas9 nickase” or “nCas9” refers to a variant of Cas9 which is capable of introducing a single-strand break in a double-stranded DNA molecule target.
  • the Cas9 nickase comprises only a single functioning nuclease domain.
  • the wild type Cas9 (e.g., the canonical SpCas9) comprises two separate nuclease domains, namely, the RuvC domain (which cleaves the non-protospacer DNA strand) and HNH domain (which cleaves the protospacer DNA strand).
  • the Cas9 nickase comprises a mutation in the RuvC domain which inactivates the RuvC nuclease activity.
  • nickase mutations in the RuvC domain could include D10X, H983X, D986X, or E762X, wherein X is any amino acid other than the wild type amino acid.
  • the nickase could be D10A, or H983A, or D986A, or E762A, or a combination thereof.
  • the napDNAbp domain of any of the disclosed base editors comprises an S. pyogenes Cas9 nickase (SpCas9n).
  • the napDNAbp domain of any of the disclosed base editors comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to SEQ ID NO: 112 or 118.
  • the napDNAbp domain of any of the disclosed base editors comprises the amino acid sequence of SEQ ID NO: 112.
  • the napDNAbp domain of any of the disclosed base editors comprises the amino acid sequence of SEQ ID NO: 118.
  • the napDNAbp domain of any of the disclosed base editors comprises an S. aureus Cas9 nickase (SaCas9n). In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to SEQ ID NO: 116. In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises the amino acid sequence of SEQ ID NO: 116.
  • the Cas9 nickase can have a mutation in the RuvC nuclease domain and have one of the following amino acid sequences, or a variant thereof having an amino acid sequence that has at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
  • the Cas9 nickase comprises a mutation in the HNH domain which inactivates the HNH nuclease activity.
  • mutations in histidine (H) 840 or asparagine (N) 863 have been reported as loss-of-function mutations of the HNH nuclease domain that create a functional Cas9 nickase (e.g., Nishimasu et al., “Crystal structure of Cas9 in complex with guide RNA and target DNA,” Cell 156(5), 935-949, which is incorporated herein by reference).
  • nickase mutations in the HNH domain could include H840X and N863X, wherein X is any amino acid other than the wild type amino acid.
  • the nickase could be H840A or N863A or a combination thereof.
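  • The relationship between the mutated domain(s) and the resulting nuclease state (one domain inactivated gives a nickase, both give a dead Cas9) can be sketched as a small classifier. This is an illustration only: it assumes, as a simplification, that any substitution at one of the positions named in the text (10, 762, 983, 986 for RuvC; 840, 863 for HNH, SpCas9 numbering) inactivates the corresponding domain, and the function names are hypothetical.

```python
import re

# Residue positions named in the text (SpCas9 numbering).
RUVC_POSITIONS = {10, 762, 983, 986}
HNH_POSITIONS = {840, 863}


def mutated_positions(mutations: list[str]) -> set[int]:
    """Extract 1-based positions from substitutions written like 'D10A'."""
    positions = set()
    for mutation in mutations:
        match = re.fullmatch(r"[A-Z](\d+)[A-Z]", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation!r}")
        positions.add(int(match.group(1)))
    return positions


def classify_cas9(mutations: list[str]) -> str:
    """Classify a variant by which nuclease domains carry a listed mutation."""
    positions = mutated_positions(mutations)
    ruvc_inactive = bool(positions & RUVC_POSITIONS)
    hnh_inactive = bool(positions & HNH_POSITIONS)
    if ruvc_inactive and hnh_inactive:
        return "dCas9"  # both nuclease domains inactivated
    if ruvc_inactive or hnh_inactive:
        return "nCas9"  # exactly one domain inactivated: nickase
    return "nuclease-active Cas9"


print(classify_cas9(["D10A"]))
print(classify_cas9(["D10A", "H840A"]))
```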
  • the Cas9 nickase can have a mutation in the HNH nuclease domain and have one of the following amino acid sequences, or a variant thereof having an amino acid sequence that has at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
  • the N-terminal methionine is removed from a Cas9 nickase, or from any Cas9 variant, ortholog, or equivalent disclosed or contemplated herein.
  • methionine-minus Cas9 nickases include the following sequences, or a variant thereof having an amino acid sequence that has at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
  • the napDNAbp domains used in the base editors described herein may also include other Cas9 variants that are at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about
  • a Cas9 variant may have 1, 2, 3,
  • the Cas9 variant comprises a fragment of a reference Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9.
  • the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 (e.g., SEQ ID NO: 9).
  • the disclosure also may utilize Cas9 fragments which retain their functionality and which are fragments of any herein disclosed Cas9 protein.
  • the Cas9 fragment is at least 100 amino acids in length.
  • the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or at least 1300 amino acids in length.
  • the base editors disclosed herein may comprise one of the Cas9 variants described as follows, or a Cas9 variant thereof that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference Cas9 variant.
  • the base editors described herein can include any Cas9 equivalent.
  • Cas9 equivalent is a broad term that encompasses any napDNAbp protein that serves the same function as Cas9 in the present base editors despite that its amino acid primary sequence and/or its three-dimensional structure may be different and/or unrelated from an evolutionary standpoint.
  • Cas9 equivalents include any Cas9 ortholog, homolog, mutant, or variant described or embraced herein that are
  • the Cas9 equivalents also embrace proteins that may have evolved through convergent evolution processes to have the same or similar function as Cas9, but which do not necessarily have any similarity with regard to amino acid sequence and/or three dimensional structure.
  • the base editors described here embrace any Cas9 equivalent that would provide the same or similar function as Cas9 despite that the Cas9 equivalent may be based on a protein that arose through convergent evolution.
  • CasX is a Cas9 equivalent that reportedly has the same function as Cas9 but which evolved through convergent evolution.
  • the CasX protein described in Liu et al., “CasX enzymes comprise a distinct family of RNA-guided genome editors,” Nature, 2019, Vol. 566: 218-223, is contemplated to be used with the base editors described herein.
  • Cas9 is a bacterial enzyme that evolved in a wide variety of species.
  • the Cas9 equivalents contemplated herein may also be obtained from archaea, which constitute a domain and kingdom of single-celled prokaryotic microbes different from bacteria.
  • Cas9 equivalents may refer to CasX or CasY, which have been described in, for example, Burstein et al., “New CRISPR-Cas systems from
  • Cas9 refers to CasX, or a variant of CasX.
  • Cas9 refers to a CasY, or a variant of CasY. It should be appreciated that other RNA-guided DNA binding proteins may be used as a nucleic acid programmable DNA binding protein (napDNAbp), and are within the scope of this disclosure. Also see Liu et al., “CasX enzymes comprise a distinct family of RNA-guided genome editors,” Nature, 2019, Vol. 566: 218-223. Any of these Cas9 equivalents are contemplated.
  • the Cas9 equivalent comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring CasX or CasY protein.
  • the napDNAbp is a naturally-occurring CasX or CasY protein.
  • the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a wild-type Cas moiety or any Cas moiety provided herein.
  • the nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., dCas9 and nCas9), CasX, CasY, Cpf1, C2c1, C2c2, C2c3, Argonaute, Cas12a, and Cas12b.
  • one example of a nucleic acid programmable DNA-binding protein that has different PAM specificity than Cas9 is Clustered Regularly Interspaced Short Palindromic Repeats from Prevotella and Francisella 1 (Cpf1). Similar to Cas9, Cpf1 is also a class 2 CRISPR effector. It has been shown that Cpf1 mediates robust DNA interference with features distinct from Cas9. Cpf1 is a single RNA-guided
  • Cpf1 cleaves DNA via a staggered DNA double-stranded break.
  • Cpf1 utilizes a T-rich protospacer-adjacent motif (TTN, TTTN, or YTN).
  • Cpf1 proteins are known in the art and have been described previously, for example Yamano et al., “Crystal structure of Cpf1 in complex with guide RNA and target DNA.” Cell (165) 2016, p. 949-962; the entire contents of which is hereby incorporated by reference. The state of the art may also now refer to Cpf1 enzymes as Cas12a.
  • the Cas protein may include any CRISPR associated protein, including but not limited to Cas12a, Cas12b, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (sometimes referred to as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2.
  • a nickase mutation (e.g., a mutation corresponding to the D10A mutation of the wild type SpCas9 polypeptide of SEQ ID NO: 9).
  • the napDNAbp can be any of the following proteins: a Cas9, a Cpf1, a CasX, a CasY, a C2c1, a C2c2, a C2c3, a GeoCas9, a CjCas9, a Cas12a, a Cas12b, a Cas12g, a Cas12h, a Cas12i, a Cas13b, a Cas13c, a Cas13d, a Cas14, a Csn2, an xCas9, an SpCas9-NG, a circularly permuted Cas9, or an Argonaute (Ago), a Cas9-KKH, a SmacCas9, a Spy-macCas9, an SpCas9-VRQR, an SpCas9-NRRH, an SpaCas9-
  • the base editors contemplated herein can include a Cas9 protein that is of smaller molecular weight than the canonical SpCas9 sequence.
  • the smaller-sized Cas9 variants may facilitate delivery to cells, e.g., by an expression vector, nanoparticle, or other means of delivery.
  • the canonical SpCas9 protein is 1368 amino acids in length and has a predicted molecular weight of 158 kilodaltons.
  • small-sized Cas9 variant refers to any Cas9 variant, whether naturally occurring, engineered, or otherwise, that is less than 1300 amino acids, or less than 1290 amino acids, or less than 1280 amino acids, or less than 1270 amino acids, or less than 1260 amino acids, or less than 1250 amino acids, or less than 1240 amino acids, or less than 1230 amino acids, or less than 1220 amino acids, or less than 1210 amino acids, or less than 1200 amino acids, or less than 1190 amino acids, or less than 1180 amino acids, or less than 1170 amino acids, or less than 1160 amino acids, or less than 1150 amino acids, or less than 1140 amino acids, or less than 1130 amino acids, or less than 1120 amino acids, or less than 1110 amino acids, or less than 1100 amino acids, or less than 1050 amino acids, or less than 1000 amino acids, or less than 950 amino acids, or less than 900 amino acids, or less than 850 amino acids, or less than 800 amino acids, or
  • the base editors disclosed herein may comprise one of the small-sized Cas9 variants described as follows, or a Cas9 variant thereof that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference small-sized Cas9 protein.
  • Exemplary small-sized Cas9 variants include, but are not limited to, SaCas9 and LbCas12a.
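  • The size criterion for a small-sized variant can be expressed as a simple length comparison against a chosen cutoff. In the sketch below, the 1368-residue length of canonical SpCas9 comes from the text; the SaCas9 and LbCas12a lengths are commonly reported literature values included here as assumptions, and the function name and default cutoff are illustrative.

```python
CANONICAL_SPCAS9_LENGTH = 1368  # amino acids, as stated in the text

# Assumed lengths (commonly reported values, not taken from this document).
ASSUMED_LENGTHS = {
    "SpCas9": CANONICAL_SPCAS9_LENGTH,
    "SaCas9": 1053,
    "LbCas12a": 1228,
}


def is_small_sized(length: int, cutoff: int = 1300) -> bool:
    """True when a protein is shorter than the chosen cutoff (in residues)."""
    return length < cutoff


for name, length in ASSUMED_LENGTHS.items():
    relative = 100.0 * length / CANONICAL_SPCAS9_LENGTH
    print(f"{name}: {length} aa, {relative:.1f}% of SpCas9, "
          f"small-sized={is_small_sized(length)}")
```

  • Passing a stricter cutoff (for example, 1100) narrows the set of proteins that qualify, mirroring the series of progressively smaller thresholds recited in the text.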
  • the base editors described herein may also comprise Cas12a/Cpf1 (dCpf1) variants that may be used as a guide nucleotide sequence-programmable DNA-binding protein domain.
  • the Cas12a/Cpf1 protein has a RuvC-like endonuclease domain that is similar to the RuvC domain of Cas9 but does not have an HNH endonuclease domain, and the N-terminus of Cpf1 does not have the alpha-helical recognition lobe of Cas9.
  • Additional exemplary Cas9 equivalent protein sequences can include the following:
  • the napDNAbp is a nucleic acid programmable DNA binding protein that does not require a canonical (NGG) PAM sequence.
  • the napDNAbp is an argonaute protein.
  • NgAgo is a ssDNA-guided endonuclease. NgAgo binds 5' phosphorylated ssDNA of ~24 nucleotides (gDNA) to guide it to its target site and will make DNA double-strand breaks at the gDNA site.
  • NgAgo-gDNA system does not require a protospacer-adjacent motif (PAM).
  • the disclosure provides napDNAbp domains that comprise SpCas9 variants that recognize and work best with NRRH, NRCH, and NRTH PAMs. See PCT Application No. PCT/US2019/47996, incorporated by reference herein.
  • the disclosed base editors comprise a napDNAbp domain selected from SpCas9-NRRH, SpCas9-NRTH, and SpCas9-NRCH.
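  • PAM names such as NGG, NRRH, NRCH, and NRTH use standard IUPAC nucleotide codes (N = any base, R = A or G, Y = C or T, H = A, C, or T). A minimal sketch of matching a PAM pattern against a DNA site follows; the function names are hypothetical, and the scan covers only the strand as written.

```python
# IUPAC codes needed for the PAMs named in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "R": "AG", "Y": "CT", "H": "ACT",
}


def pam_matches(pattern: str, site: str) -> bool:
    """True when every base of `site` is allowed by the IUPAC `pattern`."""
    if len(pattern) != len(site):
        return False
    return all(base in IUPAC[code]
               for code, base in zip(pattern, site.upper()))


def find_pam_sites(sequence: str, pattern: str) -> list[int]:
    """0-based start indices where `pattern` matches on the given strand."""
    width = len(pattern)
    return [i for i in range(len(sequence) - width + 1)
            if pam_matches(pattern, sequence[i:i + width])]


print(pam_matches("NRRH", "TGAC"))
print(find_pam_sites("ACGGTTGG", "NGG"))
```

  • A fuller tool would also scan the reverse complement and account for whether a given protein reads its PAM 3' or 5' of the protospacer; both are omitted here for brevity.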
  • the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SpCas9-NRRH.
  • the disclosed base editors comprise a napDNAbp domain that comprises SpCas9-NRRH.
  • the SpCas9-NRRH has an amino acid sequence as presented in SEQ ID NO: 141 (underlined residues are mutated relative to SpCas9, as set forth in SEQ ID NO: 9)
  • the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to
  • the disclosed base editors comprise a napDNAbp domain that comprises SpCas9-NRCH.
  • the SpCas9-NRCH has an amino acid sequence as presented in SEQ ID NO: 142 (underlined residues are mutated relative to SpCas9)
  • the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SpCas9-NRTH.
  • the disclosed base editors comprise a napDNAbp domain that comprises SpCas9-NRTH.
  • the SpCas9-NRTH has an amino acid sequence as presented in SEQ ID NO: 143 (underlined residues are mutated relative to SpCas9)
  • the napDNAbp of any of the disclosed base editors comprises a Cas9 derived from a Streptococcus macacae, e.g. Streptococcus macacae NCTC 11558, or
  • the napDNAbp comprises a hybrid variant of SmacCas9 that incorporates an SpCas9 domain with the SmacCas9 domain and is known as Spy-macCas9, or a variant thereof.
  • the napDNAbp comprises a hybrid variant of SmacCas9 that incorporates an increased nucleolytic variant of an SpCas9 (iSpy Cas9) domain and is known as iSpy-macCas9.
  • iSpyMac-Cas9 contains two mutations, R221K and N394K, that were identified by deep mutational scans of Spy Cas9 that raise modification rates of the protein on most targets.
  • Jakimo et al. showed that the hybrids Spy-macCas9 and iSpy-macCas9 recognize a short 5'-NAA-3' PAM, recognize all evaluated adenine dinucleotide PAM sequences, and possess robust editing efficiency in human cells.
  • Liu et al. engineered base editors containing Spy-macCas9, and demonstrated that cytidine and adenine base editors containing Spymac domains can induce efficient C-to-T and A-to-G conversions in vivo.
  • Liu et al. suggested that the PAM scope of Spy-macCas9 may be 5'-TAAA-3', rather than 5'-NAA-3' as reported by Jakimo et al. See Liu et al. Cell Discovery (2019) 5:58, herein incorporated by reference.
  • the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to iSpyMac-Cas9.
  • the disclosed base editors comprise a napDNAbp domain that comprises iSpyMac-Cas9.
  • the iSpyMac-Cas9 has an amino acid sequence as presented in SEQ ID NO: 144 (R221K and N394K mutations are underlined):
  • the napDNAbp of any of the disclosed base editors is a prokaryotic homolog of an Argonaute protein.
  • Prokaryotic homologs of Argonaute proteins are known and have been described, for example, in Makarova K., et al., “Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements”, Biol Direct. 2009 Aug 25;4:29. doi:
  • the napDNAbp is a Marinitoga piezophila Argonaute (MpAgo) protein.
  • the CRISPR-associated Marinitoga piezophila Argonaute (MpAgo) protein cleaves single-stranded target sequences using 5'-hydroxylated guides, rather than the 5'-phosphorylated guides used by all other known Argonautes.
  • the crystal structure of an MpAgo-RNA complex shows a guide strand binding site comprising residues that block 5' phosphate interactions. This data suggests the evolution of an Argonaute subclass with noncanonical specificity for a 5'-hydroxylated guide.
  • the napDNAbp is a single effector of a microbial CRISPR-Cas system.
  • Single effectors of microbial CRISPR-Cas systems include, without limitation, Cas9, Cpf1, C2c1, C2c2, and C2c3.
  • microbial CRISPR-Cas systems are divided into Class 1 and Class 2 systems. Class 1 systems have multisubunit effector complexes, while Class 2 systems have a single protein effector.
  • Cas9 and Cpf1 are Class 2 effectors.
  • three distinct Class 2 CRISPR-Cas systems (C2c1, C2c2, and C2c3) have been described by Shmakov et al., “Discovery and Functional
  • C2c1 and C2c3 contain RuvC-like endonuclease domains related to Cpf1.
  • a third system, C2c2, contains an effector with two predicted HEPN RNase domains.
  • production of mature CRISPR RNA is tracrRNA-independent, unlike production of CRISPR RNA by C2c1.
  • C2c1 depends on both CRISPR RNA and tracrRNA for DNA cleavage.
  • Bacterial C2c2 has been shown to possess a unique RNase activity for CRISPR RNA maturation distinct from its RNA-activated single-stranded RNA degradation activity. These RNase functions are different from each other and from the CRISPR RNA-processing behavior of Cpf1. See, e.g., East-Seletsky, et al., “Two distinct RNase activities of CRISPR-C2c2 enable guide-RNA processing and RNA detection”, Nature, 2016 Oct
  • “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector”, Science, 2016 Aug 5; 353(6299), the entire contents of which are hereby incorporated by reference.
  • the napDNAbp may be a C2c1, a C2c2, or a C2c3 protein. In some embodiments, the napDNAbp is a C2c1 protein. In some embodiments, the napDNAbp is a C2c2 protein. In some embodiments, the napDNAbp is a C2c3 protein.
  • the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring C2c1, C2c2, or C2c3 protein.
  • the napDNAbp is a naturally-occurring C2c1, C2c2, or C2c3 protein.
  • Cas9 domains that have different PAM specificities.
  • Cas9 proteins, such as Cas9 from S. pyogenes (spCas9), require a canonical NGG PAM sequence to bind a particular nucleic acid region. This may limit the ability to edit desired bases within a genome.
  • the base editors provided herein may need to be placed at a precise location, for example where a target base is placed within a 4 base region (e.g., an “editing window” or a “target window”), which is approximately 15 bases upstream of the PAM. See Komor, A.C., et al,
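  • The positional constraint described above (a roughly 4-base window located approximately 15 bases upstream of the PAM) can be sketched in code. The exact coordinates are an assumption made for illustration: the window is taken as four bases centered about 15 bases 5' of the first PAM base, using 0-based indices on the PAM-containing strand. The function names and the toy sequence are hypothetical.

```python
def editing_window(pam_start: int, distance: int = 15, width: int = 4) -> range:
    """0-based indices of a `width`-base window about `distance` bases 5' of the PAM.

    Assumed convention: the window is centered `distance` bases upstream
    of `pam_start`, the index of the first PAM base.
    """
    start = pam_start - distance - width // 2
    if start < 0:
        raise ValueError("sequence too short upstream of the PAM")
    return range(start, start + width)


def targets_in_window(sequence: str, pam_start: int, target_base: str = "A") -> list[int]:
    """Indices inside the window that carry the target base."""
    return [i for i in editing_window(pam_start) if sequence[i] == target_base]


# Toy site: a 20-nt protospacer followed by a TGG PAM starting at index 20.
site = "GCTAACGTTGCATGCCGTAC" + "TGG"
print(list(editing_window(20)))
print(targets_in_window(site, 20))
```

  • Under this convention the window covers protospacer positions 4 to 7 when counting from the PAM-distal end; other conventions shift the window by a base or two, so the offsets should be treated as tunable parameters.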
  • any of the base editors provided herein may contain a Cas9 domain that is capable of binding a nucleotide sequence that does not contain a canonical (e.g., NGG) PAM sequence.
  • Cas9 domains that bind to non-canonical PAM sequences have been described in the art and would be apparent to the skilled artisan.
  • Cas9 domains that bind non-canonical PAM sequences have been described in Kleinstiver, B. P., et al.,“Engineered CRISPR-Cas9 nucleases with altered PAM
  • a napDNAbp domain with altered PAM specificity such as a domain with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Francisella novicida Cpf1 (SEQ ID NO: 145) (D917, E1006, and D1255), which has the following amino acid sequence:
  • An additional napDNAbp domain with altered PAM specificity such as a domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Geobacillus thermodenitrificans Cas9 (SEQ ID NO: 146), which has the following amino acid sequence:
  • the nucleic acid programmable DNA binding protein (napDNAbp) is a nucleic acid programmable DNA binding protein that does not require a canonical (NGG) PAM sequence.
  • the napDNAbp is an argonaute protein.
  • One example of such a nucleic acid programmable DNA binding protein is an Argonaute protein from Natronobacterium gregoryi (NgAgo).
  • NgAgo is a ssDNA-guided endonuclease.
  • NgAgo binds 5' phosphorylated ssDNA of ~24 nucleotides (gDNA) to guide it to its target site and will make DNA double-strand breaks at the gDNA site.
  • the NgAgo-gDNA system does not require a protospacer-adjacent motif (PAM).
  • a nuclease-inactive NgAgo
  • the characterization and use of NgAgo have been described in Gao et al, Nat Biotechnol., 34(7): 768-73 (2016), PubMed PMID: 27136078; Swarts et al., Nature, 507(7491): 258-61 (2014); and Swarts et al., Nucleic Acids Res. 43(10) (2015): 5120-9, each of which is incorporated herein by reference.
  • the sequence of Natronobacterium gregoryi Argonaute is provided in SEQ ID NO: 147.
  • the disclosed base editors may comprise a napDNAbp domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Natronobacterium gregoryi Argonaute (SEQ ID NO: 147), which has the following amino acid sequence:
  • the base editors disclosed herein may comprise a circular permutant of Cas9.
  • the term “circularly permuted Cas9” (or “circular permutant” of Cas9, or “CP-Cas9”) refers to any Cas9 protein, or variant thereof, that occurs as or has been modified or engineered to be a circular permutant variant, which means the N-terminus and the C-terminus of a Cas9 protein (e.g., a wild type Cas9 protein) have been topologically rearranged.
  • Such circularly permuted Cas9 proteins, or variants thereof retain the ability to bind DNA when complexed with a guide RNA (gRNA).
  • any of the Cas9 proteins described herein, including any variant, ortholog, or naturally occurring Cas9 or equivalent thereof, may be reconfigured as a circular permutant variant.
  • the circular permutants of Cas9 may have the following structure:
  • the present disclosure contemplates the following circular permutants of canonical S. pyogenes Cas9 (1368 amino acids of UniProtKB - Q99ZW2 (CAS9_STRP1) (numbering is based on the amino acid position in SEQ ID NO: 9)):
  • the circular permutant Cas9 has the following structure (based on S. pyogenes Cas9 (1368 amino acids of UniProtKB - Q99ZW2 (CAS9_STRP1)); numbering is based on the amino acid position in SEQ ID NO: 9): N-terminus-[103-1368]-[optional linker]-[1-102]-C-terminus;
  • the circular permutant can be formed by linking a C-terminal fragment of a Cas9 to an N-terminal fragment of a Cas9, either directly or by using a linker, such as an amino acid linker.
  • the C-terminal fragment may correspond to the C-terminal 95% or more of the amino acids of a Cas9 (e.g., amino acids about 1300-1368), or the C-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%,
  • the N-terminal portion may correspond to the N-terminal 95% or more of the amino acids of a Cas9 (e.g., amino acids about 1-1300), or the N-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% or more of a Cas9 (e.g., of SEQ ID NO: 9).
  • the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 30% or less of the amino acids of a Cas9 (e.g., amino acids 1012-1368 of SEQ ID NO: 9).
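  • The correspondence between a C-terminal percentage and a residue range is simple arithmetic and can be checked with a short sketch. It assumes the 1368-residue length of canonical SpCas9 and rounds the residue count down; the function names are hypothetical.

```python
SPCAS9_LENGTH = 1368  # canonical SpCas9 length in residues


def c_terminal_fragment(length: int, percent: int) -> tuple[int, int, int]:
    """Return (residue count, first residue, last residue) for the C-terminal `percent`."""
    count = length * percent // 100  # integer arithmetic, rounded down
    first = length - count + 1
    return count, first, length


def percent_of_length(first: int, last: int, length: int) -> float:
    """Share of the full length covered by residues first..last (inclusive)."""
    return 100.0 * (last - first + 1) / length


# C-terminal 30% of a 1368-residue protein.
print(c_terminal_fragment(SPCAS9_LENGTH, 30))
# Share covered by residues 1012-1368, the example range given in the text.
print(round(percent_of_length(1012, 1368, SPCAS9_LENGTH), 1))
```

  • The first call shows that 30% of 1368 residues is 410 residues (residues 959-1368), which lines up with the 410-residue upper bound used elsewhere in this section; the example range 1012-1368 spans 357 residues, which falls under the 30% ceiling.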
  • the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 30%, 29%, 28%, 27%, 26%, 25%, 24%, 23%, 22%, 21%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%,
  • the C-terminal fragment that is rearranged to the N-terminus includes or corresponds to the C-terminal 410 residues or less of a Cas9 (e.g., the Cas9 of SEQ ID NO:
  • the C-terminal portion that is rearranged to the N-terminus includes or corresponds to the C-terminal 410, 400, 390, 380, 370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 190, 180, 170, 160, 150, 140,
  • circular permutant Cas9 variants may be defined as a topological rearrangement of a Cas9 primary structure based on the following method, which is based on S. pyogenes Cas9 of SEQ ID NO: 9: (a) selecting a circular permutant (CP) site corresponding to an internal amino acid residue of the Cas9 primary structure, which dissects the original protein into two halves: an N-terminal region and a C-terminal region; and (b) modifying the Cas9 protein sequence (e.g., by genetic engineering techniques) by moving the original C-terminal region (comprising the CP site amino acid) to precede the original N-terminal region, thereby forming a new N-terminus of the Cas9 protein that now begins with the CP site amino acid residue.
  • the CP site can be located in any domain of the Cas9 protein, including, for example, the helical-II domain, the RuvCIII domain, or the CTD domain.
  • the CP site may be located (relative to the S. pyogenes Cas9 of SEQ ID NO: 9) at original amino acid residue 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282.
  • original amino acid 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282 would become the new N-terminal amino acid.
  • These CP-Cas9 proteins may be referred to as Cas9-CP181, Cas9-CP199, Cas9-CP230, Cas9-CP270, Cas9-CP310, Cas9-CP1010, Cas9-CP1016, Cas9-CP1023, Cas9-CP1029, Cas9-CP1041, Cas9-CP1247, Cas9-CP1249, and Cas9-CP1282, respectively.
  • This description is not meant to be limited to making CP variants from SEQ ID NO: 9, but may be implemented to make CP variants in any Cas9 sequence, either at CP sites that correspond to these positions, or at other CP sites entirely. This description is not meant to limit the specific CP sites in any way. Virtually any CP site may be used to form a CP-Cas9 variant.
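  • As an illustration only, the rearrangement described above (moving the region that begins at the CP site in front of the original N-terminal region, optionally joined by a linker) can be sketched on a toy string. The function name `circular_permute`, the toy sequence, and the lowercase `ggs` linker are assumptions made for this example and are not taken from the disclosure.

```python
def circular_permute(sequence: str, cp_site: int, linker: str = "") -> str:
    """Return a circular permutant that begins at 1-based residue `cp_site`.

    Hypothetical helper: the original C-terminal region (starting with the
    CP site residue) is placed before the original N-terminal region.
    """
    if not 1 < cp_site <= len(sequence):
        raise ValueError("cp_site must be an internal 1-based position")
    c_terminal_region = sequence[cp_site - 1:]   # residues cp_site..end
    n_terminal_region = sequence[:cp_site - 1]   # residues 1..cp_site-1
    return c_terminal_region + linker + n_terminal_region


# Toy 10-residue "protein"; a CP site of 4 makes residue D the new N-terminus.
toy = "ABCDEFGHIJ"
print(circular_permute(toy, cp_site=4, linker="ggs"))  # DEFGHIJggsABC
```

  • With a 1368-residue sequence and a CP site of 103, the same slicing would yield the [103-1368]-[optional linker]-[1-102] layout given above.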
  • Exemplary CP-Cas9 amino acid sequences are provided below in which linker sequences are indicated by underlining and optional methionine (M) residues are indicated in bold. It should be appreciated that the disclosure provides CP-Cas9 sequences that do not include a linker sequence or that include different linker sequences. It should be appreciated that CP-Cas9 sequences may be based on Cas9 sequences other than that of SEQ ID NO: 9 and any examples provided herein are not meant to be limiting. Exemplary CP-Cas9 sequences are as follows:
  • Cas9 circular permutants that may be useful in the base editor constructs described herein.
  • Exemplary C-terminal fragments of Cas9 based on the Cas9 of SEQ ID NO: 9, which may be rearranged to an N-terminus of Cas9, are provided below. It should be appreciated that such C-terminal fragments of Cas9 are exemplary and are not meant to be limiting.
  • These exemplary CP-Cas9 fragments have the following sequences:
  • the base editors of the present disclosure may also comprise Cas9 variants with modified PAM specificities.
  • Some aspects of this disclosure provide Cas9 proteins that exhibit activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3', where N is A, C, G, or T) at its 3'-end.
  • the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGG-3' PAM sequence at its 3'-end.
  • the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNG-3' PAM sequence at its 3'-end.
  • the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNC-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNT-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGT-3' PAM sequence at its 3'-end.
  • the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGC-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAC-3' PAM sequence at its 3'-end.
  • the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAT-3' PAM sequence at its 3'-end. In still other embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAG-3' PAM sequence at its 3'-end.
  • the disclosed base editors comprise a napDNAbp domain comprising a SpCas9-NG, which has a PAM that corresponds to NGN.
  • the disclosed base editors comprise a napDNAbp domain comprising a SpCas9-KKH, which has a PAM that corresponds to NNNRRT (SEQ ID NO: 160).
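  • The PAM patterns named above (NGG, NGN, NNNRRT, and so on) use degenerate nucleotide codes. A minimal sketch of how such a pattern could be checked against a candidate site is given below; the helper name `pam_matches` and the example sites are hypothetical, and only the subset of IUPAC codes needed for these patterns is included.

```python
# Subset of IUPAC nucleotide codes: N = any base, R = purine, Y = pyrimidine.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def pam_matches(pattern: str, site: str) -> bool:
    """Return True if `site` (read 5' to 3') satisfies the degenerate `pattern`."""
    if len(pattern) != len(site):
        return False
    return all(base in IUPAC[code]
               for code, base in zip(pattern.upper(), site.upper()))


print(pam_matches("NGG", "TGG"))        # True  (canonical SpCas9 PAM)
print(pam_matches("NGG", "TGA"))        # False
print(pam_matches("NGN", "TGA"))        # True  (pattern given for SpCas9-NG)
print(pam_matches("NNNRRT", "ACGAGT"))  # True  (pattern given for SpCas9-KKH)
```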
  • any of the amino acid mutations described herein, (e.g., A262T) from a first amino acid residue (e.g., A) to a second amino acid residue (e.g., T) may also include mutations from the first amino acid residue to an amino acid residue that is similar to (e.g., conserved) the second amino acid residue.
  • mutation of an amino acid with a hydrophobic side chain may be a mutation to a second amino acid with a different hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan).
  • a mutation of an alanine to a threonine may also be a mutation from an alanine to an amino acid that is similar in size and chemical properties to a threonine, for example, serine.
  • mutation of an amino acid with a positively charged side chain (e.g., arginine, histidine, or lysine) may be a mutation to a second amino acid with a different positively charged side chain (e.g., arginine, histidine, or lysine).
  • mutation of an amino acid with a polar side chain may be a mutation to a second amino acid with a different polar side chain (e.g., serine, threonine, asparagine, or glutamine).
  • Additional similar amino acid pairs include, but are not limited to, the following: phenylalanine and tyrosine; asparagine and glutamine; methionine and cysteine; aspartic acid and glutamic acid; and arginine and lysine. The skilled artisan would recognize that such conservative amino acid substitutions will likely have minor effects on protein structure and are likely to be well tolerated without compromising function.
  • any of the amino acid mutations provided herein from one amino acid to a threonine may be an amino acid mutation to a serine.
  • any of the amino acid mutations provided herein from one amino acid to an arginine may be an amino acid mutation to a lysine.
  • any of the amino acid mutations provided herein from one amino acid to an isoleucine may be an amino acid mutation to an alanine, valine, methionine, or leucine.
  • any of the amino acid mutations provided herein from one amino acid to a lysine may be an amino acid mutation to an arginine.
  • any of the amino acid mutations provided herein from one amino acid to an aspartic acid may be an amino acid mutation to a glutamic acid or asparagine.
  • any of the amino acid mutations provided herein from one amino acid to a valine may be an amino acid mutation to an alanine, isoleucine, methionine, or leucine.
  • any of the amino acid mutations provided herein from one amino acid to a glycine may be an amino acid mutation to an alanine. It should be appreciated, however, that additional conserved amino acid residues would be recognized by the skilled artisan and any of the amino acid mutations to other conserved amino acid residues are also within the scope of this disclosure.
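  • The substitution rules listed above can be collected into a small lookup table. The sketch below is illustrative only: the table holds just the pairs stated in the preceding items, and the function name `conservative_alternatives` is a hypothetical helper rather than part of the disclosure.

```python
import re
from typing import List

# New residue -> conserved alternatives, limited to the pairs stated in the text
# (one-letter codes: T threonine, S serine, R arginine, K lysine, and so on).
CONSERVATIVE = {
    "T": ["S"],
    "R": ["K"],
    "I": ["A", "V", "M", "L"],
    "K": ["R"],
    "D": ["E", "N"],
    "V": ["A", "I", "M", "L"],
    "G": ["A"],
}


def conservative_alternatives(mutation: str) -> List[str]:
    """List alternative mutations whose new residue is conserved relative to the given one."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    original, position, new = match.groups()
    return [f"{original}{position}{alt}"
            for alt in CONSERVATIVE.get(new, [])
            if alt != original]  # skip entries that would restore the original residue


print(conservative_alternatives("A262T"))  # ['A262S']
```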
  • the present disclosure may utilize any of the Cas9 variants disclosed in the SEQUENCES section herein.
  • the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5'-NAA-3' PAM sequence at its 3'-end.
  • the combination of mutations are present in any one of the clones listed in Table 1.
  • the combination of mutations are conservative mutations of the clones listed in Table 1.
  • the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 1.
  • the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 1. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 1.
  • the Cas9 protein exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 9.
  • the Cas9 protein exhibits an activity on a target sequence having a 3' end that is not directly adjacent to the canonical PAM sequence (5'-NGG-3') that is at least 5-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 9 on the same target sequence.
  • the Cas9 protein exhibits an activity on a target sequence that is not directly adjacent to the canonical PAM sequence (5'-NGG-3') that is at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 9 on the same target sequence.
  • the 3' end of the target sequence is directly adjacent to an AAA, GAA, CAA, or TAA sequence.
  • the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5'-NAC-3' PAM sequence at its 3'-end.
  • the combination of mutations are present in any one of the clones listed in Table 2. In some embodiments, the combination of mutations are conservative mutations of the clones listed in Table 2. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 2.
  • the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 2. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 2.
  • the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5'-NAT-3' PAM sequence at its 3'-end.
  • the combination of mutations are present in any one of the clones listed in Table 3.
  • the combination of mutations are conservative mutations of the clones listed in Table 3.
  • the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 3.
  • the above description of various napDNAbps which can be used in connection with the presently disclosed base editors is not meant to be limiting in any way.
  • the base editors may comprise the canonical SpCas9, or any ortholog Cas9 protein, or any variant Cas9 protein— including any naturally occurring variant, mutant, or otherwise engineered version of Cas9— that is known or which can be made or evolved through a directed evolutionary or otherwise mutagenic process.
  • the Cas9 or Cas9 variants have a nickase activity, i.e., cleave only one strand of the target DNA sequence.
  • the Cas9 or Cas9 variants have inactive nucleases, i.e., are “dead” Cas9 proteins.
  • Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having modified or rearranged primary amino acid structure (e.g., the circular permutant formats).
  • the base editors described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins which are the result of convergent evolution.
  • the napDNAbps used herein may also contain various modifications that alter/enhance their PAM specificities.
  • the application contemplates any Cas9, Cas9 variant, or Cas9 equivalent which has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference SpCas9 canonical sequence or a reference Cas9 equivalent (e.g., Cas12a/Cpf1).
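  • To make the identity thresholds concrete, the sketch below computes percent identity for two sequences that have already been aligned to equal length with "-" as the gap character. This is one simple convention among several in use and is assumed here for illustration; the disclosure does not prescribe a particular calculation, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns (columns gapped in both rows are ignored)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # no information in a double-gap column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy 10-residue example: one mismatch in ten columns.
identity = percent_identity("MDKKYSIGLD", "MDKKYSIGLA")
print(identity)           # 90.0
print(identity >= 80.0)   # True: meets an "at least 80%" threshold
```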
  • the Cas9 variant having expanded PAM capabilities is SpCas9 (H840A) VRQR, or SpCas9-VRQR.
  • the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SpCas9-VRQR.
  • the disclosed base editors comprise a napDNAbp domain that comprises SpCas9-VRQR.
  • the SpCas9-VRQR comprises the following amino acid sequence (with the V, R, Q, R substitutions relative to the SpCas9 (H840A) of SEQ ID NO: 158 shown in bold underline; in addition, the methionine residue in SpCas9 (H840) was removed for SpCas9 (H840A) VRQR):
  • the Cas9 variant having expanded PAM capabilities is SpCas9 (H840A) VRER, having the following amino acid sequence (with the V, R, E, R substitutions relative to the SpCas9 (H840A) of SEQ ID NO: 159 shown in bold underline; in addition, the methionine residue in SpCas9 (H840) was removed for SpCas9 (H840A) VRER):
  • any available methods may be utilized to obtain or construct a variant or mutant Cas9 protein.
  • the term“mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue.
  • Mutations can include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions, and this list is not meant to be limiting in any way. Mutations can include “loss-of-function” mutations, which are the normal result of a mutation that reduces or abolishes a protein activity.
  • Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein whose presence compensates for the effect of the mutation. Mutations also embrace “gain-of-function” mutations, which confer an abnormal activity on a protein or cell that is otherwise not present in a normal condition. Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, with these tissues gaining functions that they normally lack. Because of their nature, gain-of-function mutations are usually dominant.
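  • The naming convention for substitutions described above (original residue, then position, then new residue, as in H840A) lends itself to a simple parser. The following is a minimal sketch under that convention; the names `Substitution`, `parse_substitution`, and `apply_substitution` and the toy sequence are hypothetical and chosen for this example only.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str   # one-letter code of the residue being replaced
    position: int   # 1-based position within the reference sequence
    new: str        # one-letter code of the replacement residue


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'H840A' into its three parts."""
    match = re.fullmatch(r"([A-Z])([1-9]\d*)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(sequence: str, label: str) -> str:
    """Apply the substitution after checking that the reference residue matches."""
    sub = parse_substitution(label)
    index = sub.position - 1
    if index >= len(sequence) or sequence[index] != sub.original:
        raise ValueError(f"{label} does not match the reference sequence")
    return sequence[:index] + sub.new + sequence[index + 1:]


print(parse_substitution("H840A"))         # Substitution(original='H', position=840, new='A')
print(apply_substitution("MKHLA", "H3A"))  # MKALA (toy sequence)
```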
  • Mutations can be introduced into a reference Cas9 protein using site-directed mutagenesis.
  • Older methods of site-directed mutagenesis known in the art rely on subcloning of the sequence to be mutated into a vector, such as an M13 bacteriophage vector, that allows the isolation of single-stranded DNA template.
  • a mutagenic primer i.e., a primer capable of annealing to the site to be mutated but bearing one or more mismatched nucleotides at the site to be mutated
  • More recent site-directed mutagenesis methods employ PCR methodologies, which have the advantage of not requiring a single-stranded template.
  • methods have been developed that do not require sub-cloning.
  • Several issues must be considered when PCR-based site-directed mutagenesis is performed. First, in these methods it is desirable to reduce the number of PCR cycles to prevent expansion of undesired mutations introduced by the polymerase. Second, a selection must be employed in order to reduce the number of non-mutated parental molecules persisting in the reaction. Third, an extended-length PCR method is preferred in order to allow the use of a single PCR primer set. And fourth, because of the non-template-dependent terminal extension activity of some thermostable polymerases it is often necessary to incorporate an end-polishing step into the procedure prior to blunt-end ligation of the PCR-generated mutant product.
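  • For illustration, a mutagenic primer in the sense used above (a primer that anneals at the site to be mutated but carries a mismatched nucleotide) can be sketched as a string operation. The function name, the flank length, and the toy template are assumptions; the sketch returns the primer in the same sense as the given strand and ignores practical design constraints such as melting temperature.

```python
def mutagenic_primer(template: str, position: int, new_base: str, flank: int = 10) -> str:
    """Return a primer matching `template` except for `new_base` at 1-based `position`.

    Hypothetical helper: up to `flank` template bases are kept on each side of the mismatch.
    """
    index = position - 1
    if not 0 <= index < len(template):
        raise ValueError("position lies outside the template")
    if template[index] == new_base:
        raise ValueError("new_base equals the template base; no mismatch would be introduced")
    start = max(0, index - flank)
    end = min(len(template), index + flank + 1)
    return template[start:index] + new_base + template[index + 1:end]


# Toy template; change the G at position 9 to T with 4 flanking bases per side.
print(mutagenic_primer("AAAACCCCGGGGTTTT", position=9, new_base="T", flank=4))  # CCCCTGGGT
```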
  • the ACBE and TGBE transversion base editors provided herein comprise an adenine oxidase nucleobase modification domain (FIG. 1).
  • An adenine oxidase is an enzyme that has catalytic activity in oxidizing an adenosine nucleobase substrate.
  • Exemplary oxidases of this disclosure catalyze oxidation reactions at the 8 position of adenosine.
  • the adenine oxidases of the present disclosure may be modified from wild-type reference proteins, which include 5-methylcytosine, N1-methyladenosine and xanthine modification enzymes.
  • Other modification enzymes that may serve as reference proteins are N4-acetylcytosine- and 2-thiocytosine-installing RNA-modification enzymes. See Ito, S. et al. Human NAT10 Is an ATP-dependent RNA Acetyltransferase responsible for N4-Acetylcytidine Formation in 18 S Ribosomal RNA (rRNA). J. Biol. Chem. 2014, 289, 35724-35730; and Cavuzic, V.; Liu, Y., Biosynthesis of Sulfur-Containing tRNA
  • Wild-type reference proteins may be those from E. coli, S. cyanogenus, yeast, mouse, human, or another organism, including other bacteria. See also Falnes, P. Ø.; Rognes, T. DNA repair by bacterial AlkB proteins, Res. Microbiol. (2003) 154(8): 531-538; Ito, S. et al., Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine, Science (2011) 333(6047):
  • Modified adenine oxidases include variants with at least 80%, at least 85%, at least 90%, at least 95% or at least 99% sequence identity to a wild-type adenine oxidase.
  • modified adenine oxidases may be obtained by altering or evolving a reference protein using a continuous evolution process (e.g., PACE) or non-continuous evolution process (e.g., PANCE or discrete plate-based selections) described herein so that the oxidase is effective on a nucleic acid target.
  • 8-Oxopurines, common products of oxidative DNA damage, tend to rotate around the glycosidic bond to adopt the syn conformation, presenting the Hoogsteen edge for base pairing.
  • the Hoogsteen edge of 8-oxoA and the Watson-Crick edge of G form a base pair featuring two three-center hydrogen bonding systems (FIG. 2).
  • the 8-oxoA:G pair makes a minimal perturbation to the DNA double helix. Consequently, polymerases misread 8-oxoA and pair it with G, eventually resulting in an A:T to C:G transversion mutation. See Kamiya, H.
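  • The outcome described above, in which 8-oxoA is read as pairing with G so that an A:T pair ends up as C:G, can be traced with a toy replication model. The sketch below is a deliberate simplification: the letter "O" stands in for 8-oxoA, strands are copied position by position without modeling strand orientation or repair, and all names are hypothetical.

```python
# Pairing rules used by the toy "polymerase"; "O" denotes 8-oxoA, which is
# misread and paired with G instead of T.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C", "O": "G"}


def copy_strand(template: str) -> str:
    """Return the strand synthesized opposite `template`, position by position."""
    return "".join(PAIRING[base] for base in template)


original = "GCATG"                          # edited strand; position 3 is an A (paired with T)
oxidized = "GCOTG"                          # the A at position 3 oxidized to 8-oxoA
first_round = copy_strand(oxidized)         # G is placed opposite 8-oxoA
second_round = copy_strand(first_round)     # that G now templates a C

print(first_round)    # CGGAC
print(second_round)   # GCCTG -> position 3 is now C (paired with G)
```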
  • Exemplary adenine oxidases include, but are not limited to, α-ketoglutarate-dependent iron oxidases, molybdopterin-dependent oxidases, heme iron oxidases, and flavin
  • Exemplary α-ketoglutarate-dependent iron oxidases include AlkBH (ABH) family oxidases, which include human AlkBH3, whose function is to clear N1-methylation from adenine in DNA and RNA. These non-heme enzymes perform methyl group C-H hydroxylation on DNA and RNA via an active Fe(IV)-oxo intermediate formed through an iron cofactor. The resulting hemiaminal breaks down to release formaldehyde and the demethylated adenine base.
  • ABH3 is selective for ssDNA over dsDNA, a characteristic of exocyclic amine hydrolyzing enzymes that likely contributes to the selective modification of bases in the targeted ssDNA loop of the ternary Cas9-sgRNA-DNA complex.
  • the TET oxidases are structurally related α-ketoglutarate-dependent iron oxidases and perform C-H hydroxylation on 5-methylcytosine as the first step in removing this important epigenetic marker. Oxidized forms of 5-methylcytosine are recognized by DNA glycosylases and hydrolytically removed, to be replaced eventually by unmethylated cytosine.
  • the Fe(IV)-oxo species of the cofactor-enzyme may be induced to transfer the oxo group from the non-heme Fe(IV) center to the 8 position of adenine.
  • This potential mechanism involves the formation of a 7,8-oxaziridine intermediate, which rearranges spontaneously to the desired 8-oxoadenine (FIG. 3).
  • Exemplary molybdopterin-dependent oxidases that selectively oxidize adenine at the 8 position include xanthine dehydrogenases and aldehyde oxidases. In eukaryotes, these enzymes utilize a monophosphate pyranopterin cofactor, which complexes with a
  • molybdenum to form molybdenum cofactor may effect alkene/arene epoxidation reactions in natural product biosynthesis pathways via similar oxo group transfer mechanisms as those of the non-heme ABH and TET iron oxidases.
  • Exemplary heme iron oxidases that selectively oxidize adenine at the 8 position include cytochrome P450 enzymes.
  • exemplary adenine oxidase domains that can be fused to napDNAbp domains according to embodiments of this disclosure are provided below.
  • Exemplary adenine oxidase domains include variants with at least 80%, at least 85%, at least 90%, at least 95% or at least 99% sequence identity to the following wild-type enzymes:
  • Cytochrome P450 1A2 (“CYP1A2”) (human):
  • TET1-CD (“Catalytic domain”) (human):
  • the disclosed fusion proteins comprise an adenine oxidase domain that does not comprise a variant of an alkB dehydrogenase or alkA dehydrogenase. In some embodiments, the disclosed fusion proteins comprise an adenine oxidase domain that does not comprise a TET family dioxygenase, such as TET1. In some embodiments, the disclosed fusion proteins comprise an adenine oxidase domain that does not comprise a variant of a TET family dioxygenase. In some embodiments, the disclosed fusion proteins do not comprise an alkA dehydrogenase, an alkB dehydrogenase, or a TET family dioxygenase, or a variant thereof.
  • the base editors disclosed herein further comprise one or more additional base editor elements, e.g., a nuclear localization signal(s), an inhibitor of base excision repair, and/or a heterologous protein domain.
  • the base editors disclosed herein further comprise one or more, preferably, at least two nuclear localization signals.
  • the base editors comprise at least two NLSs.
  • the NLSs can be the same NLSs, or they can be different NLSs.
  • the NLSs may be expressed as part of a fusion protein with the remaining portions of the base editors.
  • one or more of the NLSs are bipartite NLSs (“bpNLS”).
  • the disclosed fusion proteins comprise two bipartite NLSs. In some embodiments, the disclosed fusion proteins comprise more than two bipartite NLSs.
  • the location of the NLS fusion can be at the N-terminus, the C-terminus, or within a sequence of a base editor (e.g., inserted between the encoded napDNAbp domain (e.g., Cas9) and a DNA nucleobase modification domain (e.g., an adenine oxidase)).
  • the NLSs may be any known NLS sequence in the art.
  • the NLSs may also be any future-discovered NLSs for nuclear localization.
  • the NLSs also may be any naturally occurring NLS, or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations).
  • nuclear localization sequence refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., International PCT application PCT/EP2000/011690, filed November 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference.
  • an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 51), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 52),
  • NLS comprises the amino acid sequences
  • a base editor may be modified with one or more nuclear localization signals (NLS), preferably at least two NLSs.
  • the base editors are modified with two or more NLSs.
  • the invention contemplates the use of any nuclear localization signal known in the art at the time of the invention, or any nuclear localization signal that is identified or otherwise made available in the state of the art after the time of the instant filing.
  • a representative nuclear localization signal is a peptide sequence that directs the protein to the nucleus of the cell in which the sequence is expressed.
  • a nuclear localization signal is predominantly basic, can be positioned almost anywhere in a protein's amino acid sequence, generally comprises a short sequence of four amino acids (Autieri & Agrawal, (1998) J.
  • Nuclear localization signals often comprise proline residues.
  • a variety of nuclear localization signals have been identified and have been used to effect transport of biological molecules from the cytoplasm to the nucleus of a cell. See, e.g., Tinland et al., (1992) Proc. Natl. Acad. Sci. U.S.A. 89:7442-46; Moede et al., (1999) FEBS Lett. 461:229-34, which is incorporated by reference. Translocation is currently thought to involve nuclear pore proteins.
  • NLSs can be classified in three general groups: (i) a monopartite NLS exemplified by the SV40 large T antigen NLS (PKKKRKV (SEQ ID NO: 51)); (ii) a bipartite motif consisting of two basic domains separated by a variable number of spacer amino acids and exemplified by the Xenopus nucleoplasmin NLS (KRXXXXXXXXXKKKL (SEQ ID NO: 50)); and (iii) noncanonical sequences such as M9 of the hnRNP A1 protein, the influenza virus nucleoprotein NLS, and the yeast Gal4 protein NLS (Dingwall and Laskey 1991).
  • Nuclear localization signals appear at various points in the amino acid sequences of proteins.
  • NLSs have been identified at the N-terminus, the C-terminus, and in the central region of proteins.
  • the specification provides base editors that may be modified with one or more NLSs at the C-terminus, the N-terminus, as well as at an internal region of the base editor.
  • the residues of a longer sequence that do not function as component NLS residues should be selected so as not to interfere, for example ionically or sterically, with the nuclear localization signal itself. Therefore, although there are no strict limits on the composition of an NLS-comprising sequence, in practice, such a sequence can be
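  • As a rough illustration of the monopartite and bipartite examples above, the sketch below turns a motif template into a regular expression, treating each X as any single residue, and reports where it occurs in a protein sequence. The fixed spacer length simply follows the template string as copied from the text (the text itself notes that spacer length is variable); the function names and the toy protein are hypothetical.

```python
import re
from typing import List

SV40_NLS = "PKKKRKV"                    # monopartite example (SEQ ID NO: 51 in the text)
BIPARTITE_TEMPLATE = "KRXXXXXXXXXKKKL"  # bipartite example copied from the text; X = spacer residue


def motif_regex(template: str) -> "re.Pattern[str]":
    """Compile a template into a regex in which each X matches any one residue."""
    return re.compile("".join("[A-Z]" if ch == "X" else ch for ch in template))


def find_motif(protein: str, template: str) -> List[int]:
    """Return 1-based start positions of non-overlapping matches of `template`."""
    return [m.start() + 1 for m in motif_regex(template).finditer(protein)]


# Toy protein carrying both motifs; spacer positions are filled with alanine.
toy_protein = "MA" + SV40_NLS + "GS" + BIPARTITE_TEMPLATE.replace("X", "A") + "EE"
print(find_motif(toy_protein, SV40_NLS))            # [3]
print(find_motif(toy_protein, BIPARTITE_TEMPLATE))  # [12]
```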
  • the present disclosure contemplates any suitable means by which to modify a base editor to include one or more NLSs.
  • the base editors can be engineered to express a base editor protein that is translationally fused at its N-terminus or its C-terminus (or both) to one or more NLSs, i.e., to form a base editor-NLS fusion construct.
  • the base editor-encoding nucleotide sequence can be genetically modified to incorporate a reading frame that encodes one or more NLSs in an internal region of the encoded base editor.
  • the NLSs may include various amino acid linkers or spacer regions encoded between the base editor and the N-terminally, C-terminally, or internally attached NLS amino acid sequence.
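  • The N-terminal, C-terminal, or dual NLS fusions described above, with an optional linker between the NLS and the base editor, can be sketched at the sequence level as simple concatenation. Everything in this example is illustrative: the function name `add_nls`, the placeholder editor sequence, and the `GGS` linker are assumptions and do not reflect a construct from the disclosure.

```python
def add_nls(base_editor: str, nls: str, n_terminal: bool = False,
            c_terminal: bool = False, linker: str = "") -> str:
    """Return the amino acid sequence of a base editor-NLS fusion (hypothetical helper)."""
    parts = []
    if n_terminal:
        parts += [nls, linker]      # NLS, then linker, ahead of the editor
    parts.append(base_editor)
    if c_terminal:
        parts += [linker, nls]      # linker, then NLS, after the editor
    return "".join(parts)


# Placeholder editor sequence "XXXX" flanked by the SV40 NLS on both ends.
print(add_nls("XXXX", "PKKKRKV", n_terminal=True, c_terminal=True, linker="GGS"))
# PKKKRKVGGSXXXXGGSPKKKRKV
```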
  • the present disclosure also provides for nucleotide constructs, vectors, and host cells for expressing fusion proteins that comprise a base editor and one or more NLSs.
  • the base editors described herein may also comprise nuclear localization signals which are linked to a base editor through one or more linkers, e.g., a polymeric, amino acid, nucleic acid, polysaccharide, or chemical linker element.
  • linkers within the contemplated scope of the disclosure are not intended to have any limitations and can be any suitable type of molecule (e.g., polymer, amino acid, polysaccharide, nucleic acid, lipid, or any synthetic chemical linker domain) and be joined to the base editor by any suitable strategy that effectuates forming a bond (e.g., covalent linkage, hydrogen bonding) between the base editor and the one or more NLSs.
  • the base editors described herein also may include one or more additional elements.
  • an additional element may comprise an effector of base repair.
  • the base editors described herein may comprise an inhibitor of base excision repair.
  • the term“inhibitor of base excision repair” or“iBER” refers to a protein that is capable of inhibiting the activity of a nucleic acid repair enzyme, for example a base excision repair enzyme.
  • Mammalian cells clear 8-oxoadenine lesions that arise naturally from oxidative DNA damage by action of thymine-DNA glycosylase (TDG), which hydrolytically cleaves the glycosidic bond of the damaged base, leaving behind an abasic site (FIG. 5). Abasic sites are excised by AP lyase during the base excision repair process, introducing a break in the modified DNA strand.
  • an iBER is fused to the fusion proteins disclosed herein, to compete for binding of the 8-oxoadenine lesion with active, endogenous excision repair enzymes, preventing or slowing base excision repair.
  • the iBER is an inhibitor of 8-oxoadenine base excision repair.
  • Exemplary iBERs include OGG inhibitors, MUG inhibitors, and TDG inhibitors.
  • Exemplary iBERs include inhibitors of hOGG1, hTDG, ecMUG, APE1, Endo III, Endo IV, Endo V, Endo VIII, Fpg, hNEIL1, T7 EndoI, T4PDG, UDG, hSMUG1, and hAAG.
  • the iBER may be a catalytically inactive OGG, a catalytically inactive TDG, a catalytically inactive MUG, or a small molecule or peptide inhibitor of OGG, TDG, or MUG, or a variant thereof.
  • the iBER is a catalytically inactive TDG.
  • exemplary catalytically inactive TDGs include mutagenized variants of wild-type TDG (SEQ ID NO:
  • Exemplary catalytically inactive MUGs include mutagenized variants of wild-type MUG (SEQ ID NO: 44) that bind DNA nucleobases, including 8-oxoadenine, but lack DNA glycosylase activity.
  • An exemplary catalytically inactive hTDG is an N140A mutant of SEQ ID NO: 43, shown below as SEQ ID NO: 46.
  • an exemplary catalytically inactive ecMUG is an N18A mutant of SEQ ID NO: 44, shown below as SEQ ID NO: 47.
  • exemplary iBERs comprise variants with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to wild-type hTDG and ecMUG, above.
  • Other exemplary iBERs comprise variants with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to wild-type hOGG1, UDG, hSMUG1, and hAAG.
  • the fusion proteins described herein may comprise one or more heterologous protein domains (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to the base editor components).
  • a fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains.
  • localization sequences such as cytoplasmic localization sequences, export sequences, such as nuclear export sequences, or other localization sequences, as well as sequence tags.
  • protein domains that may be fused to a fusion protein or component thereof (e.g., the napDNAbp domain, the nucleobase modification domain, or the NLS domain) include, without limitation, epitope tags and reporter gene sequences.
  • epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags.
  • reporter genes include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP).
  • a base editor may be fused to a gene sequence encoding a protein or a fragment of a protein that binds DNA molecules or binds other cellular molecules, including, but not limited to, maltose binding protein (MBP), S-tag, LexA DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. Additional domains that may form part of a base editor are described in US Patent Publication No. 2011/0059502, published March 10, 2011, and incorporated herein by reference in its entirety.
  • a reporter gene which includes, but is not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP), may be introduced into a cell to encode a gene product which serves as a marker by which to measure the alteration or modification of expression of the gene product.
  • the gene product is luciferase.
  • the expression of the gene product is decreased.
  • Suitable protein tags include, but are not limited to, biotin carboxylase carrier protein (BCCP) tags, myc-tags, calmodulin-tags, FLAG-tags, hemagglutinin (HA)-tags, bgh-PolyA tags, polyhistidine tags (also referred to as histidine tags or His-tags), maltose binding protein (MBP)-tags, nus-tags, glutathione-S-transferase (GST)-tags, green fluorescent protein (GFP)-tags, thioredoxin-tags, S-tags, Softags (e.g., Softag 1, Softag 3), strep-tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP-tags. Additional suitable sequences will be apparent to those of skill in the art.
  • the fusion protein comprises one or more His tags.
  • IV. Linkers
  • linkers may be used to link any of the peptides or peptide domains or domains of the base editor (e.g., domain A covalently linked to domain B which is covalently linked to domain C).
  • the term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or domains, e.g., a binding domain and a cleavage domain of a nuclease.
  • a linker joins a gRNA binding domain of a napDNAbp and the catalytic domain of a recombinase.
  • a linker joins a dCas9 and base editor domain (e.g., an adenine oxidase).
  • the linker is positioned between, or flanked by, two groups, molecules, or other domains and connected to each one via a covalent bond, thus connecting the two.
  • the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein).
  • the linker is an organic molecule, group, polymer, or chemical domain. Chemical domains include, but are not limited to, disulfide, hydrazone, thiol and azo domains.
  • the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
  • the linker is a molecule in length. Longer or shorter linkers are also contemplated.
  • the linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length.
  • the linker is a polypeptide or based on amino acids. In other embodiments, the linker is not peptide-like.
  • the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.).
  • the linker is a carbon-nitrogen bond of an amide linkage.
  • the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic or hetero aliphatic linker.
  • the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, polyester, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminoalkanoic acid. In certain embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, ethanoic acid, alanine, beta-alanine, 3-aminopropanoic acid, 4-aminobutanoic acid, 5-pentanoic acid, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminohexanoic acid (Ahx).
  • the linker is based on a carbocyclic domain (e.g., cyclopentane, cyclohexane). In other embodiments, the linker comprises a polyethylene glycol domain (PEG). In other embodiments, the linker comprises amino acids. In certain embodiments, the linker comprises a peptide. In certain embodiments, the linker comprises an aryl or heteroaryl domain. In certain embodiments, the linker is based on a phenyl ring. The linker may include functionalized domains to facilitate attachment of a nucleophile (e.g., thiol, amino) from the peptide to the linker. Any electrophile may be used as part of the linker. Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, alkyl halides, aryl halides, acyl halides, and isothiocyanates.
  • the linker comprises the amino acid sequence (GGGGS)n (SEQ ID NO: 78), (G)n (SEQ ID NO: 79), (EAAAK)n (SEQ ID NO: 80), (GGS)n (SEQ ID NO: 81), (SGGS)n (SEQ ID NO: 82), (XP)n (SEQ ID NO: 83), or any combination thereof, wherein n is independently an integer between 1 and 30, and wherein X is any amino acid.
  • the linker comprises the amino acid sequence (GGS)n (SEQ ID NO: 70), wherein n is 1, 3, or 7.
  • the linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 48). In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 11), also known as XTEN linker. In some embodiments, the linker comprises the amino acid sequence SGGSGGSGGS (SEQ ID NO: 12). In some embodiments, the linker comprises the amino acid sequence SGGS (SEQ ID NO: 14).
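  • The repeat-based linkers above can be written out programmatically. The following is a minimal sketch; the function name `repeat_linker` and the constant `REPEAT_UNITS` are hypothetical helpers introduced only for illustration, and the bounds check reflects the stated range of n between 1 and 30.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Return the linker formed by repeating `unit` n times, e.g. (GGS)n."""
    if not 1 <= n <= 30:
        raise ValueError("n is expected to be an integer between 1 and 30")
    return unit * n


# Repeat units named in the text. X in (XP)n is any amino acid, so a caller
# would need to choose a concrete residue before building that linker.
REPEAT_UNITS = ("GGGGS", "G", "EAAAK", "GGS", "SGGS")

# (GGS)n for the three values of n called out in the text.
ggs_linkers = {n: repeat_linker("GGS", n) for n in (1, 3, 7)}
```

  • Here `ggs_linkers[3]` is the nine-residue linker GGSGGSGGS; any other unit from `REPEAT_UNITS` can be expanded the same way.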
  • the fusion protein comprises the structure [adenine oxidase]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence], or [dCas9 or Cas9 nickase]-[optional linker sequence]-[adenine oxidase].
  • the fusion protein comprises the structure [adenine oxidase]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[iBER];
  • the fusion protein comprises one or more nuclear localization sequences, and comprises the structure [adenine oxidase]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[iBER]-[optional linker sequence]-[NLS];
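  • The domain orders listed above amount to an N- to C-terminal concatenation with an optional linker between neighbouring domains. The sketch below illustrates that idea only; `assemble_fusion` is a hypothetical helper, and the angle-bracket strings are placeholders standing in for real domain sequences, which are not reproduced here.

```python
from typing import Optional, Sequence


def assemble_fusion(domains: Sequence[str], linker: Optional[str] = None) -> str:
    """Concatenate domain sequences N- to C-terminus.

    If `linker` is given, it is placed between every pair of neighbouring
    domains; if it is None, the domains are fused directly.
    """
    joiner = linker or ""
    return joiner.join(domains)


# Placeholder strings (hypothetical), not real protein sequences.
parts = {
    "adenine_oxidase": "<oxidase>",
    "nCas9": "<nCas9>",
    "iBER": "<iBER>",
    "NLS": "<NLS>",
}
order = ["adenine_oxidase", "nCas9", "iBER", "NLS"]

# SGGS is one of the linker sequences named in the text.
fusion = assemble_fusion([parts[k] for k in order], linker="SGGS")
```

  • Reordering `order` gives the alternative architectures, for example placing the Cas9 domain before the adenine oxidase.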
  • the target nucleotide sequence is a DNA sequence in a genome, e.g. a eukaryotic genome.
  • the target nucleotide sequence is in a mammalian (e.g. a human) genome.
  • the target nucleotide sequence is in a human genome.
  • the target nucleotide sequence is in the genome of a rodent, such as a mouse or rat.
  • the target nucleotide sequence is in the genome of a domesticated animal, such as a horse, cat, dog, or rabbit.
  • Some embodiments of the disclosure are based on the recognition that any of the fusion proteins provided herein are capable of modifying a specific nucleobase without generating a significant proportion of indels.
  • An “indel”, as used herein, refers to the insertion or deletion of a nucleobase within a nucleic acid. Such insertions or deletions can lead to frame shift mutations within a coding region of a gene.
  • any of the fusion proteins provided herein are capable of generating a greater proportion of intended modifications (e.g., point mutations) versus indels.
  • the fusion proteins provided herein are capable of generating a ratio of intended point mutations to indels that is greater than 1:1.
  • the fusion proteins provided herein are capable of generating a ratio of intended point mutations to indels that is at least 1.5:1, at least 2:1, at least 2.5:1, at least 3:1, at least 3.5:1, at least 4:1, at least 4.5:1, at least 5:1, at least 5.5:1, at least 6:1, at least 6.5:1, at least 7:1, at least 7.5:1, at least 8:1, at least 10:1, at least 12:1, at least 15:1, at least 20:1, at least 25:1, at least 30:1, at least 40:1, at least 50:1, at least 100:1, at least 200:1, at least 300:1, at least 400:1, at least 500:1, at least 600:1, at least 700:1, at least 800:1, at least 900:1, or at least 1000:1, or more.
  • the number of intended mutations and indels may be determined using any suitable method, for example the methods used in the below Examples.
  • sequencing reads are scanned for exact matches to two 10-bp sequences that flank both sides of a window in which indels might occur. If no exact matches are located, the read is excluded from analysis. If the length of this indel window exactly matches the reference sequence the read is classified as not containing an indel. If the indel window is two or more bases longer or shorter than the reference sequence, then the sequencing read is classified as an insertion or deletion, respectively.
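  • The read classification described above can be sketched as follows. This is an illustrative reading of the procedure, not the code used in the Examples: `classify_read` and its argument names are hypothetical, and because the text does not say how a one-base length difference is treated, the sketch reports that case as "unclassified".

```python
from typing import Optional


def classify_read(read: str, left_flank: str, right_flank: str,
                  ref_window_len: int) -> Optional[str]:
    """Classify one sequencing read by the length of its indel window.

    Returns None when either flank has no exact match (the read is excluded),
    otherwise 'no_indel', 'insertion', 'deletion', or 'unclassified'.
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    window_start = start + len(left_flank)
    end = read.find(right_flank, window_start)
    if end == -1:
        return None
    # Difference between the observed window length and the reference length.
    diff = (end - window_start) - ref_window_len
    if diff == 0:
        return "no_indel"
    if diff >= 2:
        return "insertion"
    if diff <= -2:
        return "deletion"
    # A difference of exactly one base is not covered by the description.
    return "unclassified"
```

  • In use, the two flanks would be the 10-bp sequences on either side of the window, and `ref_window_len` the length of that window in the reference sequence.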
  • the fusion proteins provided herein are capable of limiting formation of indels in a region of a nucleic acid.
  • the region is at a nucleotide targeted by a fusion protein or a region within 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of a nucleotide targeted by a fusion protein.
  • any of the fusion proteins provided herein are capable of limiting the formation of indels at a region of a nucleic acid to less than 1%, less than 1.5%, less than 2%, less than 2.5%, less than 3%, less than 3.5%, less than 4%, less than 4.5%, less than 5%, less than 6%, less than 7%, less than 8%, less than 9%, less than 10%, less than 12%, less than 15%, or less than 20%.
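  • The ratio and percentage thresholds above are simple functions of read counts. A minimal sketch follows; the function names are hypothetical, and any counts passed in would come from a read classification step such as the one described earlier.

```python
def edit_to_indel_ratio(intended_edits: int, indels: int) -> float:
    """Ratio of reads with the intended point mutation to reads with an indel."""
    if intended_edits < 0 or indels < 0:
        raise ValueError("counts must be non-negative")
    if indels == 0:
        # No indels observed: the ratio is unbounded.
        return float("inf")
    return intended_edits / indels


def indel_percentage(indels: int, total_reads: int) -> float:
    """Percentage of analysed reads that carry an indel."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * indels / total_reads
```

  • For instance, 300 edited reads and 20 indel reads out of 1000 analysed reads (made-up numbers) give a ratio of 15:1 and an indel frequency of 2%.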
  • the number of indels formed at a nucleic acid region may depend on the amount of time a nucleic acid (e.g., a nucleic acid within the genome of a cell) is exposed to a fusion protein.
  • a number or proportion of indels is determined after at least 1 hour, at least 2 hours, at least 6 hours, at least 12 hours, at least 24 hours, at least 36 hours, at least 48 hours, at least 3 days, at least 4 days, at least 5 days, at least 7 days, at least 10 days, or at least 14 days of exposing a nucleic acid (e.g., a nucleic acid within the genome of a cell) to a fusion protein.
  • an intended mutation such as a point mutation
  • a nucleic acid e.g. a nucleic acid within a genome of a subject
  • an intended mutation is a mutation that is generated by a specific fusion protein bound to a gRNA, specifically designed to generate the intended mutation.
  • the intended mutation is a mutation associated with a disease, disorder, or condition.
  • the intended mutation is the correction of a cytosine (C) to adenine (A) point mutation associated with a disease, disorder, or condition. In some embodiments, the intended mutation is the correction of a guanine (G) to thymine (T) point mutation associated with a disease, disorder, or condition. In some embodiments, the intended mutation is the correction of a cytosine (C) to adenine (A) point mutation within the coding region of a gene. In some embodiments, the intended mutation is the correction of a guanine (G) to thymine (T) point mutation within the coding region of a gene.
  • the intended mutation is a point mutation that generates a stop codon, for example, a premature stop codon within the coding region of a gene. In some embodiments, the intended mutation is a mutation that eliminates a stop codon. In some embodiments, the intended mutation is a mutation that alters the splicing of a gene. In some embodiments, the intended mutation is a mutation that alters the regulatory sequence of a gene (e.g., a gene promoter or gene repressor).
  • any of the fusion proteins provided herein are capable of generating a ratio of intended mutations to unintended mutations (e.g., intended point mutations:unintended point mutations) that is greater than 1:1. In some embodiments, any of the fusion proteins provided herein are capable of generating a ratio of intended mutations to unintended mutations (e.g., intended point
  • Some embodiments of the disclosure are based on the recognition that the formation of indels in a region of a nucleic acid may be limited by nicking the non-edited strand opposite to the strand in which edits are introduced.
  • This nick serves to direct mismatch repair machinery to the non-edited strand, ensuring that the chemically modified nucleobase is not interpreted as a lesion by the machinery.
  • This nick may be created by the use of an nCas9.
  • the methods provided in this disclosure comprise cutting (or nicking) the non-edited strand of the double-stranded DNA, for example, wherein the one strand comprises the A of the target T:A nucleobase pair, or the T of the T:A nucleobase pair.
  • Guide sequences (e.g., guide RNAs)
  • the present disclosure further provides guide RNAs for use in accordance with the disclosed methods of editing.
  • the disclosure provides guide RNAs that are designed to recognize target sequences.
  • Such gRNAs may be designed to have guide sequences (or “spacers”) having complementarity to a protospacer within the target sequence.
  • Guide RNAs are also provided for use with one or more of the disclosed fusion proteins, e.g., in the disclosed methods of editing a nucleic acid molecule.
  • Such gRNAs may be designed to have guide sequences having complementarity to a protospacer within a target sequence to be edited, and to have backbone sequences that interact specifically with the napDNAbp domains of any of the disclosed base editors, such as Cas9 nickase domains of the disclosed base editors.
  • the ACBEs may be complexed, bound, or otherwise associated with (e.g., via any type of covalent or non-covalent bond) one or more guide sequences, i.e., the sequence which becomes associated or bound to the base editor and directs its localization to a specific target sequence having complementarity to the guide sequence or a portion thereof.
  • a guide sequence will depend upon the nucleotide sequence of a genomic target site of interest (i.e., the desired site to be edited) and the type of napDNAbp (e.g., type of Cas protein) present in the base editor, among other factors, such as PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology regions, secondary structures, etc.
  • a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, or Cas9 variant) to the target sequence.
  • the degree of complementarity between a guide sequence and its corresponding target sequence when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%,
  • Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g. the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
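  • As a rough illustration of a degree of complementarity, the sketch below scores a guide against an equal-length target without gaps. This is a deliberate simplification and not one of the alignment algorithms named above, which would be needed for sequences of unequal length or for gapped alignments. The function names are hypothetical, and both sequences are assumed to be written 5' to 3'.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(dna.upper()))


def percent_complementarity(guide: str, target: str) -> float:
    """Percent of guide positions that pair with `target` in antiparallel
    orientation, using an ungapped, equal-length comparison.

    U in the guide is treated as T so RNA guides can be passed directly.
    """
    guide = guide.upper().replace("U", "T")
    target = target.upper()
    if len(guide) != len(target):
        raise ValueError("this sketch only compares sequences of equal length")
    if not guide:
        raise ValueError("sequences must be non-empty")
    # A perfectly complementary guide equals the reverse complement of the target.
    expected = reverse_complement(target)
    matches = sum(g == e for g, e in zip(guide, expected))
    return 100.0 * matches / len(guide)
```

  • A guide that is the exact reverse complement of the target scores 100%, and each mismatched position lowers the score by 100 divided by the guide length.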
  • a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length.
  • a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length.
  • the ability of a guide sequence to direct sequence-specific binding of a base editor to a target sequence may be assessed by any suitable assay.
  • the components of a base editor, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of a base editor disclosed herein, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein.
  • cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a base editor, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions.
  • Other assays are possible, and will occur to those skilled in the art.
  • a guide sequence may be selected to target any target sequence. In some embodiments,
  • the target sequence is a sequence within a genome of a cell.
  • Exemplary target sequences include those that are unique in the target genome. For example, for the S.
  • a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGG (SEQ ID NO: 58) where
  • a unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGG (SEQ ID NO: 60) where NNNNNNNNNXGG (N is A, G, T, or C; and X can be anything) (SEQ ID NO: 61) has a single occurrence in the genome.
  • S. thermophilus CRISPR1 Cas9 a unique target sequence in a genome may include a Cas9 target site of the form
  • a unique target sequence in a genome may include an S. thermophilus CRISPR1 Cas9 target site of the form
  • N is A, G, T, or C; X can be anything; and W is A or T
  • SEQ ID NO: 65 has a single occurrence in the genome.
  • a unique target sequence in a genome may include a Cas9 target site of the form
  • a unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGGXG (SEQ ID NO: 68) where NNNNNNNNNXGGXG (N is A, G, T, or C; and X can be anything) (SEQ ID NO: 69) has a single occurrence in the genome.
  • in these sequences, “M” may be A, G, T, or C, and need not be considered in identifying a sequence as unique.
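  • The uniqueness criterion above, in which only the PAM-proximal N positions plus XGG must occur once in the genome while the M positions are ignored, can be sketched as below for an S. pyogenes style XGG site. The function name and parameters are hypothetical. The default `seed_len` of 12 is an assumption, since the number of N positions is not consistently legible in the sequence listings, and only the forward strand of a genome held in memory as a string is scanned.

```python
import re
from typing import List, Tuple


def unique_spcas9_sites(genome: str, guide_len: int = 20,
                        seed_len: int = 12) -> List[Tuple[int, str]]:
    """Return (start, site) pairs for forward-strand sites of the form
    [guide_len bases][X]GG whose last `seed_len` guide bases followed by XGG
    occur exactly once in `genome`.
    """
    genome = genome.upper()
    # The lookahead allows overlapping candidate sites to be reported.
    site_re = re.compile(r"(?=([ACGT]{%d}[ACGT]GG))" % guide_len)
    hits = []
    for match in site_re.finditer(genome):
        site = match.group(1)
        seed = site[guide_len - seed_len:guide_len]
        seed_re = re.compile("(?=%s[ACGT]GG)" % seed)
        occurrences = sum(1 for _ in seed_re.finditer(genome))
        if occurrences == 1:
            hits.append((match.start(), site))
    return hits
```

  • A practical search would also scan the reverse strand and use an indexed genome rather than repeated regular-expression scans; this sketch trades efficiency for clarity.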
  • a guide sequence is selected to reduce the degree of secondary structure within the guide sequence.
  • Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker & Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online web server RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see, e.g., A. R.
  • a tracr mate sequence includes any sequence that has sufficient complementarity with a tracr sequence to promote one or more of: (1) excision of a guide sequence flanked by tracr mate sequences in a cell containing the corresponding tracr sequence; and (2) formation of a complex at a target sequence, wherein the complex comprises the tracr mate sequence hybridized to the tracr sequence.
  • degree of complementarity is with reference to the optimal alignment of the tracr mate sequence and tracr sequence, along the length of the shorter of the two sequences.
  • Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the tracr sequence or tracr mate sequence.
  • the degree of complementarity between the tracr sequence and tracr mate sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher.
  • the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length.
  • the tracr sequence and tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin.
  • Preferred loop forming sequences for use in hairpin structures are four nucleotides in length, and most preferably have the sequence GAAA. However, longer or shorter loop sequences may be used, as may alternative sequences.
  • the sequences preferably include a nucleotide triplet (for example, AAA), and an additional nucleotide (for example C or G). Examples of loop forming sequences include CAAA and AAAG.
  • the transcript or transcribed polynucleotide sequence has at least two or more hairpins. In certain embodiments, the transcript has two, three, four or five hairpins. In a further embodiment of the invention, the transcript has at most five hairpins.
  • the single transcript further includes a transcription termination sequence; preferably this is a polyT sequence, for example six T nucleotides.
  • single polynucleotides comprising a guide sequence, a tracr mate sequence, and a tracr sequence are as follows (listed 5' to 3'), where “N” represents a base of a guide sequence, the first block of lower case letters represent the tracr mate sequence, and the second block of lower case letters represent the tracr sequence, and the final poly-T sequence represents the transcription terminator:
  • sequences (1) to (3) are used in combination with Cas9 from S. thermophilus CRISPR1.
  • sequences (4) to (6) are used in combination with Cas9 from S. pyogenes.
  • the tracr sequence is a separate transcript from a transcript comprising the tracr mate sequence.
  • a guide RNA typically comprises a tracrRNA framework allowing for Cas9 binding, and a guide sequence, which confers sequence specificity to the Cas9:nucleic acid editing enzyme/domain fusion protein.
  • the guide RNAs for use in accordance with the disclosed methods of editing comprise a backbone structure that is recognized by an S. pyogenes Cas9 protein or domain, such as an SpCas9 domain of the disclosed base editors.
  • the backbone structure recognized by an SpCas9 protein may comprise the sequence 5'-[guide sequence]-guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaaguggcaccgagucggugcuuuuu-3' (SEQ ID NO: 77), wherein the guide sequence comprises a sequence that is complementary to the protospacer of the target sequence. See U.S. Publication No.
  • the guide sequence is typically 20 nucleotides long.
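  • Building a guide RNA for SpCas9 from the pieces above amounts to placing the guide sequence 5' of the backbone. The following is a minimal sketch: `build_sgrna` is a hypothetical helper, the backbone string is the SEQ ID NO: 77 backbone as printed above with whitespace removed, and the 20-nucleotide length check reflects the typical length stated in the text rather than a hard requirement.

```python
# Backbone recognized by SpCas9, transcribed from the text (SEQ ID NO: 77).
SPCAS9_BACKBONE = (
    "guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuug"
    "aaaaaaguggcaccgagucggugcuuuuu"
)


def build_sgrna(guide: str, expected_len: int = 20) -> str:
    """Return 5'-[guide sequence]-[backbone]-3' as a single RNA string.

    `guide` may use the DNA or RNA alphabet; T is converted to U.
    """
    guide_rna = guide.upper().replace("T", "U")
    if set(guide_rna) - set("ACGU"):
        raise ValueError("guide must contain only A, C, G, and T or U")
    if len(guide_rna) != expected_len:
        raise ValueError("guide is expected to be %d nucleotides long" % expected_len)
    return guide_rna + SPCAS9_BACKBONE
```

  • The guide is kept in upper case and the backbone in lower case so the two parts remain easy to tell apart in the output.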
  • the guide RNAs for use in accordance with the disclosed methods of editing comprise a backbone structure that is recognized by an S. aureus Cas9 protein.
  • the backbone structure recognized by an SaCas9 protein may comprise the sequence 5'-[guide sequence]-guuuuaguacucuguaaugaaaauuacagaaucuacuaaaacaaggcaaaaugccguguuuaucucgucaacuuguuggcgagauuuuuuuuu-3' (SEQ ID NO: 161).
  • suitable guide RNAs for targeting the disclosed fusion proteins to specific genomic target sites will be apparent to those of skill in the art based on the instant disclosure.
  • Such suitable guide RNA sequences typically comprise guide sequences that are complementary to a nucleic sequence within 50 nucleotides upstream or downstream of the target nucleotide to be edited.
  • Some exemplary guide RNA sequences suitable for targeting any of the provided fusion proteins to specific target sequences are provided herein.
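  • One way to enumerate candidate guide sequences near a nucleotide to be edited is sketched below. Several assumptions are made for illustration: the function name is hypothetical, only the forward strand is scanned, an XGG PAM of the S. pyogenes type is used, and "within 50 nucleotides" is interpreted as the whole protospacer and PAM lying inside the window around the target position.

```python
import re
from typing import List, Tuple


def guides_near_target(seq: str, target_idx: int, window: int = 50,
                       guide_len: int = 20) -> List[Tuple[int, str]]:
    """List (start, protospacer) pairs for forward-strand protospacers that are
    followed by an XGG PAM and lie within `window` nucleotides of `target_idx`.
    """
    seq = seq.upper()
    if not 0 <= target_idx < len(seq):
        raise ValueError("target_idx is outside the sequence")
    lo = max(0, target_idx - window)
    hi = min(len(seq), target_idx + window + 1)
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    # Passing pos and endpos restricts the search, lookahead included, to [lo, hi).
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq, lo, hi)]
```

  • Candidates returned this way would still need to be checked for uniqueness in the genome and for placement of the target nucleotide relative to the PAM.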
  • Additional guide sequences are well known in the art and can be used with the base editors described herein. Additional exemplary guide sequences are disclosed in, for example, Jinek M., et al., Science 337:816-821(2012); Mali P, Esvelt KM & Church GM (2013) Cas9 as a versatile tool for engineering biology, Nature Methods, 10, 957-963; Li JF et al, (2013) Multiplex and homologous recombination-mediated genome editing in
  • the disclosure further relates in various aspects to methods of making the disclosed fusion proteins by various modes of manipulation that include, but are not limited to, codon optimization to achieve greater expression levels in a cell, and the use of nuclear localization sequences (NLSs), preferably at least two NLSs, e.g., two bipartite NLSs, to increase the localization of the expressed fusion proteins into a cell nucleus.
  • fusion proteins contemplated herein can include modifications that result in increased expression, for example, through codon optimization.
  • the fusion proteins (or a component thereof) are codon optimized for expression in particular cells, such as eukaryotic cells.
  • the eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including, but not limited to, human, mouse, rat, rabbit, dog, or non-human primate.
  • codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g. about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence.
  • Codon bias differences in codon usage between organisms
  • mRNA messenger RNA
  • tRNA transfer RNA
  • Codon usage tables are readily available, for example, at the “Codon Usage Database”, and these tables can be adapted in a number of ways. See Nakamura, Y., et al. “Codon usage tabulated from the international DNA sequence databases: status for the year 2000” Nucl. Acids Res. 28:292 (2000).
  • Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available.
  • one or more codons e.g. 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons
  • one or more codons in a sequence encoding a CRISPR enzyme correspond to the most frequently used codon for a particular amino acid.
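  • The codon replacement described above can be sketched as follows. This is an illustration under stated assumptions: `codon_optimize` and `translate` are hypothetical helpers, the standard genetic code is assumed, and the caller supplies a usage table for the host of interest (for example, values taken from a codon usage database). The usage values in the note below are made up for demonstration and are not real frequencies.

```python
from itertools import product
from typing import Dict

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence (length divisible by 3) into amino acids."""
    cds = cds.upper()
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str, usage: Dict[str, float]) -> str:
    """Replace each codon with the most used synonymous codon in `usage`.

    Codons whose amino acid has no entry in `usage` are left unchanged, so the
    encoded amino acid sequence is always preserved.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    best: Dict[str, str] = {}
    for codon, freq in usage.items():
        aa = CODON_TO_AA[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return "".join(
        best.get(CODON_TO_AA[cds[i:i + 3]], cds[i:i + 3])
        for i in range(0, len(cds), 3)
    )
```

  • With the made-up table `{"CTG": 0.4, "TTA": 0.08, "GCC": 0.4, "GCG": 0.1}`, the sequence ATGTTAGCGTAA becomes ATGCTGGCCTAA, and both translate to the same peptide.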
  • Directed evolution methods (e.g., PACE or PANCE)
  • Various embodiments of the disclosure relate to providing directed evolution methods and systems (e.g., appropriate vectors, cells, phage, flow vessels, etc.) for engineering of the base editors or base editor domains of the present disclosure.
  • the disclosure provides vector systems for the disclosed directed evolution methods to engineer any of the disclosed base editors or base editor domains.
  • the directed evolution vector systems and methods provided herein allow for a gene of interest (e.g., a base editor- or adenine oxidase-encoding gene) in a viral vector to be evolved over multiple generations of viral life cycles in a flow of host cells to acquire a desired function or activity.
  • the gene under selection is encoded on the M13 bacteriophage genome. Its activity is linked to M13 propagation by controlling expression of gene III so that only active variants produce infectious progeny phage. Phage are continuously propagated and mutagenized, but mutations accumulate only in the phage genome, not the host or its selection circuit, because fresh host cells are continually flowed into (and out of) the growth vessel, effectively resetting the selection background.
  • PACE enables the rapid continuous evolution of biomolecules through many generations of mutation, selection, and replication per day.
  • host E. coli cells continuously dilute a population of bacteriophage (selection phage, SP) containing the gene of interest.
  • the gene of interest replaces gene III on the SP, which is required for progeny phage infectivity.
  • SP containing desired gene variants trigger host-cell gene III expression from an accessory plasmid (AP).
  • Host-cell DNA plasmids encode a genetic circuit that links the desired activity of the protein encoded in the SP to the expression of gene III on the AP.
  • SP variants containing desired gene variants can propagate, while phage encoding inactive variants do not generate infectious progeny and are rapidly diluted out of the culture vessel (or lagoon).
  • An arabinose-inducible mutagenesis plasmid (MP) controls the phage mutation rate.
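  • The selection logic above, in which phage that replicate faster than the lagoon is diluted persist while others wash out, can be illustrated with a toy model. The sketch below is not a description of the actual PACE apparatus: the function name, the per-step replication factors, and the dilution fraction are all hypothetical, and mutagenesis and host-cell dynamics are ignored.

```python
from typing import Dict


def simulate_lagoon(titers: Dict[str, float], fold_per_step: Dict[str, float],
                    dilution_fraction: float, steps: int) -> Dict[str, float]:
    """Toy discrete-time model of phage titers in a continuously diluted lagoon.

    In each step, every variant is multiplied by its replication factor and the
    lagoon then loses `dilution_fraction` of its contents to outflow.
    """
    if not 0.0 <= dilution_fraction < 1.0:
        raise ValueError("dilution_fraction must be in [0, 1)")
    current = dict(titers)
    for _ in range(steps):
        for name in current:
            current[name] = current[name] * fold_per_step[name] * (1.0 - dilution_fraction)
    return current


# Made-up parameters: the active variant replicates 4-fold per step, the
# inactive variant produces no additional infectious progeny.
result = simulate_lagoon(
    titers={"active": 1e3, "inactive": 1e3},
    fold_per_step={"active": 4.0, "inactive": 1.0},
    dilution_fraction=0.5,
    steps=10,
)
```

  • In this model a variant persists only if its replication factor multiplied by the retained fraction exceeds 1, which is the condition that links gene III expression to survival in the lagoon.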
  • the viral vector or the phage is a filamentous phage, for example, an M13 phage, such as an M13 selection phage as described in more detail elsewhere herein.
  • the gene required for the production of infectious viral particles is the M13 gene III (gIII).
  • the viral vector infects mammalian cells.
  • the viral vector is a retroviral vector.
  • the viral vector is a vesicular stomatitis virus (VSV) vector.
  • VSV-G a viral glycoprotein that mediates phosphatidylserine attachment and cell entry.
  • VSV can infect a broad spectrum of host cells, including mammalian and insect cells. VSV is therefore a highly suitable vector for continuous evolution in human, mouse, or insect host cells.
  • other retroviral vectors that can be pseudotyped with VSV-G envelope protein are equally suitable for continuous evolution processes as described herein.
  • VSV-G packagable vectors are adapted for use in a continuous evolution system in that the native envelope (env) protein (e.g., VSV-G in VSV vectors, or env in MLV vectors) is deleted from the viral genome, and a gene of interest is inserted into the viral genome under the control of a promoter that is active in the desired host cells.
  • the host cells express the VSV-G protein, another env protein suitable for vector pseudotyping, or the viral vector’s native env protein, under the control of a promoter the activity of which is dependent on an activity of a product encoded by the gene of interest, so that a viral vector with a mutation leading to increased activity of the gene of interest will be packaged with higher efficiency than a vector with baseline or a loss-of-function mutation.
  • mammalian host cells are subjected to infection by a continuously evolving population of viral vectors, for example, VSV vectors comprising a gene of interest and lacking the VSV-G encoding gene, wherein the host cells comprise a gene encoding the VSV-G protein under the control of a conditional promoter.
  • the retrovirus-based system could be a two-vector system (the viral vector and an expression construct comprising a gene encoding the envelope protein), or, alternatively, a helper virus can be employed, for example, a VSV helper virus.
  • a helper virus typically comprises a truncated viral genome deficient of structural elements required to package the genome into viral particles, but including viral genes encoding proteins required for viral genome processing in the host cell, and for the generation of viral particles.
  • the viral vector-based system could be a three-vector system (the viral vector, the expression construct comprising the envelope protein driven by a conditional promoter, and the helper virus comprising viral functions required for viral genome propagation but not the envelope protein).
  • expression of the five genes of the VSV genome from a helper virus or expression construct in the host cells allows for production of infectious viral particles carrying a gene of interest, indicating that unbalanced gene expression permits viral replication at a reduced rate, suggesting that reduced expression of VSV-G would indeed serve as a limiting step in efficient viral production.
  • One advantage of using a helper virus is that the viral vector can be deficient in genes encoding proteins or other functions provided by the helper virus, and can, accordingly, carry a longer gene of interest.
  • the helper virus does not express an envelope protein, because expression of a viral envelope protein is known to reduce the infectability of host cells by some viral vectors via receptor interference.
  • Viral vectors, for example retroviral vectors, suitable for continuous evolution processes, their respective envelope proteins, and helper viruses for such vectors, are well known to those of skill in the art.
  • helper viruses for continuous evolution procedures as described herein, see Coffin et al., Retroviruses, CSHL Press 1997, ISBN 0-87969-571-4, incorporated herein in its entirety.
  • the incubating of the host cells is for a time sufficient for at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1250, at least 1500, at least 1750, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, at least 10000, or more consecutive viral life cycles.
  • the viral vector is an M13 phage, and the length of a single viral life cycle is about 10-20 minutes.
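As an illustration of the arithmetic implied by the two preceding statements, the following minimal Python sketch estimates how long host cells would need to be incubated for a chosen number of consecutive life cycles, assuming the stated M13 life cycle of about 10-20 minutes. The function names are hypothetical and are not part of the disclosure.

```python
def incubation_hours(n_cycles: int, cycle_minutes: float) -> float:
    """Hours needed for n_cycles consecutive viral life cycles of a given length."""
    if n_cycles < 0 or cycle_minutes <= 0:
        raise ValueError("n_cycles must be >= 0 and cycle_minutes must be > 0")
    return n_cycles * cycle_minutes / 60.0


def incubation_range_hours(n_cycles: int,
                           fastest_minutes: float = 10.0,
                           slowest_minutes: float = 20.0) -> tuple[float, float]:
    """(shortest, longest) incubation in hours for the assumed life-cycle range."""
    return (incubation_hours(n_cycles, fastest_minutes),
            incubation_hours(n_cycles, slowest_minutes))


if __name__ == "__main__":
    # Estimate the incubation needed for 100 consecutive M13 life cycles.
    low, high = incubation_range_hours(100)
    print(f"100 life cycles: {low:.1f} to {high:.1f} hours")
```

The default 10 and 20 minute bounds simply restate the approximate range given above; other vectors would need their own values.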
  • a viral vector/host cell combination is chosen in which the life cycle of the viral vector is significantly shorter than the average time between cell divisions of the host cell.
  • Average cell division times and viral vector life cycle times are well known in the art for many cell types and vectors, allowing those of skill in the art to ascertain such host cell/vector combinations.
  • host cells are being removed from the population of host cells contacted with the viral vector at a rate that results in the average time of a host cell remaining in the host cell population before being removed to be shorter than the average time between cell divisions of the host cells, but to be longer than the average life cycle of the viral vector employed.
  • the host cells on average, do not have sufficient time to proliferate during their time in the host cell population while the viral vectors do have sufficient time to infect a host cell, replicate in the host cell, and generate new viral particles during the time a host cell remains in the cell population.
  • the average time a host cell remains in the host cell population is about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 70, about 80, about 90, about 100, about 120, about 150, or about 180 minutes.
  • the average time a host cell remains in the host cell population depends on how fast the host cells divide and how long infection (or conjugation) requires.
  • the flow rate should be fast enough that host cells are, on average, removed before they have time to divide, but slow enough to allow viral (or conjugative) propagation.
  • the former will vary, for example, with the media type, and can be delayed by adding cell division inhibitor antibiotics (FtsZ inhibitors in E. coli, etc.). Since the limiting step in continuous evolution is production of the protein required for gene transfer from cell to cell, the flow rate at which the vector washes out will depend on the current activity of the gene(s) of interest. In some embodiments, titratable production of the protein required for the generation of infectious particles, as described herein, can mitigate this problem.
  • an indicator of phage infection allows computer-controlled optimization of the flow rate for the current activity level in real-time.
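The timing constraint described above (average host cell residence time shorter than the cell division time but longer than the viral life cycle) can be sketched numerically. The example below assumes a well-mixed vessel of constant volume, for which the mean residence time equals volume divided by flow rate. That model, the function names, and the example numbers are illustrative assumptions rather than part of the disclosure.

```python
def mean_residence_minutes(volume_ml: float, flow_ml_per_min: float) -> float:
    """Mean time a cell spends in a well-mixed, constant-volume vessel."""
    if volume_ml <= 0 or flow_ml_per_min <= 0:
        raise ValueError("volume and flow rate must be positive")
    return volume_ml / flow_ml_per_min


def flow_rate_window(volume_ml: float,
                     division_minutes: float,
                     life_cycle_minutes: float) -> tuple[float, float]:
    """(min_flow, max_flow) in mL/min keeping residence time between the two limits."""
    if not 0 < life_cycle_minutes < division_minutes:
        raise ValueError("life cycle must be shorter than the cell division time")
    min_flow = volume_ml / division_minutes    # slower flow lets cells divide
    max_flow = volume_ml / life_cycle_minutes  # faster flow washes the vector out
    return min_flow, max_flow


def residence_time_ok(volume_ml: float, flow_ml_per_min: float,
                      division_minutes: float, life_cycle_minutes: float) -> bool:
    """True if life cycle < mean residence time < cell division time."""
    t = mean_residence_minutes(volume_ml, flow_ml_per_min)
    return life_cycle_minutes < t < division_minutes


if __name__ == "__main__":
    print(flow_rate_window(100.0, division_minutes=30.0, life_cycle_minutes=15.0))
    print(residence_time_ok(100.0, 5.0, 30.0, 15.0))
```

In practice the usable window may be narrower, since, as noted above, the washout rate also depends on the current activity of the gene of interest.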
  • the fresh host cells comprise the accessory plasmid required for selection of viral vectors, for example, the accessory plasmid comprising the gene required for the generation of infectious phage particles that is lacking from the phages being evolved.
  • the host cells are generated by contacting an uninfected host cell with the relevant vectors, for example, the accessory plasmid and, optionally, a mutagenesis plasmid, and growing an amount of host cells sufficient for the replenishment of the host cell population in a continuous evolution experiment.
  • Methods for the introduction of plasmids and other gene constructs into host cells are well known to those of skill in the art and the invention is not limited in this respect.
  • such methods include, but are not limited to, electroporation and heat-shock of competent cells.
  • the accessory plasmid comprises a selection marker, for example, an antibiotic resistance marker, and the fresh host cells are grown in the presence of the respective antibiotic to ensure the presence of the plasmid in the host cells.
  • different markers are typically used. Such selection markers and their use in cell culture are known to those of skill in the art, and the invention is not limited in this respect.
  • the selection marker is a spectinomycin antibiotic resistance marker.
  • Cells are transformed with a selection plasmid containing an inactivated spectinomycin resistance gene with a mutation at an active site that requires A:T to C:G editing to correct. Cells that fail to install the correct transversion mutation in the spectinomycin resistance gene will die, while cells that make the correction will survive.
  • E. coli cells expressing an sgRNA targeting the active site mutation in the spectinomycin resistance gene and a nucleobase modification domain-dCas9 fusion protein are plated onto 2xYT agar with 256 µg/mL of spectinomycin. Surviving colonies (measured through CFUs) were sequenced to find consensus mutations in the fusion proteins expressed in the evolved survivors (FIG. 4).
  • a similar selection assay was used to evolve adenine deaminase activity in DNA during adenine base editor development, as described in Gaudelli, N. M. et al., Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464-471 (2017), herein incorporated in its entirety by reference.
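The step of sequencing surviving colonies to find consensus mutations can be sketched as a simple tally of substitutions relative to a reference sequence. The following Python example is a minimal illustration that assumes pre-aligned protein sequences of equal length. The function names, the 50% threshold, and the toy sequences are hypothetical and do not represent data from the disclosure.

```python
from collections import Counter


def call_substitutions(reference: str, variant: str) -> list[str]:
    """List substitutions such as 'K2R' (1-based) between two aligned sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [f"{ref}{pos}{var}"
            for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
            if ref != var]


def consensus_mutations(reference: str, variants: list[str],
                        min_fraction: float = 0.5) -> dict[str, float]:
    """Map each substitution to the fraction of survivors carrying it,
    keeping only those present in at least min_fraction of survivors."""
    if not variants:
        return {}
    counts: Counter[str] = Counter()
    for variant in variants:
        counts.update(call_substitutions(reference, variant))
    total = len(variants)
    return {mut: n / total for mut, n in counts.items() if n / total >= min_fraction}


if __name__ == "__main__":
    # Toy sequences for illustration only.
    survivors = ["MRLV", "MRLA", "MKLV", "MRLV"]
    print(consensus_mutations("MKLV", survivors))
```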
  • the selection marker is a chloramphenicol antibiotic resistance marker.
  • Cells are transformed with a selection plasmid containing an inactivated chloramphenicol resistance gene with a mutation at an active site that requires A:T to C:G editing to correct. Cells that fail to install the correct transversion mutation in the chloramphenicol resistance gene will die, while cells that make the correction will survive.
  • E. coli cells expressing an sgRNA targeting the active site mutation in the chloramphenicol resistance gene and a nucleobase modification domain-dCas9 fusion protein are plated onto 2xYT agar with 256 µg/mL of chloramphenicol. Surviving colonies (measured through CFUs) were sequenced to find consensus mutations in the fusion proteins expressed in the evolved survivors.
  • the selection marker is a carbenicillin antibiotic resistance marker.
  • Cells are transformed with a selection plasmid containing an inactivated carbenicillin resistance gene with a mutation at an active site that requires A:T to C:G editing to correct. Cells that fail to install the correct transversion mutation in the carbenicillin resistance gene will die, while cells that make the correction will survive.
  • E. coli cells expressing an sgRNA targeting the active site mutation in the carbenicillin resistance gene and a nucleobase modification domain-dCas9 fusion protein are plated onto 2xYT agar with 256 µg/mL of carbenicillin. Surviving colonies (measured through CFUs) were sequenced to find consensus mutations in the fusion proteins expressed in the evolved survivors.
  • mismatch-specific uracil-DNA glycosylase (MUG) knockout E. coli cells are used during the above spectinomycin, carbenicillin, and/or chloramphenicol screening experiments to avoid excision of the target 8-oxoadenine before the full base editing process can be completed.
  • the host cell population in a continuous evolution experiment is replenished with fresh host cells growing in a parallel, continuous culture.
  • the cell density of the host cells in the host cell population contacted with the viral vector and the density of the fresh host cell population is substantially the same.
  • the cells being removed from the cell population contacted with the viral vector comprise cells that are infected with the viral vector and uninfected cells.
  • cells are being removed from the cell populations continuously, for example, by effecting a continuous outflow of the cells from the population.
  • cells are removed semi-continuously or intermittently from the population.
  • the replenishment of fresh cells will match the mode of removal of cells from the cell population, for example, if cells are continuously removed, fresh cells will be continuously introduced.
  • the modes of replenishment and removal may be mismatched, for example, a cell population may be continuously replenished with fresh cells, and cells may be removed semi-continuously or in batches.
  • the rate of fresh host cell replenishment and/or the rate of host cell removal is adjusted based on quantifying the host cells in the cell population. For example, in some embodiments, the turbidity of culture media comprising the host cell population is monitored and, if the turbidity falls below a threshold level, the ratio of host cell inflow to host cell outflow is adjusted to effect an increase in the number of host cells in the population, as manifested by increased cell culture turbidity. In other embodiments, if the turbidity rises above a threshold level, the ratio of host cell inflow to host cell outflow is adjusted to effect a decrease in the number of host cells in the population, as manifested by decreased cell culture turbidity.
  • Maintaining the density of host cells in the host cell population within a specific density range ensures that enough host cells are available as hosts for the evolving viral vector population, and avoids the depletion of nutrients at the cost of viral packaging and the accumulation of cell-originated toxins from overcrowding the culture.
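The turbidity-based adjustment described above amounts to a simple threshold controller. The sketch below is one possible reading of it, in which the inflow-to-outflow ratio is nudged up when turbidity falls below a lower threshold and down when it rises above an upper threshold. The function name, the proportional step size, and the threshold values are hypothetical assumptions, not values from the disclosure.

```python
def adjust_inflow_outflow_ratio(ratio: float, turbidity: float,
                                low: float, high: float,
                                step: float = 0.1) -> float:
    """Return an updated host cell inflow/outflow ratio.

    Below `low`, the ratio is raised to increase the number of host cells;
    above `high`, it is lowered to decrease it; otherwise it is unchanged.
    """
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    if ratio <= 0 or not 0 < step < 1:
        raise ValueError("ratio must be positive and step must be in (0, 1)")
    if turbidity < low:
        return ratio * (1 + step)
    if turbidity > high:
        return ratio * (1 - step)
    return ratio


if __name__ == "__main__":
    ratio = 1.0
    # Hypothetical turbidity readings (arbitrary units).
    for reading in (0.2, 0.3, 0.6, 0.9):
        ratio = adjust_inflow_outflow_ratio(ratio, reading, low=0.4, high=0.8)
        print(f"turbidity={reading:.1f} -> inflow/outflow ratio={ratio:.3f}")
```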
  • the cell density in the host cell population and/or the fresh host cell density in the inflow is about 10^2 cells/ml to about 10^12 cells/ml.
  • the host cell density is about 10^2 cells/ml, about 10^3 cells/ml, about 10^4 cells/ml, about 10^5 cells/ml, about 5 × 10^5 cells/ml, about 10^6 cells/ml, about 5 × 10^6 cells/ml, about 10^7 cells/ml, about 5 × 10^7 cells/ml, about 10^8 cells/ml, about 5 × 10^8 cells/ml, about 10^9 cells/ml, about 5 × 10^9 cells/ml, about 10^10 cells/ml, or about 5 × 10^10 cells/ml.
  • the host cell density is more than about 10^10 cells/ml.
  • the host cell population is contacted with a mutagen.
  • the cell population contacted with the viral vector (e.g., the phage) is continuously exposed to the mutagen at a concentration that allows for an increased mutation rate of the gene of interest, but is not significantly toxic for the host cells during their exposure to the mutagen while in the host cell population.
  • the host cell population is contacted with the mutagen intermittently, creating phases of increased mutagenesis, and accordingly, of increased viral vector diversification.
  • the host cells are exposed to a concentration of mutagen sufficient to generate an increased rate of mutagenesis in the gene of interest for about 10%, about 20%, about 50%, or about 75% of the time.
  • selection of the mutagen is guided by crystallographic structural information about the wild-type oxidase to be evolved, for instance information about a binding pocket within the oxidase.
  • mutations are targeted to residues in the active site of a wild-type iron-dependent oxidase with the goal of affecting the relative orientation of the target adenine and the non-heme Fe(IV) center.
  • mutations are targeted to the DNA binding interface of a wild-type iron-dependent oxidase with the goal of affecting the relative orientation of the target adenine and the non-heme Fe(IV) center.
  • variants of AlkBH3 were evolved using continuous evolution systems to form a large library of AlkBH3 mutants, wherein mutations were targeted to residues in the active site and/or DNA binding interface of AlkBH3.
  • the host cells comprise a mutagenesis expression construct, for example, in the case of bacterial host cells, a mutagenesis plasmid.
  • the mutagenesis plasmid comprises a gene expression cassette encoding a mutagenesis-promoting gene product, for example, a proofreading-impaired DNA polymerase.
  • the mutagenesis plasmid includes a gene involved in the SOS stress response (e.g., UmuC, UmuD', and/or RecA).
  • the mutagenesis-promoting gene is under the control of an inducible promoter.
  • Suitable inducible promoters are well known to those of skill in the art and include, for example, arabinose-inducible promoters, tetracycline or doxycycline-inducible promoters, and tamoxifen-inducible promoters.
  • the host cell population is contacted with an inducer of the inducible promoter in an amount sufficient to effect an increased rate of mutagenesis.
  • a bacterial host cell population is provided in which the host cells comprise a mutagenesis plasmid in which a dnaQ926, UmuC, UmuD', and RecA expression cassette is controlled by an arabinose-inducible promoter.
  • the population of host cells is contacted with the inducer, for example, arabinose in an amount sufficient to induce an increased rate of mutation.
  • diversifying the viral vector population is achieved by providing a flow of host cells that does not select for gain-of-function mutations in the gene of interest for replication, mutagenesis, and propagation of the population of viral vectors.
  • the host cells are host cells that express all genes required for the generation of infectious viral particles, for example, bacterial cells that express a complete helper phage, and, thus, do not impose selective pressure on the gene of interest.
  • the host cells comprise an accessory plasmid comprising a conditional promoter with a baseline activity sufficient to support viral vector propagation even in the absence of significant gain-of-function mutations of the gene of interest.
  • the disclosure provides vectors for the continuous evolution processes.
  • phage vectors for phage-assisted continuous evolution are provided.
  • a selection phage is provided that comprises a phage genome deficient in at least one gene required for the generation of infectious phage particles and a gene of interest to be evolved.
  • the disclosure provides viral vectors for the continuous evolution processes.
  • the selection phage comprises an M13 phage genome deficient in a gene required for the generation of infectious M13 phage particles, for example, a full-length gIII.
  • the selection phage comprises a phage genome providing all other phage functions required for the phage life cycle except the gene required for generation of infectious phage particles.
  • an M13 selection phage is provided that comprises a gI, gII, gIV, gV, gVI, gVII, gVIII, gIX, and a gX gene, but not a full-length gIII.
  • the selection phage comprises a 3'-fragment of gIII, but no full-length gIII.
  • the 3'-end of gIII comprises a promoter (see Figure 16), and retaining this promoter activity is beneficial, in some embodiments, for an increased expression of gVI, which is immediately downstream of the gIII 3'-promoter, or a more balanced (wild-type phage-like) ratio of expression levels of the phage genes in the host cell, which, in turn, can lead to more efficient phage production.
  • the 3'-fragment of the gIII gene comprises the 3'-gIII promoter sequence.
  • the 3'-fragment of gIII comprises the last 180 bp, the last 150 bp, the last 125 bp, the last 100 bp, the last 50 bp, or the last 25 bp of gIII. In some embodiments, the 3'-fragment of gIII comprises the last 180 bp of gIII.
  • the M13 selection phage comprises a gene of interest in the phage genome, for example, inserted downstream of the gVIII 3'-terminator and upstream of the gIII 3'-promoter.
  • an M13 selection phage is provided that comprises a multiple cloning site for cloning a gene of interest into the phage genome, for example, a multiple cloning site (MCS) inserted downstream of the gVIII 3'-terminator and upstream of the gIII 3'-promoter.
  • a vector system for continuous evolution procedures is provided, comprising a viral vector, for example, a selection phage, and a matching accessory plasmid.
  • a vector system for phage-based continuous directed evolution comprises (a) a selection phage comprising a gene of interest to be evolved, wherein the phage genome is deficient in a gene required to generate infectious phage; and (b) an accessory plasmid comprising the gene required to generate infectious phage particle under the control of a conditional promoter, wherein the conditional promoter is activated by a function of a gene product encoded by the gene of interest.
  • the selection phage is an M13 phage as described herein.
  • the selection phage comprises an M13 genome including all genes required for the generation of phage particles, for example, the gI, gII, gIV, gV, gVI, gVII, gVIII, gIX, and gX genes, but not a full-length gIII gene.
  • the selection phage genome comprises an f1 or an M13 origin of replication.
  • the selection phage genome comprises a 3'-fragment of the gIII gene.
  • the selection phage comprises a multiple cloning site upstream of the gIII 3'-promoter and downstream of the gVIII 3'-terminator.
  • Some embodiments of this disclosure provide a method of non-continuous evolution of a gene of interest.
  • the method of non-continuous evolution is PANCE.
  • the method of non-continuous evolution is an antibiotic or plate-based selection method.
  • the cells are re-transformed with the mutagenesis plasmid regularly to ensure the plasmid has not been inactivated.
  • An aliquot of a desired concentration, often 2 mL, is then transferred to a smaller flask, supplemented with the inducing agent arabinose (Ara) for the mutagenesis plasmid, and infected with the selection phage (SP).
  • a drift plasmid can also be provided that enables phage to propagate without passing the selection.
  • Expression is under the control of an inducible promoter and can be turned on with 50 ng/mL of anhydrotetracycline. This culture is incubated at 37 °C for 8-12 h to facilitate phage growth, which is confirmed by determination of the phage titer. Following phage growth, an aliquot of infected cells is used to transfect a subsequent flask containing host E. coli. This process is continued for as many transfers as required until the desired phenotype is evolved, while increasing the stringency in stepwise fashion by decreasing the incubation time or the titer of phage with which the bacteria are infected. Reference is made to Suzuki T. et al., Crystal structures reveal an elusive functional domain of pyrrolysyl-tRNA synthetase, Nat Chem Biol. 13(12): 1261-1266 (2017), incorporated herein in its entirety.
  • negative selection is applied during a non-continuous evolution method as described herein, by penalizing undesired activities.
  • this is achieved by causing the undesired activity to interfere with pIII production.
  • expression of an antisense RNA complementary to the gIII RBS and/or start codon is one way of applying negative selection, while expressing a protease (e.g., TEV) and engineering the protease recognition sites into pIII is another.
  • Vectors can be designed to clone and/or express the base editors of the disclosure.
  • Vectors may also be designed to transfect the base editors of the disclosure into one or more cells, e.g., a target diseased eukaryotic cell for treatment with the base editor systems and methods disclosed herein.
  • Vectors may be designed for expression of base editor transcripts (e.g. nucleic acid transcripts, proteins, or enzymes) in prokaryotic or eukaryotic cells.
  • base editor transcripts may be expressed in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression vectors), yeast cells, or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods In Enzymology 185, Academic Press, San Diego, Calif. (1990).
  • expression vectors encoding one or more base editors described herein may be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase.
  • Vectors for rational mutagenesis methods such as PACE may be introduced and propagated in prokaryotic cells.
  • a prokaryote is used to amplify copies of a vector to be introduced into a eukaryotic cell or as an intermediate vector in the production of a vector to be introduced into a eukaryotic cell (e.g., amplifying a plasmid as part of a viral vector packaging system).
  • a prokaryote is used to amplify copies of a vector and express one or more nucleic acids, such as to provide a source of one or more proteins for delivery to a host cell or host organism.
  • Fusion expression vectors also may be used to express the base editors of the disclosure. Such vectors generally add a number of amino acids to a protein encoded therein, such as to the amino terminus of the recombinant protein. Such fusion vectors may serve one or more purposes, such as: (i) to increase expression of a recombinant protein; (ii) to increase the solubility of a recombinant protein; and (iii) to aid in the purification of a recombinant protein by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion domain and the recombinant protein to enable separation of the recombinant protein from the fusion domain subsequent to purification of the fusion protein.
  • Such enzymes, and their cognate recognition sequences, include Factor Xa, thrombin and enterokinase.
  • Example fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith and Johnson, 1988. Gene 67: 31-40), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) that fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant protein.
  • Examples of E. coli expression vectors include pTrc (Amrann et al., (1988) Gene 69:301-315) and pET 11d (Studier et al., Gene Expression Technology: Methods In Enzymology 185, Academic Press, San Diego, Calif. (1990) 60-89).
  • a vector is a yeast expression vector for expressing the base editors described herein.
  • Examples of vectors for expression in the yeast Saccharomyces cerevisiae include pYepSec1 (Baldari, et al., 1987. EMBO J. 6: 229-234), pMFa (Kuijan and Herskowitz, 1982. Cell 30: 933-943), pJRY88 (Schultz et al., 1987. Gene 54: 113-123), pYES2 (Invitrogen Corporation, San Diego, Calif.), and picZ (InVitrogen Corp, San Diego, Calif.).
  • a vector drives protein expression in insect cells using baculovirus expression vectors.
  • Baculovirus vectors available for expression of proteins in cultured insect cells include the pAc series (Smith, et al., 1983. Mol. Cell. Biol. 3: 2156-2165) and the pVL series (Lucklow and Summers, 1989. Virology 170: 31-39).
  • a vector is capable of driving expression of one or more sequences in mammalian cells using a mammalian expression vector.
  • mammalian expression vectors include pCDM8 (Seed, 1987. Nature 329: 840) and pMT2PC (Kaufman, et al., 1987. EMBO J. 6: 187-195).
  • the expression vector's control functions are typically provided by one or more regulatory elements.
  • commonly used promoters are derived from polyoma, adenovirus 2, cytomegalovirus, simian virus 40, and others disclosed herein and known in the art.
  • suitable expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16 and 17 of Sambrook, et al., Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989.
  • the recombinant mammalian expression vector is capable of directing expression of the nucleic acid preferentially in a particular cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid).
  • tissue-specific regulatory elements are known in the art.
  • suitable tissue-specific promoters include the albumin promoter (liver-specific; Pinkert, et al., 1987. Genes Dev. 1: 268-277), lymphoid-specific promoters (Calame and Eaton, 1988. Adv. Immunol. 43: 235-275), in particular promoters of T cell receptors (Winoto and Baltimore, 1989. EMBO J.
  • promoters are also encompassed, e.g., the murine hox promoters (Kessel and Gruss, 1990. Science 249: 374-379) and the α-fetoprotein promoter (Campes and Tilghman, 1989. Genes Dev. 3: 537-546).
  • Some embodiments of the disclosure provide methods for editing a nucleic acid using the base editors described herein to effectuate substitution of an A:T base pair to a C:G base pair.
  • the method is a method for editing a nucleobase of a nucleic acid (e.g., a base pair of a double-stranded DNA sequence).
  • the method comprises the steps of: a) contacting a target region of a nucleic acid (e.g., a double-stranded DNA sequence) with a complex comprising a fusion protein (e.g., a Cas9 domain fused to an adenine oxidase domain) and a guide nucleic acid (e.g., gRNA), wherein the target region comprises a targeted nucleobase pair.
  • strand separation of said target region is induced, a first nucleobase of said target nucleobase pair in a single strand of the target region is converted to a second nucleobase, and no more than one strand of said target region is cut (or nicked), wherein a third nucleobase complementary to the first nucleobase is replaced by a fourth nucleobase complementary to the second nucleobase.
  • the first nucleobase is an adenine (of the target A:T nucleobase pair).
  • the second nucleobase is the intermediate 8-oxoadenine.
  • the third nucleobase is a thymine (of the target A:T base pair).
  • the fourth nucleobase is a guanine.
  • the method further comprises replacing the second nucleobase with a fifth nucleobase (cytosine) that is complementary to the fourth nucleobase, thereby generating an intended edited base pair (e.g., A:T pair to a C:G pair).
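The sequence of nucleobase identities described above (adenine to 8-oxoadenine, thymine to guanine on the opposite strand, then 8-oxoadenine to cytosine) can be summarized in a short Python sketch. It only tracks base identities at one position and does not model any enzymology. The names below and the representation of the second strand as a position-wise complement (not reversed) are illustrative assumptions.

```python
# Base-pair states at the target position, in the order described in the text:
# (first/second/fifth nucleobase on the edited strand, third/fourth on the other).
EDIT_PATHWAY = [
    ("A", "T"),       # starting A:T pair
    ("8-oxoA", "T"),  # adenine oxidized to the 8-oxoadenine intermediate
    ("8-oxoA", "G"),  # thymine replaced by guanine opposite the intermediate
    ("C", "G"),       # intermediate replaced by cytosine: intended C:G pair
]

_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Position-wise complement of a DNA strand (same orientation as the input)."""
    return "".join(_COMPLEMENT[base] for base in strand)


def apply_a_to_c_edit(top: str, index: int) -> tuple[str, str]:
    """Return (edited strand, its position-wise complement) after an A:T to C:G edit."""
    if not 0 <= index < len(top):
        raise IndexError("index outside the sequence")
    if top[index] != "A":
        raise ValueError("target position must be an adenine")
    edited = top[:index] + "C" + top[index + 1:]
    return edited, complement(edited)


if __name__ == "__main__":
    print(apply_a_to_c_edit("GATC", 1))
```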
  • at least 5% of the intended base pairs are edited.
  • at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50% of the intended base pairs are edited.
  • the ratio of intended products to unintended products in the target nucleotide is at least 2:1, 5:1, 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, or 200:1, or more. In some embodiments, the ratio of intended point mutation to indel formation is greater than 1:1, 10:1, 50:1, 100:1, 500:1, or 1000:1, or more.
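The editing efficiency and product ratios recited above could, for example, be computed from sequencing read counts at the target nucleotide. The Python sketch below is a hypothetical illustration. The function name, the way reads are categorized, and the example counts are assumptions and do not reflect results from the disclosure.

```python
import math


def editing_metrics(intended: int, unintended: int, indel: int, total: int) -> dict[str, float]:
    """Summarize editing outcomes from read counts at a target site.

    intended   -- reads carrying the intended A:T to C:G edit
    unintended -- reads carrying any other substitution at the target nucleotide
    indel      -- reads carrying an insertion or deletion
    total      -- all reads covering the site
    """
    if total <= 0 or min(intended, unintended, indel) < 0:
        raise ValueError("counts must be non-negative and total must be positive")
    if intended + unintended + indel > total:
        raise ValueError("categorized reads exceed total reads")
    return {
        "efficiency": intended / total,
        "indel_fraction": indel / total,
        # A ratio is reported as infinity when the denominator count is zero.
        "intended_to_unintended": intended / unintended if unintended else math.inf,
        "intended_to_indel": intended / indel if indel else math.inf,
    }


if __name__ == "__main__":
    # Hypothetical read counts for illustration only.
    print(editing_metrics(intended=500, unintended=50, indel=5, total=1000))
```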
  • the cut single strand (nicked strand) is hybridized to the guide nucleic acid. In some embodiments, the cut single strand is opposite to the strand comprising the first nucleobase.
  • the base editor comprises nickase activity. In some embodiments, the intended edited base pair is upstream of a PAM site. In some
  • the intended edited base pair is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
  • the intended edited base pair is downstream of a PAM site.
  • the intended edited base pair is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides downstream of the PAM site.
  • the method does not require a canonical (e.g., NGG) PAM site.
  • the base editor comprises a linker.
  • the linker is 1-25 amino acids in length.
  • the linker is 5-20 amino acids in length.
  • the linker is 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids in length.
  • the target region comprises a target window, wherein the target window comprises the target nucleobase pair.
  • the target window comprises 1-10 nucleotides. In some embodiments, the target window is 1-9, 1-8, 1-7, 1-6, 1-5, 1-4, 1-3, 1-2, or 1 nucleotides in length. In some embodiments, the target window is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. In some embodiments, the intended edited base pair is within the target window. In some embodiments, the target window comprises the intended edited base pair. In some embodiments, the method is performed using any of the base editors provided herein. In some embodiments, a target window is an editing window.
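To illustrate how candidate target adenines might be located relative to a PAM site, the following Python sketch finds canonical NGG PAMs on one strand and reports the distance of each upstream adenine from the PAM, up to 20 nucleotides. The function names, the single-strand search, and the example sequence are hypothetical. As noted above, some embodiments do not require a canonical PAM, so this is only one possible case.

```python
import re


def find_ngg_pams(seq: str) -> list[int]:
    """0-based start index of every NGG PAM on the given strand (overlaps included)."""
    return [m.start() for m in re.finditer(r"(?=[ACGT]GG)", seq.upper())]


def adenines_upstream(seq: str, pam_start: int, max_distance: int = 20) -> list[int]:
    """Distances (1 = base immediately 5' of the PAM) of adenines upstream of a PAM."""
    seq = seq.upper()
    hits = []
    for distance in range(1, max_distance + 1):
        index = pam_start - distance
        if index < 0:
            break  # ran off the 5' end of the sequence
        if seq[index] == "A":
            hits.append(distance)
    return hits


if __name__ == "__main__":
    sequence = "TTACGATTGCAGG"  # toy sequence for illustration only
    for pam in find_ngg_pams(sequence):
        print(pam, adenines_upstream(sequence, pam))
```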
  • the disclosure provides methods for editing a nucleotide.
  • the disclosure provides a method for editing a nucleobase pair of a double-stranded DNA sequence.
  • the method comprises a) contacting a target region of the double-stranded DNA sequence with a complex comprising a base editor and a guide nucleic acid (e.g., gRNA), where the target region comprises a target nucleobase pair (e.g., A:T target base pair), b) converting a first nucleobase (e.g., the A base) of said target nucleobase pair in a single strand of the target region to a second nucleobase (e.g., converted to an intermediate, such as 8-oxoadenine, which is then replaced with a C through DNA replication/repair processes), c) cutting (or nicking) no more than one strand of said target region, wherein a third nucleobase complementary to the first
  • the method causes less than 19%, 18%, 16%, 14%, 12%, 10%, 8%, 6%, 4%, 2%, 1%, 0.5%, 0.2%, or less than 0.1% indel formation. In some embodiments, the method results in less than 19%, 18%, 16%, 14%, 12%, 10%, 8%, 6%, 4%, 2%, 1%, 0.5%, 0.2%, or less than 0.1% indel formation. In some embodiments, the method results in less than 20% indel formation in the nucleic acid. In other embodiments, the method results in less than 35% indel formation in the nucleic acid. In some
  • the ratio of intended product to unintended products at the target nucleotide is at least 2:1, 5:1, 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, or 200:1, or more. In some embodiments, the ratio of intended point mutation to indel formation is greater than 1:1, 10:1, 50:1, 100:1, 500:1, or 1000:1, or more.
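The percentages and ratios recited above could be computed from sequencing read counts roughly as follows. This is a minimal sketch: the function name, the four read categories, and the example counts are hypothetical and are not taken from the disclosure.

```python
def editing_summary(intended, unintended, indel, unedited):
    """Summarize base-editing outcomes from read counts at one target site.

    All four arguments are hypothetical read counts:
      intended   - reads carrying the intended A:T to C:G product
      unintended - reads carrying some other substitution at the target base
      indel      - reads carrying an insertion or deletion
      unedited   - reads matching the starting sequence
    """
    total = intended + unintended + indel + unedited
    if total <= 0:
        raise ValueError("at least one read is required")

    def ratio(numerator, denominator):
        # A ratio with a zero denominator is reported as infinity.
        return float("inf") if denominator == 0 else numerator / denominator

    return {
        "indel_percent": 100.0 * indel / total,
        "intended_to_unintended": ratio(intended, unintended),
        "intended_to_indel": ratio(intended, indel),
    }


summary = editing_summary(intended=600, unintended=30, indel=10, unedited=360)
print(summary)
```

With these illustrative counts, indel formation is 1% of reads, the ratio of intended to unintended products is 20:1, and the ratio of intended point mutation to indel formation is 60:1.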
  • the cut single strand is hybridized to the guide nucleic acid. In some embodiments, the cut single strand is opposite to the strand comprising the first nucleobase.
  • the base editor comprises adenine oxidation and/or DNA glycosylase inhibition activity. In some embodiments, the base editor comprises nickase activity. In some embodiments, the intended edited base pair is upstream of a PAM site. In some embodiments, the intended edited base pair is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
  • the intended edited base pair is downstream of a PAM site.
  • the intended edited base pair is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides downstream of the PAM site.
  • the method does not require a canonical (e.g., NGG) PAM site.
  • the base editor comprises a linker. In some embodiments, the linker is 1-25 amino acids in length. In some
  • the linker is 5-20 amino acids in length. In some embodiments, the linker is
  • the intended edited base pair occurs within the target window.
  • the target window comprises the intended edited base pair.
  • the base editor is any one of the base editors provided herein.
  • the disclosure provides editing methods comprising contacting a DNA or RNA molecule with any of the base editors provided herein, and with at least one guide nucleic acid (e.g., guide RNA), wherein the guide nucleic acid (e.g., guide RNA) is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence.
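The guide-length and complementarity conditions stated above can be checked computationally. The sketch below is illustrative only: the helper names are hypothetical, the target is assumed to be given as a DNA strand written 5' to 3', and complementarity is assumed to mean perfect Watson-Crick pairing over a contiguous run.

```python
def rna_complement_of(dna_target):
    """Return the RNA strand (5' to 3') that pairs with a DNA target (5' to 3')."""
    pairs = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(dna_target.upper()))


def longest_shared_run(a, b):
    """Length of the longest contiguous substring common to a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def guide_meets_criteria(guide_rna, dna_target, min_len=15, max_len=100, min_run=10):
    """Check guide length (15-100 nt) and a run of at least 10 complementary nt."""
    guide = guide_rna.upper()
    if not (min_len <= len(guide) <= max_len):
        return False
    return longest_shared_run(guide, rna_complement_of(dna_target)) >= min_run


target = "ACGTACGTACGTACGTACGT"          # hypothetical 20-nt target strand
guide = rna_complement_of(target)        # fully complementary guide
print(guide, guide_meets_criteria(guide, target))
```

The default bounds mirror the "about 15-100 nucleotides" and "at least 10 contiguous nucleotides" language; a real design tool would also have to account for mismatches and scaffold sequence, which this sketch ignores.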
  • the 3' end of the target sequence is immediately adjacent to a canonical PAM sequence (NGG).
  • the 3' end of the target sequence is not immediately adjacent to a canonical PAM sequence (NGG). In some embodiments, the 3' end of the target sequence is immediately adjacent to an AGC, GAG, TTT, GTG, or CAA sequence.
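To illustrate the PAM adjacency conditions above, the following sketch scans one strand of a DNA sequence for target sequences whose 3' end is immediately followed by a given PAM. The function names, the 20-nucleotide target length, and the example sequences are assumptions made for illustration; `N` is treated as a wildcard, and only the strand supplied is scanned.

```python
def find_sites(seq, pam_pattern="NGG", protospacer_len=20):
    """List (start, target, pam) for targets immediately 5' of a matching PAM."""
    seq = seq.upper()
    k = len(pam_pattern)
    sites = []
    for start in range(len(seq) - protospacer_len - k + 1):
        pam = seq[start + protospacer_len : start + protospacer_len + k]
        # "N" in the pattern matches any base; other letters must match exactly.
        if all(p == "N" or p == b for p, b in zip(pam_pattern, pam)):
            sites.append((start, seq[start : start + protospacer_len], pam))
    return sites


def adenines_in_window(target, first, last):
    """1-based positions of A within positions first..last of the target."""
    return [i for i in range(first, last + 1) if target[i - 1] == "A"]


example = "ACGTACGTACGTACGTACGT" + "TGG" + "AA"   # hypothetical sequence
for start, target, pam in find_sites(example):
    print(start, target, pam, adenines_in_window(target, 4, 8))
```

Passing one of the non-canonical sequences named above (for example `pam_pattern="GTG"`) applies the same scan to targets that are not adjacent to NGG. The window bounds passed to `adenines_in_window` are arbitrary here; the disclosure does not fix them in this passage.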
  • the target nucleic acid sequence comprises a sequence associated with a disease, disorder, or condition. In some embodiments, the target nucleic acid sequence comprises a point mutation associated with a disease, disorder, or condition.
  • the activity of the fusion protein results in a correction of the point mutation.
  • the target nucleic acid sequence comprises a C to A point mutation associated with a disease, disorder, or condition, and wherein the conversion of the mutant A to a C results in a sequence that is not associated with a disease, disorder, or condition.
  • the target sequence may comprise a G to T point mutation associated with a disease, disorder, or condition, and wherein the conversion of the mutant T to a G results in a sequence that is not associated with a disease, disorder, or condition.
  • the target nucleic acid sequence encodes a protein
  • the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to the wild-type codon.
  • the transversion of the mutant A (or mutant T) results in a change of the amino acid encoded by the mutant codon.
  • the transversion of the mutant A (or mutant T) results in the codon encoding the wild-type amino acid.
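The effect of such a transversion on the encoded amino acid can be worked through with the standard genetic code. The sketch below is an illustration under stated assumptions: codons are written on the coding strand, the function name and example codons are hypothetical, and a T to G change stands for the case in which the edited A lies on the opposite strand.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Coding-strand outcomes of A:T to C:G editing.
SWAP = {"A": "C", "T": "G"}


def edit_codon(codon, position):
    """Apply A->C or T->G at a 0-based codon position.

    Returns (edited_codon, amino_acid_before, amino_acid_after).
    """
    codon = codon.upper()
    base = codon[position]
    if base not in SWAP:
        raise ValueError("only A or T can be converted by this edit")
    edited = codon[:position] + SWAP[base] + codon[position + 1 :]
    return edited, CODON_TABLE[codon], CODON_TABLE[edited]


# A hypothetical C-to-A mutation that turned CAG (Gln) into AAG (Lys) is reverted:
print(edit_codon("AAG", 0))
```

In this example the mutant codon AAG encodes lysine, and converting the mutant A back to C gives CAG, which encodes glutamine.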
  • the contacting is in vivo in a subject.
  • the subject has or has been diagnosed with a disease, disorder, or condition.
  • the disease, disorder, or condition is congenital deafness, spastic paraplegia, nonsyndromic hearing loss, spinal muscular atrophy, or hypohidrotic ectodermal dysplasia.
  • the base editors are used to introduce a point mutation into a nucleic acid by oxidizing a target A nucleobase.
  • the oxidation of the target nucleobase results in the correction of a genetic defect, e.g., in the correction of a point mutation that leads to a loss of function in a gene product.
  • the genetic defect is associated with a disease, disorder, or condition, e.g., a lysosomal storage disorder or a metabolic disease, such as, for example, type I diabetes.
  • the methods provided herein are used to introduce a deactivating point mutation into a gene or allele that encodes a gene product that is associated with a disease, disorder, or condition.
  • methods are provided herein that employ a DNA editing fusion protein to introduce a deactivating point mutation into an oncogene (e.g., in the treatment of a proliferative disease).
  • a deactivating mutation may, in some embodiments, generate a premature stop codon in a coding sequence, which results in the expression of a truncated gene product, e.g., a truncated protein lacking the function of the full-length protein.
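As a worked illustration of how a single A to C or T to G change could generate a premature stop codon, the sketch below enumerates every sense codon of the standard genetic code that becomes a stop codon after one such change on the coding strand. The function name is hypothetical, and restricting the search to coding-strand A to C and T to G changes is an assumption of this example.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SWAP = {"A": "C", "T": "G"}  # coding-strand outcomes of A:T to C:G editing


def stop_creating_edits():
    """Return (codon, position, edited_codon) for edits that create a stop codon."""
    hits = []
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # already a stop codon
        for i, base in enumerate(codon):
            if base in SWAP:
                edited = codon[:i] + SWAP[base] + codon[i + 1 :]
                if CODON_TABLE[edited] == "*":
                    hits.append((codon, i, edited))
    return hits


for codon, position, edited in stop_creating_edits():
    print(codon, CODON_TABLE[codon], "->", edited, "(stop) at position", position)
```

Under these assumptions only two codons qualify: TTA (Leu) to TGA and TAT (Tyr) to TAG, both through a T to G change. No A to C change can create a stop codon, because none of the three stop codons contains a C.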
  • the purpose of the methods provided herein is to restore the function of a dysfunctional gene via genome editing.
  • the base editor proteins provided herein can be validated for gene editing-based human therapeutics in vitro, e.g., by correcting a disease-associated mutation in human cell culture. It will be understood by the skilled artisan that the base editors provided herein, e.g., the fusion proteins comprising a nucleic acid programmable DNA binding protein (e.g., Cas9) and a nucleobase modification domain, can be used to correct any single point A to C or T to G mutation. Oxidation of the mutant A that is base-paired with the mutant T, followed by a round of replication, corrects the mutation.
  • the instant disclosure provides methods for the treatment of a subject diagnosed with a disease associated with or caused by a point mutation that can be corrected by a DNA editing fusion protein provided herein.
  • a method comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of an adenine oxidase fusion protein and a gRNA that forms a complex with the fusion protein, that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene.
  • a method comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of an adenine oxidase fusion protein-gRNA complex that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene.
  • methods comprising administering to a subject one or more vectors that contain a nucleotide sequence that expresses the fusion protein and a gRNA that forms a complex with the fusion protein.
  • the disease is a proliferative disease.
  • the disease is a genetic disease.
  • the disease is a neoplastic disease.
  • the disease is a metabolic disease.
  • the disease is a lysosomal storage disease.
  • Other diseases that can be treated by correcting a point mutation or introducing a deactivating mutation into a disease-associated gene will be known to those of skill in the art, and the disclosure is not limited in this respect.
  • the instant disclosure provides methods for the treatment of additional diseases or disorders, e.g., diseases or disorders that are associated with or caused by a point mutation that can be corrected by adenine oxidase-mediated gene editing.
  • Some such diseases are described herein, and additional suitable diseases that can be treated with the strategies and fusion proteins provided herein will be apparent to those of skill in the art based on the instant disclosure.
  • Exemplary suitable diseases and disorders are listed below. It will be understood that the numbering of the specific positions or residues in the respective sequences depends on the particular protein and numbering scheme used. Numbering might be different, e.g., in precursors of a mature protein and the mature protein itself, and differences in sequences from species to species may affect numbering.
  • Suitable diseases and disorders include, without limitation: Non-Bruton type Agammaglobulinemia, Hypomyelinating Leukodystrophy, 21-hydroxylase deficiency, familial Breast-ovarian cancer, Immunodeficiency with basal ganglia
  • Neurodevelopmental disorder with or without anomalies of the brain, eye, or heart,
  • Immunodeficiency Leber congenital amaurosis, Amyotrophic lateral sclerosis type 10, Motor neuron disease, Malignant melanoma of skin, Focal cortical dysplasia type II, papillary Renal cell carcinoma, Glioblastoma, Colorectal Neoplasms, Uterine cervical neoplasms, sporadic Papillary renal cell carcinoma, Malignant neoplasm of body of uterus, Kidney Carcinoma, Neoplasm of the breast, Glioblastoma, Smith-Kingsmore syndrome, Homocysteinemia due to MTHFR deficiency, type 2A2A Charcot-Marie-Tooth disease, Bartter syndrome type 3, Cataract, multiple types, Gastrointestinal stroma tumor, Paragangliomas, Pheochromocytoma, Hereditary cancer-predisposing syndrome, Paragangliomas, Hereditary cancer-predisposing syndrome, Gastrointestinal stroma tumor, Paragan
  • Hereditary neutrophilia Ceroid lipofuscinosis neuronal, Neuronal ceroid lipofuscinosis, Lethal tight skin contracture syndrome, DFNA 2 Nonsyndromic Hearing Loss, Osteogenesis imperfecta type 8, GLUT1 deficiency syndrome, autosomal recessive, Glucose transporter type 1 deficiency syndrome, Congenital amegakaryocytic thrombocytopenia, Myelofibrosis with myeloid metaplasia, somatic, Myelofibrosis with myeloid metaplasia,
  • Thrombocythemia somatic, Hematologic neoplasm, Early infantile epileptic encephalopathy, Mental retardation, autosomal recessive, Familial porphyria cutanea tarda, MYH-associated polyposis, Hereditary cancer-predisposing syndrome, MUTYH-associated polyposis, Hereditary cancer-predisposing syndrome, Methylmalonic acidemia with homocystinuria, Methylmalonic aciduria and homocystinuria, cblC type, digenic, Muscle eye brain disease, Congenital Muscular Dystrophy, alpha-dystroglycan related, Limb-Girdle Muscular
  • Dystrophy Recessive, Muscle eye brain disease, Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A3, Adenocarcinoma of the colon, Congenital primary aphakia, Hepatic failure, early-onset, and neurologic disorder due to cytochrome C oxidase deficiency, Carnitine palmitoyltransferase II deficiency, infantile, Carnitine palmitoyltransferase II deficiency, myopathic, stress-induced, Carnitine
  • palmitoyltransferase II deficiency Carnitine palmitoyltransferase II deficiency, myopathic, stress-induced, Sensorineural deafness with mild renal dysfunction, Bartter syndrome type 4, Hypercholesterolemia, autosomal dominant, Low density lipoprotein cholesterol level quantitative trait locus, Familial hypercholesterolemia, Hypocholesterolemia,
  • Hypercholesterolemia autosomal dominant, Familial hypercholesterolemia, Low density lipoprotein cholesterol level quantitative trait locus, Hypocholesterolemia, Lattice corneal dystrophy Type III, Epileptic encephalopathy, early infantile, Hypobetalipoproteinemia, familial, Congenital disorder of glycosylation type It, Leber congenital amaurosis, Retinitis pigmentosa, Medium-chain acyl-coenzyme A dehydrogenase deficiency, Dilated
  • cardiomyopathy 1CC Venous malformation, Aase syndrome, Stargardt disease, Cone-rod dystrophy, Retinitis pigmentosa, Stargardt disease, Congenital stationary night blindness, Retinal dystrophy, Nonsyndromic cleft lip with or without cleft palate, Glycogen storage disease type III, Glycogen storage disease IIIa, Intermediate maple syrup urine disease type 2, Maple syrup urine disease, Chorea, childhood-onset, with psychomotor retardation, Marshall syndrome, Stickler syndrome, type 2, Marshall/Stickler syndrome, Chudley-McCullough syndrome, Auriculocondylar syndrome, Pontocerebellar hypoplasia, type 9, Epileptic encephalopathy, early infantile, Spinocerebellar ataxia, Muscle AMP deaminase deficiency, Congenital giant melanocytic nevus, Liver cancer, Chronic lymphocytic leukemia,
  • Neurocutaneous melanosis Malignant melanoma of skin, Multiple myeloma, Neuroblastoma, Lung adenocarcinoma, Non-small cell lung cancer, Acute myeloid leukemia, Renal cell carcinoma, papillary, Neoplasm of brain, Cutaneous melanoma, Glioblastoma, Hepatocellular carcinoma, Transitional cell carcinoma of the bladder, Colorectal Neoplasms,
  • Ovarian Serous Cystadenocarcinoma Malignant neoplasm of body of uterus, RAS Inhibitor response, Malignant lymphoma, non-Hodgkin, Medulloblastoma, Malignant melanoma of skin, Multiple myeloma, Acute myeloid leukemia, Myelodysplastic syndrome, Cutaneous melanoma, Transitional cell carcinoma of the bladder, Neoplasm, Colorectal Neoplasms, Adenocarcinoma of stomach, Cutaneous melanoma, Malignant melanoma of skin, Multiple myeloma, Acute myeloid leukemia, Noonan syndrome, Myelodysplastic syndrome,
  • Hereditary insensitivity to pain with anhidrosis Hereditary insensitivity to pain with anhidrosis, Familial medullary thyroid carcinoma, Hereditary insensitivity to pain with anhidrosis Spherocytosis, type 3, autosomal recessive, Spherocytosis, Recessive, Elliptocytosis, Hereditary pyropoikilocytosis,
  • Mitochondrial complex I deficiency Charcot-Marie-Tooth disease, demyelinating, type 1b, Charcot-Marie-Tooth disease, type I, Roussy-Levy syndrome, Neuropathy, congenital hypomyelinating, autosomal dominant, Charcot-Marie-Tooth disease, demyelinating, type 1b, Charcot-Marie-Tooth disease type 2J, Charcot-Marie-Tooth disease dominant intermediate, Charcot-Marie-Tooth disease, type I, Gastrointestinal stroma tumor, Paragangliomas, Hereditary cancer-predisposing syndrome, Achromatopsia, Thrombophilia due to activated protein C resistance, Geroderma osteodysplastica, Trimethylaminuria, FMO3 activity, decreased, Trimethylaminuria, Primary open angle glaucoma juvenile onset, Glaucoma, open angle, digenic, Glaucoma, primary congenital, digenic, MYOC-Related Disorders,
  • Hereditary nephrotic syndrome Nephrotic syndrome, idiopathic, steroid-resistant, Pituitary hormone deficiency, combined, Glutamine deficiency, congenital, Prostate cancer, hereditary, Junctional epidermolysis bullosa gravis of Herlitz, Hyperparathyroidism, Factor H
  • scapulohumeroperoneal Nemaline myopathy, autosomal dominant or recessive, Myopathy, actin, congenital, with cores, Cardioencephalomyopathy, fatal infantile, due to cytochrome c oxidase deficiency, Chediak-Higashi syndrome, Familial hypertrophic cardiomyopathy, Methylcobalamin deficiency, cblG type, Catecholaminergic polymorphic ventricular tachycardia, Catecholaminergic polymorphic ventricular tachycardia type 1,
  • hypobetalipoproteinemia familial
  • Hypobetalipoproteinemia familial
  • Proopiomelanocortin deficiency Acute myeloid leukemia
  • Shashi-Pena syndrome Primary pulmonary
  • Pheochromocytoma Hereditary cancer-predisposing syndrome, Retinitis pigmentosa, Cone-rod dystrophy amelogenesis imperfecta, Cd8 deficiency, familial, Severe combined immunodeficiency, atypical, Achromatopsia, Monochromacy, Ectodermal dysplasia, hypohidrotic/hair/tooth type, autosomal dominant, Autosomal recessive hypohidrotic ectodermal dysplasia syndrome, Autosomal dominant hypohidrotic ectodermal dysplasia, Colorectal cancer with chromosomal instability, Retinitis pigmentosa, Osteomyelitis, sterile multifocal, with periostitis and pustulosis, Hypochromic microcytic anemia with iron overload, Culler-Jones syndrome, Autosomal recessive centronuclear myopathy,
  • Thrombophilia hereditary, due to protein C deficiency, autosomal dominant, Congenital disorders of glycosylation type II, Congenital disorder of glycosylation, type IIo, Warburg micro syndrome, Hypomyelination with brainstem and spinal cord involvement and leg spasticity, Warts, hypogammaglobulinemia, infections, and myelokathexis, Congenital NAD deficiency disorder, Vertebral, cardiac, renal, and limb defects syndrome, Mowat-Wilson syndrome, Homocystinuria, cblD type, variant 1, Nemaline myopathy, Nemaline myopathy, Idiopathic generalized epilepsy, Epilepsy, idiopathic generalized, Juvenile myoclonic epilepsy, Episodic ataxia, type 5, Progressive myositis ossificans, Amelogenesis imperfecta, type IH, Benign familial neonatal-infantile seizures, Early infantile epileptic
  • Brachydactyly-syndactyly-oligodactyly syndrome Brachydactyl-syndactyly-oligodactyly syndrome (1 patient), immunodeficiency, developmental delay, and hypohomocysteinemia
  • Hereditary myopathy with early respiratory failure Familial dilated cardiomyopathy, Dilated cardiomyopathy, Primary dilated cardiomyopathy, Limb-girdle muscular dystrophy, type 2J, Primary dilated cardiomyopathy, Familial dilated cardiomyopathy, Familial hypertrophic cardiomyopathy, Diabetes mellitus type 2, Ehlers-Danlos syndrome, type 4, Cardiovascular phenotype, Ehlers-Danlos syndrome, type 2, Ehlers-Danlos syndrome, classic type,
  • hyperammonemia type I, Hereditary cancer-predisposing syndrome, Familial cancer of breast, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Spondylometaphyseal dysplasia - Sutcliffe type, Spondylometaphyseal dysplasia, Short stature, Focal segmental glomerulosclerosis, Microcephaly, Small for gestational age, Disproportionate short-trunk short stature, Decreased body weight, Atrioventricular canal defect, Congenital microcephaly, Steroid-resistant nephrotic syndrome, Schimke
  • immunoosseous dysplasia Short stature, Focal segmental glomerulosclerosis, Microcephaly, Small for gestational age, Disproportionate short-trunk short stature, Decreased body weight, Atrioventricular canal defect, Congenital microcephaly, Steroid-resistant nephrotic syndrome, Gracile syndrome, Cholestanol storage disease, Odontoonychodermal dysplasia, Schopf-Schulz-Passarge syndrome, Tooth agenesis, selective, Type A1 brachydactyly,
  • Dyschromatosis universalis hereditaria Charcot-Marie-Tooth disease, axonal, type 2T, Myopathy, centronuclear, Three M syndrome, Waardenburg syndrome type 1, Alport syndrome, autosomal recessive, Benign familial hematuria, Basal ganglia disease, biotin-responsive, ARMC9-related Joubert syndrome, ARMC9-related Joubert syndrome, Joubert syndrome, Arthrogryposis, distal, type 5d, Microphthalmia, isolated, Myasthenic syndrome, congenital, fast-channel, Congenital myasthenic syndrome, fast-channel, Oguchi's disease, Crigler Najjar syndrome, type 1, Crigler-Najjar syndrome, type II, Crigler-Najjar syndrome, Crigler-Najjar syndrome, type II, Gilbert's syndrome, Crigler Najjar syndrome, type 1, Hyperbilirubinemia, Ullrich congenital muscular dystrophy, Bethlem myopathy, Ullrich congenital muscular
  • Microphthalmia Microphthalmia, syndromic, Congenital disorder of deglycosylation, Cardiovascular phenotype, Loeys-Dietz syndrome, Thoracic aortic aneurysm and aortic dissection,
  • Osteogenesis imperfecta type 7 Lynch syndrome I, Hereditary cancer-predisposing syndrome, Turcot syndrome, Hereditary nonpolyposis colon cancer, Atrial fibrillation, Atrial fibrillation, familial, Atrial fibrillation, Brugada syndrome, Congenital long QT syndrome, Cardiac arrhythmia, Sudden infant death syndrome, Long qt syndrome, acquired,
  • Medulloblastoma Malignant melanoma of skin, Squamous cell carcinoma of the head and neck, Malignant tumor of prostate, Lung adenocarcinoma, Hepatoblastoma, Cutaneous melanoma, Hepatocellular carcinoma, Craniopharyngioma, Adrenocortical carcinoma, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Liver cancer,
  • Medulloblastoma Lung adenocarcinoma, Neoplasm of stomach, Cutaneous melanoma, Hepatocellular carcinoma, Transitional cell carcinoma of the bladder, Carcinoma of esophagus, Uterine cervical neoplasms, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Liver cancer, Malignant melanoma of skin,
  • Lung adenocarcinoma Cutaneous melanoma, Hepatocellular carcinoma, Transitional cell carcinoma of the bladder, Colorectal Neoplasms, Adrenocortical carcinoma, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Nemaline myopathy,
  • Hepatocellular carcinoma Pancreatic adenocarcinoma, Transitional cell carcinoma of the bladder, Brainstem glioma, Carcinoma of esophagus, PIK3CA related overgrowth spectrum, Colorectal Neoplasms, Uterine cervical neoplasms, Papillary renal cell carcinoma, sporadic, Nasopharyngeal Neoplasms, Adenocarcinoma of stomach, Ovarian Serous
  • Cystadenocarcinoma Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Carcinoma of gallbladder, Lung cancer, Medulloblastoma, Malignant melanoma of skin, Squamous cell carcinoma of the head and neck, Malignant tumor of prostate, Ovarian epithelial cancer, Carcinoma of colon, Neoplasm of brain, Neoplasm of the breast, Glioblastoma, Transitional cell carcinoma of the bladder, PIK3CA related overgrowth spectrum, Ovarian Neoplasms, Colorectal Neoplasms, Uterine cervical neoplasms, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Cowden syndrome, PIK3CA related overgrowth spectrum, Colorectal Neoplasms, Ciliary dyskinesia, Ciliary dyskinesi
  • Maldergem syndrome short-rib thoracic dysplasia with polydactyly, Ceroid lipofuscinosis neuronal, Macular dystrophy with central cone involvement, Ceroid lipofuscinosis neuronal, Methylmalonic aciduria cblA type, Pseudohypoaldosteronism type 1 autosomal dominant, Pseudohypoaldosteronism, Common variable immunodeficiency, with autoimmunity, Afibrinogenemia, congenital, Familial visceral amyloidosis, Ostertag type,
  • cardiomyopathy 1A Limb-girdle muscular dystrophy, type 2S, Mitochondrial myopathy, Myopia, Mitochondrial DNA depletion syndrome (cardiomyopathic type), autosomal recessive, Progressive sensorineural hearing impairment, Hypertrophic cardiomyopathy, Left ventricular hypertrophy, Vertigo, Abnormality of mitochondrial metabolism, Mitochondrial respiratory chain defects, Bietti crystalline corneoretinal dystrophy, Corneal Dystrophy, Recessive, Bietti crystalline corneoretinal dystrophy, Hereditary factor XI deficiency disease, Mitochondrial complex II deficiency, Paragangliomas, Hereditary cancer-predisposing syndrome, Mitochondrial complex II deficiency, Dyskeratosis congenita autosomal dominant, Ciliary dyskinesia, Mental retardation, autosomal dominant, Chondrocalcinosis, Oculocutaneous albinism type 4, Inherited bone marrow failure syndrome,
  • adenomatous polyposis Familial adenomatous polyposis, Hereditary cancer-predisposing syndrome, Familial adenomatous polyposis, Colorectal cancer, susceptibility to, Familial adenomatous polyposis, Hereditary cancer-predisposing syndrome, Familial adenomatous polyposis, Hereditary cancer-predisposing syndrome, Familial adenomatous polyposis, Hereditary cancer-predisposing syndrome, Familial adenomatous polyposis, Hereditary cancer-predisposing syndrome, Familial adenomatous polyposis, Anencephalus, Aortic aneurysm, familial thoracic, Pyridoxine-dependent epilepsy, Seizures,
  • Ventriculomegaly, Pyridoxine-dependent epilepsy, Myopathy areflexia, respiratory distress, and dysphagia, early-onset, Congenital contractural arachnodactyly, Neuromyotonia and axonal neuropathy, autosomal recessive, Renal carnitine transport defect, Hereditary cancer-predisposing syndrome, Chylomicron retention disease, Groenouw corneal dystrophy type I, Reis-Bucklers' corneal dystrophy, Lattice corneal dystrophy type 3A, Lattice corneal dystrophy Type I, Pseudohypoaldosteronism, type 2, Pseudohypoaldosteronism type 2D, Myotilinopathy, Charcot-Marie-Tooth disease, axonal, type 2w, Leber congenital amaurosis, Retinitis pigmentosa, Diastrophic dysplasia, de la Chapelle dysplasia, Achondro
  • Cardiomyopathy Abnormality of cardiovascular system morphology, Malformation of the heart and great vessels, Cardiovascular phenotype, Congenital heart disease, Atrial septal defect with or without atrioventricular conduction defects, Hypothyroidism, congenital, nongoitrous, Cardiovascular phenotype, Congenital heart disease, Craniosynostosis, Lewy body dementia, Sotos syndrome, Hypercalcemia, infantile, Hereditary angioneurotic edema with normal C1 esterase inhibitor activity, Hereditary angioneurotic edema, Acute myeloid leukemia, Myelodysplasia, Ehlers-Danlos syndrome progeroid type, Axenfeld-Rieger syndrome type 3, Polymicrogyria, asymmetric, Combined oxidative phosphorylation deficiency, Combined oxidative phosphorylation deficiency, Factor XIII subunit A deficiency, Cardiovascular phenotype, Bicuspid aortic valve,
  • Osteopetrosis autosomal recessive, Amyotrophic lateral sclerosis type, Progressive pseudorheumatoid dysplasia, Metaphyseal chondrodysplasia, Schmid type, Ovarian dysgenesis, Alopecia congenita keratosis palmoplantaris, Oculodentodigital dysplasia, Merosin deficient congenital muscular dystrophy, Laminin alpha 2-related dystrophy, Merosin deficient congenital muscular dystrophy, Arginase deficiency, Arterial calcification of infancy, Hypophosphatemic rickets, autosomal recessive, Arterial calcification of infancy, Hypophosphatemic Rickets, Recessive, Arterial calcification of infancy, Joubert syndrome, Leber congenital amaurosis, Disseminated atypical mycobacterial infection,
  • Neoplasms Multiple myeloma, Squamous cell carcinoma of the head and neck, Lung adenocarcinoma, Non-small cell lung cancer, Squamous cell lung carcinoma, Colorectal Neoplasms, Non small cell lung cancer, Rasopathy, Neoplasm of the breast, Neoplasm, Carcinoma of colon, Noonan syndrome, Cataract and cardiomyopathy, Myotonia congenita, Congenital myotonia, autosomal recessive form, Premature ovarian failure, Cortical dysplasia-focal epilepsy syndrome, Rolandic epilepsy, Pitt-Hopkins-like syndrome, Rolandic epilepsy, Long QT syndrome, Congenital long QT syndrome, Short QT syndrome, Cardiovascular phenotype, Long QT syndrome, Glaucoma, open angle, F, Glycogen storage disease of heart, lethal congenital, Familial hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Holoprosencephaly, Currarino
  • Hyperlipoproteinemia type I, lipoprotein lipase (Olbia), Surfactant metabolism dysfunction, pulmonary, Osteogenesis imperfecta, type xiii, Hypermanganesemia with dystonia, Charcot-Marie-Tooth disease, demyelinating, type If, Charcot-Marie-Tooth disease type 2E, Charcot-Marie-Tooth disease, demyelinating, type If, Trichothiodystrophy 6, nonphotosensitive, Cholesterol monooxygenase (side-chain cleaving) deficiency, Kallmann syndrome, Hartsfield syndrome, Medulloblastoma, Neuroblastoma, Encephalocraniocutaneous lipomatosis, Astrocytoma, Brainstem glioma, Adenocarcinoma of stomach, Rosette-forming glioneuronal tumor, Hypogonadotropic hypogonadism with anosmia, Spherocytosis type 1, Mental retardation, autosomal dominant, Idiopathic basal gangli
  • Ataxia with vitamin E deficiency nocturnal frontal lobe epilepsy
  • Joubert syndrome Melnick-Fraser syndrome
  • Osteopetrosis with renal tubular acidosis carbonic anhydrase II variant
  • Achromatopsia Hereditary cancer-predisposing syndrome
  • Trichorhinophalangeal dysplasia type I Multiple congenital exostosis, Dandy-Walker like malformation with atrioventricular septal defect, Benign familial neonatal seizures, Ciliary dyskinesia, primary, Iodotyrosyl coupling defect, Mental retardation, autosomal recessive, Deficiency of steroid 11-beta-monooxygenase, Corticosterone methyloxidase type 1 deficiency, Hyperlipoproteinemia, type ID, Amelogenesis imperfecta, hypocalcification type, 5-Oxoprolinase deficiency, Mitochondrial complex III deficiency, nuclear type 6, Brown-Vialetto-Van Laere syndrome, Hereditary acrodermatitis enteropathica, Rothmund-Thomson syndrome, Baller-Gerold syndrome, Hyperimmunoglobulin E recurrent infection syndrome, autosomal recessive, Nicolaides-Baraitser
  • encephalopathy 59 Loeys-Dietz syndrome, Thoracic aortic aneurysm and aortic dissection, Loeys-Dietz syndrome, Congenital disorder of glycosylation type 1, Hereditary fructosuria, Familial hypoalphalipoproteinemia, Tangier disease, Limb-girdle muscular dystrophy-dystroglycanopathy, type C4, Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A4, Primary autosomal recessive microcephaly, Meretoja syndrome, adrenal insufficiency, NR5A1-related, 46, XY sex reversal, type 3, Nail-patella syndrome, Early infantile epileptic encephalopathy 4, Epileptic encephalopathy, Primary pulmonary hypertension, Osler hemorrhagic telangiectasia syndrome, Coenzyme Q10 deficiency, primary, Ichthyosis prematurity
  • hypomyelinating neuropathy Neuropathy, congenital hypomyelinating, autosomal dominant, Shprintzen-Goldberg syndrome, Goldberg-Shprintzen megacolon syndrome, Shprintzen-Goldberg syndrome, Diarrhea, malabsorptive, congenital, Aplastic anemia, Hemophagocytic lymphohistiocytosis, familial, nephrotic syndrome, Hyperphenylalaninemia, BH4-deficient, D, Histiocytosis-lymphadenopathy plus syndrome, Usher syndrome, type ID, pituitary adenoma, multiple types, Usher syndrome, type ID, Usher syndrome, type ID, Gaucher disease, atypical, due to saposin C deficiency, Krabbe disease atypical due to Saposin A deficiency, Combined saposin deficiency, Sphingolipid activator protein deficiency, Gaucher disease, atypical, due to saposin C deficiency, Spondylo
  • Genitopatellar syndrome Young Simpson syndrome, Hypomyelinating leukodystrophy, Idiopathic fibrosing alveolitis, chronic form, Hepatic methionine adenosyltransferase deficiency, Hereditary cancer-predisposing syndrome, Juvenile polyposis syndrome, Juvenile polyposis syndrome, Hereditary cancer-predisposing syndrome, Hyperinsulinism-hyperammonemia syndrome, Spondyloepimetaphyseal dysplasia, pakistani type,
  • Neoplasm of the breast PTEN hamartoma tumor syndrome, Malignant melanoma of skin, Squamous cell carcinoma of the head and neck, Small cell lung cancer, Squamous cell lung carcinoma, Renal cell carcinoma, papillary, Neoplasm of the breast, Glioblastoma, Hereditary cancer-predisposing syndrome, Colorectal Neoplasms, Uterine cervical neoplasms, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine
  • Carcinosarcoma PTEN hamartoma tumor syndrome, Cowden syndrome, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Lhermitte-Duclos disease, Neoplasm of the breast, Colorectal Neoplasms, Hereditary cancer-predisposing syndrome, Macrocephaly/autism syndrome, Hereditary cancer-predisposing syndrome, PTEN
  • hamartoma tumor syndrome Cutaneous melanoma, Hereditary cancer-predisposing syndrome, PTEN hamartoma tumor syndrome, Hereditary cancer-predisposing syndrome, Autoimmune lymphoproliferative syndrome, type 1a, Lysosomal acid lipase deficiency, Microcephaly with or without chorioretinopathy, lymphedema, or mental retardation, Hydranencephaly with renal aplasia-dysplasia, Spastic paraplegia, Cutis laxa, autosomal dominant, Primary hyperoxaluria, type III, Spastic tetraparesis, Hermansky-Pudlak syndrome, Dubin-Johnson syndrome, Renal coloboma syndrome, Autosomal dominant progressive external ophthalmoplegia with mitochondrial DNA deletions, Mitochondrial diseases, Autosomal dominant progressive external ophthalmoplegia with mitochondrial DNA deletions, Mitochondrial diseases, Kallmann syndrome,
  • adenocarcinoma Acute myeloid leukemia, Myelodysplastic syndrome, Nevus sebaceous, Nevus sebaceous, somatic, Rasopathy, Neoplasm of the breast, Glioblastoma, Bladder carcinoma, Hepatocellular carcinoma, Pancreatic adenocarcinoma, Squamous cell carcinoma of the skin, Transitional cell carcinoma of the bladder, Carcinoma of esophagus, Colorectal Neoplasms, Uterine cervical neoplasms, Neoplasm of the thyroid gland, Papillary renal cell carcinoma, sporadic, Adenoid cystic carcinoma, Nasopharyngeal Neoplasms,
  • Adenocarcinoma of stomach Ovarian Serous Cystadenocarcinoma, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Early myoclonic encephalopathy, Neutral lipid storage disease with myopathy, Ceroid lipofuscinosis neuronal, Growth restriction, severe, with distinctive facies, Hyperproinsulinemia, Permanent neonatal diabetes mellitus, Hyperproinsulinemia, Segawa syndrome, autosomal recessive, Dystonia, Segawa syndrome, autosomal recessive, Jervell and Lange-Nielsen syndrome, Long QT syndrome, Cardiovascular phenotype, Congenital long QT syndrome, Long QT syndrome, Congenital long QT syndrome, Long QT syndrome, Long QT syndrome 1/2, digenic, Long QT syndrome, Congenital long QT syndrome, Cardiovascular phenotype, Long QT syndrome, Congenital long QT syndrome, Long QT syndrome, Cardiovascular pheno
  • Gnathodiaphyseal dysplasia Limb-girdle muscular dystrophy, type 2L, Gnathodiaphyseal dysplasia, Limb-girdle muscular dystrophy, type 2L, Miyoshi muscular dystrophy, ANO5-Related Disorders, Limb-girdle muscular dystrophy, type 2L, Elevated serum creatine phosphokinase, Myopathy, Distal muscle weakness, Fatty replacement of skeletal muscle, Limb-girdle muscular dystrophy, type 2L, Follicle-stimulating hormone deficiency, isolated, Aniridia, Irido-corneo-trabecular dysgenesis, Foveal hypoplasia with cataract, Irido-corneo-trabecular dysgenesis, Anophthalmia - microphthalmia, Aniridia, Irido-corneo-trabecular dysgenesis, Wilms tumor, Combined cellular and humoral immune defects with granulomas, Severe combined immunodefici
  • Thrombophilia Hereditary factor II deficiency disease, Xeroderma pigmentosum, group E, Left ventricular noncompaction, Hypertrophic cardiomyopathy, Primary familial
  • hypertrophic cardiomyopathy Hypertrophic cardiomyopathy, Cardiovascular phenotype, Primary familial hypertrophic cardiomyopathy, Cardiovascular phenotype, Familial hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Hypertrophic, Primary familial hypertrophic cardiomyopathy, Cardiovascular phenotype, Familial hypertrophic cardiomyopathy, Familial hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Hypertrophic cardiomyopathy, Cardiovascular phenotype, Primary familial hypertrophic cardiomyopathy, Hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Hypertrophic cardiomyopathy, Hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Cardiovascular phenotype, Familial hypertrophic cardiomyopathy,
  • Cardiovascular phenotype Primary familial hypertrophic cardiomyopathy, Familial hypertrophic cardiomyopathy, Hypertrophic cardiomyopathy, Myasthenic syndrome, congenital, associated with acetylcholine receptor deficiency, Pena-Shokeir syndrome type I, Myasthenic syndrome, congenital, associated with acetylcholine receptor deficiency, Congenital myasthenic syndrome, Myopathy, Myasthenic syndrome, congenital, associated with acetylcholine receptor deficiency, Congenital Myasthenic Syndrome, Recessive, Myasthenic syndrome, congenital, associated with acetylcholine receptor deficiency, Hereditary angioedema type 1, Hereditary C1 esterase inhibitor deficiency - dysfunctional factor, Poikiloderma, hereditary fibrosing, with tendon contractures, myopathy, and pulmonary fibrosis, Gracile bone dysplasia, Joubert syndrome, Joubert syndrome,
  • Encephalopathy progressive, with or without lipodystrophy, Familial renal hypouricemia, Platelet-type bleeding disorder, Glycogen storage disease, type V, Hereditary cancer-predisposing syndrome, Multiple endocrine neoplasia, type 1, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Multiple endocrine neoplasia, type 1, Multiple endocrine neoplasia, type 1, Hereditary cancer-predisposing syndrome, Coffin-Siris syndrome, Calfan syndrome, Verloes Bourguignon syndrome, Bardet-Biedl syndrome, Bardet-Biedl syndrome, Spinocerebellar ataxia, autosomal recessive, Pyruvate carboxylase deficiency, Cold-induced sweating syndrome, Crisponi/Cold-induced sweating syndrome, Somatotroph adenoma, Pituitary adenoma predisposition, Mitochondrial complex I de
  • LAMM labyrinthine aplasia microtia and microdontia
  • LAMM labyrinthine aplasia microtia and microdontia
  • LAMM labyrinthine aplasia microtia and microdontia
  • Smith-Lemli-Opitz syndrome Cerebral folate deficiency
  • Opsismodysplasia 3-methylglutaconic aciduria with cataracts, neurologic involvement
  • neutropenia Joubert syndrome
  • Vitreoretinopathy neovascular inflammatory, Usher syndrome, type 1, Usher syndrome, type 1, Usher syndrome, type 1B, Usher syndrome, type 1, MYO7A-Related Disorders
  • polycystic liver disease with or without kidney cysts Tremor, hereditary essential, Mitochondrial complex I deficiency, Mitochondrial diseases, Tyrosinase-negative oculocutaneous albinism,
  • Tyrosinase-negative oculocutaneous albinism Oculocutaneous albinism type 1B, Albinism, ocular, with sensorineural deafness, Skin/hair/eye pigmentation, variation in, Oculocutaneous albinism, Hereditary cancer-predisposing syndrome, Ataxia-telangiectasia-like disorder, Charcot-Marie-Tooth disease, type 4B1, Focal segmental glomerulosclerosis, Coloboma, ocular, with or without hearing impairment, cleft lip/palate, and/or mental retardation, Metaphyseal chondrodysplasia, Spahr type, Short-rib polydactyly syndrome type III, Jeune thoracic dystrophy, Short-rib thoracic dysplasia with or without polydactyly, Short-rib polydactyly syndrome type I, Short-rib polydactyly syndrome type III, Deficiency of acetyl-CoA acetyltransferase,
  • Paragangliomas Hereditary Paraganglioma-Pheochromocytoma Syndromes, Cowden syndrome, Paraganglioma and gastric stromal sarcoma, Pheochromocytoma, Mitochondrial complex II deficiency, Paragangliomas, Hereditary Paraganglioma-Pheochromocytoma Syndromes, Cowden syndrome 3, Apolipoprotein A-IV polymorphism,
  • APOA4*1/APOA4*2 Hyperalphalipoproteinemia, Coronary heart disease, Apolipoprotein A-I (Baltimore), Immunodeficiency, Kabuki syndrome, Wiedemann-Steiner syndrome, Short stature, rhizomelic, with microcephaly, micrognathia, and developmental delay, Glucose-6-phosphate transport defect, Acute intermittent porphyria, Congenital myasthenic syndrome, Noonan syndrome-like disorder with or without juvenile myelomonocytic leukemia, Microphthalmia, isolated, Gaze palsy, familial horizontal, with progressive scoliosis, Megalencephalic leukoencephalopathy with subcortical cysts 2a, Deficiency of isobutyryl-CoA dehydrogenase, Cone dystrophy, Retinal cone dystrophy, Megalencephaly-polymicrogyria-polydactyly-hydrocephalus syndrome, Tumoral
  • cardiomyopathy Primary familial hypertrophic cardiomyopathy, Familial hypertrophic cardiomyopathy, Death in infancy, Ventricular extrasystoles, Cardiovascular phenotype, Noonan syndrome, Noonan syndrome, Rasopathy, Juvenile myelomonocytic leukemia, Noonan syndrome, Leopard syndrome, Rasopathy, Metachondromatosis, Noonan syndrome with multiple lentigines, Noonan syndrome 1, LEOPARD syndrome, Scoliosis, Rasopathy, Abnormal facial shape, cafe-au-lait spot, Specific learning disability, Intellectual disability, mild, Aortic valve disease, Holt-Oram syndrome, Mental retardation and distinctive facial features with or without cardiac defects, Charcot-Marie-Tooth disease, type 2L,
  • Medulloblastoma Wilms tumor, Malignant tumor of prostate, Tracheoesophageal fistula, Pancreatic cancer, Glioma susceptibility, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, BRCA2-Related Disorders, Breast-ovarian cancer, familial, Fanconi anemia, complementation group D1, Fanconi anemia, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Primary pulmonary hypertension, Congenital disorder of glycosylation type 2L, Hyperornithinemia-hyperammonemia-homocitrullinuria syndrome, Retinoblastoma, Retinoblastoma, Neoplasm, Small cell lung cancer, Neoplasm, Retinitis pigmentosa, Retinal dystrophy with or without extraocular anomalies, Retinitis pigmentosa, Retinal dystrophy with extraocular anomalies, Aicardi Goutieres
  • cardiomyopathy Wolff-Parkinson-White pattern
  • Dilated cardiomyopathy 1EE Familial hypertrophic cardiomyopathy
  • Primary familial hypertrophic cardiomyopathy Sudden cardiac death
  • Cardiovascular phenotype Hypertrophic cardiomyopathy
  • Primary familial hypertrophic cardiomyopathy Primary familial hypertrophic cardiomyopathy
  • Cardiovascular phenotype Familial hypertrophic
  • cardiomyopathy Primary familial hypertrophic cardiomyopathy, Cardiovascular phenotype, Familial hypertrophic cardiomyopathy, Familial cardiomyopathy, Hypertrophic cardiomyopathy, Cardiomyopathy, Hypertrophic cardiomyopathy, Dyskeratosis congenita, Dyskeratosis congenita autosomal dominant, Dyskeratosis congenita autosomal dominant, Dyskeratosis congenita, autosomal dominant, Revesz syndrome, Dyskeratosis congenita autosomal dominant, Dyskeratosis congenita, Dyskeratosis Congenita, Dominant, Autosomal recessive congenital ichthyosis, Rett syndrome, congenital variant, Mitochondrial complex I deficiency, Ectodermal dysplasia, anhidrotic, with T-cell immunodeficiency, autosomal dominant, Benign hereditary chorea, Choreoathetosis, hypothyroid
  • Kartagener syndrome L-2-hydroxyglutaric aciduria, Penetrating foot ulcers, Distal sensory impairment, Osteomyelitis leading to amputation due to slow healing fractures, Distal lower limb muscle weakness, Glycogen storage disease, type VI, Dystonia, Dopa-responsive type, Microphthalmia syndromic, Anophthalmia, combined immunodeficiency and megaloblastic anemia, Hereditary cancer-predisposing syndrome, congenital disorder of glycosylation with defective fucosylation, Leber congenital amaurosis, Platelet-type bleeding disorder,
  • Alzheimer disease type 3, Alzheimer disease, type 3, Pick's disease, Alzheimer disease, type 3, Frontotemporal dementia, Pick's disease, Acne inversa, familial, Coenzyme Q10 deficiency, primary, Methylmalonate semialdehyde dehydrogenase deficiency, Niemann-Pick disease type C2, Niemann-Pick disease, type C, Leukoencephalopathy with vanishing white matter, Carcinoma of colon, Endometrial carcinoma, Hereditary nonpolyposis colorectal cancer type 7, Lynch syndrome, MLH3-Related Lynch Syndrome, Nevus comedonicus, Proliferative vasculopathy and hydranencephaly-hydrocephaly syndrome, Cone-rod dystrophy, Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A2, Congenital muscular dystrophy-dystroglycanopathy with mental retardation, type B2, Limb-girdle muscular dystrophy-dystrogly
  • pleuropulmonary blastoma cancer predisposition syndrome Hereditary cancer-predisposing syndrome, Gabriele-De Vries Syndrome, Spinal muscular atrophy, SMA, Spinal muscular atrophy, lower extremity predominant, autosomal dominant, Mental retardation, autosomal dominant, Mental retardation, autosomal dominant, Charcot-Marie-Tooth disease, dominant intermediate E, cerebellar-facial-dental syndrome, Cerebellofaciodental syndrome,
  • cardiomyopathy Familial pulmonary capillary hemangiomatosis, Isovaleric acidemia, type I, Adams-Oliver syndrome, Limb-girdle muscular dystrophy, type 2A, Spherocytosis type 5, Peeling skin syndrome, Peeling skin syndrome, acral type, Microcephaly and
  • chorioretinopathy autosomal recessive, Hypoproteinemia, hypercatabolic, Arginine: glycine amidinotransferase deficiency, Bartter syndrome, type 1, antenatal, Marfan syndrome,
  • Marfan lipodystrophy syndrome Cardiovascular phenotype, Marfan syndrome, Thoracic aortic aneurysm and aortic dissection, Thoracic aortic Aneurysm and dissection (TAAD), Cardiovascular phenotype, Stiff skin syndrome, Marfan syndrome, Thoracic aortic aneurysm and aortic dissection, Thoracic aortic Aneurysm and dissection (TAAD), Marfan
  • Cardiovascular phenotype Seckel syndrome, Aromatase deficiency, Lethal congenital contracture syndrome, Intellectual developmental disorder with cardiac arrhythmia, Primary ciliary dyskinesia, Craniosynostosis, Parkinson disease, age at onset, susceptibility to, Parkinson disease, Parkinson disease, autosomal recessive early-onset, Hyperchlorhidrosis, isolated, Nemaline myopathy, Congenital stationary night blindness, type 1D, Lung adenocarcinoma, Non-small cell lung cancer, Cutaneous melanoma, Cardio-facio-cutaneous syndrome, Cardiofaciocutaneous syndrome, Cardio-facio-cutaneous syndrome, Aortic valve disease, Thoracic aortic aneurysm and aortic dissection, Cardiovascular phenotype, Loeys-Dietz syndrome, Ceroid lipofuscinosis neurode
  • Camptocormia Acrocallosal syndrome, Schinzel type, Spondylocostal dysostosis, Liver cancer, Acute myeloid leukemia, Neoplasm of brain, Hepatocellular carcinoma, Brainstem glioma, Colorectal Neoplasms, Multiple myeloma, Squamous cell carcinoma of the head and neck, Acute myeloid leukemia, Myelodysplastic syndrome, Colorectal Neoplasms, Bloom syndrome, Bloom syndrome, Hereditary cancer-predisposing syndrome, Arthrogryposis renal dysfunction cholestasis syndrome, Epileptic encephalopathy, childhood-onset, Congenital heart defects, multiple types, Weill-Marchesani-like syndrome, Autosomal recessive congenital ichthyosis, Microphthalmia, isolated, Osteosclerotic metaphyseal dysplasia, alpha Thalassemia, Hemoglobin Loire, Erythrocytosis, Hemoglobin Chesa
  • Mucolipidosis III Gamma You-Hoover-Fong syndrome, Renal dysplasia, retinal pigmentary dystrophy, cerebellar ataxia and skeletal dysplasia, Joubert syndrome with Jeune asphyxiating thoracic dystrophy, Renal dysplasia, retinal pigmentary dystrophy, cerebellar ataxia and skeletal dysplasia, Retinitis pigmentosa, Leigh syndrome, Combined oxidative
  • Lymphangiomyomatosis Tuberous sclerosis syndrome, Polycystic kidney disease, adult type, Digitorenocerebral syndrome, Early infantile epileptic encephalopathy, Myoclonic epilepsy, familial infantile, Digitorenocerebral syndrome, Progressive myoclonus epilepsy with ataxia, Familial Mediterranean fever, Rubinstein-Taybi syndrome, Nephronophthisis, Congenital disorder of glycosylation type 1K, Carbohydrate-deficient glycoprotein syndrome type I, Carbohydrate-deficient glycoprotein syndrome type I, Congenital disorder of glycosylation, Epilepsy, focal, with speech disorder and with or without mental retardation, Rolandic epilepsy, Bare lymphocyte syndrome type 2, complementation group A, Charcot-Marie-Tooth disease, type 1C, Fanconi anemia, complementation group Q, Dyskeratosis congenita, Dyskeratosis congenita, autosomal recessive, Lissencephaly
  • Bile acid synthesis defect congenital, Generalized epilepsy with febrile seizures plus, type 9, Warfarin response, warfarin response - Dosage, Warfarin response, Familial renal glucosuria, Glycogen storage disease IXb, Behcet's syndrome, Cylindromatosis, familial, Townes-Brocks syndrome, Joubert syndrome, Hamamy syndrome, Multicentric osteolysis, nodulosis and arthropathy, Bardet-Biedl syndrome, Retinitis pigmentosa, Nephrotic syndrome, type 12, Familial hypokalemia-hypomagnesemia, Spondyloepimetaphyseal dysplasia, Faden-Alkuraya type, Polymicrogyria, bilateral frontoparietal, Lissencephaly, with microcephaly, Retinitis pigmentosa, Poikiloderma with neutropenia, Brachioskeletogenital syndrome, Mitochondrial DNA depletion syndrome, Lamella
  • Hyperlipidemia Short metacarpal, Intellectual disability, severe, Short stature, brachydactyly, intellectual developmental disability, and seizures, Acanthosis nigricans, Skeletal dysplasia, Insulin resistance, Short stature, Self-injurious behavior, Abnormal facial shape,
  • Brachydactyly Renal hypoplasia, Abnormality of the dentition, Hepatic steatosis, Obesity, Lumbar hyperlordosis, Hyperlipidemia, Short metacarpal, Intellectual disability, severe, Hereditary diffuse gastric cancer, Hereditary cancer-predisposing syndrome, Ectropion inferior cleft lip and or palate, Breast cancer, lobular, Hereditary diffuse gastric cancer, Hereditary cancer-predisposing syndrome, Ectropion inferior cleft lip and or palate,
  • Medulloblastoma Malignant melanoma of skin, Multiple myeloma, Squamous cell carcinoma of the head and neck, Small cell lung cancer, Lung adenocarcinoma, Squamous cell lung carcinoma, Acute myeloid leukemia, Neoplasm of brain, Neoplasm of the breast, Glioblastoma, Hepatocellular carcinoma, Hereditary cancer-predisposing syndrome,
  • Pancreatic adenocarcinoma Transitional cell carcinoma of the bladder, Brainstem glioma, Carcinoma of esophagus, Colorectal Neoplasms, Adrenocortical carcinoma, Adenocarcinoma of stomach, Ovarian Serous Cystadenocarcinoma, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Metastatic pancreatic neuroendocrine tumours, Liver cancer, Chronic lymphocytic leukemia, Medulloblastoma, Malignant melanoma of skin, Multiple myeloma, Squamous cell carcinoma of the head and neck, Small cell lung cancer, Lung adenocarcinoma, Squamous cell lung carcinoma, Acute myeloid leukemia, Neoplasm of brain, Neoplasm of the breast, Glioblastoma, Hepatocellular carcinoma, Hereditary cancer-predisposing syndrome, Pancreatic a
  • Transitional cell carcinoma of the bladder Brainstem glioma, Carcinoma of esophagus, Colorectal Neoplasms, Adrenocortical carcinoma, Adenocarcinoma of stomach, Ovarian Serous Cystadenocarcinoma, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Medulloblastoma, Multiple myeloma, Squamous cell carcinoma of the head and neck, Li-Fraumeni syndrome, Lung adenocarcinoma, Renal cell carcinoma, papillary, Neoplasm of the breast, Hereditary cancer-predisposing syndrome, Pancreatic adenocarcinoma, Squamous cell carcinoma of the skin, Transitional cell carcinoma of the bladder, Colorectal Neoplasms, Adenocarcinoma of stomach, Ovarian Serous Cystadenocarcinoma, Malignant neoplasm of body of uterus, Heredit
  • Hepatocellular carcinoma Hereditary cancer-predisposing syndrome, Liver cancer,
  • adenocarcinoma Li-Fraumeni syndrome, Squamous cell lung carcinoma, Neoplasm of brain, Neoplasm of the breast, Glioblastoma, Hepatocellular carcinoma, Hereditary cancer-predisposing syndrome, Pancreatic adenocarcinoma, Transitional cell carcinoma of the bladder, Brainstem glioma, Carcinoma of esophagus, Colorectal Neoplasms,
  • Adenocarcinoma of stomach Ovarian Serous Cystadenocarcinoma, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Liver cancer, Chronic lymphocytic leukemia, Multiple myeloma, Squamous cell carcinoma of the head and neck, Lung adenocarcinoma, Li-Fraumeni syndrome, Neoplasm of brain, Neoplasm of the breast, Glioblastoma,
  • Hepatocellular carcinoma Pancreatic adenocarcinoma, Transitional cell carcinoma of the bladder, Carcinoma of esophagus, Colorectal Neoplasms, Uterine cervical neoplasms, Adenocarcinoma of stomach, Ovarian Serous Cystadenocarcinoma, Malignant neoplasm of body of uterus, Uterine Carcinosarcoma, Li-Fraumeni syndrome, Liver cancer,
  • Hepatocellular carcinoma Hereditary cancer-predisposing syndrome, Liver cancer,
  • Malignant melanoma of skin Multiple myeloma, Squamous cell carcinoma of the head and neck, Lung adenocarcinoma, Breast cancer, somatic, Squamous cell lung carcinoma, Neoplasm of brain, Neoplasm of the breast, Hepatocellular carcinoma, Breast
  • adenocarcinoma Hereditary cancer-predisposing syndrome, Pancreatic adenocarcinoma, Transitional cell carcinoma of the bladder, Carcinoma of esophagus, Colorectal Neoplasms, Adenoid cystic carcinoma, Adenocarcinoma of stomach, Ovarian Serous
  • Cystadenocarcinoma Malignant neoplasm of body of uterus, Uterine Carcinosarcoma, Carcinoma of pancreas, Dyskeratosis congenita, autosomal recessive, Leber congenital amaurosis, Cone-rod dystrophy, Autosomal recessive congenital ichthyosis, Ichthyosis, Autosomal recessive congenital ichthyosis, Spondylocostal dysostosis, Inclusion Body Myopathy, Dominant, Hepatic failure, early-onset, and neurologic disorder due to
  • cytochrome C oxidase deficiency Charcot-Marie-Tooth disease and deafness, Dejerine-Sottas disease, Dejerine-Sottas disease, Dejerine-Sottas syndrome, autosomal dominant, Charcot-Marie-Tooth disease, type 1A, Dejerine-Sottas syndrome, autosomal dominant, Charcot-Marie-Tooth disease, type I, Mitochondrial complex III deficiency, nuclear type 2, Common variable immunodeficiency, Immunoglobulin A deficiency, Common Variable Immune Deficiency, Dominant, Common variable immunodeficiency, Hereditary cancer-predisposing syndrome, Multiple fibrofolliculomas, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Multiple fibrofolliculomas, Hereditary cancer-predisposing syndrome, Smith-Magenis syndrome, Joubert syndrome, Meckel-Gruber syndrome, Sjogren-L
  • ophthalmoplegia Frontotemporal dementia, Progressive supranuclear ophthalmoplegia, Muscular dystrophy, Epilepsy, progressive myoclonic 6, Glanzmann thrombasthenia, Amelogenesis imperfecta, type IV, Tricho-dento-osseous syndrome, Osteogenesis imperfecta type I, Osteogenesis imperfecta type 2, thin-bone, Osteogenesis imperfecta with normal sclerae, dominant form, Osteogenesis imperfecta type I, Osteogenesis imperfecta type IIC, Osteogenesis imperfecta, recessive perinatal lethal, Osteogenesis imperfecta type I,
  • Hyperkalemic Periodic Paralysis Type 1 Hypokalemic periodic paralysis, Hypokalemic periodic paralysis, type 2, Hyperkalemic Periodic Paralysis Type 1, Carcinoma of colon, Oligodontia-colorectal cancer syndrome, Carney complex, type 1, Andersen Tawil syndrome, Familial periodic paralysis, Andersen Tawil syndrome, Andersen Tawil syndrome, Congenital long QT syndrome, Acampomelic campomelic dysplasia, Camptomelic dysplasia, Striatal necrosis, bilateral, and progressive polyneuropathy, Pontocerebellar hypoplasia type 4, Pontocerebellar hypoplasia type 2A, Pontocerebellar hypoplasia type 4, Pontocerebellar hypoplasia type 2A,
  • adrenoleukodystrophy Epidermodysplasia verruciformis, Desbuquois dysplasia, Rolandic epilepsy, Ciliary dyskinesia, Ciliary dyskinesia, primary, Glycogen storage disease, type II, Glycogen storage disease type II, infantile, Glycogen storage disease, type II, Baraitser-Winter Syndrome, Nephrotic syndrome, type 8, Autosomal recessive cutis laxa type 2B, Encephalopathy, progressive, early-onset, with brain atrophy and thin corpus callosum, Arhinia choanal atresia microphthalmia, Oculomelic amyoplasia, Dystonia, Spinocerebellar ataxia, ACTH resistance, Glucocorticoid Deficiency, Renal hypodysplasia/aplasia, Left ventricular noncompaction, Pancreatic agenesis and congenital heart disease, Abnormality of cardiovascular system morphology, Con
  • Spondyloenchondrodysplasia with immune dysregulation Deficiency of alpha-mannosidase, Aicardi Goutieres syndrome, Blood group - Lutheran inhibitor, Glutaric aciduria, type 1, Marshall-Smith syndrome, Epileptic encephalopathy, early infantile, Familial hemiplegic migraine type 1, Episodic ataxia type 2, Epileptic encephalopathy, early infantile, Familial hemiplegic migraine type 1, Autosomal recessive non-syndromic intellectual disability, Lehman syndrome, Cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy, Combined oxidative phosphorylation deficiency, Severe combined immunodeficiency, autosomal recessive, T cell-negative, B cell-positive, NK cell-negative, Thyroid dyshormonogenesis, Cold-induced sweating syndrome, Pseudoachondroplastic spondyloepiphys
  • Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies type A5 Muscle weakness, Headache, Gait imbalance, Difficulty walking, Paresthesia, Difficulty climbing stairs, Scapular winging, Difficulty standing, Muscular dystrophy-dystroglycanopathy, Walker-Warburg congenital muscular dystrophy, Congenital muscular dystrophy-dystroglycanopathy with mental retardation, type B5, Limb-girdle muscular dystrophy-dystroglycanopathy, type C5, Walker-Warburg congenital muscular dystrophy, Walker-Warburg congenital muscular dystrophy, Congenital muscular dystrophy-dystroglycanopathy without mental retardation, type B5, Walker-Warburg congenital muscular dystrophy, Congenital muscular dystrophy-dystroglycanopathy with mental retardation, type B5, Limb-girdle muscular dystrophy-dystroglycanopathy, type C5,
  • Congenital muscular dystrophy-dystroglycanopathy with mental retardation type B5, Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies type A5, Walker-Warburg congenital muscular dystrophy, Hypocalciuric hypercalcemia, familial, type III, Mental retardation, autosomal recessive, Hyperferritinemia cataract syndrome, L-ferritin deficiency, autosomal recessive, Isolated lutropin deficiency, Autistic disorder of childhood onset, Motor delay, Iris coloboma, Autism, Delayed speech and language development, Abnormality of vision, Early infantile epileptic encephalopathy, Ataxia-oculomotor apraxia, Early infantile epileptic encephalopathy, Peripheral neuropathy, myopathy, hoarseness, and hearing loss, Spinocerebellar ataxia, Spinocerebellar ataxia, Retinitis pigmentosa, Nemaline myopathy, Polygluco
  • Cardiomyopathy hypertrophic, midventricular, digenic, Dowling-Degos disease, C-like syndrome, Multiple synostoses syndrome, Symphalangism, proximal, Fibular hypoplasia and complex brachydactyly, schizophrenia, Aicardi Goutieres syndrome, Severe combined immunodeficiency due to ADA deficiency, Partial adenosine deaminase deficiency, Multiple congenital anomalies-hypotonia-seizures syndrome, Primary autosomal recessive
  • microcephaly Galloway-Mowat Syndrome, Arterial tortuosity syndrome, Epileptic encephalopathy, early infantile, Helsmoortel-Van der Aa syndrome, Congenital disorder of glycosylation type 1E, Idiopathic hypercalcemia of infancy, Cushing's syndrome, McCune-Albright syndrome, Polyostotic fibrous dysplasia, somatic, mosaic, Pituitary Tumor, Growth Hormone-Secreting, Somatic, Liver cancer, McCune-Albright syndrome, Malignant melanoma of skin, Squamous cell carcinoma of the head and neck, Lung adenocarcinoma, Neoplasm of the breast, Hepatocellular carcinoma, Pancreatic adenocarcinoma, Neoplasm, Colorectal Neoplasms, Uterine cervical neoplasms, Adrenocortical carcinoma,
  • Congenital cataract Klippel-Feil syndrome, autosomal recessive, with nemaline myopathy and facial dysmorphism
  • Hermansky-Pudlak syndrome Cataract, congenital nuclear, autosomal recessive, Cataract, multiple types, Familial cancer of breast, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Familial cancer of breast, Prostate cancer, somatic, Hereditary cancer-predisposing syndrome, Osteosarcoma,
  • Neurofibromatosis type 2
  • Epilepsy familial focal, with variable foci, Rolandic epilepsy, Parkinson disease, Sorsby fundus dystrophy, Macrothrombocytopenia and granulocyte inclusions with or without nephritis or sensorineural hearing loss, Microcytic anemia, Peripheral demyelinating neuropathy, central dysmyelination, Waardenburg syndrome, and Hirschsprung disease, Waardenburg syndrome type 4C, Parkinson disease, Infantile neuroaxonal dystrophy, Adenylosuccinate lyase deficiency, Nephronophthisis-like nephropathy, Carcinoma of colon, Rubinstein-Taybi syndrome, Carcinoma of colon, Kanzaki disease, Methemoglobinemia type 2, Autosomal recessive syndrome of syndactyly, undescended testes and central nervous system defects, Megalencephalic
  • leukoencephalopathy with subcortical cysts Microcephaly with chorioretinopathy, autosomal recessive, Mitochondrial DNA depletion syndrome (MNGIE type), Muscular dystrophy, congenital, megaconial type, Metachromatic leukodystrophy, juvenile type, Metachromatic leukodystrophy, late infantile, Metachromatic leukodystrophy, Metachromatic
  • leukodystrophy severe, Metachromatic leukodystrophy, Short stature, idiopathic, X-linked, Leri Weill dyschondrosteosis, Chondrodysplasia punctata, X-linked recessive, Kallmann syndrome, Ocular albinism, type I, Opitz-Frias syndrome, Amelogenesis imperfecta, type 1E, Spondyloepiphyseal dysplasia tarda, Oral-facial-digital syndrome, Joubert syndrome, Joubert syndrome, Oral-facial-digital syndrome, Paroxysmal nocturnal hemoglobinuria 1, Multiple congenital anomalies-hypotonia-seizures syndrome, Pettigrew syndrome, Nance-Horan syndrome, Congenital cataract, Early infantile epileptic encephalopathy, Early infantile epileptic encephalopathy, Atypical Rett syndrome, Early infantile epileptic encephalopathy, Angelman syndrome-like, Early infantile epil
  • Phosphoribosylpyrophosphate synthetase superactivity Charcot-Marie-Tooth disease, X-linked recessive, type 5, Alport syndrome, X-linked recessive, Microscopic hematuria, Elevated mean arterial pressure, Chronic kidney disease, Mental retardation, X-linked, Megalocornea, Mental retardation, X-linked, Heterotopia, Lissencephaly, X-linked,
  • Fucosidosis Lissencephaly, X-linked, Subcortical laminar heterotopia, X-linked, Danon disease, Syndromic X-linked mental retardation, Cabezas type, Mental retardation, X-linked, syndromic, Wu type, Lymphoproliferative syndrome, X-linked, Lymphoproliferative syndrome, X-linked, Simpson-Golabi-Behmel syndrome, Borjeson-Forssman-Lehmann syndrome, Lesch-Nyhan syndrome, Lesch-Nyhan syndrome, HPRT Flint, Partial
  • Frontometaphyseal dysplasia Cardiac valvular dysplasia, X-linked, Periventricular nodular heterotopia, Oto-palato-digital syndrome, type II, Oto-palato-digital syndrome, type I, Emery-Dreifuss muscular dystrophy, X-linked, 3-Methylglutaconic aciduria type 2,
  • Galloway-Mowat Syndrome X-Linked, Glucose 6 phosphate dehydrogenase deficiency, G6pd a-, G6PD Canton, G6PD GIFU, G6PD Agrigento, G6PD Taiwan-Hakka, Anemia, nonspherocytic hemolytic, due to G6PD deficiency, G6PD LOMA Linda, Anemia, nonspherocytic hemolytic, due to G6PD deficiency, Glucose phosphate dehydrogenase deficiency, G6pd a-G6PD Gastonia, G6PD Marion, G6PD Minnesota, Anemia,
  • nonspherocytic hemolytic due to G6PD deficiency, Hypohidrotic ectodermal dysplasia with immune deficiency, Dyskeratosis congenita X-linked, Hereditary factor VIII deficiency disease, Parkinsonism, early onset with mental retardation, Mental retardation, X-linked, Leri Weill dyschondrosteosis, XY sex reversal, type 1, Leigh syndrome, Chloramphenicol resistance, nonsyndromic sensorineural, mitochondrial, Leber's optic atrophy, Cytochrome c oxidase i deficiency, Leigh syndrome, Mitochondrial complex I deficiency, Leigh syndrome, Retinitis pigmentosa-deafness syndrome, Cerebellar ataxia, cataract, and diabetes mellitus.
  • the present disclosure provides uses of any one of the fusion proteins described herein and a guide RNA targeting this fusion protein to a target A:T base pair in a nucleic acid molecule in the manufacture of a kit for nucleic acid editing, wherein the nucleic acid editing comprises contacting the nucleic acid molecule with the fusion protein and guide RNA under conditions suitable for the substitution of the adenine (A) of the A:T nucleobase pair with a cytosine (C).
  • the nucleic acid molecule is a double-stranded DNA molecule.
  • the step of contacting induces separation of the double-stranded DNA at a target region.
  • the step of contacting further comprises nicking one strand of the double-stranded DNA, wherein the one strand comprises an unmutated strand that comprises the T of the target A:T nucleobase pair.
  • the step of contacting is performed in vitro. In other embodiments, the step of contacting is performed in vivo. In some embodiments, the step of contacting is performed in a subject (e.g., a human subject or a non-human animal subject). In some embodiments, the step of contacting is performed in a cell, such as a human or non-human animal cell.

Abstract

The instant specification provides for base editors which satisfy a need in the art for installation of targeted transversions of adenine (A) to cytosine (C), or correspondingly, transversions of thymine (T) to guanine (G). The base editor domains include a nucleic acid programmable DNA binding protein and an adenine oxidase. The base editors may be engineered through the use of continuous or non-continuous evolution systems, such as phage-assisted continuous evolution (PACE). In particular, the instant specification provides for adenine-to-cytosine (or thymine-to-guanine) base editor variants that overcome deficiencies in the art for base editors that can install single-base transversion mutations. In some embodiments, methods for targeted nucleic acid editing are provided. In some embodiments, pharmaceutical compositions comprising, and vectors and kits for the generation of, targeted base editors are provided. In some embodiments, cells containing such vectors are provided. In some embodiments, methods of treatment comprising administering the base editors are provided.

Description

A:T TO C:G BASE EDITORS AND USES THEREOF
RELATED APPLICATIONS
[0001] This application claims benefit of U.S. Provisional Application No. 62/814,766, filed on March 6, 2019, the entire disclosure of which is incorporated by reference herein.
BACKGROUND OF THE INVENTION
[0002] Targeted editing of nucleic acid sequences, including the targeted cleavage or targeted introduction of a specific modification into genomic DNA, is a highly promising approach for the study of gene function and also has the potential to provide new therapies for genetic diseases, including those caused by point mutations. Point mutations represent the majority of known human genetic variants associated with disease. Developing robust methods to introduce and correct point mutations is therefore important in understanding and treating diseases with a genetic component.
[0003] Base editing involves the conversion of a specific nucleic acid base into another at a targeted genomic locus. For certain approaches, this can be achieved without requiring double-stranded DNA breaks (DSB). Since many genetic diseases arise from point mutations, this technology has important implications in the study of human health and disease. Engineered base editors are capable of editing many targets with high efficiency, often achieving editing of 30-70% of cells following a single treatment, without selective enrichment of the cell population for editing events.
SUMMARY OF THE INVENTION
[0004] Engineered base editors have been recently developed. Reference is made to Komor, A.C. et al., Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity, Sci Adv 3 (2017); Rees, H.A. et al., Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery, Nat. Commun. 8, 15790 (2017); U.S. Patent Publication No. 2018/0073012, published March 15, 2018; U.S. Patent Publication No. 2017/0121693, published May 4, 2017; International Publication No. WO 2017/070633, published April 27, 2017; U.S. Patent Publication No. 2015/0166980, published June 18, 2015; U.S. Patent No. 9,840,699, issued December 12, 2017; U.S. Patent No. 10,077,453, issued September 18, 2018; and International Application No. PCT/US2019/61685, filed November 15, 2019, the contents of each of which are incorporated herein by reference in their entireties. Base editors (BEs) are typically fusions of a Cas (“CRISPR-associated”) domain and a nucleobase modification domain (e.g., a natural or evolved deaminase, such as a cytidine deaminase, including APOBEC1 (“apolipoprotein B mRNA editing enzyme, catalytic polypeptide 1”), CDA (“cytidine deaminase”), and AID (“activation-induced cytidine deaminase”) domains). In some cases, base editors may also include proteins or domains that alter cellular DNA repair processes to increase the efficiency and/or stability of the resulting single-nucleotide change.
[0005] Two classes of base editors have been generally described to date: cytidine base editors convert target C:G base pairs to T:A base pairs, and adenine base editors convert A:T base pairs to G:C base pairs. Collectively, these two classes of base editors enable the targeted installation of all possible transition mutations (C-to-T, G-to-A, A-to-G, T-to-C, C-to-U, and A-to-U), which together account for about 61% of known human pathogenic single nucleotide polymorphisms (SNPs) in the ClinVar database. See Gaudelli, N.M. et al., Programmable base editing of A:T to G:C in genomic DNA without DNA cleavage. Nature 551, 464-471 (2017). In particular, C-to-T base editors use a cytidine deaminase to convert cytidine to uridine in the single-stranded DNA loop created by the Cas9 (“CRISPR-associated protein 9”) domain. The opposite strand is nicked by Cas9 to stimulate DNA repair mechanisms that use the edited strand as a template, while a fused uracil glycosylase inhibitor slows excision of the edited base. Eventually, DNA repair leads to a C:G to T:A base pair conversion. This class of base editor is described in U.S. Patent Publication No. 2017/0121693, published May 4, 2017, which issued on January 1, 2019 as U.S. Patent No. 10,167,457, which is incorporated by reference in its entirety herein.
[0006] A major limitation of base editing is the inability to generate transversion (purine ↔ pyrimidine) changes, which are needed to correct ~38% of known human pathogenic SNPs. See Komor, A.C. et al., Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage, Nature 533, 420-424 (2016) and Landrum, M.J. et al., ClinVar: public archive of relationships among sequence variation and human phenotype, Nucleic Acids Res. 42, D980-985 (2014), each of which is incorporated by reference. Of this ~38% of known pathogenic SNPs, about 15% arise from C:G to A:T mutations. Many C:G to A:T point mutations introduce premature stop codons (UAA, UAG, UGA), resulting in nonsense mutations in protein coding regions.
[0007] Currently, transversions can only be repaired by nuclease-mediated formation of a double-stranded break (DSB) followed by homology directed repair (HDR), which is typically inefficient, especially in non-mitotic cells, and leads to undesired by-products, such as indels (insertions and deletions) and translocations. See Komor, A. C., Badran, A. H. & Liu, D. R. CRISPR-Based Technologies for the Manipulation of Eukaryotic Genomes, Cell 168, 20-36, (2017), which is incorporated herein by reference. Since nucleobase deamination alone cannot interconvert purines and pyrimidines, the development of transversion base editors requires the development of a new editing strategy, such as the manipulation of endogenous DNA repair pathways or a different nucleobase chemical transformation. The present disclosure describes novel transversion base editors using an innovative adenine oxidation strategy. The present invention greatly expands the capabilities of base editing.
[0008] The present disclosure provides transversion base editors which add to the repertoire of base editors that have already been developed. In particular, the present disclosure provides for adenine-to-cytosine or “ACBE” (or thymine-to-guanine or “TGBE”) transversion base editors which satisfy the need in the art for the installation of targeted single-base transversion nucleobase changes in a target nucleotide sequence, e.g., a genome. In addition, the present disclosure provides for nucleic acid molecules encoding and/or expressing these transversion base editors, as well as expression vectors or constructs for expressing the transversion base editors described herein, host cells comprising said nucleic acid molecules and expression vectors, and compositions for delivering and/or administering nucleic acid-based embodiments described herein. In addition, the disclosure provides for compositions comprising these transversion base editors. Still further, the present disclosure provides for methods of making adenine-to-cytosine transversion base editors, as well as methods of using adenine-to-cytosine transversion base editors or nucleic acid molecules encoding such transversion base editors in applications including editing a nucleic acid molecule, e.g., a genome.
[0009] The present inventors have discovered strategies to develop novel transversion base editors. Specifically, the inventors have developed a novel adenine oxidation strategy to install transversion A-to-C and T-to-G nucleobase changes in a targeted manner. This new strategy allows for the efficient and specific transversion of A-to-C or T-to-G using the inventive base editors described herein.
[0010] In this adenine oxidation strategy, enzyme-catalyzed oxidation of a targeted adenine (A) in a nucleic acid of interest is induced, resulting in 8-oxoadenine (8-oxoA) formation. Steric rotation of the 8-oxoA around the glycosidic bond is induced, presenting the Hoogsteen edge for base pairing (see FIG. 2). Without wishing to be bound by any particular theory, during replication or repair of the unmutated strand, 8-oxoA is read by a polymerase as a cytosine and the cell’s mismatch repair machinery converts the base-paired thymine of the non-edited strand to a guanine to correct the apparent mismatch. The resulting base pairing features two three-center hydrogen bonding systems. Upon the next round of replication, the cell’s mismatch repair machinery converts the 8-oxoA lesion to a cytosine. A desired A-to-C transversion is thus achieved. Adenine oxidation is achieved by the targeted use of a fusion protein comprising a napDNAbp (e.g., a catalytically dead Cas9 (“dCas9”) or Cas9 nickase (“nCas9”)) domain, an adenine oxidase domain, and optionally a linker connecting these two domains (see FIG. 1).
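By way of non-limiting illustration only, the net sequence outcome of the strategy described above (an A:T nucleobase pair replaced by a C:G nucleobase pair) can be sketched in a few lines of Python. The sketch models only the final double-stranded result, not the 8-oxoA intermediate or any repair pathway. The function names `complement` and `a_to_c_edit`, and the convention of writing the second strand position-aligned to the first, are assumptions made for illustration and do not appear in the specification.

```python
# Watson-Crick pairing partners for the four DNA nucleobases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the position-aligned complementary strand (not reversed)."""
    return "".join(COMPLEMENT[base] for base in strand)


def a_to_c_edit(edited_strand: str, pos: int) -> tuple[str, str]:
    """Return both strands after the net A:T -> C:G outcome at index pos.

    Illustrative assumption: the oxidized adenine is ultimately resolved
    to a cytosine, and the paired thymine is resolved to a guanine.
    """
    if edited_strand[pos] != "A":
        raise ValueError("target position must hold an adenine")
    new_strand = edited_strand[:pos] + "C" + edited_strand[pos + 1:]
    return new_strand, complement(new_strand)


if __name__ == "__main__":
    top, bottom = a_to_c_edit("GATC", 1)
    print(top)     # strand that carried the target adenine
    print(bottom)  # paired strand, now carrying a guanine at the site
```

In this toy model the target index plays the role of the position selected by the guide RNA; every other base pair is left unchanged.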
[0011] In one aspect, the base editor fusion protein comprises (i) a nucleic acid programmable DNA binding protein (napDNAbp), and (ii) an adenine oxidase. The nucleic acid programmable DNA binding protein (napDNAbp) may be a Cas9 domain. The napDNAbp may be a CasX, a CasY, a C2c1, a C2c2, a C2c3, a GeoCas9, a CjCas9, a Cas12a (formerly known as Cpf1), a Cas12b, a Cas12g, a Cas12h, a Cas12i, a Cas13b, a Cas13c, a Cas13d, a Cas14, a Csn2, an xCas9, an SpCas9-NG, an LbCas12a, an AsCas12a, a Cas9-KKH, a circularly permuted Cas9, an Argonaute (Ago) domain, a SmacCas9, or a Spy-macCas9. The napDNAbp domain may be a nuclease active Cas9 domain, a nuclease inactive Cas9 (dCas9) domain, or a Cas9 nickase (nCas9) domain.
[0012] In various embodiments of the base editor fusion protein, the adenine oxidase is a wild-type oxidase, or a variant thereof, that oxidizes an adenine in DNA to 8-oxoA.
[0013] In various embodiments, the adenine oxidase comprises any one of the amino acid sequences of SEQ ID NOs: 5-8, 10, 15-20, 22-31, and 35-41. In particular embodiments, the adenine oxidase comprises the amino acid sequence of SEQ ID NO: 24. In certain embodiments, a variant of the wild-type oxidase is produced by evolving an adenine oxidase enzyme using a directed evolution methodology. In certain embodiments, the directed evolution methodology comprises phage assisted continuous evolution (PACE). In other embodiments, the evolution methodology comprises phage assisted non-continuous evolution (PANCE). In still other embodiments, the evolution methodology comprises other non-continuous evolutions, such as antibiotic or other discrete plate-based selections.
[0014] In various embodiments, the fusion protein further comprises an inhibitor of base excision repair (“iBER”). In certain embodiments, the iBER is a thymine-DNA glycosylase (TDG) inhibitor (“TDG inhibitor”), uracil-DNA glycosylase (UDG) inhibitor (“UDG inhibitor”), or an 8-oxo-guanine glycosylase (OGG or OGG1) inhibitor (“OGG inhibitor”). In certain embodiments, the iBER comprises a catalytically inactive TDG that binds 8-oxoA to prevent its excision during subsequent mismatch repair.
[0015] In various embodiments, the fusion proteins described herein may comprise any of the following structures: NH2-[adenine oxidase]-[napDNAbp]-COOH; or NH2-[napDNAbp]-[adenine oxidase]-COOH, wherein each instance of “]-[” comprises an optional linker.
[0016] In various embodiments in which the fusion proteins include an iBER domain, the base editor fusion proteins described herein may comprise any of the following structures: NH2-[iBER]-[adenine oxidase]-[napDNAbp]-COOH; NH2-[adenine oxidase]-[iBER]-[napDNAbp]-COOH; NH2-[adenine oxidase]-[napDNAbp]-[iBER]-COOH; NH2-[iBER]-[adenine oxidase]-[napDNAbp]-COOH; NH2-[adenine oxidase]-[iBER]-[napDNAbp]-COOH; or NH2-[iBER]-[napDNAbp]-[adenine oxidase]-COOH, wherein each instance of “]-[” comprises an optional linker.
[0017] The linker fusing the napDNAbp, oxidase, and optional iBER may be any suitable amino acid linker sequence, polymer, or covalent bond. Exemplary linkers include any of the following amino acid sequences: SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 11); SGGSGGSGGS (SEQ ID NO: 12); GGG; GGGS (SEQ ID NO: 1); SGGGS (SEQ ID NO: 2); SGSETPGTSESATPES (SEQ ID NO: 48); or SGGS (SEQ ID NO: 14).
[0018] In various other embodiments, the disclosure provides nucleic acid molecules or constructs encoding any of the base editor fusion proteins, or domains thereof. The nucleic acid sequences may be codon-optimized for expression in the cells of any organism of interest. In certain embodiments, the nucleic acid sequence is codon-optimized for expression in human cells.
[0019] In other embodiments, the disclosure provides polynucleotides and/or vectors encoding any of the base editor fusion proteins described herein, or domains thereof. These nucleic acid sequences are typically engineered or modified experimentally. For instance, these nucleic acid sequences may be codon-optimized for expression in an organism of interest, e.g., mammalian cells. In certain embodiments, the nucleic acid sequences are codon-optimized for expression in human cells. In other embodiments, cells containing such polynucleotides or constructs are provided. In other embodiments, complexes comprising any of the fusion proteins described herein and a guide RNA bound to the napDNAbp domain of the fusion protein are provided.
[0020] In other embodiments, the disclosure provides a pharmaceutical composition comprising any of the fusion proteins described herein and a pharmaceutically acceptable excipient. In certain embodiments, the pharmaceutical composition further comprises a gRNA. In other embodiments, the disclosure provides a kit comprising a nucleic acid construct that includes (i) a nucleic acid sequence encoding any of the fusion proteins described herein; (ii) a heterologous promoter that drives expression of the sequence of (i); and optionally an expression construct encoding a guide RNA backbone and the target sequence.
[0021] In some embodiments, methods for targeted nucleic acid editing are provided. The methods described herein typically comprise i) contacting a nucleic acid sequence with a complex comprising any of the fusion proteins described herein and a guide nucleic acid, wherein the double-stranded DNA comprises a target A:T (or T:A) nucleobase pair, and ii) editing the thymine (or adenine) of the A:T (or T:A) nucleobase pair. The methods may further comprise iii) cutting or nicking the non-edited strand of the double-stranded DNA.
[0022] In some embodiments, methods of treatment using the inventive base editors are provided. The methods described herein may comprise treating a subject having or at risk of developing a disease, disorder, or condition, comprising administering to the subject a fusion protein as described herein, a polynucleotide as described herein, a vector as described herein, or a pharmaceutical composition as described herein.
[0023] It should be appreciated that the foregoing concepts, and additional concepts discussed below, may be arranged in any suitable combination, as the present disclosure is not limited in this respect. Further, other advantages and novel features of the present disclosure will become apparent from the following detailed description of various non-limiting embodiments when considered in conjunction with the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] The following drawings form part of the present specification and are included to further demonstrate certain embodiments of the present disclosure, which can be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
[0025] FIG. 1 is a schematic illustration showing an exemplary fusion protein of the invention. A fusion protein comprising an nCas9 domain linked to an adenine oxidase enzyme is targeted to the correct adenine nucleobase through the hybridization of a single guide RNA (“sgRNA”) to a complementary sequence of nucleic acid. The adenine oxidase oxidizes the adenine to an 8-oxoadenine, and subsequently, the cell’s native replication/repair machinery recognizes the mutated base and effects the desired change to a cytosine nucleobase. Abbreviations: 8oA, 8-oxoadenine; iBER, inhibitor of base excision repair; sgRNA, single-guide RNA; PAM, protospacer adjacent motif.
[0026] FIG. 2 depicts the chemical conversion of adenine to 8-oxoadenine, which disrupts existing hydrogen bonding with the thymine of the unmutated strand. Steric rotation of the 8-oxoA around the glycosidic bond is induced, presenting the Hoogsteen edge for base pairing. Without wishing to be bound by any particular theory, during replication or repair of the unmutated strand, 8-oxoA is read by a polymerase as a cytosine, and the cell’s mismatch repair machinery converts the base-paired thymine of the non-edited strand to a guanine to correct the apparent mismatch. The resulting base pairing features two three-center hydrogen bonding systems. Upon the next round of replication, the cell’s mismatch repair machinery converts the 8-oxoA to a cytosine, thereby completing the desired A:T to C:G mutation.
[0027] FIG. 3 depicts a possible chemical mechanism for the α-ketoglutarate-dependent iron oxidase-mediated conversion of adenine to 8-oxoadenine. An oxo group is transferred from a non-heme Fe(IV) center to the 8 position of adenine. Formation of a 7,8-oxaziridine intermediate is induced, which rearranges spontaneously to the desired 8-oxoadenine.
[0028] FIG. 4 depicts an exemplary assay for selection of evolved variants of human AlkBH3 α-ketoglutarate-dependent iron oxidase that are highly effective at oxidizing adenine. Plasmids containing mutagenized AlkBH3-dCas9 fusion proteins and targeting guide RNAs (sgRNAs), and selection plasmids containing an inactivated spectinomycin resistance gene with a mutation at the active site that requires A:T to C:G editing to correct, are transformed into E. coli cells, which are plated onto agar media containing spectinomycin and sucrose. Cells harboring plasmids with AlkBH3 mutants that restore antibiotic resistance are isolated and subjected to further rounds of mutation and selection under varying selection stringencies. AlkBH3 variants emerging from each round of selection are then expressed within a fusion construct comprising a Cas9 nickase (nCas9). The resulting fusion proteins are tested for base editing activity in mammalian cells.
[0029] FIG. 5 depicts the operation of an inhibitor of base excision repair (iBER) domain in exemplary base editor fusion proteins disclosed herein. In mammalian cells, competitive base excision repair may interfere with 8-oxoadenine-mediated base editing. Accordingly, an iBER is fused to a fusion protein comprising an nCas9 domain and an adenine oxidase domain. The iBER domain competes for binding of the 8-oxoadenine lesion with active, endogenous excision repair enzymes, preventing or slowing base excision repair. Abbreviations: oA, oxoadenine; TDG, thymine-DNA glycosylase. Pol δ, RCA and RCNF are types of mammalian DNA polymerases.
DEFINITIONS
[0030] As used herein and in the claims, the singular forms “a,” “an,” and “the” include the singular and the plural reference unless the context clearly indicates otherwise. Thus, for example, a reference to “an agent” includes a single agent and a plurality of such agents.
[0031] The term “accessory plasmid,” as used herein, refers to a plasmid comprising a gene required for the generation of infectious viral particles under the control of a conditional promoter. In the context of continuous evolution of genes, transcription from the conditional promoter of the accessory plasmid is typically activated, directly or indirectly, by a function of the gene to be evolved. Accordingly, the accessory plasmid serves the function of conveying a competitive advantage to those viral vectors in a given population of viral vectors that carry a version of the gene to be evolved able to activate the conditional promoter or able to activate the conditional promoter more strongly than other versions of the gene to be evolved. In some embodiments, only viral vectors carrying an “activating” version of the gene to be evolved will be able to induce expression of the gene required to generate infectious viral particles in the host cell, and, thus, allow for packaging and propagation of the viral genome in the flow of host cells. Vectors carrying non-activating versions of the gene to be evolved, on the other hand, will not induce expression of the gene required to generate infectious viral vectors, and, thus, will not be packaged into viral particles that can infect fresh host cells. Exemplary accessory plasmids have been described, for example in U.S. Application No. 15/567,312, published as U.S. Pub. No. 2018/0087046, filed on April 15, 2016, the entire contents of which are incorporated by reference herein.
[0032] “Base editing” is a genome editing technology that involves the conversion of a specific nucleic acid base into another at a targeted genomic locus. In certain embodiments, this can be achieved without requiring double-stranded DNA breaks (DSB). To date, other genome editing techniques, including CRISPR-based systems, begin with the introduction of a DSB at a locus of interest. Subsequently, cellular DNA repair enzymes mend the break, commonly resulting in random insertions or deletions (indels) of bases at the site of the DSB. However, when the introduction or correction of a point mutation at a target locus is desired rather than stochastic disruption of the entire gene, these genome editing techniques are unsuitable, as correction rates are low (e.g., typically 0.1% to 5%), with the major genome editing products being indels. In order to increase the efficiency of gene correction without simultaneously introducing random indels, the present inventors previously modified the CRISPR/Cas9 system to directly convert one DNA base into another without DSB formation. See Komor, A.C., et al., Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016), the entire contents of which are incorporated by reference herein.
[0033] In principle, there are 12 possible base-to-base changes that may occur via individual or sequential use of transition (i.e., a purine-to-purine change or pyrimidine-to-pyrimidine change) or transversion (i.e., a purine-to-pyrimidine or pyrimidine-to-purine) editors. These include:
• Transition base editors:
o C-to-T base editor (or “CTBE”). This type of editor converts a C:G Watson-Crick nucleobase pair to a T:A Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a G-to-A base editor (or “GABE”).
o A-to-G base editor (or “AGBE”). This type of editor converts an A:T Watson-Crick nucleobase pair to a G:C Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a T-to-C base editor (or “TCBE”).
• Transversion base editors:
o C-to-G base editor (or “CGBE”). This type of editor converts a C:G Watson-Crick nucleobase pair to a G:C Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a G-to-C base editor (or “GCBE”).
o G-to-T base editor (or “GTBE”). This type of editor converts a G:C Watson-Crick nucleobase pair to a T:A Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a C-to-A base editor (or “CABE”).
o A-to-T base editor (or “ATBE”). This type of editor converts an A:T Watson-Crick nucleobase pair to a T:A Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a T-to-A base editor (or “TABE”).
o A-to-C base editor (or “ACBE”). This type of editor converts an A:T Watson-Crick nucleobase pair to a C:G Watson-Crick nucleobase pair. Because the corresponding Watson-Crick paired bases are also interchanged as a result of the conversion, this category of base editor may also be referred to as a T-to-G base editor (or “TGBE”).
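By way of non-limiting illustration only, the bookkeeping behind the twelve base-to-base changes listed above can be sketched in Python: each ordered pair of distinct nucleobases is classified as a transition or a transversion, and each change is paired with the change that necessarily occurs on the complementary strand. The function names `classify` and `editor_names` are assumptions made for illustration, and the abbreviations are generated mechanically by the pattern used above (e.g., “CTBE” and “GABE”).

```python
from itertools import permutations

PURINES = {"A", "G"}  # C and T are the pyrimidines
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(src: str, dst: str) -> str:
    """A change within the same chemical class is a transition; otherwise a transversion."""
    same_class = (src in PURINES) == (dst in PURINES)
    return "transition" if same_class else "transversion"


def editor_names(src: str, dst: str) -> tuple[str, str]:
    """Return the editor abbreviation and its complementary-strand alias."""
    name = f"{src}{dst}BE"
    alias = f"{COMPLEMENT[src]}{COMPLEMENT[dst]}BE"
    return name, alias


# All ordered pairs of distinct bases: 4 * 3 = 12 possible changes.
changes = list(permutations("ACGT", 2))

if __name__ == "__main__":
    for src, dst in changes:
        name, alias = editor_names(src, dst)
        print(f"{src}-to-{dst}: {classify(src, dst)} ({name}, also {alias})")
```

Because every change has a complementary-strand alias, the twelve changes collapse into six editor categories: two transition categories and four transversion categories.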
[0034] The term “base editors (BEs)”, as used herein, refers to the Cas-fusion proteins described herein. In some embodiments, the fusion protein comprises a nuclease-inactive Cas9 (dCas9) fused to an adenine oxidase which binds a nucleic acid in a guide RNA-programmed manner via the formation of an R-loop but does not cleave the nucleic acid. For example, the dCas9 domain of the fusion protein may include a D10A and a H840A mutation (which renders Cas9 capable of cleaving only one strand of a nucleic acid duplex) as described in PCT/US2016/058344 (filed on October 22, 2016 and published as WO 2017/070632 on April 27, 2017), which is incorporated herein by reference in its entirety. The DNA cleavage domain of S. pyogenes Cas9 includes two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA (the “targeted strand,” or the strand at which editing or oxidation occurs), whereas the RuvC1 subdomain cleaves the non-complementary strand containing the PAM sequence (the “non-targeted strand”, or the strand at which editing or oxidation does not occur). The RuvC1 mutant D10A generates a nick on the targeted strand, while the HNH mutant H840A generates a nick on the non-targeted strand (see Jinek et al., Science. 337:816-821 (2012); Qi et al., Cell. 28; 152(5): 1173-83 (2013)).
[0035] In some embodiments, the fusion protein comprises a Cas9 nickase fused to an adenine oxidase, e.g., an adenine oxidase which converts an adenine nucleobase to 8-oxoadenine. The term “base editors” encompasses the base editors described herein as well as any base editor known or described in the art at the time of this filing or developed in the future. Reference is made to Rees & Liu, Base editing: precision chemistry on the genome and transcriptome of living cells, Nat Rev Genet. 2018;19(12):770-788; as well as U.S. Patent Publication No. 2018/0073012, published March 15, 2018, which issued as U.S. Patent No. 10,113,163 on October 30, 2018; U.S. Patent Publication No. 2017/0121693, published May 4, 2017, which issued as U.S. Patent No. 10,167,457 on January 1, 2019; International Publication No. WO 2017/070633, published April 27, 2017; International Publication No. WO 2018/027078, published August 2, 2018; International Application No. PCT/US2018/056146, filed October 16, 2018, which published as Publication No. WO 2019/079347 on April 25, 2019; International Application No. PCT/US2019/033848, filed May 23, 2019, which published as Publication No. WO 2019/226593 on November 28, 2019; U.S. Patent Publication No. 2015/0166980, published June 18, 2015; U.S. Patent No. 9,840,699, issued December 12, 2017; U.S. Patent No. 10,077,453, issued September 18, 2018; International Publication No. WO 2019/023680, published January 31, 2019; International Publication No. WO 2018/0176009, published September 27, 2018; International Application No. PCT/US2019/47996, filed August 23, 2019; International Application No. PCT/US2019/049793, filed September 5, 2019; U.S. Provisional Application No. 62/835,490, filed April 17, 2019; International Application No. PCT/US2019/61685, filed November 15, 2019; International Application No. PCT/US2019/57956, filed October 24, 2019, the contents of each of which are incorporated herein by reference in their entireties.
[0036] The term “Cas9” or “Cas9 nuclease” or “Cas9 domain” refers to a CRISPR associated protein 9, or variant thereof, and embraces any naturally occurring Cas9 from any organism, any Cas9 homolog, ortholog, or paralog from any organism, and any variant of a Cas9, naturally-occurring or engineered. More broadly, a Cas9 protein or domain is a type of “nucleic acid programmable DNA binding protein (napDNAbp)”. The term Cas9 is not meant to be limiting and may be referred to as a “Cas9 or variant thereof.” Exemplary Cas9 proteins are described herein and also described in the art. The present disclosure is unlimited with regard to the particular Cas9 that is employed in the base editors of the invention.
[0037] In some embodiments, proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.” A Cas9 variant shares homology to Cas9, or a fragment thereof. Cas9 variants include functional fragments of Cas9. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to wild type Cas9. In some embodiments, the Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more amino acid changes compared to a wild type Cas9. In some embodiments, the Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9.
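By way of non-limiting illustration only, the two quantities used above to characterize a Cas9 variant (percent identity to a reference sequence, and fragment length as a percentage of the reference length) can be sketched in Python. The sketch assumes the two sequences have already been aligned to equal length without gaps; in practice a sequence alignment tool would be used to produce the alignment first. The function names and the short toy sequences are assumptions made for illustration and are not sequences from the specification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of positions with identical residues in two pre-aligned,
    equal-length, ungapped sequences (a simplifying assumption)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def length_percentage(fragment: str, reference: str) -> float:
    """Fragment length as a percentage of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * len(fragment) / len(reference)


if __name__ == "__main__":
    reference = "ACDEFGHIKL"  # toy 10-residue reference, illustrative only
    variant = "ACDEFGHIKV"    # one substitution at the last position
    print(percent_identity(reference, variant))      # 9 of 10 positions match
    print(length_percentage(reference[:5], reference))  # 5 of 10 residues
```

A variant with a single substitution in a ten-residue toy reference is thus 90% identical, and a five-residue fragment is 50% of the reference length; the thresholds recited above would be applied to values computed in this manner over the full-length sequences.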
[0038] As used herein, the term “dCas9” refers to a nuclease-inactive Cas9 or nuclease-dead Cas9, or a functional fragment or variant thereof, and embraces any naturally occurring dCas9 from any organism, any naturally-occurring dCas9 equivalent or functional fragment thereof, any dCas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of a dCas9, naturally-occurring or engineered. The term dCas9 is not meant to be particularly limiting and may be referred to as a “dCas9 or equivalent.” Exemplary dCas9 proteins and methods for making dCas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference.
[0039] As used herein, the term “nCas9” or “Cas9 nickase” refers to a Cas9 or a functional fragment or variant thereof, which cleaves or nicks only one of the strands of a target cut site, thereby introducing a nick in a double strand DNA molecule rather than creating a double strand break. This can be achieved by introducing appropriate mutations in a wild-type Cas9 which inactivate one of the two endonuclease activities of the Cas9. Any suitable mutation which inactivates one Cas9 endonuclease activity but leaves the other intact is contemplated; for example, one of the D10A or H840A mutations in the wild-type Cas9 amino acid sequence (e.g., SEQ ID NO: 9) may be used to form the nCas9.
[0040] The term “continuous evolution,” as used herein, refers to an evolution procedure (e.g., PACE) in which a population of nucleic acids is subjected to multiple rounds of (a) replication, (b) mutation, and (c) selection to produce a desired evolved product, for example, a nucleic acid encoding a protein with a desired activity, wherein the multiple rounds can be performed without investigator interaction and wherein the processes under (a)-(c) can be carried out simultaneously. Typically, the evolution procedure is carried out in vitro, for example, using cells in culture as host cells. In general, a continuous evolution process provided herein relies on a system in which a gene of interest is provided in a nucleic acid vector that undergoes a life-cycle including replication in a host cell and transfer to another host cell, wherein a critical component of the life-cycle is deactivated and reactivation of the component is dependent upon a desired mutation in the gene of interest. Reference is made to U.S. Patent Publication No. 2013/0345064, which published on December 26, 2013 and issued as U.S. Patent No. 9,394,537 on July 19, 2016; U.S. Patent Publication No. 2016/0348096, which published on December 1, 2016 and issued as U.S. Patent No. 10,179,911 on January 15, 2019; U.S. Patent Publication No. 2017/0233708, which published August 17, 2017; and U.S. Patent Publication No. 2017/0044520, which published on February 16, 2017, the contents of each of which are incorporated herein by reference in their entireties.
[0041] In some embodiments, the nucleic acid vector comprising the gene of interest is a phage, a viral vector, or naked DNA (e.g., a mobilization plasmid). In some embodiments, transfer of the gene of interest from cell to cell is via infection, transfection, transduction, conjugation, or uptake of naked DNA, and efficiency of cell-to-cell transfer (e.g., transfer rate) is dependent on the activity of a product encoded by the gene of interest. For example, in some embodiments, the nucleic acid vector is a phage harboring the gene of interest and the efficiency of phage transfer (via infection) is dependent on an activity of the gene of interest in that a protein required for the generation of phage particles (e.g., pIII for M13 phage) is expressed in the host cells only in the presence of the desired activity of the gene of interest. In another example, the nucleic acid vector is a retroviral vector, for example, a lentiviral or vesicular stomatitis virus vector harboring the gene of interest, and the efficiency of viral transfer from cell to cell is dependent on an activity of the gene of interest in that a protein required for the generation of viral particles (e.g., an envelope protein, such as VSV-g) is expressed in the host cells only in the presence of the desired activity of the gene of interest. In another example, the nucleic acid vector is a DNA vector, for example, in the form of a mobilizable plasmid DNA, comprising the gene of interest, that is transferred between bacterial host cells via conjugation, and the efficiency of conjugation-mediated transfer from cell to cell is dependent on the activity of the gene of interest in that a protein required for conjugation-mediated transfer (e.g., traA or traQ) is expressed in the host cells only in the presence of the desired activity of the gene of interest. In such embodiments, the host cells contain an F plasmid lacking one or both of those genes.
[0042] For example, some embodiments provide a continuous evolution system, in which a population of viral vectors comprising a gene of interest to be evolved replicates in a flow of host cells, e.g., a flow through a lagoon, wherein the viral vectors are deficient in a gene encoding a protein that is essential for the generation of infectious viral particles, and wherein that gene is comprised in the host cell under the control of a conditional promoter that can be activated by a gene product encoded by the gene of interest, or a mutated version thereof. In some embodiments, the activity of the conditional promoter depends on a desired function of a gene product encoded by the gene of interest. Viral vectors in which the gene of interest has not acquired a mutation conferring the desired function will not activate the conditional promoter, or will achieve only minimal activation, while any mutation in the gene of interest that confers the desired function will result in activation of the conditional promoter. Since the conditional promoter controls an essential protein for the viral life cycle, activation of this promoter directly corresponds to an advantage in viral spread and replication for those vectors that have acquired an advantageous mutation.
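The selective advantage described above can be pictured with a toy calculation. The following is a minimal sketch, assuming a deterministic model in which each vector variant is amplified by a fixed factor per generation and a fixed fraction of the lagoon is then washed out by the flow of host cells. The function name, the variant labels, and all numerical values are hypothetical and chosen only for illustration; the sketch is not a description of how any evolution experiment herein was performed or modeled.

```python
def simulate_lagoon(initial_counts, progeny_per_generation,
                    dilution_fraction, generations):
    """Toy model: amplify each variant, then remove a fraction by dilution.

    initial_counts: dict mapping variant name -> starting number of vectors
    progeny_per_generation: dict mapping variant name -> amplification factor
        (assumed to rise with activation of the conditional promoter)
    dilution_fraction: fraction of the lagoon washed out per generation
    generations: number of replication/dilution cycles to simulate
    """
    if not 0.0 <= dilution_fraction < 1.0:
        raise ValueError("dilution_fraction must be in [0, 1)")
    counts = {name: float(n) for name, n in initial_counts.items()}
    for _ in range(generations):
        for name in counts:
            # Replication followed by washout for this generation.
            counts[name] *= progeny_per_generation[name] * (1.0 - dilution_fraction)
    return counts


if __name__ == "__main__":
    result = simulate_lagoon(
        initial_counts={"inactive": 1000, "active": 1},       # hypothetical
        progeny_per_generation={"inactive": 1.0, "active": 4.0},  # hypothetical
        dilution_fraction=0.5,
        generations=10,
    )
    print(result)
```

Under these assumed numbers, a variant that cannot activate the conditional promoter is diluted faster than it replicates and washes out, whereas a single vector that has acquired the desired function outgrows the dilution.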
[0043] “CRISPR” is a family of DNA sequences (i.e., CRISPR clusters) in bacteria and archaea that represent snippets of prior infections by viruses that have invaded the prokaryote. The snippets of DNA are used by the prokaryotic cell to detect and destroy DNA from subsequent attacks by similar viruses and effectively constitute, along with an array of CRISPR-associated proteins (including Cas9 and homologs thereof) and CRISPR-associated RNA, a prokaryotic immune defense system. In nature, CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular nucleic acid target complementary to the RNA. Specifically, the target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3'-5' exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate embodiments of both the crRNA and tracrRNA into a single RNA species, the guide RNA. See, e.g., Jinek M., et al., Science 337:816-821 (2012), the entire contents of which are herein incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. CRISPR biology, as well as Cas9 nuclease sequences and structures, are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., et al., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., et al., Nature 471:602-607 (2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., et al., Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes, S. thermophiles, C. ulcerans, S. diphtheria, S. syrphidicola, P. intermedia, S. taiwanense, S. iniae, B. baltica, P. torquis, S. thermophilus, L. innocua, C. jejuni, and N. meningitidis. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.
[0044] The term “effective amount,” as used herein, refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a base editor may refer to the amount of the base editor that is sufficient to edit a target site nucleotide sequence, e.g., a genome. In some embodiments, an effective amount of a base editor provided herein, e.g., of a fusion protein comprising a nuclease-inactive Cas9 domain and a nucleobase modification domain (e.g., an adenine oxidase domain), may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein. In some embodiments, an effective amount of a base editor provided herein may refer to the amount of the fusion protein sufficient to induce editing having the following characteristics: > 50% product purity, < 5% indels, and an editing window of 2-8 nucleotides. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a fusion protein, a nuclease, an adenine oxidase, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors such as, for example, the desired biological response, e.g., the specific allele, genome, or target site to be edited, the target cell or tissue (i.e., the cell or tissue to be edited), and the agent being used.
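For illustration, the three editing characteristics recited above can be expressed as a simple check on sequencing read counts. The following is a minimal sketch under stated assumptions: product purity is taken to be the fraction of base-edited reads that carry the desired product, the indel percentage is taken over all reads, and the editing window is represented by its width in nucleotides. These definitions, the class and function names, and the example counts are all hypothetical and are not drawn from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class EditingOutcome:
    total_reads: int          # all sequencing reads at the target site
    desired_edit_reads: int   # reads carrying the intended base change
    other_edit_reads: int     # reads carrying a different base change
    indel_reads: int          # reads carrying an insertion or deletion
    window_width_nt: int      # assumed width of the editing window


def product_purity(outcome: EditingOutcome) -> float:
    """Fraction of base-edited reads that carry the desired product (assumed definition)."""
    edited = outcome.desired_edit_reads + outcome.other_edit_reads
    return outcome.desired_edit_reads / edited if edited else 0.0


def indel_fraction(outcome: EditingOutcome) -> float:
    """Fraction of all reads that carry an indel."""
    if outcome.total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return outcome.indel_reads / outcome.total_reads


def meets_characteristics(outcome: EditingOutcome) -> bool:
    """> 50% product purity, < 5% indels, and a window of 2-8 nucleotides."""
    return (
        product_purity(outcome) > 0.50
        and indel_fraction(outcome) < 0.05
        and 2 <= outcome.window_width_nt <= 8
    )


if __name__ == "__main__":
    example = EditingOutcome(10000, 3000, 1000, 200, 5)  # hypothetical counts
    print(product_purity(example), indel_fraction(example),
          meets_characteristics(example))
```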
[0045] The term “evolved base editor” or “evolved base editor variant” refers to a base editor formed as a result of mutagenizing a reference or starting-point base editor. The term refers to embodiments in which the nucleobase modification domain is evolved or a separate domain is evolved. Mutagenizing a reference or starting-point base editor may comprise mutagenizing an adenine oxidase by a continuous evolution method (e.g., PACE), wherein the evolved adenine oxidase has one or more amino acid variations introduced into its amino acid sequence relative to the amino acid sequence of the starting adenine oxidase. Amino acid sequence variations may include one or more mutated residues within the amino acid sequence of a reference base editor, e.g., as a result of a change in the nucleotide sequence encoding the base editor that results in a change in the codon at any particular position in the coding sequence, the deletion of one or more amino acids (e.g., a truncated protein), the insertion of one or more amino acids, or any combination of the foregoing. The evolved base editor may include variants in one or more components or domains of the base editor (e.g., variants introduced into an adenine oxidase domain, an iBER domain, or a variant introduced into combinations of these domains).
[0046] The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion of the fusion protein, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
[0047] The term “host cell,” as used herein, refers to a cell that can host, replicate, and transfer a phage vector useful for a continuous evolution process as provided herein. In embodiments where the vector is a viral vector, a suitable host cell is a cell that can be infected by the viral vector, can replicate it, and can package it into viral particles that can infect fresh host cells. A cell can host a viral vector if it supports expression of genes of the viral vector, replication of the viral genome, and/or the generation of viral particles. One criterion to determine whether a cell is a suitable host cell for a given viral vector is to determine whether the cell can support the viral life cycle of a wild-type viral genome that the viral vector is derived from. For example, if the viral vector is a modified M13 phage genome, as provided in some embodiments described herein, then a suitable host cell would be any cell that can support the wild-type M13 phage life cycle. Suitable host cells for viral vectors useful in continuous evolution processes are well known to those of skill in the art, and the disclosure is not limited in this respect. In some embodiments, the viral vector is a phage and the host cell is a bacterial cell. In some embodiments, the host cell is an E. coli cell. Suitable E. coli host strains will be apparent to those of skill in the art, and include, but are not limited to, New England Biolabs (NEB) Turbo, Top10F’, DH12S, ER2738, ER2267, and XL1-Blue MRF’. These strain names are art recognized and the genotype of these strains has been well characterized. It should be understood that the above strains are exemplary only and that the invention is not limited in this respect. The term “fresh,” as used herein interchangeably with the terms “non-infected” or “uninfected” in the context of host cells, refers to a host cell that has not been infected by a viral vector comprising a gene of interest as used in a continuous evolution process provided herein. A fresh host cell can, however, have been infected by a viral vector unrelated to the vector to be evolved or by a vector of the same or a similar type but not carrying the gene of interest.
[0048] In some embodiments, the host cell is a prokaryotic cell, for example, a bacterial cell. In some embodiments, the host cell is an E. coli cell. In some embodiments, the host cell is a eukaryotic cell, for example, a yeast cell, an insect cell, or a mammalian cell. The type of host cell will, of course, depend on the viral vector employed, and suitable host cell/viral vector combinations will be readily apparent to those of skill in the art.
[0049] In some PACE embodiments, for example, in embodiments employing an M13 selection phage, the host cells are E. coli cells expressing the Fertility factor, also commonly referred to as the F factor, sex factor, or F-plasmid. The F-factor is a bacterial DNA sequence that allows a bacterium to produce a sex pilus necessary for conjugation and is essential for the infection of E. coli cells with certain phage, for example, with M13 phage. For example, in some embodiments, the host cells for M13-PACE are of the genotype F' proA+B+ Δ(lacIZY) zzf::Tn10(TetR)/ endA1 recA1 galE15 galK16 nupG rpsF ΔlacIZYA araD139 Δ(ara,leu)7697 mcrA Δ(mrr-hsdRMS-mcrBC) proBA::pir116 λ-.
[0050] The term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or domains, e.g., nCas9 and an adenine oxidase. In some embodiments, a linker joins a dCas9 and a modification domain (e.g., an adenine oxidase). Typically, the linker is positioned between, or flanked by, two groups, molecules, or other domains and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical domain. Chemical domains include, but are not limited to, disulfide, hydrazone, thiol and azo domains. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
[0051] The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue; a deletion or insertion of one or more residues within a sequence; or a substitution of a residue within a sequence of a genome in a subject to be corrected. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions, and these categories are not meant to be limiting in any way. Mutations can include “loss-of-function” mutations, which are the normal result of a mutation that reduces or abolishes a protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein whose presence compensates for the effect of the mutation. There are some exceptions where a loss-of-function mutation is dominant, one example being haploinsufficiency, where the organism is unable to tolerate the approximately 50% reduction in protein activity suffered by the heterozygote. This is the explanation for a few genetic diseases in humans, including Marfan syndrome, which results from a mutation in the gene for the connective tissue protein called fibrillin. Mutations also embrace “gain-of-function” mutations, each of which confers an abnormal activity on a protein or cell that is otherwise not present in a normal condition. Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, these tissues gaining functions that they normally lack. Alternatively, the mutation could lead to overexpression of one or more genes involved in control of the cell cycle, thus leading to uncontrolled cell division and hence to cancer. Because of their nature, gain-of-function mutations are usually dominant.
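The substitution notation described above (original residue, then position, then newly substituted residue, as in D10A or H840A) lends itself to a mechanical check. The following is a minimal sketch, assuming one-letter amino acid codes and 1-based positions. The function names and the ten-residue toy sequence are hypothetical and do not correspond to any SEQ ID NO herein.

```python
import re

# One-letter original residue, 1-based position, one-letter new residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str):
    """Split a label such as 'D10A' into (original, position, new)."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitution(sequence: str, label: str) -> str:
    """Return the sequence with the labeled substitution applied.

    Raises ValueError if the position is out of range or if the residue
    found at that position does not match the label's original residue.
    """
    original, position, new = parse_substitution(label)
    index = position - 1  # convert the 1-based position to a 0-based index
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new + sequence[index + 1:]


if __name__ == "__main__":
    toy = "MSKGEELFTD"  # hypothetical sequence with D at position 10
    print(apply_substitution(toy, "D10A"))
```

Verifying the original residue before substituting guards against numbering a mutation relative to the wrong reference sequence.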
[0052] The terms “non-naturally occurring” or “engineered” are used interchangeably and indicate the involvement of the hand of man. The terms, when referring to nucleic acid molecules or polypeptides (e.g., Cas9 or adenine oxidases), mean that the nucleic acid molecule or the polypeptide is at least substantially free from at least one other component with which it is naturally associated in nature and/or as found in nature (e.g., an amino acid sequence not found in nature).
[0053] The term “nucleic acid,” as used herein, refers to RNA as well as single and/or double-stranded DNA. Nucleic acids may be naturally occurring, for example, in the context of a genome, a transcript, an mRNA, tRNA, rRNA, siRNA, snRNA, a plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule. On the other hand, a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a recombinant DNA or RNA, an artificial chromosome, an engineered genome, or fragment thereof, or a synthetic DNA, RNA, DNA/RNA hybrid, or including non-naturally occurring nucleotides or nucleosides. Furthermore, the terms “nucleic acid,” “DNA,” “RNA,” and/or similar terms include nucleic acid analogs, e.g., analogs having other than a phosphodiester backbone. Nucleic acids can be purified from natural sources, produced using recombinant expression systems and optionally purified, chemically synthesized, etc. Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, and backbone modifications. A nucleic acid sequence is presented in the 5' to 3' direction unless otherwise indicated. In some embodiments, a nucleic acid is or comprises natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine); nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine); chemically modified bases; biologically modified bases (e.g., methylated bases); intercalated bases; modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5'-N-phosphoramidite linkages).
[0054] The term “nucleic acid programmable D/RNA binding protein (napR/DNAbp)” refers to any protein that may associate (e.g., form a complex) with one or more nucleic acid molecules (i.e., which may broadly be referred to as a “napR/DNAbp-programming nucleic acid molecule” and includes, for example, guide RNA in the case of Cas systems) which direct or otherwise program the protein to localize to a specific target nucleotide sequence (e.g., a gene locus of a genome) that is complementary to the one or more nucleic acid molecules (or a portion or region thereof) associated with the protein, thereby causing the protein to bind to the nucleotide sequence at the specific target site. The term napR/DNAbp embraces CRISPR Cas9 proteins, as well as Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or modified), and may include a Cas9 equivalent from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), C2c3 (a type V CRISPR-Cas system), dCas9, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, Argonaute (Ago), and nCas9. The term also embraces Cas homologs and variants such as an xCas9, an SpCas9-NG, an LbCas12a, an AsCas12a, a Cas9-KKH, a circularly permuted Cas9, a SmacCas9, and a Spy-macCas9. Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353 (6299), the contents of which are incorporated herein by reference. However, the nucleic acid programmable DNA binding proteins (napDNAbp) that may be used in connection with this invention are not limited to CRISPR-Cas systems. The invention embraces any such programmable protein, such as the Argonaute protein from Natronobacterium gregoryi (NgAgo), which may also be used for DNA-guided genome editing. The NgAgo-guide DNA system does not require a PAM sequence or guide RNA molecules, which means genome editing can be performed simply by the expression of generic NgAgo protein and introduction of synthetic oligonucleotides on any genomic sequence. See Gao et al., DNA-guided genome editing using the Natronobacterium gregoryi Argonaute. Nature Biotechnology 2016; 34(7):768-73, which is incorporated herein by reference.
[0055] In some embodiments, the napR/DNAbp is an RNA-programmable nuclease, which, when in a complex with an RNA, may be referred to as a nuclease:RNA complex. Typically, the bound RNA(s) is referred to as a guide RNA (gRNA). gRNAs can exist as a complex of two or more RNAs, or as a single RNA molecule. gRNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules. Typically, gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and directs binding of a Cas9 (or equivalent) complex to the target); and (2) a domain that binds a Cas9 protein. In some embodiments, domain (2) corresponds to a sequence known as a tracrRNA, and comprises a stem-loop structure. For example, in some embodiments, domain (2) is homologous to a tracrRNA as depicted in Figure 1E of Jinek et al., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference. Other examples of gRNAs (e.g., those including domain 2) can be found in U.S. Patent No. 9,340,799, entitled “mRNA-Sensing Switchable gRNAs,” and International Patent Application No. PCT/US2014/054247, filed September 6, 2013, published as WO 2015/035136 and entitled “Delivery System For Functional Nucleases,” the entire contents of each of which are herein incorporated by reference. In some embodiments, a gRNA comprises two or more of domains (1) and (2), and may be referred to as an “extended gRNA.” For example, an extended gRNA will, e.g., bind two or more Cas9 proteins and bind a target nucleic acid at two or more distinct regions, as described herein. The gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease/RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex. In some embodiments, the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example Cas9 (Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J. et al., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E. et al., Nature 471:602-607 (2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M. et al., Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference).
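As an illustration of how the target-matching domain (1) of a gRNA relates to a target site, the following is a minimal sketch that scans one strand of a DNA sequence for candidate sites. It assumes a 20-nucleotide protospacer immediately followed by an NGG protospacer adjacent motif, values commonly associated with S. pyogenes Cas9 but used here only as assumed defaults. The function names and the example DNA sequence are hypothetical, only the strand as given is scanned, and no off-target or efficiency scoring is attempted.

```python
import re


def find_protospacers(dna: str, spacer_length: int = 20):
    """List (start, protospacer, pam) for sites followed by an NGG motif.

    Only the strand as given is scanned; scanning the reverse complement
    would be needed to find sites on the opposite strand.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_length)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


def spacer_rna(protospacer: str) -> str:
    """RNA version of the protospacer sequence (T replaced by U).

    This RNA matches the PAM-containing strand and is therefore
    complementary to the opposite strand of the target site.
    """
    return protospacer.upper().replace("T", "U")


if __name__ == "__main__":
    example = "TTGACGTTACCGATCGATTACATGGCA"  # hypothetical sequence
    for start, protospacer, pam in find_protospacers(example):
        print(start, protospacer, pam, spacer_rna(protospacer))
```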
[0056] Because the napR/DNAbp nucleases (e.g., Cas9) use RNA:DNA hybridization to target DNA cleavage sites, these proteins are able to be targeted, in principle, to any sequence specified by the guide RNA. Methods of using napR/DNAbp nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see e.g., Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W.Y. et al. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature Biotechnology 31, 227-229 (2013); Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013); Dicarlo, J.E. et al., Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Res. (2013); Jiang, W. et al. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nature Biotechnology 31, 233-239 (2013); the entire contents of each of which are incorporated herein by reference).
[0057] The term “napR/DNAbp-programming nucleic acid molecule” or equivalently “guide sequence” refers to the one or more nucleic acid molecules which associate with and direct or otherwise program a napR/DNAbp protein to localize to a specific target nucleotide sequence (e.g., a gene locus of a genome) that is complementary to the one or more nucleic acid molecules (or a portion or region thereof) associated with the protein, thereby causing the napR/DNAbp protein to bind to the nucleotide sequence at the specific target site. A non-limiting example is a guide RNA of a Cas protein of a CRISPR-Cas genome editing system.
[0058] A nuclear localization signal or sequence (NLS) is an amino acid sequence that tags, designates, or otherwise marks a protein for import into the cell nucleus by nuclear transport. Typically, this signal consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface. Different nuclear localized proteins may share the same NLS. An NLS has the opposite function of a nuclear export signal (NES), which targets proteins out of the nucleus. Thus, a single nuclear localization signal can direct the entity with which it is associated to the nucleus of a cell. Such sequences can be of any size and composition, for example more than 25, 25, 15, 12, 10, 8, 7, 6, 5 or 4 amino acids, but will preferably comprise at least a four to eight amino acid sequence known to function as a nuclear localization signal (NLS).
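The description of an NLS as one or more short stretches of positively charged lysines or arginines can be illustrated with a simple scan. The following is a minimal sketch, assuming that a run of at least four consecutive lysine (K) or arginine (R) residues is worth flagging. The function name, the run-length cutoff, and the example sequence (which embeds the well-known SV40 large T antigen motif PKKKRKV between arbitrary residues) are illustrative assumptions; this heuristic is not a validated NLS predictor and will miss bipartite or non-classical signals.

```python
import re


def find_basic_clusters(protein: str, min_run: int = 4):
    """List (start_index, run) for runs of consecutive K/R residues.

    start_index is 0-based; min_run is an assumed cutoff for illustration.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    pattern = re.compile(r"[KR]{%d,}" % min_run)
    return [(m.start(), m.group(0)) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical sequence containing the SV40-type motif PKKKRKV.
    print(find_basic_clusters("MAPKKKRKVGS"))
```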
[0059] The term, as used herein, “nucleobase modification domain” or “modification domain” embraces any protein, enzyme, or polypeptide (or functional fragment thereof) which is capable of modifying a DNA or RNA molecule. Nucleobase modification domains may be naturally occurring, or may be engineered. For example, a nucleobase modification domain can include one or more DNA repair enzymes, for example, an enzyme or protein involved in base excision repair (BER), nucleotide excision repair (NER), homology-dependent recombinational repair (HR), non-homologous end-joining repair (NHEJ), microhomology end-joining repair (MMEJ), mismatch repair (MMR), direct reversal repair, or other known DNA repair pathway. A nucleobase modification domain can have one or more types of enzymatic activities, including, but not limited to, endonuclease activity, polymerase activity, ligase activity, replication activity, and proofreading activity. Nucleobase modification domains can also include DNA or RNA-modifying enzymes and/or mutagenic enzymes, such as DNA oxidizing enzymes (i.e., adenine oxidases), which covalently modify nucleobases leading in some cases to mutagenic corrections by way of normal cellular DNA repair and replication processes. Exemplary nucleobase modification domains include, but are not limited to, an adenine oxidase, a nuclease, a nickase, a recombinase, a methyltransferase, a methylase, an acetylase, an acetyltransferase, a transcriptional activator, or a transcriptional repressor domain. In some embodiments the nucleobase modification domain is an adenine oxidase (e.g., AlkBH1).
[0060] As used herein, the terms “oligonucleotide” and “polynucleotide” can be used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three nucleotides).
[0061] The term “phage-assisted continuous evolution (PACE),” as used herein, refers to continuous evolution that employs phage as viral vectors. The general concept of PACE technology has been described, for example, in International PCT Application, PCT/US2009/056194, filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application, PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; U.S. Patent No. 9,023,594, issued May 5, 2015; U.S. Patent No. 9,771,574, issued September 26, 2017; U.S. Patent No. 9,394,537, issued July 19, 2016; International PCT Application, PCT/US2015/012022, filed January 20, 2015, published as WO 2015/134121 on September 11, 2015; U.S. Patent No. 10,179,911, issued January 15, 2019; International PCT Application, PCT/US2016/027795, filed April 15, 2016, published as WO 2016/168631 on October 20, 2016; and International Patent Publication WO 2019/023680, published January 31, 2019, the entire contents of each of which are incorporated herein by reference.
[0062] The term“phage-assisted non-continuous evolution (PANCE),” as used herein, refers to non-continuous evolution that employs phage as viral vectors. The general concept of PANCE technology has been described, for example, in Suzuki T. et al, Crystal structures reveal an elusive functional domain of pyrrolysyl-tRNA synthetase, Nat Chem Biol. 13(12): 1261-1266 (2017), incorporated herein by reference in its entirety. Briefly, PANCE is a simplified technique for rapid in vivo directed evolution using serial flask transfers of evolving‘selection phage’ (SP), which contain a gene of interest to be evolved, across fresh E. coli host cells, thereby allowing genes inside the host E. coli to be held constant while genes contained in the SP continuously evolve. Following phage growth, an aliquot of infected cells is used to transfect a subsequent flask containing host E. coli. This process is continued until the desired phenotype is evolved, for as many transfers as required. Serial flask transfers have long served as a widely-accessible approach for laboratory evolution of microbes, and, more recently, analogous approaches have been developed for bacteriophage evolution. The PANCE system features lower stringency than the PACE system.
[0063] The term“promoter” is art-recognized and refers to a nucleic acid molecule with a sequence recognized by the cellular transcription machinery and able to initiate transcription of a downstream gene. A promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is only active in the presence of a specific condition. For example, a conditional promoter may only be active in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule. A subclass of conditionally active promoters are inducible promoters that require the presence of a small molecule“inducer” for activity. Examples of inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters. A variety of constitutive, conditional, and inducible promoters are well known to the skilled artisan, and the skilled artisan will be able to ascertain a variety of such promoters useful in carrying out the instant invention, which is not limited in this respect. In various embodiments, the specification provides vectors with appropriate promoters for driving expression of the nucleic acid sequences encoding the base editor fusion proteins (or one or more individual components thereof).
[0064] The term “phage,” as used herein interchangeably with the term “bacteriophage,” refers to a virus that infects bacterial cells. Typically, phages consist of an outer protein capsid enclosing genetic material. The genetic material may be ssRNA, dsRNA, ssDNA, or dsDNA, in either linear or circular form. Phages and phage vectors are well known to those of skill in the art and non-limiting examples of phages that are useful for carrying out the methods provided herein are λ, T2, T4, T7, T12, R17, M13, MS2, G4, P1, P2, P4, Phi X174, N4, Φ6, and Φ29. In certain embodiments, the phage utilized in the present invention is M13. Additional suitable phages and host cells will be apparent to those of skill in the art and the invention is not limited in this aspect. For an exemplary description of additional suitable phages and host cells, see Elizabeth Kutter and Alexander Sulakvelidze: Bacteriophages: Biology and Applications. CRC Press; 1st edition (December 2004), ISBN: 0849313368; Martha R. J. Clokie and Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume 1: Isolation, Characterization, and Interactions (Methods in Molecular Biology) Humana Press; 1st edition (December 2008), ISBN: 1588296822; Martha R. J. Clokie and Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume 2: Molecular and Applied Aspects (Methods in Molecular Biology) Humana Press; 1st edition (December 2008), ISBN: 1603275649; all of which are incorporated herein in their entirety by reference for disclosure of suitable phages and host cells as well as methods and protocols for isolation, culture, and manipulation of such phages.
[0065] The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein, and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, engineered, or synthetic, or any combination thereof. The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a recombinase. In some embodiments, a protein comprises a proteinaceous part, e.g., an amino acid sequence constituting a nucleic acid binding domain, and an organic compound, e.g., a compound that can act as a nucleic acid cleavage agent.
In some embodiments, a protein is in a complex with, or is in association with, a nucleic acid, e.g., RNA. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
[0066] The term “recombinant” as used herein in the context of proteins or nucleic acids refers to proteins or nucleic acids that do not occur in nature, but are the product of human engineering. For example, in some embodiments, a recombinant protein or nucleic acid molecule comprises an amino acid or nucleotide sequence that comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations as compared to any naturally occurring sequence.
[0067] The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cattle, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is an experimental organism. In some embodiments, the subject is a plant. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development.
[0068] The term“target site” refers to a sequence within a nucleic acid molecule that is edited by a base editor (e.g., a dCas9-adenine oxidase fusion protein provided herein). The target site further refers to the sequence within a nucleic acid molecule to which a complex of the base editor and gRNA binds.
[0069] The term“vector,” as used herein, may refer to a nucleic acid that has been modified to encode a gene of interest and that is able to enter into a host cell, mutate and replicate within the host cell, and then transfer a replicated form of the vector into another host cell. Alternatively, the term“vector” as used herein may refer to a nucleic acid that has been modified to encode the base editor. Exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phage, and conjugative plasmids.
[0070] The term“viral particle,” as used herein, refers to a viral genome, for example, a DNA or RNA genome, that is associated with a coat of a viral protein or proteins, and, in some cases, with an envelope of lipids. For example, a phage particle comprises a phage genome packaged into a protein encoded by the wild type phage genome.
[0071] The term “viral vector,” as used herein, refers to a nucleic acid comprising a viral genome that, when introduced into a suitable host cell, can be replicated and packaged into viral particles able to transfer the viral genome into another host cell. The term “viral vector” extends to vectors comprising truncated or partial viral genomes. For example, in some embodiments, a viral vector is provided that lacks a gene encoding a protein essential for the generation of infectious viral particles. In suitable host cells, for example, host cells comprising the lacking gene under the control of a conditional promoter, however, such truncated viral vectors can replicate and generate viral particles able to transfer the truncated viral genome into another host cell. In some embodiments, the viral vector is an adeno-associated virus (AAV) vector.
[0072] As used herein, the terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease, disorder, or condition, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.
[0073] As used herein, the term “variant” refers to a protein having characteristics that deviate from what occurs in nature that retains at least one functional (i.e., binding, interaction, or enzymatic) activity and/or therapeutic property thereof. A “variant” is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the wild type protein. For instance, a variant of Cas9 may comprise a Cas9 that has one or more changes in amino acid residues as compared to a wild type Cas9 amino acid sequence. As another example, a variant of a deaminase may comprise a deaminase that has one or more changes in amino acid residues as compared to a wild type deaminase amino acid sequence, e.g., following ancestral sequence reconstruction of the deaminase. These changes include chemical modifications, substitutions of different amino acid residues, truncations, covalent additions (e.g., of a tag), and any other changes. This term also embraces fragments of a wild type protein.
[0074] The level or degree of which the property is retained may be reduced relative to the wild type protein but is typically the same or similar in kind. Generally, variants are overall very similar, and in many regions, identical to the amino acid sequence of the protein described herein. A skilled artisan will appreciate how to make and use variants that maintain all, or at least some, of a functional ability or property.
[0075] The variant proteins may comprise, or alternatively consist of, an amino acid sequence which is at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to, for example, the amino acid sequence of a wild-type protein, or any protein provided herein. Further polypeptides encompassed by the invention are polypeptides encoded by polynucleotides which hybridize to the complement of a nucleic acid molecule encoding a protein such as a napDNAbp under stringent hybridization conditions (e.g., hybridization to filter bound DNA in 6x sodium chloride/sodium citrate (SSC) at about 45 degrees Celsius, followed by one or more washes in 0.2x SSC, 0.1% SDS at about 50-65 degrees Celsius), under highly stringent conditions (e.g., hybridization to filter bound DNA in 6x sodium chloride/sodium citrate (SSC) at about 45 degrees Celsius, followed by one or more washes in 0.1x SSC, 0.2% SDS at about 68 degrees Celsius), or under other stringent hybridization conditions which are known to those of skill in the art (see, for example, Ausubel, F. M. et al., eds., 1989 Current Protocols in Molecular Biology, Green Publishing Associates, Inc., and John Wiley & Sons Inc., New York, at pp. 6.3.1-6.3.6 and 2.10.3).
[0076] By a polypeptide having an amino acid sequence at least, for example, 95% “identical” to a query amino acid sequence, it is intended that the amino acid sequence of the subject polypeptide is identical to the query sequence except that the subject polypeptide sequence may include up to five amino acid alterations per each 100 amino acids of the query amino acid sequence. In other words, to obtain a polypeptide having an amino acid sequence at least 95% identical to a query amino acid sequence, up to 5% of the amino acid residues in the subject sequence may be inserted, deleted, or substituted with another amino acid. These alterations of the reference sequence may occur at the amino- or carboxy-terminal positions of the reference amino acid sequence or anywhere between those terminal positions, interspersed either individually among residues in the reference sequence or in one or more contiguous groups within the reference sequence.
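The arithmetic in paragraph [0076] can be illustrated with a minimal Python sketch. The function name `max_allowed_alterations` and its parameters are hypothetical and do not appear in the specification; the sketch assumes whole-number percent thresholds and simply restates "up to five alterations per 100 amino acids" for an arbitrary query length.

```python
def max_allowed_alterations(query_length: int, percent_identity: int) -> int:
    """Return the largest number of inserted, deleted, or substituted residues
    that still leaves a subject sequence at least `percent_identity` percent
    identical to a query of `query_length` residues.

    Illustrative helper only; whole-number percentages are assumed so that
    integer arithmetic can be used.
    """
    if query_length < 0:
        raise ValueError("query_length must be non-negative")
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent_identity must be between 0 and 100")
    # (100 - percent_identity) percent of the query residues may be altered.
    return query_length * (100 - percent_identity) // 100


if __name__ == "__main__":
    # 95% identity over a 100-residue query permits up to five alterations.
    print(max_allowed_alterations(100, 95))
```

Under these assumptions, a 300-residue query at the 95% threshold would tolerate up to 15 altered residues.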
[0077] As a practical matter, whether any particular polypeptide is at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to, for instance, the amino acid sequence of a protein such as a napDNAbp, can be determined conventionally using known computer programs. A preferred method for determining the best overall match between a query sequence (a sequence of the present invention) and a subject sequence, also referred to as a global sequence alignment, can be determined using the FASTDB computer program based on the algorithm of Brutlag et al. (Comp. App. Biosci. 6:237-245 (1990)). In a sequence alignment the query and subject sequences are either both nucleotide sequences or both amino acid sequences. The result of said global sequence alignment is expressed as percent identity. Preferred parameters used in a FASTDB amino acid alignment are: Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1, Joining Penalty=20, Randomization Group Length=0, Cutoff Score=1, Window Size=sequence length, Gap Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the length of the subject amino acid sequence, whichever is shorter.
[0078] If the subject sequence is shorter than the query sequence due to N- or C-terminal deletions, not because of internal deletions, a manual correction must be made to the results. This is because the FASTDB program does not account for N- and C-terminal truncations of the subject sequence when calculating global percent identity. For subject sequences truncated at the N- and C-termini, relative to the query sequence, the percent identity is corrected by calculating the number of residues of the query sequence that are N- and C-terminal of the subject sequence, which are not matched/aligned with a corresponding subject residue, as a percent of the total bases of the query sequence. Whether a residue is matched/aligned is determined by results of the FASTDB sequence alignment. This percentage is then subtracted from the percent identity, calculated by the above FASTDB program using the specified parameters, to arrive at a final percent identity score. This final percent identity score is what is used for the purposes of the present invention. Only residues to the N- and C-termini of the subject sequence, which are not matched/aligned with the query sequence, are considered for the purposes of manually adjusting the percent identity score. That is, only query residue positions outside the farthest N- and C-terminal residues of the reference sequence.
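The manual correction described in paragraph [0078] can be sketched as follows. This is an illustrative Python helper, not part of the FASTDB program; the function name, its parameters, and the example numbers are assumptions chosen for demonstration. It takes the percent identity reported by the alignment, the query length, and the counts of query residues lying beyond the N- and C-terminal ends of the subject sequence that have no matched/aligned subject residue.

```python
def corrected_percent_identity(
    reported_identity: float,
    query_length: int,
    unmatched_n_terminal: int,
    unmatched_c_terminal: int,
) -> float:
    """Apply the terminal-truncation correction to a reported percent identity.

    Hypothetical helper: the unmatched terminal query residues are expressed
    as a percentage of the query length, and that percentage is subtracted
    from the identity reported by the alignment program.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    unmatched = unmatched_n_terminal + unmatched_c_terminal
    if unmatched < 0 or unmatched > query_length:
        raise ValueError("unmatched residue counts are out of range")
    penalty = 100.0 * unmatched / query_length
    return reported_identity - penalty


if __name__ == "__main__":
    # Hypothetical case: a 90-residue subject aligned to a 100-residue query,
    # with the first 10 query residues unmatched and the rest identical.
    print(corrected_percent_identity(100.0, 100, 10, 0))
```

In this hypothetical case the ten unmatched N-terminal query residues amount to 10% of the query, so a reported identity of 100% is reduced to a final score of 90%. Internal gaps are not passed to the helper, because the paragraph limits the correction to terminal positions.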
[0079] As used herein the term“wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.
DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS
[0080] The present disclosure provides adenine-to-cytosine or “ACBE” (or thymine-to-guanine or “TGBE”) transversion base editors which comprise a napDNAbp (e.g., an nCas9 domain) fused to a nucleobase modification domain. The nucleobase modification domain may comprise an adenine oxidase. The disclosed ACBE transversion base editors are capable of converting an A:T nucleobase pair to a C:G nucleobase pair in a target nucleotide sequence of interest, e.g., the genome of a cell. The disclosed base editors comprise an engineered oxidase variant that catalyzes the conversion of a target adenine to a cytosine via an oxidation reaction.
[0081] The disclosed base editors also comprise TGBE transversion base editors that comprise an engineered oxidase variant that catalyzes the conversion of a target adenine to a cytosine via an oxidation reaction, wherein the base-paired thymine of the non-edited (i.e. non-oxidized) strand is subsequently converted to a guanine by the concerted action of the cell’s mismatch repair factors.
[0082] In this adenine oxidation strategy, enzyme-catalyzed oxidation of a targeted A in a nucleic acid of interest results in 8-oxoadenine (8-oxoA) formation. Steric rotation of the 8-oxoA around the glycosidic bond is induced, presenting the Hoogsteen edge for base pairing. Without wishing to be bound by any particular theory, during replication or repair of the unmutated strand, 8-oxoA is read by a polymerase as a cytosine and the cell’s mismatch repair machinery converts the base-paired thymine of the non-edited strand to a guanine to correct the apparent mismatch. The resulting base pairing features two three-center hydrogen bonding systems. Upon the next round of replication, the cell’s mismatch repair machinery converts the 8-oxoA lesion to a cytosine, thereby completing the desired A:T to C:G mutation. Adenine oxidation is achieved by the targeted use of a fusion protein comprising a napDNAbp (e.g., a Cas9 nickase (“nCas9”)) domain, an adenine oxidase domain, and optionally a linker connecting these two domains (see FIG. 1).
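The net sequence outcome of the strategy in paragraph [0082] can be pictured with a small Python sketch. It models only the end result (an A:T pair replaced by a C:G pair at one position) and says nothing about the 8-oxoA intermediate or the repair steps; the function name `apply_a_to_c_edit` and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def apply_a_to_c_edit(edited_strand: str, position: int) -> tuple[str, str]:
    """Return the strand carrying the target adenine and its partner strand
    after an A:T to C:G conversion at `position` (0-based).

    The partner strand is written base-by-base beneath the edited strand,
    i.e. in 3'-to-5' orientation. Illustrative outcome model only.
    """
    strand = edited_strand.upper()
    if strand[position] != "A":
        raise ValueError("target position does not hold an adenine")
    # The target A becomes C on the edited strand ...
    new_strand = strand[:position] + "C" + strand[position + 1:]
    # ... and the paired T becomes G on the partner strand.
    partner = "".join(COMPLEMENT[base] for base in new_strand)
    return new_strand, partner


if __name__ == "__main__":
    top, bottom = apply_a_to_c_edit("GATC", 1)
    print(top)     # strand that carried the target A
    print(bottom)  # partner strand, aligned position by position
```

For the hypothetical input `GATC` with the adenine at index 1, the sketch yields `GCTC` paired with `CGAG`, so the A:T pair at that position is now C:G while the flanking pairs are unchanged.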
[0083] The adenine oxidase domains of the disclosed base editors may comprise variants of wild-type oxidase enzymes. These variants may comprise an amino acid sequence that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the wild type enzyme. In some embodiments, the adenine oxidase domains may comprise an amino acid sequence having 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, or more than 30 amino acids that differ relative to the amino acid sequence of the wild type enzyme. These differences may comprise amino acids that have been inserted, deleted, or substituted relative to the amino acid sequence of the wild type enzyme. In some embodiments, the adenine oxidase domains contain stretches of about 50, about 75, about 100, about 125, about 150, about 175, about 200, about 300, about 400, about 500, or more than 500 consecutive amino acids in common with the wild type enzyme. In some embodiments, the adenine oxidase domains comprise truncations at the N-terminus or C-terminus relative to the wild-type enzyme. In some embodiments, the adenine oxidase domains comprise truncations of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, or more than 30 amino acids at the N-terminus or C-terminus relative to the wild-type or base sequence.
[0084] In certain embodiments, the adenine oxidase is an AlkBH3, or a variant thereof. In certain embodiments, the adenine oxidase is a bacterial AlkB, or a variant thereof. In other embodiments, the adenine oxidase is a human AlkBH, or a variant thereof. In certain embodiments, the adenine oxidase is a human AlkBH1, AlkBH2, AlkBH3, AlkBH4, AlkBH5, AlkBH7, AlkBH8, or a variant thereof.
[0085] In other embodiments, the adenine oxidase is a TET-oxidase, or a variant thereof. In certain embodiments, the oxidase is a human TET1, TET2, TET3, the catalytic domain of a human TET1 (TET1-CD), or other effector domains of human TET1, TET2, or TET3, or a variant thereof.
[0086] In other embodiments, the adenine oxidase is a xanthine dehydrogenase, or a variant thereof. In certain embodiments, the xanthine dehydrogenase is a human xanthine dehydrogenase, or a variant thereof. In certain embodiments, the xanthine dehydrogenase is a Streptomyces cyanogenus xanthine dehydrogenase (ScXDH), or a variant thereof. In other embodiments, the xanthine dehydrogenase or variant thereof is derived from C. capitata, N. crassa, M. hansupus, E. cloacae, S. snoursei, S. albulus, S. himastatinicus, or S. lividans.
[0087] In other embodiments, the adenine oxidase is a cytochrome P450 enzyme, or a variant thereof. In certain embodiments, the oxidase is a human CYP1A2, CYP2A4, or CYP3A6, or a variant thereof.
[0088] In other embodiments, the oxidase is a molybdopterin-dependent aldehyde oxidase (e.g., human AOX1). In other embodiments, the oxidase is a flavin monooxygenase. In other embodiments, the adenine oxidase is a human FTO, or a variant thereof.
[0089] The instant specification provides for A:T to C:G transversion base editors which overcome a need in the art for installation of targeted transversions into a target or desired nucleotide sequence, e.g., a genome. In particular, the instant specification provides A:T to C:G base editors (e.g., fusion proteins comprising an nCas9 domain and an adenine oxidase domain) which overcome a need in the art for installation of targeted transversions, particularly A:T to C:G transversions. In addition, the disclosure provides compositions comprising the transversion base editors as described herein, e.g., fusion proteins comprising an nCas9 domain and an adenine oxidase domain, and one or more guide RNAs, e.g., a single-guide RNA (“sgRNA”). In addition, the instant specification provides for nucleic acid molecules encoding and/or expressing the transversion base editors as described herein, as well as expression vectors or constructs for expressing the transversion base editors described herein and a gRNA, host cells comprising said nucleic acid molecules and expression vectors, and optionally one or more gRNAs, and compositions for delivering and/or administering nucleic acid-based embodiments described herein.
[0090] Still further, the present disclosure provides for methods of making the transversion base editors described herein, as well as methods of using the transversion base editors or nucleic acid molecules encoding the transversion base editors in applications including editing a nucleic acid molecule, e.g., a genome. In certain embodiments, methods of engineering the transversion base editors provided herein involve a phage-assisted continuous evolution (PACE) system or non-continuous system (e.g., PANCE), which may be utilized to evolve one or more components of a base editor (e.g., an adenine oxidase domain). In certain embodiments, following the successful evolution of the one or more components of the transversion base editor (e.g., an adenine oxidase domain), methods of making the base editors comprise recombinant protein expression methodologies known to one of ordinary skill in the art.
[0091] The specification also provides methods for editing a target nucleic acid molecule, e.g., a single nucleobase within a genome, with a base editing system described herein (e.g., in the form of an evolved base editor as described herein, or a vector or construct encoding same). Such methods involve transducing (e.g., via transfection) cells with a plurality of complexes each comprising a fusion protein (e.g., a fusion protein comprising a Cas9 nickase (nCas9) domain and an adenine oxidase domain) and a gRNA molecule. In some embodiments, the gRNA is bound to the napDNAbp domain (e.g., nCas9 domain) of the fusion protein. In some embodiments, each gRNA comprises a guide sequence of at least 10 contiguous nucleotides (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 contiguous nucleotides) that is complementary to a target sequence. In certain embodiments, the methods involve the transfection of nucleic acid constructs (e.g., plasmids) that each (or together) encode the components of a complex of fusion protein and gRNA molecule.
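The guide-sequence requirement in paragraph [0091] (at least 10 contiguous nucleotides complementary to a target sequence) can be checked with a short Python sketch. The function name `is_complementary_guide`, the equal-length comparison, and the example sequences are illustrative assumptions; real guide design tools weigh many additional factors that are not modeled here.

```python
# Maps each DNA base of the target strand to the RNA base that pairs with it.
RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}


def is_complementary_guide(guide_rna: str, target_dna: str, min_length: int = 10) -> bool:
    """Return True if `guide_rna` (5'-to-3') is fully complementary to
    `target_dna` (5'-to-3') and is at least `min_length` nucleotides long.

    Hypothetical helper: both sequences must have the same length and use
    only unambiguous bases (A, C, G, T in the DNA; A, C, G, U in the RNA).
    """
    guide = guide_rna.upper()
    target = target_dna.upper()
    if len(guide) < min_length or len(guide) != len(target):
        return False
    # Pairing strands are antiparallel, so read the target in reverse.
    expected_guide = "".join(RNA_PARTNER[base] for base in reversed(target))
    return guide == expected_guide


if __name__ == "__main__":
    print(is_complementary_guide("GCAUGCAUGCAU", "ATGCATGCATGC"))
```

Because the helper looks bases up directly, an ambiguous target base such as N raises a `KeyError` rather than being silently accepted.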
[0092] In certain embodiments of the disclosed methods, a nucleic acid construct that encodes the fusion protein is transfected into the cell separately from the plasmid that encodes the gRNA molecule. In certain embodiments, these components are encoded on a single construct and transfected together.
[0093] In other embodiments, the methods disclosed herein involve the introduction into cells of a complex comprising a fusion protein and gRNA molecule that has been expressed and cloned outside of these cells.
[0094] It should be appreciated that any fusion protein, e.g., any of the fusion proteins provided herein, may be introduced into the cell in any suitable way, either stably or transiently. In some embodiments, a fusion protein may be transfected into the cell. In some embodiments, the cell may be transduced or transfected with a nucleic acid construct that encodes a fusion protein. For example, a cell may be transduced (e.g., with a virus encoding a fusion protein), or transfected (e.g., with a plasmid encoding a fusion protein) with a nucleic acid that encodes a fusion protein, or the translated fusion protein. Such transduction may be a stable or transient transduction. In some embodiments, cells expressing a fusion protein or containing a fusion protein may be transduced or transfected with one or more gRNA molecules, for example when the fusion protein comprises a Cas9 (e.g., nCas9) domain. In some embodiments, a plasmid expressing a fusion protein may be introduced into cells through electroporation, transient (e.g., lipofection) and stable genome integration (e.g., piggybac) and viral transduction or other methods known to those of skill in the art.
[0095] In certain embodiments, the methods described above result in cutting (or nicking) one strand of the double-stranded DNA, for example, the strand that includes the thymine (T) of the target A:T nucleobase pair opposite the strand containing the target adenine (A) that is being oxidized. This nicking serves to direct mismatch repair machinery to the non-edited strand, ensuring that the chemically modified nucleobase is not interpreted as a lesion by the machinery. This nick may be created by the use of an nCas9.
[0096] In other embodiments, the present specification provides a complex comprising the base editor fusion proteins described herein and an RNA bound to the Cas9 domain of the fusion protein, such as a guide RNA (gRNA), e.g., a single guide RNA.
[0097] The target nucleotide sequence may comprise a target sequence (e.g., a point mutation) associated with a disease, disorder, or condition, such as congenital deafness, spastic paraplegia, nonsyndromic hearing loss, spinal muscular atrophy, or hypohidrotic ectodermal dysplasia. The target sequence may comprise a C to A point mutation associated with a disease, disorder, or condition, and wherein the oxidation of the mutant A base results in mismatch repair-mediated correction to a sequence that is not associated with a disease, disorder, or condition. The target sequence may comprise a G to T point mutation associated with a disease, disorder, or condition, and wherein the oxidation of the mutant A base results in mismatch repair-mediated correction to a sequence that is not associated with a disease, disorder, or condition. The target sequence may encode a protein, and where the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to a wild-type codon. The target sequence may also be at a splice site, and the point mutation results in a change in the splicing of an mRNA transcript as compared to a wild-type transcript. In addition, the target may be at a non-coding sequence of a gene, such as a promoter, and the point mutation results in increased or decreased expression of the gene.
[0098] Exemplary target genes include GJB2, in which a G to T point mutation at residue 139 results in a congenital deafness phenotype; and SPG11, in which a C to A point mutation at residue 2877 results in a spastic paraplegia phenotype. Additional target genes include OTOF (associated with nonsyndromic hearing loss), IGHMBP2 (associated with spinal muscular atrophy), and EDAR (associated with hypohidrotic ectodermal dysplasia), for which the disease phenotype is frequently caused by C:G to A:T point mutations. For these target genes, C:G to A:T point mutations introduce premature stop codons (UAA, UAG, UGA), resulting in nonsense mutations in protein coding regions. For all of the genetic disorders associated with the point mutations in these target genes, morbidity is high and treatment is not curative. Exemplary ACBEs disclosed herein correct these disease alleles in somatic cells, reducing or removing morbidity. In other embodiments, exemplary ACBEs disclosed herein may install disease-suppressing alleles in somatic cells.
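To make concrete how a single C:G to A:T point mutation can introduce a premature stop codon, the following Python sketch enumerates every sense codon that becomes TAA, TAG, or TGA (UAA, UAG, or UGA in the mRNA) through one such change. It is an illustration under stated assumptions: codons are written as coding-strand DNA, so a C:G to A:T mutation appears as either C to A or G to T, and the function name `codons_converted_to_stop` is hypothetical.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

# On the coding strand, a C:G to A:T transversion reads as C->A or G->T,
# depending on which strand carried the C.
CG_TO_AT = {"C": "A", "G": "T"}


def codons_converted_to_stop() -> list[tuple[str, int, str]]:
    """List (sense codon, 1-based position, resulting stop codon) for every
    single C:G to A:T change that turns a sense codon into a stop codon."""
    hits = []
    for codon in ("".join(bases) for bases in product("ACGT", repeat=3)):
        if codon in STOP_CODONS:
            continue  # only sense codons can acquire a premature stop
        for index, base in enumerate(codon):
            replacement = CG_TO_AT.get(base)
            if replacement is None:
                continue
            mutant = codon[:index] + replacement + codon[index + 1:]
            if mutant in STOP_CODONS:
                hits.append((codon, index + 1, mutant))
    return hits


if __name__ == "__main__":
    for codon, position, stop in codons_converted_to_stop():
        print(f"{codon} -> {stop} (position {position})")
```

Under these assumptions the enumeration finds seven sense codons: GAA, GAG, GGA, TAC, TCA, TCG, and TGC. Reverting any of the resulting stop codons to its original sense codon requires the opposite A:T to C:G change, which is the conversion the disclosed base editors are directed to.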
[0099] Thus, in some aspects, the oxidation of a mutant A results in a change of the amino acid encoded by the mutant codon, which in some cases can result in the expression of a wild-type amino acid. The application of the base editors can also result in a change of the mRNA transcript, and even restoring the mRNA transcript to a wild-type state.
[0100] The methods described herein involving contacting a base editor with a target nucleotide sequence can occur in vitro, ex vivo, or in vivo. In certain embodiments, the step of contacting occurs in a subject. In certain embodiments, the subject has been diagnosed with a disease, disorder, or condition, such as, but not limited to, a disease, disorder, or condition associated with a point mutation in the GJB2 gene, the IGHMBP2 gene, the OTOF gene, the EDAR gene, or the SPG11 gene.
[0101] In another aspect, the specification discloses a pharmaceutical composition comprising any one of the presently disclosed base editor fusion proteins. In one aspect, the specification discloses a pharmaceutical composition comprising any one of the presently disclosed complexes of fusion proteins and gRNA. In one aspect, the specification discloses a pharmaceutical composition comprising polynucleotides encoding the fusion proteins disclosed herein and polynucleotides encoding a gRNA, or polynucleotides encoding both.
[0102] In another aspect, the specification discloses a pharmaceutical composition comprising any one of the presently disclosed vectors. In certain embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable excipient. In certain embodiments, the pharmaceutical composition further comprises a lipid and/or polymer. In certain embodiments, the lipid and/or polymer is cationic. The preparation of such lipid particles is well known. See, e.g., U.S. Patent Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; 4,921,757; and 9,737,604, each of which is incorporated herein by reference.
[0103] In various embodiments, the present disclosure provides A-to-C (or T-to-G) transversion base editor fusion proteins comprising (i) a nucleic acid programmable DNA binding protein (napDNAbp), and (ii) a nucleobase modification domain capable of facilitating the conversion of an A:T nucleobase pair to a C:G nucleobase pair in a target nucleotide sequence, e.g., a genome.
[0104] In various embodiments, the nucleobase modification domain is an adenine oxidase, which enzymatically converts an adenine nucleobase of an A:T nucleobase pair to an 8-oxoadenine, which is subsequently converted by the cell’s DNA repair and replication machinery to a cytosine, ultimately converting the A:T nucleobase pair to a C:G nucleobase pair.
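The net sequence outcome described above can be sketched in a few lines. This is a minimal illustration that models only the end result (an A:T pair becoming a C:G pair on both strands); it does not model the 8-oxoadenine intermediate or any repair pathway, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def apply_a_to_c_edit(top_strand, index):
    """Convert the A:T pair at 0-based `index` of the top strand to C:G.

    Returns (edited_top, edited_bottom), both written 5'->3'.
    """
    top = top_strand.upper()
    if top[index] != "A":
        raise ValueError("target base is not adenine")
    edited_top = top[:index] + "C" + top[index + 1:]
    return edited_top, reverse_complement(edited_top)


top, bottom = apply_a_to_c_edit("GATTC", 1)
print(top, bottom)
```

On the opposite strand the same edit reads as a T-to-G change, which is why the editors are described as A-to-C (or T-to-G) transversion editors.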
[0105] The various domains of the transversion fusion proteins described herein (e.g., the Cas9 domain or the nucleobase modification domains) may be obtained as a result of mutagenizing a reference or starting-point base editor (or a component or domain thereof) by a directed evolution process, e.g., a continuous evolution method (e.g., PACE) or a non-continuous evolution method (e.g., PANCE or other discrete plate-based selections). In various embodiments, the disclosure provides a base editor that has one or more amino acid variations introduced into its amino acid sequence relative to the amino acid sequence of the reference or starting-point base editor. The base editor may include variants in one or more components or domains of the base editor (e.g., variants introduced into a Cas9 domain, an adenine oxidase domain, an inhibitor of base excision repair (iBER) domain, or a variant introduced into combinations of these domains). For example, the nucleobase modification domain may be evolved from a reference protein that is an RNA modifying enzyme (e.g., an N1-methyladenosine modification enzyme or a 5-methylcytosine modification enzyme) and evolved using PACE, PANCE, or other plate-based evolution methods to obtain a DNA modifying version of the nucleobase modification domain, which can then be used in the fusion proteins described herein.
I. NapDNAbp domains
[0106] The base editors described herein comprise a nucleic acid programmable DNA binding (napDNAbp) domain. The napDNAbp is associated with at least one guide nucleic acid (e.g., guide RNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the protospacer of a guide RNA). In other words, the guide nucleic acid “programs” the napDNAbp domain to localize and bind to a complementary sequence of the target strand. Binding of the napDNAbp domain to a complementary sequence enables the nucleobase modification domain of the base editor to access and enzymatically oxidize a target adenine base in the target strand.
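The complementarity relationship between a guide and the target strand can be sketched as a simple string search. The function `locate_target` and its return convention are assumptions for illustration: the guide's spacer portion is treated as matching the non-target strand (with U in place of T), so the strand that actually pairs with the guide is the opposite one. Mismatches, PAM requirements, and guide scaffolds are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def _find_all(haystack, needle):
    """Yield every 0-based start index of `needle` in `haystack`."""
    start = haystack.find(needle)
    while start != -1:
        yield start
        start = haystack.find(needle, start + 1)


def locate_target(top_strand, spacer_rna):
    """Return (target_strand, start) pairs for perfect-match sites.

    `target_strand` names the strand that base-pairs with the guide;
    `start` is the leftmost top-strand index of the site.
    """
    top = top_strand.upper()
    spacer_dna = spacer_rna.upper().replace("U", "T")
    hits = []
    # Spacer reads like the top strand: the guide pairs with the bottom strand.
    for start in _find_all(top, spacer_dna):
        hits.append(("bottom", start))
    # Spacer reads like the bottom strand: the guide pairs with the top strand.
    for start in _find_all(top, reverse_complement(spacer_dna)):
        hits.append(("top", start))
    return hits


print(locate_target("AAGCTTGACC", "GCUUGA"))
print(locate_target("AAGCTTGACC", "GGUCAA"))
```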
[0107] The napDNAbp can be a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease. As outlined above, CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3'-5' exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek et al., Science 337:816-821 (2012), the entire contents of which is hereby incorporated by reference.
[0108] Without wishing to be bound by any particular theory, the binding mechanism of a napDNAbp-guide RNA complex, in general, includes the step of forming an R-loop whereby the napDNAbp induces the unwinding of a double-strand DNA target, thereby separating the strands in the region bound by the napDNAbp. The guide RNA protospacer then hybridizes to the “target strand.” This displaces a “non-target strand” that is complementary to the target strand, which forms the single strand region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities, which cut the DNA leaving various types of lesions (e.g., a nick in one strand of the DNA). For example, the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location, and/or cuts the target strand at a second location. Depending on the nuclease activity, the target DNA can be cut to form a “double-stranded break” whereby both strands are cut. In other embodiments, the target DNA can be cut at only a single site, i.e., the DNA is “nicked” on one strand.
[0109] The below description of various napDNAbps which can be used in connection with the disclosed nucleobase modification domains is not meant to be limiting in any way. The base editors may comprise the canonical SpCas9, or any ortholog Cas9 protein, or any variant Cas9 protein (including any naturally occurring variant, mutant, or otherwise engineered version of Cas9) that is known or which can be made or evolved through a directed evolution or otherwise mutagenic process. In various embodiments, the napDNAbp has a nickase activity, i.e., it cleaves only one strand of the target DNA sequence. In other embodiments, the napDNAbp has an inactive nuclease, e.g., is a “dead” protein. Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having modified or rearranged primary amino acid sequence (e.g., the circular permutant forms). The base editors described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins. The napDNAbps used herein (e.g., an SpCas9 or SpCas9 variant) may also contain various modifications that alter/enhance their PAM specificities. The disclosure contemplates any Cas9, Cas9 variant, or Cas9 equivalent which has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference SpCas9 canonical sequence (set forth in SEQ ID NO: 9), a reference SaCas9 canonical sequence (set forth in SEQ ID NO: 92) or a reference Cas9 equivalent (e.g., Cas12a/Cpf1).
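The sequence identity thresholds recited above can be made concrete with a small calculation. The sketch below assumes the two sequences have already been aligned (equal length, gaps written as "-") and defines identity as matching columns divided by alignment columns; other conventions exist, and the specification does not state which one applies. The function names and toy peptides are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over a pre-computed pairwise alignment.

    Columns where both sequences have a gap are ignored; a gap opposite
    a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold):
    """True if identity is at least `threshold` percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy 10-residue peptides differing at one position.
print(percent_identity("MAKLYSIGLD", "MAKLYSIGLA"))
print(meets_threshold("MAKLYSIGLD", "MAKLYSIGLA", 95))
```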
[0110] In some embodiments, the napDNAbp directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the napDNAbp directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence. For example, an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand). Other examples of mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A in reference to the canonical SpCas9 sequence, or to equivalent amino acid positions in other Cas9 variants or Cas9 equivalents.
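Substitution shorthand such as D10A (wild-type residue, 1-based position, replacement residue) can be applied to a sequence programmatically. The following is a minimal sketch; `apply_substitution` is a hypothetical helper and the 13-residue peptide is a toy sequence chosen only so that position 10 is an aspartate, not a sequence from this disclosure.

```python
import re


def apply_substitution(sequence, mutation):
    """Apply a substitution written as <wild-type><position><new>, e.g. 'D10A'.

    Positions are 1-based. Raises ValueError if the notation is malformed,
    the position is out of range, or the wild-type residue does not match.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, position, replacement = (
        match.group(1), int(match.group(2)), match.group(3))
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at {position}, found {sequence[position - 1]}")
    return sequence[:position - 1] + replacement + sequence[position:]


toy = "MAKLYSIGLDIGT"  # toy peptide with D at position 10
print(apply_substitution(toy, "D10A"))
```

Checking the wild-type residue before substituting guards against numbering mismatches, which matter when a mutation is transferred to "equivalent amino acid positions" in another Cas9 variant.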
[0111] As used herein, the term “Cas protein” refers to a full-length Cas protein obtained from nature, a recombinant Cas protein having a sequence that differs from a naturally occurring Cas protein, or any fragment of a Cas protein that nevertheless retains all or a significant amount of the requisite basic functions needed for the disclosed methods, i.e., (i) possession of nucleic-acid programmable binding of the Cas protein to a target DNA, and (ii) ability to nick the target DNA sequence on one strand. The Cas proteins contemplated herein embrace CRISPR Cas9 proteins, as well as Cas9 equivalents, variants (e.g., Cas9 nickase (nCas9) or nuclease inactive Cas9 (dCas9)), homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and may include a Cas9 equivalent from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system) and C2c3 (a type V CRISPR-Cas system). Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), the contents of which are incorporated herein by reference.
[0112] The term“Cas9” or“Cas9 domain” embraces any naturally occurring Cas9 from any organism, any naturally-occurring Cas9 equivalent or functional fragment thereof, any Cas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of a Cas9, naturally-occurring or engineered. The term Cas9 is not meant to be particularly limiting and may be referred to as a“Cas9 or equivalent.” Exemplary Cas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference. The present disclosure is unlimited with regard to the particular napDNAbp that is employed in the base editors of the disclosure.
[0113] Additional Cas9 sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607 (2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821 (2012), the entire contents of each of which are incorporated herein by reference), and also provided below.
[0114] Examples of Cas9 and Cas9 equivalents are provided as follows; however, these specific examples are not meant to be limiting. The base editors of the present disclosure may use any suitable napDNAbp, including any suitable Cas9 or Cas9 equivalent.
A. Wild type canonical SpCas9
[0115] In one embodiment, the base editor constructs described herein may comprise the “canonical SpCas9” nuclease from S. pyogenes, which has been widely used as a tool for genome engineering. This Cas9 protein is a large, multi-domain protein containing two distinct nuclease domains. Point mutations can be introduced into Cas9 to abolish one or both nuclease activities, resulting in a nickase Cas9 (nCas9) or dead Cas9 (dCas9), respectively, that still retains its ability to bind DNA in a sgRNA-programmed manner. In principle, when fused to another protein or domain, Cas9 or variant thereof (e.g., nCas9) can target that protein to virtually any DNA sequence simply by co-expression with an appropriate sgRNA. As used herein, the canonical SpCas9 protein refers to the wild type protein from Streptococcus pyogenes having the following amino acid sequence:
[Amino acid sequence shown as images in the original: Figures imgf000041_0001, imgf000042_0001, imgf000043_0001]
[0116] The base editors described herein may include canonical SpCas9, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with a wild type Cas9 sequence provided above. These variants may include SpCas9 variants containing one or more mutations, including any known mutation reported with the SwissProt Accession No. Q99ZW2 entry, which include:
[Listed mutations shown as images in the original: Figures imgf000043_0002, imgf000044_0001]
[0117] Other wild type SpCas9 sequences that may be used in the present disclosure include:
[SpCas9 sequences shown as images in the original: Figures imgf000044_0002, imgf000045_0001, imgf000046_0001, imgf000047_0001, imgf000048_0001, imgf000049_0001, imgf000050_0001, imgf000051_0001]
[0118] The base editors described herein may include any of the above SpCas9 sequences, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
B. Wild type Cas9 orthologs
[0119] In other embodiments, the Cas9 protein can be a wild type Cas9 ortholog from another bacterial species. For example, the following Cas9 orthologs can be used in connection with the base editor constructs described in this disclosure. In addition, any variant Cas9 orthologs having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to any of the below orthologs may also be used with the disclosed base editors.
[Cas9 ortholog sequences shown as images in the original: Figures imgf000051_0002, imgf000052_0001, imgf000053_0001, imgf000054_0001, imgf000055_0001, imgf000056_0001, imgf000057_0001]
[0120] The base editors described herein may include any of the above Cas9 ortholog sequences, or any variants thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
[0121] The napDNAbp may include any suitable homologs and/or orthologs or naturally occurring enzymes, such as Cas9. Cas9 homologs and/or orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Preferably, the Cas moiety is configured (e.g., mutagenized, recombinantly engineered, or otherwise obtained from nature) as a nickase, i.e., capable of cleaving only a single strand of the target double-stranded DNA. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease has an inactive (e.g., an inactivated) DNA cleavage domain, that is, the Cas9 is a nickase. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 3. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the Cas9 orthologs in the above tables.
C. Dead napDNAbp variants
[0122] In some embodiments, the disclosed base editors may comprise a catalytically inactive, or“dead,” napDNAbp domain. Exemplary catalytically inactive domains in the disclosed base editors are dead S. pyogenes Cas9 (dSpCas9) and S. pyogenes Cas9 nickase (SpCas9n).
[0123] In certain embodiments, the base editors described herein may include a dead Cas9, e.g., dead SpCas9, which has no nuclease activity due to one or more mutations that inactivate both nuclease domains of SpCas9, namely the RuvC domain (which cleaves the non-protospacer DNA strand) and HNH domain (which cleaves the protospacer DNA strand). The nuclease inactivation may be due to one or more mutations that result in one or more substitutions and/or deletions in the amino acid sequence of the encoded protein, or any variants thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
[0124] In certain embodiments, the base editors described herein may include a dead Cas9, e.g., dead SpCas9, which has no nuclease activity due to one or more mutations that inactivate both nuclease domains of SpCas9, namely the RuvC domain (which cleaves the non-protospacer DNA strand) and HNH domain (which cleaves the protospacer DNA strand). The D10A and N580A mutations in the wild-type S. aureus Cas9 amino acid sequence may be used to form a dSaCas9. Accordingly, in some embodiments, the napDNAbp domain of the base editors provided herein comprises a dSaCas9 that has D10A and N580A mutations relative to the wild-type SaCas9 sequence (SEQ ID NO: 92).
[0125] As used herein, the term “dCas9” refers to a nuclease-inactive Cas9 or nuclease-dead Cas9, or a functional fragment thereof, and embraces any naturally occurring dCas9 from any organism, any naturally-occurring dCas9 equivalent or functional fragment thereof, any dCas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of a dCas9, naturally-occurring or engineered. The term dCas9 is not meant to be particularly limiting and may be referred to as a “dCas9 or equivalent.” Exemplary dCas9 proteins and methods for making dCas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference.
[0126] In other embodiments, dCas9 corresponds to, or comprises in part or in whole, a Cas9 amino acid sequence having one or more mutations that inactivate the Cas9 nuclease activity. In other embodiments, Cas9 variants having mutations other than D10A and H840A are provided which may result in the full or partial inactivation of the endogenous Cas9 nuclease activity (e.g., dCas9 or nCas9, respectively). Such mutations, by way of example, include other amino acid substitutions at D10 and H840, or other substitutions within the nuclease domains of Cas9 (e.g., substitutions in the HNH nuclease subdomain and/or the RuvC1 subdomain) with reference to a wild type sequence such as Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1). In some embodiments, variants or homologues of Cas9 (e.g., variants of Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1)) are provided which are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to NCBI Reference Sequence: NC_017053.1. In some embodiments, variants of dCas9 (e.g., variants of NCBI Reference Sequence: NC_017053.1) are provided having amino acid sequences which are shorter or longer than NC_017053.1 by about 5 amino acids, by about 10 amino acids, by about 15 amino acids, by about 20 amino acids, by about 25 amino acids, by about 30 amino acids, by about 40 amino acids, by about 50 amino acids, by about 75 amino acids, by about 100 amino acids or more.
[0127] In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises a dead S. pyogenes Cas9 (dSpCas9). In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises a sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to SEQ ID NO: 106. In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises the amino acid sequence of SEQ ID NO: 106.
[0128] In one embodiment, the dead Cas9 may be based on the canonical SpCas9 sequence of Q99ZW2 and may have the following sequence, which comprises D10A and H840A substitutions (underlined and bolded), or a variant of SEQ ID NO: 106 having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto:
[Amino acid sequence shown as images in the original: Figures imgf000059_0001, imgf000060_0001, imgf000061_0001]
D. napDNAbp nickase variants
[0129] In some embodiments, the disclosed base editors may comprise a napDNAbp domain that comprises a nickase. In some embodiments, the base editors described herein comprise a Cas9 nickase. The term “Cas9 nickase” or “nCas9” refers to a variant of Cas9 which is capable of introducing a single-strand break in a double strand DNA molecule target. In some embodiments, the Cas9 nickase comprises only a single functioning nuclease domain. The wild type Cas9 (e.g., the canonical SpCas9) comprises two separate nuclease domains, namely, the RuvC domain (which cleaves the non-protospacer DNA strand) and HNH domain (which cleaves the protospacer DNA strand). In one embodiment, the Cas9 nickase comprises a mutation in the RuvC domain which inactivates the RuvC nuclease activity. For example, mutations in aspartate (D) 10, histidine (H) 983, aspartate (D) 986, or glutamate (E) 762, have been reported as loss-of-function mutations of the RuvC nuclease domain and the creation of a functional Cas9 nickase (e.g., Nishimasu et al., “Crystal structure of Cas9 in complex with guide RNA and target DNA,” Cell 156(5), 935-949, which is incorporated herein by reference). Thus, nickase mutations in the RuvC domain could include D10X, H983X, D986X, or E762X, wherein X is any amino acid other than the wild type amino acid. In certain embodiments, the nickase could be D10A, or H983A, or D986A, or E762A, or a combination thereof.
[0130] In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises an S. pyogenes Cas9 nickase (SpCas9n). In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises a sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to SEQ ID NO: 112 or 118. In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises the amino acid sequence of SEQ ID NO: 112. In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises the amino acid sequence of SEQ ID NO: 118.
[0131] In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises an S. aureus Cas9 nickase (SaCas9n). In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises a sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to SEQ ID NO: 116. In some embodiments, the napDNAbp domain of any of the disclosed base editors comprises the amino acid sequence of SEQ ID NO: 116.
[0132] In various embodiments, the Cas9 nickase can have a mutation in the RuvC nuclease domain and have one of the following amino acid sequences, or a variant thereof having an amino acid sequence that has at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
[Amino acid sequences shown as images in the original: Figures imgf000062_0001, imgf000063_0001, imgf000064_0001, imgf000065_0001, imgf000066_0001, imgf000067_0001]
[0133] In another embodiment, the Cas9 nickase comprises a mutation in the HNH domain which inactivates the HNH nuclease activity. For example, mutations in histidine (H) 840 or asparagine (N) 863 have been reported as loss-of-function mutations of the HNH nuclease domain and the creation of a functional Cas9 nickase (e.g., Nishimasu et al., “Crystal structure of Cas9 in complex with guide RNA and target DNA,” Cell 156(5), 935-949, which is incorporated herein by reference). Thus, nickase mutations in the HNH domain could include H840X and N863X, wherein X is any amino acid other than the wild type amino acid. In certain embodiments, the nickase could be H840A or N863A or a combination thereof.
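The relationship between the two nuclease domains and the resulting activity (nuclease, nickase, or dead) can be summarized as a small lookup. This sketch is a simplification under stated assumptions: it treats any substitution at one of the catalytic positions named in the text (10, 762, 983, 986 for RuvC; 840, 863 for HNH, canonical SpCas9 numbering) as fully inactivating that domain, and `classify_cas9` is a hypothetical helper rather than a predictive model.

```python
import re

# Catalytic residue positions named in the text (canonical SpCas9 numbering).
RUVC_POSITIONS = {10, 762, 983, 986}
HNH_POSITIONS = {840, 863}


def classify_cas9(mutations):
    """Classify a Cas9 variant from substitutions such as ['D10A', 'H840A'].

    Returns 'nuclease' (both domains active), 'nickase' (one domain
    inactivated), or 'dead' (both domains inactivated).
    """
    positions = set()
    for mutation in mutations:
        match = re.fullmatch(r"[A-Z](\d+)[A-Z]", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation!r}")
        positions.add(int(match.group(1)))
    ruvc_inactive = bool(positions & RUVC_POSITIONS)
    hnh_inactive = bool(positions & HNH_POSITIONS)
    if ruvc_inactive and hnh_inactive:
        return "dead"
    if ruvc_inactive or hnh_inactive:
        return "nickase"
    return "nuclease"


for variant in ([], ["D10A"], ["H840A"], ["D10A", "H840A"]):
    print(variant, classify_cas9(variant))
```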
[0134] In various embodiments, the Cas9 nickase can have a mutation in the HNH nuclease domain and have one of the following amino acid sequences, or a variant thereof having an amino acid sequence that has at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
[Amino acid sequences shown as images in the original: Figures imgf000067_0002, imgf000068_0001, imgf000069_0001]
[0135] In some embodiments, the N-terminal methionine is removed from a Cas9 nickase, or from any Cas9 variant, ortholog, or equivalent disclosed or contemplated herein. For example, methionine-minus Cas9 nickases include the following sequences, or a variant thereof having an amino acid sequence that has at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
[Amino acid sequences shown as images in the original: Figures imgf000070_0001, imgf000071_0001]
E. Other Cas9 variants
[0136] The napDNAbp domains used in the base editors described herein may also include other Cas9 variants that are at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference Cas9 protein, including any wild type Cas9, or mutant Cas9 (e.g., a dead Cas9 or Cas9 nickase), or circular permutant Cas9, or other variant of Cas9 disclosed herein or known in the art. In some embodiments, a Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to a reference Cas9. In some embodiments, the Cas9 variant comprises a fragment of a reference Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 (e.g., SEQ ID NO: 9).
[0137] In some embodiments, the disclosure also may utilize Cas9 fragments which retain their functionality and which are fragments of any herein disclosed Cas9 protein. In some embodiments, the Cas9 fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or at least 1300 amino acids in length.
[0138] In various embodiments, the base editors disclosed herein may comprise one of the Cas9 variants described as follows, or a Cas9 variant thereof that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference Cas9 variant.
F. Other Cas9 equivalents
[0139] In some embodiments, the base editors described herein can include any Cas9 equivalent. As used herein, the term “Cas9 equivalent” is a broad term that encompasses any napDNAbp protein that serves the same function as Cas9 in the present base editors despite that its amino acid primary sequence and/or its three-dimensional structure may be different and/or unrelated from an evolutionary standpoint. Thus, while Cas9 equivalents include any Cas9 ortholog, homolog, mutant, or variant described or embraced herein that are evolutionarily related, the Cas9 equivalents also embrace proteins that may have evolved through convergent evolution processes to have the same or similar function as Cas9, but which do not necessarily have any similarity with regard to amino acid sequence and/or three dimensional structure. The base editors described here embrace any Cas9 equivalent that would provide the same or similar function as Cas9 despite that the Cas9 equivalent may be based on a protein that arose through convergent evolution.
[0140] For example, CasX is a Cas9 equivalent that reportedly has the same function as Cas9 but which evolved through convergent evolution. Thus, the CasX protein described in Liu et al., “CasX enzymes comprise a distinct family of RNA-guided genome editors,” Nature, 2019, Vol. 566: 218-223, is contemplated to be used with the base editors described herein. In addition, any variant or modification of CasX is conceivable and within the scope of the present disclosure.
[0141] Cas9 is a bacterial enzyme that evolved in a wide variety of species. However, the Cas9 equivalents contemplated herein may also be obtained from archaea, which constitute a domain and kingdom of single-celled prokaryotic microbes different from bacteria.
[0142] In some embodiments, Cas9 equivalents may refer to CasX or CasY, which have been described in, for example, Burstein et al., “New CRISPR-Cas systems from uncultivated microbes.” Cell Res. 2017 Feb 21. doi: 10.1038/cr.2017.21, the entire contents of which is hereby incorporated by reference. Using genome-resolved metagenomics, a number of CRISPR-Cas systems were identified, including the first reported Cas9 in the archaeal domain of life. This divergent Cas9 protein was found in little-studied nanoarchaea as part of an active CRISPR-Cas system. In bacteria, two previously unknown systems were discovered, CRISPR-CasX and CRISPR-CasY, which are among the most compact systems yet discovered. In some embodiments, Cas9 refers to CasX, or a variant of CasX. In some embodiments, Cas9 refers to a CasY, or a variant of CasY. It should be appreciated that other RNA-guided DNA binding proteins may be used as a nucleic acid programmable DNA binding protein (napDNAbp), and are within the scope of this disclosure. Also see Liu et al., “CasX enzymes comprise a distinct family of RNA-guided genome editors,” Nature, 2019, Vol. 566: 218-223. Any of these Cas9 equivalents are contemplated.
[0143] In some embodiments, the Cas9 equivalent comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring CasX or CasY protein. In some embodiments, the napDNAbp is a naturally-occurring CasX or CasY protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a wild-type Cas moiety or any Cas moiety provided herein.
[0144] In various embodiments, the nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., dCas9 and nCas9), CasX, CasY, Cpf1, C2c1, C2c2, C2c3, Argonaute, Cas12a, and Cas12b. One example of a nucleic acid programmable DNA-binding protein that has different PAM specificity than Cas9 is Clustered Regularly Interspaced Short Palindromic Repeats from Prevotella and Francisella 1 (Cpf1). Similar to Cas9, Cpf1 is also a class 2 CRISPR effector. It has been shown that Cpf1 mediates robust DNA interference with features distinct from Cas9. Cpf1 is a single RNA-guided endonuclease lacking tracrRNA, and it utilizes a T-rich protospacer-adjacent motif (TTN, TTTN, or YTN). Moreover, Cpf1 cleaves DNA via a staggered DNA double-stranded break. Out of 16 Cpf1-family proteins, two enzymes from Acidaminococcus and Lachnospiraceae are shown to have efficient genome-editing activity in human cells. Cpf1 proteins are known in the art and have been described previously, for example, in Yamano et al., “Crystal structure of Cpf1 in complex with guide RNA and target DNA.” Cell (165) 2016, p. 949-962; the entire contents of which is hereby incorporated by reference. The state of the art may also now refer to Cpf1 enzymes as Cas12a.
[0145] In still other embodiments, the Cas protein may include any CRISPR associated protein, including but not limited to Cas12a, Cas12b, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (sometimes referred to as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified versions thereof, and preferably comprising a nickase mutation (e.g., a mutation corresponding to the D10A mutation of the wild type SpCas9 polypeptide of SEQ ID NO: 9).
[0146] In various other embodiments, the napDNAbp can be any of the following proteins: a Cas9, a Cpf1, a CasX, a CasY, a C2c1, a C2c2, a C2c3, a GeoCas9, a CjCas9, a Cas12a, a Cas12b, a Cas12g, a Cas12h, a Cas12i, a Cas13b, a Cas13c, a Cas13d, a Cas14, a Csn2, an xCas9, an SpCas9-NG, a circularly permuted Cas9, an Argonaute (Ago), a Cas9-KKH, a SmacCas9, a Spy-macCas9, an SpCas9-VRQR, an SpCas9-NRRH, an SpCas9-NRTH, an SpCas9-NRCH, or a variant thereof.
[0147] In certain embodiments, the base editors contemplated herein can include a Cas9 protein that is of smaller molecular weight than the canonical SpCas9 sequence. In some embodiments, the smaller-sized Cas9 variants may facilitate delivery to cells, e.g., by an expression vector, nanoparticle, or other means of delivery. The canonical SpCas9 protein is 1368 amino acids in length and has a predicted molecular weight of 158 kilodaltons. The term “small-sized Cas9 variant”, as used herein, refers to any Cas9 variant (naturally occurring, engineered, or otherwise) that is less than 1300 amino acids, or less than 1290 amino acids, or less than 1280 amino acids, or less than 1270 amino acids, or less than 1260 amino acids, or less than 1250 amino acids, or less than 1240 amino acids, or less than 1230 amino acids, or less than 1220 amino acids, or less than 1210 amino acids, or less than 1200 amino acids, or less than 1190 amino acids, or less than 1180 amino acids, or less than 1170 amino acids, or less than 1160 amino acids, or less than 1150 amino acids, or less than 1140 amino acids, or less than 1130 amino acids, or less than 1120 amino acids, or less than 1110 amino acids, or less than 1100 amino acids, or less than 1050 amino acids, or less than 1000 amino acids, or less than 950 amino acids, or less than 900 amino acids, or less than 850 amino acids, or less than 800 amino acids, or less than 750 amino acids, or less than 700 amino acids, or less than 650 amino acids, or less than 600 amino acids, or less than 550 amino acids, or less than 500 amino acids, but larger than about 400 amino acids, and that retains the required functions of the Cas9 protein.
[0148] In various embodiments, the base editors disclosed herein may comprise one of the small-sized Cas9 variants described as follows, or a Cas9 variant thereof that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference small-sized Cas9 protein. Exemplary small-sized Cas9 variants include, but are not limited to, SaCas9 and LbCas12a.
[0149] In some embodiments, the base editors described herein may also comprise Cas12a/Cpf1 (dCpf1) variants that may be used as a guide nucleotide sequence-programmable DNA-binding protein domain. The Cas12a/Cpf1 protein has a RuvC-like endonuclease domain that is similar to the RuvC domain of Cas9 but does not have an HNH endonuclease domain, and the N-terminus of Cpf1 does not have the alpha-helical recognition lobe of Cas9. It was shown in Zetsche et al., Cell, 163, 759-771, 2015 (which is incorporated herein by reference) that the RuvC-like domain of Cpf1 is responsible for cleaving both DNA strands and that inactivation of the RuvC-like domain inactivates Cpf1 nuclease activity.
Figure imgf000076_0001
Figure imgf000077_0001
Figure imgf000078_0001
[0150] Additional exemplary Cas9 equivalent protein sequences can include the following:
Figure imgf000078_0002
Figure imgf000079_0001
Figure imgf000080_0001
Figure imgf000081_0001
Figure imgf000082_0001
Figure imgf000083_0001
G. napDNAbps that recognize non-canonical PAM sequences
[0151] In some embodiments, the napDNAbp is a nucleic acid programmable DNA binding protein that does not require a canonical (NGG) PAM sequence. In some embodiments, the napDNAbp is an Argonaute protein. One example of such a nucleic acid programmable DNA binding protein is an Argonaute protein from Natronobacterium gregoryi (NgAgo). NgAgo is a ssDNA-guided endonuclease. NgAgo binds 5'-phosphorylated ssDNA of ~24 nucleotides (gDNA) to guide it to its target site and will make DNA double-strand breaks at the gDNA site. In contrast to Cas9, the NgAgo-gDNA system does not require a protospacer-adjacent motif (PAM). Using a nuclease inactive NgAgo (dNgAgo) can greatly expand the bases that may be targeted. The characterization and use of NgAgo have been described in Gao et al., Nat Biotechnol., 2016 Jul;34(7):768-73. PubMed PMID: 27136078; Swarts et al., Nature. 507(7491) (2014):258-61; and Swarts et al., Nucleic Acids Res. 43(10) (2015):5120-9, each of which is incorporated herein by reference.
[0152] In some embodiments, the disclosure provides napDNAbp domains that comprise SpCas9 variants that recognize and work best with NRRH, NRCH, and NRTH PAMs. See PCT Application No. PCT/US2019/47996, incorporated by reference herein. In some embodiments, the disclosed base editors comprise a napDNAbp domain selected from SpCas9-NRRH, SpCas9-NRTH, and SpCas9-NRCH.
[0153] In some embodiments, the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SpCas9-NRRH. In some embodiments, the disclosed base editors comprise a napDNAbp domain that comprises SpCas9-NRRH. The SpCas9-NRRH has an amino acid sequence as presented in SEQ ID NO: 141 (underlined residues are mutated relative to SpCas9, as set forth in SEQ ID NO: 9):
MDKKY S IGLDIGTNS VGWAVITDEYKVPS KKFKVLGNTDRHS IKKNLIGALLFDS GET
AEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERH
PIF GNIVDE VAYHEKYPTIYHLRKKLVDS TD KADLRLIYLALAHMIKFRGHFLIEGDLN
PDNS D VDKLFIQLV QT YN QLFEENPIN AS G VD AKAILS ARLS KS RRLENLI AQLPGEK
KNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLF
LAAKNLSDAILLSDILRVNTEITKAPLSASMVKRYDEHHQDLTLLKALVRQQLPEKYK
EIFFDQS KN G YAGYIDGGAS QEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFD
N GIIPHQIHLGELH AILRRQGDFYPFLKDNREKIEKILTFRIP YYV GPLARGNSRFAWM
TRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE
LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEI
SGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTY
AHLFDDKVMKQLKRLRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQ
LIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGG
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKL
YLY YLQN GRDM Y VD QELDINRLS D YD VDHI VPQS FLKDDS IDNKVLTRS DKNRGKS
DN VPS EE V VKKMKN YWRQLLN AKLIT QRKFDNLTK AERGGLS ELDKAGFIKRQLVE
TRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HH AHD AYLN AV V GTALIKKYPKLES EF V Y GD YKV YD VRKMIAKS EQEIGKATAKYF
FY S NIMNFFKTEITLAN GEIRKRPLIETN GET GEIVWDKGRDFAT VRKVLS MPQ VNIV
KKTEVQTGGFSKESILPKGNSDKLIARKKDWDPKKYGGFNSPTAAYSVLVVAKVEKG
KSKKLKSVKELLGITIMERSSFEKNPIGFLEAKGYKEVKKDLIIKLPKYSLFELENGRK
RMLAS AG VLHKGNELALPS KY VNFLYLAS H YEKLKGS PEDNEQKQLF VEQHKH YLD EIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGVPAAFKYF
DTTIDKKRYT S TKE VLD ATLIHQS IT GLYETRIDLS QLGGD (SEQ ID NO: 141).
[0154] In some embodiments, the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SpCas9-NRCH. In some embodiments, the disclosed base editors comprise a napDNAbp domain that comprises SpCas9-NRCH. The SpCas9-NRCH has an amino acid sequence as presented in SEQ ID NO: 142 (underlined residues are mutated relative to SpCas9):
MDKKY S IGLDIGTNS VGWAVITDEYKVPS KKFKVLGNTDRHS IKKNLIGALLFDS GET
AEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERH
PIF GNIVDE VAYHEKYPTIYHLRKKLVDS TD KADLRLIYLALAHMIKFRGHFLIEGDLN
PDNS D VDKLFIQLV QT YN QLFEENPIN AS G VD AKAILS ARLS KS RRLENLI AQLPGEK
KNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLF
LAAKNLSDAILLSDILRVNTEITKAPLSASMVKRYDEHHQDLTLLKALVRQQLPEKYK
EIFFDQS KN G YAGYIDGGAS QEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFD
N GIIPHQIHLGELH AILRRQGDFYPFLKDNREKIEKILTFRIP YYV GPLARGNSRFAWM
TRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE
LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEI
SGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTY
AHLFDDKVMKQLKRLRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQ
LIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGG
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKL
YLY YLQN GRDM Y VD QELDINRLS D YD VDHI VPQS FLKDDS IDNKVLTRS DKNRGKS
DN VPS EE V VKKMKN YWRQLLN AKLIT QRKFDNLTK AERGGLS ELDKAGFIKRQLVE
TRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HH AHD AYLN AV V GTALIKKYPKLES EF V Y GD YKV YD VRKMIAKS EQEIGKATAKYF
FY S NIMNFFKTEITLAN GEIRKRPLIETN GET GEIVWDKGRDFAT VRKVLS MPQ VNIV
KKTEVQTGGFSKESILPKGNSDKLIARKKDWDPKKY GGFNSPTVAY S VLVVAKVEKG
KSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRK
RMLASAGVLQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLD
EIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYF
DTTINRKQYNTTKE VLD ATLIRQS ITGLYETRIDLS QLGGD (SEQ ID NO: 142)
[0155] In some embodiments, the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SpCas9-NRTH. In some embodiments, the disclosed base editors comprise a napDNAbp domain that comprises SpCas9-NRTH. The SpCas9-NRTH has an amino acid sequence as presented in SEQ ID NO: 143 (underlined residues are mutated relative to SpCas9):
MDKKY S IGLDIGTNS VGWAVITDEYKVPS KKFKVLGNTDRHS IKKNLIGALLFDS GET
AEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERH
PIF GNIVDE VAYHEKYPTIYHLRKKLVDS TD KADLRLIYLALAHMIKFRGHFLIEGDLN
PDNS D VDKLFIQLV QT YN QLFEENPIN AS G VD AKAILS ARLS KS RRLENLI AQLPGEK
KNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLF
LAAKNLSDAILLSDILRVNTEITKAPLSASMVKRYDEHHQDLTLLKALVRQQLPEKYK
EIFFDQS KN G YAGYIDGGAS QEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFD
N GIIPHQIHLGELH AILRRQGDFYPFLKDNREKIEKILTFRIP YYV GPLARGNSRFAWM
TRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE
LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEI
SGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTY
AHLFDDKVMKQLKRLRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQ
LIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGG
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKL
YLY YLQN GRDM Y VD QELDINRLS D YD VDHI VPQS FLKDDS IDNKVLTRS DKNRGKS
DN VPS EE V VKKMKN YWRQLLN AKLIT QRKFDNLTK AERGGLS ELDKAGFIKRQLVE
TRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HH AHD AYLN AV V GTALIKKYPKLES EF V Y GD YKV YD VRKMIAKS EQEIGKATAKYF
FY S NIMNFFKTEITLAN GEIRKRPLIETN GET GEIVWDKGRDFAT VRKVLS MPQ VNIV
KKTEVQTGGFSKESILPKGNSDKLIARKKDWDPKKY GGFNSPTVAY S VLVVAKVEKG
KSKKLKSVKELLGITIMERSSFEKNPIGFLEAKGYKEVKKDLIIKLPKYSLFELENGRK
RMLASASVLHKGNELALPSKYVNFLYLASHYEKLKGSSEDNKQKQLFVEQHKHYLD
EIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGASAAFKYF
DTTIGRKLYT S TKE VLD ATLIHQS IT GLYETRIDLS QLGGD (SEQ ID NO: 143)
[0156] In other embodiments, the napDNAbp of any of the disclosed base editors comprises a Cas9 derived from a Streptococcus macacae, e.g., Streptococcus macacae NCTC 11558, or SmacCas9, or a variant thereof. In some embodiments, the napDNAbp comprises a hybrid variant of SmacCas9 that incorporates an SpCas9 domain with the SmacCas9 domain and is known as Spy-macCas9, or a variant thereof. In some embodiments, the napDNAbp comprises a hybrid variant of SmacCas9 that incorporates an increased nucleolytic variant of an SpCas9 (iSpy Cas9) domain and is known as iSpy-macCas9. Relative to Spymac-Cas9, iSpyMac-Cas9 contains two mutations, R221K and N394K, that were identified by deep mutational scans of Spy Cas9 and that raise modification rates of the protein on most targets. See Jakimo et al., bioRxiv, A Cas9 with Complete PAM Recognition for Adenine Dinucleotides (Sep 2018), herein incorporated by reference. Jakimo et al. showed that the hybrids Spy-macCas9 and iSpy-macCas9 recognize a short 5'-NAA-3' PAM, recognize all evaluated adenine dinucleotide PAM sequences, and possess robust editing efficiency in human cells. Liu et al. engineered base editors containing Spy-macCas9 and demonstrated that cytidine and adenine base editors containing Spymac domains can induce efficient C-to-T and A-to-G conversions in vivo. In addition, Liu et al. suggested that the PAM scope of Spy-macCas9 may be 5'-TAAA-3', rather than 5'-NAA-3' as reported by Jakimo et al. See Liu et al. Cell Discovery (2019) 5:58, herein incorporated by reference.
[0157] In some embodiments, the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to iSpyMac-Cas9. In some embodiments, the disclosed base editors comprise a napDNAbp domain that comprises iSpyMac-Cas9. The iSpyMac-Cas9 has an amino acid sequence as presented in SEQ ID NO: 144 (R221K and N394K mutations are underlined):
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP
DNS D VDKLFIQLV QT YN QLFEENPIN AS G VD AKAILS ARLS KS RKLENLIAQLPGEKK
NGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFL
AAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEI
FFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLKREDLLRKQRTFDNG
SIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTR
KS EETITPWNFEE V VD KG AS AQS FIERMTNFDKNLPNEKVLPKHS LLYE YFT V YNELT
KVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISG
VEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAH
LFDDKVMKQLKRRRYTGW GRLSRKLIN GIRDKQS GKTILDFLKS DGFANRNFMQLIH
DDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHK
PENIVIEMAREN QTTQKGQKNSRERMKRIEEGIKELGS QILKEHPVENTQLQNEKLYL
YYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDN
VPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETR QITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHH
AHDAYFNAVVGTAFIKKYPKFESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYS
NIMNFFKTEITFANGEIRKRPFIETNGETGEIVWDKGRDFATVRKVFSMPQVNIVKKT
EIQT V GQN GGFFDDNPKS PEE VTPS KFVPFKKEFNPKKY GG Y QKPTTAYP VFFITDTK
QLIPIS VMNKKQFEQNPVKFLRDRGY QQVGKNDFIKLPKYTLVDIGDGIKRLWAS S KE
IHKGN QLV VS KKS QILLYH AHHLDS DLS ND YLQNHN QQFD VLFNEIIS F S KKCKLGKE
HIQKIENVYSNKKNSASffiELAESFIKLLGFTQLGATSPFNFLGVKLNQKQYKGKKDYI
LPCTEGTLIRQS ITGLYETRVDLS KIGED (SEQ ID NO: 144)
[0158] In other embodiments, the napDNAbp of any of the disclosed base editors is a prokaryotic homolog of an Argonaute protein. Prokaryotic homologs of Argonaute proteins are known and have been described, for example, in Makarova K., et al., “Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements”, Biol Direct. 2009 Aug 25;4:29. doi: 10.1186/1745-6150-4-29, the entire contents of which is hereby incorporated by reference. In some embodiments, the napDNAbp is a Marinitoga piezophila Argonaute (MpAgo) protein. The CRISPR-associated Marinitoga piezophila Argonaute (MpAgo) protein cleaves single-stranded target sequences using 5'-hydroxylated guides rather than the 5'-phosphorylated guides used by other known Argonautes. The crystal structure of an MpAgo-RNA complex shows a guide strand binding site comprising residues that block 5' phosphate interactions. These data suggest the evolution of an Argonaute subclass with noncanonical specificity for a 5'-hydroxylated guide. See, e.g., Kaya et al., “A bacterial Argonaute with noncanonical guide RNA specificity”, Proc Natl Acad Sci U S A. 2016 Apr 12;113(15):4057-62, the entire contents of which are hereby incorporated by reference. It should be appreciated that other Argonaute proteins may be used, and are within the scope of this disclosure.
[0159] In some embodiments, the napDNAbp is a single effector of a microbial CRISPR-Cas system. Single effectors of microbial CRISPR-Cas systems include, without limitation, Cas9, Cpf1, C2c1, C2c2, and C2c3. Typically, microbial CRISPR-Cas systems are divided into Class 1 and Class 2 systems. Class 1 systems have multisubunit effector complexes, while Class 2 systems have a single protein effector. For example, Cas9 and Cpf1 are Class 2 effectors. In addition to Cas9 and Cpf1, three distinct Class 2 CRISPR-Cas systems (C2c1, C2c2, and C2c3) have been described by Shmakov et al., “Discovery and Functional Characterization of Diverse Class 2 CRISPR Cas Systems”, Mol. Cell, 2015 Nov 5; 60(3): 385-397, the entire contents of which is hereby incorporated by reference. Effectors of two of the systems, C2c1 and C2c3, contain RuvC-like endonuclease domains related to Cpf1. A third system, C2c2, contains an effector with two predicted HEPN RNase domains. Production of mature CRISPR RNA in this system is tracrRNA-independent, unlike production of CRISPR RNA by C2c1. C2c1 depends on both CRISPR RNA and tracrRNA for DNA cleavage. Bacterial C2c2 has been shown to possess a unique RNase activity for CRISPR RNA maturation distinct from its RNA-activated single-stranded RNA degradation activity. These RNase functions are different from each other and from the CRISPR RNA-processing behavior of Cpf1. See, e.g., East-Seletsky, et al., “Two distinct RNase activities of CRISPR-C2c2 enable guide-RNA processing and RNA detection”, Nature, 2016 Oct 13;538(7624):270-273, the entire contents of which are hereby incorporated by reference. In vitro biochemical analysis of C2c2 in Leptotrichia shahii has shown that C2c2 is guided by a single CRISPR RNA and can be programmed to cleave ssRNA targets carrying complementary protospacers. Catalytic residues in the two conserved HEPN domains mediate cleavage. Mutations in the catalytic residues generate catalytically inactive RNA-binding proteins. See, e.g., Abudayyeh et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector”, Science, 2016 Aug 5; 353(6299), the entire contents of which are hereby incorporated by reference.
[0160] The crystal structure of Alicyclobacillus acidoterrestris C2c1 (AacC2c1) has been reported in complex with a chimeric single-molecule guide RNA (sgRNA). See, e.g., Liu et al., “C2c1-sgRNA Complex Structure Reveals RNA-Guided DNA Cleavage Mechanism”, Mol. Cell, 2017 Jan 19;65(2):310-322, the entire contents of which are hereby incorporated by reference. The crystal structure has also been reported in Alicyclobacillus acidoterrestris C2c1 bound to target DNAs as ternary complexes. See, e.g., Yang et al., “PAM-dependent Target DNA Recognition and Cleavage by C2C1 CRISPR-Cas endonuclease”, Cell, 2016 Dec 15; 167(7): 1814-1828, the entire contents of which are hereby incorporated by reference. Catalytically competent conformations of AacC2c1, both with target and non-target DNA strands, have been captured independently positioned within a single RuvC catalytic pocket, with C2c1-mediated cleavage resulting in a staggered seven-nucleotide break of target DNA. Structural comparisons between C2c1 ternary complexes and previously identified Cas9 and Cpf1 counterparts demonstrate the diversity of mechanisms used by CRISPR-Cas9 systems.
[0161] In some embodiments, the napDNAbp may be a C2c1, a C2c2, or a C2c3 protein. In some embodiments, the napDNAbp is a C2c1 protein. In some embodiments, the napDNAbp is a C2c2 protein. In some embodiments, the napDNAbp is a C2c3 protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring C2c1, C2c2, or C2c3 protein. In some embodiments, the napDNAbp is a naturally-occurring C2c1, C2c2, or C2c3 protein.
[0162] Some aspects of the disclosure provide Cas9 domains that have different PAM specificities. Typically, Cas9 proteins, such as Cas9 from S. pyogenes (SpCas9), require a canonical NGG PAM sequence to bind a particular nucleic acid region. This may limit the ability to edit desired bases within a genome. In some embodiments, the base editors provided herein may need to be placed at a precise location, for example where a target base is placed within a 4 base region (e.g., an “editing window” or a “target window”), which is approximately 15 bases upstream of the PAM. See Komor, A.C., et al., “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage” Nature 533, 420-424 (2016), the entire contents of which are hereby incorporated by reference. Accordingly, in some embodiments, any of the base editors provided herein may contain a Cas9 domain that is capable of binding a nucleotide sequence that does not contain a canonical (e.g., NGG) PAM sequence. Cas9 domains that bind to non-canonical PAM sequences have been described in the art and would be apparent to the skilled artisan. For example, Cas9 domains that bind non-canonical PAM sequences have been described in Kleinstiver, B. P., et al., “Engineered CRISPR-Cas9 nucleases with altered PAM specificities” Nature 523, 481-485 (2015); and Kleinstiver, B. P., et al., “Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition” Nature Biotechnology 33, 1293-1298 (2015); the entire contents of each are hereby incorporated by reference.
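The positional constraint described above (a roughly 4-base editing window located approximately 15 bases upstream of the PAM) can be sketched as simple index arithmetic. The text gives only an approximate offset, so the indexing convention below is an assumption made for illustration: positions are 0-based, the PAM is identified by the index of its first base, and the window is centered `offset` bases upstream of that index. The function names and default values are hypothetical.

```python
def editing_window(pam_start: int, offset: int = 15, width: int = 4) -> tuple:
    """Return the half-open interval [start, end) of the editing window.

    Assumptions (not specified by the text): 0-based indexing, and a window
    of `width` bases centered `offset` bases upstream of the first PAM base.
    """
    start = pam_start - offset - width // 2
    if start < 0:
        raise ValueError("window would start before the beginning of the sequence")
    return start, start + width


def target_in_window(target_index: int, pam_start: int,
                     offset: int = 15, width: int = 4) -> bool:
    """True if the target base index falls inside the editing window."""
    start, end = editing_window(pam_start, offset, width)
    return start <= target_index < end


# With a PAM beginning at index 20 (i.e., directly after a 20-base protospacer
# occupying indices 0-19), the assumed window covers indices 3 through 6.
window = editing_window(20)
```

A sketch like this shows why PAM availability constrains which bases can be edited: moving the PAM moves the window with it, so a target base with no suitably placed PAM cannot be brought into the window.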
[0163] For example, the napDNAbp domain may have altered PAM specificity, such as a domain with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Francisella novicida Cpf1 (SEQ ID NO: 145) (D917, E1006, and D1255), which has the following amino acid sequence:
Figure imgf000090_0001
Figure imgf000091_0001
[0164] An additional napDNAbp domain with altered PAM specificity may be a domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Geobacillus thermodenitrificans Cas9 (SEQ ID NO: 146), which has the following amino acid sequence:
Figure imgf000091_0002
[0165] In some embodiments, the nucleic acid programmable DNA binding protein (napDNAbp) is a nucleic acid programmable DNA binding protein that does not require a canonical (NGG) PAM sequence. In some embodiments, the napDNAbp is an Argonaute protein. One example of such a nucleic acid programmable DNA binding protein is an Argonaute protein from Natronobacterium gregoryi (NgAgo). NgAgo is a ssDNA-guided endonuclease. NgAgo binds 5'-phosphorylated ssDNA of ~24 nucleotides (gDNA) to guide it to its target site and will make DNA double-strand breaks at the gDNA site. In contrast to Cas9, the NgAgo-gDNA system does not require a protospacer-adjacent motif (PAM). Using a nuclease inactive NgAgo (dNgAgo) can greatly expand the bases that may be targeted. The characterization and use of NgAgo have been described in Gao et al., Nat Biotechnol., 34(7): 768-73 (2016), PubMed PMID: 27136078; Swarts et al., Nature, 507(7491): 258-61 (2014); and Swarts et al., Nucleic Acids Res. 43(10) (2015): 5120-9, each of which is incorporated herein by reference. The sequence of Natronobacterium gregoryi Argonaute is provided in SEQ ID NO: 147.
[0166] The disclosed base editors may comprise a napDNAbp domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Natronobacterium gregoryi Argonaute (SEQ ID NO: 147), which has the following amino acid sequence:
Figure imgf000092_0001
H. Cas9 circular permutants
[0167] In various embodiments, the base editors disclosed herein may comprise a circular permutant of Cas9.
[0168] The term “circularly permuted Cas9” (or “circular permutant” of Cas9, or “CP-Cas9”) refers to any Cas9 protein, or variant thereof, that occurs as, or has been modified or engineered as, a circular permutant variant, which means the N-terminus and the C-terminus of a Cas9 protein (e.g., a wild type Cas9 protein) have been topologically rearranged. Such circularly permuted Cas9 proteins, or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See Oakes et al., “Protein Engineering of Cas9 for enhanced function,” Methods Enzymol, 2014, 546: 491-511; Oakes et al., “CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification,” Cell, January 10, 2019, 176: 254-267; and Huang, T.P. et al., “Circularly permuted and PAM-modified Cas9 variants broaden the targeting scope of base editors,” Nat. Biotechnol. 37, 626-631 (2019), each of which is incorporated herein by reference. Reference is also made to International Application No. PCT/US2019/47996, filed August 23, 2019, herein incorporated by reference. The instant disclosure contemplates the use of any previously known CP-Cas9 or a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA).
[0169] Any of the Cas9 proteins described herein, including any variant, ortholog, or naturally occurring Cas9 or equivalent thereof, may be reconfigured as a circular permutant variant.
[0170] In various embodiments, the circular permutants of Cas9 may have the following structure:
[0171] N-terminus-[original C-terminus]-[optional linker]-[original N-terminus]-C-terminus.
[0172] As an example, the present disclosure contemplates the following circular permutants of canonical S. pyogenes Cas9 (1368 amino acids of UniProtKB - Q99ZW2 (CAS9_STRP1) (numbering is based on the amino acid position in SEQ ID NO: 9)):
[0173] N-terminus-[1268-1368]-[optional linker]-[1-1267]-C-terminus;
[0174] N-terminus-[1168-1368]-[optional linker]-[1-1167]-C-terminus;
[0175] N-terminus-[1068-1368]-[optional linker]-[1-1067]-C-terminus;
[0176] N-terminus-[968-1368]-[optional linker]-[1-967]-C-terminus;
[0177] N-terminus-[868-1368]-[optional linker]-[1-867]-C-terminus;
[0178] N-terminus-[768-1368]-[optional linker]-[1-767]-C-terminus;
[0179] N-terminus-[668-1368]-[optional linker]-[1-667]-C-terminus;
[0180] N-terminus-[568-1368]-[optional linker]-[1-567]-C-terminus;
[0181] N-terminus-[468-1368]-[optional linker]-[1-467]-C-terminus;
[0182] N-terminus-[368-1368]-[optional linker]-[1-367]-C-terminus;
[0183] N-terminus-[268-1368]-[optional linker]-[1-267]-C-terminus;
[0184] N-terminus-[168-1368]-[optional linker]-[1-167]-C-terminus;
[0185] N-terminus-[68-1368]-[optional linker]-[1-67]-C-terminus; or
[0186] N-terminus-[10-1368]-[optional linker]-[1-9]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).
[0187] In particular embodiments, the circular permutant Cas9 has the following structure (based on S. pyogenes Cas9 (1368 amino acids of UniProtKB - Q99ZW2 (CAS9_STRP1)) (numbering is based on the amino acid position in SEQ ID NO: 9)):
N-terminus-[102-1368]-[optional linker]-[1-101]-C-terminus;
N-terminus-[1028-1368]-[optional linker]-[1-1027]-C-terminus;
N-terminus-[1041-1368]-[optional linker]-[1-1040]-C-terminus;
N-terminus-[1249-1368]-[optional linker]-[1-1248]-C-terminus; or
N-terminus-[1300-1368]-[optional linker]-[1-1299]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).
[0188] In still other embodiments, the circular permutant Cas9 has the following structure (based on S. pyogenes Cas9 (1368 amino acids of UniProtKB - Q99ZW2 (CAS9_STRP1)) (numbering is based on the amino acid position in SEQ ID NO: 9)):
N-terminus-[103-1368]-[optional linker]-[1-102]-C-terminus;
N-terminus-[1029-1368]-[optional linker]-[1-1028]-C-terminus;
N-terminus-[1042-1368]-[optional linker]-[1-1041]-C-terminus;
N-terminus-[1250-1368]-[optional linker]-[1-1249]-C-terminus; or
N-terminus-[1301-1368]-[optional linker]-[1-1300]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).
[0189] In some embodiments, the circular permutant can be formed by linking a C-terminal fragment of a Cas9 to an N-terminal fragment of a Cas9, either directly or by using a linker, such as an amino acid linker. In some embodiments, the C-terminal fragment may correspond to the C-terminal 95% or more of the amino acids of a Cas9 (e.g., amino acids about 1300-1368), or the C-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% or more of a Cas9. The N-terminal portion may correspond to the N-terminal 95% or more of the amino acids of a Cas9 (e.g., amino acids about 1-1300), or the N-terminal 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% or more of a Cas9 (e.g., of SEQ ID NO: 9).
[0190] In some embodiments, the circular permutant can be formed by linking a C-terminal fragment of a Cas9 to an N-terminal fragment of a Cas9, either directly or by using a linker, such as an amino acid linker. In some embodiments, the C-terminal fragment that is rearranged to the N-terminus, includes or corresponds to the C-terminal 30% or less of the amino acids of a Cas9 (e.g., amino acids 1012-1368 of SEQ ID NO: 9). In some
embodiments, the C-terminal fragment that is rearranged to the N-terminus, includes or corresponds to the C-terminal 30%, 29%, 28%, 27%, 26%, 25%, 24%, 23%, 22%, 21%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%,
2%, or 1% of the amino acids of a Cas9 (e.g., the Cas9 of SEQ ID NO: 9). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus, includes or corresponds to the C-terminal 410 residues or less of a Cas9 (e.g., the Cas9 of SEQ ID NO:
9). In some embodiments, the C-terminal portion that is rearranged to the N-terminus, includes or corresponds to the C-terminal 410, 400, 390, 380, 370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 190, 180, 170, 160, 150, 140,
130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 9). In some embodiments, the C-terminal portion that is rearranged to the N- terminus, includes or corresponds to the C-terminal 357, 341, 328, 120, or 69 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 9). [0191] In other embodiments, circular permutant Cas9 variants may be defined as a topological rearrangement of a Cas9 primary structure based on the following method, which is based on S. pyogenes Cas9 of SEQ ID NO: 9: (a) selecting a circular permutant (CP) site corresponding to an internal amino acid residue of the Cas9 primary structure, which dissects the original protein into two halves: an N-terminal region and a C-terminal region; (b) modifying the Cas9 protein sequence (e.g., by genetic engineering techniques) by moving the original C-terminal region (comprising the CP site amino acid) to preceed the original N- terminal region, thereby forming a new N-terminus of the Cas9 protein that now begins with the CP site amino acid residue. The CP site can be located in any domain of the Cas9 protein, including, for example, the helical-II domain, the RuvCIII domain, or the CTD domain. For example, the CP site may be located (relative the S. pyogenes Cas9 of SEQ ID NO: 9) at original amino acid residue 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282. Thus, once relocated to the N-terminus, original amino acid 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282 would become the new N- terminal amino acid. Nomenclature of these CP-Cas9 proteins may be referred to as Cas9- CP181, Cas9-CP199, Cas9-CP230, Cas9-CP270, Cas9-CP310, Cas9-CP1010, Cas9-CP1016, Cas9- CP1023, Cas9-CP1029, Cas9-CP1041, Cas9-CP1247, Cas9-CP1249, and Cas9-CP1282, respectively. 
This description is not meant to be limited to making CP variants from SEQ ID NO: 9, but may be implemented to make CP variants in any Cas9 sequence, either at CP sites that correspond to these positions, or at other CP sites entirely. This description is not meant to limit the specific CP sites in any way. Virtually any CP site may be used to form a CP-Cas9 variant.
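The two-step rearrangement described above, (a) choosing a CP site and (b) moving the original C-terminal region in front of the original N-terminal region, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `circular_permute`, the optional `linker` argument, and the toy ten-residue sequence are assumptions made for the example and are not taken from SEQ ID NO: 9 or from any sequence of this disclosure.

```python
def circular_permute(sequence: str, cp_site: int, linker: str = "") -> str:
    """Return a circular permutant of `sequence` (hypothetical helper).

    `cp_site` is the 1-based position of the residue that becomes the new
    N-terminal residue. The original C-terminal region (cp_site..end) is
    placed before the original N-terminal region (1..cp_site-1), optionally
    joined by a linker sequence.
    """
    if not 1 < cp_site <= len(sequence):
        raise ValueError("cp_site must be an internal position (2..len)")
    n_region = sequence[: cp_site - 1]  # original residues 1 .. cp_site-1
    c_region = sequence[cp_site - 1:]   # original residues cp_site .. end
    return c_region + linker + n_region


# Toy sequence for illustration only; not a real Cas9 sequence.
toy = "MKTAYIAKQR"
print(circular_permute(toy, cp_site=7, linker="GGS"))  # AKQRGGSMKTAYI
```

With a full-length Cas9 sequence, a call such as `circular_permute(cas9, 1010)` would yield the arrangement named Cas9-CP1010 above, without any linker or added methionine.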
[0192] Exemplary CP-Cas9 amino acid sequences, based on the Cas9 of SEQ ID NO: 9, are provided below in which linker sequences are indicated by underlining and optional methionine (M) residues are indicated in bold. It should be appreciated that the disclosure provides CP-Cas9 sequences that do not include a linker sequence or that include different linker sequences. It should be appreciated that CP-Cas9 sequences may be based on Cas9 sequences other than that of SEQ ID NO: 9 and any examples provided herein are not meant to be limiting. Exemplary CP-Cas9 sequences are as follows:
Figure imgf000095_0001
Figure imgf000096_0001
Figure imgf000097_0001
Figure imgf000098_0001
[0193] Cas9 circular permutants may be useful in the base editor constructs described herein. Exemplary C-terminal fragments of Cas9, based on the Cas9 of SEQ ID NO: 9, which may be rearranged to an N-terminus of Cas9, are provided below. It should be appreciated that such C-terminal fragments of Cas9 are exemplary and are not meant to be limiting. These exemplary CP-Cas9 fragments have the following sequences:
Figure imgf000099_0001
I. Cas9 variants with modified PAM specificities
[0194] The base editors of the present disclosure may also comprise Cas9 variants with modified PAM specificities. Some aspects of this disclosure provide Cas9 proteins that exhibit activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3', where N is A, C, G, or T) at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGG-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNG-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNC-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NNT-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGT-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NGC-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAA-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAC-3' PAM sequence at its 3'-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAT-3' PAM sequence at its 3'-end. In still other embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5'-NAG-3' PAM sequence at its 3'-end.
[0195] In some embodiments, the disclosed base editors comprise a napDNAbp domain comprising a SpCas9-NG, which has a PAM that corresponds to NGN. In some embodiments, the disclosed base editors comprise a napDNAbp domain comprising a SpCas9-KKH, which has a PAM that corresponds to NNNRRT (SEQ ID NO: 160).
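PAM patterns such as NGG, NGN, and NNNRRT are written with IUPAC degenerate nucleotide codes, in which N stands for any base and R for a purine (A or G). The following Python sketch shows one way such a pattern could be checked against a DNA string. The helper names `matches_pam` and `find_pam_sites` and the toy 12-nucleotide sequence are hypothetical, the lookup table covers only the codes used in the patterns above, and the search covers only the strand as written.

```python
# IUPAC codes needed for the PAM patterns discussed above (not the full set).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",  # any base
    "R": "AG",    # purine
}


def matches_pam(site, pattern):
    """True if the DNA string `site` satisfies the degenerate `pattern`."""
    if len(site) != len(pattern):
        return False
    return all(base in IUPAC[code]
               for base, code in zip(site.upper(), pattern.upper()))


def find_pam_sites(dna, pattern):
    """0-based start indices at which `pattern` matches `dna`."""
    k = len(pattern)
    return [i for i in range(len(dna) - k + 1)
            if matches_pam(dna[i:i + k], pattern)]


toy_dna = "TTAGGCATGAGT"  # toy sequence for illustration only
print(find_pam_sites(toy_dna, "NGG"))     # [2]
print(find_pam_sites(toy_dna, "NGN"))     # [2, 3, 7, 9]
print(find_pam_sites(toy_dna, "NNNRRT"))  # [6]
```

The relaxed NGN pattern matches every position matched by NGG plus additional positions, which illustrates why a variant with modified PAM specificity can address more target sites.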
[0196] It should be appreciated that any of the amino acid mutations described herein (e.g., A262T) from a first amino acid residue (e.g., A) to a second amino acid residue (e.g., T) may also include mutations from the first amino acid residue to an amino acid residue that is similar to (e.g., conserved) the second amino acid residue. For example, mutation of an amino acid with a hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan) may be a mutation to a second amino acid with a different hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan). For example, a mutation of an alanine to a threonine (e.g., an A262T mutation) may also be a mutation from an alanine to an amino acid that is similar in size and chemical properties to a threonine, for example, serine. As another example, mutation of an amino acid with a positively charged side chain (e.g., arginine, histidine, or lysine) may be a mutation to a second amino acid with a different positively charged side chain (e.g., arginine, histidine, or lysine). As another example, mutation of an amino acid with a polar side chain (e.g., serine, threonine, asparagine, or glutamine) may be a mutation to a second amino acid with a different polar side chain (e.g., serine, threonine, asparagine, or glutamine). Additional similar amino acid pairs include, but are not limited to, the following: phenylalanine and tyrosine; asparagine and glutamine; methionine and cysteine; aspartic acid and glutamic acid; and arginine and lysine. The skilled artisan would recognize that such conservative amino acid substitutions will likely have minor effects on protein structure and are likely to be well tolerated without compromising function. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a threonine may be an amino acid mutation to a serine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an arginine may be an amino acid mutation to a lysine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an isoleucine may be an amino acid mutation to an alanine, valine, methionine, or leucine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a lysine may be an amino acid mutation to an arginine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an aspartic acid may be an amino acid mutation to a glutamic acid or asparagine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a valine may be an amino acid mutation to an alanine, isoleucine, methionine, or leucine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a glycine may be an amino acid mutation to an alanine. It should be appreciated, however, that additional conserved amino acid residues would be recognized by the skilled artisan and any of the amino acid mutations to other conserved amino acid residues are also within the scope of this disclosure.
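The side-chain groupings and similar pairs listed above can be captured in a small lookup, sketched here in Python. The sketch encodes only the three side-chain groups and the five named pairs from the preceding paragraph; it does not encode every per-residue alternative given there (for example, aspartic acid to asparagine), and the names `SIMILARITY_GROUPS`, `SIMILAR_PAIRS`, and `conservative_alternatives` are hypothetical.

```python
# One-letter codes for the groups named in the text above.
SIMILARITY_GROUPS = {
    "hydrophobic": set("AVILMFYW"),
    "positively charged": set("RHK"),
    "polar": set("STNQ"),
}

# Additional similar pairs named in the text above.
SIMILAR_PAIRS = [{"F", "Y"}, {"N", "Q"}, {"M", "C"}, {"D", "E"}, {"R", "K"}]


def conservative_alternatives(residue):
    """Residues sharing a listed group or pair with `residue`."""
    alternatives = set()
    for members in SIMILARITY_GROUPS.values():
        if residue in members:
            alternatives |= members
    for pair in SIMILAR_PAIRS:
        if residue in pair:
            alternatives |= pair
    alternatives.discard(residue)  # a residue is not its own alternative
    return alternatives


# For an A262T mutation, residues listed as similar to threonine:
print(sorted(conservative_alternatives("T")))  # ['N', 'Q', 'S']
```

Under this simplified grouping, serine appears among the alternatives to threonine, consistent with the A262T example above.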
[0197] In some embodiments, the present disclosure may utilize any of the Cas9 variants disclosed in the SEQUENCES section herein.
[0198] In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5'-NAA-3' PAM sequence at its 3'-end. In some embodiments, the combination of mutations are present in any one of the clones listed in Table 1. In some embodiments, the combination of mutations are conservative mutations of the clones listed in Table 1. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 1.
Table 1: NAA PAM Clones
Figure imgf000101_0001
Figure imgf000102_0001
[0199] In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 1. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 1.
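A percent-identity threshold such as "at least 80% identical" presupposes an alignment and a counting convention. The Python sketch below assumes the two sequences have already been aligned to equal length (with `-` marking gaps) and counts identical, non-gap columns over the full alignment length. That convention is only one of several in use and is an assumption of this example, as are the function names and the toy sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Convention assumed here: identical non-gap columns divided by the
    total number of alignment columns.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, equal-length, non-empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy sequences differing at one of ten positions (illustration only).
reference = "MKTAYIAKQR"
variant = "MKTAYIAKQK"
print(percent_identity(reference, variant))                # 90.0
print(meets_identity_threshold(reference, variant, 95.0))  # False
```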
[0200] In some embodiments, the Cas9 protein exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5'-NGG-3') at its 3' end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 9. In some embodiments, the Cas9 protein exhibits an activity on a target sequence having a 3' end that is not directly adjacent to the canonical PAM sequence (5'-NGG-3') that is at least 5-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 9 on the same target sequence. In some embodiments, the Cas9 protein exhibits an activity on a target sequence that is not directly adjacent to the canonical PAM sequence (5'-NGG-3') that is at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 9 on the same target sequence. In some embodiments, the 3' end of the target sequence is directly adjacent to an AAA, GAA, CAA, or TAA sequence. In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5'-NAC-3' PAM sequence at its 3'-end. In some embodiments, the combination of mutations are present in any one of the clones listed in Table 2. In some embodiments, the combination of mutations are conservative mutations of the clones listed in Table 2. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 2.
Table 2: NAC PAM Clones
Figure imgf000103_0001
Figure imgf000104_0001
Figure imgf000105_0001
[0201] In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 2. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 2.
[0202] In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5'-NAT-3' PAM sequence at its 3'-end. In some embodiments, the combination of mutations are present in any one of the clones listed in Table 3. In some embodiments, the combination of mutations are conservative mutations of the clones listed in Table 3. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table 3.
Table 3: NAT PAM Clones
Figure imgf000105_0002
Figure imgf000106_0001
[0203] The above description of various napDNAbps which can be used in connection with the presently disclosed base editors is not meant to be limiting in any way. The base editors may comprise the canonical SpCas9, or any ortholog Cas9 protein, or any variant Cas9 protein (including any naturally occurring variant, mutant, or otherwise engineered version of Cas9) that is known or which can be made or evolved through a directed evolutionary or otherwise mutagenic process. In various embodiments, the Cas9 or Cas9 variants have a nickase activity, i.e., cleave only one strand of the target DNA sequence. In other embodiments, the Cas9 or Cas9 variants have inactive nucleases, i.e., are “dead” Cas9 proteins. Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having modified or rearranged primary amino acid structure (e.g., the circular permutant formats). The base editors described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins which are the result of convergent evolution. The napDNAbps used herein (e.g., SpCas9, Cas9 variant, or Cas9 equivalents) may also contain various modifications that alter/enhance their PAM specificities. Lastly, the application contemplates any Cas9, Cas9 variant, or Cas9 equivalent which has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference SpCas9 canonical sequence or a reference Cas9 equivalent (e.g., Cas12a/Cpf1).
[0204] In a particular embodiment, the Cas9 variant having expanded PAM capabilities is SpCas9 (H840A) VRQR, or SpCas9-VRQR. In some embodiments, the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SpCas9-VRQR. In some embodiments, the disclosed base editors comprise a napDNAbp domain that comprises SpCas9-VRQR. The SpCas9-VRQR comprises the following amino acid sequence (with the V, R, Q, R substitutions relative to the SpCas9 (H840A) of SEQ ID NO: 158 shown in bold underline. In addition, the methionine residue in SpCas9 (H840) was removed for SpCas9 (H840A) VRQR):
Figure imgf000107_0001
[0205] In another particular embodiment, the Cas9 variant having expanded PAM capabilities is SpCas9 (H840A) VRER, having the following amino acid sequence (with the V, R, E, R substitutions relative to the SpCas9 (H840A) of SEQ ID NO: 159 shown in bold underline. In addition, the methionine residue in SpCas9 (H840) was removed for SpCas9 (H840A) VRER):
Figure imgf000107_0002
Figure imgf000108_0001
[0206] In addition, any available methods may be utilized to obtain or construct a variant or mutant Cas9 protein. The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions, and this list is not meant to be limiting in any way. Mutations can include “loss-of-function” mutations, which are the normal result of a mutation that reduces or abolishes a protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein whose presence compensates for the effect of the mutation. Mutations also embrace “gain-of-function” mutations, which confer an abnormal activity on a protein or cell that is otherwise not present in a normal condition. Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, these tissues gaining functions that they normally lack. Because of their nature, gain-of-function mutations are usually dominant.
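The substitution nomenclature described above (original residue, then position, then new residue, as in A262T) lends itself to a simple parser. The Python sketch below is illustrative only: the function names `parse_mutation` and `apply_mutation` are hypothetical, positions are treated as 1-based, only single-residue substitutions are handled (not insertions or deletions), and the toy sequence is not taken from this disclosure.

```python
import re

# One-letter original residue, 1-based position, one-letter new residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'A262T' into ('A', 262, 'T')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutation(sequence, label):
    """Return `sequence` with the substitution applied, after checking
    that the stated original residue is actually at that position."""
    original, position, new = parse_mutation(label)
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[position - 1] != original:
        raise ValueError("original residue does not match the sequence")
    return sequence[: position - 1] + new + sequence[position:]


print(parse_mutation("A262T"))                # ('A', 262, 'T')
print(apply_mutation("MKTAYIAKQR", "A4T"))    # MKTTYIAKQR
```

Checking the original residue before substituting guards against applying a label to a sequence whose numbering differs from that of the reference.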
[0207] Mutations can be introduced into a reference Cas9 protein using site-directed mutagenesis. Older methods of site-directed mutagenesis known in the art rely on sub-cloning of the sequence to be mutated into a vector, such as an M13 bacteriophage vector, that allows the isolation of single-stranded DNA template. In these methods, one anneals a mutagenic primer (i.e., a primer capable of annealing to the site to be mutated but bearing one or more mismatched nucleotides at the site to be mutated) to the single-stranded template and then polymerizes the complement of the template starting from the 3' end of the mutagenic primer. The resulting duplexes are then transformed into host bacteria and plaques are screened for the desired mutation. More recently, site-directed mutagenesis has employed PCR methodologies, which have the advantage of not requiring a single-stranded template. In addition, methods have been developed that do not require sub-cloning. Several issues must be considered when PCR-based site-directed mutagenesis is performed. First, in these methods it is desirable to reduce the number of PCR cycles to prevent expansion of undesired mutations introduced by the polymerase. Second, a selection must be employed in order to reduce the number of non-mutated parental molecules persisting in the reaction. Third, an extended-length PCR method is preferred in order to allow the use of a single PCR primer set. And fourth, because of the non-template-dependent terminal extension activity of some thermostable polymerases it is often necessary to incorporate an end-polishing step into the procedure prior to blunt-end ligation of the PCR-generated mutant product.
[0208] Any of the references noted above which relate to napDNAbp domains are hereby incorporated by reference in their entireties, if not already stated so.
II. Adenine oxidases
[0209] In various embodiments, the ACBE and TGBE transversion base editors provided herein comprise an adenine oxidase nucleobase modification domain (FIG. 1). An adenine oxidase is an enzyme that has catalytic activity in oxidizing an adenosine nucleobase substrate. Oxidation reactions catalyzed by the exemplary enzymes of the present disclosure may comprise transfers of oxo (=O) substituents to the adenosine nucleobase, which creates an aldehyde, 8-oxoadenine. Exemplary oxidases of this disclosure catalyze oxidation reactions at the 8 position of adenosine. The 8 position of adenine is the most readily oxidized position on the nucleobase. See Saladino, R. et al., A new and efficient synthesis of 8-hydroxypurine derivatives by dimethyldioxirane oxidation, Tet. Lett. (1995) 36: 2665-2668; Chang, W.-C. et al., Mechanistic Investigation of a Non-Heme Iron Enzyme Catalyzed Epoxidation in (-)-4’-Methoxycyclopenin Biosynthesis, J. Am. Chem. Soc. (2016) 138(33): 10390-10393, the entire contents of each of which is herein incorporated by reference.
[0210] The adenine oxidases of the present disclosure may be modified from wild-type reference proteins, which include 5-methylcytosine, N6-methyladenosine and xanthine modification enzymes. Other modification enzymes that may serve as reference proteins are N4-acetylcytosine- and 2-thiocytosine-installing RNA-modification enzymes. See Ito, S. et al. Human NAT10 Is an ATP-dependent RNA Acetyltransferase Responsible for N4-Acetylcytidine Formation in 18 S Ribosomal RNA (rRNA). J. Biol. Chem. 2014, 289, 35724-35730; and Cavuzic, V.; Liu, Y., Biosynthesis of Sulfur-Containing tRNA Modifications: A Comparison of Bacterial, Archaeal, and Eukaryotic Pathways. Biomolecules 2017, 7, 27, each of which is herein incorporated by reference. Wild-type reference proteins may be those from E. coli, S. cyanogenus, yeast, mouse, human, or another organism, including other bacteria. See also Falnes, P. Ø.; Rognes, T. DNA repair by bacterial AlkB proteins, Res. Microbiol. (2003) 154(8): 531-538; Ito, S. et al., Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine, Science (2011) 333(6047): 1300-1303; Fortini, P. et al., 8-Oxoguanine DNA damage: at the crossroad of alternative repair pathways, Mutat. Res. (2003) 531(1-2): 127-39; Leonard, G. A. et al., Conformation of guanine-8-oxoadenine base pairs in the crystal structure of d(CGCGAATT(O8A)GCG), Biochem. (1992) 31(36): 8415-8420; Ohe, T. & Watanabe, Y. Purification and Properties of Xanthine Dehydrogenase from Streptomyces cyanogenus, J. Biochem. 86:45-53, (1979), the entire contents of each of which is herein incorporated by reference.
[0211] Modified adenine oxidases include variants with at least 80%, at least 85%, at least 90%, at least 95% or at least 99% sequence identity to a wild-type adenine oxidase. In other embodiments, modified adenine oxidases may be obtained by altering or evolving a reference protein using a continuous evolution process (e.g., PACE) or non-continuous evolution process (e.g., PANCE or discrete plate-based selections) described herein so that the oxidase is effective on a nucleic acid target. 8-oxopurines, common products of oxidative DNA damage, tend to rotate around the glycosidic bond to adopt the syn conformation, presenting the Hoogsteen edge for base pairing. The Hoogsteen edge of 8-oxoA and the Watson-Crick edge of G form a base pair featuring two three-center hydrogen bonding systems (FIG. 2). The 8-oxoA:G pair makes a minimal perturbation to the DNA double helix. Consequently, polymerases misread 8-oxoA and pair it with G, eventually resulting in an A:T to C:G transversion mutation. See Kamiya, H. et al., 8-Hydroxyadenine (7,8-dihydro-8-oxoadenine) induces misincorporation in in vitro DNA synthesis and mutations in NIH 3T3 cells, Nucleic Acids Res. (1995) 23(15): 2893-2895; Tan, X., Grollman, A. P., & Shibutani, S., Comparison of the mutagenic properties of 8-oxo-7,8-dihydro-2'-deoxyadenosine and 8-oxo-7,8-dihydro-2'-deoxyguanosine DNA lesions in mammalian cells, Carcinogenesis (1999) 20(12): 2287-2292; Leonard, G. A. et al., Conformation of guanine-8-oxoadenine base pairs in the crystal structure of d(CGCGAATT(O8A)GCG), Biochem. (1992) 31(36): 8415-8420, the entire contents of each of which is herein incorporated by reference.
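The route from 8-oxoA to an A:T to C:G transversion can be followed step by step in a toy model. The Python sketch below is a deliberate simplification: it assumes that the polymerase always inserts G opposite 8-oxoA and that the newly made strand then serves as the template in the next round, and it ignores DNA repair and editing efficiency entirely. The names `OXO_A`, `replicate`, and `edit_outcome` are hypothetical.

```python
# Watson-Crick complements for the four unmodified bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
OXO_A = "8-oxoA"  # label for the oxidized adenine in this toy model


def replicate(template_base):
    """Base inserted opposite `template_base` in this toy model."""
    if template_base == OXO_A:
        return "G"  # misreading of 8-oxoA described in the text above
    return COMPLEMENT[template_base]


def edit_outcome():
    """Follow an A:T pair after the A is oxidized to 8-oxoA."""
    first_round = replicate(OXO_A)         # G is inserted opposite 8-oxoA
    second_round = replicate(first_round)  # C is inserted opposite that G
    # The position originally occupied by A now carries C, paired with G.
    return second_round, first_round


print(edit_outcome())  # ('C', 'G')
```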
[0212] Exemplary adenine oxidases include, but are not limited to, α-ketoglutarate-dependent iron oxidases, molybdopterin-dependent oxidases, heme iron oxidases, and flavin monooxygenases. See Rashidi, M. R. & Soltani, S., An overview of aldehyde oxidase: an enzyme of emerging importance in novel drug discovery, Expert Opin. Drug Discov. (2017) 12(3): 305-316; Coon, M. J., Cytochrome P450: nature’s most versatile biological catalyst, Annu. Rev. Pharmacol. Toxicol. (2005) 45: 1-25; Eswaramoorthy, S. et al., Mechanism of action of a flavin-containing monooxygenase, Proc. Natl. Acad. Sci. (2006) 103(26): 9832-9837, the entire contents of each of which is herein incorporated by reference.
[0213] Exemplary α-ketoglutarate-dependent iron oxidases include AlkBH (ABH) family oxidases, which include human AlkBH3, whose native function is to clear N1-methylation from adenine in DNA and RNA. These non-heme enzymes perform methyl group C-H hydroxylation on DNA and RNA via an active Fe(IV)-oxo intermediate formed through an iron cofactor. The resulting hemiaminal breaks down to release formaldehyde and the demethylated adenine base. ABH3 is selective for ssDNA over dsDNA, a characteristic of exocyclic amine hydrolyzing enzymes that likely contributes to the selective modification of bases in the targeted ssDNA loop of the ternary Cas9-sgRNA-DNA complex. The TET oxidases are structurally related α-ketoglutarate-dependent iron oxidases and perform C-H hydroxylation on 5-methylcytosine as the first step in removing this important epigenetic marker. Oxidized forms of 5-methylcytosine are recognized by DNA glycosylases and hydrolytically removed, to be replaced eventually by unmethylated cytosine. Without being bound by a particular theory, in the absence of a labile C-H bond substrate, the Fe(IV)-oxo species of the cofactor-enzyme may be induced to transfer the oxo group from the non-heme Fe(IV) center to the 8 position of adenine. This potential mechanism involves the formation of a 7,8-oxaziridine intermediate, which rearranges spontaneously to the desired 8-oxoadenine (FIG. 3).
[0214] Exemplary molybdopterin-dependent oxidases that selectively oxidize adenine at the 8 position include xanthine dehydrogenases and aldehyde oxidases. In eukaryotes, these enzymes utilize a monophosphate pyranopterin cofactor, which complexes with molybdenum to form molybdenum cofactor (Moco). These oxidases may effect alkene/arene epoxidation reactions in natural product biosynthesis pathways via similar oxo group transfer mechanisms as those of the non-heme ABH and TET iron oxidases.
[0215] Exemplary heme iron oxidases that selectively oxidize adenine at the 8 position include cytochrome P450 enzymes.
[0216] Some exemplary adenine oxidase domains that can be fused to napDNAbp domains according to embodiments of this disclosure are provided below. Exemplary adenine oxidase domains include variants with at least 80%, at least 85%, at least 90%, at least 95% or at least 99% sequence identity to the following wild-type enzymes:
AlkBHl (human)
[0217] MGKM A A A V GS V ATL ATEPGED AFRKLFRF YRQS RPGT ADLEG VIDFS A AH A ARGKGPG AQKVIKS QLN V S S V S EQN A YRAGLQP V S KW Q A Y GLKG YPGFIFIPNPFLP GYQWHWVKQCLKLYSQKPNVCNLDKHMSKEETQDLWEQSKEFLRYKEATKRRPR S LLEKLRW VT V G YH YNWDS KKY S ADH YTPFPS DLGFLS EQ V A A AC GFEDFR AE AGI LN Y YRLDS TLGIH VDRS ELDHS KPLLS FS F GQS AIFLLGGLQRDE APT AMFMHS GDIM IMSGFSRLLNHAVPRVLPNPEGEGLPHCLEAPLPAVLPRDSMVEPCSMEDWQVCASY LKTARVNMTVRQVLATDQNFPLEPIEDEKRDISTEGFCHLDDQNSEVKRARINPDS
(SEQ ID NO: 22)
AlkBH2 (human)
[0218] MDRFLVKGAQGGLLRKQEEQEPTGEEPAVLGGDKESTRKRPRREAPGNGGH S AGPS WRHIR AEGLDC S YT VLFGKAE ADEIF QELEKE VE YFT G ALAR V Q VF GKWHS VPRKQATYGDAGLTYTFSGLTLSPKPWIPVLERIRDHVSGVTGQTFNFVLINRYKDG CDHIGEHRDDERELAPGS PIAS VS F G ACRDF VFRHKDS RGKS PS RRV A V VRLPLAHGS LLMMNHPTNTHWYHSLPVRKKVLAPRVNLTFRKILLTKK (SEQ ID NO: 23)
AlkBH3 (human):
[0219] MEEKRRR ARV QG A W A AP VKS Q AIAQP ATT AKS HLHQKPGQT WKNKEHHLS DREFVFKEPQQVVRRAPEPRVIDREGVYEISLSPTGVSRVCLYPGFVDVKEADWILEQ LCQDVPWKQRTGIREDITY QQPRLTAWY GELPYTYSRITMEPNPHWHPVLRTLKNRI EENTGHTFN S LLCNLYRNEKDS VD WHS DDEPS LGRCPIIAS LS F G ATRTFEMRKKPPP EENGDYTYVERVKIPLDHGTLLIMEGATQADWQHRVPKEYHSREPRVNLTFRTVYP DPRGAPW (SEQ ID NO: 24)
AlkBH4 (human)
[0220] MAAAAAETPEVLRECGCKGIRTCLICERQRGSDPPWELPPAKTYRFIYCSDTG W A V GTEES DFEGW AFPFPG VMLIEDF VTREEE AELVRLMDRDPWKLS QS GRRKQD Y GPKVNFRKQKLKTEGFCGLPSFSREVVRRMGLYPGLEGFRPVEQCNLDYCPERGSAI DPHLDD A WLW GERLV S LNLLS PT VLS MCRE APGS LLLC S APS A APE ALVDS VIAPS R S VLCQE VE V AIPLP ARS LLVLT G A ARHQWKH AIHRRHIE ARRVC VTFRELS AEF GPG GRQQELGQELLRIALS FQGRPV (SEQ ID NO: 25)
AlkBH5 (human)
[0221] M AA AS G YTDLREKLKS MTS RDN YKAGS RE A A A A A A AAV A A A A A AAA A AE PYPV S GAKRKY QEDSDPERSD YEEQQLQKEEEARKVKS GIRQMRLFS QDEC AKIE AR IDEVVSRAEKGLYNEHTVDRAPLRNKYFFGEGYTYGAQLQKRGPGQERLYPPGDVD EIPEWVHQLVIQKLVEHRVIPEGFVNSAVINDYQPGGCIVSHVDPIHIFERPIVSVSFFS DSALCFGCKFQFKPIRVSEPVLSLPVRRGSVTVLSGYAADEITHCIRPQDIKERRAVIIL RKTRLD APRLETKS LS S S VLPPS Y AS DRLS GNNRDP ALKPKRS HRKADPD A AHRPRIL EMDKEENRRS VLLPTHRRRGS FS S EN YWRKS YES S EDC S E A AGS P ARKVKMRRH
(SEQ ID NO: 26)
AlkBH6 (human)
[0222] MEEQDARVPALEPFRVEQAPPVIYYVPDFISKEEEEYLLRQVFNAPKPKWTQL S GRKLQNW GGLPHPRGMVPERLPPWLQRY VDKVSNLS LFGGLPANHVLVN Q YLPG EGIMPHEDGPLY YPT V S TIS LGS HT VLDFYEPRRPEDDDPTEQPRPPPRPTT S LLLEPRS LLVLRGP A YTRLLHGI A A ARVD ALD A AS S PPN A A ACPS ARPG ACLVRGTRV S LTIRR VPRVLRAGLLLGK (SEQ ID NO: 27)
AlkBH7 (human)
[0223] M AGT GLLALRTLPGPS W VRGS GPS VLS RLQD A A V VRPGFLS T AEEETLS RELE PELRRRRYE YDHWD A AIHGFRETEKS RW SEAS RAILQRV Q A A AF GPGQTLLS S VH VL DLEARGYIKPHVDSIKFCGATIAGLSLLSPSVMRLVHTQEPGEWLELLLEPGSLYILRG SARYDFSHEILRDEESFFGERRIPRGRRISVICRSLPEGMGPGESGQPPPAC (SEQ ID NO: 28)
AlkBH8 (human)
[0224] MDS NHQS N YKLS KTEKKFLRKQIKAKHTLLRHEGIET V S Y ATQS LV V AN GGL GN GVSRN QLLPVLEKCGLVD ALLMPPNKPYSFARYRTTEES KRA YVTLN GKEVVDD LGQKITLYLNF VEKV QWKELRPQ ALPPGLM V VEEIIS S EEEKMLLES VD WTEDTDN Q NSQKSLKHRRVKHFGYEFHYENNNVDKDKPLSGGLPDICESFLEKWLRKGYIKHKP DQMTINQYEPGQGIPAHIDTHSAFEDEIVSLSLGSEIVMDFKHPDGIAVPVMLPRRSLL VMT GES R YLWTHGITCRKFDT V Q AS ES LKS GUTS D V GDLTLS KRGLRT S FTFRKVRQ TPCNCSYPLVCDSQRKETPPSFPESDKEASRLEQEYVHQVYEEIAGHFSSTRHTPWPH IVEFLKALPSGSIVADIGCGNGKYLGINKELYMIGCDRSQNLVDICRERQFQAFVCDA LA VP VRS GS CD AC IS I A VIHHFAT AERRV A ALQEIVRLLRPGGKALIY VW AMEQE YN KQKSKYLRGNRNSQGKKEEMNSDTSVQRSLVEQMRDMGSRDSASSVPRINDSQEG GCN S RQ V S NS KLP VH VNRTS F Y S QD VLVPWHLKGNPDKGKP VEPF GPIGS QDPS P VF HRYYHVFREGELEGACRTVSDVRILQSYYDQGNWCVILQKA (SEQ ID NO: 29)
FTO (human)
[0225] MKRTPT AEERERE AKKLRLLEELEDTWLPYLTPKDDEFY QQW QLKYPKLILR
EASSVSEELHKEVQEAFLTLHKHGCLFRDLVRIQGKDLLTPVSRILIGNPGCTYKYLN
TRLFTVPWPVKGSNIKHTEAEIAAACETFLKLNDYLQIETIQALEELAAKEKANEDAV
PLCMSADFPRVGMGSSYNGQDEVDIKSRAAYNVTLLNFMDPQKMPYLKEEPYFGM
GKMAVSWHHDENLVDRSAVAVYSYSCEGPEEESEDDSHLEGRDPDIWHVGFKISW
DIETPGLAIPLHQGDCYFMLDDLNATHQHCVLAGSQPRFSSTHRVAECSTGTLDYILQ
RC QLALQN V CDD VDNDD V S LKS FEP A VLKQGEEIHNE VEFE WLRQFWFQGNRYRK
CTDWWCQPMAQLEALWKKMEGVTNAVLHEVKREGLPVEQRNEILTAILASLTARQ NLRREWH ARC QS RIARTLP ADQKPECRP YWEKDD AS MPLPFDLTDI V S ELRGQLLE A KP (SEQ ID NO: 30)
E. coli AlkB
[0226] MLDLFADAEPWQEPLAAGAVILRRFAFNAAEQLIRDINDVASQSPFRQMVTP GG YTMS V AMTNCGHLGWTTHRQG YLY S PIDPQTNKPWP AMPQS FHNLCQR A AT A A GYPDFQPDACLINRYAPGAKLSLHQDKDEPDLRAPIVSVSLGLPAIFQFGGLKRNDPL KRLLLEHGDVVVWGGESRLFYHGIQPLKAGFHPLTIDCRYNLTFRQAGKKE (SEQ ID NO: 40)
XDH (human)
[0227] MTADKLVFFVNGRKVVEKNADPETTLLAYLRRKLGLSGTKLGCGEGGCGAC TVMLSKYDRLQNKIVHFSANACLAPICSLHHVAVTTVEGIGSTKTRLHPVQERIAKSH GSQCGFCTPGIVMSMYTLLRNQPEPTMEEIENAFQGNLCRCTGYRPILQGFRTFARD GGCC GGDGNNPN CCMN QKKDHS VS LS PS LFKPEEFTPLDPT QEPIFPPELLRLKDTPR KQLRFEGERVTWIQASTLKELLDLKAQHPDAKLVVGNTEIGIEMKFKNMLFPMIVCP AWIPELNSVEHGPDGISFGAACPLSIVEKTLVDAVAKLPAQKTEVFRGVLEQLRWFA GKQ VKS V AS V GGNIIT AS PIS DLNP VFM AS G AKLTLV S RGTRRT V QMDHTFFPG YRK TLLS PEEILLS IEIP Y S REGE YFS AFKQ AS RREDDIAKVT S GMRVLFKPGTTE V QELALC YGGMANRTISALKTTQRQLSKLWKEELLQDVCAGLAEELHLPPDAPGGMVDFRCTL TLSFFFKFYLTVLQKLGQENLEDKCGKLDPTFASATLLFQKDPPADVQLFQEVPKGQ S EEDM V GRPLPHLA ADMQ AS GE A V Y CDDIPR YENELS LRLVT S TR AH AKIKS IDTS E AKKVPGFVCFISADDVPGSNITGICNDETVFAKDKVTCVGHIIGAVVADTPEHTQRAA QG VKIT YEELP AIITIED AIKNN S F Y GPELKIE KGDLKKGFS E ADN V V S GEIYIGGQEHF YLETHCTIAVPKGEAGEMELFVSTQNTMKTQSFVAKMLGVPANRIVVRVKRMGGGF GGKETRS T V V S T A V ALA A YKT GRP VRCMLDRDEDMLITGGRHPFLARYK V GFMKT GT V V ALE VDHFS N V GNT QDLS QS IMERALFHMDNC YKIPNIRGT GRLCKTNLPS NT A FRGFGGPQGMLIAECWMSEVAVTCGMPAEEVRRKNLYKEGDLTHFNQKLEGFTLPR CWEECLASSQYHARKSEVDKFNKENCWKKRGLCIIPTKFGISFTVPFLNQAGALLHV YTDGS VLLTHGGTEMGQGLHTKM V Q V AS R ALKIPT S KIYIS ETS TNT VPNTSPT A AS V S ADLN GQ A V Y A ACQTILKRLEP YKKKNPS GS WED W VT A A YMDT V S LS AT GF YRTP NLG Y S FETN S GNPFH YF S Y G V ACS E VEIDCLT GDHKNLRTDIVMD V GS S LNPAIDIGQ VEG AF V QGLGLFTLEELH Y S PEGS LHTRGPS T YKIP AF GS IPIEFRV S LLRDCPNKKAIY ASKAVGEPPLFLAASIFFAIKDAIRAARAQHTGNNVKELFRLDSPATPEKIRNACVDK FTTLC VT G VPEN C KPW S VRV (SEQ ID NO: 31)
AOX1 (human) [0228] MDRASELLFYVNGRKVIEKNVDPETMLLPYLRKKLRLTGTKYGCGGGGCGA CT VMIS R YNPITKRIRHHP AN ACLIPIC S LY G A A VTT VEGIGS THTRIHP V QERIAKCHG TQCGFCTPGM VMS IYTLLRNHPEPTLDQLTD ALGGNLCRCTGYRPIID ACKTFCKT S G CCQSKENGVCCLDQGINGLPEFEEGSKTSPKLFAEEEFLPLDPTQELIFPPELMIMAEK QS QRTRVF GS ERMM WF S P VTLKELLEFKFKYPQ AP VIMGNTS V GPE VKFKG VFHP VI IS PDRIEELS V VNH A YN GLTLG AGLS LAQ VKDILAD V V QKLPEEKT QM YH ALLKHLG TLAGSQIRNMASLGGHIISRHPDSDLNPILAVGNCTLNLLSKEGKRQIPLNEQFLSKCP N ADLKPQEILV S VNIP Y S RKWEFV S AFRQ AQRQEN ALAIVN S GMRVFFGEGDGIIREL CIS Y GG V GPATIC AKNSCQKLIGRHWNEQMLDIACRLILNEV S LLGS APGGKVEFKRT LIIS FLFKF YLE V S QILKKMDP VH YPS LADKYES ALEDLHS KHHC S TLK Y QNIGPKQH PEDPIGHPIMHLS GVKHATGE AIY CDDMPLVDQELFLTFVT S SRAH AKIV S IDLSE ALS MPGVVDIMTAEHLSDVNSFCFFTEAEKFLATDKVFCVGQLVCAVLADSEVQAKRAA KRVKIVYQDLEPLILTIEESIQHNSSFKPERKLEYGNVDEAFKVVDQILEGEIHMGGQE HFYMETQSMLVVPKGEDQEMDVYVSTQFPKYIQDIVASTLKLPANKVMCHVRRVG GAFGGKVLKTGIIAAVTAFAANKHGRAVRCVLERGEDMLITGGRHPYLGKYKAGF MNDGRILALDMEHYSNAGASLDESLFVIEMGLLKMDNAYKFPNLRCRGWACRTNL PS NT AFRGF GFPQ A ALITES CITE V A AKC GLS PEKVRIINM YKEID QTP YKQEIN AKNLI QC WRECM AMS SYS LRKV A VEKFN AEN YWKKKGLAM VPLKFP V GLGS RA AGQ AA A LVHIYLD GS VLVTHGGIEMGQG VHTKMIQ V V S RELRMPMS N VHLRGT S TET VPN AN IS GGS V V ADLN GLA VKD AC QTLLKRLEPIIS KNPKGT WKD W AQT AFDES INLS A V GY FRGYESDMNWEKGEGQPFEYFVYGAACSEVEIDCLTGDHKNIRTDIVMDVGCSINPA IDIGQIEGAFIQGMGLYTIEELNYSPQGILHTRGPDQYKIPAICDMPTELHIALLPPSQN SNTLY S S KGLGES GVFLGCS VFFAIHD A VS AARQERGLHGPLTLNSPLTPEKIRMACE DKFTKMIPRDEPGSYVPWNVPI (SEQ ID NO: 41)
S. cyanogenus XDH (“ScXDH”)
[0229] MSHLSERPEKPVV GVSMPHES A V QHVTGAALYTDDLV QRTKD VLHA YPV Q VMKARGRVTALRTGAALAVPGVVRVLTGADVPGVNDAGMKHDEPLFPDEVMFHG HAVAWVLGETLEAARIGAAAVEVDLEELPSVITLQDAIAADSYHGARPVMTHGDVD AGFADS AH VFTGEF QF S GQEHFYLETH A ALAQ VDEN GQ VFIQS S TQHPS ETQEIVS H VLGVPAHEVTVQCLRMGGGFGGKEMQPHGFAAIAALGAKLTGRPVRFRLNRTQDL TMS GKRHGFHATWKIGFDTEGRIQ ALD ATLT ADGGW S LDLSEPVLARALCHIDNT Y WIPNARVAGRIARTNTVSNTAFRGFGGPQGMLVIEDILGRCAPRLGVDAKELRERNF YRPGQGQTTPY GQPVTQPERIAA VW QQV QDNGHIADREREIAAFN AAHPHTKRALA VTGVKFGISFNLTAFNQGGALVLIYKDGSVLINHGGTEMGQGLHTKMLQVAATTLGI PLHKVRLAPTRTDK VPNT S AT A AS S G ADLN GG A VKN ACEQLRERLLR V A AS QLGTN AS D VRIVEG V ARS LGS DQELA WDDLVRT A YF QRV QLS A AG Y YRTEGLHWD AKS FR GSPFKYFAIGAAATEVEVDGFTGAYRIRRVDIVHDVGDSLSPLIDIGQVEGGFVQGAG WLTLEDLRWDTGDGPNRGRLLTQAASTYKLPSFSEMPEEFNVTLLENATEEGAVFGS KA V GEPPLMLAFS VRE ALRQ A A A AF GPRGT A VELAS P ATPE A V YW AIES ARQGGT A GDGRTHG A A AS D A V A VRT G VEALS G A (SEQ ID NO: 5)
C. capitata XDH
[0230] MTTNGNSFIVPVEKESPLIFFVNGKKVIDPTPDPECTLLTYLREKLRLCGTKLG CGEGGCGACTVMLSRVDRATNSVKHLAVNACLMPVCAMHGCAVTTIEGIGSTRTRL HP V QERLAKAHGS QC GFCTPGIVMS M Y ALLRS MPLPS MKDLE V AF QGNLCRCTG YR PILEGYKTFTKEFSCGMGEKCCKLQSNGNDVEKNGDDKLFERSAFLPFDPSQEPIFPP ELHLN S QFD AENLLFKGPRS TW YRP VELS DLLKLKS ENPHGKIIV GNTE V G VEMKFK QFLYT VHINPIKVPELNEMQELEDS ILF GS A VTLMDIEE YLRERIAKLPEHETRFFRC A VKMLH YF AGKQIRN V AS LGGNIMT GS PIS DMNPILT A AC AKLKV C S LVEGRIETRE V CMGPGFFTGYRKNTIQPHEVLVAIHFPKSKKDQHFVAFKQARRRDDDIAIVNAAVNV TFES NTNIVRQIYM AF GGM APTT VM VPKT S QIM AKQKWNRVLVERV S ES LC AELPL APT APGGMIA YRRS LV V S LFFKA YLAIS QEL VKS N VIEED AIPEREQS G A AIFHTPILKS AQLFERVCVEQSTCDPIGRPKVHASAFKQATGEAIYCDDIPRHENELYLALVLSTKAH AKIV S VDES D ALKQ AG VH AFF S S KDITE YENKV GS VFHDEE VF AS ERV Y C QGQ VIG A IVADSQVLAQRAARLVHIKYEELTPVIITIEQAIKHKSYFPNYPQYIVQGDVATAFEEA DHVYENSCRMGGQEHFYLETNACVATPRDSDEIELFCSTQNPTEVQKLVAHVLSVPC HRVVCRSKRLGGGFGGKESRSIILALPVALASYRLRRPVRCMLDRDEDMMTTGTRH PFLFKYKV GFTKEGLIT ACDIEC YNN AGC S MDLS FS VLDR AMNHFENC YRIPN VK V A GWVCRTNLPSNTAFRGFGGPQGMFAAEHIVRDVARIVGKDYLDIMQMNFYKTGDY THYN QKLENFPIEKCFTDCLN QSEFHKKRLAIEEFNKKNRWRKRGIALVPTKY GIAFG AMHLNQAGALINIYGDGSVLLSHGGVEIGQGLHTKMIQCCARALGIPTELIHIAETAT DKVPNTSPTAASVGSDINGMAVLDACEKLNQRLKPIREANPKATWQECISKAYFDRI S LS AS GFYKMPD V GDDPKTNPN ART YN YFTN G V G V S V VEIDCLTGDHQ VLS TDIVM DIGSSLNPAIDIGQIEGAFMQGYGLFVLEELIYSPQGALYSRGPGMYKLPGFADIPGEF N V S LLT G APNPR A V Y S S KA V GEPPLFIGS T VFFAIKQ AIA A AR AERGLS ITFELD AP AT AARIRMACQDEFTDLIEQPSPGTYTPWNVVP (SEQ ID NO: 6)
N. crassa XDH:
[0231] MTTNGNSFIVPVEKESPLIFFVNGKKVIDPTPDPECTLLTYLREKLRLCGTKLG CGEGGCGACTVMLSRVDRATNSVKHLAVNACLMPVCAMHGCAVTTIEGIGSTRTRL HP V QERLAKAHGS QC GFCTPGIVMS M Y ALLRS MPLPS MKDLE V AF QGNLCRCTG YR PILEGYKTFTKEFSCGMGEKCCKLQSNGNDVEKNGDDKLFERSAFLPFDPSQEPIFPP ELHLN S QFD AENLLFKGPRS TW YRP VELS DLLKLKS ENPHGKIIV GNTE V G VEMKFK QFLYT VHINPIKVPELNEMQELEDS ILF GS A VTLMDIEE YLRERIAKLPEHETRFFRC A VKMLH YF AGKQIRN V AS LGGNIMT GS PIS DMNPILT A AC AKLKV C S LVEGRIETRE V CMGPGFFTGYRKNTIQPHEVLVAIHFPKSKKDQHFVAFKQARRRDDDIAIVNAAVNV TFES NTNIVRQIYM AF GGM APTT VM VPKT S QIM AKQKWNRVLVERV S ES LC AELPL APT APGGMIA YRRS LV V S LFFKA YLAIS QEL VKS N VIEED AIPEREQS G A AIFHTPILKS AQLFERVCVEQSTCDPIGRPKVHASAFKQATGEAIYCDDIPRHENELYLALVLSTKAH AKIV S VDES D ALKQ AG VH AFF S S KDITE YENKV GS VFHDEE VF AS ERV Y C QGQ VIG A IVADSQVLAQRAARLVHIKYEELTPVIITIEQAIKHKSYFPNYPQYIVQGDVATAFEEA DHVYENSCRMGGQEHFYLETNACVATPRDSDEIELFCSTQNPTEVQKLVAHVLSVPC HRVVCRSKRLGGGFGGKESRSIILALPVALASYRLRRPVRCMLDRDEDMMTTGTRH PFLFKYKV GFTKEGLIT ACDIEC YNN AGC S MDLS FS VLDR AMNHFENC YRIPN VK V A GWVCRTNLPSNTAFRGFGGPQGMFAAEHIVRDVARIVGKDYLDIMQMNFYKTGDY THYN QKLENFPIEKCFTDCLN QSEFHKKRLAIEEFNKKNRWRKRGIALVPTKY GIAFG AMHLNQAGALINIYGDGSVLLSHGGVEIGQGLHTKMIQCCARALGIPTELIHIAETAT DKVPNTSPTAASVGSDINGMAVLDACEKLNQRLKPIREANPKATWQECISKAYFDRI S LS AS GFYKMPD V GDDPKTNPN ART YN YFTN G V G V S V VEIDCLTGDHQ VLS TDIVM DIGSSLNPAIDIGQIEGAFMQGYGLFVLEELIYSPQGALYSRGPGMYKLPGFADIPGEF N V S LLT G APNPR A V Y S S KA V GEPPLFIGS T VFFAIKQ AIA A AR AERGLS ITFELD AP AT AARIRMACQDEFTDLIEQPSPGTYTPWNVVP (SEQ ID NO: 7)
M. hansupus XDH:
[0232] MSNMFEFRLNGATVRVDGVSPNTTLLDFLRNRGLTGTKQGCAEGDCGACTV
ALVDRDAQGNRCLRAFNACIALVPMVAGRELVTVEGVGSSEKPHPVQQAMVKHYG
SQCGFCTPGFIVSMAEGYSRKDVCTPSSVADQLCGNLCRCTGYRPIRDAMMEALAE
RD AD AS PAT AIPS APLGGP AEPLS ALH YE AT GQTFLRPTS WKELLDLR ARHPE AHLV
AGATELGVDITKKARRFPFLISTEGVESLREVRREKDCWYVGGAASLVALEEALGDA
LPEVTKMLNVFASRQIRQRATLAGNLVTASPIGDMAPVLLALDARLVLGSVRGERTV
ALS EFFLA YRKT ALQ ADE V VRHIVIPHP A VPERGQRLS DS FK V S KRRELDIS IV A AGFR
VELD AHG V V S LARLG Y GG V A ATP VRA VR AE A ALT GQPWTRET VD Q VLP VLAEEITP
ISDQRGSAEYRRGLVAGLFEKFFAGTYSPVLDAAPGFEKGDAQVPADAGRALRHES
AMGHVTGSARYVDDLAQRQPMLEVWPVCAPHAHARILKRDPTAARKVPGVVRVL
MAEDIPGTNDTGPIRHDEPLLADREVLFHGQIVALVVGESVEACRAGARAVEVEYEP LP AILT VED AM AQGS YHTEPH VIRRGD VD A ALAS S PHRLS GTM AIGGQEHF YLET Q A AFAERGDDGDIT V V S S TQHPS E V Q AIIS H VLHLPRS RV V VKS PRMGGGF GGKETQGN SPAALVALASWHTGRPTRWMMDRDVDMVVTGKRHPFHAAYEVGFDDEGKLLALR
V QLV S N GGW S LDLS ES rTDRALFHLDN A Y Y VP ALT YT GRV AKTHLV S NT AFRGF GG PQGMLVTEE VLAH V ARS V G VP AD V VRERNLYRGT GETNTTH Y GQELEDERIHR VW EELKRT S DFEQRRAE VD AFN ARS PFIKRGLAITPMKFGIS FT ATFLN Q AG ALVHLYRD GS VM V S HGGTEMGQGLHTKVQG V AMRELG VE AS A VRIAKT ATD KVPNT S AT A AS S GS DLN G A A VRLACITLRERLAP V A VRLLADRHGRT V APE ALLF S EGKV GLRGEPE V S LPFANVVEAAYLARVGLS ATGYY QTPGIGYDKAKGRGRPFLYFAY GAS VCEVEVDG HTGVKRVLRVDLLED V GDS LNPGVDRGQIEGGFV QGLGWLTGEELRWD AN GRLLT HS AS T Y A VP AF S D APIDFR VRLLERAHQHNTIHGS KA V GEPPLMLAMS ARE ALRD A V GAFGQAGGGVALASPATHEALFLAIQKRLSRGAREDGREAA (SEQ ID NO: 8)
E. cloacae XDH:
[0233] MKFDKPATTNPIDTLRVVGQPHTRIDGPRKTTGSAHYAYEWHDIAPNAAYGH
VVGAPIAKGRITAIDTKAAEAAPGVLAVITADNAGPLGKGEKNTATLLGGPEIEHYH
QAVALVVAETFEQARAAAALVKVTCKRAQGAYDLAAEKASVTEPPEDTPDKNVGD
V AT AF AS A A VKLD AIYTTPDQS HM AMEPH AS M A VWEGDN VT VWT S N QMID W CRT DLALTLKIPPENVRIVSPYIGGGFGGKLFLRSDALLAALGARAVKRPVKVMLPRPTIP NNTTHRPATLQHIRIGTDTEGKIVAIAHDS W S GNLPGGTPETA V QQTELLY AGANRH TGLRLATLDLPEGNAMRAPGEAPGLMALEIAIDEIADKAGVDPVAFRILNDTQVDPA NPERRFSRRQLVECLQTGAERFGWQKRHAQPGQVRDGRWLVGMGMAAGFRNNLV AT S G AR VHLN AD GS V A VETDMTDIGTGS YTIIAQT A AEMLGLPLEK VD VRLGDS RFP VS AGS GGQW GANTS TAG VY A AC VKLRE AI ARQLGFDP AT AEFADETIS AQGRS APL AEAAKSGVLTAEDSIEFGDLDKEYQQSTFAGHFVEVGVDSATGEVRVRRMLAVCAA GRILNPIT ARS QVIGAMTMGLG AALMEELA VDTRLGYFVNHDM AA YE VPVHADIPE QE VIFLEDTDPIS S PMKAKG V GELGLC G V S A AIAN AIYN AT G VRVRD YPITLD KLID A LPDAV (SEQ ID NO: 10)
S. snoursei XDH:
[0234] MSHDPVPHLPPAAPLPHPLGAPS VRREGREKVTGAARY AAEHTPPGCAY AW P VP AT V ARGRITELDT A A ALALPG VIA VLTHEN APRLAS TGDPTLA VLQEDRVPHRG W YV ALA V ADTLE A ARD A AE A VH V GY ATEPHD VRIT ADHPRLY VPEE VF GGPG ARE RGDFDAAFAAAPATVDVAYTVPPLHNHPMEPHAATAQWTDGHLTVHDSSQGATRV CEDLA ALFKLGTDEIT V V S EH V GGGF G AKGTPRPQ V VLA AM A ARHT GRP VKLALPR RQLPGVVGHRAPTLHRVRIGAGHDGVITALAHEIVTHTSTVTEFVEQAAIPARMMYT S PHS RT VHRLA ALD VPTPS WMR APGE APGM Y ALES ALDELA V VLDIDP VELRIRNDP ATEPDTGRPFSSRHLVECLRAGAERFGWLPRDPRPAVRRRGDLLLGTGVAAATYPV QIS ETE AE AH A A ADGG YRIRVN ATDIGT G ART VLT QIA A A VLG APEDRVR VDIGS S D LPPA VLAGGSTGT AS WGW A VHKACTSLLARLRAHHGPLPAEGIM AELSEW APMAL RA WRIIS GLGLPTKY GS TP V ALVMRA ATEP V AGS GPS VEGP V S S GLV AMKR APF SMS RMALVSASKL (SEQ ID NO: 15)
S. albulus XDH:
[0235] MTPPPTTRTRAMSHPPEEAPFPPGPPPHPLGDPLVRREGREKVTGTARYAAEH TPDGCAYAWPVPATVVRGRITELDTGAALALPGVIAVLTHENAPRLAPTGDPTLALL QEDR VPHRGW Y V ALA V ADTLE A ARD A AE A VH V S Y ATEPHD VTLT ADHPRLY VP AE VFGGPGARERGDFDT AFAA APATVD VT YT VPPLHNHPMEPHAAT ALWTHGHLT VH DS S QGATRVREDLAALFKLGQDQIT VHSEHV GGGFGS KGTPRPQVVLAAMA ARHTG RP VKLALPRRHLP A V V GHRAPTLHR VRLG AGPDG VIT ALAHEIVTHT S T V AEF VEQ A AMPARIMYTSPHSRTVHRLAALDVPTPSWMRAPGEAPGMYALESAVDELAVVLDL DPIDLRIRNEPGTEPDTGRPFSSRHLVDCLRAGAARFGWSSRDPRPAVRRQGDLLLGT GVAAATYPVQISATDAEAHAAADGTFRVRVNATDIGTGARTVLAQIAAAALGAPAD RVRVEIGS SDLPP A VLAGGSTGT AS WGW A VHKACT VLLARLREHRGPLPAEGVT VT EDTRRETEQPS P Y S RHAF G A VF AE V Q VDTRTGE VR ARRLLGQ Y A AGHILNPRT ARS Q FVGGMVMGLGMALTEDSALDPVYGDFTARDLAAYHVPACADVPAIEAHWLDEEDP HLNPMGSKGIGEIGIVGTPAAIGNAVWHATGVRLRDLPLTPDRILTARTVPLT (SEQ ID NO: 16)
S. himastatinicus XDH:
[0236] MTRVDGLDKVTGAATYAYEFPTPDVGYVWPVQATIARGRVTEVDGAPALA
RPGVLAVLDSGNAPRLNTEAQAGPDLFVLQSPEVAYHGQIVAAVVATSLEAAREGA
AAVRVSYEQEPHDVVLRFDDERAQVAETVTDGSPGFVEHGDAEGALAAAPVRTEA
MYTTPVEHTSPMEPHATIAAWDEDRLTLYNADQGPFMSSQLLAAVFGLDQGAVEV
V AE YIGGGF GS KGIPRS PAVLA ALA AKHLGRPVKIALTRQQMF QLIP YRAPTIQRIRL
G AERDGRLT AIDHE V V QQRS AM AEFADQT GS S TR VM Y A APNIRTT VKT APLD VLTP
A WFRAPGHTPGMF ALES AMDELATELEIDP VELRIRNDT G VDPDS GKPF S S RGLV AC
LREG A ARFD W ALRDPKPGIRREGRWLV GTG V AS AHHPD Y VFPS S AT AR AE ADGTFT
VRVGAVDIGTGGRTALTQLAADALGIPVERLRLEIGRASLGPAPFAGGSLGTASWGW
AVDKACRALLAELDTYGGAVPDGGLEVRADTTEDVELRASFSRHSFGAHFAQVRVD
TDTGEIRVDRMLGVFAAGRIVNPKTARSQFVGAMTMGLSMALLEIGEVDPVFGDFA NHDFAGYHVAANADVPKLEALWLDEQDDNPNPVRGKGIGELGIVGAAAAVTNAFH HATGQRVRDLPIRVERSREALRAARAEAQKRGPGAAEQGKPVG (SEQ ID NO: 17)
S. lividans XDH:
[0237] MSHLSERPEKPVV GVSMPHES A V QHVTGAALYTDDLV QRTKD VLHA YPV Q VMKARGRVTALRTGAALAVPGVVRVLTGADVPGVNDAGMKHDEPLFPDEVMFHG HAVAWVLGETLEAARIGAAAVEVDLEELPSVITLQDAIAADSYHGARPVMTHGDVD AGFADS AH VFTGEF QF S GQEHFYLETH A ALAQ VDEN GQ VFIQS S TQHPS ETQEIVS H VLGVPAHEVTVQCLRMGGGFGGKEMQPHGFAAIAALGAKLTGRPVRFRLNRTQDL TMS GKRHGFHATWKIGFDTEGRIQ ALD ATLT ADGGW S LDLSEPVLARALCHIDNT Y WIPNARVAGRIARTNTVSNTAFRGFGGPQGMLVIEDILGRCAPRLGVDAKELRERNF YRPGQGQTTPY GQPVTQPERIAA VW QQV QDNGHIADREREIAAFN AAHPHTKRALA VTGVKFGISFNLTAFNQGGALVLIYKDGSVLINHGGTEMGQGLHTKMLQVAATTLGI PLHKVRLAPTRTDK VPNT S AT A AS S G ADLN GG A VKN ACEQLRERLLR V A AS QLGTN AS D VRIVEG V ARS LGS DQELA WDDLVRT A YF QRV QLS A AG Y YRTEGLHWD AKS FR GSPFKYFAIGAAATEVEVDGFTGAYRIRRVDIVHDVGDSLSPLIDIGQVEGGFVQGAG WLTLEDLRWDTGDGPNRGRLLTQAASTYKLPSFSEMPEEFNVTLLENATEEGAVFGS KA V GEPPLMLAFS VRE ALRQ A A A AF GPRGT A VELAS P ATPE A V YW AIES ARQGGT A GDGRTHG A A AS D A V A VRT G VEALS G A (SEQ ID NO: 18)
Cytochrome P450 1A2 (“CYP1A2”) (human):
[0238] M ALS QS VPF S ATELLL AS AIFCLVFW VLKGLRPR VPKGLKS PPEPW GWPLLG HVLTLGKNPHLALSRMSQRYGDVLQIRIGSTPVLVLSRLDTIRQALVRQGDDFKGRP DLYT S TLITDGQS LTFS TDS GP VW A ARRRLAQN ALNTF S IAS DP AS S S S C YLEEH V S K E AKALIS RLQELM AGPGHFDP YN Q V V V S V AN VIG AMCF GQHFPES S DEMLS LVKNT HEFVET AS S GNPLDFFPILRYLPNPALQRFKAFN QRFLWFLQKT V QEHY QDFDKNS V RDITGALFKHSKKGPRASGNLIPQEKIVNLVNDIFGAGFDTVTTAISWSLMYLVTKPEI QRKIQKELDT VIGRERRPRLS DRPQLP YLE AFILETFRHS S FLPFTIPHS TTRDTTLN GF YIPKKCCVFVNQWQVNHDPELWEDPSEFRPERFLTADGTAINKPLSEKMMLFGMGK RRCIGEVLAKWEIFLFLAILLQQLEFSVPPGVKVDLTPIYGLTMKHARCEHVQARRFSI
N (SEQ ID NO: 19)
CYP2A6 (human):
[0239] MLAS GMLLV ALLVCLT VM VLMS VW QQRKS KGKLPPGPTPLPFIGN YLQLNT EQMYNSLMKISERYGPVFTIHLGPRRVVVLCGHDAVREALVDQAEEFSGRGEQATF DWVFKGYGVVFSNGERAKQLRRFSIATLRDFGVGKRGIEERIQEEAGFLIDALRGTG G ANIDPTFFLS RT V S N VIS S IVF GDRFD YKDKEFLS LLRMMLGIF QFT S T S TGQLYEMF SSVMKHLPGPQQQAFQLLQGLEDFIAKKVEHNQRTLDPNSPRDFIDSFLIRMQEEEKN PNTEFYLKNLVMTTLNLFIGGTETVSTTLRYGFLLLMKHPEVEAKVHEEIDRVIGKNR QPKFEDRAKMPYMEAVIHEIQRFGDVIPMSLARRVKKDTKFRDFFLPKGTEVFPMLG S VLRDPS FFS NPQDFNPQHFLNEKGQFKKS D AF VPFS IGKRN CF GEGLARMELFLFFT TVMQNFRLKS S QSPKDID VSPKHV GFATIPRNYTMSFLPR (SEQ ID NO: 20)
CYP3A4 (human):
[0240] M ALIPDLAMETWLLLA V S LVLLYLY GTHS HGLFKKLGIPGPTPLPFLGNILS Y HKGFCMFDMECHKKY GKVW GFYDGQQPVLAITDPDMIKTVLVKEC YS VFTNRRPF GP V GFMKS AIS IAEDEEWKRLRS LLS PTFT S GKLKEM VPIIAQ Y GD VLVRNLRRE AET GKP VTLKD VF GAY S MD VIT S TS F G VNIDS LNNPQDPFVENTKKLLRFDFLDPFFLS IT V FPFLIPILE VLNIC VFPRE VTNFLRKS VKRMKES RLEDT QKHR VDFLQLMIDS QN S KET ESHKALSDLELVAQSIIFIFAGYETTSSVLSFIMYELATHPDVQQKLQEEIDAVLPNKA PPTYDTVLQMEYLDMVVNETLRLFPIAMRLERVCKKDVEINGMFIPKGVVVMIPSYA LHRDPKYWTEPEKFLPERFSKKNKDNIDPYIYTPFGSGPRNCIGMRFALMNMKLALI RVLQNFSFKPCKETQIPLKLSLGGLLQPEKPVVLKVESRDGTVSGA (SEQ ID NO: 35) TET1 (human):
[0241] MSRS RH ARPS RLVRKED VNKKKKN S QLRKTTKG ANKN V AS VKTLS PGKLKQ LIQERD VKKKTEPKPP VP VRS LLTRAG A ARMNLDRTE VLF QNPES LT CN GFTM ALRS TS LS RRLS QPPLV V AKS KKVPLS KGLEKQHDCD YKILP ALG VKHS ENDS VPMQDTQ V LPDIETLIGV QNPS LLKGKS QETTQFW S QRVEDS KINIPTHS GPAAEILPGPLEGTRCGE GLF S EETLNDTS GS PKMF AQDT V C APFPQRATPKVTS QGNPS IQLEELGS R VES LKLS DS YLDPIKS EHDC YPT S S LNKVIPDLNLRNCL ALGGS TS PTS VIKFLL AGS KQ ATLG AK PDHQEAFEATANQQEVSDTTSFLGQAFGAIPHQWELPGADPVHGEALGETPDLPEIP G AIP V QGE VF GTILDQQETLGMS GS V VPDLP VFLP VPPNPIATFN APS KWPEPQS TVS Y GLA V QG AIQILPLGS GHTPQS S S NS EKN S LPP VM AIS N VENEKQ VHIS FLPANTQGFP LAPERGLFH AS LGIAQLS Q AGPS KS DRGS S Q V S VTS T VH V VNTT V VTMP VPM V S TS S SSYTTLLPTLEKKKRKRCGVCEPCQQKTNCGECTYCKNRKNSHQICKKRKCEELKK KPSVVVPLEVIKENKRPQREKKPKVLKADFDNKPVNGPKSESMDYSRCGHGEEQKL ELNPHT VEN VTKNED S MT GIE VEKWTQNKKS QLTDH VKGDF S AN VPE AEKS KN S E V DKKRTKS PKLF V QT VRN GIKH VHCLP AETN V S FKKFNIEEF GKTLENN S YKFLKDT A NHKN AMS S V ATDMS CDHLKGRS N VLVF QQPGFNCS S IPHS S HS IINHH AS IHNEGDQP KTPENIPS KEPKDGS P V QPS LLS LMKDRRLTLEQ V V AIE ALTQLS E APS EN S S PS KS EK DEESEQRTAS LLNSCKAILYTVRKDLQDPNLQGEPPKLNHCPS LEKQS SCNTVVFN G QTTTLS NS HINS ATN Q AS TKS HEY S KVTN S LS LFIPKS NS S KIDTNKS IAQGIITLDNCS NDLHQLPPRNNEVEYCNQLLDSSKKLDSDDLSCQDATHTQIEEDVATQLTQLASIIKI NYIKPEDKKVESTPTSLVTCNV QQKYN QEKGTIQQKPPS S VHNNHGS S LTKQKNPTQ KKTKS TPS RDRRKKKPT V V S Y QENDRQKWEKLS YM Y GTICDIWIAS KF QNF GQFCP HDFPTVFGKISSSTKIWKPLAQTRSIMQPKTVFPPLTQIKLQRYPESAEEKVKVEPLDS LS LFHLKTES N GKAFTDKA YN S Q V QLT VN AN QKAHPLTQPS S PPN QC AN VM AGDD QIRFQQ V VKEQLMHQRLPTLPGIS HETPLPES ALTLRN VN V V C S GGIT V V S TKS EEE V CSSSFGTSEFSTVDSAQKNFNDYAMNFFTNPTKNLVSITKDSELPTCSCLDRVIQKDK GP Y YTHLG AGPS V A A VREIMENRY GQKGN AIRIEIV V YT GKEGKS S HGCPIAKW VLR RSSDEEKVLCLVRQRTGHHCPTAVMVVLIMVWDGIPLPMADRLYTELTENLKSYNG HPTDRRCTLNENRTCTCQGIDPETCGASFSFGCSWSMYFNGCKFGRSPSPRRFRIDPS S PLHEKNLEDNLQS LATRLAPIYKQ Y AP V AY QN Q VE YEN V ARECRLGS KEGRPFS G V T ACLDFC 
AHPHRDIHNMNN GS T V V CTLTREDNRS LG VIPQDEQLH VLPLYKLS DTDE FGS KEGME AKIKS GAIEVLAPRRKKRTCFTQPVPRS GKKRAAMMTE VLAHKIRA VE KKPIPRIKRKNN S TTTNN S KPS S LPTLGS NTET V QPE VKS ETEPHFILKS S DNTKT Y S L MPSAPHPVKEASPGFSWSPKTASATPAPLKNDATASCGFSERSSTPHCTMPSGRLSGA N A A A ADGPGIS QLGE V APLPTLS AP VMEPLIN S EPS TG VTEPLTPHQPNHQPS FLTS PQ DLAS S PMEEDEQHS E ADEPPS DEPLS DDPLS P AEEKLPHIDE YW S DS EHIFLD ANIGG V AIAPAHGSVLIECARRELHATTPVEHPNRNHPTRLSLVFYQHKNLNKPQHGFELNKIK FEAKEAKNKKMKASEQKDQAANEGPEQSSEVNELNQIPSHKALTLTHDNVVTVSPY ALTH V AGP YNHW V (SEQ ID NO: 36)
TET1-CD (“Catalytic domain”) (human):
[0242] MGSLPTCSCLDRVIQKDKGPYYTHLGAGPSVAAVREIMENRYGQKGNAIRIEI VVYTGKEGKSSHGCPIAKWVLRRSSDEEKVLCLVRQRTGHHCPTAVMVVLIMVWD GIPLPMADRLYTELTENLKSYNGHPTDRRCTLNENRTCTCQGIDPETCGASFSFGCSW S M YFN GCKF GRS PS PRRFRIDPS S PLHEKNLEDNLQS LATRLAPIYKQ Y AP VA Y QN Q V EYENVARECRLGSKEGRPFSGVTACLDFCAHPHRDIHNMNNGSTVVCTLTREDNRS LGVIPQDEQLHVLPLYKLSDTDEFGSKEGMEAKIKSGAIEVLAPRRKKRTCFTQPVPR S GKKRAAMMTE VLAHKIRA VE KKPIPRIKRKNN S TTTNN S KPS S LPTLGS NTET V QPE VKS ETEPHFILKS S DNTKT Y S LMPS APHP VKE AS PGFS W S PKT AS ATP APLKND AT AS CGF S ERS S TPHCTMPS GRLS GAN A A A ADGPGIS QLGE V APLPTLS AP VMEPLIN S EPS T G VTEPLTPHQPNHQPS FLTS PQDLAS S PMEEDEQHS E ADEPPS DEPLS DDPLS P AEEKL PHIDE YW S DS EHIFLD ANIGG V AIAP AHGS VLIEC ARRELH ATTP VEHPNRNHPTRLS L VFYQHKNLNKPQHGFELNKIKFEAKEAKNKKMKASEQKDQAANEGPEQSSEVNEL N QIPS HKALTLTHDN V VT V S P Y ALTH V AGP YNHW V (SEQ ID NO: 37) TET2 (human):
[0243] MEQDRTNHVEGNRLSPFLIPSPPICQTEPLATKLQNGSPLPERAHPEVNGDTK WHS FKS Y Y GIPCMKGS QN S R V S PDFTQES RG Y S KCLQN GGIKRT V S EPS LS GLLQIKK LKQDQKANGERRNFGVSQERNPGESSQPNVSDLSDKKESVSSVAQENAVKDFTSFST HNCS GPENPELQILNEQEGKS AN YHD KNIVLLKNKA VLMPN G AT V S AS S VEHTHGEL LEKTLS Q Y YPDC V S IA V QKTTS HIN AINS Q ATNELS CEITHPS HTS GQIN S AQTS NS ELP PKP A A V V S E ACD ADD ADN AS KLA AMLNTCS F QKPEQLQQQKS VFEICPS P AENNIQG TTKLASGEEFCSGSSSNLQAPGGSSERYLKQNEMNGAYFKQSSVFTKDSFSATTTPPP PS QLLLS PPPPLPQ VPQLPS EGKS TLNGG VLEEHHH YPN QS NTTLLRE VKIEGKPE APP S QS PNPS TH VCS PS PMLS ERPQNN C VNRNDIQT AGTMT VPLCS EKTRPMS EHLKHNP PIFGSSGELQDNCQQLMRNKEQEILKGRDKEQTRDLVPPTQHYLKPGWffiLKAPRFH Q AES HLKRNE AS LPS ILQ Y QPNLS N QMTS KQ YTGN S NMPGGLPRQ A YTQKTTQLEH KS QM Y Q VEMN QGQS QGT VDQHLQF QKPS HQ VHF S KTDHLPKAH V QS LCGTRFHF Q QRADS QTEKLMS P VLKQHLN QQ AS ETEPF S NS HLLQHKPHKQ A AQTQPS QS SHLPQ N QQQQQKLQIKNKEEILQTFPHPQS NND QQREGS FF GQTKVEECFHGEN Q Y S KS S EF ETHN V QMGLEE V QNINRRN S P YS QTMKS S ACKIQ V S CS NNTHLV S ENKEQTTHPELF AGNKTQNLHHMQYFPNNVIPKQDLLHRCFQEQEQKSQQASVLQGYKNRNQDMSGQ QAAQLAQQRYLIHNHANVFPVPDQGGSHTQTPPQKDTQKHAALRWHLLQKQEQQQ TQQPQTESCHSQMHRPIKVEPGCKPHACMHTAPPENKTWKKVTKQENPPASCDNVQ QKSIIETMEQHLKQFHAKSLFDHKALTLKSQKQVKVEMSGPVTVLTRQTTAAELDSH TPALEQQTTSSEKTPTKRTAASVLNNFIESPSKLLDTPIKNLLDTPVKTQYDFPSCRCV EQIIEKDEGPFYTHLGAGPNVAAIREIMEERFGQKGKAIRIERVIYTGKEGKSSQGCPI AKWVVRRSSSEEKLLCLVRERAGHTCEAAVIVILILVWEGIPLSLADKLYSELTETLR KYGTLTNRRCALNEERTCACQGLDPETCGASFSFGCSWSMYYNGCKFARSKIPRKF KLLGDDPKEEEKLES HLQNLS TLMAPT YKKLAPD A YNN QIEYEHRAPECRLGLKEGR PFS GVT ACLDFC AHAHRDLHNMQN GSTLV CTLTREDNREFGGKPEDEQLHVLPLYK V S D VDEF GS VE AQEEKKRS G AIQ VLS S FRRK VRMLAEP VKT CRQRKLE AKKA A AEK LS S LENS S NKNEKEKS APS RTKQTEN AS Q AKQLAELLRLS GP VMQQS QQPQPLQKQP PQPQQQQRPQQQQPHHPQTESVNSYSASGSTNPYMRRPNPVSPYPNSSHTSDIYGSTS PMNF Y S TS S Q A AGS YLN S S NPMNP YPGLLN QNTQ YPS Y QCN GNLS VDN C S P YLGS Y S PQS QPMDLYRYPS QDPLS KLS LPPIHTLY QPRF GN S QS FTS KYLG Y GN QNMQGDGF S S CTIRPN VHH 
V GKLPP YPTHEMDGHFMG AT S RLPPNLS NPNMD YKN GEHHS PS HII HN Y S A APGMFN S S LH ALHLQNKENDMLS HT AN GLS KMLP ALNHDRT AC VQGGLHK LS DAN GQEKQPLALV QG V AS G AEDNDE VW S DS EQS FLDPDIGG V A V APTHGS ILIEC AKRELH ATTPLKNPNRNHPTRIS LVF Y QHKS MNEPKHGLALWE AKM AEKAREKEEE CEKY GPD Y VPQKS HGKKVKREP AEPHET S EPT YLRFIKS LAERTMS VTTDS T VTT S P Y AFTRVT GP YNRYI (SEQ ID NO: 38)
TET3 (human):
[0244] MDS GPVYHGDSRQLS AS GVPVN GAREPAGPS LLGTGGPWRVDQKPDWE AA PGP AHT ARLED AHDL V AFS A V AE A V S S Y GALS TRLYETFNREMS RE AGNN S RGPRP GPEGCSAGSEDLDTLQTALALARHGMKPPNCNCDGPECPDYLEWLEGKIKSVVMEG GEERPRLPGPLPPGE AGLP APS TRPLLS S E VPQIS PQEGLPLS QS ALS I AKEKNIS LQT AI AIE ALTQLS S ALPQPS HS TPQ AS CPLPE ALS PP APFRS PQS YLRAPS WP V VPPEEHS S FA PDS S AFPP ATPRTEFPE A W GTDTPP ATPRS S WPMPRPS PDPM AELEQLLGS AS D YIQS VFKRPE ALPTKPKVK VE APS S S PAP APS P VLQRE APTPS S EPDTHQKAQT ALQQHLHH KRS LFLEQ VHDT S FP APS EPS APGWWPPPS SP VPRLPDRPPKEKKKKLPTP AGGP V GT EKA APGIKPS VRKPIQIKKS RPRE AQPLFPP VRQIVLEGLRS PAS QE V Q AHPP APLP AS QGSAVPLPPEPSLALFAPSPSRDSLLPPTQEMRSPSPMTALQPGSTGPLPPADDKLEELI RQFEAEFGDSFGLPGPPSVPIQDPENQQTCLPAPESPFATRSPKQIKIESSGAVTVLSTT CFHSEEGGQEATPTKAENPLTPTLSGFLESPLKYLDTPTKSLLDTPAKRAQAEFPTCD CVEQIVEKDEGPYYTHLGSGPTVASIRELMEERYGEKGKAIRIEKVIYTGKEGKSSRG CPIAKWVIRRHTLEEKLLCLVRHRAGHHCQNAVIVILILAWEGIPRSLGDTLYQELTD TLRKYGNPTSRRCGLNDDRTCACQGKDPNTCGASFSFGCSWSMYFNGCKYARSKTP RKFRLAGDNPKEEEVLRKSFQDLATEVAPLYKRLAPQAYQNQVTNEEIAIDCRLGLK EGRPFAGVTACMDFCAHAHKDQHNLYNGCTVVCTLTKEDNRCVGKIPEDEQLHVL PLYKM ANTDEF GS EEN QN AKV GS G AIQ VLT AFPRE VRRLPEP AKS CRQRQLE ARKA AAEKKKIQKEKLSTPEKIKQEALELAGITSDPGLSLKGGLSQQGLKPSLKVEPQNHFS S FKY S GN AWES Y S VLGN CRPS DP Y S MN S V Y S YHS Y Y AQPS LTS VN GFHS K Y ALPS F S Y Y GFPS S NP VFPS QFLGPG A W GHS GS S GS FEKKPDLH ALHN S LS PAY GG AEF AELPS Q A VPTD AHHPTPHHQQP A YPGPKE YLLPK APLLHS V S RDPS PF AQS S N C YNRS IKQEP VDPLTQ AEP VPRD AGKMGKTPLS E VS QN GGPS HLW GQ Y S GGPS MSPKRTN G VGGS W GVF S S GES P AIVPD KLS S FG AS CLAPS HFTDGQW GLFPGEGQQ A AS HS GGRLRGKP WSPCKFGNSTSALAGPSLTEKPWALGAGDFNSALKGSPGFQDKLWNPMKGEEGRIP A AG AS QLDR A W QS F GLPLGS S EKLF G ALKS EEKLWDPF S LEEGP AEEPPS KG A VKEE KGGGG AEEEEEELW S DS EHNFLDENIGG V A V AP AHGS ILIEC ARRELH ATTPLKKPN RCHPTRIS LVFY QHKNLN QPNHGLALWEAKMKQLAERARARQEEAARLGLGQQE A KLY GKKRKW GGT V V AEPQQKEKKG V VPTRQ ALA VPTDS A VT V S S Y A YTKVT GP Y S RWI (SEQ ID NO: 39) [0245] In 
various embodiments, the disclosed fusion proteins comprise an adenine oxidase domain that does not comprise an alkB or alkA dehydrogenase, such as an E. coli alkB or alkA dehydrogenase. In various embodiments, the disclosed fusion proteins comprise an adenine oxidase domain that does not comprise a variant of an alkB dehydrogenase or alkA dehydrogenase. In some embodiments, the disclosed fusion proteins comprise an adenine oxidase domain that does not comprise a TET family dioxygenase, such as TET1. In some embodiments, the disclosed fusion proteins comprise an adenine oxidase domain that does not comprise a variant of a TET family dioxygenase. In some embodiments, the disclosed fusion proteins do not comprise an alkA dehydrogenase, an alkB dehydrogenase, or a TET family dioxygenase, or a variant thereof.
III. Additional base editor elements
[0246] In various embodiments, the base editors disclosed herein further comprise one or more additional base editor elements, e.g., a nuclear localization signal(s), an inhibitor of base excision repair, and/or a heterologous protein domain.
[0247] In various embodiments, the base editors disclosed herein further comprise one or more nuclear localization signals (NLSs), preferably at least two. In certain embodiments, the base editors comprise at least two NLSs. In embodiments with at least two NLSs, the NLSs can be the same NLSs, or they can be different NLSs. In addition, the NLSs may be expressed as part of a fusion protein with the remaining portions of the base editors. In some embodiments, one or more of the NLSs are bipartite NLSs (“bpNLS”). In certain embodiments, the disclosed fusion proteins comprise two bipartite NLSs. In some embodiments, the disclosed fusion proteins comprise more than two bipartite NLSs.
[0248] The location of the NLS fusion can be at the N-terminus, the C-terminus, or within a sequence of a base editor (e.g., inserted between the encoded napDNAbp domain (e.g., Cas9) and a DNA nucleobase modification domain (e.g., an adenine oxidase)).
[0249] The NLSs may be any NLS sequence known in the art. The NLSs may also be any future-discovered NLSs for nuclear localization. The NLSs also may be any naturally occurring NLS, or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations).
[0250] The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., International PCT application PCT/EP2000/011690, filed November 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 51), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 52), KRTADGSEFESPKKKRKV (SEQ ID NO: 53), or KRTADGSEFEPKKKRKV (SEQ ID NO: 13). In other embodiments, the NLS comprises the amino acid sequence NLSKRPAAIKKAGQAKKKK (SEQ ID NO: 54), PAAKRVKLD (SEQ ID NO: 55), RQRRNELKRSF (SEQ ID NO: 56), or NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 57).
[0251] In one aspect of the invention, a base editor may be modified with one or more nuclear localization signals (NLS), preferably at least two NLSs. In certain embodiments, the base editors are modified with two or more NLSs. The invention contemplates the use of any nuclear localization signal known in the art at the time of the invention, or any nuclear localization signal that is identified or otherwise made available in the state of the art after the time of the instant filing. A representative nuclear localization signal is a peptide sequence that directs the protein to the nucleus of the cell in which the sequence is expressed. A nuclear localization signal is predominantly basic, can be positioned almost anywhere in a protein's amino acid sequence, generally comprises a short sequence of four amino acids (Autieri & Agrawal, (1998) J. Biol. Chem. 273: 14731-37, incorporated herein by reference) to eight amino acids, and is typically rich in lysine and arginine residues (Magin et al., (2000) Virology 274: 11-16, incorporated herein by reference). Nuclear localization signals often comprise proline residues. A variety of nuclear localization signals have been identified and have been used to effect transport of biological molecules from the cytoplasm to the nucleus of a cell. See, e.g., Tinland et al., (1992) Proc. Natl. Acad. Sci. U.S.A. 89:7442-46; Moede et al., (1999) FEBS Lett. 461:229-34, which is incorporated by reference. Translocation is currently thought to involve nuclear pore proteins.
[0252] Most NLSs can be classified in three general groups: (i) a monopartite NLS exemplified by the SV40 large T antigen NLS (PKKKRKV (SEQ ID NO: 51)); (ii) a bipartite motif consisting of two basic domains separated by a variable number of spacer amino acids and exemplified by the Xenopus nucleoplasmin NLS (KRXXXXXXXXXXKKKL (SEQ ID NO: 50)); and (iii) noncanonical sequences such as M9 of the hnRNP A1 protein, the influenza virus nucleoprotein NLS, and the yeast Gal4 protein NLS (Dingwall and Laskey 1991).
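The first two groups described above can be illustrated with a minimal Python sketch. The regular expressions here are simplified heuristics chosen for illustration and are assumptions, not definitions taken from this disclosure: the bipartite pattern mirrors the KR-(10 spacer residues)-KKK layout of SEQ ID NO: 50, and the monopartite pattern is a commonly cited K-(K/R)-X-(K/R) consensus. The function name `classify_nls` is hypothetical.

```python
import re

# Simplified, illustrative patterns (assumptions, not part of the disclosure).
# Bipartite: two basic residues, a 10-residue spacer, then a basic cluster,
# mirroring the layout of KRXXXXXXXXXXKKKL (SEQ ID NO: 50).
BIPARTITE = re.compile(r"[KR]{2}.{10}[KR]{3}")
# Monopartite: a short basic cluster of the form K-(K/R)-X-(K/R).
MONOPARTITE = re.compile(r"K[KR].[KR]")


def classify_nls(seq: str) -> str:
    """Return a rough NLS class for a one-letter amino acid sequence."""
    seq = seq.upper()
    # Check the bipartite pattern first: its trailing basic cluster
    # would otherwise also satisfy the monopartite pattern.
    if BIPARTITE.search(seq):
        return "bipartite"
    if MONOPARTITE.search(seq):
        return "monopartite"
    return "noncanonical or none"


if __name__ == "__main__":
    print(classify_nls("PKKKRKV"))           # SV40 large T antigen NLS (SEQ ID NO: 51)
    print(classify_nls("KRPAAIKKAGQAKKKK"))  # bipartite portion of SEQ ID NO: 54
```

Real NLS prediction relies on experimentally validated motifs and context; a pattern match of this kind only flags candidate sequences and cannot identify noncanonical NLSs.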
[0253] Nuclear localization signals appear at various points in the amino acid sequences of proteins. NLSs have been identified at the N-terminus, the C-terminus, and in the central region of proteins. Thus, the specification provides base editors that may be modified with one or more NLSs at the C-terminus, the N-terminus, as well as at an internal region of the base editor. The residues of a longer sequence that do not function as component NLS residues should be selected so as not to interfere, for example ionically or sterically, with the nuclear localization signal itself. Therefore, although there are no strict limits on the composition of an NLS-comprising sequence, in practice, such a sequence can be functionally limited in length and composition.
[0254] The present disclosure contemplates any suitable means by which to modify a base editor to include one or more NLSs. In one aspect, the base editors can be engineered to express a base editor protein that is translationally fused at its N-terminus or its C-terminus (or both) to one or more NLSs, i.e., to form a base editor-NLS fusion construct. In other embodiments, the base editor-encoding nucleotide sequence can be genetically modified to incorporate a reading frame that encodes one or more NLSs in an internal region of the encoded base editor. In addition, the NLSs may include various amino acid linkers or spacer regions encoded between the base editor and the N-terminally, C-terminally, or internally attached NLS amino acid sequence (e.g., an NLS located in the central region of the protein). Thus, the present disclosure also provides for nucleotide constructs, vectors, and host cells for expressing fusion proteins that comprise a base editor and one or more NLSs.
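The terminal fusion arrangements described above can be sketched at the amino acid level as simple string assembly. This is an illustration under stated assumptions only: the function `add_nls`, its parameters, and the placeholder protein and linker strings are hypothetical, and the sketch ignores practical details such as the initiating methionine and codon-level design. Only the SV40 NLS sequence (SEQ ID NO: 51) is taken from the text.

```python
SV40_NLS = "PKKKRKV"  # SEQ ID NO: 51


def add_nls(protein: str, nls: str = SV40_NLS, n_term: bool = True,
            c_term: bool = True, linker: str = "") -> str:
    """Return the protein sequence with an NLS fused at the chosen termini.

    An optional linker (hypothetical, any amino acid string) is placed
    between the protein and each NLS.
    """
    parts = []
    if n_term:
        parts.extend([nls, linker])
    parts.append(protein)
    if c_term:
        parts.extend([linker, nls])
    return "".join(parts)


if __name__ == "__main__":
    placeholder_editor = "AAAA"  # stand-in for a base editor sequence
    print(add_nls(placeholder_editor, linker="GS"))
```

Setting `n_term` or `c_term` to `False` gives the single-terminus constructs; an internal NLS would instead require splitting the protein sequence at the chosen insertion point.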
[0255] The base editors described herein may also comprise nuclear localization signals which are linked to a base editor through one or more linkers, e.g., a polymeric, amino acid, nucleic acid, polysaccharide, or chemical linker element. The linkers within the contemplated scope of the disclosure are not intended to have any limitations and can be any suitable type of molecule (e.g., polymer, amino acid, polysaccharide, nucleic acid, lipid, or any synthetic chemical linker domain) and be joined to the base editor by any suitable strategy that effectuates forming a bond (e.g., covalent linkage, hydrogen bonding) between the base editor and the one or more NLSs.
[0256] The base editors described herein also may include one or more additional elements. In certain embodiments, an additional element may comprise an effector of base repair.
[0257] In certain embodiments, the base editors described herein may comprise an inhibitor of base excision repair. The term “inhibitor of base excision repair” or “iBER” refers to a protein that is capable of inhibiting the activity of a nucleic acid repair enzyme, for example a base excision repair enzyme. Mammalian cells clear 8-oxoadenine lesions that arise naturally from oxidative DNA damage by action of thymine-DNA glycosylase (TDG), which hydrolytically cleaves the glycosidic bond of the damaged base, leaving behind an abasic site (FIG. 5). Abasic sites are excised by AP lyase during the base excision repair process, introducing a break in the modified DNA strand. If this occurs before mismatch repair machinery locates the nick left in the non-edited strand by an nCas9 domain, as in the fusion proteins disclosed herein, a double-strand break is generated, which could lead to undesired indels during repair. Competitive base excision repair may interfere with 8-oxoadenine-mediated base editing. Accordingly, in exemplary embodiments, an iBER is fused to the fusion proteins disclosed herein, to compete for binding of the 8-oxoadenine lesion with active, endogenous excision repair enzymes, preventing or slowing base excision repair.
[0258] In some embodiments, the iBER is an inhibitor of 8-oxoadenine base excision repair. Exemplary iBERs include OGG inhibitors, MUG inhibitors, and TDG inhibitors. Exemplary iBERs include inhibitors of hOGG1, hTDG, ecMUG, APE1, Endo III, Endo IV, Endo V, Endo VIII, Fpg, hNEIL1, T7 EndoI, T4PDG, UDG, hSMUG1, and hAAG. In some embodiments, the iBER may be a catalytically inactive OGG, a catalytically inactive TDG, a catalytically inactive MUG, or a small molecule or peptide inhibitor of OGG, TDG, or MUG, or a variant thereof.
[0259] In particular embodiments, the iBER is a catalytically inactive TDG. Exemplary catalytically inactive TDGs include mutagenized variants of wild-type TDG (SEQ ID NO: 43) that bind DNA nucleobases, including 8-oxoadenine, but lack DNA glycosylase activity.
TDG (human) (wild-type)
[0260] ME AEN AGS YS LQQ AQ AF YTFPF QQLM AE APNM A V VNEQQMPEE VP AP AP A QEPV QE APKGRKRKPRTTEPKQPVEPKKPVES KKS GKS AKS KEKQEKITDTFKVKRK VDRFNGVSEAELLTKTLPDILTFNLDIVIIGINPGLMAAYKGHHYPGPGNHFWKCLFM S GLSEV QLNHMDDHTLPGKY GIGFTNM VERTTPGS KDLS S KEFREGGRILV QKLQKY QPRIAVFNGKCIYEIFSKEVFGVKVKNLEFGLQPHKIPDTETLCYVMPSSSARCAQFPR AQDKVHYYIKLKDLRDQLKGIERNMD V QE V QYTFDLQLAQED AKKM A VKEEKYDP GYEAAYGGAYGENPCSSEPCGFSSNGLIESVELRGESAFSGIPNGQWMTQSFTDQIPS FSNHCGTQEQEEESHA (SEQ ID NO: 43)
[0261] Exemplary catalytically inactive MUGs include mutagenized variants of wild-type MUG (SEQ ID NO: 44) that bind DNA nucleobases, including 8-oxoadenine, but lack DNA glycosylase activity.
E. coli MUG (wild-type)
[0262] MVEDILAPGLRVVFCGINPGLSSAGTGFPFAHPANRFWKVIYQAGFTDRQLKP QEAQHLLD YRCGVTKLVDRPTV QANE V S KQELHAGGRKLIEKIED Y QPQALAILGK Q A YEQGFS QRG AQW GKQTLTIGS TQIW VLPNPS GLS RV S LEKLVE A YRELD QALV V RGR (SEQ ID NO: 44)
[0263] Some exemplary suitable inhibitors of base excision repair that may be fused to Cas9 domains according to embodiments of this disclosure are provided below. An exemplary catalytically inactive hTDG is an N140A mutant of SEQ ID NO: 43, shown below as SEQ ID NO: 46. Analogously, an exemplary catalytically inactive ecMUG is an N18A mutant of SEQ ID NO: 44, shown below as SEQ ID NO: 47.
Catalytically inactive TDG (human)
[0264] ME AEN AGS YS LQQ AQ AF YTFPF QQLM AE APNM A V VNEQQMPEE VP AP AP A QEPV QE APKGRKRKPRTTEPKQPVEPKKPVES KKS GKS AKS KEKQEKITDTFKVKRK VDRFNGVSEAELLTKTLPDILTFNLDIVIIGIAPGLMAAYKGHHYPGPGNHFWKCLFM S GLSEV QLNHMDDHTLPGKY GIGFTNM VERTTPGS KDLS S KEFREGGRILV QKLQKY QPRIAVFNGKCIYEIFSKEVFGVKVKNLEFGLQPHKIPDTETLCYVMPSSSARCAQFPR AQDKVHYYIKLKDLRDQLKGIERNMD V QE V QYTFDLQLAQED AKKM A VKEEKYDP GYEAAYGGAYGENPCSSEPCGFSSNGLIESVELRGESAFSGIPNGQWMTQSFTDQIPS FSNHCGTQEQEEESHA (SEQ ID NO: 46)
Catalytically inactive E. coli MUG
[0265] MVEDILAPGLRVVFCGIAPGLSSAGTGFPFAHPANRFWKVIYQAGFTDRQLKP QEAQHLLD YRCGVTKLVDRPTV QANE V S KQELHAGGRKLIEKIED Y QPQALAILGK Q A YEQGFS QRG AQW GKQTLTIGS TQIW VLPNPS GLS RV S LEKLVE A YRELD QALV V RG (SEQ ID NO: 47)
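The substitution notation used for these inactive variants (e.g., N18A: asparagine at position 18 replaced by alanine, with 1-based numbering) can be checked with a short Python sketch. The helper name `apply_substitution` is hypothetical, and the example uses only the N-terminal fragment of the E. coli MUG sequence (SEQ ID NO: 44) rather than the full-length protein.

```python
import re


def apply_substitution(seq: str, mutation: str) -> str:
    """Apply a point substitution such as 'N18A' to a protein sequence.

    Positions are 1-based. Raises ValueError if the notation is malformed,
    the position is out of range, or the wild-type residue does not match.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"Unrecognized mutation notation: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(seq):
        raise ValueError(f"Position {position} is outside the sequence")
    if seq[position - 1] != wild_type:
        raise ValueError(
            f"Expected {wild_type} at position {position}, found {seq[position - 1]}"
        )
    return seq[:position - 1] + replacement + seq[position:]


if __name__ == "__main__":
    # N-terminal fragment of wild-type E. coli MUG (SEQ ID NO: 44)
    mug_fragment = "MVEDILAPGLRVVFCGINPGLSSAGTGFPFAHPANRFWKVIYQAGFTDRQLKP"
    print(apply_substitution(mug_fragment, "N18A"))
```

The wild-type residue check guards against off-by-one numbering errors, which are a common source of mistakes when a mutation is specified relative to a different reference sequence.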
[0266] Other exemplary iBERs comprise variants with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to wild-type hTDG and ecMUG, above. Other exemplary iBERs comprise variants with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to wild-type hOGG1, UDG, hSMUG1, and hAAG.
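As a rough illustration of a percent-identity threshold, the following Python sketch compares two sequences that are assumed to be already aligned and of equal length. This is a simplification: the disclosure does not specify this method here, and in practice identity is computed from a pairwise alignment that may contain gaps. The function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Simplifying assumption: no gaps and identical lengths. Sequences of
    different lengths must be aligned with a dedicated tool first.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("Sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # N-terminal fragments of wild-type and N18A E. coli MUG
    wild_type = "MVEDILAPGLRVVFCGINPGLSSAGTGFPFAHPANRFWKVIYQAGFTDRQLKP"
    variant = "MVEDILAPGLRVVFCGIAPGLSSAGTGFPFAHPANRFWKVIYQAGFTDRQLKP"
    identity = percent_identity(wild_type, variant)
    print(f"{identity:.2f}% identity; meets 95% threshold: {identity >= 95.0}")
```

A single substitution in a short fragment lowers identity more than the same substitution would in the full-length protein, so thresholds should be evaluated against the complete reference sequence.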
[0267] In some embodiments, the fusion proteins described herein may comprise one or more heterologous protein domains (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to the base editor components). A fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains.
Other exemplary features that may be present are localization sequences, such as cytoplasmic localization sequences, export sequences, such as nuclear export sequences, or other localization sequences, as well as sequence tags.
[0268] Examples of protein domains that may be fused to a fusion protein or component thereof (e.g., the napDNAbp domain, the nucleobase modification domain, or the NLS domain) include, without limitation, epitope tags and reporter gene sequences. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporter genes include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP). A base editor may be fused to a gene sequence encoding a protein or a fragment of a protein that binds DNA molecules or binds other cellular molecules, including, but not limited to, maltose binding protein (MBP), S-tag, LexA DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. Additional domains that may form part of a base editor are described in US Patent Publication No. 2011/0059502, published March 10, 2011, and incorporated herein by reference in its entirety.
[0269] In an aspect of the invention, a reporter gene which includes, but is not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP), may be introduced into a cell to encode a gene product which serves as a marker by which to measure the alteration or modification of expression of the gene product. In certain embodiments of the invention the gene product is luciferase. In a further embodiment of the invention the expression of the gene product is decreased.
[0270] Other exemplary features that may be present are tags that are useful for solubilization, purification, or detection of the fusion proteins. Suitable protein tags provided herein include, but are not limited to, biotin carboxylase carrier protein (BCCP) tags, myc-tags, calmodulin-tags, FLAG-tags, hemagglutinin (HA)-tags, bgh-PolyA tags, polyhistidine tags (also referred to as histidine tags or His-tags), maltose binding protein (MBP)-tags, nus-tags, glutathione-S-transferase (GST)-tags, green fluorescent protein (GFP)-tags, thioredoxin-tags, S-tags, Softags (e.g., Softag 1, Softag 3), strep-tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP-tags. Additional suitable sequences will be apparent to those of skill in the art. In some embodiments, the fusion protein comprises one or more His tags.
IV. Linkers
[0271] In certain embodiments, linkers may be used to link any of the peptides or peptide domains or domains of the base editor (e.g., domain A covalently linked to domain B which is covalently linked to domain C).
[0272] As defined above, the term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or domains, e.g., a binding domain and a cleavage domain of a nuclease. In some embodiments, a linker joins a gRNA binding domain of a napDNAbp and the catalytic domain of a recombinase. In some embodiments, a linker joins a dCas9 and base editor domain (e.g., an adenine oxidase). Typically, the linker is positioned between, or flanked by, two groups, molecules, or other domains and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical domain. Chemical domains include, but are not limited to, disulfide, hydrazone, thiol and azo domains. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. In some embodiments, the linker is a molecule in length. Longer or shorter linkers are also contemplated.
[0273] The linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length. In certain embodiments, the linker is a polypeptide or based on amino acids. In other embodiments, the linker is not peptide-like. In certain embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.). In certain embodiments, the linker is a carbon-nitrogen bond of an amide linkage. In certain embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic or heteroaliphatic linker. In certain embodiments, the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, polyester, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminoalkanoic acid. In certain embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, ethanoic acid, alanine, beta-alanine, 3-aminopropanoic acid, 4-aminobutanoic acid, 5-pentanoic acid, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminohexanoic acid (Ahx). In certain embodiments, the linker is based on a carbocyclic domain (e.g., cyclopentane, cyclohexane). In other embodiments, the linker comprises a polyethylene glycol domain (PEG). In other embodiments, the linker comprises amino acids. In certain embodiments, the linker comprises a peptide. In certain embodiments, the linker comprises an aryl or heteroaryl domain. In certain embodiments, the linker is based on a phenyl ring. The linker may include functionalized domains to facilitate attachment of a nucleophile (e.g., thiol, amino) from the peptide to the linker. Any electrophile may be used as part of the linker. Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, alkyl halides, aryl halides, acyl halides, and isothiocyanates.
[0274] In some other embodiments, the linker comprises the amino acid sequence (GGGGS)n (SEQ ID NO: 78), (G)n (SEQ ID NO: 79), (EAAAK)n (SEQ ID NO: 80), (GGS)n (SEQ ID NO: 81), (SGGS)n (SEQ ID NO: 82), (XP)n (SEQ ID NO: 83), or any combination thereof, wherein n is independently an integer between 1 and 30, and wherein X is any amino acid. In some embodiments, the linker comprises the amino acid sequence (GGS)n (SEQ ID NO: 70), wherein n is 1, 3, or 7. In some embodiments, the linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 48). In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 11), also known as XTEN linker. In some embodiments, the linker comprises the amino acid sequence SGGSGGSGGS (SEQ ID NO: 12). In some embodiments, the linker comprises the amino acid sequence SGGS (SEQ ID NO: 14).
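By way of non-limiting illustration, the repeat-unit linkers recited above can be expanded programmatically. The following is a minimal Python sketch; the helper name `repeat_linker` is hypothetical and not part of this disclosure, and the sketch assumes that "between 1 and 30" is inclusive of both endpoints.

```python
# Illustrative sketch only: `repeat_linker` is a hypothetical helper name.
# Assumption: "n is an integer between 1 and 30" is read as inclusive.

REPEAT_UNITS = ("GGGGS", "G", "EAAAK", "GGS", "SGGS")  # repeat units recited in the text


def repeat_linker(unit: str, n: int) -> str:
    """Return the amino acid sequence (unit)n, e.g. (GGS)3 -> GGSGGSGGS."""
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    return unit * n


if __name__ == "__main__":
    # The (GGS)n linkers with n = 1, 3, or 7 named in the text.
    for n in (1, 3, 7):
        print(n, repeat_linker("GGS", n))
```

The (XP)n linker is omitted from `REPEAT_UNITS` because X may be any amino acid and therefore does not expand to a single fixed sequence.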
[0275] In some embodiments, the fusion protein comprises the structure [adenine oxidase]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence], or [dCas9 or Cas9 nickase]-[optional linker sequence]-[adenine oxidase].
[0276] In some embodiments, the fusion protein comprises the structure [adenine oxidase]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[iBER]; [adenine oxidase]-[optional linker sequence]-[iBER]-[optional linker sequence]-[dCas9 or Cas9 nickase]; [iBER]-[optional linker sequence]-[adenine oxidase]-[optional linker sequence]-[dCas9 or Cas9 nickase]; [iBER]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[adenine oxidase]; [dCas9 or Cas9 nickase]-[optional linker sequence]-[iBER]-[optional linker sequence]-[adenine oxidase]; or [dCas9 or Cas9 nickase]-[optional linker sequence]-[adenine oxidase]-[optional linker sequence]-[iBER].
[0277] In some embodiments, the fusion protein comprises one or more nuclear localization sequences, and comprises the structure [adenine oxidase]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[iBER]-[optional linker sequence]-[NLS]; [NLS]-[optional linker sequence]-[adenine oxidase]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[iBER]; [adenine oxidase]-[optional linker sequence]-[iBER]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[NLS]; [NLS]-[optional linker sequence]-[adenine oxidase]-[optional linker sequence]-[iBER]-[optional linker sequence]-[dCas9 or Cas9 nickase]; [NLS]-[optional linker sequence]-[iBER]-[optional linker sequence]-[adenine oxidase]-[optional linker sequence]-[dCas9 or Cas9 nickase]; [iBER]-[optional linker sequence]-[adenine oxidase]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[NLS]; [iBER]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[adenine oxidase]-[optional linker sequence]-[NLS]; [NLS]-[optional linker sequence]-[iBER]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[adenine oxidase]; [NLS]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[iBER]-[optional linker sequence]-[adenine oxidase]; [dCas9 or Cas9 nickase]-[optional linker sequence]-[iBER]-[optional linker sequence]-[adenine oxidase]-[optional linker sequence]-[NLS]; [NLS]-[optional linker sequence]-[dCas9 or Cas9 nickase]-[optional linker sequence]-[adenine oxidase]-[optional linker sequence]-[iBER]; or [dCas9 or Cas9 nickase]-[optional linker sequence]-[adenine oxidase]-[optional linker sequence]-[iBER]-[optional linker sequence]-[NLS].
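By way of non-limiting illustration, the structures recited above appear to correspond to every ordering of the three domains (six structures), optionally with a single NLS placed at either the N-terminus or the C-terminus (twelve structures). The following Python sketch enumerates them under that reading; the function name `architectures` is hypothetical and is not part of this disclosure.

```python
# Illustrative sketch only: enumerates domain orderings as bracketed strings.
# Assumption: each structure is a permutation of the three domains, with one
# optional NLS at either terminus and an optional linker between neighbours.
from itertools import permutations
from typing import List

DOMAINS = ("[adenine oxidase]", "[dCas9 or Cas9 nickase]", "[iBER]")
LINKER = "[optional linker sequence]"


def architectures(with_nls: bool = False) -> List[str]:
    """Return every domain ordering, joined by optional linker placeholders."""
    layouts = []
    for order in permutations(DOMAINS):
        if with_nls:
            layouts.append(order + ("[NLS]",))   # NLS at the C-terminus
            layouts.append(("[NLS]",) + order)   # NLS at the N-terminus
        else:
            layouts.append(order)
    joiner = "-" + LINKER + "-"
    return [joiner.join(parts) for parts in layouts]


if __name__ == "__main__":
    for structure in architectures(with_nls=True):
        print(structure)
```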
Reduced off-target effects
[0278] In some embodiments, the target nucleotide sequence is a DNA sequence in a genome, e.g. a eukaryotic genome. In certain embodiments, the target nucleotide sequence is in a mammalian (e.g. a human) genome. In certain embodiments, the target nucleotide sequence is in a human genome. In other embodiments, the target nucleotide sequence is in the genome of a rodent, such as a mouse or rat. In other embodiments, the target nucleotide sequence is in the genome of a domesticated animal, such as a horse, cat, dog, or rabbit.
[0279] Some embodiments of the disclosure are based on the recognition that any of the fusion proteins provided herein are capable of modifying a specific nucleobase without generating a significant proportion of indels. An “indel”, as used herein, refers to the insertion or deletion of a nucleobase within a nucleic acid. Such insertions or deletions can lead to frame shift mutations within a coding region of a gene. In some embodiments, it is desirable to generate fusion proteins that efficiently modify (e.g. oxidize) a specific nucleotide within a nucleic acid, without generating a large number of insertions or deletions (i.e., indels) in the nucleic acid. In certain embodiments, any of the fusion proteins provided herein are capable of generating a greater proportion of intended modifications (e.g., point mutations) versus indels.
[0280] In some embodiments, the fusion proteins provided herein are capable of generating a ratio of intended point mutations to indels that is greater than 1:1. In some embodiments, the fusion proteins provided herein are capable of generating a ratio of intended point mutations to indels that is at least 1.5:1, at least 2:1, at least 2.5:1, at least 3:1, at least 3.5:1, at least 4:1, at least 4.5:1, at least 5:1, at least 5.5:1, at least 6:1, at least 6.5:1, at least 7:1, at least 7.5:1, at least 8:1, at least 10:1, at least 12:1, at least 15:1, at least 20:1, at least 25:1, at least 30:1, at least 40:1, at least 50:1, at least 100:1, at least 200:1, at least 300:1, at least 400:1, at least 500:1, at least 600:1, at least 700:1, at least 800:1, at least 900:1, or at least 1000:1, or more. The number of intended mutations and indels may be determined using any suitable method, for example the methods used in the below Examples. In some embodiments, to calculate indel frequencies, sequencing reads are scanned for exact matches to two 10-bp sequences that flank both sides of a window in which indels might occur. If no exact matches are located, the read is excluded from analysis. If the length of this indel window exactly matches the reference sequence the read is classified as not containing an indel. If the indel window is two or more bases longer or shorter than the reference sequence, then the sequencing read is classified as an insertion or deletion, respectively.
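By way of non-limiting illustration, the read classification described above can be sketched in Python as follows. The function name `classify_read` and the returned labels are hypothetical. The text does not state how a window that differs from the reference by exactly one base is treated; the sketch labels such reads "unclassified", which is an assumption.

```python
# Illustrative sketch only: `classify_read` and its labels are hypothetical.
# Assumption: a one-base length difference is reported as "unclassified",
# because the text addresses only differences of zero or of two or more bases.
from typing import Optional


def _window_length(seq: str, left_flank: str, right_flank: str) -> Optional[int]:
    """Length of the window between exact matches of both flanks, or None."""
    start = seq.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = seq.find(right_flank, start)
    if end == -1:
        return None
    return end - start


def classify_read(read: str, reference: str,
                  left_flank: str, right_flank: str) -> Optional[str]:
    """Return 'no_indel', 'insertion', 'deletion', 'unclassified', or None (read excluded)."""
    ref_len = _window_length(reference, left_flank, right_flank)
    if ref_len is None:
        raise ValueError("both flanking sequences must occur in the reference")
    read_len = _window_length(read, left_flank, right_flank)
    if read_len is None:
        return None                      # no exact flank matches: exclude the read
    diff = read_len - ref_len
    if diff == 0:
        return "no_indel"
    if diff >= 2:
        return "insertion"
    if diff <= -2:
        return "deletion"
    return "unclassified"
```

The indel frequency for a sample would then be the number of reads labeled "insertion" or "deletion" divided by the number of reads that were not excluded.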
[0281] In some embodiments, the fusion proteins provided herein are capable of limiting formation of indels in a region of a nucleic acid. In some embodiments, the region is at a nucleotide targeted by a fusion protein or a region within 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of a nucleotide targeted by a fusion protein. In some embodiments, any of the fusion proteins provided herein are capable of limiting the formation of indels at a region of a nucleic acid to less than 1%, less than 1.5%, less than 2%, less than 2.5%, less than 3%, less than 3.5%, less than 4%, less than 4.5%, less than 5%, less than 6%, less than 7%, less than 8%, less than 9%, less than 10%, less than 12%, less than 15%, or less than 20%. The number of indels formed at a nucleic acid region may depend on the amount of time a nucleic acid (e.g., a nucleic acid within the genome of a cell) is exposed to a fusion protein. In some embodiments, a number or proportion of indels is determined after at least 1 hour, at least 2 hours, at least 6 hours, at least 12 hours, at least 24 hours, at least 36 hours, at least 48 hours, at least 3 days, at least 4 days, at least 5 days, at least 7 days, at least 10 days, or at least 14 days of exposing a nucleic acid (e.g., a nucleic acid within the genome of a cell) to a fusion protein.
[0282] Some embodiments of the disclosure are based on the recognition that any of the fusion proteins provided herein are capable of efficiently generating an intended mutation, such as a point mutation, in a nucleic acid (e.g. a nucleic acid within a genome of a subject) without generating a significant number of unintended mutations, such as unintended point mutations. In some embodiments, an intended mutation is a mutation that is generated by a specific fusion protein bound to a gRNA, specifically designed to generate the intended mutation. In some embodiments, the intended mutation is a mutation associated with a disease, disorder, or condition. In some embodiments, the intended mutation is the correction of a cytosine (C) to adenine (A) point mutation associated with a disease, disorder, or condition. In some embodiments, the intended mutation is the correction of a guanine (G) to thymine (T) point mutation associated with a disease, disorder, or condition. In some embodiments, the intended mutation is the correction of a cytosine (C) to adenine (A) point mutation within the coding region of a gene. In some embodiments, the intended mutation is the correction of a guanine (G) to thymine (T) point mutation within the coding region of a gene. In some embodiments, the intended mutation is a point mutation that generates a stop codon, for example, a premature stop codon within the coding region of a gene. In some embodiments, the intended mutation is a mutation that eliminates a stop codon. In some embodiments, the intended mutation is a mutation that alters the splicing of a gene. In some embodiments, the intended mutation is a mutation that alters the regulatory sequence of a gene (e.g., a gene promoter or gene repressor). In some embodiments, any of the fusion proteins provided herein are capable of generating a ratio of intended mutations to unintended mutations (e.g., intended point mutations:unintended point mutations) that is greater than 1:1. In some embodiments, any of the fusion proteins provided herein are capable of generating a ratio of intended mutations to unintended mutations (e.g., intended point mutations:unintended point mutations) that is at least 1.5:1, at least 2:1, at least 2.5:1, at least 3:1, at least 3.5:1, at least 4:1, at least 4.5:1, at least 5:1, at least 5.5:1, at least 6:1, at least 6.5:1, at least 7:1, at least 7.5:1, at least 8:1, at least 10:1, at least 12:1, at least 15:1, at least 20:1, at least 25:1, at least 30:1, at least 40:1, at least 50:1, at least 100:1, at least 150:1, at least 200:1, at least 250:1, at least 500:1, or at least 1000:1, or more.
[0283] Some embodiments of the disclosure are based on the recognition that the formation of indels in a region of a nucleic acid may be limited by nicking the non-edited strand opposite to the strand in which edits are introduced. This nick serves to direct mismatch repair machinery to the non-edited strand, ensuring that the chemically modified nucleobase is not interpreted as a lesion by the machinery. This nick may be created by the use of an nCas9. The methods provided in this disclosure comprise cutting (or nicking) the non-edited strand of the double-stranded DNA, for example, wherein the one strand comprises the A of the target T:A nucleobase pair, or the T of the T:A nucleobase pair.
Guide sequences (e.g., guide RNAs)
[0284] The present disclosure further provides guide RNAs for use in accordance with the disclosed methods of editing. The disclosure provides guide RNAs that are designed to recognize target sequences. Such gRNAs may be designed to have guide sequences (or “spacers”) having complementarity to a protospacer within the target sequence. Guide RNAs are also provided for use with one or more of the disclosed fusion proteins, e.g., in the disclosed methods of editing a nucleic acid molecule. Such gRNAs may be designed to have guide sequences having complementarity to a protospacer within a target sequence to be edited, and to have backbone sequences that interact specifically with the napDNAbp domains of any of the disclosed base editors, such as Cas9 nickase domains of the disclosed base editors.
[0285] In various embodiments, the ACBEs may be complexed, bound, or otherwise associated with (e.g., via any type of covalent or non-covalent bond) one or more guide sequences, i.e., the sequence which becomes associated or bound to the base editor and directs its localization to a specific target sequence having complementarity to the guide sequence or a portion thereof. The particular design embodiments of a guide sequence will depend upon the nucleotide sequence of a genomic target site of interest (i.e., the desired site to be edited) and the type of napDNAbp (e.g., type of Cas protein) present in the base editor, among other factors, such as PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology regions, secondary structures, etc.
[0286] In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, or Cas9 variant) to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g. the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length.
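By way of non-limiting illustration, a degree of complementarity can be computed for the simplest case in which the guide sequence and the target strand are of equal length and are compared without gaps. The following Python sketch is not one of the alignment algorithms named above; it is a simplified, ungapped stand-in, and the function name `percent_complementarity` is hypothetical.

```python
# Illustrative sketch only: an ungapped comparison, not an optimal alignment.
# Assumptions: both sequences are written 5'->3', have equal length, and
# contain only A, C, G, T, or U; Watson-Crick pairing only (no G:U wobble).

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def percent_complementarity(guide: str, target_strand: str) -> float:
    """Percentage of guide positions that base-pair with the target strand."""
    guide_dna = guide.upper().replace("U", "T")
    target = target_strand.upper().replace("U", "T")
    if len(guide_dna) != len(target) or not guide_dna:
        raise ValueError("sequences must be non-empty and of equal length")
    # Hybridization is antiparallel, so the guide is compared position by
    # position with the reverse complement of the target strand.
    partner = target.translate(_COMPLEMENT)[::-1]
    matches = sum(g == p for g, p in zip(guide_dna, partner))
    return 100.0 * matches / len(guide_dna)
```

For sequences of unequal length, or where gaps are permitted, one of the alignment algorithms named above would be used in place of this position-by-position comparison.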
[0287] In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a base editor to a target sequence may be assessed by any suitable assay. For example, the components of a base editor, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of a base editor disclosed herein, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a base editor, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art.
[0288] A guide sequence may be selected to target any target sequence. In some embodiments, the target sequence is a sequence within a genome of a cell. Exemplary target sequences include those that are unique in the target genome. For example, for the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGG (SEQ ID NO: 58) where NNNNNNNNNNNNXGG (N is A, G, T, or C; and X can be anything) (SEQ ID NO: 59) has a single occurrence in the genome. A unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGG (SEQ ID NO: 60) where NNNNNNNNNNNXGG (N is A, G, T, or C; and X can be anything) (SEQ ID NO: 61) has a single occurrence in the genome. For the S. thermophilus CRISPR1 Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXXAGAAW (SEQ ID NO: 62) where NNNNNNNNNNNNXXAGAAW (N is A, G, T, or C; X can be anything; and W is A or T) (SEQ ID NO: 63) has a single occurrence in the genome. A unique target sequence in a genome may include an S. thermophilus CRISPR1 Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXXAGAAW (SEQ ID NO: 64) where NNNNNNNNNNNXXAGAAW (N is A, G, T, or C; X can be anything; and W is A or T) (SEQ ID NO: 65) has a single occurrence in the genome. For the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGGXG (SEQ ID NO: 66) where NNNNNNNNNNNNXGGXG (N is A, G, T, or C; and X can be anything) (SEQ ID NO: 67) has a single occurrence in the genome. A unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGGXG (SEQ ID NO: 68) where NNNNNNNNNNNXGGXG (N is A, G, T, or C; and X can be anything) (SEQ ID NO: 69) has a single occurrence in the genome. In each of these sequences, “M” may be A, G, T, or C, and need not be considered in identifying a sequence as unique.
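By way of non-limiting illustration, the uniqueness criterion for an S. pyogenes Cas9 target site of the form N...NXGG can be checked by counting occurrences of the PAM-proximal N positions followed by XGG. The following Python sketch is a simplified example; the function names are hypothetical, and scanning both strands of a single in-memory sequence is an assumption rather than a requirement of the text.

```python
# Illustrative sketch only: function names are hypothetical.
# Assumptions: the genome is a plain string of A/C/G/T, both strands are
# scanned, and X in "XGG" may be any of A, C, G, or T.
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_seed_pam_sites(genome: str, seed: str) -> int:
    """Count occurrences of seed + XGG on both strands, overlaps included."""
    # A zero-width lookahead lets overlapping occurrences each be counted once.
    pattern = re.compile("(?=" + re.escape(seed.upper()) + "[ACGT]GG)")
    forward = genome.upper()
    return sum(len(pattern.findall(strand))
               for strand in (forward, reverse_complement(forward)))


def is_unique_target(genome: str, seed: str) -> bool:
    """True if seed + XGG has a single occurrence in the genome."""
    return count_seed_pam_sites(genome, seed) == 1
```

Here `seed` stands for the 12 (or 11) PAM-proximal N positions; the upstream M positions are not passed in because, as stated above, they need not be considered in identifying a sequence as unique.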
[0289] In some embodiments, a guide sequence is selected to reduce the degree of secondary structure within the guide sequence. Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker & Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see, e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and PA Carr & GM Church, 2009, Nature Biotechnology 27(12): 1151-62). Additional algorithms may be found in Chuai, G. et al., DeepCRISPR: optimized CRISPR guide RNA design by deep learning, Genome Biol. 19:80 (2018), and U.S. application Ser. No. 61/836,080 and U.S. Patent No. 8,871,445, issued October 28, 2014, the entireties of each of which are incorporated herein by reference.
[0290] In general, a tracr mate sequence includes any sequence that has sufficient complementarity with a tracr sequence to promote one or more of: (1) excision of a guide sequence flanked by tracr mate sequences in a cell containing the corresponding tracr sequence; and (2) formation of a complex at a target sequence, wherein the complex comprises the tracr mate sequence hybridized to the tracr sequence. In general, degree of complementarity is with reference to the optimal alignment of the tracr mate sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the tracr sequence or tracr mate sequence. In some embodiments, the degree of complementarity between the tracr sequence and tracr mate sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin. Preferred loop forming sequences for use in hairpin structures are four nucleotides in length, and most preferably have the sequence GAAA. However, longer or shorter loop sequences may be used, as may alternative sequences. The sequences preferably include a nucleotide triplet (for example, AAA), and an additional nucleotide (for example C or G). Examples of loop forming sequences include CAAA and AAAG. In an embodiment of the invention, the transcript or transcribed polynucleotide sequence has at least two or more hairpins. In certain embodiments, the transcript has two, three, four or five hairpins. In a further embodiment of the invention, the transcript has at most five hairpins. In some embodiments, the single transcript further includes a transcription termination sequence; preferably this is a polyT sequence, for example six T nucleotides. Further non-limiting examples of single polynucleotides comprising a guide sequence, a tracr mate sequence, and a tracr sequence are as follows (listed 5' to 3'), where “N” represents a base of a guide sequence, the first block of lower case letters represent the tracr mate sequence, and the second block of lower case letters represent the tracr sequence, and the final poly-T sequence represents the transcription terminator:
(1) NNNNNNNNgtttttgtactctcaagatttaGAAAtaaatcttgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgttttcgttatttaaTTTTTT (SEQ ID NO: 71);
(2) NNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgttttcgttatttaaTTTTTT (SEQ ID NO: 72);
(3) NNNNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgtTTTTT (SEQ ID NO: 73);
(4) NNNNNNNNNNNNNNNNNNNNgttttagagctaGAAAtagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgcTTTTTT (SEQ ID NO: 74);
(5) NNNNNNNNNNNNNNNNNNNNgttttagagctaGAAATAGcaagttaaaataaggctagtccgttatcaacttgaaaaagtgTTTTTTT (SEQ ID NO: 75); and
(6) NNNNNNNNNNNNNNNNNNNNgttttagagctagAAATAGcaagttaaaataaggctagtccgttatcaTTTTTTTT (SEQ ID NO: 76).
In some embodiments, sequences (1) to (3) are used in combination with Cas9 from S. thermophilus CRISPR1. In some embodiments, sequences (4) to (6) are used in combination with Cas9 from S. pyogenes. In some embodiments, the tracr sequence is a separate transcript from a transcript comprising the tracr mate sequence.
[0291] It will be apparent to those of skill in the art that in order to target any of the fusion proteins comprising a Cas9 domain and an oxidase, as disclosed herein, to a target site, e.g., a site comprising a point mutation to be edited, it is typically necessary to co-express the fusion protein together with a guide RNA, e.g., an sgRNA. As explained in more detail elsewhere herein, a guide RNA typically comprises a tracrRNA framework allowing for Cas9 binding, and a guide sequence, which confers sequence specificity to the Cas9:nucleic acid editing enzyme/domain fusion protein.
[0292] In some embodiments, the guide RNAs for use in accordance with the disclosed methods of editing comprise a backbone structure that is recognized by an S. pyogenes Cas9 protein or domain, such as an SpCas9 domain of the disclosed base editors. The backbone structure recognized by an SpCas9 protein may comprise the sequence 5'-[guide sequence]-guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu-3' (SEQ ID NO: 77), wherein the guide sequence comprises a sequence that is complementary to the protospacer of the target sequence. See U.S. Publication No. 2015/0166981, published June 18, 2015, the disclosure of which is incorporated by reference herein. The guide sequence is typically 20 nucleotides long.
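By way of non-limiting illustration, a full-length guide RNA of the form 5'-[guide sequence]-[backbone]-3' can be assembled by concatenating a guide sequence with the SpCas9 backbone of SEQ ID NO: 77. In the following Python sketch the function name `build_sgrna` is hypothetical, and it is assumed that the caller supplies the guide sequence 5' to 3' as either DNA or RNA letters.

```python
# Illustrative sketch only: `build_sgrna` is a hypothetical helper name.
# Assumption: the guide is supplied 5'->3' using A, C, G and T or U.

# SpCas9 backbone as recited in SEQ ID NO: 77 (written 5'->3').
SPCAS9_BACKBONE = (
    "guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaagu"
    "ggcaccgagucggugcuuuuu"
)


def build_sgrna(guide: str) -> str:
    """Return 5'-[guide sequence]-[SpCas9 backbone]-3' as an RNA string."""
    guide_rna = guide.upper().replace("T", "U")
    if not guide_rna or set(guide_rna) - set("ACGU"):
        raise ValueError("guide must contain only A, C, G, and T/U")
    # The guide is shown in upper case and the backbone in lower case so the
    # two parts remain visually distinguishable in the output.
    return guide_rna + SPCAS9_BACKBONE


if __name__ == "__main__":
    print(build_sgrna("GATTACAGATTACAGATTAC"))  # arbitrary 20-nt example guide
```

The example guide in the usage line is arbitrary and is not a sequence from this disclosure.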
[0293] In other embodiments, the guide RNAs for use in accordance with the disclosed methods of editing comprise a backbone structure that is recognized by an S. aureus Cas9 protein. The backbone structure recognized by an SaCas9 protein may comprise the sequence 5'-[guide sequence]-guuuuaguacucuguaaugaaaauuacagaaucuacuaaaacaaggcaaaaugccguguuuaucucgucaacuuguuggcgagauuuuuuu-3' (SEQ ID NO: 161).
[0294] The sequences of suitable guide RNAs for targeting the disclosed fusion proteins to specific genomic target sites will be apparent to those of skill in the art based on the instant disclosure. Such suitable guide RNA sequences typically comprise guide sequences that are complementary to a nucleic acid sequence within 50 nucleotides upstream or downstream of the target nucleotide to be edited. Some exemplary guide RNA sequences suitable for targeting any of the provided fusion proteins to specific target sequences are provided herein.
Additional guide sequences are well known in the art and can be used with the base editors described herein. Additional exemplary guide sequences are disclosed in, for example, Jinek M., et al., Science 337:816-821 (2012); Mali P, Esvelt KM & Church GM (2013) Cas9 as a versatile tool for engineering biology, Nature Methods, 10, 957-963; Li JF et al., (2013) Multiplex and homologous recombination-mediated genome editing in Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9, Nature Biotechnology, 31, 688-691; Hwang, W.Y. et al., Efficient genome editing in zebrafish using a CRISPR-Cas system, Nature Biotechnology 31, 227-229 (2013); Cong L et al., (2013) Multiplex genome engineering using CRISPR/Cas systems, Science, 339, 819-823; Cho SW et al., (2013) Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease, Nature Biotechnology, 31, 230-232; Jinek, M. et al., RNA-programmed genome editing in human cells, eLife 2, e00471 (2013); Dicarlo, J.E. et al., Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems, Nucleic Acids Res. (2013); Briner AE et al., (2014) Guide RNA functional modules direct Cas9 activity and orthogonality, Mol Cell, 56, 333-339, the entire contents of each of which are herein incorporated by reference.
Methods for making fusion proteins
[0295] The disclosure further relates in various aspects to methods of making the disclosed fusion proteins by various modes of manipulation that include, but are not limited to, codon optimization to achieve greater expression levels in a cell, and the use of nuclear localization sequences (NLSs), preferably at least two NLSs, e.g., two bipartite NLSs, to increase the localization of the expressed fusion proteins into a cell nucleus.
[0296] The fusion proteins contemplated herein can include modifications that result in increased expression, for example, through codon optimization.
[0297] In some embodiments, the fusion protein (or a component thereof) is codon optimized for expression in particular cells, such as eukaryotic cells. The eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including, but not limited to, human, mouse, rat, rabbit, dog, or non-human primate. In general, codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g. about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence. Various species exhibit particular bias for certain codons of a particular amino acid. Codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the "Codon Usage Database", and these tables can be adapted in a number of ways. See Nakamura, Y., et al. "Codon usage tabulated from the international DNA sequence databases: status for the year 2000" Nucl. Acids Res. 28:292 (2000). Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available. In some embodiments, one or more codons (e.g. 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in a sequence encoding a CRISPR enzyme correspond to the most frequently used codon for a particular amino acid.
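By way of illustration only, the codon replacement described above can be sketched in a few lines of Python. The sketch below is not the algorithm of any named tool; the partial codon table, the usage frequencies, and the function names (`translate`, `codon_optimize`) are assumptions introduced for this example, and a practical implementation would load a complete usage table for the host cell of interest.

```python
# Illustrative sketch: replace each codon with the synonymous codon that is
# most frequently used in a (hypothetical) host, keeping the protein unchanged.

# Partial standard genetic code, limited to the codons used in this example.
CODON_TO_AA = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "TAA": "*", "TAG": "*", "TGA": "*",
}

# Hypothetical relative codon usage for an imaginary host cell (not real data).
HOST_USAGE = {
    "ATG": 1.00,
    "GCT": 0.26, "GCC": 0.40, "GCA": 0.23, "GCG": 0.11,
    "AAA": 0.42, "AAG": 0.58,
    "CTT": 0.13, "CTC": 0.20, "CTA": 0.07, "CTG": 0.41, "TTA": 0.07, "TTG": 0.12,
    "TAA": 0.28, "TAG": 0.20, "TGA": 0.52,
}


def translate(dna):
    """Translate a coding sequence codon by codon using CODON_TO_AA."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def best_codon_by_aa(usage):
    """Map each amino acid to its most frequently used codon in the host."""
    best = {}
    for codon, aa in CODON_TO_AA.items():
        if aa not in best or usage[codon] > usage[best[aa]]:
            best[aa] = codon
    return best


def codon_optimize(dna, usage=HOST_USAGE):
    """Return a sequence encoding the same protein with host-preferred codons."""
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    best = best_codon_by_aa(usage)
    return "".join(best[CODON_TO_AA[dna[i:i + 3]]] for i in range(0, len(dna), 3))


native = "ATGGCGAAATTATAA"            # encodes M-A-K-L-stop
optimized = codon_optimize(native)    # same protein, different codons
```

In this sketch every codon is replaced; the embodiments above also contemplate replacing only a subset of codons, which would amount to applying the same lookup at selected positions.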
[0298] The above description is meant to be non-limiting with regard to making fusion proteins having increased expression and, thereby, increased editing efficiencies.
Directed evolution methods (e.g., PACE or PANCE)
[0299] Various embodiments of the disclosure relate to providing directed evolution methods and systems (e.g., appropriate vectors, cells, phage, flow vessels, etc.) for engineering of the base editors or base editor domains of the present disclosure. The disclosure provides vector systems for the disclosed directed evolution methods to engineer any of the disclosed base editors or base editor domains.
[0300] The directed evolution vector systems and methods provided herein allow for a gene of interest (e.g., a base editor- or adenine oxidase-encoding gene) in a viral vector to be evolved over multiple generations of viral life cycles in a flow of host cells to acquire a desired function or activity.
[0301] In PACE, the gene under selection is encoded on the M13 bacteriophage genome. Its activity is linked to M13 propagation by controlling expression of gene III so that only active variants produce infectious progeny phage. Phage are continuously propagated and mutagenized, but mutations accumulate only in the phage genome, not the host or its selection circuit, because fresh host cells are continually flowed into (and out of) the growth vessel, effectively resetting the selection background.
[0302] PACE enables the rapid continuous evolution of biomolecules through many generations of mutation, selection, and replication per day. During PACE, host E. coli cells continuously dilute a population of bacteriophage (selection phage, SP) containing the gene of interest. The gene of interest replaces gene III on the SP, which is required for progeny phage infectivity. SP containing desired gene variants trigger host-cell gene III expression from an accessory plasmid (AP). Host-cell DNA plasmids encode a genetic circuit that links the desired activity of the protein encoded in the SP to the expression of gene III on the AP. Thus, SP variants containing desired gene variants can propagate, while phage encoding inactive variants do not generate infectious progeny and are rapidly diluted out of the culture vessel (or lagoon). An arabinose-inducible mutagenesis plasmid (MP) controls the phage mutation rate.
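The selection principle described above, in which phage that propagate more slowly than the lagoon is diluted are washed out, can be illustrated with a toy exponential model. This is a simplified sketch and not part of the disclosed methods; the function name `lagoon_titer` and all rate values are hypothetical.

```python
import math


def lagoon_titer(initial_titer, propagation_rate, dilution_rate, hours):
    """Phage titer after `hours`, assuming dN/dt = (propagation - dilution) * N.

    Rates are per hour. This ignores host-cell limitation and mutation and is
    only meant to show washout versus persistence.
    """
    return initial_titer * math.exp((propagation_rate - dilution_rate) * hours)


DILUTION = 1.0  # lagoon volumes exchanged per hour (hypothetical)

# A variant whose activity drives gene III expression propagates faster than
# it is diluted; an inactive variant produces no infectious progeny.
active = lagoon_titer(1e8, propagation_rate=1.5, dilution_rate=DILUTION, hours=10)
inactive = lagoon_titer(1e8, propagation_rate=0.0, dilution_rate=DILUTION, hours=10)
```

Under these assumed rates the active variant accumulates while the inactive variant declines toward zero, which is the behavior the continuous dilution is intended to produce.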
[0303] The key to new PACE selections is linking gene III expression to the activity of interest. A low stringency selection was designed in which base editing activates T7 RNA polymerase, which transcribes gIII. A single editing event can lead to high output amplification immediately upon transcription of the edited DNA. Reference is made to International Patent Publication WO 2019/023680, published January 31, 2019; Badran, A.H. & Liu, D.R. In vivo continuous directed evolution. Curr. Opin. Chem. Biol. 24, 1-10 (2015); Dickinson, B.C., Packer, M.S., Badran, A.H. & Liu, D.R. A system for the continuous directed evolution of proteases rapidly reveals drug-resistance mutations. Nat. Commun. 5, 5352 (2014); Hubbard, B.P. et al. Continuous directed evolution of DNA-binding proteins to improve TALEN specificity. Nat. Methods 12, 939-942 (2015); Wang, T., Badran, A.H., Huang, T.P. & Liu, D.R. Continuous directed evolution of proteins with improved soluble expression. Nat. Chem. Biol. 14, 972-980 (2018), and Thuronyi, B.W. et al. Continuous evolution of base editors with expanded target compatibility and improved activity. Nat. Biotechnol., 1070-1079 (2019), each of which is herein incorporated by reference.
[0304] In some embodiments, the viral vector or the phage is a filamentous phage, for example, an M13 phage, such as an M13 selection phage as described in more detail elsewhere herein. In some such embodiments, the gene required for the production of infectious viral particles is the M13 gene III (gIII).
[0305] In some embodiments, the viral vector infects mammalian cells. In some embodiments, the viral vector is a retroviral vector. In some embodiments, the viral vector is a vesicular stomatitis virus (VSV) vector. As a negative-strand RNA virus, VSV has a high mutation rate, and can carry cargo, including a gene of interest, of up to 4.5 kb in length. The generation of infectious VSV particles requires the envelope protein VSV-G, a viral glycoprotein that mediates phosphatidylserine attachment and cell entry. VSV can infect a broad spectrum of host cells, including mammalian and insect cells. VSV is therefore a highly suitable vector for continuous evolution in human, mouse, or insect host cells. Similarly, other retroviral vectors that can be pseudotyped with VSV-G envelope protein are equally suitable for continuous evolution processes as described herein.
[0306] It is known to those of skill in the art that many retroviral vectors, for example, Murine Leukemia Virus vectors, or Lentiviral vectors can efficiently be packaged with VSV-G envelope protein as a substitute for the virus's native envelope protein. In some embodiments, such VSV-G packageable vectors are adapted for use in a continuous evolution system in that the native envelope (env) protein (e.g., VSV-G in VSV vectors, or env in MLV vectors) is deleted from the viral genome, and a gene of interest is inserted into the viral genome under the control of a promoter that is active in the desired host cells. The host cells, in turn, express the VSV-G protein, another env protein suitable for vector pseudotyping, or the viral vector's native env protein, under the control of a promoter the activity of which is dependent on an activity of a product encoded by the gene of interest, so that a viral vector with a mutation leading to increased activity of the gene of interest will be packaged with higher efficiency than a vector with baseline activity or a loss-of-function mutation.
[0307] In some embodiments, mammalian host cells are subjected to infection by a continuously evolving population of viral vectors, for example, VSV vectors comprising a gene of interest and lacking the VSV-G encoding gene, wherein the host cells comprise a gene encoding the VSV-G protein under the control of a conditional promoter. Such a retrovirus-based system could be a two-vector system (the viral vector and an expression construct comprising a gene encoding the envelope protein), or, alternatively, a helper virus can be employed, for example, a VSV helper virus. A helper virus typically comprises a truncated viral genome deficient of structural elements required to package the genome into viral particles, but including viral genes encoding proteins required for viral genome processing in the host cell, and for the generation of viral particles. In such embodiments, the viral vector-based system could be a three-vector system (the viral vector, the expression construct comprising the envelope protein driven by a conditional promoter, and the helper virus comprising viral functions required for viral genome propagation but not the envelope protein). In some embodiments, expression of the five genes of the VSV genome from a helper virus or expression construct in the host cells allows for production of infectious viral particles carrying a gene of interest, indicating that unbalanced gene expression permits viral replication at a reduced rate, suggesting that reduced expression of VSV-G would indeed serve as a limiting step in efficient viral production.
[0308] One advantage of using a helper virus is that the viral vector can be deficient in genes encoding proteins or other functions provided by the helper virus, and can, accordingly, carry a longer gene of interest. In some embodiments, the helper virus does not express an envelope protein, because expression of a viral envelope protein is known to reduce the infectability of host cells by some viral vectors via receptor interference. Viral vectors, for example retroviral vectors, suitable for continuous evolution processes, their respective envelope proteins, and helper viruses for such vectors, are well known to those of skill in the art. For an overview of some exemplary viral genomes, helper viruses, host cells, and envelope proteins suitable for continuous evolution procedures as described herein, see Coffin et al., Retroviruses, CSHL Press 1997, ISBN 0-87969-571-4, incorporated herein in its entirety.
[0309] In some embodiments, the incubating of the host cells is for a time sufficient for at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1250, at least 1500, at least 1750, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, at least 10000, or more consecutive viral life cycles. In certain embodiments, the viral vector is an M13 phage, and the length of a single viral life cycle is about 10-20 minutes.
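As a worked illustration of the figures above, the incubation time implied by a given number of consecutive life cycles can be estimated from the stated M13 life cycle length of about 10-20 minutes. The helper names below are hypothetical and the calculation is simple arithmetic, not a protocol.

```python
def hours_for_cycles(n_cycles, minutes_per_cycle):
    """Elapsed hours for `n_cycles` consecutive life cycles of fixed length."""
    return n_cycles * minutes_per_cycle / 60.0


def incubation_window(n_cycles, fastest_min=10.0, slowest_min=20.0):
    """(shortest, longest) incubation in hours for a 10-20 minute life cycle."""
    return (hours_for_cycles(n_cycles, fastest_min),
            hours_for_cycles(n_cycles, slowest_min))


# Example: time range needed for 60 consecutive M13 life cycles.
window_60 = incubation_window(60)
```

For 60 life cycles this gives a window of 10 to 20 hours; larger cycle counts scale linearly.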
[0310] In some embodiments, a viral vector/host cell combination is chosen in which the life cycle of the viral vector is significantly shorter than the average time between cell divisions of the host cell. Average cell division times and viral vector life cycle times are well known in the art for many cell types and vectors, allowing those of skill in the art to ascertain such host cell/vector combinations. In certain embodiments, host cells are being removed from the population of host cells contacted with the viral vector at a rate that results in the average time of a host cell remaining in the host cell population before being removed to be shorter than the average time between cell divisions of the host cells, but to be longer than the average life cycle of the viral vector employed. The result of this is that the host cells, on average, do not have sufficient time to proliferate during their time in the host cell population while the viral vectors do have sufficient time to infect a host cell, replicate in the host cell, and generate new viral particles during the time a host cell remains in the cell population.
This assures that the only replicating nucleic acid in the host cell population is the viral vector, and that the host cell genome, the accessory plasmid, or any other nucleic acid constructs cannot acquire mutations allowing for escape from the selective pressure imposed.
[0311] For example, in some embodiments, the average time a host cell remains in the host cell population is about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 70, about 80, about 90, about 100, about 120, about 150, or about 180 minutes.
[0312] In some embodiments, the average time a host cell remains in the host cell population depends on how fast the host cells divide and how long infection (or conjugation) requires. In general, the flow rate should be fast enough that host cells are removed before they have time to divide, but slow enough to allow viral (or conjugative) propagation. The cell division time will vary, for example, with the media type, and can be delayed by adding cell division inhibitor antibiotics (FtsZ inhibitors in E. coli, etc.). Since the limiting step in continuous evolution is production of the protein required for gene transfer from cell to cell, the flow rate at which the vector washes out will depend on the current activity of the gene(s) of interest. In some embodiments, titratable production of the protein required for the generation of infectious particles, as described herein, can mitigate this problem. In some embodiments, an indicator of phage infection allows computer-controlled optimization of the flow rate for the current activity level in real-time.
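The flow-rate constraint described above can be expressed as a window on the mean residence time of a host cell. The sketch below assumes a well-mixed vessel of constant volume, for which the mean residence time equals the volume divided by the flow rate; the function names and the example values are hypothetical.

```python
def mean_residence_minutes(vessel_volume_ml, flow_ml_per_min):
    """Mean residence time in a well-mixed, constant-volume vessel (V / F)."""
    if flow_ml_per_min <= 0:
        raise ValueError("flow rate must be positive")
    return vessel_volume_ml / flow_ml_per_min


def residence_time_ok(residence_min, viral_cycle_min, host_division_min):
    """True if cells stay longer than one viral life cycle but leave before dividing."""
    return viral_cycle_min < residence_min < host_division_min


def flow_rate_bounds(vessel_volume_ml, viral_cycle_min, host_division_min):
    """(min_flow, max_flow) in ml/min that keep the residence time in the window."""
    return (vessel_volume_ml / host_division_min,
            vessel_volume_ml / viral_cycle_min)


# Hypothetical example: 40 ml vessel, 15 min viral life cycle, 30 min host division.
bounds = flow_rate_bounds(40.0, viral_cycle_min=15.0, host_division_min=30.0)
ok_at_2 = residence_time_ok(mean_residence_minutes(40.0, 2.0), 15.0, 30.0)
ok_at_1 = residence_time_ok(mean_residence_minutes(40.0, 1.0), 15.0, 30.0)
```

With these assumed values a flow of 2 ml/min gives a 20 minute residence time, inside the window, whereas 1 ml/min gives 40 minutes, long enough for host cells to divide.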
[0313] In some embodiments, the fresh host cells comprise the accessory plasmid required for selection of viral vectors, for example, the accessory plasmid comprising the gene required for the generation of infectious phage particles that is lacking from the phages being evolved. In some embodiments, the host cells are generated by contacting an uninfected host cell with the relevant vectors, for example, the accessory plasmid and, optionally, a mutagenesis plasmid, and growing an amount of host cells sufficient for the replenishment of the host cell population in a continuous evolution experiment. Methods for the introduction of plasmids and other gene constructs into host cells are well known to those of skill in the art and the invention is not limited in this respect. For bacterial host cells, such methods include, but are not limited to, electroporation and heat-shock of competent cells.
[0314] In some embodiments, the accessory plasmid comprises a selection marker, for example, an antibiotic resistance marker, and the fresh host cells are grown in the presence of the respective antibiotic to ensure the presence of the plasmid in the host cells. Where multiple plasmids are present, different markers are typically used. Such selection markers and their use in cell culture are known to those of skill in the art, and the invention is not limited in this respect.
[0315] In some embodiments, the selection marker is a spectinomycin antibiotic resistance marker. Cells are transformed with a selection plasmid containing an inactivated spectinomycin resistance gene with a mutation at an active site that requires A:T to C:G editing to correct. Cells that fail to install the correct transversion mutation in the spectinomycin resistance gene will die, while cells that make the correction will survive. E. coli cells expressing an sgRNA targeting the active site mutation in the spectinomycin resistance gene and a nucleobase modification domain-dCas9 fusion protein are plated onto 2xYT agar with 256 µg/mL of spectinomycin. Surviving colonies (measured through CFUs) were sequenced to find consensus mutations in the fusion proteins expressed in the evolved survivors (FIG. 4). A similar selection assay was used to evolve adenine deaminase activity in DNA during adenine base editor development, as described in Gaudelli, N. M. et al., Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage. Nature 551, 464-471 (2017), herein incorporated in its entirety by reference.
[0316] In some embodiments, the selection marker is a chloramphenicol antibiotic resistance marker. Cells are transformed with a selection plasmid containing an inactivated chloramphenicol resistance gene with a mutation at an active site that requires A:T to C:G editing to correct. Cells that fail to install the correct transversion mutation in the chloramphenicol resistance gene will die, while cells that make the correction will survive. E. coli cells expressing an sgRNA targeting the active site mutation in the chloramphenicol resistance gene and a nucleobase modification domain-dCas9 fusion protein are plated onto 2xYT agar with 256 µg/mL of chloramphenicol. Surviving colonies (measured through CFUs) were sequenced to find consensus mutations in the fusion proteins expressed in the evolved survivors.
[0317] In other embodiments, the selection marker is a carbenicillin antibiotic resistance marker. Cells are transformed with a selection plasmid containing an inactivated carbenicillin resistance gene with a mutation at an active site that requires A:T to C:G editing to correct. Cells that fail to install the correct transversion mutation in the carbenicillin resistance gene will die, while cells that make the correction will survive. E. coli cells expressing an sgRNA targeting the active site mutation in the carbenicillin resistance gene and a nucleobase modification domain-dCas9 fusion protein are plated onto 2xYT agar with 256 µg/mL of carbenicillin. Surviving colonies (measured through CFUs) were sequenced to find consensus mutations in the fusion proteins expressed in the evolved survivors.
[0318] In some embodiments, mismatch-specific uracil-DNA glycosylase (MUG) knockout E. coli cells are used during the above spectinomycin, carbenicillin, and/or chloramphenicol screening experiments to avoid excision of the target 8-oxoadenine before the full base editing process can be completed.
[0319] In some embodiments, the host cell population in a continuous evolution experiment is replenished with fresh host cells growing in a parallel, continuous culture. In some embodiments, the cell density of the host cells in the host cell population contacted with the viral vector and the density of the fresh host cell population are substantially the same.
[0320] Typically, the cells being removed from the cell population contacted with the viral vector comprise cells that are infected with the viral vector and uninfected cells. In some embodiments, cells are being removed from the cell populations continuously, for example, by effecting a continuous outflow of the cells from the population. In other embodiments, cells are removed semi-continuously or intermittently from the population. In some embodiments, the replenishment of fresh cells will match the mode of removal of cells from the cell population, for example, if cells are continuously removed, fresh cells will be continuously introduced. However, in some embodiments, the modes of replenishment and removal may be mismatched, for example, a cell population may be continuously replenished with fresh cells, and cells may be removed semi-continuously or in batches.
[0321] In some embodiments, the rate of fresh host cell replenishment and/or the rate of host cell removal is adjusted based on quantifying the host cells in the cell population. For example, in some embodiments, the turbidity of culture media comprising the host cell population is monitored and, if the turbidity falls below a threshold level, the ratio of host cell inflow to host cell outflow is adjusted to effect an increase in the number of host cells in the population, as manifested by increased cell culture turbidity. In other embodiments, if the turbidity rises above a threshold level, the ratio of host cell inflow to host cell outflow is adjusted to effect a decrease in the number of host cells in the population, as manifested by decreased cell culture turbidity. Maintaining the density of host cells in the host cell population within a specific density range ensures that enough host cells are available as hosts for the evolving viral vector population, and avoids the depletion of nutrients at the cost of viral packaging and the accumulation of cell-originated toxins from overcrowding the culture.
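The turbidity-based adjustment described above amounts to a simple threshold controller. The following sketch is one possible reading of that logic; the function name `adjust_inflow_ratio`, the thresholds, and the step size are assumptions for illustration, and a real system would read turbidity from a sensor.

```python
def adjust_inflow_ratio(turbidity, low, high, ratio, step=0.1):
    """Return an updated ratio of host cell inflow to host cell outflow.

    Below `low`: raise the ratio so the number of host cells increases.
    Above `high`: lower the ratio so the number of host cells decreases.
    Within the band: leave the ratio unchanged.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if turbidity < low:
        return ratio + step
    if turbidity > high:
        return max(0.0, ratio - step)  # the ratio cannot become negative
    return ratio


# Hypothetical optical-density band of 0.4-0.6 and a starting ratio of 1.0.
too_thin = adjust_inflow_ratio(0.2, low=0.4, high=0.6, ratio=1.0)
too_dense = adjust_inflow_ratio(0.8, low=0.4, high=0.6, ratio=1.0)
in_band = adjust_inflow_ratio(0.5, low=0.4, high=0.6, ratio=1.0)
```

Calling the function once per measurement interval keeps the host cell density within the chosen band; the fixed step could be replaced by a proportional term if finer control is needed.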
[0322] In some embodiments, the cell density in the host cell population and/or the fresh host cell density in the inflow is about 10^2 cells/ml to about 10^12 cells/ml. In some embodiments, the host cell density is about 10^2 cells/ml, about 10^3 cells/ml, about 10^4 cells/ml, about 10^5 cells/ml, about 5×10^5 cells/ml, about 10^6 cells/ml, about 5×10^6 cells/ml, about 10^7 cells/ml, about 5×10^7 cells/ml, about 10^8 cells/ml, about 5×10^8 cells/ml, about 10^9 cells/ml, about 5×10^9 cells/ml, about 10^10 cells/ml, or about 5×10^10 cells/ml. In some embodiments, the host cell density is more than about 10^10 cells/ml.
[0323] In some embodiments, the host cell population is contacted with a mutagen. In some embodiments, the cell population contacted with the viral vector (e.g., the phage), is continuously exposed to the mutagen at a concentration that allows for an increased mutation rate of the gene of interest, but is not significantly toxic for the host cells during their exposure to the mutagen while in the host cell population. In other embodiments, the host cell population is contacted with the mutagen intermittently, creating phases of increased mutagenesis, and accordingly, of increased viral vector diversification. For example, in some embodiments, the host cells are exposed to a concentration of mutagen sufficient to generate an increased rate of mutagenesis in the gene of interest for about 10%, about 20%, about 50%, or about 75% of the time.
[0324] In some embodiments, selection of the mutagen is guided by crystallographic structural information about the wild-type oxidase to be evolved, for instance information about a binding pocket within the oxidase. In some embodiments, mutations are targeted to residues in the active site of a wild-type iron-dependent oxidase with the goal of affecting the relative orientation of the target adenine and the non-heme Fe(IV) center. In other embodiments, mutations are targeted to the DNA binding interface of a wild-type iron-dependent oxidase with the goal of affecting the relative orientation of the target adenine and the non-heme Fe(IV) center. In the Examples disclosed below, variants of AlkBH3 were evolved using continuous evolution systems to form a large library of AlkBH3 mutants, wherein mutations were targeted to residues in the active site and/or DNA binding interface of AlkBH3.
[0325] In some embodiments, the host cells comprise a mutagenesis expression construct, for example, in the case of bacterial host cells, a mutagenesis plasmid. In some embodiments, the mutagenesis plasmid comprises a gene expression cassette encoding a mutagenesis-promoting gene product, for example, a proofreading-impaired DNA polymerase. In other embodiments, the mutagenesis plasmid includes a gene involved in the SOS stress response (e.g., UmuC, UmuD', and/or RecA). In some embodiments, the mutagenesis-promoting gene is under the control of an inducible promoter. Suitable inducible promoters are well known to those of skill in the art and include, for example, arabinose-inducible promoters, tetracycline- or doxycycline-inducible promoters, and tamoxifen-inducible promoters. In some embodiments, the host cell population is contacted with an inducer of the inducible promoter in an amount sufficient to effect an increased rate of mutagenesis. For example, in some embodiments, a bacterial host cell population is provided in which the host cells comprise a mutagenesis plasmid in which a dnaQ926, UmuC, UmuD', and RecA expression cassette is controlled by an arabinose-inducible promoter. In some such embodiments, the population of host cells is contacted with the inducer, for example, arabinose, in an amount sufficient to induce an increased rate of mutation.
[0326] In some embodiments, diversifying the viral vector population is achieved by providing a flow of host cells that does not select for gain-of-function mutations in the gene of interest for replication, mutagenesis, and propagation of the population of viral vectors. In some embodiments, the host cells are host cells that express all genes required for the generation of infectious viral particles, for example, bacterial cells that express a complete helper phage, and, thus, do not impose selective pressure on the gene of interest. In other embodiments, the host cells comprise an accessory plasmid comprising a conditional promoter with a baseline activity sufficient to support viral vector propagation even in the absence of significant gain-of-function mutations of the gene of interest. This can be achieved by using a "leaky" conditional promoter, by using a high-copy number accessory plasmid, thus amplifying baseline leakiness, and/or by using a conditional promoter on which the initial version of the gene of interest effects a low level of activity while a desired gain-of-function mutation effects a significantly higher activity.
[0327] Detailed methods and procedures for directing continuous evolution of base editors in a population of host cells using phage particles are disclosed in International PCT Application, PCT/US2009/056194, filed September 8, 2009, published as WO 2010/028347 on March 11, 2010; International PCT Application, PCT/US2011/066747, filed December 22, 2011, published as WO 2012/088381 on June 28, 2012; U.S. Patent No. 9,023,594, issued May 5, 2015; U.S. Patent No. 9,771,574, issued September 26, 2017; U.S. Patent No. 9,394,537, issued July 19, 2016; International PCT Application, PCT/US2015/012022, filed January 20, 2015, published as WO 2015/134121 on September 11, 2015; U.S. Patent No. 10,179,911, issued January 15, 2019; International Application No. PCT/US2019/37216, published as WO 2019/241649 on December 19, 2019, International Patent Publication WO 2019/023680, published January 31, 2019, International PCT Application, PCT/US2016/027795, filed April 15, 2016, published as WO 2016/168631 on October 20, 2016, and International Application No. PCT/US2019/47996, filed August 23, 2019, each of which is incorporated herein by reference.
[0328] Methods and strategies to design conditional promoters suitable for carrying out the selection strategies described herein are well known to those of skill in the art. For an overview of exemplary suitable selection strategies and methods for designing conditional promoters driving the expression of a gene required for cell-cell gene transfer, e.g., gene III (gIII), see Vidal and Legrain, Yeast n-hybrid review, Nucleic Acids Res. 27, 919 (1999), incorporated herein in its entirety.
[0329] The disclosure provides viral vectors for the continuous evolution processes. In some embodiments, phage vectors for phage-assisted continuous evolution are provided. In some embodiments, a selection phage is provided that comprises a phage genome deficient in at least one gene required for the generation of infectious phage particles and a gene of interest to be evolved. Reference is made to International Patent Publication WO 2019/023680, published January 31, 2019, herein incorporated by reference.
[0330] For example, in some embodiments, the selection phage comprises an M13 phage genome deficient in a gene required for the generation of infectious M13 phage particles, for example, a full-length gIII. In some embodiments, the selection phage comprises a phage genome providing all other phage functions required for the phage life cycle except the gene required for generation of infectious phage particles. In some such embodiments, an M13 selection phage is provided that comprises a gI, gII, gIV, gV, gVI, gVII, gVIII, gIX, and a gX gene, but not a full-length gIII. In some embodiments, the selection phage comprises a 3'-fragment of gIII, but no full-length gIII. The 3'-end of gIII comprises a promoter (see Figure 16) and retaining this promoter activity is beneficial, in some embodiments, for an increased expression of gVI, which is immediately downstream of the gIII 3'-promoter, or a more balanced (wild-type phage-like) ratio of expression levels of the phage genes in the host cell, which, in turn, can lead to more efficient phage production. In some embodiments, the 3'-fragment of the gIII gene comprises the 3'-gIII promoter sequence. In some embodiments, the 3'-fragment of gIII comprises the last 180 bp, the last 150 bp, the last 125 bp, the last 100 bp, the last 50 bp, or the last 25 bp of gIII. In some embodiments, the 3'-fragment of gIII comprises the last 180 bp of gIII.
[0331] In some embodiments, an M13 selection phage is provided that comprises a gene of interest in the phage genome, for example, inserted downstream of the gVIII 3'-terminator and upstream of the gIII 3'-promoter. In some embodiments, an M13 selection phage is provided that comprises a multiple cloning site for cloning a gene of interest into the phage genome, for example, a multiple cloning site (MCS) inserted downstream of the gVIII 3'-terminator and upstream of the gIII 3'-promoter.
[0332] Some embodiments of this disclosure provide a vector system for continuous evolution procedures, comprising a viral vector, for example, a selection phage, and a matching accessory plasmid. In some embodiments, a vector system for phage-based continuous directed evolution is provided that comprises (a) a selection phage comprising a gene of interest to be evolved, wherein the phage genome is deficient in a gene required to generate infectious phage; and (b) an accessory plasmid comprising the gene required to generate infectious phage particles under the control of a conditional promoter, wherein the conditional promoter is activated by a function of a gene product encoded by the gene of interest.
[0333] In some embodiments, the selection phage is an M13 phage as described herein. For example, in some embodiments, the selection phage comprises an M13 genome including all genes required for the generation of phage particles, for example, gI, gII, gIV, gV, gVI, gVII, gVIII, gIX, and gX gene, but not a full-length gIII gene. In some embodiments, the selection phage genome comprises an F1 or an M13 origin of replication. In some embodiments, the selection phage genome comprises a 3'-fragment of the gIII gene. In some embodiments, the selection phage comprises a multiple cloning site upstream of the gIII 3'-promoter and downstream of the gVIII 3'-terminator.
[0334] Some embodiments of this disclosure provide a method of non-continuous evolution of a gene of interest. In certain embodiments, the method of non-continuous evolution is PANCE. In other embodiments, the method of non-continuous evolution is an antibiotic or plate-based selection method.
[0335] The PANCE methodology comprises first growing an E. coli host strain containing a mutagenesis plasmid in a large volume until the optical density reaches A600 = 0.3-0.5. The cells are re-transformed with the mutagenesis plasmid regularly to ensure the plasmid has not been inactivated. An aliquot of a desired volume, often 2 mL, is then transferred to a smaller flask, supplemented with the inducing agent arabinose (Ara) for the mutagenesis plasmid, and infected with the selection phage (SP). To increase the titer level, a drift plasmid can also be provided that enables phage to propagate without passing the selection. Expression from the drift plasmid is under the control of an inducible promoter and can be turned on with 50 ng/mL of anhydrotetracycline. This culture is incubated at 37 °C for 8-12 h to facilitate phage growth, which is confirmed by determination of the phage titer. Following phage growth, an aliquot of infected cells is used to transfect a subsequent flask containing host E. coli. This process is continued for as many transfers as required until the desired phenotype is evolved, while increasing the stringency in stepwise fashion by decreasing the incubation time or the titer of phage with which the bacteria are infected. Reference is made to Suzuki T. et al., Crystal structures reveal an elusive functional domain of pyrrolysyl-tRNA synthetase, Nat Chem Biol. 13(12): 1261-1266 (2017), incorporated herein in its entirety.
[0336] In some embodiments, negative selection is applied during a non-continuous evolution method as described herein, by penalizing undesired activities. In some embodiments, this is achieved by causing the undesired activity to interfere with pIII production. For example, expression of an antisense RNA complementary to the gIII RBS and/or start codon is one way of applying negative selection, while expressing a protease (e.g., TEV) and engineering the protease recognition sites into pIII is another.
[0337] Other non-continuous selection schemes for gene products having a desired activity are well known to those of skill in the art or will be apparent from the instant disclosure. In certain embodiments, following the successful directed evolution of one or more components of the transversion base editor (e.g., a Cas9 domain or an adenine oxidase domain), methods of making the base editors comprise recombinant protein expression methodologies known to one of ordinary skill in the art.
Vectors
[0338] Several embodiments of making and using the base editors of the disclosure relate to vector systems comprising one or more vectors encoding the disclosed ACBEs. Vectors can be designed to clone and/or express the base editors of the disclosure. Vectors may also be designed to transfect the base editors of the disclosure into one or more cells, e.g., a target diseased eukaryotic cell for treatment with the base editor systems and methods disclosed herein.
[0339] Vectors may be designed for expression of base editor transcripts (e.g., nucleic acid transcripts, proteins, or enzymes) in prokaryotic or eukaryotic cells. For example, base editor transcripts may be expressed in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression vectors), yeast cells, or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods In Enzymology 185, Academic Press, San Diego, Calif. (1990). Alternatively, expression vectors encoding one or more base editors described herein may be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase.
[0340] Vectors for rational mutagenesis methods such as PACE may be introduced and propagated in a prokaryotic cell. In some embodiments, a prokaryote is used to amplify copies of a vector to be introduced into a eukaryotic cell or as an intermediate vector in the production of a vector to be introduced into a eukaryotic cell (e.g., amplifying a plasmid as part of a viral vector packaging system). In some embodiments, a prokaryote is used to amplify copies of a vector and express one or more nucleic acids, such as to provide a source of one or more proteins for delivery to a host cell or host organism. Expression of proteins in prokaryotes is most often carried out in Escherichia coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion proteins.
[0341] Fusion expression vectors also may be used to express the base editors of the disclosure. Such vectors generally add a number of amino acids to a protein encoded therein, such as to the amino terminus of the recombinant protein. Such fusion vectors may serve one or more purposes, such as: (i) to increase expression of a recombinant protein; (ii) to increase the solubility of a recombinant protein; and (iii) to aid in the purification of a recombinant protein by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion domain and the recombinant protein to enable separation of the recombinant protein from the fusion domain subsequent to purification of the fusion protein. Such enzymes, and their cognate recognition sequences, include Factor Xa, thrombin and enterokinase. Example fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith and Johnson, 1988. Gene 67: 31-40), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) that fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant protein.
[0342] Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann et al., (1988) Gene 69:301-315) and pET 11d (Studier et al., Gene Expression Technology: Methods In Enzymology 185, Academic Press, San Diego, Calif. (1990) 60-89).
[0343] In some embodiments, a vector is a yeast expression vector for expressing the base editors described herein. Examples of vectors for expression in the yeast Saccharomyces cerevisiae include pYepSec1 (Baldari, et al., 1987. EMBO J. 6: 229-234), pMFa (Kurjan and Herskowitz, 1982. Cell 30: 933-943), pJRY88 (Schultz et al., 1987. Gene 54: 113-123), pYES2 (Invitrogen Corporation, San Diego, Calif.), and picZ (Invitrogen Corp, San Diego, Calif.).
[0344] In some embodiments, a vector drives protein expression in insect cells using baculovirus expression vectors. Baculovirus vectors available for expression of proteins in cultured insect cells (e.g., SF9 cells) include the pAc series (Smith, et al., 1983. Mol. Cell. Biol. 3: 2156-2165) and the pVL series (Luckow and Summers, 1989. Virology 170: 31-39).
[0345] In some embodiments, a vector is capable of driving expression of one or more sequences in mammalian cells using a mammalian expression vector. Examples of mammalian expression vectors include pCDM8 (Seed, 1987. Nature 329: 840) and pMT2PC (Kaufman, et al., 1987. EMBO J. 6: 187-195). When used in mammalian cells, the expression vector's control functions are typically provided by one or more regulatory elements. For example, commonly used promoters are derived from polyoma, adenovirus 2, cytomegalovirus, simian virus 40, and others disclosed herein and known in the art. For other suitable expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16 and 17 of Sambrook, et al., Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989.
[0346] In some embodiments, the recombinant mammalian expression vector is capable of directing expression of the nucleic acid preferentially in a particular cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific regulatory elements are known in the art. Non-limiting examples of suitable tissue-specific promoters include the albumin promoter (liver-specific; Pinkert, et al., 1987. Genes Dev. 1: 268-277), lymphoid-specific promoters (Calame and Eaton, 1988. Adv. Immunol. 43: 235-275), in particular promoters of T cell receptors (Winoto and Baltimore, 1989. EMBO J. 8: 729-733) and immunoglobulins (Banerji, et al., 1983. Cell 33: 729-740; Queen and Baltimore, 1983. Cell 33: 741-748), neuron-specific promoters (e.g., the neurofilament promoter; Byrne and Ruddle, 1989. Proc. Natl. Acad. Sci. USA 86: 5473-5477), pancreas-specific promoters (Edlund, et al., 1985. Science 230: 912-916), and mammary gland-specific promoters (e.g., milk whey promoter, U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). Developmentally-regulated promoters are also encompassed, e.g., the murine hox promoters (Kessel and Gruss, 1990. Science 249: 374-379) and the α-fetoprotein promoter (Camper and Tilghman, 1989. Genes Dev. 3: 537-546).
Methods of Editing A Target Nucleobase Pair, Methods of Treatment, and Uses for the ACBEs
[0347] Some embodiments of the disclosure provide methods for editing a nucleic acid using the base editors described herein to effectuate substitution of an A:T base pair to a C:G base pair. In some embodiments, the method is a method for editing a nucleobase of a nucleic acid (e.g., a base pair of a double-stranded DNA sequence). In some embodiments, the method comprises the steps of: a) contacting a target region of a nucleic acid (e.g., a double-stranded DNA sequence) with a complex comprising a fusion protein (e.g., a Cas9 domain fused to an adenine oxidase domain) and a guide nucleic acid (e.g., gRNA), wherein the target region comprises a targeted nucleobase pair. As a result of embodiments of these methods, strand separation of said target region is induced, a first nucleobase of said target nucleobase pair in a single strand of the target region is converted to a second nucleobase, and no more than one strand of said target region is cut (or nicked), wherein a third nucleobase complementary to the first nucleobase is replaced by a fourth nucleobase complementary to the second nucleobase.
[0348] In some embodiments, the first nucleobase is an adenine (of the target A:T nucleobase pair). In some embodiments, the second nucleobase is the intermediate 8-oxoadenine. In some embodiments, the third nucleobase is a thymine (of the target A:T base pair). In some embodiments, the fourth nucleobase is a guanine. In some embodiments, the method further comprises replacing the second nucleobase with a fifth nucleobase (cytosine) that is complementary to the fourth nucleobase, thereby generating an intended edited base pair (e.g., A:T pair to a C:G pair). In some embodiments, at least 5% of the intended base pairs are edited. In some embodiments, at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50% of the intended base pairs are edited.
[0349] In some embodiments, the ratio of intended products to unintended products in the target nucleotide is at least 2:1, 5:1, 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, or 200:1, or more. In some embodiments, the ratio of intended point mutation to indel formation is greater than 1:1, 10:1, 50:1, 100:1, 500:1, or 1000:1, or more. In some embodiments, the cut single strand (nicked strand) is hybridized to the guide nucleic acid. In some embodiments, the cut single strand is opposite to the strand comprising the first nucleobase. In some embodiments, the base editor comprises nickase activity. In some embodiments, the intended edited base pair is upstream of a PAM site. In some embodiments, the intended edited base pair is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides upstream of the PAM site. In some embodiments, the intended edited base pair is downstream of a PAM site.
[0350] In some embodiments, the intended edited base pair is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides downstream of the PAM site. In some embodiments, the method does not require a canonical (e.g., NGG) PAM site. In some embodiments, the base editor comprises a linker. In some embodiments, the linker is 1-25 amino acids in length. In some embodiments, the linker is 5-20 amino acids in length. In some embodiments, the linker is 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids in length. In some embodiments, the target region comprises a target window, wherein the target window comprises the target nucleobase pair. In some embodiments, the target window comprises 1-10 nucleotides. In some embodiments, the target window is 1-9, 1-8, 1-7, 1-6, 1-5, 1-4, 1-3, 1-2, or 1 nucleotides in length. In some embodiments, the target window is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. In some embodiments, the intended edited base pair is within the target window. In some embodiments, the target window comprises the intended edited base pair. In some embodiments, the method is performed using any of the base editors provided herein. In some embodiments, a target window is an editing window.
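The relationship between a PAM site, a target window, and candidate target adenines can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the disclosure: the function name `adenines_in_window`, the 20-nucleotide protospacer length, the default window of positions 4-8, and the convention of counting protospacer positions from the PAM-distal end. It scans only the given strand, for a canonical NGG PAM.

```python
import re


def adenines_in_window(strand, window=(4, 8), protospacer_len=20):
    """List adenines that fall inside a target window upstream of an NGG PAM.

    Hypothetical helper: positions are counted 1..protospacer_len starting at
    the PAM-distal end of the protospacer (an assumed convention).
    """
    hits = []
    # A lookahead is used so that overlapping NGG sites are all reported.
    for match in re.finditer(r"(?=[ACGT]GG)", strand):
        pam_start = match.start()
        proto_start = pam_start - protospacer_len
        if proto_start < 0:
            continue  # not enough sequence upstream for a full protospacer
        for pos in range(window[0], window[1] + 1):
            index = proto_start + pos - 1
            if strand[index] == "A":
                hits.append({
                    "pam_start": pam_start,
                    "position": pos,
                    "nt_upstream_of_pam": pam_start - index,
                })
    return hits


# Example: a single A at protospacer position 5, followed by a TGG PAM.
example = "TTTT" + "A" + "T" * 15 + "TGG"
print(adenines_in_window(example))
```

Changing `window` or `protospacer_len` models the other window lengths and PAM-relative positions recited above. A non-canonical PAM would require a different search pattern.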
[0351] In some embodiments, the disclosure provides methods for editing a nucleotide. In some embodiments, the disclosure provides a method for editing a nucleobase pair of a double-stranded DNA sequence. In some embodiments, the method comprises a) contacting a target region of the double-stranded DNA sequence with a complex comprising a base editor and a guide nucleic acid (e.g., gRNA), where the target region comprises a target nucleobase pair (e.g., A:T target base pair), b) converting a first nucleobase (e.g., the A base) of said target nucleobase pair in a single strand of the target region to a second nucleobase (e.g., converted to an intermediate, such as 8-oxoadenine, which is then replaced with a C through DNA replication/repair processes), c) cutting (or nicking) no more than one strand of said target region, wherein a third nucleobase complementary to the first nucleobase is replaced by a fourth nucleobase complementary to the second nucleobase, and the second nucleobase is replaced with a fifth nucleobase that is complementary to the fourth nucleobase, thereby generating an intended edited base pair, wherein the efficiency of generating the intended edited base pair is at least 5%.
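The sequence of nucleobase states described in steps a) through c) can be traced with a short sketch. This is a bookkeeping model only, under stated assumptions: the function name `edit_pair` is hypothetical, the two strands are written as aligned strings so that `bottom[i]` pairs with `top[i]`, and 8-oxoadenine is assumed to direct incorporation of guanine on the opposite strand, as the paragraph above describes.

```python
def edit_pair(top, bottom, i):
    """Trace an A:T to C:G edit at index i of two aligned strands.

    Hypothetical helper: bottom[i] is the base paired with top[i].
    Returns the edited strands and the list of intermediate base-pair states.
    """
    if top[i] != "A" or bottom[i] != "T":
        raise ValueError("target position must be an A:T pair")
    states = [
        ("A", "T"),        # starting pair: first nucleobase A, third nucleobase T
        ("8-oxoA", "T"),   # step b: A is oxidized to the second nucleobase
        ("8-oxoA", "G"),   # step c: the other strand is nicked and T is replaced by G
        ("C", "G"),        # replication/repair replaces 8-oxoA with the fifth nucleobase C
    ]
    edited_top = top[:i] + "C" + top[i + 1:]
    edited_bottom = bottom[:i] + "G" + bottom[i + 1:]
    return edited_top, edited_bottom, states


new_top, new_bottom, trace = edit_pair("GATC", "CTAG", 1)
print(new_top, new_bottom)
for first, second in trace:
    print(f"{first}:{second}")
```

The model does not represent kinetics, repair pathway choice, or efficiency. It only records which nucleobase occupies each strand at each step.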
[0352] In some embodiments, at least 5% of the intended base pairs are edited. In some embodiments, at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50% of the intended base pairs are edited. In some embodiments, the method results in less than 19%, 18%, 16%, 14%, 12%, 10%, 8%, 6%, 4%, 2%, 1%, 0.5%, 0.2%, or less than 0.1% indel formation. In some embodiments, the method results in less than 20% indel formation in the nucleic acid. In other embodiments, the method results in less than 35% indel formation in the nucleic acid.
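The percentages and ratios recited above can be computed from sequencing read counts as in the following sketch. The disclosure does not define how these quantities are measured, so the definitions here are assumptions: each read at the target site is placed in exactly one of four classes, and percentages are taken over all classified reads. The function name `editing_metrics` and the read counts in the example are hypothetical.

```python
def editing_metrics(intended, unintended, indel, unedited):
    """Summarize editing outcomes from read counts at one target site.

    Assumed classes (one per read): intended A:T to C:G edit, other
    substitution at the target nucleotide, indel, or unedited.
    """
    total = intended + unintended + indel + unedited
    if total == 0:
        raise ValueError("no reads supplied")
    return {
        "efficiency_pct": 100 * intended / total,
        "indel_pct": 100 * indel / total,
        # Ratios are reported as infinity when the denominator class is empty.
        "intended_to_unintended": intended / unintended if unintended else float("inf"),
        "intended_to_indel": intended / indel if indel else float("inf"),
    }


# Made-up counts for illustration only.
print(editing_metrics(intended=300, unintended=30, indel=10, unedited=660))
```

With these definitions, a site meets the "at least 5% edited" criterion when `efficiency_pct` is 5 or greater, and the "less than 20% indel formation" criterion when `indel_pct` is below 20.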
[0353] In some embodiments, the ratio of intended product to unintended products at the target nucleotide is at least 2:1, 5:1, 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, or 200:1, or more. In some embodiments, the ratio of intended point mutation to indel formation is greater than 1:1, 10:1, 50:1, 100:1, 500:1, or 1000:1, or more. In some embodiments, the cut single strand is hybridized to the guide nucleic acid. In some embodiments, the cut single strand is opposite to the strand comprising the first nucleobase. In some embodiments, the base editor comprises adenine oxidation and/or DNA glycosylase inhibition activity. In some embodiments, the base editor comprises nickase activity. In some embodiments, the intended edited base pair is upstream of a PAM site. In some embodiments, the intended edited base pair is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides upstream of the PAM site. In some embodiments, the intended edited base pair is downstream of a PAM site. In some embodiments, the intended edited base pair is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides downstream of the PAM site. In some embodiments, the method does not require a canonical (e.g., NGG) PAM site. In some embodiments, the base editor comprises a linker. In some embodiments, the linker is 1-25 amino acids in length. In some embodiments, the linker is 5-20 amino acids in length. In some embodiments, the linker is 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids in length. In some embodiments, the target region comprises a target window, wherein the target window comprises the target nucleobase pair. In some embodiments, the target window comprises 1-10 nucleotides. In some embodiments, the target window is 1-9, 1-8, 1-7, 1-6, 1-5, 1-4, 1-3, 1-2, or 1 nucleotides in length. In some embodiments, the target window is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. In some embodiments, the intended edited base pair occurs within the target window. In some embodiments, the target window comprises the intended edited base pair. In some embodiments, the base editor is any one of the base editors provided herein.
[0354] In some embodiments, the disclosure provides editing methods comprising contacting a DNA or RNA molecule with any of the base editors provided herein, and with at least one guide nucleic acid (e.g., guide RNA), wherein the guide nucleic acid (e.g., guide RNA) is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence. In some embodiments, the 3' end of the target sequence is immediately adjacent to a canonical PAM sequence (NGG). In some embodiments, the 3' end of the target sequence is not immediately adjacent to a canonical PAM sequence (NGG). In some embodiments, the 3' end of the target sequence is immediately adjacent to an AGC, GAG, TTT, GTG, or CAA sequence.
[0355] In some embodiments, the target nucleic acid sequence comprises a sequence associated with a disease, disorder, or condition. In some embodiments, the target nucleic acid sequence comprises a point mutation associated with a disease, disorder, or condition. In some embodiments, the activity of the fusion protein (e.g., comprising an adenine oxidase and a Cas9 domain), or the complex, results in a correction of the point mutation. In some embodiments, the target nucleic acid sequence comprises a C to A point mutation associated with a disease, disorder, or condition, and the conversion of the mutant A to a C results in a sequence that is not associated with a disease, disorder, or condition. The target sequence may comprise a G to T point mutation associated with a disease, disorder, or condition, wherein the conversion of the mutant T to a G results in a sequence that is not associated with a disease, disorder, or condition. In some embodiments, the target nucleic acid sequence encodes a protein, and the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to the wild-type codon. In some embodiments, the transversion of the mutant A (or mutant T) results in a change of the amino acid encoded by the mutant codon. In some embodiments, the transversion of the mutant A (or mutant T) results in the codon encoding the wild-type amino acid. In some embodiments, the contacting is in vivo in a subject. In some embodiments, the subject has or has been diagnosed with a disease, disorder, or condition. In some embodiments, the disease, disorder, or condition is congenital deafness, spastic paraplegia, nonsyndromic hearing loss, spinal muscular atrophy, or hypohidrotic ectodermal dysplasia.
[0356] In some embodiments, the base editors are used to introduce a point mutation into a nucleic acid by oxidizing a target A nucleobase. In some embodiments, the oxidation of the target nucleobase results in the correction of a genetic defect, e.g., in the correction of a point mutation that leads to a loss of function in a gene product. In some embodiments, the genetic defect is associated with a disease, disorder, or condition, e.g., a lysosomal storage disorder or a metabolic disease, such as, for example, type I diabetes. In some embodiments, the methods provided herein are used to introduce a deactivating point mutation into a gene or allele that encodes a gene product that is associated with a disease, disorder, or condition. For example, in some embodiments, methods are provided herein that employ a DNA editing fusion protein to introduce a deactivating point mutation into an oncogene (e.g., in the treatment of a proliferative disease). A deactivating mutation may, in some embodiments, generate a premature stop codon in a coding sequence, which results in the expression of a truncated gene product, e.g., a truncated protein lacking the function of the full-length protein.
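The effect of an A:T to C:G edit on a codon can be worked through with the standard genetic code. The sketch below is illustrative only and its names (`CODON_TABLE`, `edited_codons`, `stop_creating_edits`) are hypothetical. It assumes the standard genetic code and a single edit per codon, read on the coding strand: the edit appears as A to C when the target adenine lies on the coding strand, and as T to G when the target adenine lies on the opposite strand.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Coding-strand view of an A:T to C:G edit (see the assumptions in the lead-in).
TRANSVERSIONS = {"A": "C", "T": "G"}


def edited_codons(codon):
    """Return every codon reachable by one A->C or T->G change."""
    return [
        codon[:i] + TRANSVERSIONS[base] + codon[i + 1:]
        for i, base in enumerate(codon)
        if base in TRANSVERSIONS
    ]


def stop_creating_edits():
    """List (sense codon, edited codon) pairs where the edit yields a stop codon."""
    return sorted(
        (codon, new)
        for codon, aa in CODON_TABLE.items()
        if aa != "*"
        for new in edited_codons(codon)
        if CODON_TABLE[new] == "*"
    )


print(stop_creating_edits())
# Correction example: a C to A mutation turning GCT (Ala) into GAT (Asp)
# is reverted by the A->C edit.
print([(c, CODON_TABLE[c]) for c in edited_codons("GAT")])
```

Because the stop codons TAA, TAG, and TGA contain no cytosine, a premature stop can arise in this model only through the T to G reading of the edit, that is, when the target adenine is on the non-coding strand.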
[0357] In some embodiments, the purpose of the methods provided herein is to restore the function of a dysfunctional gene via genome editing. The base editor proteins provided herein can be validated for gene editing-based human therapeutics in vitro, e.g., by correcting a disease-associated mutation in human cell culture. It will be understood by the skilled artisan that the base editors provided herein, e.g., the fusion proteins comprising a nucleic acid programmable DNA binding protein (e.g., Cas9) and a nucleobase modification domain, can be used to correct any single point A to C or T to G mutation. Oxidation of the mutant A that is base-paired with the mutant T, followed by a round of replication, corrects the mutation.
[0358] The successful correction of point mutations in disease-associated genes and alleles opens up new strategies for gene correction with applications in therapeutics and basic research. Site-specific single-base modification systems like the disclosed fusions of a nucleic acid programmable DNA binding protein and an adenine oxidase domain also have applications in "reverse" gene therapy, where certain gene functions are purposely suppressed or abolished. In these cases, site-specifically mutating residues that lead to inactivating mutations in a protein, or mutations that inhibit function of the protein, can be used to abolish or inhibit protein function.
Methods of treatment
[0359] The instant disclosure provides methods for the treatment of a subject diagnosed with a disease associated with or caused by a point mutation that can be corrected by a DNA editing fusion protein provided herein. For example, in some embodiments, a method is provided that comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of an adenine oxidase fusion protein and a gRNA that forms a complex with the fusion protein, that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene. In some embodiments, a method is provided that comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of an adenine oxidase fusion protein-gRNA complex that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene. Further provided herein are methods comprising administering to a subject one or more vectors that contain a nucleotide sequence that expresses the fusion protein and a gRNA that forms a complex with the fusion protein.
[0360] In some embodiments, the disease is a proliferative disease. In some embodiments, the disease is a genetic disease. In some embodiments, the disease is a neoplastic disease. In some embodiments, the disease is a metabolic disease. In some embodiments, the disease is a lysosomal storage disease. Other diseases that can be treated by correcting a point mutation or introducing a deactivating mutation into a disease-associated gene will be known to those of skill in the art, and the disclosure is not limited in this respect.
[0361] The instant disclosure provides methods for the treatment of additional diseases or disorders, e.g., diseases or disorders that are associated with or caused by a point mutation that can be corrected by adenine oxidase-mediated gene editing. Some such diseases are described herein, and additional suitable diseases that can be treated with the strategies and fusion proteins provided herein will be apparent to those of skill in the art based on the instant disclosure. Exemplary suitable diseases and disorders are listed below. It will be understood that the numbering of the specific positions or residues in the respective sequences depends on the particular protein and numbering scheme used. Numbering might be different, e.g., in precursors of a mature protein and the mature protein itself, and differences in sequences from species to species may affect numbering. One of skill in the art will be able to identify the respective residue in any homologous protein and in the respective encoding nucleic acid by methods well known in the art, e.g., by sequence alignment and determination of homologous residues. Exemplary suitable diseases and disorders include, without limitation: Non-Bruton type Agammaglobulinemia, Hypomyelinating Leukodystrophy, 21-hydroxylase deficiency, familial Breast-ovarian cancer, Immunodeficiency with basal ganglia calcification, Congenital myasthenic syndrome, Shprintzen-Goldberg syndrome, Peroxisome biogenesis disorder, Nephronophthisis, autosomal recessive early-onset, digenic, PINK1/DJ1 Parkinson disease, Cerebral visual impairment and intellectual disability, Neurodevelopmental disorder with or without anomalies of the brain, eye, or heart, Immunodeficiency, Leber congenital amaurosis, Amyotrophic lateral sclerosis type 10, Motor neuron disease, Malignant melanoma of skin, Focal cortical dysplasia type II, papillary Renal cell carcinoma, Glioblastoma, Colorectal Neoplasms, Uterine cervical neoplasms, sporadic Papillary renal cell carcinoma, Malignant neoplasm of body of uterus, Kidney Carcinoma, Neoplasm of the breast, Glioblastoma, Smith-Kingsmore syndrome, Homocysteinemia due to MTHFR deficiency, type 2A2A Charcot-Marie-Tooth disease, Bartter syndrome type 3, Cataract, multiple types, Gastrointestinal stroma tumor, Paragangliomas, Pheochromocytoma, Hereditary cancer-predisposing syndrome, Paragangliomas, Hereditary cancer-predisposing syndrome, Gastrointestinal stroma tumor, Paragangliomas, Pheochromocytoma, Hereditary Paraganglioma-Pheochromocytoma Syndromes, Hereditary cancer-predisposing syndrome, Gastrointestinal stroma tumor, Paraganglioma and gastric stromal sarcoma, Uncombable hair syndrome, Parkinson disease, autosomal recessive early-onset, Childhood hypophosphatasia, Odontohypophosphatasia, Takenouchi-Kosaki syndrome, C1q deficiency, Prostate cancer/brain cancer susceptibility, UDPglucose-4-epimerase deficiency, Deficiency of hydroxymethylglutaryl-CoA lyase, Fucosidosis, nonsyndromic cleft palate, Van der Woude syndrome, autosomal recessive Hypercholesterolemia, Eichsfeld type congenital muscular dystrophy, autosomal dominant Mental retardation, Hyperphosphatasia with mental retardation syndrome, Hyperphosphatasia-intellectual disability syndrome, Obesity, mild, early-onset, Ectodermal dysplasia, hypohidrotic/hair/tooth/nail type, Dystonia, torsion, autosomal recessive, Reticular dysgenesis, Erythrokeratodermia variabilis et progressiva, Corneal dystrophy, Fuchs endothelial, Corneal dystrophy, posterior polymorphous,
Hereditary neutrophilia, Ceroid lipofuscinosis neuronal, Neuronal ceroid lipofuscinosis, Lethal tight skin contracture syndrome, DFNA 2 Nonsyndromic Hearing Loss, Osteogenesis imperfecta type 8, GLUT1 deficiency syndrome, autosomal recessive, Glucose transporter type 1 deficiency syndrome, Congenital amegakaryocytic thrombocytopenia, Myelofibrosis with myeloid metaplasia, somatic, Myelofibrosis with myeloid metaplasia, Thrombocythemia, somatic, Hematologic neoplasm, Early infantile epileptic encephalopathy, Mental retardation, autosomal recessive, Familial porphyria cutanea tarda, MYH-associated polyposis, Hereditary cancer-predisposing syndrome, MUTYH-associated polyposis, Hereditary cancer-predisposing syndrome, Methylmalonic acidemia with homocystinuria, Methylmalonic aciduria and homocystinuria, cblC type, digenic, Muscle eye brain disease, Congenital Muscular Dystrophy, alpha-dystroglycan related, Limb-Girdle Muscular Dystrophy, Recessive, Muscle eye brain disease, Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A3, Adenocarcinoma of the colon, Congenital primary aphakia, Hepatic failure, early-onset, and neurologic disorder due to cytochrome C oxidase deficiency, Carnitine palmitoyltransferase II deficiency, infantile, Carnitine palmitoyltransferase II deficiency, myopathic, stress-induced, Carnitine palmitoyltransferase II deficiency, Carnitine palmitoyltransferase II deficiency, myopathic, stress-induced, Sensorineural deafness with mild renal dysfunction, Bartter syndrome type 4, Hypercholesterolemia, autosomal dominant, Low density lipoprotein cholesterol level quantitative trait locus, Familial hypercholesterolemia, Hypocholesterolemia, Hypercholesterolemia, autosomal dominant, Familial hypercholesterolemia, Low density lipoprotein cholesterol level quantitative trait locus, Hypocholesterolemia, Lattice corneal dystrophy Type III, Epileptic encephalopathy, early infantile, Hypobetalipoproteinemia, familial, Congenital disorder of glycosylation type 1t, Leber congenital amaurosis, Retinitis pigmentosa, Medium-chain acyl-coenzyme A dehydrogenase deficiency, Dilated cardiomyopathy 1CC, Venous malformation, Aase syndrome, Stargardt disease, Cone-rod dystrophy, Retinitis pigmentosa, Stargardt disease, Congenital stationary night blindness, Retinal dystrophy, Nonsyndromic cleft lip with or without cleft palate, Glycogen storage disease type III, Glycogen storage disease IIIa, Intermediate maple syrup urine disease type 2, Maple syrup urine disease, Chorea, childhood-onset, with psychomotor retardation, Marshall syndrome, Stickler syndrome, type 2, Marshall/Stickler syndrome, Chudley-McCullough syndrome, Auriculocondylar syndrome, Pontocerebellar hypoplasia, type 9, Epileptic encephalopathy, early infantile, Spinocerebellar ataxia, Muscle AMP deaminase deficiency, Congenital giant melanocytic nevus, Liver cancer, Chronic lymphocytic leukemia,
Neurocutaneous melanosis, Malignant melanoma of skin, Multiple myeloma, Neuroblastoma, Lung adenocarcinoma, Non-small cell lung cancer, Acute myeloid leukemia, Renal cell carcinoma, papillary, Neoplasm of brain, Cutaneous melanoma, Glioblastoma, Hepatocellular carcinoma, Transitional cell carcinoma of the bladder, Colorectal Neoplasms,
Nasopharyngeal Neoplasms, Adrenocortical carcinoma, Adenocarcinoma of stomach,
Ovarian Serous Cystadenocarcinoma, Malignant neoplasm of body of uterus, RAS Inhibitor response, Malignant lymphoma, non-Hodgkin, Medulloblastoma, Malignant melanoma of skin, Multiple myeloma, Acute myeloid leukemia, Myelodysplastic syndrome, Cutaneous melanoma, Transitional cell carcinoma of the bladder, Neoplasm, Colorectal Neoplasms, Adenocarcinoma of stomach, Cutaneous melanoma, Malignant melanoma of skin, Multiple myeloma, Acute myeloid leukemia, Noonan syndrome, Myelodysplastic syndrome,
Cutaneous melanoma, Colorectal Neoplasms, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Malignant melanoma of skin, Multiple myeloma, Non-small cell lung cancer, Acute myeloid leukemia, Myelodysplastic syndrome, Cutaneous melanoma, Colorectal Neoplasms, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Secondary hypothyroidism, Cardiovascular phenotype, Neurodevelopmental disorder, mitochondrial, with abnormal movements and lactic acidosis, with or without seizures, beta- Hydroxysteroid dehydrogenase deficiency, Hajdu-Cheney syndrome, Hemochromatosis type 2A, Hemochromatosis type 1, Atrial fibrillation, familial, Nager syndrome, Severe congenital neutropenia, autosomal recessive, Retinitis pigmentosa, Neurodevelopmental disorder with microcephaly, hypotonia, and variable brain anomalies, Abnormality of brain morphology, Neurodevelopmental disorder with microcephaly, hypotonia, and variable brain anomalies, Paget disease of bone, White-sutton syndrome, Ichthyosis vulgaris, Dermatitis, atopic, Ichthyosis vulgaris, Mental retardation, autosomal dominant, Nemaline myopathy,
Congenital myopathy with fiber type disproportion, Epilepsy, nocturnal frontal lobe, type 3, Aicardi-Goutieres syndrome, Gaucher disease, perinatal lethal, Gaucher's disease, type 1, Subacute neuronopathic Gaucher's disease, Pyruvate kinase deficiency of red cells, Mental retardation, autosomal dominant, myopathy, mitochondrial, and ataxia, Grange syndrome, Charcot-Marie-Tooth disease, type 2, Familial partial lipodystrophy, Hutchinson-Gilford progeria syndrome, childhood-onset, Charcot-Marie-Tooth disease, Charcot-Marie-Tooth disease, type 2, Familial partial lipodystrophy, Benign scapuloperoneal muscular dystrophy with cardiomyopathy, Mandibuloacral dysostosis, Dilated cardiomyopathy, Encephalopathy, progressive, early-onset, with brain edema and/or leukoencephalopathy, Infantile
encephalopathy, Hereditary insensitivity to pain with anhidrosis, Familial medullary thyroid carcinoma, Hereditary insensitivity to pain with anhidrosis, Spherocytosis, type 3, autosomal recessive, Spherocytosis, Recessive, Elliptocytosis, Hereditary pyropoikilocytosis,
Elliptocytosis, Spherocytosis, Recessive, Enlarged vestibular aqueduct, Alternating hemiplegia of childhood, Autoimmune interstitial lung, joint, and kidney disease,
Mitochondrial complex I deficiency, Charcot-Marie-Tooth disease, demyelinating, type 1b, Charcot-Marie-Tooth disease, type I, Roussy-Levy syndrome, Neuropathy, congenital hypomyelinating, autosomal dominant, Charcot-Marie-Tooth disease, demyelinating, type 1b, Charcot-Marie-Tooth disease type 2J, Charcot-Marie-Tooth disease dominant intermediate, Charcot-Marie-Tooth disease, type I, Gastrointestinal stroma tumor, Paragangliomas, Hereditary cancer-predisposing syndrome, Achromatopsia, Thrombophilia due to activated protein C resistance, Geroderma osteodysplastica, Trimethylaminuria, FMO3 activity, decreased, Trimethylaminuria, Primary open angle glaucoma juvenile onset, Glaucoma, open angle, digenic, Glaucoma, primary congenital, digenic, MYOC-Related Disorders,
Leukoencephalopathy with Brainstem and Spinal Cord Involvement and Lactate Elevation, Antithrombin III deficiency, Antithrombin deficiency, Antithrombin III deficiency,
Hereditary nephrotic syndrome, Nephrotic syndrome, idiopathic, steroid-resistant, Pituitary hormone deficiency, combined, Glutamine deficiency, congenital, Prostate cancer, hereditary, Junctional epidermolysis bullosa gravis of Herlitz, Hyperparathyroidism, Factor H
deficiency, Basal laminar drusen, CFHR5 deficiency, Factor XIII subunit B deficiency, Primary autosomal recessive microcephaly 5, Macular dystrophy, Leber congenital amaurosis, Retinitis pigmentosa, Leber congenital amaurosis, Macular dystrophy, Acute myeloid leukemia with maturation, Microcephaly, primary, autosomal recessive,
Hypokalemic periodic paralysis, Left ventricular noncompaction, Familial hypertrophic cardiomyopathy, Left ventricular noncompaction, Familial restrictive cardiomyopathy, Cardiovascular phenotype, Renal dysplasia, Amelogenesis imperfecta, type IA, Popliteal pterygium syndrome, Van der Woude syndrome, Zimmermann-Laband syndrome, Leber congenital amaurosis, Stromme syndrome, Ciliary dyskinesia, primary, Usher syndrome, type 2A, Retinitis pigmentosa, Usher syndrome, Usher syndrome, type 2A, Retinal dystrophy, USH2A-Related Disorders, Usher syndrome, Blindness, Rod-cone dystrophy, Pigmentary retinopathy, Abnormal macular morphology, Retinal pigment epithelial atrophy, Loeys-Dietz syndrome, Holt-Oram syndrome, Cardiovascular phenotype, Martsolf syndrome, Warburg micro syndrome, Skraban-Deardorff syndrome, Coenzyme Q10 deficiency, primary, Multiple mitochondrial dysfunctions syndrome, Nemaline myopathy, Myopathy,
scapulohumeroperoneal, Nemaline myopathy, autosomal dominant or recessive, Myopathy, actin, congenital, with cores, Cardioencephalomyopathy, fatal infantile, due to cytochrome c oxidase deficiency, Chediak-Higashi syndrome, Familial hypertrophic cardiomyopathy, Methylcobalamin deficiency, cblG type, Catecholaminergic polymorphic ventricular tachycardia, Catecholaminergic polymorphic ventricular tachycardia type 1,
Catecholaminergic polymorphic ventricular tachycardia, Tooth agenesis, selective, Multiple cutaneous leiomyomas, Fumarase deficiency, Hereditary cancer-predisposing syndrome, Mental retardation, autosomal dominant, Diamond-Blackfan anemia, Maturity-onset diabetes of the young, type 7, Myoglobinuria, acute recurrent, autosomal recessive, Feingold syndrome, Cranioectodermal dysplasia, Short rib polydactyly syndrome, Jeune thoracic dystrophy, Short-rib thoracic dysplasia without polydactyly, Short-rib thoracic dysplasia with polydactyly, digenic, Multiple epiphyseal dysplasia, Familial hypobetalipoproteinemia, Familial hypercholesterolemia, Hypercholesterolemia, autosomal dominant, type B,
Hypobetalipoproteinemia, familial, Hypobetalipoproteinemia, familial, Proopiomelanocortin deficiency, Acute myeloid leukemia, Shashi-Pena syndrome, Primary pulmonary
hypertension 4, Navajo neurohepatopathy, Retinitis pigmentosa, Retinitis pigmentosa, Neuroblastoma, Neuroblastoma, Lung adenocarcinoma, Neuroblastoma, Non-small cell lung cancer, Benign Soft Tissue Neoplasm of Uncertain Differentiation, 3-Oxo-5 alpha-steroid delta 4-dehydrogenase deficiency, Spastic paraplegia, autosomal dominant, Glaucoma, primary congenital, Gingival fibromatosis, Noonan syndrome, Noonan syndrome, Rasopathy, Noonan syndrome, Short-rib thoracic dysplasia with polydactyly, Sitosterolemia, Cystinuria, Holoprosencephaly, Single median maxillary incisor, Schizencephaly, Erythrocytosis, familial, Factor V and factor VIII, combined deficiency of, Multiple gastrointestinal atresias, Lynch syndrome, Lynch syndrome, Hereditary cancer-predisposing syndrome, Lynch syndrome, Hereditary nonpolyposis colon cancer, Lynch syndrome, Hereditary cancer-predisposing syndrome, Hereditary nonpolyposis colon cancer, Lynch syndrome, Hereditary cancer-predisposing syndrome, Hereditary nonpolyposis colorectal cancer type 5, Lynch syndrome, Hereditary cancer-predisposing syndrome, Colorectal cancer, non-polyposis, Hereditary nonpolyposis colon cancer, Leydig hypoplasia, type I, Leydig cell agenesis, Ovarian dysgenesis 1, Ovarian hyperstimulation syndrome, Combined oxidative
phosphorylation deficiency, Intellectual developmental disorder with persistence of fetal hemoglobin, Bardet-Biedl syndrome, Multiple mitochondrial dysfunctions syndrome, Miyoshi muscular dystrophy, Limb-girdle muscular dystrophy, type 2B, Limb-girdle muscular dystrophy, type 2B, Miyoshi muscular dystrophy, Limb-girdle muscular dystrophy, type 2B, Dysferlinopathy, Limb-girdle muscular dystrophy, type 2B, Miyoshi muscular dystrophy, Limb-girdle muscular dystrophy, type 2B, Radiohumeral fusions with other skeletal and craniofacial anomalies, Sepiapterin reductase deficiency, Alstrom syndrome, Microcephaly-capillary malformation syndrome, Visceral myopathy, Chronic intestinal pseudoobstruction, Progressive external ophthalmoplegia with mitochondrial DNA deletions, autosomal recessive, Mitochondrial DNA-depletion syndrome, hepatocerebral, Congenital disorder of glycosylation type 2B, Vitamin k-dependent clotting factors, combined deficiency of, Surfactant metabolism dysfunction, pulmonary, Wolcott-Rallison dysplasia,
Pheochromocytoma, Hereditary cancer-predisposing syndrome, Retinitis pigmentosa, Cone-rod dystrophy amelogenesis imperfecta, Cd8 deficiency, familial, Severe combined immunodeficiency, atypical, Achromatopsia, Monochromacy, Ectodermal dysplasia, hypohidrotic/hair/tooth type, autosomal dominant, Autosomal recessive hypohidrotic ectodermal dysplasia syndrome, Autosomal dominant hypohidrotic ectodermal dysplasia, Colorectal cancer with chromosomal instability, Retinitis pigmentosa, Osteomyelitis, sterile multifocal, with periostitis and pustulosis, Hypochromic microcytic anemia with iron overload, Culler-Jones syndrome, Autosomal recessive centronuclear myopathy,
Thrombophilia, hereditary, due to protein C deficiency, autosomal dominant, Congenital disorders of glycosylation type II, Congenital disorder of glycosylation, type IIo, Warburg micro syndrome, Hypomyelination with brainstem and spinal cord involvement and leg spasticity, Warts, hypogammaglobulinemia, infections, and myelokathexis, Congenital NAD deficiency disorder, Vertebral, cardiac, renal, and limb defects syndrome, Mowat-Wilson syndrome, Homocystinuria, cblD type, variant 1, Nemaline myopathy, Nemaline myopathy, Idiopathic generalized epilepsy, Epilepsy, idiopathic generalized, Juvenile myoclonic epilepsy, Episodic ataxia, type 5, Progressive myositis ossificans, Amelogenesis imperfecta, type IH, Benign familial neonatal-infantile seizures, Early infantile epileptic encephalopathy, Episodic ataxia, Early infantile epileptic encephalopathy, Seizures, Vertigo, Benign familial neonatal-infantile seizures, Mental retardation, autosomal dominant, Tumoral calcinosis, familial, hyperphosphatemic, Short rib-polydactyly syndrome, Majewski type, Severe myoclonic epilepsy in infancy, Generalized epilepsy with febrile seizures plus, type 2, Severe myoclonic epilepsy in infancy, Seizures, Delayed speech and language development, Early infantile epileptic encephalopathy, Severe myoclonic epilepsy in infancy, Familial hemiplegic migraine type 3, Paroxysmal extreme pain disorder, Hereditary sensory and autonomic neuropathy type IIA, Generalized epilepsy with febrile seizures plus, type 7, Rolandic epilepsy, Small fiber neuropathy, Primary erythromelalgia, Hereditary sensory and autonomic neuropathy type IIA, Generalized epilepsy with febrile seizures plus, type 7, Inherited Erythromelalgia, Primary erythromelalgia, Indifference to pain, congenital, autosomal recessive, Febrile seizures, familial, 3b, Benign recurrent intrahepatic cholestasis, Progressive familial intrahepatic cholestasis, Myasthenic syndrome, slow-channel congenital, Lethal multiple pterygium syndrome, Duane 
syndrome type 2, Synpolydactyly,
Brachydactyly-syndactyly-oligodactyly syndrome, Brachydactyly-syndactyly-oligodactyly syndrome (1 patient), immunodeficiency, developmental delay, and hypohomocysteinemia, Hereditary myopathy with early respiratory failure, Familial dilated cardiomyopathy, Dilated cardiomyopathy, Primary dilated cardiomyopathy, Limb-girdle muscular dystrophy, type 2J, Primary dilated cardiomyopathy, Familial dilated cardiomyopathy, Familial hypertrophic cardiomyopathy, Diabetes mellitus type 2, Ehlers-Danlos syndrome, type 4, Cardiovascular phenotype, Ehlers-Danlos syndrome, type 2, Ehlers-Danlos syndrome, classic type,
Hemochromatosis type 4, Immunodeficiency, Mycobacterial and viral infections,
susceptibility to, autosomal recessive, Immunodeficiency, Mental retardation, autosomal recessive, Acute myeloid leukemia, Myelodysplastic syndrome, Myelodysplastic syndrome progressed to acute myeloid leukemia, Mitochondrial complex I deficiency, Joubert syndrome, Infantile-onset ascending hereditary spastic paralysis, ALS2-Related Disorders, Amyotrophic lateral sclerosis type 2, Pulmonary venoocclusive disease, Primary pulmonary hypertension, Autoimmune lymphoproliferative syndrome, type V, Aculeiform cataract, Congenital cataract, Cataract, coppock-like, Liver cancer, Medulloblastoma, Malignant melanoma of skin, Multiple myeloma, Lung adenocarcinoma, Acute myeloid leukemia, Myelodysplastic syndrome, Neoplasm of brain, Neoplasm of the breast, Glioblastoma, Hepatocellular carcinoma, Transitional cell carcinoma of the bladder, Brainstem glioma, Colorectal Neoplasms, Adenoid cystic carcinoma, Adenocarcinoma of prostate, Hypotonia, infantile, with psychomotor retardation and characteristic facies, Congenital
hyperammonemia, type I, Hereditary cancer-predisposing syndrome, Familial cancer of breast, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Spondylometaphyseal dysplasia - Sutcliffe type, Spondylometaphyseal dysplasia, Short stature, Focal segmental glomerulosclerosis, Microcephaly, Small for gestational age, Disproportionate short-trunk short stature, Decreased body weight, Atrioventricular canal defect, Congenital microcephaly, Steroid-resistant nephrotic syndrome, Schimke
immunoosseous dysplasia, Short stature, Focal segmental glomerulosclerosis, Microcephaly, Small for gestational age, Disproportionate short-trunk short stature, Decreased body weight, Atrioventricular canal defect, Congenital microcephaly, Steroid-resistant nephrotic syndrome, Gracile syndrome, Cholestanol storage disease, Odontoonychodermal dysplasia, Schopf-Schulz-Passarge syndrome, Tooth agenesis, selective, Type A1 brachydactyly,
Dyschromatosis universalis hereditaria, Charcot-Marie-Tooth disease, axonal, type 2T, Myopathy, centronuclear, Three M syndrome, Waardenburg syndrome type 1, Alport syndrome, autosomal recessive, Benign familial hematuria, Basal ganglia disease, biotin-responsive, ARMC9-related Joubert syndrome, ARMC9-related Joubert syndrome, Joubert syndrome, Arthrogryposis, distal, type 5d, Microphthalmia, isolated, Myasthenic syndrome, congenital, fast-channel, Congenital myasthenic syndrome, fast-channel, Oguchi's disease, Crigler Najjar syndrome, type 1, Crigler-Najjar syndrome, type II, Crigler-Najjar syndrome, Crigler-Najjar syndrome, type II, Gilbert's syndrome, Crigler Najjar syndrome, type 1, Hyperbilirubinemia, Ullrich congenital muscular dystrophy, Bethlem myopathy, Ullrich congenital muscular dystrophy, Bethlem myopathy, Primary hyperoxaluria, type I, D-2-hydroxyglutaric aciduria, Sideroblastic anemia with B-cell immunodeficiency, periodic fevers, and developmental delay, Multiple sulfatase deficiency, Gillespie syndrome, Limb-girdle muscular dystrophy, type 1C, Rippling muscle disease, Familial partial lipodystrophy, Severe congenital neutropenia, autosomal recessive, Severe congenital neutropenia, Von Hippel-Lindau syndrome, Hereditary cancer-predisposing syndrome, Erythrocytosis, familial, Von Hippel-Lindau syndrome, Erythrocytosis, familial, Von Hippel-Lindau syndrome, Renal cell carcinoma, papillary, Metabolic syndrome, susceptibility to, Obesity, age at onset of, Morbid obesity, Noonan syndrome, Rasopathy, Xeroderma pigmentosum, group C, Endplate acetylcholinesterase deficiency, Biotinidase deficiency, Thyroid hormone resistance, generalized, autosomal dominant, Thyroid hormone resistance, selective pituitary,
Microphthalmia, syndromic, Congenital disorder of deglycosylation, Cardiovascular phenotype, Loeys-Dietz syndrome, Thoracic aortic aneurysm and aortic dissection,
Congenital disorder of glycosylation type Ix, Mucopolysaccharidosis, MPS-IV-B,
Osteogenesis imperfecta type 7, Lynch syndrome I, Hereditary cancer-predisposing syndrome, Turcot syndrome, Hereditary nonpolyposis colon cancer, Atrial fibrillation, Atrial fibrillation, familial, Atrial fibrillation, Brugada syndrome, Congenital long QT syndrome, Cardiac arrhythmia, Sudden infant death syndrome, Long qt syndrome, acquired,
susceptibility to, Long QT syndrome, Romano-Ward syndrome, Brugada syndrome, Sick sinus syndrome, Progressive familial heart block, Cardiovascular phenotype, Paroxysmal familial ventricular fibrillation, Dilated Cardiomyopathy, Dominant, Long QT syndrome, Congenital long QT syndrome, Cardiac conduction defect, nonprogressive, Cardiac conduction defect, nonspecific, Brugada syndrome, Asplenia, isolated congenital, Liver cancer, Medulloblastoma, Malignant melanoma of skin, Pilomatrixoma, Hepatoblastoma, Hepatocellular carcinoma, Transitional cell carcinoma of the bladder, Uterine cervical neoplasms, Craniopharyngioma, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Liver cancer, Medulloblastoma, Malignant melanoma of skin, Pilomatrixoma, Lung adenocarcinoma, Carcinoma of colon, Endometrial neoplasm, Hepatocellular carcinoma, Pancreatic adenocarcinoma, Transitional cell carcinoma of the bladder, Carcinoma of esophagus, Colorectal Neoplasms, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Liver cancer,
Medulloblastoma, Malignant melanoma of skin, Squamous cell carcinoma of the head and neck, Malignant tumor of prostate, Lung adenocarcinoma, Hepatoblastoma, Cutaneous melanoma, Hepatocellular carcinoma, Craniopharyngioma, Adrenocortical carcinoma, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Liver cancer,
Medulloblastoma, Lung adenocarcinoma, Neoplasm of stomach, Cutaneous melanoma, Hepatocellular carcinoma, Transitional cell carcinoma of the bladder, Carcinoma of esophagus, Uterine cervical neoplasms, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Liver cancer, Malignant melanoma of skin,
Lung adenocarcinoma, Cutaneous melanoma, Hepatocellular carcinoma, Transitional cell carcinoma of the bladder, Colorectal Neoplasms, Adrenocortical carcinoma, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Nemaline myopathy,
Spinocerebellar ataxia, autosomal recessive, Perrault syndrome, Hydrops, lactic acidosis, and sideroblastic anemia, Perrault syndrome, Bardet-Biedl syndrome, Bardet-Biedl syndrome, Failure of tooth eruption, primary, Gray platelet syndrome, Pretibial epidermolysis bullosa, Epidermolysis bullosa pruriginosa, autosomal dominant, Recessive dystrophic epidermolysis bullosa, Microcephaly, progressive, with seizures and cerebral and cerebellar atrophy, Epileptic encephalopathy, Nephrotic syndrome, type 5, with or without ocular abnormalities, Muscular dystrophy-dystroglycanopathy (congenital with brain and eye anomalies), type a,, Tumor susceptibility linked to germline BAP1 mutations, Dilated cardiomyopathy 1Z, Dilated cardiomyopathy IS, Familial hypertrophic cardiomyopathy, Cardiovascular phenotype, Familial hypertrophic cardiomyopathy, Hypogonadotropic hypogonadism with anosmia, Atelosteogenesis type 1, Atelosteogenesis type 3, Spondylocarpotarsal synostosis syndrome, Nemaline myopathy, Mental retardation with language impairment and with or without autistic features, Glycogen storage disease, type IV, Glycogen storage disease IV, congenital neuromuscular, Glycogen storage disease, type IV, Frontotemporal Dementia, Chromosome 3-Linked, Amyotrophic lateral sclerosis, Pituitary hormone deficiency, combined 1, Joubert syndrome, Neuropathy, hereditary motor and sensory, Okinawa type, Macular dystrophy, vitelliform, Retinitis pigmentosa, Combined oxidative phosphorylation deficiency, Spermatogenic failure, epileptic encephalopathy, infantile or early childhood, Alkaptonuria, Senior-Loken syndrome, Leber congenital amaurosis, Nephronophthisis, congenital deafness, Hypocalciuric hypercalcemia, familial, type 1, Neonatal severe hyperparathyroidism, Hypocalcemia, autosomal dominant, Hypocalciuric hypercalcemia, familial, type 1, Neonatal severe hyperparathyroidism, Hypocalciuric hypercalcemia, familial, type 1, Hypocalcemia, autosomal dominant, Hypocalcemia, autosomal 
dominant, with Bartter syndrome, Dyskinesia, familial, with facial myokymia, Visceral myopathy, Lymphedema, primary, with myelodysplasia, Dendritic cell, monocyte, B lymphocyte, and natural killer lymphocyte deficiency, Acyl-CoA dehydrogenase family, member, deficiency of, Retinitis pigmentosa, Retinitis pigmentosa, autosomal recessive, Congenital stationary night blindness, autosomal dominant, Familial benign pemphigus, Epileptic encephalopathy, early infantile, Adolescent nephronophthisis, Primary hypertrophic osteoarthropathy, autosomal recessive, Myopathy, myofibrillar, Propionyl-CoA carboxylase deficiency, Blepharophimosis, ptosis, and epicanthus inversus, Seckel syndrome, Bruck syndrome, Craniosynostosis, Deficiency of ferroxidase, Usher syndrome, type 3A, Usher syndrome, type 3A, Retinitis pigmentosa, Deficiency of butyrylcholine esterase, BCHE, fluoride, Retinitis pigmentosa, Fanconi-Bickel syndrome, Short stature, idiopathic, autosomal, Liver cancer, Malignant melanoma of skin, Squamous cell carcinoma of the head and neck, Small cell lung cancer, Lung adenocarcinoma, Squamous cell lung carcinoma, Renal cell carcinoma, papillary, Neoplasm of brain, Neoplasm of the breast, Glioblastoma,
Hepatocellular carcinoma, Pancreatic adenocarcinoma, Transitional cell carcinoma of the bladder, Brainstem glioma, Carcinoma of esophagus, PIK3CA related overgrowth spectrum, Colorectal Neoplasms, Uterine cervical neoplasms, Papillary renal cell carcinoma, sporadic, Nasopharyngeal Neoplasms, Adenocarcinoma of stomach, Ovarian Serous
Cystadenocarcinoma, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Carcinoma of gallbladder, Lung cancer, Medulloblastoma, Malignant melanoma of skin, Squamous cell carcinoma of the head and neck, Malignant tumor of prostate, Ovarian epithelial cancer, Carcinoma of colon, Neoplasm of brain, Neoplasm of the breast, Glioblastoma, Transitional cell carcinoma of the bladder, PIK3CA related overgrowth spectrum, Ovarian Neoplasms, Colorectal Neoplasms, Uterine cervical neoplasms, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Cowden syndrome, PIK3CA related overgrowth spectrum, Colorectal Neoplasms, Ciliary dyskinesia, Ciliary dyskinesia, primary, Microphthalmia syndromic, Methylcrotonyl-CoA carboxylase deficiency, Epidermolysis bullosa simplex, Koebner type, Epidermolysis bullosa simplex, generalized, with scarring and hair loss, Leukoencephalopathy with vanishing white matter, Congenital disorder of glycosylation type ID, Woolly hair, autosomal recessive, with or without hypotrichosis, Membranous cataract, Myopia, high, with cataract and vitreoretinal degeneration, Primary hypomagnesemia, Dominant hereditary optic atrophy, Autosomal dominant optic atrophy plus syndrome, Dominant hereditary optic atrophy, Abortive cerebellar ataxia, Dominant hereditary optic atrophy, Retinitis pigmentosa, Retinal dystrophy, Congenital stationary night blindness, autosomal dominant, Retinitis pigmentosa, Retinitis pigmentosa, Epileptic encephalopathy, early infantile, Abnormality of brain morphology, Dysostosis multiplex, Mucopolysaccharidosis type I, Hypochondroplasia, Thanatophoric dysplasia type 1,
Epidermal nevus, Bladder carcinoma, Achondroplasia, Crouzon syndrome with acanthosis nigricans, Craniosynostosis, Carcinoma, Thanatophoric dysplasia type 1, Achondroplasia, Hypochondroplasia, Craniosynostosis, Camptodactyly, tall stature, and hearing loss syndrome, Hypochondroplasia, Thanatophoric dysplasia type 1, Craniosynostosis, Fibrous dysplasia of jaw, Myasthenia, limb-girdle, familial, Selective tooth agenesis, Orofacial cleft, Hypoplastic enamel-onycholysis-hypohidrosis syndrome, Jeune thoracic dystrophy, Ellis-van Creveld Syndrome, Short rib-polydactyly syndrome, Majewski type, Chondroectodermal dysplasia, Short rib-polydactyly syndrome, Majewski type, Diabetes mellitus type 2, Diabetes mellitus and insipidus with optic atrophy and deafness, Wolfram syndrome, Joubert syndrome, Coach syndrome, Retinitis pigmentosa, Cone-rod dystrophy, Retinal dystrophy, Spastic paraplegia, autosomal recessive, Epileptic encephalopathy, early infantile, Retinitis pigmentosa, Limb-girdle muscular dystrophy, type 2E, Gastrointestinal stroma tumor, Gastrointestinal stromal tumor, familial, Cutaneous mastocytosis, Gastrointestinal stroma tumor, Cutaneous melanoma, Gastrointestinal stroma tumor, Acute myeloid leukemia, Hematologic neoplasm, Cutaneous melanoma, Congenital disorder of glycosylation type IQ, Hypogonadotropic hypogonadism with or without anosmia, Epilepsy, progressive myoclonic, with or without renal failure, Cryptophthalmos syndrome, Hyaline fibromatosis syndrome, Deafness, autosomal dominant nonsyndromic sensorineural, with dentinogenesis imperfecta, Dentinogenesis imperfecta - Shield's type II, Dentinogenesis imperfecta - Shield's type III, Deafness, autosomal dominant nonsyndromic sensorineural, with dentinogenesis imperfecta, Basan syndrome, Adermatoglyphia, Type A2 brachydactyly, Acromesomelic dysplasia, Demirhan type, Fibular hypoplasia and complex brachydactyly, Brachydactyly, type al, Abetalipoproteinaemia, SLC39A8 deficiency, congenital disorder of 
glycosylation, type IIn, Beta-D-mannosidosis, Sudden cardiac failure, infantile, Deficiency of 3-hydroxyacyl-CoA dehydrogenase, Hyperinsulinemic hypoglycemia, familial, Fibrosis of extraocular muscles, congenital, Cardiac arrhythmia, Cardiac arrhythmia, ankyrin B-related, Long QT syndrome, Cardiovascular phenotype, Cardiac arrhythmia, Cardiac arrhythmia, ankyrin B-related, Long QT syndrome, Arrhythmia, Cardiovascular phenotype, Bardet-Biedl syndrome, Van
Maldergem syndrome, short-rib thoracic dysplasia with polydactyly, Ceroid lipofuscinosis neuronal, Macular dystrophy with central cone involvement, Ceroid lipofuscinosis neuronal, Methylmalonic aciduria cblA type, Pseudohypoaldosteronism type 1 autosomal dominant, Pseudohypoaldosteronism, Common variable immunodeficiency, with autoimmunity, Afibrinogenemia, congenital, Familial visceral amyloidosis, Ostertag type,
Hypodysfibrinogenemia, congenital, Afibrinogenemia, congenital, Glutaric acidemia IIC, Glutaric aciduria, type 2, Short rib-polydactyly syndrome, Majewski type, Dilated
cardiomyopathy 1A, Limb-girdle muscular dystrophy, type 2S, Mitochondrial myopathy, Myopia, Mitochondrial DNA depletion syndrome (cardiomyopathic type), autosomal recessive, Progressive sensorineural hearing impairment, Hypertrophic cardiomyopathy, Left ventricular hypertrophy, Vertigo, Abnormality of mitochondrial metabolism, Mitochondrial respiratory chain defects, Bietti crystalline corneoretinal dystrophy, Corneal Dystrophy, Recessive, Bietti crystalline corneoretinal dystrophy, Hereditary factor XI deficiency disease, Mitochondrial complex II deficiency, Paragangliomas, Hereditary cancer-predisposing syndrome, Mitochondrial complex II deficiency, Dyskeratosis congenita autosomal dominant, Ciliary dyskinesia, Mental retardation, autosomal dominant, Chondrocalcinosis, Oculocutaneous albinism type 4, Inherited bone marrow failure syndrome, Bone marrow failure syndrome, Cornelia de Lange syndrome, Joubert syndrome, Orofaciodigital syndrome, Complement component deficiency, C7 and C6 deficiency, combined subtotal, Succinyl-CoA acetoacetate transferase deficiency, Laron syndrome with undetectable serum GH-binding protein, Laron-type isolated somatotropin defect, Levy-Hollister syndrome, Molybdenum cofactor deficiency, complementation group B, Distal hereditary motor neuronopathy type 2C, Kartagener syndrome, Acrodysostosis, with or without hormone resistance, UV-sensitive syndrome, Cockayne syndrome type A, Retinitis pigmentosa with or without skeletal anomalies, Immunodeficiency, Kugelberg-Welander disease, Werdnig-Hoffmann disease, 3-methylcrotonyl CoA carboxylase deficiency, Striatal degeneration, autosomal dominant, Hermansky Pudlak syndrome, Mucopolysaccharidosis, type vi, intermediate, Short stature, microcephaly, and endocrine dysfunction, Wagner syndrome, Basal cell carcinoma, somatic, Capillary malformation-arteriovenous malformation, Usher syndrome, type 2C, Febrile seizures, familial, Bosch-Boonstra-Schaaf optic atrophy syndrome, Proprotein convertase 
deficiency, Familial adenomatous polyposis, Familial colorectal cancer, Familial
adenomatous polyposis, Familial adenomatous polyposis, Hereditary cancer-predisposing syndrome, Familial adenomatous polyposis, Colorectal cancer, susceptibility to, Familial adenomatous polyposis, Hereditary cancer-predisposing syndrome, Familial adenomatous polyposis, Hereditary cancer-predisposing syndrome, Familial adenomatous polyposis, Hereditary cancer-predisposing syndrome, Familial adenomatous polyposis, Anencephalus, Aortic aneurysm, familial thoracic, Pyridoxine-dependent epilepsy, Seizures,
Ventriculomegaly, Pyridoxine-dependent epilepsy, Myopathy, areflexia, respiratory distress, and dysphagia, early-onset, Congenital contractural arachnodactyly, Neuromyotonia and axonal neuropathy, autosomal recessive, Renal carnitine transport defect, Hereditary cancer-predisposing syndrome, Chylomicron retention disease, Groenouw corneal dystrophy type I, Reis-Bucklers' corneal dystrophy, Lattice corneal dystrophy type 3A, Lattice corneal dystrophy Type I, Pseudohypoaldosteronism, type 2, Pseudohypoaldosteronism type 2D, Myotilinopathy, Charcot-Marie-Tooth disease, axonal, type 2w, Leber congenital amaurosis, Retinitis pigmentosa, Diastrophic dysplasia, de la Chapelle dysplasia, Achondrogenesis, type IB, Multiple epiphyseal dysplasia, Diastrophic dysplasia, Hereditary diffuse
leukoencephalopathy with spheroids, Infantile myofibromatosis, Mental retardation, autosomal recessive, Tay-Sachs disease, variant AB, Hyperglycinuria, Iminoglycinuria, digenic, Hyperekplexia hereditary, epileptic encephalopathy, early infantile, Autosomal recessive congenital ichthyosis, Congenital ichthyosiform erythroderma, Epilepsy, childhood absence, Familial febrile seizures, Leukodystrophy, hypomyelinating, Atrial septal defect with or without atrioventricular conduction defects, Ventricular septal defect, Primary dilated cardiomyopathy, Atrial septal defect, Ventricular fibrillation, Noncompaction
cardiomyopathy, Abnormality of cardiovascular system morphology, Malformation of the heart and great vessels, Cardiovascular phenotype, Congenital heart disease, Atrial septal defect with or without atrioventricular conduction defects, Hypothyroidism, congenital, nongoitrous, Cardiovascular phenotype, Congenital heart disease, Craniosynostosis, Lewy body dementia, Sotos syndrome, Hypercalcemia, infantile, Hereditary angioneurotic edema with normal C1 esterase inhibitor activity, Hereditary angioneurotic edema, Acute myeloid leukemia, Myelodysplasia, Ehlers-Danlos syndrome progeroid type, Axenfeld-Rieger syndrome type 3, Polymicrogyria, asymmetric, Combined oxidative phosphorylation deficiency, Combined oxidative phosphorylation deficiency, Factor XIII subunit A deficiency, Cardiovascular phenotype, Bicuspid aortic valve, Arrhythmia, Sudden cardiac death, Ventricular fibrillation, Aortic dilatation, Bicuspid aortic valve, Branchiooculofacial syndrome, Hypoparathyroidism familial isolated, Auriculocondylar syndrome, Lafora disease, Hemochromatosis type 1, Transient neonatal diabetes mellitus, Michelin-tire baby, Combined oxidative phosphorylation deficiency, Peeling skin syndrome, Thrombocytopenia, anemia, and myelofibrosis, Premature ovarian failure, Sialidosis type I, 21-hydroxylase deficiency, Adenoma, cortisol-producing, Carcinoma, adrenocortical, androgen-secreting, Nakajo syndrome, Otospondylomegaepiphyseal dysplasia, Nonsyndromic Deafness, Mental retardation, autosomal dominant, Leber congenital amaurosis, Polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy, Macular dystrophy, vitelliform, adult-onset, Retinitis pigmentosa, Choroidal dystrophy, central areolar, Glycine N-methyltransferase deficiency, Heimler syndrome, Three M syndrome, Xeroderma
pigmentosum, variant type, Jaberi-Elahi syndrome, Ciliary dyskinesia, Platelet-activating factor acetylhydrolase deficiency, Methylmalonic aciduria due to methylmalonyl-CoA mutase deficiency, methylmalonic aciduria, mut(-) type, Methylmalonic aciduria due to methylmalonyl-CoA mutase deficiency, methylmalonic aciduria, mut(0) type, Methylmalonic acidemia, Methylmalonic aciduria due to methylmalonyl-CoA mutase deficiency, methylmalonic aciduria, mut(-) type, Rh-null, regulator type, Rh-mod syndrome, Char syndrome, Autosomal recessive polycystic kidney disease, Polycystic kidney dysplasia, Autosomal recessive polycystic kidney disease, Spinocerebellar ataxia, Retinitis pigmentosa, Retinitis pigmentosa, mental retardation, autosomal dominant, Hydatidiform mole, recurrent, Deafness, autosomal dominant, Macular dystrophy, vitelliform, developmental delay, intellectual disability, obesity, and dysmorphic features, Leber congenital amaurosis, Maple syrup urine disease, Immunodeficiency, Hyper-IgE syndrome, Calcification of joints and arteries, Spinocerebellar ataxia, autosomal recessive, Forney Robinson Pascoe syndrome, Mitochondrial DNA depletion syndrome (encephalomyopathic type), North Carolina macular dystrophy, Spastic paraplegia and psychomotor retardation with or without seizures,
Osteopetrosis, autosomal recessive, Amyotrophic lateral sclerosis type, Progressive pseudorheumatoid dysplasia, Metaphyseal chondrodysplasia, Schmid type, Ovarian dysgenesis, Alopecia congenita keratosis palmoplantaris, Oculodentodigital dysplasia, Merosin deficient congenital muscular dystrophy, Laminin alpha 2-related dystrophy, Merosin deficient congenital muscular dystrophy, Arginase deficiency, Arterial calcification of infancy, Hypophosphatemic rickets, autosomal recessive, Arterial calcification of infancy, Hypophosphatemic Rickets, Recessive, Arterial calcification of infancy, Joubert syndrome, Leber congenital amaurosis, Disseminated atypical mycobacterial infection,
neurodegeneration with brain iron accumulation, Mental retardation, autosomal dominant, Congenital heart defects, multiple types, Mitochondrial diseases, Combined oxidative phosphorylation deficiency, Mitochondrial diseases, Estrogen resistance, Neoplasm of the breast, Spinocerebellar ataxia, autosomal recessive, Liver cancer, Hepatocellular carcinoma, Plasminogen deficiency, type I, Dysplasminogenemia, Plasminogen deficiency, type I, Parkinson disease, Dentin dysplasia, type I, with extreme microdontia and misshapen teeth, Ciliary dyskinesia, Spondylocostal dysostosis, Baraitser-Winter syndrome, Hereditary cancer-predisposing syndrome, Hereditary nonpolyposis colon cancer, Lynch syndrome, Neurodevelopmental abnormality, leukodystrophy, hypomyelinating, Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A7, Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A7, Muscular dystrophy-dystroglycanopathy (limb-girdle), type c, Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A7, Saethre-Chotzen syndrome, Ciliary dyskinesia, primary, Hypomyelination and Congenital Cataract, microtia without hearing impairment, Microtia, hearing impairment, and cleft palate, Isolated growth hormone deficiency type 1B, Uridine 5-prime monophosphate hydrolase deficiency, hemolytic anemia due to, Bardet-Biedl syndrome, Focal segmental glomerulosclerosis, Wilms tumor and radial bilateral aplasia, Pallister-Hall syndrome, Greig cephalopolysyndactyly syndrome, Pallister-Hall syndrome, Pallister-Hall syndrome, Hyperbiliverdinemia, Ehlers-Danlos syndrome, classic like, Permanent neonatal diabetes mellitus, Maturity-onset diabetes of the young, type 2, Immunodeficiency, common variable, Cowden syndrome, Lung adenocarcinoma, Non-small cell lung cancer, Nonsmall cell lung cancer, response to tyrosine kinase inhibitor in, somatic, Glioblastoma, Non-small cell lung cancer, Squamous cell lung
carcinoma, Carcinoma of esophagus, Non-small cell lung cancer, Mucopolysaccharidosis type VII, Argininosuccinate lyase deficiency, Epilepsy, progressive myoclonic, Shwachman syndrome, Disordered steroidogenesis due to cytochrome p450 oxidoreductase deficiency, Charcot-Marie-Tooth disease, Charcot-Marie-Tooth disease type 2F, Cholestasis, intrahepatic, of pregnancy, Progressive familial intrahepatic cholestasis, Progressive familial intrahepatic cholestasis, Intrahepatic cholestasis, Colchicine resistance, Cerebral cavernous malformation, Cerebral cavernous malformation, Cerebral cavernous malformations, Zellweger syndrome, Deafness enamel hypoplasia nail defects, Myelocerebellar disorder, COL1A2-Related Disorder, Ehlers-Danlos syndrome, classic type, Osteogenesis imperfecta type I, Osteogenesis imperfecta type III, Osteogenesis imperfecta type I, Osteogenesis imperfecta, recessive perinatal lethal, Ehlers-Danlos syndrome, classic type, Osteogenesis imperfecta type I, Osteogenesis imperfecta with normal sclerae, dominant form, Osteogenesis imperfecta type III, Osteogenesis imperfecta with normal sclerae, dominant form, Osteogenesis imperfecta type III, Ehlers-Danlos syndrome, autosomal recessive, cardiac valvular form, Neonatal intrahepatic cholestasis caused by citrin deficiency, Citrullinemia type II, Split-hand/foot malformation, Asparagine synthetase deficiency, Epilepsy, familial temporal lobe,
Lissencephaly, Epilepsy, familial temporal lobe, Rolandic epilepsy, Epilepsy, familial temporal lobe, Enlarged vestibular aqueduct, Pendred's syndrome, Pendred's syndrome, Enlarged vestibular aqueduct, Pendred's syndrome, SLC26A4-Related Disorders, Enlarged vestibular aqueduct, Pendred's syndrome, Enlarged vestibular aqueduct, Congenital secretory diarrhea, chloride type, Maple syrup urine disease, type 3, DLD-Related Disorders,
Lissencephaly, Lipodystrophy, congenital generalized, type 3, Renal cell carcinoma, papillary, Cystic fibrosis, Hereditary pancreatitis, Cystic fibrosis, Hereditary pancreatitis, ataluren response - Efficacy, Persistent hyperplastic primary vitreous, autosomal recessive, Atrophia bulborum hereditaria, Exudative vitreoretinopathy, Leptin dysfunction, Myofibrillar myopathy, filamin C-related, Myopathy, distal, Cardiomyopathy, familial hypertrophic, Dilated Cardiomyopathy, Dominant, Dilated Cardiomyopathy, Dominant, Basal cell carcinoma, somatic, Ghosal hematodiaphyseal syndrome, Multiple myeloma, Lung adenocarcinoma, Rasopathy, Glioblastoma, Transitional cell carcinoma of the bladder, Cardio-facio-cutaneous syndrome, Malignant melanoma of skin, Multiple myeloma, Lung adenocarcinoma, Non-small cell lung cancer, Squamous cell lung carcinoma, Squamous cell carcinoma of the skin, Transitional cell carcinoma of the bladder, Neoplasm, Colorectal Neoplasms, Adenocarcinoma of prostate, Lung cancer, Malignant melanoma of skin,
Multiple myeloma, Squamous cell carcinoma of the head and neck, Lung adenocarcinoma, Non-small cell lung cancer, Squamous cell lung carcinoma, Colorectal Neoplasms, Non small cell lung cancer, Rasopathy, Neoplasm of the breast, Neoplasm, Carcinoma of colon, Noonan syndrome, Cataract and cardiomyopathy, Myotonia congenita, Congenital myotonia, autosomal recessive form, Premature ovarian failure, Cortical dysplasia-focal epilepsy syndrome, Rolandic epilepsy, Pitt-Hopkins-like syndrome, Rolandic epilepsy, Long QT syndrome, Congenital long QT syndrome, Short QT syndrome, Cardiovascular phenotype, Long QT syndrome, Glaucoma, open angle, F, Glycogen storage disease of heart, lethal congenital, Familial hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Holoprosencephaly, Currarino triad, Limb-girdle muscular dystrophy, type 1E, Neuronal ceroid lipofuscinosis, Maturity-onset diabetes of the young, type 11, Congenital heart disease, Atrial septal defect, Congenital heart disease, Atrial septal defect, Tetralogy of Fallot, Ventricular septal defect, Atrioventricular septal defect, Idiopathic transverse myelitis, Jankovic Rivera syndrome, Farber disease, Hyperlipoproteinemia, type I,
Hyperlipoproteinemia, type I, lipoprotein lipase (Olbia), Surfactant metabolism dysfunction, pulmonary, Osteogenesis imperfecta, type xiii, Hypermanganesemia with dystonia, Charcot-Marie-Tooth disease, demyelinating, type 1F, Charcot-Marie-Tooth disease type 2E, Charcot-Marie-Tooth disease, demyelinating, type 1F, Trichothiodystrophy 6, nonphotosensitive, Cholesterol monooxygenase (side-chain cleaving) deficiency, Kallmann syndrome, Hartsfield syndrome, Medulloblastoma, Neuroblastoma, Encephalocraniocutaneous lipomatosis, Astrocytoma, Brainstem glioma, Adenocarcinoma of stomach, Rosette-forming glioneuronal tumor, Hypogonadotropic hypogonadism with anosmia, Spherocytosis type 1, Mental retardation, autosomal dominant, Idiopathic basal ganglia calcification, Basal ganglia calcification, idiopathic, Dystonia, torsion, Mucopolysaccharidosis, MPS-III-C, Retinitis pigmentosa, Mucopolysaccharidosis, MPS-III-C, Vesicoureteral reflux, CHARGE
association, Ataxia with vitamin E deficiency, nocturnal frontal lobe epilepsy, Joubert syndrome, Melnick-Fraser syndrome, Osteopetrosis with renal tubular acidosis, carbonic anhydrase II variant, Achromatopsia, Hereditary cancer-predisposing syndrome,
Microcephaly, normal intelligence and immunodeficiency, Microcephaly, normal intelligence and immunodeficiency, Joubert syndrome, Meckel syndrome type 3, Nephronophthisis, Meckel-Gruber syndrome, COACH syndrome, Pyruvate dehydrogenase phosphatase deficiency, Carcinoma of colon, Leigh syndrome, multiple synostoses syndrome, Microphthalmia, isolated, Klippel-Feil syndrome, autosomal dominant, Leber congenital amaurosis, Klippel-Feil syndrome, autosomal dominant, Anauxetic dysplasia, Cohen syndrome, Cohen syndrome, Abnormality of the eye, Ciliary dyskinesia, primary, 28, Epilepsy, nocturnal frontal lobe, Corneal dystrophy, corneal dystrophy, posterior polymorphous, RRM2B-related mitochondrial disease, Mitochondrial DNA depletion syndrome, encephalomyopathic form, with renal tubulopathy, RRM2B-related mitochondrial disease, Nail disorder, nonsyndromic congenital, Nail disease, Dihydropyrimidinase deficiency, Tetraamelia syndrome,
Trichorhinophalangeal dysplasia type I, Multiple congenital exostosis, Dandy-Walker like malformation with atrioventricular septal defect, Benign familial neonatal seizures, Ciliary dyskinesia, primary, Iodotyrosyl coupling defect, Mental retardation, autosomal recessive, Deficiency of steroid 11-beta-monooxygenase, Corticosterone methyloxidase type 1 deficiency, Hyperlipoproteinemia, type 1D, Amelogenesis imperfecta, hypocalcification type, 5-Oxoprolinase deficiency, Mitochondrial complex III deficiency, nuclear type 6, Brown-Vialetto-Van Laere syndrome, Hereditary acrodermatitis enteropathica, Rothmund-Thomson syndrome, Baller-Gerold syndrome, Hyperimmunoglobulin E recurrent infection syndrome, autosomal recessive, Nicolaides-Baraitser syndrome, Cerebellar ataxia, mental retardation, and dysequilibrium syndrome, Retinal cone dystrophy, Familial erythrocytosis, Chronic myelogenous leukemia, Polycythemia vera, Budd-Chiari syndrome, Myelofibrosis, Budd-Chiari syndrome, susceptibility to, somatic, Acute myeloid leukemia, Thrombocythemia, Myeloproliferative disorder, Subacute lymphoid leukemia, Non-ketotic hyperglycinemia, Hydrocephalus, Melanoma-pancreatic cancer syndrome, Hereditary cutaneous melanoma, Hereditary cancer-predisposing syndrome, Cutaneous malignant melanoma, Hereditary cutaneous melanoma, Hereditary cancer-predisposing syndrome, Hereditary cutaneous melanoma, Melanoma-pancreatic cancer syndrome, Hereditary cutaneous melanoma, Hereditary cancer-predisposing syndrome, neurodevelopmental disorder with progressive microcephaly, spasticity, and brain anomalies, Bardet-Biedl syndrome, Glaucoma, primary congenital, Singleton-Merten syndrome, Ciliary dyskinesia, Distal spinal muscular atrophy, autosomal recessive, Deficiency of UDPglucose-hexose-1-phosphate uridylyltransferase, Deficiency of UDPglucose-hexose-1-phosphate uridylyltransferase, Galactosemia, Inclusion body myopathy with early-onset Paget disease and frontotemporal dementia, Fanconi anemia,
complementation group G, Metaphyseal chondrodysplasia, McKusick type, Acromesomelic dysplasia Maroteaux type, Inclusion body myopathy, Nonaka myopathy, Sialuria, GNE myopathy, Sialuria, Inclusion body myopathy, Nonaka myopathy, Sialuria, Primary hyperoxaluria, type II, Pontocerebellar hypoplasia, type 1b, Friedreich's ataxia, Progressive familial intrahepatic cholestasis, Hypomagnesemia, intestinal, Cone-rod dystrophy and hearing loss, Obesity, hyperphagia, and developmental delay, AGTPBP1-related condition, Type B brachydactyly, Fructose-biphosphatase deficiency, Fanconi anemia, Fanconi anemia, complementation group C, Hereditary cancer-predisposing syndrome, Gorlin syndrome, Gorlin syndrome, Hereditary cancer-predisposing syndrome, Xeroderma pigmentosum, type 1, Spondyloepimetaphyseal dysplasia Genevieve type, Early infantile epileptic
encephalopathy 59, Loeys-Dietz syndrome, Thoracic aortic aneurysm and aortic dissection, Loeys-Dietz syndrome, Congenital disorder of glycosylation type 1, Hereditary fructosuria, Familial hypoalphalipoproteinemia, Tangier disease, Limb-girdle muscular dystrophy-dystroglycanopathy, type C4, Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A4, Primary autosomal recessive microcephaly, Meretoja syndrome, adrenal insufficiency, NR5A1-related, 46, XY sex reversal, type 3, Nail-patella syndrome, Early infantile epileptic encephalopathy 4, Epileptic encephalopathy, Primary pulmonary hypertension, Osler hemorrhagic telangiectasia syndrome, Coenzyme Q10 deficiency, primary, Ichthyosis prematurity syndrome, Congenital disorder of glycosylation type 1M, Citrullinemia type I, Citrullinemia type I, Citrullinemia, mild, Neuropathy, hereditary sensory and autonomic, type VIII, short stature, hearing loss, retinitis pigmentosa, and distinctive facies, Cortical malformations, occipital, Limb-girdle muscular dystrophy-dystroglycanopathy, type C1, Congenital muscular dystrophy-dystroglycanopathy with mental retardation, type B1, Walker-Warburg congenital muscular dystrophy,
Spinocerebellar ataxia autosomal recessive, Tuberous sclerosis syndrome, Tuberous sclerosis, Lymphangiomyomatosis, Congenital nonprogressive myopathy with Moebius and Robin sequences, Dopamine beta hydroxylase deficiency, Ehlers-Danlos syndrome, type 2, Ehlers-Danlos syndrome, classic type, Early infantile epileptic encephalopathy, Epilepsy, nocturnal frontal lobe, Joubert syndrome, Adams-Oliver syndrome, Aortic valve disorder, Adams-Oliver syndrome, Congenital generalized lipodystrophy type 1, Neurodevelopmental disorder with or without hyperkinetic movements and seizures, autosomal dominant, Autosomal recessive hypophosphatemic bone disease, Chromosome 9q deletion syndrome, Neoplasm of stomach, Prostate cancer, somatic, Refsum disease, adult, Severe combined
immunodeficiency, athabascan-type, Renal adysplasia, Megaloblastic anemia due to inborn errors of metabolism, Primary ciliary dyskinesia, Kartagener syndrome, Desanto-shinawi syndrome, Neural tube defect, Familial medullary thyroid carcinoma, Multiple endocrine neoplasia, type 2, MEN2A and FMTC, Multiple endocrine neoplasia, type 2, MEN2A and Unclassified, MEN2A and FMTC, Multiple endocrine neoplasia, type 2, MEN2A and FMTC, Hereditary cancer-predisposing syndrome, Multiple endocrine neoplasia, type 2b, Familial medullary thyroid carcinoma, Multiple endocrine neoplasia, type 2a, Multiple endocrine neoplasia, type 2, MEN2A and FMTC, FMTC and Unclassified, Multiple endocrine neoplasia, type 2a, Hereditary cancer-predisposing syndrome, Pheochromocytoma, Familial medullary thyroid carcinoma, Multiple endocrine neoplasia, type 2a, Multiple endocrine neoplasia, type 2, MEN2A and FMTC, Medullary thyroid carcinoma, Multiple endocrine neoplasia, type 2a, Multiple endocrine neoplasia, type 2, MEN2 phenotype: Unknown, Hereditary cancer-predisposing syndrome, Multiple endocrine neoplasia, type 2b, Multiple endocrine neoplasia, type 2a, MEN2 phenotype: Unclassified, Multiple endocrine neoplasia, type 2, MEN2A and FMTC, Hereditary cancer-predisposing syndrome, Familial medullary thyroid carcinoma, Multiple endocrine neoplasia, type 2a, MEN2A and FMTC, Hereditary cancer-predisposing syndrome, Multiple endocrine neoplasia, Telangiectasia, hereditary hemorrhagic, type 5, Cockayne syndrome B, Premature ovarian failure, Familial infantile myasthenia, Charcot-Marie-Tooth disease, demyelinating, type 1d, Congenital
hypomyelinating neuropathy, Neuropathy, congenital hypomyelinating, autosomal dominant, Shprintzen-Goldberg syndrome, Goldberg-Shprintzen megacolon syndrome, Shprintzen-Goldberg syndrome, Diarrhea, malabsorptive, congenital, Aplastic anemia, Hemophagocytic lymphohistiocytosis, familial, nephrotic syndrome, Hyperphenylalaninemia, BH4-deficient, D, Histiocytosis-lymphadenopathy plus syndrome, Usher syndrome, type 1D, pituitary adenoma, multiple types, Usher syndrome, type 1D, Usher syndrome, type 1D, Gaucher disease, atypical, due to saposin C deficiency, Krabbe disease atypical due to Saposin A deficiency, Combined saposin deficiency, Sphingolipid activator protein deficiency, Gaucher disease, atypical, due to saposin C deficiency, Spondyloepiphyseal dysplasia with congenital joint dislocations, Dilated cardiomyopathy 1W, Familial hypertrophic cardiomyopathy, Cardiovascular phenotype, Hypermethioninemia due to adenosine kinase deficiency,
Genitopatellar syndrome, Young Simpson syndrome, Hypomyelinating leukodystrophy, Idiopathic fibrosing alveolitis, chronic form, Hepatic methionine adenosyltransferase deficiency, Hereditary cancer-predisposing syndrome, Juvenile polyposis syndrome, Juvenile polyposis syndrome, Hereditary cancer-predisposing syndrome, Hyperinsulinism-hyperammonemia syndrome, Spondyloepimetaphyseal dysplasia, Pakistani type,
hyperekplexia, Cowden syndrome, PTEN hamartoma tumor syndrome, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Neoplasm of the breast, PTEN hamartoma tumor syndrome, Malignant melanoma of skin, Squamous cell carcinoma of the head and neck, Small cell lung cancer, Squamous cell lung carcinoma, Renal cell carcinoma, papillary, Neoplasm of the breast, Glioblastoma, Hereditary cancer-predisposing syndrome, Colorectal Neoplasms, Uterine cervical neoplasms, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine
Carcinosarcoma, PTEN hamartoma tumor syndrome, Cowden syndrome, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Lhermitte-Duclos disease, Neoplasm of the breast, Colorectal Neoplasms, Hereditary cancer-predisposing syndrome, Macrocephaly/autism syndrome, Hereditary cancer-predisposing syndrome, PTEN
hamartoma tumor syndrome, Cutaneous melanoma, Hereditary cancer-predisposing syndrome, PTEN hamartoma tumor syndrome, Hereditary cancer-predisposing syndrome, Autoimmune lymphoproliferative syndrome, type 1a, Lysosomal acid lipase deficiency, Microcephaly with or without chorioretinopathy, lymphedema, or mental retardation, Hydranencephaly with renal aplasia-dysplasia, Spastic paraplegia, Cutis laxa, autosomal dominant, Primary hyperoxaluria, type III, Spastic tetraparesis, Hermansky-Pudlak syndrome, Dubin-Johnson syndrome, Renal coloboma syndrome, Autosomal dominant progressive external ophthalmoplegia with mitochondrial DNA deletions, Mitochondrial diseases, Autosomal dominant progressive external ophthalmoplegia with mitochondrial DNA deletions, Mitochondrial diseases, Kallmann syndrome, Combined partial 17-alpha-hydroxylase/17, 20-lyase deficiency, Complete combined 17-alpha-hydroxylase/17, 20-lyase deficiency, Cerebroretinal microangiopathy with calcifications and cysts, Adult junctional epidermolysis bullosa, Epidermolysis bullosa, junctional, spermatogenic failure, Primary dilated cardiomyopathy, Dilated cardiomyopathy, Microphthalmia, syndromic, Myofibrillar myopathy, BAG3-related, Myofibrillar myopathy, BAG3-related, Dilated cardiomyopathy, Jackson-Weiss syndrome, Craniosynostosis, nonsyndromic unicoronal, Pfeiffer syndrome, Craniofacial-skeletal-dermatologic dysplasia, FGFR2 related craniosynostosis, Pfeiffer syndrome, FGFR2 related craniosynostosis, Cerebral arteriopathy, autosomal dominant, with subcortical infarcts and leukoencephalopathy, type 2, Ornithine aminotransferase deficiency, Congenital erythropoietic porphyria, Muscular hypotonia, Muscular hypotonia, Intellectual disability (severe), Hypotonia, ataxia, and delayed development syndrome, Global developmental delay, Expressive language delay, Intellectual disability, Ataxia, Muscular hypotonia, Hypotonia, ataxia, and delayed development syndrome, Mitochondrial short-chain enoyl-CoA hydratase
deficiency, Noonan syndrome, Follicular thyroid carcinoma,
Spermatocytic seminoma, somatic, Spermatocytic seminoma, Neoplasm of the breast, Costello syndrome, Myopathy, congenital, with excess of muscle spindles, Liver cancer, Chronic lymphocytic leukemia, Malignant melanoma of skin, Multiple myeloma, Squamous cell carcinoma of the head and neck, Costello syndrome, Lung adenocarcinoma, Squamous cell lung carcinoma, Acute myeloid leukemia, Neoplasm of the breast, Hepatocellular carcinoma, Pancreatic adenocarcinoma, Squamous cell carcinoma of the skin, Transitional cell carcinoma of the bladder, Colorectal Neoplasms, Uterine cervical neoplasms, Thymoma, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Liver cancer, Chronic lymphocytic leukemia, Malignant melanoma of skin, Multiple myeloma, Squamous cell carcinoma of the head and neck, Costello syndrome, Lung adenocarcinoma, Squamous cell lung carcinoma, Acute myeloid leukemia, Rasopathy, Neoplasm of the breast, Hepatocellular carcinoma, Pancreatic adenocarcinoma, Squamous cell carcinoma of the skin, Transitional cell carcinoma of the bladder, Neoplasm, Colorectal Neoplasms, Uterine cervical neoplasms, Neoplasm of the thyroid gland, Adenocarcinoma of stomach, Malignant neoplasm of body of uterus, Malignant tumor of urinary bladder, Costello syndrome, Epidermal nevus, Myopathy, congenital, with excess of muscle spindles, Cutaneous melanoma, Neoplasm of the thyroid gland, Liver cancer, Malignant melanoma of skin, Multiple myeloma, Squamous cell carcinoma of the head and neck, Costello syndrome, Epidermal nevus, Lung
adenocarcinoma, Acute myeloid leukemia, Myelodysplastic syndrome, Nevus sebaceous, Nevus sebaceous, somatic, Rasopathy, Neoplasm of the breast, Glioblastoma, Bladder carcinoma, Hepatocellular carcinoma, Pancreatic adenocarcinoma, Squamous cell carcinoma of the skin, Transitional cell carcinoma of the bladder, Carcinoma of esophagus, Colorectal Neoplasms, Uterine cervical neoplasms, Neoplasm of the thyroid gland, Papillary renal cell carcinoma, sporadic, Adenoid cystic carcinoma, Nasopharyngeal Neoplasms,
Adenocarcinoma of stomach, Ovarian Serous Cystadenocarcinoma, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Early myoclonic encephalopathy, Neutral lipid storage disease with myopathy, Ceroid lipofuscinosis neuronal, Growth restriction, severe, with distinctive facies, Hyperproinsulinemia, Permanent neonatal diabetes mellitus, Hyperproinsulinemia, Segawa syndrome, autosomal recessive, Dystonia, Segawa syndrome, autosomal recessive, Jervell and Lange-Nielsen syndrome, Long QT syndrome, Cardiovascular phenotype, Congenital long QT syndrome, Long QT syndrome, Congenital long QT syndrome, Long QT syndrome, Long QT syndrome 1/2, digenic, Long QT syndrome, Congenital long QT syndrome, Cardiovascular phenotype, Long QT syndrome, Congenital long QT syndrome, Long QT syndrome, Cardiovascular phenotype, Beckwith-Wiedemann syndrome, Intrauterine growth retardation, metaphyseal dysplasia, adrenal hypoplasia congenita, and genital anomalies, Intrauterine growth retardation, metaphyseal dysplasia, adrenal hypoplasia congenita, and genital anomalies, Russell-Silver syndrome, Beckwith-Wiedemann syndrome, Myopathy with tubular aggregates, hemoglobin Ohio, erythrocytosis, hemoglobin Ty Gard, erythrocytosis, Beta-thalassemia, dominant inclusion body type, Hemoglobinopathy, Beta-plus-thalassemia, Beta thalassemia intermedia, hemoglobin Ypsilanti, erythrocytosis, beta^0^ Thalassemia, Heinz body anemia, Beta-plus-thalassemia, Beta thalassemia major, beta^0^ Thalassemia, beta Thalassemia, Beta-plus-thalassemia, Hemoglobin Knossos, Beta-Knossos-thalassemia, beta Thalassemia, Hemoglobin Palmerston north, erythrocytosis, Hb niigata, Beta-plus-thalassemia, beta Thalassemia, Beta thalassemia intermedia, beta Thalassemia, delta Thalassemia, hemoglobin A(2) Yialousa, Fetal hemoglobin quantitative trait locus 1, Sphingomyelin/cholesterol lipidosis, Niemann-Pick disease, type B, Niemann-Pick disease, type A, Niemann-Pick disease, intermediate,
protracted neurovisceral, Sphingomyelin/cholesterol lipidosis, Niemann-Pick disease, type B, Niemann-Pick disease, type A, Niemann-Pick disease, type B, Niemann-Pick disease, type A, Sphingomyelin/cholesterol lipidosis, Ceroid lipofuscinosis neuronal, Neuronal ceroid lipofuscinosis, Van Maldergem syndrome, Permanent neonatal diabetes mellitus, Permanent neonatal diabetes mellitus, Diabetes mellitus, permanent neonatal, with neurologic features, Islet cell hyperplasia, Permanent neonatal diabetes mellitus, Persistent hyperinsulinemic hypoglycemia of infancy, Permanent neonatal diabetes mellitus, Hyperekplexia,
Gnathodiaphyseal dysplasia, Limb-girdle muscular dystrophy, type 2L, Gnathodiaphyseal dysplasia, Limb-girdle muscular dystrophy, type 2L, Miyoshi muscular dystrophy, ANO5-Related Disorders, Limb-girdle muscular dystrophy, type 2L, Elevated serum creatine phosphokinase, Myopathy, Distal muscle weakness, Fatty replacement of skeletal muscle, Limb-girdle muscular dystrophy, type 2L, Follicle-stimulating hormone deficiency, isolated, Aniridia, Irido-corneo-trabecular dysgenesis, Foveal hypoplasia with cataract, Irido-corneo-trabecular dysgenesis, Anophthalmia - microphthalmia, Aniridia, Irido-corneo-trabecular dysgenesis, Wilms tumor, Combined cellular and humoral immune defects with granulomas, Severe combined immunodeficiency, B cell-negative, Histiocytic medullary reticulosis, Severe immunodeficiency, autosomal recessive, T-cell negative, B-cell negative, NK cell positive, Combined cellular and humoral immune defects with granulomas, Multiple exostoses type 2, Parietal foramina, Congenital disorder of glycosylation type 2C,
Thrombophilia, Hereditary factor II deficiency disease, Xeroderma pigmentosum, group E, Left ventricular noncompaction, Hypertrophic cardiomyopathy, Primary familial
hypertrophic cardiomyopathy, Hypertrophic cardiomyopathy, Cardiovascular phenotype, Primary familial hypertrophic cardiomyopathy, Cardiovascular phenotype, Familial hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Hypertrophic, Primary familial hypertrophic cardiomyopathy, Cardiovascular phenotype, Familial hypertrophic cardiomyopathy, Familial hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Hypertrophic cardiomyopathy, Cardiovascular phenotype, Primary familial hypertrophic cardiomyopathy, Hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Hypertrophic cardiomyopathy, Hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Cardiovascular phenotype, Familial hypertrophic cardiomyopathy,
Cardiovascular phenotype, Primary familial hypertrophic cardiomyopathy, Familial hypertrophic cardiomyopathy, Hypertrophic cardiomyopathy, Myasthenic syndrome, congenital, associated with acetylcholine receptor deficiency, Pena-Shokeir syndrome type I, Myasthenic syndrome, congenital, associated with acetylcholine receptor deficiency, Congenital myasthenic syndrome, Myopathy, Myasthenic syndrome, congenital, associated with acetylcholine receptor deficiency, Congenital Myasthenic Syndrome, Recessive, Myasthenic syndrome, congenital, associated with acetylcholine receptor deficiency, Hereditary angioedema type 1, Hereditary C1 esterase inhibitor deficiency - dysfunctional factor, Poikiloderma, hereditary fibrosing, with tendon contractures, myopathy, and pulmonary fibrosis, Gracile bone dysplasia, Joubert syndrome, Joubert syndrome, Meckel syndrome type, Retinal dystrophy, polycystic kidney disease with polycystic liver disease, Congenital generalized lipodystrophy type 2, Charcot-Marie-Tooth disease, type 2,
Encephalopathy, progressive, with or without lipodystrophy, Familial renal hypouricemia, Platelet-type bleeding disorder, Glycogen storage disease, type V, Hereditary cancer-predisposing syndrome, Multiple endocrine neoplasia, type 1, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Multiple endocrine neoplasia, type 1, Multiple endocrine neoplasia, type 1, Hereditary cancer-predisposing syndrome, Coffin-Siris syndrome, Calfan syndrome, Verloes Bourguignon syndrome, Bardet-Biedl syndrome, Bardet-Biedl syndrome, Spinocerebellar ataxia, autosomal recessive, Pyruvate carboxylase deficiency, Cold-induced sweating syndrome, Crisponi/Cold-induced sweating syndrome, Somatotroph adenoma, Pituitary adenoma predisposition, Mitochondrial complex I deficiency, Osteopetrosis autosomal recessive, Severe congenital neutropenia autosomal dominant, congenital neutropenia, High bone mass, Osteoporosis with pseudoglioma, Epilepsy, familial temporal lobe, Carnitine palmitoyltransferase I deficiency, Charcot-Marie-Tooth disease, Charcot-Marie-Tooth disease, axonal, type 2S, IGHMBP2-related condition, Spinal muscular atrophy, distal, autosomal recessive, Charcot-Marie-Tooth disease, axonal, type 2S, Werdnig-Hoffmann disease, Charcot-Marie-Tooth disease, axonal, type 2S,
Deafness with labyrinthine aplasia microtia and microdontia (LAMM), Smith-Lemli-Opitz syndrome, Cerebral folate deficiency, Opsismodysplasia, 3-methylglutaconic aciduria with cataracts, neurologic involvement, and neutropenia, Joubert syndrome, Vitreoretinopathy, neovascular inflammatory, Usher syndrome, type 1, Usher syndrome, type 1, Usher syndrome, type 1B, Usher syndrome, type 1, MYO7A-Related Disorders, polycystic liver disease with or without kidney cysts, Tremor, hereditary essential, Mitochondrial complex I deficiency, Mitochondrial diseases, Tyrosinase-negative oculocutaneous albinism,
Tyrosinase-negative oculocutaneous albinism, Oculocutaneous albinism type 1B, Albinism, ocular, with sensorineural deafness, Skin/hair/eye pigmentation, variation in, Oculocutaneous albinism, Hereditary cancer-predisposing syndrome, Ataxia-telangiectasia-like disorder, Charcot-Marie-Tooth disease, type 4B1, Focal segmental glomerulosclerosis, Coloboma, ocular, with or without hearing impairment, cleft lip/palate, and/or mental retardation, Metaphyseal chondrodysplasia, Spahr type, Short-rib polydactyly syndrome type III, Jeune thoracic dystrophy, Short-rib thoracic dysplasia with or without polydactyly, Short-rib polydactyly syndrome type I, Short-rib polydactyly syndrome type III, Deficiency of acetyl-CoA acetyltransferase, Hereditary cancer-predisposing syndrome, Ataxia-telangiectasia syndrome, Ataxia-telangiectasia syndrome, Ataxia-telangiectasia variant, Pyruvate dehydrogenase E2 deficiency, Pheochromocytoma, Paragangliomas, Hereditary cancer-predisposing syndrome, Paragangliomas, Hereditary Paraganglioma-Pheochromocytoma Syndromes, Paraganglioma and gastric stromal sarcoma, Pheochromocytoma,
Paragangliomas, Hereditary Paraganglioma-Pheochromocytoma Syndromes, Cowden syndrome, Paraganglioma and gastric stromal sarcoma, Pheochromocytoma, Mitochondrial complex II deficiency, Paragangliomas, Hereditary Paraganglioma-Pheochromocytoma Syndromes, Cowden syndrome 3, Apolipoprotein A-IV polymorphism,
APOA4*1/APOA4*2, Hyperalphalipoproteinemia, Coronary heart disease, Apolipoprotein A-I (Baltimore), Immunodeficiency, Kabuki syndrome, Wiedemann-Steiner syndrome, Short stature, rhizomelic, with microcephaly, micrognathia, and developmental delay, Glucose-6-phosphate transport defect, Acute intermittent porphyria, Congenital myasthenic syndrome, Noonan syndrome-like disorder with or without juvenile myelomonocytic leukemia, Microphthalmia, isolated, Gaze palsy, familial horizontal, with progressive scoliosis, Megalencephalic leukoencephalopathy with subcortical cysts 2a, Deficiency of isobutyryl-CoA dehydrogenase, Cone dystrophy, Retinal cone dystrophy, Megalencephaly-polymicrogyria-polydactyly-hydrocephalus syndrome, Tumoral calcinosis, familial, hyperphosphatemic, Episodic ataxia type 1, Myokymia, Atrial fibrillation, familial, von Willebrand disease type 3, von Willebrand disease type 2N, von Willebrand disease type 2N, TNF receptor-associated periodic fever syndrome (TRAPS), Sifrim-Hitz-Weiss syndrome, Triosephosphate isomerase deficiency, Ehlers-Danlos syndrome, type 8, Immunodeficiency with hyper IgM type 2, Aortic aneurysm, familial thoracic, Acute myeloid leukemia, Diarrhea, Brachydactyly with hypertension, Hypoglycemia with deficiency of glycogen synthetase in the liver, Lamb-Shaffer syndrome, Non-small cell lung cancer, Colorectal Neoplasms, Neoplasm of the thyroid gland, Non-small cell lung cancer, Rasopathy, Non-small cell lung cancer, Colorectal Neoplasms, Neoplasm of the thyroid gland, cetuximab response - Dosage, panitumumab response - Dosage, Non-small cell lung cancer, RAS-associated autoimmune leukoproliferative disorder, Colorectal Neoplasms, Cerebral arteriovenous malformation, Juvenile myelomonocytic leukemia, Carcinoma of pancreas, Non-small cell lung cancer, Acute myeloid leukemia, Nevus sebaceous, Nevus sebaceous, somatic, Ovarian Neoplasms, Colorectal Neoplasms, Neoplasm of the thyroid gland, Endometrial carcinoma, Lung cancer, Lung
adenocarcinoma, Non-small cell lung cancer, Ovarian Neoplasms, Colorectal Neoplasms, Neoplasm of the thyroid gland, Charcot-Marie-Tooth disease, type 4H, Optic atrophy, Encephalopathy due to defective mitochondrial and peroxisomal fission, Arrhythmogenic right ventricular cardiomyopathy, Arrhythmogenic right ventricular cardiomyopathy, type 9, Arrhythmogenic right ventricular dysplasia/cardiomyopathy, Cardiovascular phenotype, Parkinson disease, late-onset, Parkinson disease, autosomal dominant, IRAK4 deficiency, Vitamin D-dependent rickets, type 2, Spondyloperipheral dysplasia, Short ribs, Absent vertebral body mineralization, Spondylometaphyseal dysplasia, Stickler syndrome type 1, Stickler syndrome, type I, nonsyndromic ocular, Achondrogenesis, type II, Stickler syndrome type 1, Spondylometaphyseal dysplasia, Spondylometaphyseal dysplasia, Stickler syndrome, type I, nonsyndromic ocular, Glycogen storage disease, type VII, Glycogen storage disease, type VII, Osteogenesis imperfecta, type xv, Osteogenesis imperfecta, type xv, Osteogenesis imperfecta, type xv, Kabuki syndrome, Smith-Magenis Syndrome-like, Lissencephaly, Diabetes insipidus, nephrogenic, autosomal recessive, Diffuse palmoplantar keratoderma, Bothnian type, Hypochromic microcytic anemia with iron overload, Early infantile epileptic encephalopathy, Hereditary hemorrhagic telangiectasia type 2, Pulmonary arterial hypertension related to hereditary hemorrhagic telangiectasia, Primary pulmonary hypertension, Beaded hair, Pachyonychia congenita, Epidermolysis bullosa simplex, Dowling-Meara type, with severe palmoplantar keratoderma, Epidermolysis bullosa simplex, Cockayne-Touraine type, Epidermolysis bullosa simplex, Koebner type, Dowling-Degos disease, Ichthyosis bullosa of Siemens, Bullous ichthyosiform erythroderma, Cirrhosis, cryptogenic, Cirrhosis, noncryptogenic, susceptibility to, Glucocorticoid deficiency with achalasia, Ectodermal dysplasia, hair/nail type, Pigmentary retinal dystrophy, Fundus 
albipunctatus, autosomal recessive, Sulfite oxidase deficiency, isolated, Immunodeficiency, Congenital cataract, axonal, type 2u, Nephrotic syndrome, type 11, Bardet-Biedl syndrome, Myopathy, centronuclear, Joubert syndrome, Leber congenital amaurosis, Meckel syndrome type 4, Senior-Loken syndrome, Bardet-Biedl syndrome, Joubert syndrome, Leber congenital amaurosis, Meckel-Gruber syndrome, Meckel syndrome type 4, Senior-Loken syndrome, Joubert syndrome, Bardet-Biedl syndrome, Nephronophthisis, Meckel-Gruber syndrome, Nephronophthisis, Leber congenital amaurosis, Meckel syndrome type 4, Senior-Loken syndrome, Meckel-Gruber syndrome, Nephronophthisis, CEP290-Related Disorders, Leber congenital amaurosis, Meckel syndrome type 4, Senior-Loken syndrome, Leber congenital amaurosis, Meckel syndrome type 4, Senior-Loken syndrome, Cone-rod dystrophy, Cornea plana, Nephronophthisis, I cell disease, Pseudo-Hurler polydystrophy, Phenylketonuria, Hyperphenylalaninemia, non-PKU, Congenital central hypoventilation, Hypomyelinating leukodystrophy, with or without oligodontia and/or hypogonadotropic hypogonadism, Methylmalonic aciduria cblB type, Methylmalonic acidemia, Spondylometaphyseal dysplasia, Kozlowski type, Skeletal dysplasia, Charcot-Marie-Tooth disease type 2C, Skeletal dysplasia, Neuromuscular Diseases, Digital arthropathy-brachydactyly, familial, Metatrophic dysplasia, Spondylometaphyseal dysplasia, Distal spinal muscular atrophy, congenital nonprogressive, Scapuloperoneal spinal muscular atrophy, Charcot-Marie-Tooth disease type 2C, Skeletal dysplasia, Neuromuscular Diseases, Charcot-Marie-Tooth, Type 2,
Brachyolmia, Metatrophic dysplasia, Skeletal dysplasia, Neuromuscular Diseases, Darier disease, acral hemorrhagic type, Darier disease, segmental, Familial hypertrophic
cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Familial hypertrophic cardiomyopathy, Death in infancy, Ventricular extrasystoles, Cardiovascular phenotype, Noonan syndrome, Noonan syndrome, Rasopathy, Juvenile myelomonocytic leukemia, Noonan syndrome, Leopard syndrome, Rasopathy, Metachondromatosis, Noonan syndrome with multiple lentigines, Noonan syndrome 1, LEOPARD syndrome, Scoliosis, Rasopathy, Abnormal facial shape, Cafe-au-lait spot, Specific learning disability, Intellectual disability, mild, Aortic valve disease, Holt-Oram syndrome, Mental retardation and distinctive facial features with or without cardiac defects, Charcot-Marie-Tooth disease, type 2L,
Microcephaly, primary, autosomal recessive, Deficiency of butyryl-CoA dehydrogenase, Maturity-onset diabetes of the young, type 3, Immune dysfunction with T-cell inactivation due to calcium entry defect, Leukoencephalopathy with vanishing white matter, Joubert syndrome, Cutis laxa with osteodystrophy, Myopathy, lactic acidosis, and sideroblastic anemia, Knuckle pads, deafness and leukonychia syndrome, Keratitis-ichthyosis-deafness syndrome, autosomal dominant, Mutilating keratoderma, Hystrix-like ichthyosis with deafness, Keratitis-ichthyosis-deafness syndrome, autosomal dominant, Keratoderma palmoplantar deafness, Knuckle pads, deafness and leukonychia syndrome, Deafness, X-linked, Hearing impairment, Keratoderma palmoplantar deafness, Cardiomyopathy, Left ventricular noncompaction, Cardiomyopathy, Infantile muscular hypotonia, Combined oxidative phosphorylation deficiency, Pancreatic agenesis, congenital, Diabetes mellitus type 2, Acute lymphoid leukemia, Acute myeloid leukemia, Breast-ovarian cancer, familial, Hereditary breast and ovarian cancer syndrome, Breast-ovarian cancer, familial, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Hereditary breast and ovarian cancer syndrome, Familial cancer of breast, Breast-ovarian cancer, familial, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Familial cancer of breast, Breast-ovarian cancer, familial, Fanconi anemia, complementation group D1, Medulloblastoma, Wilms tumor, Malignant tumor of prostate, Tracheoesophageal fistula, Pancreatic cancer, Glioma susceptibility, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Familial cancer of breast, Breast-ovarian cancer, familial, Hereditary breast and ovarian cancer syndrome, Neoplasm of the breast, Hereditary cancer-predisposing syndrome, Familial cancer of breast, Breast-ovarian cancer, familial, Fanconi anemia, complementation group D1,
Medulloblastoma, Wilms tumor, Malignant tumor of prostate, Tracheoesophageal fistula, Pancreatic cancer, Glioma susceptibility, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, BRCA2-Related Disorders, Breast-ovarian cancer, familial, Fanconi anemia, complementation group D1, Fanconi anemia, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Primary pulmonary hypertension, Congenital disorder of glycosylation type 2L, Hyperornithinemia-hyperammonemia-homocitrullinuria syndrome, Retinoblastoma, Retinoblastoma, Neoplasm, Small cell lung cancer, Neoplasm, Retinitis pigmentosa, Retinal dystrophy with or without extraocular anomalies, Retinitis pigmentosa, Retinal dystrophy with extraocular anomalies, Aicardi Goutieres syndrome, Wilson disease, Ceroid lipofuscinosis neuronal, Hirschsprung disease, Waardenburg syndrome type 4A, Deafness and myopia, Catel Manzke syndrome, Propionyl-CoA carboxylase deficiency, Hypotonia, infantile, with psychomotor retardation and characteristic facies, Congenital contractures of the limbs and face, hypotonia, and developmental delay, Xeroderma pigmentosum, group G, Xeroderma pigmentosum group g/Cockayne syndrome, Xeroderma pigmentosum, group G, Xeroderma pigmentosum, Schizencephaly, Angiopathy, hereditary, with nephropathy, aneurysms, and muscle cramps, Squamous cell carcinoma of the head and neck, Oguchi disease, Cone-rod dystrophy, Leber congenital amaurosis, Cone-Rod Dystrophy, Recessive, Autism, susceptibility to, Ocular coloboma, autosomal recessive, Lysinuric protein intolerance, Primary dilated
cardiomyopathy, Wolff-Parkinson-White pattern, Dilated cardiomyopathy 1EE, Familial hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Sudden cardiac death, Cardiovascular phenotype, Hypertrophic cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Cardiovascular phenotype, Familial hypertrophic
cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Cardiovascular phenotype, Familial hypertrophic cardiomyopathy, Familial cardiomyopathy, Hypertrophic cardiomyopathy, Cardiomyopathy, Hypertrophic cardiomyopathy, Dyskeratosis congenita, Dyskeratosis congenita autosomal dominant, Dyskeratosis congenita autosomal dominant, Dyskeratosis congenita, autosomal dominant, Revesz syndrome, Dyskeratosis congenita autosomal dominant, Dyskeratosis congenita, Dyskeratosis Congenita, Dominant, Autosomal recessive congenital ichthyosis, Rett syndrome, congenital variant, Mitochondrial complex I deficiency, Ectodermal dysplasia, anhidrotic, with T-cell immunodeficiency, autosomal dominant, Benign hereditary chorea, Choreoathetosis, hypothyroidism, and neonatal respiratory distress, Partial congenital absence of teeth, Ciliary dyskinesia, primary,
Kartagener syndrome, L-2-hydroxyglutaric aciduria, Penetrating foot ulcers, Distal sensory impairment, Osteomyelitis leading to amputation due to slow healing fractures, Distal lower limb muscle weakness, Glycogen storage disease, type VI, Dystonia, Dopa-responsive type, Microphthalmia syndromic, Anophthalmia, combined immunodeficiency and megaloblastic anemia, Hereditary cancer-predisposing syndrome, congenital disorder of glycosylation with defective fucosylation, Leber congenital amaurosis, Platelet-type bleeding disorder,
Alzheimer disease, type 3, Alzheimer disease, type 3, Pick's disease, Alzheimer disease, type 3, Frontotemporal dementia, Pick's disease, Acne inversa, familial, Coenzyme Q10 deficiency, primary, Methylmalonate semialdehyde dehydrogenase deficiency, Niemann-Pick disease type C2, Niemann-Pick disease, type C, Leukoencephalopathy with vanishing white matter, Carcinoma of colon, Endometrial carcinoma, Hereditary nonpolyposis colorectal cancer type 7, Lynch syndrome, MLH3-Related Lynch Syndrome, Nevus comedonicus, Proliferative vasculopathy and hydranencephaly-hydrocephaly syndrome, Cone-rod dystrophy, Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A2, Congenital muscular dystrophy-dystroglycanopathy with mental retardation, type B2, Limb-girdle muscular dystrophy-dystroglycanopathy, type C2, Neuropathy, hereditary sensory, type IC, Hereditary sensory and autonomic neuropathy type IC, Thyroid adenoma, hyperfunctioning, somatic, Thyroid adenoma, hyperfunctioning, Hypothyroidism, congenital, nongoitrous, Hyperthyroidism, nonautoimmune, Thyroid adenoma, hyperfunctioning, somatic, Thyroid adenoma, hyperfunctioning, Galactosylceramide beta-galactosidase deficiency, Leber congenital amaurosis, Autosomal recessive cutis laxa type IA, TRIP11-related condition, Alpha-1-antitrypsin deficiency, Pineoblastoma, DICER1-related
pleuropulmonary blastoma cancer predisposition syndrome, Hereditary cancer-predisposing syndrome, Gabriele-De Vries Syndrome, Spinal muscular atrophy, SMA, Spinal muscular atrophy, lower extremity predominant, autosomal dominant, Mental retardation, autosomal dominant, Mental retardation, autosomal dominant, Charcot-Marie-Tooth disease, dominant intermediate E, cerebellar-facial-dental syndrome, Cerebellofaciodental syndrome,
Precocious puberty, central, Schaaf-Yang syndrome, Angelman syndrome, Epileptic encephalopathy, early infantile, Tyrosinase-positive oculocutaneous albinism, Congenital stationary night blindness, type 1C, Andermann syndrome, Familial hypertrophic
cardiomyopathy, Familial pulmonary capillary hemangiomatosis, Isovaleric acidemia, type I, Adams-Oliver syndrome, Limb-girdle muscular dystrophy, type 2A, Spherocytosis type 5, Peeling skin syndrome, Peeling skin syndrome, acral type, Microcephaly and
chorioretinopathy, autosomal recessive, Hypoproteinemia, hypercatabolic, Arginine:glycine amidinotransferase deficiency, Bartter syndrome, type 1, antenatal, Marfan syndrome,
Marfan lipodystrophy syndrome, Cardiovascular phenotype, Marfan syndrome, Thoracic aortic aneurysm and aortic dissection, Thoracic aortic Aneurysm and dissection (TAAD), Cardiovascular phenotype, Stiff skin syndrome, Marfan syndrome, Thoracic aortic aneurysm and aortic dissection, Thoracic aortic Aneurysm and dissection (TAAD), Marfan
Syndrome/Loeys-Dietz Syndrome/Familial Thoracic Aortic Aneurysms and Dissections, Cardiovascular phenotype, Seckel syndrome, Aromatase deficiency, Lethal congenital contracture syndrome, Intellectual developmental disorder with cardiac arrhythmia, Primary ciliary dyskinesia, Craniosynostosis, Parkinson disease, age at onset, susceptibility to, Parkinson disease, Parkinson disease, autosomal recessive early-onset, Hyperchlorhidrosis, isolated, Nemaline myopathy, Congenital stationary night blindness, type 1D, Lung adenocarcinoma, Non-small cell lung cancer, Cutaneous melanoma, Cardio-facio-cutaneous syndrome, Cardiofaciocutaneous syndrome, Cardio-facio-cutaneous syndrome, Aortic valve disease, Thoracic aortic aneurysm and aortic dissection, Cardiovascular phenotype, Loeys-Dietz syndrome, Ceroid lipofuscinosis neuronal, Tay-Sachs disease, Bardet-Biedl syndrome, Sick sinus syndrome, autosomal dominant, Tyrosinemia type I, Tyrosinemia type I,
Hypertyrosinemia, Osteochondritis dissecans, Autosomal dominant progressive external ophthalmoplegia with mitochondrial DNA deletions 1, Progressive sclerosing
poliodystrophy, Progressive sclerosing poliodystrophy, Mitochondrial diseases,
Camptocormia, Acrocallosal syndrome, Schinzel type, Spondylocostal dysostosis, Liver cancer, Acute myeloid leukemia, Neoplasm of brain, Hepatocellular carcinoma, Brainstem glioma, Colorectal Neoplasms, Multiple myeloma, Squamous cell carcinoma of the head and neck, Acute myeloid leukemia, Myelodysplastic syndrome, Colorectal Neoplasms, Bloom syndrome, Bloom syndrome, Hereditary cancer-predisposing syndrome, Arthrogryposis renal dysfunction cholestasis syndrome, Epileptic encephalopathy, childhood-onset, Congenital heart defects, multiple types, Weill-Marchesani-like syndrome, Autosomal recessive congenital ichthyosis, Microphthalmia, isolated, Osteosclerotic metaphyseal dysplasia, alpha Thalassemia, Hemoglobin Loire, Erythrocytosis, Hemoglobin Chesapeake, Erythrocytosis, Hemoglobin Legnano, Erythrocytosis, Spinocerebellar ataxia, autosomal recessive,
Mucolipidosis III Gamma, You-Hoover-Fong syndrome, Renal dysplasia, retinal pigmentary dystrophy, cerebellar ataxia and skeletal dysplasia, Joubert syndrome with Jeune asphyxiating thoracic dystrophy, Renal dysplasia, retinal pigmentary dystrophy, cerebellar ataxia and skeletal dysplasia, Retinitis pigmentosa, Leigh syndrome, Combined oxidative
phosphorylation deficiency, Tuberous sclerosis, Tuberous sclerosis syndrome,
Lymphangiomyomatosis, Tuberous sclerosis syndrome, Polycystic kidney disease, adult type, Digitorenocerebral syndrome, Early infantile epileptic encephalopathy, Myoclonic epilepsy, familial infantile, Digitorenocerebral syndrome, Progressive myoclonus epilepsy with ataxia, Familial Mediterranean fever, Rubinstein-Taybi syndrome, Nephronophthisis, Congenital disorder of glycosylation type IK, Carbohydrate-deficient glycoprotein syndrome type I, Carbohydrate-deficient glycoprotein syndrome type I, Congenital disorder of glycosylation, Epilepsy, focal, with speech disorder and with or without mental retardation, Rolandic epilepsy, Bare lymphocyte syndrome type 2, complementation group A, Charcot-Marie-Tooth disease, type 1C, Fanconi anemia, complementation group Q, Dyskeratosis congenita, Dyskeratosis congenita, autosomal recessive, Lissencephaly, Aortic aneurysm, familial thoracic, Pseudoxanthoma elasticum, Pseudoxanthoma elasticum, Generalized arterial calcification of infancy, Familial juvenile gout, Uromodulin-associated kidney disease, Medullary cystic kidney disease, Bronchiectasis with or without elevated sweat chloride, Familial cancer of breast, Fanconi anemia, complementation group N, Tracheoesophageal fistula, Pancreatic cancer, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Familial cancer of breast, Pancreatic cancer, Progressive
sensorineural hearing impairment, IL21R immunodeficiency, Juvenile neuronal ceroid lipofuscinosis, Ceroid lipofuscinosis, neuronal, protracted, Brody myopathy,
Spondyloepimetaphyseal dysplasia with multiple dislocations, Spondylocostal dysostosis,
Bile acid synthesis defect, congenital, Generalized epilepsy with febrile seizures plus, type 9, Warfarin response, warfarin response - Dosage, Warfarin response, Familial renal glucosuria, Glycogen storage disease IXb, Behcet's syndrome, Cylindromatosis, familial, Townes-Brocks syndrome, Joubert syndrome, Hamamy syndrome, Multicentric osteolysis, nodulosis and arthropathy, Bardet-Biedl syndrome, Retinitis pigmentosa, Nephrotic syndrome, type 12, Familial hypokalemia-hypomagnesemia, Spondyloepimetaphyseal dysplasia, Faden-Alkuraya type, Polymicrogyria, bilateral frontoparietal, Lissencephaly, with microcephaly, Retinitis pigmentosa, Poikiloderma with neutropenia, Brachioskeletogenital syndrome, Mitochondrial DNA depletion syndrome, Lamellar cataract, Combined T and B cell immunodeficiency, Dyskeratosis congenita, autosomal dominant, Dyskeratosis congenita, autosomal recessive, Norum disease, Acanthosis nigricans, Skeletal dysplasia, Insulin resistance, Short stature, Self-injurious behavior, Abnormal facial shape, Brachydactyly, Renal hypoplasia,
Abnormality of the dentition, Hepatic steatosis, Obesity, Lumbar hyperlordosis,
Hyperlipidemia, Short metacarpal, Intellectual disability, severe, Short stature, brachydactyly, intellectual developmental disability, and seizures, Acanthosis nigricans, Skeletal dysplasia, Insulin resistance, Short stature, Self-injurious behavior, Abnormal facial shape,
Brachydactyly, Renal hypoplasia, Abnormality of the dentition, Hepatic steatosis, Obesity, Lumbar hyperlordosis, Hyperlipidemia, Short metacarpal, Intellectual disability, severe, Hereditary diffuse gastric cancer, Hereditary cancer-predisposing syndrome, Ectropion inferior cleft lip and or palate, Breast cancer, lobular, Hereditary diffuse gastric cancer, Hereditary cancer-predisposing syndrome, Ectropion inferior cleft lip and or palate,
Congenital disorder of glycosylation type 2J, Striatonigral degeneration, childhood-onset, Ciliary dyskinesia, primary, Kartagener syndrome, Tyrosinemia type 2, Macular corneal dystrophy Type I, Macular corneal dystrophy, type II, Microcornea, myopic chorioretinal atrophy, and telecanthus, Spinocerebellar ataxia, autosomal recessive, Cataract, multiple types, Ayme-Gripp syndrome, Giant axonal neuropathy, Autoinflammation, antibody deficiency, and immune dysregulation, PLCG2-associated, Ciliary dyskinesia, primary, Persistent fetal circulation, Keratoconus, Corneal fragility keratoglobus, blue sclerae and joint hypermobility, Keratoconus, Granulomatous disease, chronic, autosomal recessive, cytochrome b-negative, Chronic granulomatous disease, Granulomatous disease, chronic, autosomal recessive, cytochrome b-negative, Lymphedema, hereditary, III, Adenine phosphoribosyltransferase deficiency, Mucopolysaccharidosis, MPS-IV-A, KBG syndrome, Astigmatism, Cryptorchidism, Hypertelorism, Esotropia, Retrognathia, Hypermetropia, Wide nasal bridge, Cryptorchidism, Epicanthus, Hypertelorism, Astigmatism, Intellectual disability, Global developmental delay, Fanconi anemia, complementation group A, Fanconi anemia, Cutaneous malignant melanoma, Malignant Melanoma Susceptibility, Ciliary dyskinesia, primary, Syndactyly type 9, Retinitis pigmentosa, Lissencephaly, Spongy degeneration of central nervous system, Spongy degeneration of central nervous system, Canavan Disease, Familial Form, Palmoplantar keratoderma, mutilating, with periorificial keratotic plaques, Nephropathic cystinosis, Cystinosis, atypical nephropathic, Myasthenic syndrome, congenital, 4a, slow-channel, Myasthenic syndrome, congenital, associated with acetylcholine receptor deficiency, Congenital myasthenic syndrome IB, fast-channel, Pseudo von Willebrand disease, Amyotrophic lateral sclerosis, Combined oxidative phosphorylation deficiency, Leber congenital amaurosis, Orofaciodigital syndrome XV, Very long chain
acyl-CoA dehydrogenase deficiency, Myasthenic syndrome, congenital, slow-channel, Li-Fraumeni syndrome, Hereditary cancer-predisposing syndrome, Familial colorectal cancer, Malignant lymphoma, non-Hodgkin, Liver cancer, Chronic lymphocytic leukemia,
Medulloblastoma, Malignant melanoma of skin, Multiple myeloma, Squamous cell carcinoma of the head and neck, Small cell lung cancer, Lung adenocarcinoma, Squamous cell lung carcinoma, Acute myeloid leukemia, Neoplasm of brain, Neoplasm of the breast, Glioblastoma, Hepatocellular carcinoma, Hereditary cancer-predisposing syndrome,
Pancreatic adenocarcinoma, Transitional cell carcinoma of the bladder, Brainstem glioma, Carcinoma of esophagus, Colorectal Neoplasms, Adrenocortical carcinoma, Adenocarcinoma of stomach, Ovarian Serous Cystadenocarcinoma, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Metastatic pancreatic neuroendocrine tumours, Liver cancer, Chronic lymphocytic leukemia, Medulloblastoma, Malignant melanoma of skin, Multiple myeloma, Squamous cell carcinoma of the head and neck, Small cell lung cancer, Lung adenocarcinoma, Squamous cell lung carcinoma, Acute myeloid leukemia, Neoplasm of brain, Neoplasm of the breast, Glioblastoma, Hepatocellular carcinoma, Hereditary cancer-predisposing syndrome, Pancreatic adenocarcinoma,
Transitional cell carcinoma of the bladder, Brainstem glioma, Carcinoma of esophagus, Colorectal Neoplasms, Adrenocortical carcinoma, Adenocarcinoma of stomach, Ovarian Serous Cystadenocarcinoma, Malignant neoplasm of body of uterus, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Medulloblastoma, Multiple myeloma, Squamous cell carcinoma of the head and neck, Li-Fraumeni syndrome, Lung adenocarcinoma, Renal cell carcinoma, papillary, Neoplasm of the breast, Hereditary cancer-predisposing syndrome, Pancreatic adenocarcinoma, Squamous cell carcinoma of the skin, Transitional cell carcinoma of the bladder, Colorectal Neoplasms, Adenocarcinoma of stomach, Ovarian Serous Cystadenocarcinoma, Malignant neoplasm of body of uterus, Hereditary cancer-predisposing syndrome, Carcinoma of cervix, Liver cancer, Li-Fraumeni syndrome,
Hepatocellular carcinoma, Hereditary cancer-predisposing syndrome, Liver cancer,
Squamous cell carcinoma of the head and neck, Li-Fraumeni syndrome, Lung
adenocarcinoma, Li-Fraumeni syndrome, Squamous cell lung carcinoma, Neoplasm of brain, Neoplasm of the breast, Glioblastoma, Hepatocellular carcinoma, Hereditary cancer-predisposing syndrome, Pancreatic adenocarcinoma, Transitional cell carcinoma of the bladder, Brainstem glioma, Carcinoma of esophagus, Colorectal Neoplasms,
Adenocarcinoma of stomach, Ovarian Serous Cystadenocarcinoma, Adenocarcinoma of prostate, Uterine Carcinosarcoma, Liver cancer, Chronic lymphocytic leukemia, Multiple myeloma, Squamous cell carcinoma of the head and neck, Lung adenocarcinoma, Li-Fraumeni syndrome, Neoplasm of brain, Neoplasm of the breast, Glioblastoma,
Hepatocellular carcinoma, Pancreatic adenocarcinoma, Transitional cell carcinoma of the bladder, Carcinoma of esophagus, Colorectal Neoplasms, Uterine cervical neoplasms, Adenocarcinoma of stomach, Ovarian Serous Cystadenocarcinoma, Malignant neoplasm of body of uterus, Uterine Carcinosarcoma, Li-Fraumeni syndrome, Liver cancer,
Hepatocellular carcinoma, Hereditary cancer-predisposing syndrome, Liver cancer,
Malignant melanoma of skin, Multiple myeloma, Squamous cell carcinoma of the head and neck, Lung adenocarcinoma, Breast cancer, somatic, Squamous cell lung carcinoma, Neoplasm of brain, Neoplasm of the breast, Hepatocellular carcinoma, Breast
adenocarcinoma, Hereditary cancer-predisposing syndrome, Pancreatic adenocarcinoma, Transitional cell carcinoma of the bladder, Carcinoma of esophagus, Colorectal Neoplasms, Adenoid cystic carcinoma, Adenocarcinoma of stomach, Ovarian Serous
Cystadenocarcinoma, Malignant neoplasm of body of uterus, Uterine Carcinosarcoma, Carcinoma of pancreas, Dyskeratosis congenita, autosomal recessive, Leber congenital amaurosis, Cone-rod dystrophy, Autosomal recessive congenital ichthyosis, Ichthyosis, Autosomal recessive congenital ichthyosis, Spondylocostal dysostosis, Inclusion Body Myopathy, Dominant, Hepatic failure, early-onset, and neurologic disorder due to
cytochrome C oxidase deficiency, Charcot-Marie-Tooth disease and deafness, Dejerine-Sottas disease, Dejerine-Sottas disease, Dejerine-Sottas syndrome, autosomal dominant, Charcot-Marie-Tooth disease, type IA, Dejerine-Sottas syndrome, autosomal dominant, Charcot-Marie-Tooth disease, type I, Mitochondrial complex III deficiency, nuclear type 2, Common variable immunodeficiency, Immunoglobulin A deficiency, Common Variable Immune Deficiency, Dominant, Common variable immunodeficiency, Hereditary cancer-predisposing syndrome, Multiple fibrofolliculomas, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Multiple fibrofolliculomas, Hereditary cancer-predisposing syndrome, Smith-Magenis syndrome, Joubert syndrome, Meckel-Gruber syndrome, Sjogren-Larsson syndrome, Congenital disorders of glycosylation type II, Congenital disorder of glycosylation IIp, Congenital defect of folate absorption,
Immunodeficiency, Cone-Rod Dystrophy, Dominant, Neurofibromatosis, type 1, Hereditary cancer-predisposing syndrome, Breast-ovarian cancer, familial 4, Hereditary cancer-predisposing syndrome, Infantile Refsum's disease, Peroxisome biogenesis disorders, Zellweger syndrome spectrum, Peroxisome biogenesis disorder, Familial hypoplastic, glomerulocystic kidney, Limb-girdle muscular dystrophy, type 2G, Hyperphosphatasia with mental retardation syndrome, Neoplasm of the breast, Transitional cell carcinoma of the bladder, Carcinoma of esophagus, Uterine cervical neoplasms, Adenocarcinoma of stomach, Neoplasm of the breast, Colorectal Neoplasms, Adenocarcinoma of stomach,
Hypothyroidism, congenital, nongoitrous, Autosomal recessive woolly hair, Autosomal Recessive Hypotrichosis with Woolly Hair, Bullous ichthyosiform erythroderma, Meesmann's corneal dystrophy, Dermatopathia pigmentosa reticularis, Naxos disease, Ciliary dyskinesia, primary, Autoimmune disease, multisystem, infantile-onset, Mucopolysaccharidosis, MPS-III-B, Glycogen storage disease type 1A, Breast-ovarian cancer, familial, Breast-ovarian cancer, familial, Hereditary breast and ovarian cancer syndrome, Breast-ovarian cancer, familial 1, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Familial cancer of breast, Hereditary cancer-predisposing syndrome, Breast-ovarian cancer, familial, Hereditary breast and ovarian cancer syndrome, Breast-ovarian cancer, familial, Familial cancer of breast, Breast-ovarian cancer, familial 1, Hereditary breast and ovarian cancer syndrome, Neoplasm of the breast, Hereditary cancer-predisposing syndrome, Breast-ovarian cancer, familial, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Breast-ovarian cancer, familial, Hereditary breast and ovarian cancer syndrome, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Familial cancer of breast, Breast-ovarian cancer, familial, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Neoplasm of the breast, Renal tubular acidosis, autosomal dominant, Frontotemporal dementia, ubiquitin-positive, Growth and mental retardation, mandibulofacial dysostosis, microcephaly, and cleft palate, Alexander's disease, Progressive supranuclear
ophthalmoplegia, Frontotemporal dementia, Progressive supranuclear ophthalmoplegia, Muscular dystrophy, Epilepsy, progressive myoclonic 6, Glanzmann thrombasthenia, Amelogenesis imperfecta, type IV, Tricho-dento-osseous syndrome, Osteogenesis imperfecta type I, Osteogenesis imperfecta type 2, thin-bone, Osteogenesis imperfecta with normal sclerae, dominant form, Osteogenesis imperfecta type I, Osteogenesis imperfecta type IIC, Osteogenesis imperfecta, recessive perinatal lethal, Osteogenesis imperfecta type I,
Osteogenesis imperfecta with normal sclerae, dominant form, Osteogenesis imperfecta type III, Osteogenesis imperfecta, type III/IV, Osteogenesis imperfecta, recessive perinatal lethal, Osteogenesis imperfecta with normal sclerae, dominant form, Osteogenesis imperfecta type 1, mild, Proximal symphalangism, Tarsal carpal coalition syndrome, Joubert syndrome, Joubert syndrome, Fanconi anemia, complementation group O, Hereditary cancer-predisposing syndrome, Fanconi anemia, complementation group O, Retinitis pigmentosa, Ischiopatellar dysplasia, Familial cancer of breast, Fanconi anemia, complementation group J, Neoplasm of ovary, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Familial cancer of breast, Fanconi anemia, complementation group J, Hereditary cancer-predisposing syndrome, Hereditary breast and ovarian cancer syndrome, Hereditary cancer-predisposing syndrome, Rolandic epilepsy, Isolated growth hormone deficiency type IB, Hyperkalemic Periodic Paralysis Type 1, Potassium aggravated myotonia, Paramyotonia congenita of von
Eulenburg, Paramyotonia congenita/hyperkalemic periodic paralysis, Hyperkalemic Periodic Paralysis Type 1, Hypokalemic periodic paralysis, Hypokalemic periodic paralysis, type 2, Hyperkalemic Periodic Paralysis Type 1, Carcinoma of colon, Oligodontia-colorectal cancer syndrome, Carney complex, type 1, Andersen Tawil syndrome, Familial periodic paralysis, Andersen Tawil syndrome, Andersen Tawil syndrome, Congenital long QT syndrome, Acampomelic campomelic dysplasia, Camptomelic dysplasia, Striatal necrosis, bilateral, and progressive polyneuropathy, Pontocerebellar hypoplasia type 4, Pontocerebellar hypoplasia type 2A, Pontocerebellar hypoplasia type 4, Pontocerebellar hypoplasia type 2A,
Pontocerebellar hypoplasia type 5, Congenital cerebellar hypoplasia, Hypertonia,
Microcephaly, Amblyopia, Global developmental delay, Olivopontocerebellar hypoplasia, Non-syndromic pontocerebellar hypoplasia, Olivopontocerebellar hypoplasia, Deficiency of galactokinase, Hemophagocytic lymphohistiocytosis, familial, Pseudoneonatal
adrenoleukodystrophy, Epidermodysplasia verruciformis, Desbuquois dysplasia, Rolandic epilepsy, Ciliary dyskinesia, Ciliary dyskinesia, primary, Glycogen storage disease, type II, Glycogen storage disease type II, infantile, Glycogen storage disease, type II, Baraitser-Winter Syndrome, Nephrotic syndrome, type 8, Autosomal recessive cutis laxa type 2B, Encephalopathy, progressive, early-onset, with brain atrophy and thin corpus callosum, Arhinia choanal atresia microphthalmia, Oculomelic amyoplasia, Dystonia, Spinocerebellar ataxia, ACTH resistance, Glucocorticoid Deficiency, Renal hypodysplasia/aplasia, Left ventricular noncompaction, Pancreatic agenesis and congenital heart disease, Abnormality of cardiovascular system morphology, Congenital diaphragmatic hernia, Seckel syndrome, Niemann-Pick disease type C1, Niemann-Pick disease type C1, Niemann-Pick disease, type D, Scalp ear nipple syndrome, Arrhythmogenic right ventricular cardiomyopathy, type 10, Arrhythmogenic right ventricular cardiomyopathy, Cardiovascular phenotype, Arrhythmogenic right ventricular cardiomyopathy, type 10, Amyloidogenic transthyretin amyloidosis, Cardiovascular phenotype, Bainbridge-Ropers syndrome, Mental retardation, autosomal recessive, Vici syndrome, Carcinoma of pancreas, Juvenile polyposis syndrome, Colorectal Neoplasms, Juvenile polyposis/hereditary hemorrhagic telangiectasia syndrome, Juvenile polyposis syndrome, Colorectal Neoplasms, Mirror movements, Carcinoma of colon, Pitt-Hopkins syndrome, Erythropoietic protoporphyria, Progressive intrahepatic cholestasis, Periventricular nodular heterotopia with syndactyly, cleft palate and
developmental delay, Periventricular nodular heterotopia, Immunodeficiency, Obesity, Schizophrenia, Obesity, Osteopetrosis autosomal recessive, Burn-McKeown syndrome, Severe congenital neutropenia autosomal dominant, Cyclical neutropenia, Complement factor d deficiency, Spondylometaphyseal dysplasia Sedaghatian type, Carcinoma of pancreas, Peutz-Jeghers syndrome, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Cutaneous malignant melanoma, Cutaneous melanoma, Persistent mullerian duct syndrome, type I, Preimplantation embryonic lethality, Hypocalcemia, autosomal dominant, Cone-rod dystrophy, Age-related macular degeneration, Spinocerebellar ataxia, Cardiofaciocutaneous syndrome, Cardio-facio-cutaneous syndrome, CODAS syndrome, Leukodystrophy, hypomyelinating, Insulin-resistant diabetes mellitus and acanthosis nigricans, Pineal hyperplasia and diabetes mellitus syndrome, Insulin-resistant diabetes mellitus and acanthosis nigricans, Leprechaunism syndrome, Pineal hyperplasia and diabetes mellitus syndrome, Retinitis pigmentosa, Mucolipidosis type IV, Mucolipidosis type IV, Mucolipidosis type IV, Boucher Neuhauser syndrome, Weill-Marchesani syndrome, Cerebellar ataxia, deafness and narcolepsy, autosomal dominant, Tyrosine kinase 2 deficiency, Charcot-Marie-Tooth disease, type 2M, Familial hypercholesterolemia, Familial hypercholesterolemias, Kartagener syndrome, Ciliary dyskinesia, primary,
Spondyloenchondrodysplasia with immune dysregulation, Deficiency of alpha-mannosidase, Aicardi Goutieres syndrome, Blood group - Lutheran inhibitor, Glutaric aciduria, type 1, Marshall-Smith syndrome, Epileptic encephalopathy, early infantile, Familial hemiplegic migraine type 1, Episodic ataxia type 2, Epileptic encephalopathy, early infantile, Familial hemiplegic migraine type 1, Autosomal recessive non-syndromic intellectual disability, Lehman syndrome, Cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy, Combined oxidative phosphorylation deficiency, Severe combined immunodeficiency, autosomal recessive, T cell-negative, B cell-positive, NK cell-negative, Thyroid dyshormonogenesis, Cold-induced sweating syndrome, Pseudoachondroplastic spondyloepiphyseal dysplasia syndrome, Multiple epiphyseal dysplasia, Pseudoachondroplastic spondyloepiphyseal dysplasia syndrome, Epiphyseal dysplasia, multiple, severe, Bilateral right-sidedness sequence, Transposition of the great arteries, dextro-looped, Heterotaxia, Acute myeloid leukemia, Arthrogryposis multiplex congenita, neurogenic, with myelin defect, Hemochromatosis type 1, Dystonia 28, childhood-onset, Finnish congenital nephrotic syndrome, Central core disease, Central core disease, Malignant hyperthermia, susceptibility to, RYR1-Related Disorders, Congenital myopathy with fiber type disproportion, RYR1-Related Disorders, Myopathy, Congenital myopathy with fiber type disproportion, Central core disease, Malignant hyperthermia, susceptibility to, Central core disease, Congenital myopathy with fiber type disproportion, Central core disease, Cutis laxa with severe pulmonary, gastrointestinal, and urinary abnormalities, Nephrotic syndrome, type 9, Maple syrup urine disease, Diamond-Blackfan anemia, Alternating hemiplegia of childhood, Dystonia, Familial partial lipodystrophy 6, Ethylmalonic encephalopathy, Blood group - Lutheran Null, Familial type 3 hyperlipoproteinemia,
Apolipoprotein C2 deficiency, Apolipoprotein C-II (Padova), Apolipoprotein C2 deficiency, Apolipoprotein C-II
(Auckland), Immunodeficiency, Hermansky-Pudlak syndrome, Xeroderma pigmentosum, group D, Trichothiodystrophy, photosensitive, Congenital muscular dystrophy-dystroglycanopathy with mental retardation, type B5, Muscular dystrophy-dystroglycanopathy (congenital with brain and eye anomalies), type a, Congenital muscular dystrophy-dystroglycanopathy (with or without mental retardation) type B5, Limb-girdle muscular dystrophy-dystroglycanopathy, type C5, Limb-girdle muscular dystrophy,
Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies type A5, Muscle weakness, Headache, Gait imbalance, Difficulty walking, Paresthesia, Difficulty climbing stairs, Scapular winging, Difficulty standing, Muscular dystrophy-dystroglycanopathy, Walker-Warburg congenital muscular dystrophy, Congenital muscular dystrophy-dystroglycanopathy with mental retardation, type B5, Limb-girdle muscular dystrophy-dystroglycanopathy, type C5, Walker-Warburg congenital muscular dystrophy, Walker-Warburg congenital muscular dystrophy, Congenital muscular dystrophy-dystroglycanopathy without mental retardation, type B5, Walker-Warburg congenital muscular dystrophy, Congenital muscular dystrophy-dystroglycanopathy with mental retardation, type B5, Limb-girdle muscular dystrophy-dystroglycanopathy, type C5,
Congenital muscular dystrophy-dystroglycanopathy with mental retardation, type B5, Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies type A5, Walker-Warburg congenital muscular dystrophy, Hypocalciuric hypercalcemia, familial, type III, Mental retardation, autosomal recessive, Hyperferritinemia cataract syndrome, L-ferritin deficiency, autosomal recessive, Isolated lutropin deficiency, Autistic disorder of childhood onset, Motor delay, Iris coloboma, Autism, Delayed speech and language development, Abnormality of vision, Early infantile epileptic encephalopathy, Ataxia-oculomotor apraxia, Early infantile epileptic encephalopathy, Peripheral neuropathy, myopathy, hoarseness, and hearing loss, Spinocerebellar ataxia, Spinocerebellar ataxia, Retinitis pigmentosa, Nemaline myopathy, Polyglucosan body myopathy with or without immunodeficiency, Glycogen storage disease, type IV, Brown-Vialetto-Van Laere syndrome, Spinocerebellar ataxia, Cerebro-costo-mandibular syndrome, Neurohypophyseal diabetes insipidus, Pigmentary pallidal degeneration, Hypoprebetalipoproteinemia, acanthocytosis, retinitis pigmentosa, and pallidal degeneration, Pigmentary pallidal degeneration, Spongiform encephalopathy with neuropsychiatric features, Genetic prion diseases, Gerstmann-Straussler-Scheinker syndrome, Cerebral Amyloid Angiopathy, PRNP-related, Ataxia-telangiectasia-like disorder, Kindler's syndrome, Short stature, facial dysmorphism, and skeletal anomalies with or without cardiac anomalies, Auriculocondylar syndrome, McKusick Kaufman syndrome, Alagille syndrome, Mitochondrial complex I deficiency, Leigh syndrome, Congenital dyserythropoietic anemia, type II, Cowden syndrome, Congenital dyserythropoietic anemia, Retinitis pigmentosa, Otofaciocervical syndrome, Thrombophilia due to thrombomodulin defect, Thrombophilia due to thrombomodulin defect, Joint laxity, short stature, and myopia, Craniofacial anomalies and anterior segment dysgenesis syndrome,
Familial hypertrophic cardiomyopathy,
Cardiomyopathy, hypertrophic, midventricular, digenic, Dowling-Degos disease, C-like syndrome, Multiple synostoses syndrome, Symphalangism, proximal, Fibular hypoplasia and complex brachydactyly, schizophrenia, Aicardi Goutieres syndrome, Severe combined immunodeficiency due to ADA deficiency, Partial adenosine deaminase deficiency, Multiple congenital anomalies-hypotonia-seizures syndrome, Primary autosomal recessive
microcephaly, Galloway-Mowat Syndrome, Arterial tortuosity syndrome, Epileptic encephalopathy, early infantile, Helsmoortel-Van der Aa syndrome, Congenital disorder of glycosylation type IE, Idiopathic hypercalcemia of infancy, Cushing's syndrome, McCune-Albright syndrome, Polyostotic fibrous dysplasia, somatic, mosaic, Pituitary Tumor, Growth Hormone-Secreting, Somatic, Liver cancer, McCune-Albright syndrome, Malignant melanoma of skin, Squamous cell carcinoma of the head and neck, Lung adenocarcinoma, Neoplasm of the breast, Hepatocellular carcinoma, Pancreatic adenocarcinoma, Neoplasm, Colorectal Neoplasms, Uterine cervical neoplasms, Adrenocortical carcinoma,
Adenocarcinoma of stomach, McCune-Albright syndrome, Pseudohypoparathyroidism, type IA, with testotoxicosis, Pseudohypoparathyroidism type 1C, Waardenburg syndrome type 4B, Early infantile epileptic encephalopathy, Benign familial neonatal seizures, Early infantile epileptic encephalopathy, Seizures, Generalized hypotonia, Early infantile epileptic encephalopathy, Benign familial neonatal seizures, Dyskeratosis congenita, autosomal recessive, Pulmonary fibrosis and/or bone marrow failure, telomere-related, Dyskeratosis congenita, Dyskeratosis congenita, autosomal recessive, Dyskeratosis congenita, autosomal recessive, Glomerulonephritis with sparse hair and telangiectases, Alzheimer disease, type 1, Amyotrophic lateral sclerosis type 1, Inflammatory bowel disease, autosomal recessive, Immunodeficiency, Familial platelet disorder with associated myeloid malignancy, Familial platelet disorder with associated myeloid malignancy, Transient myeloproliferative disorder of Down syndrome, Leukemia, acute myeloid, M0 subtype, Popliteal pterygium syndrome lethal type, Kartagener syndrome, Primary ciliary dyskinesia, Kartagener syndrome, Ciliary dyskinesia, Primary ciliary dyskinesia, Homocystinuria due to CBS deficiency, Epileptic encephalopathy, early infantile, Unverricht-Lundborg syndrome, Autoimmune polyglandular syndrome type 1, autosomal dominant, Leukocyte adhesion deficiency type 1, Bethlem myopathy, Ullrich congenital muscular dystrophy, Ullrich congenital muscular dystrophy, Microcephalic osteodysplastic primordial dwarfism type 2, Polyarteritis nodosa, childhood-onset, Peroxisome biogenesis disorder, Proline dehydrogenase deficiency, Schizophrenia, Autosomal recessive Noonan-like syndrome due to compound heterozygous variants in LZTR1, Spinal muscular atrophy, Jokela type, Frontotemporal dementia and/or amyotrophic lateral sclerosis, Myopathy, isolated mitochondrial, autosomal dominant, Rhabdoid tumor predisposition syndrome, Schwannomatosis, Deficiency of
beta-ureidopropionase,
Congenital cataract, Klippel-Feil syndrome, autosomal recessive, with nemaline myopathy and facial dysmorphism, Hermansky-Pudlak syndrome, Cataract, congenital nuclear, autosomal recessive, Cataract, multiple types, Familial cancer of breast, Hereditary cancer-predisposing syndrome, Hereditary cancer-predisposing syndrome, Familial cancer of breast, Prostate cancer, somatic, Hereditary cancer-predisposing syndrome, Osteosarcoma,
Neurofibromatosis, type 2, Epilepsy, familial focal, with variable foci, Rolandic epilepsy, Parkinson disease, Sorsby fundus dystrophy, Macrothrombocytopenia and granulocyte inclusions with or without nephritis or sensorineural hearing loss, Microcytic anemia, Peripheral demyelinating neuropathy, central dysmyelination, Waardenburg syndrome, and Hirschsprung disease, Waardenburg syndrome type 4C, Parkinson disease, Infantile neuroaxonal dystrophy, Adenylosuccinate lyase deficiency, Nephronophthisis-like nephropathy, Carcinoma of colon, Rubinstein-Taybi syndrome, Carcinoma of colon, Kanzaki disease, Methemoglobinemia type 2, Autosomal recessive syndrome of syndactyly, undescended testes and central nervous system defects, Megalencephalic
leukoencephalopathy with subcortical cysts, Microcephaly with chorioretinopathy, autosomal recessive, Mitochondrial DNA depletion syndrome (MNGIE type), Muscular dystrophy, congenital, megaconial type, Metachromatic leukodystrophy, juvenile type, Metachromatic leukodystrophy, late infantile, Metachromatic leukodystrophy, Metachromatic
leukodystrophy, severe, Metachromatic leukodystrophy, Short stature, idiopathic, X-linked, Leri Weill dyschondrosteosis, Chondrodysplasia punctata, X-linked recessive, Kallmann syndrome, Ocular albinism, type I, Opitz-Frias syndrome, Amelogenesis imperfecta, type IE, Spondyloepiphyseal dysplasia tarda, Oral-facial-digital syndrome, Joubert syndrome, Joubert syndrome, Oral-facial-digital syndrome, Paroxysmal nocturnal hemoglobinuria 1, Multiple congenital anomalies-hypotonia-seizures syndrome, Pettigrew syndrome, Nance-Horan syndrome, Congenital cataract, Early infantile epileptic encephalopathy, Early infantile epileptic encephalopathy, Atypical Rett syndrome, Early infantile epileptic encephalopathy, Angelman syndrome-like, Early infantile epileptic encephalopathy, Atypical Rett syndrome, Early infantile epileptic encephalopathy, Angelman syndrome-like, Early infantile epileptic encephalopathy, Atypical Rett syndrome, Early infantile epileptic encephalopathy, Angelman syndrome-like, Juvenile retinoschisis, Glycogen storage disease type IXa1, Coffin-Lowry syndrome, Deafness, X-linked, IFAP syndrome with or without BRESHECK syndrome, Familial X-linked hypophosphatemic vitamin D refractory rickets, Hydranencephaly with abnormal genitalia, Proud Levine Carpenter syndrome, Lissencephaly, X-linked, epileptic encephalopathy, early infantile, Mental retardation, X-linked, Congenital adrenal hypoplasia, X-linked, Becker muscular dystrophy, Duchenne muscular dystrophy, Becker muscular dystrophy, Duchenne muscular dystrophy, Dilated cardiomyopathy, Granulomatous disease, chronic, X-linked, variant, Cone-rod dystrophy, X-linked, Retinitis pigmentosa, Ornithine carbamoyltransferase deficiency, Mental retardation, X-linked, Mental retardation, X-linked, Congenital stationary night blindness, type 1A, Mental retardation and microcephaly with pontine and cerebellar hypoplasia, FG syndrome, Monoamine oxidase A deficiency, Atrophia bulborum hereditaria, Familial exudative
vitreoretinopathy, X-linked, Atrophia bulborum hereditaria, Kabuki syndrome, Retinitis pigmentosa, Arthrogryposis multiplex congenita, distal, X-linked, Properdin deficiency, X-linked, Chondrodysplasia punctata, X-linked dominant, atypical, Chondrodysplasia punctata X-linked dominant, MEND syndrome, Wiskott-Aldrich syndrome, GATA-1-related thrombocytopenia with dyserythropoiesis, Dyserythropoietic anemia with thrombocytopenia, GATA-1-related thrombocytopenia with dyserythropoiesis, Neurodegeneration with brain iron accumulation, Nephrolithiasis, X-linked recessive, Dent disease, Mental retardation, syndromic, Claes-Jensen type, X-linked, 2-methyl-3-hydroxybutyric aciduria, Aarskog syndrome, Hereditary sideroblastic anemia, Amyotrophic lateral sclerosis, with or without frontotemporal dementia, Androgen resistance syndrome, Partial androgen insensitivity syndrome, Prostate cancer susceptibility, Androgen resistance syndrome, Partial androgen insensitivity syndrome, Craniofrontonasal dysplasia, Hypohidrotic X-linked ectodermal dysplasia, Hypohidrotic ectodermal dysplasia, Tooth agenesis, selective, X-linked, Myopia, X-Linked, Female-Limited, X-linked severe combined immunodeficiency, Ohdo syndrome, X-linked, FG syndrome, Intellectual functioning disability, Cardiovascular phenotype, X-linked hereditary motor and sensory neuropathy, Mental retardation, X-linked, syndromic, Mental Retardation, X-Linked, Cornelia de Lange syndrome 5, Glycogen storage disease, Allan-Herndon-Dudley syndrome, Mental retardation, X-linked, Metacarpal 4-5 fusion, ATR-X syndrome, Menkes kinky-hair syndrome, Menkes kinky-hair syndrome, Cutis laxa, X-linked, Distal spinal muscular atrophy, X-linked, Phosphoglycerate kinase 1 deficiency, Cleft palate with ankyloglossia, Mental retardation, X-linked, Choroideremia, Early infantile epileptic encephalopathy, Mohr-Tranebjaerg syndrome, X-linked agammaglobulinemia, Agammaglobulinemia, non-Bruton type, Fabry disease, Fabry disease,
Deoxygalactonojirimycin response, Pelizaeus-Merzbacher disease, Pelizaeus-Merzbacher disease, connatal, Thyroxine-binding globulin, variant P,
Phosphoribosylpyrophosphate synthetase superactivity, Charcot-Marie-Tooth disease, X-linked recessive, type 5, Alport syndrome, X-linked recessive, Microscopic hematuria, Elevated mean arterial pressure, Chronic kidney disease, Mental retardation, X-linked, Megalocornea, Mental retardation, X-linked, Heterotopia, Lissencephaly, X-linked,
Fucosidosis, Lissencephaly, X-linked, Subcortical laminar heterotopia, X-linked, Danon disease, Syndromic X-linked mental retardation, Cabezas type, Mental retardation, X-linked, syndromic, Wu type, Lymphoproliferative syndrome, X-linked, Lymphoproliferative syndrome, X-linked, Simpson-Golabi-Behmel syndrome, Borjeson-Forssman-Lehmann syndrome, Lesch-Nyhan syndrome, Lesch-Nyhan syndrome, HPRT Flint, Partial
hypoxanthine-guanine phosphoribosyltransferase deficiency, HPRT Munich, HPRT
Milwaukee, Lesch-Nyhan syndrome, Christianson syndrome, Hypertrophic cardiomyopathy, Myopathy, reducing body, X-linked, early-onset, severe, Immunodeficiency with hyper IgM type 1, Pituitary adenoma, growth hormone-secreting, Heterotaxy, visceral, X-linked, VACTERL association with hydrocephaly, X-linked, Congenital heart defects,
nonsyndromic, Heterotaxy, visceral, X-linked, Hereditary factor IX deficiency disease, Hereditary factor IX deficiency disease, Thrombophilia, X-linked, due to factor IX defect, Mucopolysaccharidosis, MPS-II, Mucopolysaccharidosis, type II, severe form, Mucopolysaccharidosis, MPS-II, Hypospadias, X-linked, Severe X-linked myotubular myopathy, Child syndrome, Spondyloepimetaphyseal dysplasia X-linked, Microcephaly, Carious teeth, Intellectual disability, Global developmental delay, Abnormality of the cerebral cortex, Skeletal muscle atrophy, Oral-pharyngeal dysphagia, Muscular hypotonia, Muscular hypotonia, Creatine deficiency, X-linked, Chromosome Xq28 deletion syndrome, Adrenoleukodystrophy, Nephrogenic diabetes insipidus, X-linked, Nephrogenic syndrome of inappropriate antidiuresis, N-terminal acetyltransferase deficiency, Rett syndrome, Mental retardation, X-linked, syndromic, Rett syndrome, Rett syndrome, Stereotypy, Delayed speech and language development, Delayed gross motor development, Bruxism, Deuteranopia, Otopalatodigital spectrum disorder, Melnick-Needles syndrome, Periventricular nodular heterotopia, Melnick-Needles syndrome, Oto-palato-digital syndrome, type II,
Frontometaphyseal dysplasia, Cardiac valvular dysplasia, X-linked, Periventricular nodular heterotopia, Oto-palato-digital syndrome, type II, Oto-palato-digital syndrome, type I, Emery-Dreifuss muscular dystrophy, X-linked, 3-Methylglutaconic aciduria type 2,
Galloway-Mowat Syndrome, X-Linked, Glucose 6 phosphate dehydrogenase deficiency, G6PD A-, G6PD Canton, G6PD GIFU, G6PD Agrigento, G6PD Taiwan-Hakka, Anemia, nonspherocytic hemolytic, due to G6PD deficiency, G6PD Loma Linda, Anemia, nonspherocytic hemolytic, due to G6PD deficiency, Glucose phosphate dehydrogenase deficiency, G6PD A-, G6PD Gastonia, G6PD Marion, G6PD Minnesota, Anemia,
nonspherocytic hemolytic, due to G6PD deficiency, Hypohidrotic ectodermal dysplasia with immune deficiency, Dyskeratosis congenita X-linked, Hereditary factor VIII deficiency disease, Parkinsonism, early onset with mental retardation, Mental retardation, X-linked, Leri Weill dyschondrosteosis, XY sex reversal, type 1, Leigh syndrome, Chloramphenicol resistance, nonsyndromic sensorineural, mitochondrial, Leber's optic atrophy, Cytochrome c oxidase i deficiency, Leigh syndrome, Mitochondrial complex I deficiency, Leigh syndrome, Retinitis pigmentosa-deafness syndrome, Cerebellar ataxia, cataract, and diabetes mellitus.
[0362] In some aspects, the present disclosure provides uses of any one of the fusion proteins described herein and a guide RNA targeting this fusion protein to a target A:T base pair in a nucleic acid molecule in the manufacture of a kit for nucleic acid editing, wherein the nucleic acid editing comprises contacting the nucleic acid molecule with the fusion protein and guide RNA under conditions suitable for the substitution of the adenine (A) of the A:T nucleobase pair with a cytosine (C). In some embodiments of these uses, the nucleic acid molecule is a double-stranded DNA molecule. In some embodiments, the step of contacting induces separation of the double-stranded DNA at a target region. In some embodiments, the step of contacting further comprises nicking one strand of the double-stranded DNA, wherein the one strand comprises an unmutated strand that comprises the T of the target A:T nucleobase pair.
[0363] In some embodiments of the described uses, the step of contacting is performed in vitro. In other embodiments, the step of contacting is performed in vivo. In some embodiments, the step of contacting is performed in a subject (e.g., a human subject or a non-human animal subject). In some embodiments, the step of contacting is performed in a cell, such as a human or non-human animal cell.
[0364] The present disclosure also provides uses of any one of the fusion proteins described herein as a medicament. The present disclosure also provides uses of any one of the complexes of fusion proteins and guide RNAs described herein as a medicament.
Pharmaceutical compositions
[0365] Other embodiments of the present disclosure relate to pharmaceutical compositions comprising any of the fusion proteins or the fusion protein-gRNA complexes described herein. The term “pharmaceutical composition”, as used herein, refers to a composition formulated for pharmaceutical use. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition comprises additional agents (e.g., for specific delivery, increasing half-life, or other therapeutic compounds).
[0366] In some embodiments, any of the fusion proteins, gRNAs, and/or complexes described herein are provided as part of a pharmaceutical composition. In some embodiments, the pharmaceutical composition comprises any of the fusion proteins provided herein. In some embodiments, the pharmaceutical composition comprises any of the complexes provided herein. In some embodiments, the pharmaceutical composition comprises a gRNA, a napDNAbp-dCas9 fusion protein, and a pharmaceutically acceptable excipient. In some embodiments, the pharmaceutical composition comprises a gRNA, a napDNAbp-nCas9 fusion protein, and a pharmaceutically acceptable excipient. Pharmaceutical compositions may optionally comprise one or more additional therapeutically active substances.
[0367] In some embodiments, compositions provided herein are administered to a subject, for example, to a human subject, in order to effect a targeted genomic modification within the subject. In some embodiments, cells are obtained from the subject and contacted with any of the pharmaceutical compositions provided herein. In some embodiments, cells removed from a subject and contacted ex vivo with a pharmaceutical composition are re-introduced into the subject, optionally after the desired genomic modification has been effected or detected in the cells. Methods of delivering pharmaceutical compositions comprising nucleases are known, and are described, for example, in U.S. Pat. Nos. 6,453,242; 6,503,717; 6,534,261; 6,599,692; 6,607,882; 6,689,558; 6,824,978; 6,933,113; 6,979,539; 7,013,219; and 7,163,824, the disclosures of all of which are incorporated by reference herein in their entireties. Although the descriptions of pharmaceutical compositions provided herein are principally directed to pharmaceutical compositions which are suitable for administration to humans, it will be understood by the skilled artisan that such compositions are generally suitable for administration to animals or organisms of all sorts. Modification of pharmaceutical compositions suitable for administration to humans in order to render the compositions suitable for administration to various animals is well understood, and the ordinarily skilled veterinary pharmacologist can design and/or perform such modification with merely ordinary, if any, experimentation. Subjects to which administration of the pharmaceutical compositions is contemplated include, but are not limited to, humans and/or other primates; mammals, domesticated animals, pets, and commercially relevant mammals such as cattle, pigs, horses, sheep, cats, dogs, mice, and/or rats; and/or birds, including commercially relevant birds such as chickens, ducks, geese, and/or turkeys.
[0368] Formulations of the pharmaceutical compositions described herein may be prepared by any method known or hereafter developed in the art of pharmacology. In general, such preparatory methods include the step of bringing the active ingredient(s) into association with an excipient and/or one or more other accessory ingredients, and then, if necessary and/or desirable, shaping and/or packaging the product into a desired single- or multi-dose unit.
[0369] Pharmaceutical formulations may additionally comprise a pharmaceutically acceptable excipient, which, as used herein, includes any and all solvents, dispersion media, diluents, or other liquid vehicles, dispersion or suspension aids, surface active agents, isotonic agents, thickening or emulsifying agents, preservatives, solid binders, lubricants and the like, as suited to the particular dosage form desired. Remington’s The Science and Practice of Pharmacy, 21st Edition, A. R. Gennaro (Lippincott, Williams & Wilkins, Baltimore, MD, 2006; incorporated in its entirety herein by reference) discloses various excipients used in formulating pharmaceutical compositions and known techniques for the preparation thereof. See also PCT application PCT/US2010/055131 (Publication No. WO/2011053982), filed Nov. 2, 2010, incorporated in its entirety herein by reference, for additional suitable methods, reagents, excipients and solvents for producing pharmaceutical compositions comprising a nuclease. Except insofar as any conventional excipient medium is incompatible with a substance or its derivatives, such as by producing any undesirable biological effect or otherwise interacting in a deleterious manner with any other component(s) of the pharmaceutical composition, its use is contemplated to be within the scope of this disclosure.
[0370] As used herein, the term “pharmaceutically acceptable carrier” means a pharmaceutically acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body, to another site (e.g., organ, tissue or portion of the body). A pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.). Some examples of materials which can serve as pharmaceutically acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants may also be present in the formulation. The terms “excipient”, “carrier”, “pharmaceutically acceptable carrier” and the like are used interchangeably herein.
[0371] In some embodiments, the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing. Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.
[0372] In some embodiments, the pharmaceutical composition described herein is administered locally to a diseased site. In some embodiments, the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a sialastic membrane, or a fiber.
[0373] In some embodiments, the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human. In some embodiments, pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer. Where necessary, the pharmaceutical can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container such as an ampoule or sachette indicating the quantity of active agent. Where the pharmaceutical is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the pharmaceutical composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.
[0374] The pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration. The particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein. Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE), low levels (5-10 mol%) of cationic lipid, and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47). Positively charged lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles. The preparation of such lipid particles is well known. See, e.g., U.S. Patent Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.
[0375] The pharmaceutical composition described herein may be administered or packaged as a unit dose, for example. The term “unit dose” when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., carrier or vehicle.
[0376] Further, the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a compound of the invention in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection. The pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compound of the invention. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
[0377] In another aspect, an article of manufacture containing materials useful for the treatment of the diseases described above is included. In some embodiments, the article of manufacture comprises a container and a label. Suitable containers include, for example, bottles, vials, syringes, and test tubes. The containers may be formed from a variety of materials such as glass or plastic. In some embodiments, the container holds a composition that is effective for treating a disease described herein and may have a sterile access port. For example, the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle. The active agent in the composition is a compound of the invention. In some embodiments, the label on or associated with the container indicates that the composition is used for treating the disease of choice. The article of manufacture may further comprise a second container comprising a pharmaceutically acceptable buffer, such as phosphate-buffered saline, Ringer’s solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.
Delivery Methods
[0378] In some embodiments, the disclosure provides methods comprising delivering any of the fusion proteins, gRNAs, and/or complexes described herein. In other embodiments, the disclosure provides methods comprising delivery of one or more vectors as described herein, one or more transcripts thereof, and/or one or more proteins transcribed therefrom, to a host cell. In some embodiments, the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells. In some embodiments, a base editor as described herein in combination with (and optionally complexed with) a guide sequence is delivered to a cell. Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids in mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of a base editor to cells in culture, or in a host organism. Non-viral vector delivery systems include ribonucleoprotein (RNP) complexes, DNA plasmids, RNA (e.g., a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For a review of gene therapy procedures, see Anderson, Science 256:808-813 (1992); Nabel & Felgner, TIBTECH 11:211-217 (1993); Mitani & Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer & Perricaudet, British Medical Bulletin 51(1):31-44 (1995); Haddada et al., in Current Topics in Microbiology and Immunology, Doerfler and Böhm (eds) (1995); and Yu et al., Gene Therapy 1:13-26 (1994).
[0379] In certain embodiments, the method of delivery and vector provided herein is an RNP complex. RNP delivery of base editors markedly increases the DNA specificity of base editing. RNP delivery of base editors leads to decoupling of on- and off-target editing. RNP delivery ablated off-target editing at non-repetitive sites while maintaining on-target editing comparable to plasmid delivery, and greatly reduced off-target editing even at the highly repetitive VEGFA site 2. See Rees, H.A. et al., Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery, Nat. Commun. 8, 15790 (2017), which is incorporated by reference herein in its entirety.
[0380] Methods of non-viral delivery of nucleic acids include RNP complexes, lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 1991/17424; WO 1991/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration).
[0381] The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).
[0382] The use of RNA or DNA viral based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo). Conventional viral based systems could include retroviral, lentivirus, adenoviral, adeno-associated and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.
[0383] The tropism of a virus can be altered by incorporating foreign envelope proteins, expanding the potential target population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US 94/05700). In applications where transient expression is preferred, adenoviral based systems may be used. Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system. Adeno-associated virus (“AAV”) vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:3822-3828 (1989).
[0384] Packaging cells are typically used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and ψ2 cells or PA317 cells, which package retrovirus. Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed. The missing viral functions are typically supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences. The cell line may also be infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. Reference is made to US 2003/0087817, published May 8, 2003, International Patent Application No. WO 2016/205764, published December 22, 2016, International Patent Application No. WO 2018/071868, published April 19, 2018, and U.S. Patent Publication No. 2018/0127780, published May 10, 2018, the disclosures of each of which are incorporated herein by reference.
[0385] In various embodiments, the disclosed expression constructs may be engineered for delivery in one or more rAAV vectors. An rAAV as related to any of the methods and compositions provided herein may be of any serotype including any derivative or pseudotype (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 2/1, 2/5, 2/8, 2/9, 3/1, 3/5, 3/8, or 3/9). An rAAV may comprise a genetic load (i.e., a recombinant nucleic acid vector that expresses a gene of interest, such as a whole or split fusion protein that is carried by the rAAV into a cell) that is to be delivered to a cell. An rAAV may be chimeric.
[0386] As used herein, the serotype of an rAAV refers to the serotype of the capsid proteins of the recombinant virus. Non-limiting examples of derivatives and pseudotypes include rAAV2/1, rAAV2/5, rAAV2/8, rAAV2/9, AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45. A non-limiting example of derivatives and pseudotypes that have chimeric VP1 proteins is rAAV2/5-1VP1u, which has the genome of AAV2, capsid backbone of AAV5 and VP1u of AAV1. Other non-limiting examples of derivatives and pseudotypes that have chimeric VP1 proteins are rAAV2/5-8VP1u, rAAV2/9-1VP1u, and rAAV2/9-8VP1u.
[0387] AAV derivatives/pseudotypes, and methods of producing such derivatives/pseudotypes are known in the art (see, e.g., Asokan A, Schaffer DV, Samulski RJ. The AAV vector toolkit: poised at the clinical crossroads. Mol Ther. 2012 Apr;20(4):699-708. doi: 10.1038/mt.2011.287. Epub 2012 Jan 24). Methods for producing and using pseudotyped rAAV vectors are known in the art (see, e.g., Duan et al., J. Virol., 75:7662-7671, 2001; Halbert et al., J. Virol., 74:1524-1532, 2000; Zolotukhin et al., Methods, 28:158-167, 2002; and Auricchio et al., Hum. Molec. Genet., 10:3075-3081, 2001).
[0388] Methods of making or packaging rAAV particles are known in the art and reagents are commercially available (see, e.g., Zolotukhin et al. Production and purification of serotype 1, 2, and 5 recombinant adeno-associated viral vectors. Methods 28 (2002) 158-167; and U.S. Patent Publication Numbers US20070015238 and US20120322861, which are incorporated herein by reference; and plasmids and kits available from ATCC and Cell Biolabs, Inc.). For example, a plasmid comprising a gene of interest may be combined with one or more helper plasmids, e.g., that contain a rep gene (e.g., encoding Rep78, Rep68, Rep52 and Rep40) and a cap gene (encoding VP1, VP2, and VP3, including a modified VP2 region as described herein), and transfected into recombinant cells such that the rAAV particle can be packaged and subsequently purified.
[0389] In some embodiments, the fusion proteins can be divided at a split site and provided as two halves of a whole/complete fusion protein. The two halves can be delivered to cells (e.g., as expressed proteins or on separate expression vectors) and once in contact inside the cell, the two halves form the complete fusion protein through the self-splicing action of the inteins on each fusion protein half. Split intein sequences can be engineered into each of the halves of the encoded fusion protein to facilitate their trans-splicing inside the cell and the concomitant restoration of the complete, functioning ACBE.
[0390] These split intein-based methods overcome several barriers to in vivo delivery. For example, the DNA encoding fusion proteins is larger than the recombinant AAV (rAAV) packaging limit, and so requires different solutions. One such solution is formulating the editor fused to split intein pairs that are packaged into two separate rAAV particles that, when co-delivered to a cell, reconstitute the functional editor protein. Several other special considerations to account for the unique features of base editing are described, including the optimization of second-site nicking targets and properly packaging fusion proteins into virus vectors, including lentiviruses and rAAV.
[0391] Accordingly, the disclosure provides dual rAAV vectors and dual rAAV vector particles that comprise expression constructs that encode two halves of any of the disclosed fusion proteins, wherein the encoded fusion protein is divided between the two halves at a split site. In some embodiments, the two halves may be delivered to cells (e.g., as expressed proteins or on separate expression vectors) and once in contact inside the cell, the two halves form the complete fusion protein through the self-splicing action of the inteins on each fusion protein half. Split intein sequences can be engineered into each of the halves of the encoded fusion protein to facilitate their trans-splicing inside the cell and the concomitant restoration of the complete, functioning ACBE.
[0392] In various embodiments, the fusion proteins may be engineered as two half proteins (i.e., an ACBE N-terminal half and an ACBE C-terminal half) by “splitting” the whole fusion protein at a “split site.” The “split site” refers to the location of insertion of split intein sequences (i.e., the N intein and the C intein) between two adjacent amino acid residues in the fusion protein. More specifically, the “split site” refers to the location of dividing the whole fusion protein into two separate halves, wherein each half is fused at the split site to either the N intein or the C intein motifs. The split site can be at any suitable location in the fusion protein, but preferably the split site is located at a position that allows for the formation of two half proteins which are appropriately sized for delivery (e.g., by expression vector) and wherein the inteins, which are fused to each half protein at the split site termini, are available to sufficiently interact with one another when one half protein contacts the other half protein inside the cell.
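The sizing consideration described for the split site can be illustrated with a minimal sketch. It is not part of the disclosed methods: the packaging limit, the per-vector overhead, the intein lengths, and all function names below are assumptions chosen only for illustration, and start and stop codons are ignored.

```python
from dataclasses import dataclass

# Assumed values for illustration only; none of these numbers come from the disclosure.
RAAV_PACKAGING_LIMIT_BP = 4700  # approximate rAAV cargo capacity, ITRs included
FIXED_OVERHEAD_BP = 1000        # hypothetical budget for ITRs, promoter, and polyA


@dataclass
class SplitHalves:
    n_half_bp: int  # coding length of the N-terminal half plus the N intein
    c_half_bp: int  # coding length of the C intein plus the C-terminal half


def split_sizes(protein_len_aa: int, split_after_aa: int,
                n_intein_aa: int, c_intein_aa: int) -> SplitHalves:
    """Return the coding lengths (bp) of both halves for a given split site."""
    if not 0 < split_after_aa < protein_len_aa:
        raise ValueError("split site must lie between two residues of the protein")
    n_half = (split_after_aa + n_intein_aa) * 3
    c_half = (c_intein_aa + protein_len_aa - split_after_aa) * 3
    return SplitHalves(n_half, c_half)


def fits_in_dual_raav(halves: SplitHalves,
                      limit_bp: int = RAAV_PACKAGING_LIMIT_BP,
                      overhead_bp: int = FIXED_OVERHEAD_BP) -> bool:
    """True if each half fits in one vector after subtracting the fixed overhead."""
    budget = limit_bp - overhead_bp
    return halves.n_half_bp <= budget and halves.c_half_bp <= budget


# Hypothetical 1800-residue fusion protein with 100- and 40-residue inteins.
balanced = split_sizes(1800, 900, 100, 40)
lopsided = split_sizes(1800, 1500, 100, 40)
print(balanced, fits_in_dual_raav(balanced))
print(lopsided, fits_in_dual_raav(lopsided))
```

Under these assumed numbers, a split near the middle of the protein leaves both halves within the budget, while a split close to one terminus does not. Whether the inteins can interact at a given site is a structural question that a length check of this kind does not address.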
[0393] Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US 2003/0087817, incorporated herein by reference.
[0394] It should be appreciated that any fusion protein, e.g., any of the fusion proteins provided herein, may be introduced into the cell in any suitable way, either stably or transiently. In some embodiments, a fusion protein may be transfected into the cell. In some embodiments, the cell may be transduced or transfected with a nucleic acid construct that encodes a fusion protein. For example, a cell may be transduced (e.g., with a virus encoding a fusion protein), or transfected (e.g., with a plasmid encoding a fusion protein) with a nucleic acid that encodes a fusion protein, or the translated fusion protein. Such transduction may be a stable or transient transduction. In some embodiments, cells expressing a fusion protein or containing a fusion protein may be transduced or transfected with one or more gRNA molecules, for example when the fusion protein comprises a Cas9 (e.g., nCas9) domain. In some embodiments, a plasmid expressing a fusion protein may be introduced into cells through electroporation, transient (e.g., lipofection) and stable genome integration (e.g., piggyBac) and viral transduction or other methods known to those of skill in the art.
Kits and cells
[0395] This disclosure provides kits comprising a nucleic acid construct comprising nucleotide sequences encoding the fusion proteins, gRNAs, and/or complexes described herein. Some embodiments of this disclosure provide kits comprising a nucleic acid construct comprising a nucleotide sequence encoding an adenine oxidase-napDNAbp fusion protein capable of oxidizing an adenine in a nucleic acid molecule. In some embodiments, the nucleotide sequence encodes any of the adenine oxidases provided herein. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the fusion protein. The nucleotide sequence may further comprise a heterologous promoter that drives expression of the gRNA, or a heterologous promoter that drives expression of the fusion protein and the gRNA.
[0396] In some embodiments, the kit further comprises an expression construct encoding a guide nucleic acid backbone, e.g., a guide RNA backbone, wherein the construct comprises a cloning site positioned to allow the cloning of a nucleic acid sequence identical or complementary to a target sequence into the guide nucleic acid, e.g., guide RNA backbone. In some embodiments, the kit further comprises an expression construct comprising a nucleotide sequence encoding an iBER.
[0397] The disclosure further provides kits comprising a fusion protein as provided herein, a gRNA having complementarity to a target sequence, and one or more of the following: cofactor proteins, buffers, media, and target cells (e.g., human cells). Kits may comprise combinations of several or all of the aforementioned components.
[0398] Some embodiments of this disclosure provide cells comprising any of the fusion proteins or complexes provided herein. In some embodiments, the cells comprise nucleotide constructs that encode any of the fusion proteins provided herein. In some embodiments, the cells comprise any of the nucleotides or vectors provided herein. In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art.
[0399] In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts; 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr -/-, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MDCK 11, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof. Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences. In some embodiments, a cell transiently transfected with the components of a CRISPR system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of a CRISPR complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence. In some embodiments, cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells are used in assessing one or more test compounds.
EXAMPLES
Example 1. An AlkBH3 Base Editor
[0400] Oxidation of adenine to 8-oxoadenine, which disrupts existing hydrogen bonding with the unmutated strand, may be catalyzed by a fusion protein. Steric rotation of the 8-oxoA around the glycosidic bond is induced, presenting the Hoogsteen edge for base pairing. During replication or repair of the unmutated strand, 8-oxoA is read by a polymerase as a cytosine, and the cell’s mismatch repair machinery converts the base-paired thymine of the non-edited strand to a guanine to correct the apparent mismatch. The resulting base pairing features two three-center hydrogen bonding systems. Upon the next round of replication, the cell’s mismatch repair machinery converts the 8-oxoA lesion to a cytosine (FIG. 2). Human AlkBH3 performs methyl group C-H hydroxylation on DNA and RNA via an active Fe(IV)-oxo intermediate formed through an iron cofactor and is selective for ssDNA over dsDNA.
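The net sequence outcome of the pathway described above (an A on the edited strand becoming C, and its paired T becoming G) can be sketched at the string level. This is an illustrative sketch only: it models the final A:T-to-C:G result, not the 8-oxoA intermediate or any repair kinetics, and the function names and example sequence are hypothetical.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def edit_a_to_c(edited_strand: str, position: int) -> Tuple[str, str]:
    """Apply an A:T-to-C:G edit at a 0-based position of the edited strand.

    Returns the edited strand and its reverse complement, i.e. the
    non-edited strand after the paired T has been converted to G.
    """
    strand = edited_strand.upper()
    if strand[position] != "A":
        raise ValueError("target position does not hold an adenine")
    new_strand = strand[:position] + "C" + strand[position + 1:]
    return new_strand, reverse_complement(new_strand)


# Hypothetical 7-nt example with the target adenine at index 1.
before = "GATTACA"
after, partner = edit_a_to_c(before, 1)
print(before, reverse_complement(before))
print(after, partner)
```

Comparing the two printed pairs shows a single position changed on each strand, which is the A:T-to-C:G transversion the example sets out to install.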
[0401] Human AlkBH3 was purified and isolated. The AlkBH3 was tethered to an nCas9 using an XTEN linker (SEQ ID NO: 11). The fusion protein was introduced to E. coli cells.
[0402] The AlkBH3 protein was sequenced by LC-MS/MS. The AlkBH3 gene was cloned and the activity of the encoded protein confirmed.
Example 2. Evolving the AlkBH3 Base Editor
[0403] Guided by the crystallographic structures, mutations were targeted to residues in the active site and/or at the DNA binding interface with the goal of affecting the relative orientation of a target adenine and the non-heme Fe(IV) center. In this manner, variants of AlkBH3 were evolved using PACE systems to form a large library of AlkBH3 mutants. Mutants were cloned into a vector coding for an N-terminal fusion with a dCas9. Mutants were subjected to selection based on ability to convert adenine into 8-oxoadenine in DNA using a spectinomycin antibiotic resistance assay.
[0404] The E. coli selection strain was transformed with a) an accessory plasmid containing an AlkBH3 mutant-dCas9 fusion and targeting guide RNAs, and b) a selection plasmid containing an inactivated spectinomycin resistance gene with a mutation at the active site that requires A:T-to-C:G editing to correct (FIG. 4). Cells harboring AlkBH3 mutants that restored antibiotic resistance were isolated and subjected to further rounds of mutation and selection under varying selection stringencies.
[0405] E. coli do not have a nick-directed mismatch repair pathway, thus the Cas9 nickase function in mammalian base editing constructs was excluded from this mutant selection design. 8-oxoA lesions are removed by a mismatch-specific uracil-DNA glycosylase (MUG) in E. coli. Accordingly, to avoid clearance of targeted 8-oxoA lesions before the full base edit could take hold, MUG knockout E. coli cells were also employed during screening.
[0406] Those AlkBH3 variants that conferred a survival advantage to E. coli cells containing the edited selection gene of >100-fold are expressed within a fusion construct comprising a Cas9 nickase, wherein the nickase is tethered to the oxidase by a linker (e.g., an XTEN linker). The resulting fusion protein is tested for base editing activity in human and murine cells.
[0407] Because 8-oxoadenine excision by the cell’s native repair machinery limits editing efficiency, the oxidized adenine can be protected from base excision repair by fusing the candidate A-to-C base editor (ACBE) to a known catalytically inactivated iBER (e.g., TDG) that retains its ability to tightly bind 8-oxoadenine-containing DNA (FIG. 5). See, e.g., Norman, D. P., Chung, S. J. & Verdine, G. L., Structural and biochemical exploration of a critical amino acid in human 8-oxo-guanine glycosylase, Biochemistry 42, 1564-1572 (2003) and Banerjee, A., Santos, W. L. & Verdine, G. L., Structure of a DNA glycosylase searching for lesions, Science 311, 1153-1157 (2006), each of which is incorporated by reference herein.
[0408] Candidate ACBEs are characterized in human (HEK293T) and murine cell lines across >30 endogenous genomic loci to assess editing efficiency, product purity, the size of the editing window, and sequence context preferences (FIG. 4). Directed evolution is continued until the resulting ACBEs perform at a level useful to the genome editing community (e.g., >20% editing, >50% product purity, <5% indels, and an editing window of 2-8 nucleotides). Similar to studies reported with previous base editors, off-target analysis is performed for candidate ACBEs at Cas9 nuclease off-targets identified by GUIDE-seq using the same sgRNAs. See Tsai, S. Q. et al., GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nature Biotechnology 33, 187-197 (2015), which is incorporated herein in its entirety.
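The example performance levels named above (>20% editing, >50% product purity, <5% indels) could be checked per locus from sequencing read counts along the following lines. This is a hedged sketch, not the disclosed analysis: the read categories, the definition of product purity as the A-to-C fraction of all base-edited reads, and every name in the code are assumptions, and the read counts are made-up inputs rather than results.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LocusCounts:
    total_reads: int       # all aligned reads covering the target adenine
    a_to_c_reads: int      # indel-free reads carrying the desired A-to-C edit
    other_edit_reads: int  # indel-free reads with the target A changed to G or T
    indel_reads: int       # reads containing an insertion or deletion


def summarize(counts: LocusCounts) -> Dict[str, float]:
    """Compute editing, product purity, and indel percentages for one locus."""
    if counts.total_reads <= 0:
        raise ValueError("total_reads must be positive")
    edited = counts.a_to_c_reads + counts.other_edit_reads
    return {
        "editing_pct": 100.0 * counts.a_to_c_reads / counts.total_reads,
        # Assumed definition: desired product as a share of all base-edited reads.
        "purity_pct": 100.0 * counts.a_to_c_reads / edited if edited else 0.0,
        "indel_pct": 100.0 * counts.indel_reads / counts.total_reads,
    }


def meets_benchmarks(summary: Dict[str, float], min_editing: float = 20.0,
                     min_purity: float = 50.0, max_indel: float = 5.0) -> bool:
    """Apply the example thresholds from the text as strict inequalities."""
    return (summary["editing_pct"] > min_editing
            and summary["purity_pct"] > min_purity
            and summary["indel_pct"] < max_indel)


# Made-up read counts used only to exercise the functions.
good = summarize(LocusCounts(10000, 2500, 500, 300))
weak = summarize(LocusCounts(10000, 1500, 500, 300))
print(good, meets_benchmarks(good))
print(weak, meets_benchmarks(weak))
```

The editing-window criterion is left out because it depends on how edits are distributed across protospacer positions, which these per-locus totals do not capture.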
[0409] If AlkBH3 ultimately proves unsuccessful, selections and evolutions are performed using other candidate 8-oxoadenine-generating enzymes that are known to oxidize DNA and/or RNA. These enzymes include, but are not limited to, human TET1, TET2 or TET3, human xanthine dehydrogenase, human AOX1, and human CYP1A2, CYP2A6, or CYP3A4.
EQUIVALENTS AND SCOPE
[0410] In the claims, articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.
[0411] Furthermore, the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims is introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or embodiments of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or embodiments of the invention consist, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein. It is also noted that the terms “comprising” and “containing” are intended to be open and permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or sub-range within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.
[0412] This application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If there is a conflict between any of the incorporated references and the instant specification, the specification shall control. In addition, any particular embodiment of the present invention that falls within the prior art may be explicitly excluded from any one or more of the claims. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the invention can be excluded from any claim, for any reason, whether or not related to the existence of prior art.
[0413] Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation many equivalents to the specific embodiments described herein. The scope of the present embodiments described herein is not intended to be limited to the above Description, but rather is as set forth in the appended claims. Those of ordinary skill in the art will appreciate that various changes and modifications to this description may be made without departing from the spirit or scope of the present invention, as defined in the following claims.

Claims

What is claimed is:
1. A fusion protein comprising: (i) a nucleic acid programmable DNA binding protein (napDNAbp), and (ii) an adenine oxidase.
2. The fusion protein of claim 1, wherein the adenine oxidase oxidizes an 8 position of adenine.
3. The fusion protein of claim 1 or 2, wherein the adenine oxidase oxidizes adenine to 8-oxoadenine (8-oxoA).
4. The fusion protein of any one of claims 1-3, wherein the adenine oxidase oxidizes adenine to 8-oxoadenine (8-oxoA) in deoxyribonucleic acid (DNA).
5. The fusion protein of any one of claims 1-4, wherein the adenine oxidase is a wild-type adenine oxidase, or a variant thereof.
6. The fusion protein of any one of claims 1-5, wherein the adenine oxidase is an ABH oxidase, or a variant thereof.
7. The fusion protein of any one of claims 1-5, wherein the adenine oxidase is selected from the group consisting of an ABH oxidase, a TET oxidase, an FTO, a molybdopterin-dependent xanthine hydrogenase, a molybdopterin-dependent aldehyde oxidase, or a cytochrome P450, a heme iron oxidase, and a flavin monooxygenase, or a variant thereof.
8. The fusion protein of claim 6 or 7, wherein the ABH oxidase is an AlkBH2 (ABH2) or an AlkBH3 (ABH3) oxidase, or a variant thereof.
9. The fusion protein of claim 6 or 8, wherein the ABH oxidase is a Homo sapiens ABH oxidase, or a variant thereof.
10. The fusion protein of claim 6 or 8, wherein the ABH oxidase is an AlkB oxidase, or a variant thereof.
11. The fusion protein of claim 10, wherein the AlkB oxidase is an Escherichia coli AlkB oxidase, or a variant thereof.
12. The fusion protein of any one of claims 1-5, wherein the adenine oxidase is a TET oxidase, or a variant thereof.
13. The fusion protein of claim 12, wherein the TET oxidase is selected from the group consisting of TET1, TET1-CD, TET2, TET3, or a variant thereof.
14. The fusion protein of claim 12 or 13, wherein the TET oxidase is a Homo sapiens TET oxidase, or a variant thereof.
15. The fusion protein of any one of claims 1-5, wherein the adenine oxidase is a cytochrome P450 or a variant thereof.
16. The fusion protein of claim 15, wherein the cytochrome P450 is selected from the group consisting of CYP1A2, CYP2A6, CYP3A4, and variants thereof.
17. The fusion protein of any one of claims 1-5, wherein the adenine oxidase is a molybdopterin-dependent aldehyde oxidase, or a variant thereof.
18. The fusion protein of claim 17, wherein the molybdopterin-dependent aldehyde oxidase is a Homo sapiens AOX1, or a variant thereof.
19. The fusion protein of any one of claims 1-5, wherein the adenine oxidase is a flavin monooxygenase, or a variant thereof.
20. The fusion protein of any one of claims 1-5, wherein the adenine oxidase is a xanthine oxidase or a variant thereof.
21. The fusion protein of claim 20, wherein the xanthine oxidase is selected from the group consisting of Streptomyces cyanogenus xanthine dehydrogenase (XDH), C. capitate XDH, N. crassa XDH, M. hansupus XDH, E. cloacae XDH, S. snoursei XDH, S. albulus XDH, S. himastatinicus XDH, or S. lividans XDH, or any variant thereof.
22. The fusion protein of claim 20, wherein the xanthine oxidase is Streptomyces cyanogenus xanthine dehydrogenase.
23. The fusion protein of any one of claims 1-5, wherein the adenine oxidase is an FTO oxidase.
24. The fusion protein of any one of claims 1-23, wherein the adenine oxidase comprises an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to the amino acid sequence of any one of SEQ ID NOs: 5-8, 10, 15-20, 22-31, and 35-41.
25. The fusion protein of any one of claims 1-24, wherein the adenine oxidase comprises any one of the amino acid sequences of SEQ ID NOs: 5-8, 10, 15-20, 22-31, and 35-41.
26. The fusion protein of any one of claims 5-25, wherein the variant of the wild-type adenine oxidase is produced by evolving an oxidase.
27. The fusion protein of claim 26, wherein the variant of the wild-type adenine oxidase is produced by evolving an ABH oxidase, a TET oxidase, a cytochrome P450, a molybdopterin-dependent aldehyde oxidase, a flavin monooxygenase, or a xanthine oxidase.
28. The fusion protein of claim 26 or 27, wherein the evolving includes phage assisted continuous evolution (PACE).
29. The fusion protein of claim 26 or 27, wherein the evolving includes phage assisted non-continuous evolution (PANCE).
30. The fusion protein of any one of claims 1-29, further comprising an inhibitor of base excision repair (iBER).
31. The fusion protein of claim 30, wherein the inhibitor of base excision repair (iBER) is a DNA glycosylase inhibitor.
32. The fusion protein of claim 30 or 31, wherein the inhibitor of base excision repair (iBER) is a thymine-DNA glycosylase inhibitor, a uracil-DNA glycosylase inhibitor or an 8-oxo-guanine glycosylase inhibitor.
33. The fusion protein of any one of claims 30-32, wherein the inhibitor of base excision repair (iBER) is a catalytically inactive thymine-DNA glycosylase.
34. The fusion protein of any one of claims 30-33, wherein the inhibitor of base excision repair (iBER) is a protein that binds, but does not cleave, 8-oxoadenine (8-oxoA).
35. The fusion protein of any one of claims 30-34, wherein the inhibitor of base excision repair (iBER) comprises an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to the amino acid sequence of any one of SEQ ID NOs: 43, 44, 46, and 47.
36. The fusion protein of any one of claims 30-35, wherein the inhibitor of base excision repair (iBER) comprises an amino acid sequence of SEQ ID NOs: 43, 44, 46, and 47.
37. The fusion protein of any one of claims 1-36, wherein the nucleic acid programmable DNA binding protein (napDNAbp) is a Cas9, a CasX, a CasY, a C2c1, a C2c2, a C2c3, a GeoCas9, a CjCas9, a Cas12a, a Cas12b, a Cas12g, a Cas12h, a Cas12i, a Cas13b, a Cas13c, a Cas13d, a Cas14, a Csn2, an xCas9, an SpCas9-NG, an LbCas12a, an AsCas12a, a Cas9-KKH, a circularly permuted Cas9, an Argonaute (Ago), a SmacCas9, or a Spy-macCas9 domain.
38. The fusion protein of claim 37, wherein the Cas9 domain is a nuclease dead Cas9 (dCas9), a Cas9 nickase (nCas9), or a nuclease active Cas9.
39. The fusion protein of claim 37, wherein the Cas9 domain is a nuclease dead Cas9 (dCas9).
40. The fusion protein of claim 37, wherein the Cas9 domain is a Cas9 nickase (nCas9).
41. The fusion protein of claim 37, wherein the Cas9 domain is a nuclease active Cas9.
42. The fusion protein of any one of claims 1-41, wherein the fusion protein comprises the structure NH2-[napDNAbp]-[adenine oxidase]-COOH, or NH2-[adenine oxidase]-[napDNAbp]-COOH, wherein each instance of “]-[” indicates the presence of an optional linker sequence.
43. The fusion protein of claim 42, wherein the napDNAbp and the adenine oxidase are fused via a linker comprising the amino acid sequence
SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 11); SGGSGGSGGS (SEQ ID NO: 12); GGG; GGGS (SEQ ID NO: 1); SGGGS (SEQ ID NO: 2);
SGSETPGTSESATPES (SEQ ID NO: 48); or SGGS (SEQ ID NO: 14).
44. The fusion protein of any one of claims 1-43, wherein the fusion protein comprises the structure:
NH2-[iBER]-[napDNAbp]-[adenine oxidase]-COOH;
NH2-[napDNAbp]-[iBER]-[adenine oxidase]-COOH;
NH2-[napDNAbp]-[adenine oxidase]-[iBER]-COOH;
NH2-[iBER]-[adenine oxidase]-[napDNAbp]-COOH;
NH2-[adenine oxidase]-[iBER]-[napDNAbp]-COOH; or
NH2-[adenine oxidase]-[napDNAbp]-[iBER]-COOH, wherein each instance of “]-[” indicates the presence of an optional linker sequence.
45. The fusion protein of claim 44, wherein the napDNAbp and the adenine oxidase are fused via a linker comprising the amino acid sequence:
SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 11); SGGSGGSGGS (SEQ ID NO: 12); GGG; GGGS (SEQ ID NO: 1); SGGGS (SEQ ID NO: 2);
SGSETPGTSESATPES (SEQ ID NO: 48); or SGGS (SEQ ID NO: 14).
46. The fusion protein of claim 44 or 45, wherein the napDNAbp and the iBER are fused via a linker comprising the amino acid sequence:
SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 11); SGGSGGSGGS (SEQ ID NO: 12); GGG; GGGS (SEQ ID NO: 1); SGGGS (SEQ ID NO: 2);
SGSETPGTSESATPES (SEQ ID NO: 48); or SGGS (SEQ ID NO: 14).
47. The fusion protein of any one of claims 44-46, wherein the adenine oxidase and the iBER are fused via a linker comprising the amino acid sequence:
SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 11); SGGSGGSGGS (SEQ ID NO: 12); GGG; GGGS (SEQ ID NO: 1); SGGGS (SEQ ID NO: 2);
SGSETPGTSESATPES (SEQ ID NO: 48); or SGGS (SEQ ID NO: 14).
48. A polynucleotide encoding the fusion protein of any one of claims 1-47.
49. A vector comprising the polynucleotide of claim 48.
50. The vector of claim 49, wherein the vector comprises a heterologous promoter driving expression of the polynucleotide.
51. A complex comprising the fusion protein of any one of claims 1-47 and a guide RNA bound to the nucleic acid programmable DNA binding protein (napDNAbp) of the fusion protein.
52. A cell comprising the fusion protein of any one of claims 1-47.
53. A cell comprising the polynucleotide of claim 48.
54. A cell comprising the vector of claim 49 or 50.
55. A cell comprising the complex of claim 51.
56. A pharmaceutical composition comprising the fusion protein of any one of claims 1-47 and a pharmaceutically acceptable excipient.
57. A pharmaceutical composition comprising the polynucleotide of claim 48 and a pharmaceutically acceptable excipient.
58. A pharmaceutical composition comprising the vector of claim 49 or 50 and a pharmaceutically acceptable excipient.
59. A pharmaceutical composition comprising the complex of claim 51 and a pharmaceutically acceptable excipient.
60. A kit comprising a nucleic acid construct, comprising
(i) a nucleic acid sequence encoding the fusion protein of any one of claims 1-47;
(ii) a nucleic acid sequence encoding a gRNA; and
(iii) one or more heterologous promoters that drive the expression of the sequence of (i) and/or the sequence of (ii).
61. The kit of claim 60, further comprising an expression construct encoding a guide RNA backbone, wherein the construct comprises a cloning site positioned to allow the cloning of a nucleic acid sequence identical or complementary to a target sequence into the guide RNA backbone.
62. A kit comprising:
(i) the fusion protein of any one of claims 1-47;
(ii) a gRNA; and
(iii) target cells.
63. A method for editing a nucleobase pair of a double-stranded DNA sequence, the method comprising:
contacting a double-stranded DNA sequence with a complex comprising a fusion protein of any one of claims 1-47 and a guide nucleic acid, wherein the double-stranded DNA comprises a target A:T nucleobase pair, and whereby the adenine (A) of the A:T nucleobase pair is oxidized to 8-oxoadenine (8-oxoA).
64. The method of claim 63, whereby the step of contacting induces separation of the double-stranded DNA at a target region.
65. The method of claim 63 or 64, whereby one strand of the double-stranded DNA is cut, wherein the one strand comprises the T of the target A:T nucleobase pair.
66. The method of any one of claims 63-65, whereby the T of the target A:T nucleobase pair is replaced with a guanine (G).
67. The method of any one of claims 63-66, whereby the 8-oxoadenine is replaced with a cytosine (C), thereby generating an A to C point mutation.
68. The method of any one of claims 63-67, wherein the double-stranded DNA comprises a sequence associated with a disease or disorder.
69. The method of any one of claims 63-68, wherein the double-stranded DNA comprises a point mutation associated with a disease or disorder.
70. The method of claim 69, wherein the activity of the fusion protein or the complex results in a correction of the point mutation.
71. The method of any one of claims 63-70, wherein the method is performed in vitro, in vivo, or ex vivo.
72. The method of any one of claims 63-71, wherein the double-stranded DNA is in a subject.
73. The method of claim 72, wherein the subject is human.
74. A method of treating a subject having or at risk of developing a disease, disorder, or condition, the method comprising:
administering to the subject the fusion protein of any one of claims 1-47, the polynucleotide of claim 48, the vector of claim 49 or 50, the complex of claim 51, or the pharmaceutical composition of any one of claims 56-59.
75. The method of claim 74, wherein the subject has been diagnosed with a disease, disorder, or condition.
76. The method of claim 74 or 75, wherein the subject has a C to A, or a G to T mutation that is associated with a disease, disorder, or condition.
77. The method of claim 76, wherein the A of the C to A mutation is converted to a C.
78. The method of claim 76 or 77, wherein the T of the G to T mutation is converted to a G.
79. The method of claim 74, wherein the disease, disorder, or condition is congenital deafness, spastic paraplegia, nonsyndromic hearing loss, spinal muscular atrophy, or hypohidrotic ectodermal dysplasia.
80. Use of (a) a fusion protein of any one of claims 1-47 and (b) a guide RNA targeting the base editor of (a) to a target A:T nucleobase pair in a double-stranded DNA molecule in DNA editing.
81. The use of claim 80, whereby the DNA editing comprises nicking one strand of the double-stranded DNA, wherein the one strand comprises the T of the target T:A nucleobase pair.
82. Use of a fusion protein of any one of claims 1-47, or a complex of claim 51, as a medicament.
83. The fusion protein of any one of claims 1-5, wherein the fusion protein does not comprise an alkA dehydrogenase or an alkB dehydrogenase.
84. The fusion protein of any one of claims 1-5, wherein the fusion protein does not comprise a TET oxidase or TET dioxygenase.
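As an illustration of the net editing outcome recited in claims 63-67 and 76-78, the following Python sketch models a double-stranded DNA site as a top-strand string and applies an A-to-C change at one position, so that the A:T pair becomes a C:G pair. The names `complement` and `edit_a_to_c` and the example sequence are hypothetical and do not come from the specification. The sketch tracks sequence bookkeeping only; it does not model the 8-oxoadenine intermediate, strand nicking, or cellular repair.

```python
from typing import Tuple

# Watson-Crick pairing used to derive the opposite strand.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement (same index order, not reversed)."""
    return "".join(PAIR[base] for base in strand)


def edit_a_to_c(top: str, position: int) -> Tuple[str, str]:
    """Model the net A:T -> C:G outcome at a 0-based position on the top strand.

    Returns the edited top strand and its complement. Raises ValueError
    if the targeted base is not an adenine.
    """
    if top[position] != "A":
        raise ValueError(
            f"expected A at position {position}, found {top[position]}"
        )
    edited = top[:position] + "C" + top[position + 1:]
    return edited, complement(edited)


if __name__ == "__main__":
    top = "GGTACGT"  # hypothetical target sequence with an A at index 3
    edited_top, edited_bottom = edit_a_to_c(top, 3)
    print(top, complement(top))
    print(edited_top, edited_bottom)
```

Under these assumptions, a disease-associated C to A mutation (claim 76) would be represented by the input string, and the call would represent its reversion to C (claim 77), with the paired base on the other strand changing from T to G (claim 78).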
PCT/US2020/021362 2019-03-06 2020-03-06 A:t to c:g base editors and uses thereof WO2020181180A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962814766P 2019-03-06 2019-03-06
US62/814,766 2019-03-06

Publications (1)

Publication Number Publication Date
WO2020181180A1 (en) 2020-09-10

Family

ID=70166147

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/021362 WO2020181180A1 (en) 2019-03-06 2020-03-06 A:t to c:g base editors and uses thereof

Country Status (1)

Country Link
WO (1) WO2020181180A1 (en)

Patent Citations (81)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4217344A (en) 1976-06-23 1980-08-12 L'oreal Compositions containing aqueous dispersions of lipid spheres
US4235871A (en) 1978-02-24 1980-11-25 Papahadjopoulos Demetrios P Method of encapsulating biologically active materials in lipid vesicles
US4186183A (en) 1978-03-29 1980-01-29 The United States Of America As Represented By The Secretary Of The Army Liposome carriers in chemotherapy of leishmaniasis
US4261975A (en) 1979-09-19 1981-04-14 Merck & Co., Inc. Viral liposome particle
US4485054A (en) 1982-10-04 1984-11-27 Lipoderm Pharmaceuticals Limited Method of encapsulating biologically active materials in multilamellar lipid vesicles (MLV)
US4501728A (en) 1983-01-06 1985-02-26 Technology Unlimited, Inc. Masking of liposomes from RES recognition
US4880635B1 (en) 1984-08-08 1996-07-02 Liposome Company Dehydrated liposomes
US4880635A (en) 1984-08-08 1989-11-14 The Liposome Company, Inc. Dehydrated liposomes
US4897355A (en) 1985-01-07 1990-01-30 Syntex (U.S.A.) Inc. N[ω,(ω-1)-dialkyloxy]- and N-[ω,(ω-1)-dialkenyloxy]-alk-1-yl-N,N,N-tetrasubstituted ammonium lipids and uses therefor
US5049386A (en) 1985-01-07 1991-09-17 Syntex (U.S.A.) Inc. N-ω,(ω-1)-dialkyloxy)- and N-(ω,(ω-1)-dialkenyloxy)Alk-1-YL-N,N,N-tetrasubstituted ammonium lipids and uses therefor
US4946787A (en) 1985-01-07 1990-08-07 Syntex (U.S.A.) Inc. N-(ω,(ω-1)-dialkyloxy)- and N-(ω,(ω-1)-dialkenyloxy)-alk-1-yl-N,N,N-tetrasubstituted ammonium lipids and uses therefor
US4797368A (en) 1985-03-15 1989-01-10 The United States Of America As Represented By The Department Of Health And Human Services Adeno-associated virus as eukaryotic expression vector
US4921757A (en) 1985-04-26 1990-05-01 Massachusetts Institute Of Technology System for delayed and pulsed release of biologically active substances
US4774085A (en) 1985-07-09 1988-09-27 501 Board of Regents, Univ. of Texas Pharmaceutical administration systems containing a mixture of immunomodulators
EP0264166A1 (en) 1986-04-09 1988-04-20 Genzyme Corporation Transgenic animals secreting desired proteins into milk
US4837028A (en) 1986-12-24 1989-06-06 Liposome Technology, Inc. Liposomes with enhanced circulation time
US4920016A (en) 1986-12-24 1990-04-24 Linear Technology, Inc. Liposomes with enhanced circulation time
US4906477A (en) 1987-02-09 1990-03-06 Kabushiki Kaisha Vitamin Kenkyusyo Antineoplastic agent-entrapping liposomes
US4911928A (en) 1987-03-13 1990-03-27 Micro-Pak, Inc. Paucilamellar lipid vesicles
US4873316A (en) 1987-06-23 1989-10-10 Biogen, Inc. Isolation of exogenous recombinant proteins from the milk of transgenic mammals
US4917951A (en) 1987-07-28 1990-04-17 Micro-Pak, Inc. Lipid vesicles formed of surfactants and steroids
WO1991016024A1 (en) 1990-04-19 1991-10-31 Vical, Inc. Cationic lipids for intracellular delivery of biologically active molecules
WO1991017424A1 (en) 1990-05-03 1991-11-14 Vical, Inc. Intracellular delivery of biologically active substances by means of self-assembling lipid complexes
US5173414A (en) 1990-10-30 1992-12-22 Applied Immune Sciences, Inc. Production of recombinant adeno-associated virus vectors
WO1993024641A2 (en) 1992-06-02 1993-12-09 The United States Of America, As Represented By The Secretary, Department Of Health & Human Services Adeno-associated virus with inverted terminal repeat sequences as promoter
US6607882B1 (en) 1999-01-12 2003-08-19 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
US6453242B1 (en) 1999-01-12 2002-09-17 Sangamo Biosciences, Inc. Selection of sites for targeting by zinc finger proteins and methods of designing zinc finger proteins to bind to preselected sites
US6534261B1 (en) 1999-01-12 2003-03-18 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
US20030087817A1 (en) 1999-01-12 2003-05-08 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
US7013219B2 (en) 1999-01-12 2006-03-14 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
US7163824B2 (en) 1999-01-12 2007-01-16 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
US6824978B1 (en) 1999-01-12 2004-11-30 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
US6933113B2 (en) 1999-01-12 2005-08-23 Sangamo Biosciences, Inc. Modulation of endogenous gene expression in cells
US6979539B2 (en) 1999-01-12 2005-12-27 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
US6599692B1 (en) 1999-09-14 2003-07-29 Sangamo Biosciences, Inc. Functional genomics using zinc finger proteins
WO2001038547A2 (en) 1999-11-24 2001-05-31 Mcs Micro Carrier Systems Gmbh Polypeptides comprising multimers of nuclear localization signals or of protein transduction domains and their use for transferring molecules into cells
US6503717B2 (en) 1999-12-06 2003-01-07 Sangamo Biosciences, Inc. Methods of using randomized libraries of zinc finger proteins for the identification of gene function
US6689558B2 (en) 2000-02-08 2004-02-10 Sangamo Biosciences, Inc. Cells for drug discovery
US20070015238A1 (en) 2002-06-05 2007-01-18 Snyder Richard O Production of pseudotyped recombinant AAV virions
US20120322861A1 (en) 2007-02-23 2012-12-20 Barry John Byrne Compositions and Methods for Treating Diseases
WO2010028347A2 (en) 2008-09-05 2010-03-11 President & Fellows Of Harvard College Continuous directed evolution of proteins and nucleic acids
US9771574B2 (en) 2008-09-05 2017-09-26 President And Fellows Of Harvard College Apparatus for continuous directed evolution of proteins and nucleic acids
US9023594B2 (en) 2008-09-05 2015-05-05 President And Fellows Of Harvard College Continuous directed evolution of proteins and nucleic acids
US20110059502A1 (en) 2009-09-07 2011-03-10 Chalasani Sreekanth H Multiple domain proteins
WO2011053982A2 (en) 2009-11-02 2011-05-05 University Of Washington Therapeutic nuclease compositions and methods
US9405700B2 (en) 2010-11-04 2016-08-02 Sonics, Inc. Methods and apparatus for virtualization in an integrated circuit
US9394537B2 (en) 2010-12-22 2016-07-19 President And Fellows Of Harvard College Continuous directed evolution
WO2012088381A2 (en) 2010-12-22 2012-06-28 President And Fellows Of Harvard College Continuous directed evolution
US20130345064A1 (en) 2010-12-22 2013-12-26 President And Fellows Of Harvard College Continuous directed evolution
US8871445B2 (en) 2012-12-12 2014-10-28 The Broad Institute Inc. CRISPR-Cas component systems, methods and compositions for sequence manipulation
US9737604B2 (en) 2013-09-06 2017-08-22 President And Fellows Of Harvard College Use of cationic lipids to deliver CAS9
US9340799B2 (en) 2013-09-06 2016-05-17 President And Fellows Of Harvard College MRNA-sensing switchable gRNAs
WO2015035136A2 (en) 2013-09-06 2015-03-12 President And Fellows Of Harvard College Delivery system for functional nucleases
US20150166980A1 (en) 2013-12-12 2015-06-18 President And Fellows Of Harvard College Fusions of cas9 domains and nucleic acid-editing domains
US20150166981A1 (en) 2013-12-12 2015-06-18 President And Fellows Of Harvard College Methods for nucleic acid editing
US9840699B2 (en) 2013-12-12 2017-12-12 President And Fellows Of Harvard College Methods for nucleic acid editing
WO2015134121A2 (en) 2014-01-20 2015-09-11 President And Fellows Of Harvard College Negative selection and stringency modulation in continuous evolution systems
US10179911B2 (en) 2014-01-20 2019-01-15 President And Fellows Of Harvard College Negative selection and stringency modulation in continuous evolution systems
US20160348096A1 (en) 2014-01-20 2016-12-01 President And Fellows Of Harvard College Negative selection and stringency modulation in continuous evolution systems
US10077453B2 (en) 2014-07-30 2018-09-18 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US20170233708A1 (en) 2014-10-22 2017-08-17 President And Fellows Of Harvard College Evolution of proteases
WO2016168631A1 (en) 2015-04-17 2016-10-20 President And Fellows Of Harvard College Vector-based mutagenesis system
US20180087046A1 (en) 2015-04-17 2018-03-29 President And Fellows Of Harvard College Vector-based mutagenesis system
WO2016205764A1 (en) 2015-06-18 2016-12-22 The Broad Institute Inc. Novel crispr enzymes and systems
US20170044520A1 (en) 2015-07-22 2017-02-16 President And Fellows Of Harvard College Evolution of site-specific recombinases
US10167457B2 (en) 2015-10-23 2019-01-01 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US20170121693A1 (en) 2015-10-23 2017-05-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
WO2017070633A2 (en) 2015-10-23 2017-04-27 President And Fellows Of Harvard College Evolved cas9 proteins for gene editing
WO2017070632A2 (en) 2015-10-23 2017-04-27 President And Fellows Of Harvard College Nucleobase editors and uses thereof
WO2017208247A1 (en) * 2016-06-02 2017-12-07 Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. Assay for the removal of methyl-cytosine residues from dna
US10113163B2 (en) 2016-08-03 2018-10-30 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
WO2018027078A1 (en) 2016-08-03 2018-02-08 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US20180073012A1 (en) 2016-08-03 2018-03-15 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
WO2018071868A1 (en) 2016-10-14 2018-04-19 President And Fellows Of Harvard College Aav delivery of nucleobase editors
US20180127780A1 (en) 2016-10-14 2018-05-10 President And Fellows Of Harvard College Aav delivery of nucleobase editors
WO2018152197A1 (en) * 2017-02-15 2018-08-23 Massachusetts Institute Of Technology Dna writers, molecular recorders and uses thereof
WO2018176009A1 (en) 2017-03-23 2018-09-27 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable dna binding proteins
WO2019023680A1 (en) 2017-07-28 2019-01-31 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (pace)
WO2019079347A1 (en) 2017-10-16 2019-04-25 The Broad Institute, Inc. Uses of adenosine base editors
WO2019226593A1 (en) 2018-05-24 2019-11-28 Aqua-Aerobic Systems, Inc. System and method of solids conditioning in a filtration system
WO2019241649A1 (en) 2018-06-14 2019-12-19 President And Fellows Of Harvard College Evolution of cytidine deaminases

Non-Patent Citations (129)

* Cited by examiner, † Cited by third party
Title
A. R. GRUBER ET AL., CELL, vol. 106, no. 1, 2008, pages 23 - 24
ABUDAYYEH ET AL.: "C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector", SCIENCE, vol. 353, no. 6299, 5 August 2016 (2016-08-05), XP055407082, DOI: 10.1126/science.aaf5573
AHMAD ET AL., CANCER RES., vol. 52, 1992, pages 4817 - 4820
AMANN ET AL., GENE, vol. 69, 1988, pages 301 - 315
ANDERSON, SCIENCE, vol. 256, 1992, pages 808 - 813
AURICCHIO ET AL., HUM. MOLEC. GENET., vol. 10, 2001, pages 3075 - 3081
AUTIERI AND AGRAWAL, J. BIOL. CHEM., vol. 273, 1998, pages 14731 - 37
BADRAN, A.H., LIU, D.R.: "In vivo continuous directed evolution", CURR. OPIN. CHEM. BIOL., vol. 24, 2015, pages 1 - 10, XP055350566, DOI: 10.1016/j.cbpa.2014.09.040
BANERJEE, A., SANTOS, W. L., VERDINE, G. L.: "Structure of a DNA glycosylase searching for lesions", SCIENCE, vol. 311, 2006, pages 1153 - 1157
BLAESE ET AL., CANCER GENE THER., vol. 2, 1995, pages 291 - 297
BRINER AE ET AL.: "Guide RNA functional modules direct Cas9 activity and orthogonality", MOL CELL, vol. 56, 2014, pages 333 - 339, XP055376599, DOI: 10.1016/j.molcel.2014.09.019
BRUTLAG ET AL., COMP. APP. BIOSCI., vol. 6, 1990, pages 237 - 245
BUCHSCHER ET AL., J. VIROL., vol. 66, 1992, pages 1635 - 1640
BURSTEIN ET AL.: "New CRISPR-Cas systems from uncultivated microbes", CELL RES., 21 February 2017 (2017-02-21)
BYRNE AND RUDDLE, PROC. NATL. ACAD. SCI. USA, vol. 86, 1989, pages 5473 - 5477
CALAME AND EATON, ADV. IMMUNOL., vol. 43, 1988, pages 235 - 275
CAMPER AND TILGHMAN, GENES DEV., vol. 3, 1989, pages 537 - 546
CAVUZIC, V., LIU, Y.: "Biosynthesis of Sulfur-Containing tRNA Modifications: A Comparison of Bacterial, Archaeal, and Eukaryotic Pathways", BIOMOLECULES, vol. 7, 2017, pages 27
CHANG, W.-C. ET AL.: "Mechanistic Investigation of a Non-Heme Iron Enzyme Catalyzed Epoxidation in (-)-4'-Methoxycyclopenin Biosynthesis", J. AM. CHEM. SOC., vol. 138, no. 33, 2016, pages 10390 - 10393
CHO SW ET AL.: "Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease", NATURE BIOTECHNOLOGY, vol. 31, 2013, pages 230 - 232
CHUAI, G. ET AL.: "DeepCRISPR: optimized CRISPR guide RNA design by deep learning", GENOME BIOL., vol. 19, no. 80, 2018
CHYLINSKI, RHUN, CHARPENTIER: "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems", RNA BIOLOGY, vol. 10, no. 5, 2013, pages 726 - 737, XP055116068, DOI: 10.4161/rna.24321
COFFIN ET AL.: "Retroviruses", 1997, CSHL PRESS
CONG L ET AL.: "Multiplex genome engineering using CRISPR/Cas systems", SCIENCE, vol. 339, 2013, pages 819 - 823
CONG, L. ET AL.: "Multiplex genome engineering using CRISPR/Cas systems", SCIENCE, vol. 339, 2013, pages 819 - 823, XP055458249, DOI: 10.1126/science.1231143
COON, M. J.: "Cytochrome P450: nature's most versatile biological catalyst", ANNU. REV. PHARMACOL. TOXICOL., vol. 45, 2005, pages 1 - 25, XP002545171, DOI: 10.1146/ANNUREV.PHARMTOX.45.120403.100030
CRYSTAL, SCIENCE, vol. 270, 1995, pages 404 - 410
DELTCHEVA E., CHYLINSKI K., SHARMA C.M., GONZALES K., CHAO Y., PIRZADA Z.A., ECKERT M.R., VOGEL J., CHARPENTIER E.: "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III", NATURE, vol. 471, 2011, pages 602 - 607, XP055619637, DOI: 10.1038/nature09886
DICARLO, J.E. ET AL.: "Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems", NUCLEIC ACID RES., 2013
DICKINSON, B.C., PACKER, M.S., BADRAN, A.H., LIU, D.R.: "A system for the continuous directed evolution of proteases rapidly reveals drug-resistance mutations", NAT. COMMUN., vol. 5, 2014, pages 5352
DUAN ET AL., J. VIROL., vol. 75, 2001, pages 7662 - 7671
EAST-SELETSKY ET AL.: "Two distinct RNase activities of CRISPR-C2c2 enable guide-RNA processing and RNA detection", NATURE, vol. 538, no. 7624, 13 October 2016 (2016-10-13), pages 270 - 273, XP055407060, DOI: 10.1038/nature19802
EDLUND ET AL., SCIENCE, vol. 230, 1985, pages 912 - 916
ELIZABETH KUTTER, ALEXANDER SULAKVELIDZE: "Bacteriophages: Biology and Applications", December 2004, CRC PRESS
ESWARAMOORTHY, S. ET AL.: "Mechanism of action of a flavin-containing monooxygenase", PROC. NATL. ACAD. SCI., vol. 103, no. 26, 2006, pages 9832 - 9837
FALNES, P. O., ROGNES, T.: "DNA repair by bacterial AlkB proteins", RES. MICROBIOL., vol. 154, no. 8, 2003, pages 531 - 538
FERRETTI, MCSHAN W.M., AJDIC D.J., SAVIC D.J., SAVIC G., LYON K., PRIMEAUX C., SEZATE S., SUVOROV A.N., KENTON S.: "Complete genome sequence of an M1 strain of Streptococcus pyogenes", PROC. NATL. ACAD. SCI. U.S.A., vol. 98, 2001, pages 4658 - 4663
FORTINI, P. ET AL.: "8-Oxoguanine DNA damage: at the crossroad of alternative repair pathways", MUTAT. RES., vol. 531, no. 1-2, 2003, pages 127 - 39, XP001182325, DOI: 10.1016/j.mrfmmm.2003.07.004
GAO ET AL., GENE THERAPY, vol. 2, 1995, pages 710 - 722
GAO ET AL., NAT BIOTECHNOL., vol. 34, no. 7, 2016, pages 768 - 73
GAO ET AL., NAT BIOTECHNOL., vol. 34, no. 7, July 2016 (2016-07-01), pages 768 - 73
GAO ET AL.: "DNA-guided genome editing using the Natronobacterium gregoryi Argonaute", NATURE BIOTECHNOLOGY, vol. 34, no. 7, 2016, pages 768 - 73, XP055518128, DOI: 10.1038/nbt.3547
GAUDELLI, N. M. ET AL.: "Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage", NATURE, vol. 551, 2017, pages 464 - 471
GAUDELLI, N.M. ET AL.: "Programmable base editing of A:T to G:C in genomic DNA without DNA cleavage", NATURE, vol. 551, 2017, pages 464 - 471
HALBERT ET AL., J. VIROL., vol. 74, 2000, pages 1524 - 1532
HERMONAT AND MUZYCZKA, PNAS, vol. 81, 1984, pages 6466 - 6470
HUANG, T.P. ET AL.: "Circularly permuted and PAM-modified Cas9 variants broaden the targeting scope of base editors", NAT. BIOTECHNOL., vol. 37, 2019, pages 626 - 631, XP036900674, DOI: 10.1038/s41587-019-0134-y
HUBBARD, B.P. ET AL.: "Continuous directed evolution of DNA-binding proteins to improve TALEN specificity", NAT. METHODS, vol. 12, 2015, pages 939 - 942, XP055548970, DOI: 10.1038/nmeth.3515
HWANG, W.Y. ET AL.: "Efficient genome editing in zebrafish using a CRISPR-Cas system", NATURE BIOTECHNOLOGY, vol. 31, 2013, pages 227 - 229, XP055086625, DOI: 10.1038/nbt.2501
ITO, S. ET AL.: "Human NAT10 Is an ATP-dependent RNA Acetyltransferase Responsible for N4-Acetylcytidine Formation in 18 S Ribosomal RNA (rRNA)", J. BIOL. CHEM., vol. 289, 2014, pages 35724 - 35730
ITO, S. ET AL.: "Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine", SCIENCE, vol. 333, no. 6047, 2011, pages 1300 - 1303, XP055101432, DOI: 10.1126/science.1210597
JAKIMO ET AL.: "A Cas9 with Complete PAM Recognition for Adenine Dinucleotides", BIORXIV, September 2018 (2018-09-01)
JIANG, W. ET AL.: "RNA-guided editing of bacterial genomes using CRISPR-Cas systems", NATURE BIOTECHNOLOGY, vol. 31, 2013, pages 233 - 239, XP055249123, DOI: 10.1038/nbt.2508
JINEK M., CHYLINSKI K., FONFARA I., HAUER M., DOUDNA J.A., CHARPENTIER E.: "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity", SCIENCE, vol. 337, 2012, pages 816 - 821, XP055549487, DOI: 10.1126/science.1225829
JINEK, M. ET AL.: "RNA-programmed genome editing in human cells", ELIFE, vol. 2, 2013, pages e00471, XP002699851, DOI: 10.7554/eLife.00471
KAMIYA, H. ET AL.: "8-Hydroxyadenine (7,8-dihydro-8-oxoadenine) induces misincorporation in in vitro DNA synthesis and mutations in NIH 3T3 cells", NUCLEIC ACIDS RES., vol. 23, no. 15, 1995, pages 2893 - 2895
KAUFMAN ET AL., EMBO J., vol. 6, 1987, pages 187 - 195
KAYA ET AL.: "A bacterial Argonaute with noncanonical guide RNA specificity", PROC NATL ACAD SCI U S A., vol. 113, no. 15, 12 April 2016 (2016-04-12), pages 4057 - 62, XP055482683, DOI: 10.1073/pnas.1524385113
KESSEL AND GRUSS, SCIENCE, vol. 249, 1990, pages 374 - 379
KLEINSTIVER, B. P. ET AL.: "Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition", NATURE BIOTECHNOLOGY, vol. 33, 2015, pages 1293 - 1298, XP055309933, DOI: 10.1038/nbt.3404
KLEINSTIVER, B. P. ET AL.: "Engineered CRISPR-Cas9 nucleases with altered PAM specificities", NATURE, vol. 523, 2015, pages 481 - 485, XP055293257, DOI: 10.1038/nature14592
KOMOR, A. C., BADRAN, A. H., LIU, D. R.: "CRISPR-Based Technologies for the Manipulation of Eukaryotic Genomes", CELL, vol. 168, 2017, pages 20 - 36, XP002781814, DOI: 10.1016/j.cell.2016.10.044
KOMOR, A.C. ET AL.: "Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity", SCI ADV, vol. 3, 2017, XP055453964, DOI: 10.1126/sciadv.aao4774
KOMOR, A.C. ET AL.: "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage", NATURE, vol. 533, 2016, pages 420 - 424, XP055551781, DOI: 10.1038/nature17946
KOTIN, HUMAN GENE THERAPY, vol. 5, 1994, pages 793 - 801
KREMER AND PERRICAUDET, BRITISH MEDICAL BULLETIN, vol. 51, no. 1, 1995, pages 31 - 44
KURJAN AND HERSKOWITZ, CELL, vol. 30, 1982, pages 933 - 943
LANDRUM, M.J. ET AL.: "ClinVar: public archive of relationships among sequence variation and human phenotype", NUCLEIC ACIDS RES., vol. 42, 2014, pages D980 - 985
LEONARD, G. A. ET AL.: "Conformation of guanine-8-oxoadenine base pairs in the crystal structure of d(CGCGAATT(O8A)GCG)", BIOCHEM., vol. 31, no. 36, 1992, pages 8415 - 8420
LI JF ET AL.: "Multiplex and homologous recombination-mediated genome editing in Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9", NATURE BIOTECHNOLOGY, vol. 31, 2013, pages 688 - 691, XP055129103, DOI: 10.1038/nbt.2654
LIU ET AL., CELL DISCOVERY, vol. 5, 2019, pages 58
LIU ET AL.: "C2c1-sgRNA Complex Structure Reveals RNA-Guided DNA Cleavage Mechanism", MOL. CELL, vol. 65, no. 2, 19 January 2017 (2017-01-19), pages 310 - 322, XP029890333, DOI: 10.1016/j.molcel.2016.11.040
LIU ET AL.: "CasX enzymes comprise a distinct family of RNA-guided genome editors", NATURE, vol. 566, 2019, pages 218 - 223
LUCKLOW AND SUMMERS, VIROLOGY, vol. 170, 1989, pages 6.3.1 - 6.3.6,2.10.3
MAGIN ET AL., VIROLOGY, vol. 274, 2000, pages 11 - 16
MAKAROVA ET AL.: "C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector", SCIENCE, vol. 353, no. 6299, 2016, XP055407082, DOI: 10.1126/science.aaf5573
MAKAROVA K. ET AL.: "Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements", BIOL DIRECT., vol. 4, 25 August 2009 (2009-08-25), pages 29, XP021059840, DOI: 10.1186/1745-6150-4-29
MALI P, ESVELT KM, CHURCH GM: "Cas9 as a versatile tool for engineering biology", NATURE METHODS, vol. 10, 2013, pages 957 - 963, XP002718606, DOI: 10.1038/nmeth.2649
MALI, P. ET AL.: "RNA-guided human genome engineering via Cas9", SCIENCE, vol. 339, 2013, pages 823 - 826, XP055469277, DOI: 10.1126/science.1232033
MARTHA R. J. CLOKIE, ANDREW M. KROPINSKI: "Bacteriophages: Methods and Protocols", vol. 2, December 2008, HUMANA PRESS, article "Isolation, Characterization, and Interactions (Methods in Molecular Biology)"
MILLER ET AL., J. VIROL., vol. 65, 1991, pages 2220 - 2224
MILLER, NATURE, vol. 357, 1992, pages 455 - 460
MITANI AND CASKEY, TIBTECH, vol. 11, 1993, pages 167 - 175
MOEDE ET AL., FEBS LETT., vol. 461, 1999, pages 229 - 34
MOL THER., vol. 20, no. 4, April 2012 (2012-04-01), pages 699 - 708
MUZYCZKA, J. CLIN. INVEST., vol. 94, 1994, pages 1351
NAKAMURA, Y. ET AL.: "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", NUCL. ACIDS RES., vol. 28, 2000, pages 292, XP002941557, DOI: 10.1093/nar/28.1.292
NISHIMASU ET AL.: "Crystal structure of Cas9 in complex with guide RNA and target DNA", CELL, vol. 156, no. 5, pages 935 - 949, XP028667665, DOI: 10.1016/j.cell.2014.02.001
NORMAN, D. P., CHUNG, S. J., VERDINE, G. L.: "Structural and biochemical exploration of a critical amino acid in human 8-oxo-guanine glycosylase", BIOCHEMISTRY, vol. 42, 2003, pages 1564 - 1572
OAKES ET AL.: "CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification", CELL, vol. 176, 10 January 2019 (2019-01-10), pages 254 - 267
OAKES ET AL.: "Protein Engineering of Cas9 for enhanced function", METHODS ENZYMOL, vol. 546, 2014, pages 491 - 511, XP008176614, DOI: 10.1016/B978-0-12-801185-0.00024-6
OHE, T., WATANABE, Y.: "Purification and Properties of Xanthine Dehydrogenase from Streptomyces cyanogenus", J. BIOCHEM., vol. 86, 1979, pages 45 - 53
PA CARR, GM CHURCH, NATURE BIOTECHNOLOGY, vol. 27, no. 12, 2009, pages 1151 - 62
PINKERT ET AL., GENES DEV., vol. 1, 1987, pages 268 - 277
QI ET AL., CELL, vol. 152, no. 5, 2013, pages 1173 - 83
QUEEN AND BALTIMORE, CELL, vol. 33, 1983, pages 741 - 748
RASHIDI, M. R., SOLTANI, S.: "An overview of aldehyde oxidase: an enzyme of emerging importance in novel drug discovery", EXPERT OPIN. DRUG DISCOV., vol. 12, no. 3, 2017, pages 305 - 316
REES, H.A. ET AL.: "Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery", NAT. COMMUN., vol. 8, 2017, pages 15790, XP055597104, DOI: 10.1038/ncomms15790
REES AND LIU: "Base editing: precision chemistry on the genome and transcriptome of living cells", NAT REV GENET., vol. 19, no. 12, 2018, pages 770 - 788, XP036637441, DOI: 10.1038/s41576-018-0068-0
REMY ET AL., BIOCONJUGATE CHEM., vol. 5, 1994, pages 647 - 654
SALADINO, R. ET AL.: "A new and efficient synthesis of 8-hydroxypurine derivatives by dimethyldioxirane oxidation", TET. LETT., vol. 36, 1995, pages 2665 - 2668, XP004028277, DOI: 10.1016/0040-4039(95)00328-A
SAMULSKI ET AL., J. VIROL., vol. 63, 1989, pages 3822 - 3828
SCHULTZ ET AL., GENE, vol. 54, 1987, pages 113 - 123
SEED, NATURE, vol. 329, 1987, pages 840
SHMAKOV ET AL.: "Discovery and Functional Characterization of Diverse Class 2 CRISPR Cas Systems", MOL. CELL, vol. 60, no. 3, 5 November 2015 (2015-11-05), pages 385 - 397, XP055482679, DOI: 10.1016/j.molcel.2015.10.008
SMITH ET AL., MOL. CELL. BIOL., vol. 3, 1983, pages 2156 - 2165
SOMMERFELT ET AL., VIROL., vol. 176, 1990, pages 58 - 59
SUZUKI T. ET AL.: "Crystal structures reveal an elusive functional domain of pyrrolysyl-tRNA synthetase", NAT CHEM BIOL., vol. 13, no. 12, 2017, pages 1261 - 1266
SWARTS ET AL., NATURE, vol. 507, no. 7491, 2014, pages 258 - 61
SWARTS ET AL., NUCLEIC ACIDS RES., vol. 43, no. 10, 2015, pages 5120 - 9
TAN, X., GROLLMAN, A. P., SHIBUTANI, S.: "Comparison of the mutagenic properties of 8-oxo-7,8-dihydro-2'-deoxyadenosine and 8-oxo-7,8-dihydro-2'-deoxyguanosine DNA lesions in mammalian cells", CARCINOGENESIS, vol. 20, no. 12, 1999, pages 2287 - 2292
THURONYI, B.W. ET AL.: "Continuous evolution of base editors with expanded target compatibility and improved activity", NAT. BIOTECHNOL., 2019, pages 1070 - 1079, XP036878165, DOI: 10.1038/s41587-019-0193-0
TINLAND ET AL., PROC. NATL. ACAD. SCI. U.S.A., vol. 89, 1992, pages 7442 - 46
TRATSCHIN ET AL., MOL. CELL. BIOL., vol. 4, 1984, pages 2072 - 2081
TRATSCHIN ET AL., MOL. CELL. BIOL., vol. 5, 1985, pages 3251 - 3260
TSAI, S. Q. ET AL.: "GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases", NATURE BIOTECHNOLOGY, vol. 33, 2015, pages 187 - 197, XP055555627, DOI: 10.1038/nbt.3117
VAN BRUNT, BIOTECHNOLOGY, vol. 6, no. 10, 1988, pages 1149 - 1154
VIDAL AND LEGRAIN: "Yeast n-hybrid review", NUCLEIC ACID RES., vol. 27, 1999, pages 919
VIGNE, RESTORATIVE NEUROLOGY AND NEUROSCIENCE, vol. 8, 1995, pages 35 - 36
WANG, T., BADRAN, A.H., HUANG, T.P., LIU, D.R.: "Continuous directed evolution of proteins with improved soluble expression", NAT. CHEM. BIOL., vol. 14, 2018, pages 972 - 980, XP036592855, DOI: 10.1038/s41589-018-0121-5
WEST ET AL., VIROLOGY, vol. 160, 1987, pages 38 - 47
WINOTO AND BALTIMORE, EMBO J., vol. 8, 1989, pages 729 - 733
YAMANO ET AL.: "Crystal structure of Cpf1 in complex with guide RNA and target DNA", CELL, vol. 165, 2016, pages 949 - 962
YANG ET AL.: "PAM-dependent Target DNA Recognition and Cleavage by C2C1 CRISPR-Cas endonuclease", CELL, vol. 167, no. 7, 15 December 2016 (2016-12-15), pages 1814 - 1828, XP029850724, DOI: 10.1016/j.cell.2016.11.053
YU ET AL., GENE THERAPY, vol. 1, 1994, pages 13 - 26
ZETSCHE ET AL., CELL, vol. 163, 2015, pages 759 - 771
ZHANG Y. P. ET AL., GENE THER., vol. 6, 1999, pages 1438 - 47
ZOLOTUKHIN ET AL.: "Production and purification of serotype 1,2, and 5 recombinant adeno-associated viral vectors", METHODS, vol. 28, 2002, pages 158 - 167, XP002256404, DOI: 10.1016/S1046-2023(02)00220-7
ZUKER AND STIEGLER, NUCLEIC ACIDS RES., vol. 9, 1981, pages 133 - 148

Cited By (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11920181B2 (en) 2013-08-09 2024-03-05 President And Fellows Of Harvard College Nuclease profiling system
US11299755B2 (en) 2013-09-06 2022-04-12 President And Fellows Of Harvard College Switchable CAS9 nucleases and uses thereof
US11053481B2 (en) 2013-12-12 2021-07-06 President And Fellows Of Harvard College Fusions of Cas9 domains and nucleic acid-editing domains
US11124782B2 (en) 2013-12-12 2021-09-21 President And Fellows Of Harvard College Cas variants for gene editing
US11578343B2 (en) 2014-07-30 2023-02-14 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US11702651B2 (en) 2016-08-03 2023-07-18 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US11820969B2 (en) 2016-12-23 2023-11-21 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11932884B2 (en) 2017-08-30 2024-03-19 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11643652B2 (en) 2019-03-19 2023-05-09 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11795452B2 (en) 2019-03-19 2023-10-24 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
WO2021108717A2 (en) 2019-11-26 2021-06-03 The Broad Institute, Inc Systems and methods for evaluating cas9-independent off-target editing of nucleic acids
US11746352B2 (en) 2019-12-30 2023-09-05 Eligo Bioscience Microbiome modulation of a host by delivery of DNA payloads with minimized spread
US11584781B2 (en) 2019-12-30 2023-02-21 Eligo Bioscience Chimeric receptor binding proteins resistant to proteolytic degradation
WO2021158921A2 (en) 2020-02-05 2021-08-12 The Broad Institute, Inc. Adenine base editors and uses thereof
US11376286B2 (en) 2020-04-08 2022-07-05 Eligo Bioscience Modulation of microbiota function by gene therapy of the microbiome to prevent, treat or cure microbiome-associated diseases or disorders
US11534467B2 (en) 2020-04-08 2022-12-27 Eligo Bioscience Modulation of microbiota function by gene therapy of the microbiome to prevent, treat or cure microbiome-associated diseases or disorders
US11617773B2 (en) 2020-04-08 2023-04-04 Eligo Bioscience Elimination of colonic bacterial driving lethal inflammatory cardiomyopathy
US11224621B2 (en) 2020-04-08 2022-01-18 Eligo Bioscience Modulation of microbiota function by gene therapy of the microbiome to prevent, treat or cure microbiome-associated diseases or disorders
US11690880B2 (en) 2020-04-08 2023-07-04 Eligo Bioscience Modulation of microbiota function by gene therapy of the microbiome to prevent, treat or cure microbiome-associated diseases or disorders
WO2021222318A1 (en) 2020-04-28 2021-11-04 The Broad Institute, Inc. Targeted base editing of the ush2a gene
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
WO2021250284A1 (en) 2020-06-12 2021-12-16 Eligo Bioscience Specific decolonization of antibiotic resistant bacteria for prophylactic purposes
EP3922719A1 (en) 2020-06-12 2021-12-15 Eligo Bioscience Specific decolonization of antibiotic resistant bacteria for prophylactic purposes
WO2022003209A1 (en) 2020-07-03 2022-01-06 Eligo Bioscience Method of containment of nucleic acid vectors introduced in a microbiome population
WO2022096590A1 (en) 2020-11-04 2022-05-12 Eligo Bioscience Phage-derived particles for in situ delivery of dna payload into c. acnes population
US11820989B2 (en) 2020-11-04 2023-11-21 Eligo Bioscience Phage-derived particles for in situ delivery of DNA payload into C. acnes population
US11473093B2 (en) 2020-11-04 2022-10-18 Eligo Bioscience Cutibacterium acnes recombinant phages, method of production and uses thereof
US11840695B2 (en) 2020-11-04 2023-12-12 Eligo Bioscience Recombinant C. acnes phages comprising transgenes
WO2022096596A1 (en) 2020-11-04 2022-05-12 Eligo Bioscience Cutibacterium acnes recombinant phages, method of production and uses thereof
WO2022144381A1 (en) 2020-12-30 2022-07-07 Eligo Bioscience Microbiome modulation of a host by delivery of dna payloads with minimized spread
WO2022144382A1 (en) 2020-12-30 2022-07-07 Eligo Bioscience Chimeric receptor binding proteins resistant to proteolytic degradation
US11739304B2 (en) 2021-05-12 2023-08-29 Eligo Bioscience Production of lytic phages
WO2022238552A1 (en) 2021-05-12 2022-11-17 Eligo Bioscience Production bacterial cells and use thereof in production methods
WO2022238555A1 (en) 2021-05-12 2022-11-17 Eligo Bioscience Production of lytic phages
US11697802B2 (en) 2021-05-12 2023-07-11 Eligo Bioscience Production bacterial cells and use thereof in production methods
US11939598B2 (en) 2021-05-12 2024-03-26 Eligo Bioscience Production bacterial cells and use thereof in production methods
US11952595B2 (en) 2021-05-12 2024-04-09 Eligo Bioscience Production of lytic phages
WO2022261509A1 (en) 2021-06-11 2022-12-15 The Broad Institute, Inc. Improved cytosine to guanine base editors
WO2023196802A1 (en) 2022-04-04 2023-10-12 The Broad Institute, Inc. Cas9 variants having non-canonical pam specificities and uses thereof
WO2023212715A1 (en) 2022-04-28 2023-11-02 The Broad Institute, Inc. Aav vectors encoding base editors and uses thereof
WO2024047151A1 (en) 2022-08-31 2024-03-07 Snipr Biome Aps A novel type of crispr/cas system
US11970701B2 (en) 2023-09-28 2024-04-30 Eligo Bioscience Phage-derived particles for in situ delivery of DNA payload into C. acnes population

Similar Documents

Publication Publication Date Title
WO2020181180A1 (en) A:t to c:g base editors and uses thereof
US20220170013A1 (en) T:a to a:t base editing through adenosine methylation
US20230272425A1 (en) Methods and compositions for evolving base editors using phage-assisted continuous evolution (pace)
US20230086199A1 (en) Systems and methods for evaluating cas9-independent off-target editing of nucleic acids
US20220307003A1 (en) Adenine base editors with reduced off-target effects
WO2021030666A1 (en) Base editing by transglycosylation
US20230235309A1 (en) Adenine base editors and uses thereof
WO2020181202A1 (en) A:t to t:a base editing through adenine deamination and oxidation
WO2020181178A1 (en) T:a to a:t base editing through thymine alkylation
US20220282275A1 (en) G-to-t base editors and uses thereof
WO2020181195A1 (en) T:a to a:t base editing through adenine excision
US20220380740A1 (en) Constructs for improved hdr-dependent genomic editing
US11912985B2 (en) Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
US20230357766A1 (en) Prime editing guide rnas, compositions thereof, and methods of using the same
US11702651B2 (en) Adenosine nucleobase editors and uses thereof
US20220204975A1 (en) System for genome editing
US20230123669A1 (en) Base editor predictive algorithm and method of use
US20210198330A1 (en) Base editors and uses thereof
WO2021072328A1 (en) Methods and compositions for prime editing rna
EP4100032A1 (en) Gene editing methods for treating spinal muscular atrophy
WO2022261509A1 (en) Improved cytosine to guanine base editors
WO2023288304A2 (en) Context-specific adenine base editors and uses thereof
WO2023240137A1 (en) Evolved cas14a1 variants, compositions, and methods of making and using same in genome editing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20717012

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20717012

Country of ref document: EP

Kind code of ref document: A1