WO2021032155A1 - Base editing system and method of use thereof - Google Patents

Base editing system and method of use thereof

Info

Publication number
WO2021032155A1
Authority
WO
WIPO (PCT)
Prior art keywords
base editing
fusion protein
deaminase
domain
nucleic acid
Application number
PCT/CN2020/110207
Other languages
English (en)
French (fr)
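Inventor
Caixia Gao (高彩霞)
Chao Li (李超)
Original Assignee
Institute of Genetics and Developmental Biology, Chinese Academy of Sciences (中国科学院遗传与发育生物学研究所)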
Application filed by Institute of Genetics and Developmental Biology, Chinese Academy of Sciences (中国科学院遗传与发育生物学研究所)
Priority to CN202080059623.XA (CN114945670A)
Publication of WO2021032155A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K19/00: Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62: DNA sequences coding for fusion proteins
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79: Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82: Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79: Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85: Vectors or expression systems specially adapted for eukaryotic hosts for animal cells

Definitions

  • The invention belongs to the field of genetic engineering. Specifically, the present invention relates to a base editing system and methods of using it. More specifically, the present invention relates to a base editing system and method capable of generating de novo mutations, such as saturating mutations, in situ in an organism.
  • The base editing system and method are based on a base editing fusion protein comprising a deaminase and a CRISPR effector protein.
  • Clustered regularly interspaced short palindromic repeats and CRISPR-associated (CRISPR/Cas) systems can produce double-strand breaks (DSBs) at endogenous target sites.
  • DSB: double-strand break
  • Item 1 A base editing fusion protein comprising a nucleic acid targeting domain, a cytosine deamination domain, and an adenine deamination domain.
  • Item 2 The base editing fusion protein of Item 1, wherein the nucleic acid targeting domain comprises at least one CRISPR effector protein polypeptide.
  • Item 3 The base editing fusion protein of item 2, wherein the CRISPR effector protein is Cas9 nuclease or a functional variant thereof; preferably, the CRISPR effector protein is nuclease-inactivated Cas9; more preferably, the nuclease-inactivated Cas9 includes the amino acid sequence shown in SEQ ID NO: 2; and most preferably, the nuclease-inactivated Cas9 includes the amino acid sequence shown in SEQ ID NO: 3.
  • Item 4 The base editing fusion protein of any one of items 1 to 3, wherein the cytosine deaminase domain comprises at least one cytosine deaminase polypeptide.
  • Item 5 The base editing fusion protein of item 4, wherein the cytosine deaminase is selected from APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, and functional variants thereof.
  • Item 6 The base editing fusion protein of Item 5, wherein the cytosine deaminase is human APOBEC3A deaminase or a functional variant thereof; for example, the human APOBEC3A deaminase comprises the amino acid sequence shown in SEQ ID NO: 4.
  • Item 7 The base editing fusion protein of any one of items 1 to 6, wherein the adenine deaminase domain comprises at least one DNA-dependent adenine deaminase polypeptide.
  • Item 8 The base editing fusion protein of item 7, wherein the DNA-dependent adenine deaminase is derived from wild-type E. coli tRNA adenine deaminase TadA (ecTadA); for example, the DNA-dependent adenine deaminase includes the amino acid sequence shown in SEQ ID NO: 6.
  • Item 9 The base editing fusion protein of Item 7 or 8, wherein the adenine deaminase domain comprises two DNA-dependent adenine deaminases.
  • Item 10 The base editing fusion protein of item 7, wherein the adenine deaminase domain further comprises wild-type E. coli tRNA adenine deaminase TadA (ecTadA) fused to the DNA-dependent adenine deaminase, wherein the DNA-dependent adenine deaminase is fused to the C-terminus of the wild-type E. coli tRNA adenine deaminase TadA (ecTadA).
  • Item 11 The base editing fusion protein of Item 7 or 8, wherein the adenine deamination domain comprises the amino acid sequence shown in SEQ ID NO: 7 or 8.
  • Item 12 The base editing fusion protein of any one of items 1-11, wherein the nucleic acid targeting domain, the cytosine deamination domain and the adenine deamination domain are fused via a linker; for example, the linker comprises an amino acid sequence selected from SEQ ID NOs: 9-11.
  • Item 13 The base editing fusion protein of any one of items 1-12, wherein the base editing fusion protein comprises, from N-terminus to C-terminus, in the following order: a cytosine deamination domain, an adenine deamination domain, and a nucleic acid targeting domain; or the base editing fusion protein comprises, from N-terminus to C-terminus, in the following order: an adenine deamination domain, a cytosine deamination domain, and a nucleic acid targeting domain.
  • UGI: uracil DNA glycosylase inhibitor
  • The uracil DNA glycosylase inhibitor (UGI) comprises the amino acid sequence shown in SEQ ID NO: 12.
  • Item 15 The base editing fusion protein of any one of items 1-14, wherein the base editing fusion protein further comprises one or more nuclear localization sequences (NLS).
  • NLS: nuclear localization sequence
  • Item 16 The base editing fusion protein of Item 1, wherein the base editing fusion protein comprises the amino acid sequence shown in any one of SEQ ID NOs: 13-19.
  • Item 17 A base editing system for modifying a target nucleic acid region in the genome, which comprises:
  • the at least one guide RNA is directed to at least one target sequence in the target nucleic acid region.
  • Item 18 The base editing system of Item 17, wherein the guide RNA is sgRNA, for example, the sgRNA includes the scaffold sequence shown in SEQ ID NO: 27 or SEQ ID NO: 28.
  • Item 19 The base editing system of Item 17 or 18, wherein the target sequence targeted by the guide RNA contains a PAM sequence at its 3' end, such as 5'-NGG-3' or 5'-NG-3'.
  • Item 20 The base editing system of any one of Items 17-19, wherein the at least one guide RNA is directed to a target sequence located on the sense strand and/or antisense strand in the target nucleic acid region of the cell genome.
  • Item 21 The base editing system of any one of items 17-20, wherein the nucleotide sequence encoding the base editing fusion protein is codon-optimized for the organism whose genome is to be modified.
  • Item 22 The base editing system of Item 21, wherein the base editing fusion protein is encoded by any nucleotide sequence shown in SEQ ID NO: 20-26.
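Items 19 and 20 describe target sequences that carry a PAM (for example 5'-NGG-3' or 5'-NG-3') at their 3' end and that may lie on either strand of the target region. A minimal Python sketch of how such candidate sites could be enumerated follows. The function names, the 20-nt protospacer length, and the example sequence are illustrative assumptions and are not taken from the application.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_targets(region: str, pam: str = "NGG", spacer_len: int = 20):
    """List (strand, start, protospacer, pam) for every site followed by the PAM.

    `start` is 0-based and counted on the strand that was scanned
    ("+" is the given sequence, "-" is its reverse complement).
    Only the degenerate code N is handled in the PAM (an assumption).
    """
    pam_regex = pam.upper().replace("N", "[ACGT]")
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    hits = []
    for strand, seq in (("+", region.upper()), ("-", reverse_complement(region))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


# Hypothetical 23-nt region: 20 A's followed by TGG.
example = "A" * 20 + "TGG"
ngg_sites = find_targets(example, pam="NGG")
ng_sites = find_targets(example, pam="NG")
```

In this toy region the relaxed NG PAM yields more candidate sites than NGG, which is the reason a PAM-relaxed variant is attractive when many guides must be placed in a short region.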
  • Item 23 A method of producing at least one genetically modified cell, comprising introducing the base editing system of any one of items 17-22 into at least one cell, thereby resulting in one or more nucleotide substitutions within a target nucleic acid region in the at least one cell.
  • Item 24 The method of Item 23, further comprising the step of selecting cells having the desired one or more nucleotide substitutions from the at least one cell.
  • Item 25 The method of Item 23 or 24, wherein the base editing system is introduced into the cell by a method selected from the group consisting of calcium phosphate transfection, protoplast fusion, electroporation, liposome transfection, microinjection, viral infection (such as baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), gene gun bombardment, PEG-mediated transformation of protoplasts, and Agrobacterium-mediated transformation.
  • Item 26 The method of any one of items 23-25, wherein the cells are derived from mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks, and geese; or plants, including monocots and dicots, preferably crop plants such as wheat, rice, corn, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato.
  • Item 27 A method for producing a genetically modified plant, comprising introducing the base editing system of any one of items 17-22 into at least one plant, thereby resulting in one or more nucleotide substitutions within a target nucleic acid region in the genome of the at least one plant.
  • Item 28 The method of Item 27, further comprising selecting plants having the desired one or more nucleotide substitutions from the at least one plant.
  • Item 29 The method of Item 27 or 28, wherein the base editing system is introduced into the plant by a method selected from the group consisting of: gene gun bombardment, PEG-mediated transformation of protoplasts, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen tube pathway method, and the ovary injection method.
  • Item 30 The method of Item 28, wherein the introducing is performed in the absence of selective pressure.
  • Item 31 The method of any one of items 27-30, wherein the introducing comprises transforming the base editing system into an isolated plant cell or tissue, and then regenerating the transformed plant cell or tissue into a whole plant; preferably, the regeneration is performed in the absence of selective pressure.
  • Item 32 The method of any one of items 27-30, wherein the introducing comprises transforming the base editing system into leaves, stem tips, pollen tubes, young ears, or hypocotyls on intact plants.
  • Item 33 The method of any one of items 27-30, wherein the expression construct is an in vitro transcribed RNA molecule.
  • Item 34 The method of any one of items 27 to 33, wherein the genetically modified plant does not contain an exogenous polynucleotide integrated into its genome.
  • Item 35 The method of any one of items 27 to 34, wherein the modified target nucleic acid region is related to plant traits such as agronomic traits.
  • Item 36 The method of any one of items 27 to 35, further comprising the step of screening plants for desired traits such as agronomic traits.
  • Item 37 The method of any one of items 27 to 36, further comprising obtaining progeny of the genetically modified plant.
  • Item 38 A method for plant breeding, comprising crossing a genetically modified first plant, obtained by the method of any one of items 27-37 and containing one or more nucleotide substitutions in the target nucleic acid region, with a second plant that does not contain the one or more nucleotide substitutions, thereby introducing the one or more nucleotide substitutions into the second plant.
  • the genetically modified first plant has desired traits such as agronomic traits.
  • Item 39 A method for in-situ saturation mutation of an endogenous target nucleic acid region in a cell or organism to obtain a mutation of interest in the target nucleic acid region, comprising
  • the base editing system of any one of items 17-22 is introduced into the population of said cells or organisms, resulting in one or more mutations in the endogenous target nucleic acid region of the cells or organisms of said population;
  • Item 40 The method of Item 39, wherein the base editing system comprises a plurality of guide RNAs and/or a plurality of expression constructs containing nucleotide sequences encoding the plurality of guide RNAs; preferably, the plurality of guide RNAs target different target sequences within the target nucleic acid region.
  • Item 41 The method of Item 39 or 40, wherein the plurality of guide RNAs comprise 2 to 250 or more, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 75, 100, 150, 200, 250, 300 or more guide RNAs.
  • Item 42 The method of any one of items 39-41, wherein the target sequences and/or complementary sequences targeted by at least some of the plurality of guide RNAs partially overlap each other and/or are adjacent to each other.
  • Item 43 The method of any one of items 39-42, wherein the target nucleic acid region has a length of about 20bp to about 10000bp or longer, for example about 20bp, about 40bp, about 60bp, about 80bp, about 100bp, about 120bp, about 140bp, about 160bp, about 180bp, about 200bp, about 300bp, about 400bp, about 500bp, about 1000bp, about 1500bp, about 2000bp, about 3000bp, about 4000bp, about 5000bp, about 6000bp or longer; or, wherein the target nucleic acid sequence encodes an amino acid sequence of about 5 to about 2000 amino acids in length, for example, it can encode about 5, about 10, about 15, about 20, about 25, about 30, or about 35.
  • Item 44 The method of any one of items 39 to 43, wherein the target sequence targeted by the plurality of guide RNAs substantially covers the target nucleic acid region.
  • Item 45 The method of any one of items 39-44, wherein at least a portion of the plurality of guide RNAs target the sense strand of the target nucleic acid region.
  • Item 46 The method of any one of items 39-45, wherein at least a portion of the plurality of guide RNAs target the antisense strand of the target nucleic acid region.
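Items 42 to 46 describe guide RNAs whose target sequences overlap or adjoin one another and together substantially cover the target region. As a rough illustration only, the following Python sketch computes what fraction of a region is covered by a set of target sites. The representation of each site as a `(start, length)` pair and the example numbers are assumptions made for this sketch.

```python
def covered_fraction(region_len: int, sites) -> float:
    """Return the fraction of positions covered by at least one target site.

    `sites` is an iterable of (start, length) pairs, 0-based, on the
    coordinates of the target region. Sites on either strand are treated
    alike because only the covered positions matter here.
    """
    covered = [False] * region_len
    for start, length in sites:
        # Clip each site to the region boundaries before marking it.
        for pos in range(max(start, 0), min(start + length, region_len)):
            covered[pos] = True
    return sum(covered) / region_len


# Hypothetical 100-bp region with three 20-nt target sites, two overlapping.
fraction = covered_fraction(100, [(0, 20), (10, 20), (50, 20)])
```

A value close to 1.0 would correspond to the situation in which the target sequences substantially cover the region.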
  • Item 47 The method of any one of items 39 to 46, wherein the plurality of guide RNAs and/or a plurality of expression constructs containing nucleotide sequences encoding the plurality of guide RNAs are each independently introduced into the population of cells or organisms; or the plurality of guide RNAs and/or a plurality of expression constructs containing nucleotide sequences encoding the plurality of guide RNAs are introduced into the population of cells or organisms in combination with each other.
  • Item 48 The method of any one of items 39-47, wherein the mutation is a nucleotide substitution, such as a C to T substitution, A to G substitution, G to A substitution, or T to C substitution.
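The four substitution types in Item 48 follow from two chemistries: cytosine deamination (C to T) and adenine deamination (A to G) on the edited strand, which read as G to A and T to C on the reference strand when the guide targets the opposite strand. The Python sketch below illustrates this relationship. The editing window is a hypothetical parameter chosen for the example, and converting every C and A inside it is a deliberate simplification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def simulate_dual_edit(protospacer: str, window=(1, 17)) -> str:
    """Convert every C to T and every A to G inside a 1-based, inclusive window.

    The window bounds are an assumption for illustration only.
    """
    start, end = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if start <= pos <= end:
            base = {"C": "T", "A": "G"}.get(base, base)
        edited.append(base)
    return "".join(edited)


def edit_reference(reference: str, strand: str, window=(1, 17)) -> str:
    """Return the reference-strand sequence after editing the chosen strand."""
    if strand == "+":
        return simulate_dual_edit(reference, window)
    # Edit the opposite strand, then express the result on the reference strand.
    return reverse_complement(simulate_dual_edit(reverse_complement(reference), window))


sense_result = edit_reference("ACGT", "+", window=(1, 4))      # C>T and A>G
antisense_result = edit_reference("GGTT", "-", window=(1, 4))  # read as G>A and T>C
```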
  • Item 49 The method of any one of items 39-48, wherein the target nucleic acid region is located in a coding region of a protein, for example, the target nucleic acid region encodes a functionally related motif or domain of the protein.
  • Item 50 The method of Item 49, wherein the mutation in the target nucleic acid region causes an amino acid substitution in the amino acid sequence of the protein, preferably, the mutation causes a change in the function of the protein.
  • Item 51 The method of any one of items 39-50, wherein the target nucleic acid region is related to a trait of the cell or organism; for example, a mutation in the target nucleic acid region causes the trait of the cell or organism to change.
  • Item 52 The method of any one of items 39-51, wherein in step iii), cells or organisms with mutations of interest are screened by screening for cells or organisms with changes in traits of interest; for example, the trait change of interest is selected from increased growth rate, increased yield, increased nutrient content, increased cold resistance, increased drought resistance, increased insect resistance, increased disease resistance, and increased herbicide resistance.
  • Item 53 The method of any one of items 39-52, wherein the cells are derived from mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks, and geese; or plants, including monocots and dicots, preferably crop plants, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato; preferably, the cell is a plant cell, more preferably a crop plant cell, more preferably a rice cell.
  • Item 54 The method of any one of items 39-52, wherein the organism is selected from mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks, and geese; and plants, including monocots and dicots, preferably crop plants, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato; preferably, the organism is a plant, more preferably a crop plant, more preferably rice.
  • Item 55 A method of treating a disease in a subject in need thereof, comprising delivering to the subject an effective amount of the base editing system of any one of items 17-22 to modify genes related to the disease.
  • Item 56 Use of the base editing system of any one of items 17-22 in the preparation of a pharmaceutical composition for treating a disease in a subject in need thereof, wherein the base editing system is used to modify genes related to the disease.
  • Item 57 A pharmaceutical composition for treating a disease in a subject in need thereof, comprising the base editing system of any one of items 17-22, and optionally a pharmaceutically acceptable carrier, wherein the base editing system is used to modify genes related to the disease.
  • Item 58 The method of item 55, the use of item 56 or the pharmaceutical composition of item 57, wherein the subject is a mammal, such as a human.
  • Item 59 The method of item 55, the use of item 56 or the pharmaceutical composition of item 57, wherein the disease is selected from the group consisting of tumor, inflammation, Parkinson's disease, cardiovascular disease, Alzheimer's disease, autism, drug addiction, age-related macular degeneration, schizophrenia, and genetic diseases.
  • Item 60 A kit comprising the base editing fusion protein of any one of items 1-16 and/or an expression construct containing a nucleotide sequence encoding the base editing fusion protein, or comprising the base editing system of any one of items 17-22.
  • STEME performs base editing through the fusion of cytidine and adenine deaminase.
  • FIG. 2 pOsU3-esgRNA expression vector.
  • the present invention preferably uses esgRNA.
  • Figure 3 Editing efficiency of STEME-1, STEME-2, STEME-3 and STEME-4 in protoplasts.
  • (a) Comparison of C>T base editing frequency between A3A-PBE and four STEME constructs (n = 3).
  • (b) Comparison of A>G base editing frequency between PABE-7 and four STEME constructs (n = 3).
  • Untreated protoplast samples were used as controls. The values and error bars reflect the mean ± s.e.m. of three independent biological replicates.
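The figure legends report values as the mean ± s.e.m. of three independent biological replicates. For readers unfamiliar with the statistic, a short Python sketch of the calculation is given here. The replicate values are invented for illustration and are not data from the application.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates.

    The s.e.m. is the sample standard deviation divided by sqrt(n).
    """
    n = len(values)
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical editing frequencies (%) from three biological replicates.
replicates = [1.0, 2.0, 3.0]
avg, sem = mean_sem(replicates)
print(f"{avg:.2f} ± {sem:.2f}")
```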
  • FIG. 4 Activity and product purity of STEME-1, STEME-2, STEME-3 and STEME-4 in rice protoplasts.
  • (a)-(f) are OsAAT, OsACC, OsCDC48, OsDEP1, OsEV and OsOD target sequences, respectively.
  • Figure 5 Comparison of base editing efficiency of STEME-1, STEME-5 and STEME-6 in rice protoplasts.
  • (a) The structure of STEME-5 and STEME-6. ecTadA7.10: evolved E. coli TadA; aa, amino acid; XTEN: 16 amino acid linker.
  • (b) Comparison of C>T base editing frequency of A3A-PBE, STEME-1, STEME-5 and STEME-6 constructs.
  • STEME-NG is used for saturation de novo mutation in rice protoplasts.
  • Figure 8. Design of NG PAM targets at different rice sites.
  • STEME-NG can widely edit target sequences with NGA, NGT, NGC or NGG PAM.
  • STEME-1 with an NGG PAM and untreated protoplast samples were used as controls. The values and error bars reflect the mean ± s.e.m. of three independent biological replicates.
  • Figure 10 The activity and product purity of STEME-NG in rice protoplasts.
  • FIG. 11 Base editing efficiency of C>T and A>G in rice protoplasts by A3A-PBE-NG and PABE7-NG, respectively. Both A3A-PBE-NG (a) and PABE7-NG (b) can broadly edit targets with NGA, NGT, NGC or NGG PAMs. Untreated protoplast samples were used as controls. The values and error bars reflect the mean ± s.e.m. of three independent biological replicates.
  • FIG. 12 De novo saturation mutation of key domains of OsACC protein with STEME-NG.
  • (c) The heat map shows the conversion saturation of STEME-NG on the 56 amino acids involved in (b) in rice protoplasts. Statistics are collected for silent mutation, missense mutation and nonsense mutation.
  • FIG. 13 Saturation mutation of 168 bp of the CT domain of OsACC by A3A-PBE-NG in rice protoplasts.
  • A3A-PBE-NG (a) and untreated control (b) are shown.
  • sgRNAs convert the same cytidine and guanosine
  • the values and error bars reflect the mean ± s.e.m. of three independent biological replicates.
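The legend of FIG. 12 (c) states that statistics are collected for silent, missense and nonsense mutations. The Python sketch below shows one simple way a single codon change could be sorted into these classes using the standard genetic code. The function name and the example codons are assumptions for illustration, and changes that remove a stop codon are not treated separately here.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(before: str, after: str) -> str:
    """Classify a codon change as unchanged, silent, nonsense or missense."""
    before, after = before.upper(), after.upper()
    if before == after:
        return "unchanged"
    aa_before, aa_after = CODON_TABLE[before], CODON_TABLE[after]
    if aa_before == aa_after:
        return "silent"
    if aa_after == "*":
        return "nonsense"
    # Simplification: any other amino acid change is reported as missense.
    return "missense"


examples = {
    ("GCC", "GCT"): classify_codon_change("GCC", "GCT"),  # Ala to Ala
    ("GCC", "GTC"): classify_codon_change("GCC", "GTC"),  # Ala to Val
    ("CAG", "TAG"): classify_codon_change("CAG", "TAG"),  # Gln to stop
}
```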
  • the term “and/or” encompasses all combinations of items connected by the term, and should be treated as if each combination has been individually listed herein.
  • “A and/or B” encompasses “A”, “A and B”, and “B”.
  • “A, B, and/or C” encompasses "A”, “B”, “C”, “A and B”, “A and C”, “B and C”, and "A and B and C”.
  • the protein or nucleic acid may consist of the sequence, or may have additional amino acids or nucleotides at one or both ends of the protein or nucleic acid, but still has the activity described in the present invention.
  • the methionine encoded by the start codon at the N-terminus of the polypeptide may be retained under certain practical conditions (for example, when expressed in a specific expression system), but this does not substantially affect the function of the polypeptide.
  • “Genome” as used herein covers not only chromosomal DNA present in the nucleus, but also organelle DNA present in subcellular components of the cell (such as mitochondria and plastids).
  • “Genetically modified organism” or “genetically modified cell” means an organism or cell that contains exogenous polynucleotides or contains modified genes or expression control sequences in its genome.
  • exogenous polynucleotides can be stably integrated into the genome of organisms or cells, and inherited for successive generations.
  • the exogenous polynucleotide can be integrated into the genome alone or as part of a recombinant DNA construct.
  • A modified gene or expression control sequence is one in which the gene or expression control sequence in the genome of the organism or cell contains one or more deoxynucleotide substitutions, deletions or additions.
  • “Exogenous”, in reference to a sequence, means a sequence from a foreign species or, if from the same species, a sequence whose composition and/or locus has been significantly altered from its natural form through deliberate human intervention.
  • “Nucleic acid sequence” and the terms used interchangeably with it refer to single-stranded or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases.
  • Nucleotides are referred to by their single-letter names as follows: “A” is adenosine or deoxyadenosine (for RNA or DNA, respectively), “C” is cytidine or deoxycytidine, “G” is guanosine or deoxyguanosine, “U” means uridine, “T” means deoxythymidine, “R” means purine (A or G), “Y” means pyrimidine (C or T), “K” means G or T, “H” means A or C or T, “D” means A, T or G, “I” means inosine, and “N” means any nucleotide.
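The degenerate single-letter codes listed above can be expanded mechanically into the concrete bases they stand for. The Python sketch below does this for the DNA codes named in the definition (inosine and uridine are left out); the function name is an assumption for illustration.

```python
from itertools import product

# Degenerate DNA codes as defined in the text above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "D": "AGT",
    "N": "ACGT",  # any nucleotide
}


def expand(seq: str):
    """Return every concrete DNA sequence matched by a degenerate sequence."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


ng_pams = expand("NG")    # the four dinucleotides covered by an NG PAM
ngg_pams = expand("NGG")  # the four trinucleotides covered by an NGG PAM
```

For example, an NG PAM written this way stands for AG, CG, GG and TG.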
  • “Polypeptide”, “peptide”, and “protein” are used interchangeably in the present invention and refer to a polymer of amino acid residues.
  • the term applies to amino acid polymers in which one or more amino acid residues are artificial chemical analogs of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers.
  • the terms “polypeptide”, “peptide”, “amino acid sequence” and “protein” may also include modified forms, including but not limited to glycosylation, lipid linkage, sulfation, gamma carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.
  • expression construct refers to a vector, such as a recombinant vector, suitable for expression of a nucleotide sequence of interest in an organism.
  • “Expression” refers to the production of a functional product.
  • the expression of a nucleotide sequence may refer to the transcription of the nucleotide sequence (such as transcription to generate mRNA or functional RNA) and/or the translation of RNA into a precursor or mature protein.
  • the "expression construct" of the present invention can be a linear nucleic acid fragment, a circular plasmid, a viral vector, or, in some embodiments, can be RNA (such as mRNA) that can be translated, for example, RNA generated by in vitro transcription.
  • the "expression construct" of the present invention may contain regulatory sequences and nucleotide sequences of interest from different sources, or regulatory sequences and nucleotide sequences of interest from the same source but arranged in a way different from those normally occurring in nature.
  • “Regulatory sequence” and “regulatory element” can be used interchangeably and refer to nucleotide sequences located upstream (5' non-coding sequence), within, or downstream (3' non-coding sequence) of a coding sequence, which affect the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns, and polyadenylation recognition sequences.
  • “Promoter” refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment.
  • the promoter is a promoter capable of controlling gene transcription in a cell, regardless of whether it is derived from the cell.
  • the promoter can be a constitutive promoter or a tissue-specific promoter or a developmentally regulated promoter or an inducible promoter.
  • “Tissue-specific promoter” and “tissue-preferred promoter” are used interchangeably, and refer to a promoter that is expressed mainly, but not necessarily exclusively, in a tissue or organ, and that can also be expressed in a specific cell or cell type.
  • “Developmentally regulated promoter” refers to a promoter whose activity is determined by developmental events.
  • inducible promoters selectively express operably linked DNA sequences in response to endogenous or exogenous stimuli (environment, hormones, chemical signals, etc.).
  • promoters include, but are not limited to, polymerase (pol) I, pol II, or pol III promoters.
  • pol I promoters include chicken RNA pol I promoter.
  • pol II promoters include, but are not limited to, cytomegalovirus immediate early (CMV) promoter, Rous sarcoma virus long terminal repeat (RSV-LTR) promoter, and simian virus 40 (SV40) immediate early promoter.
  • pol III promoters include U6 and H1 promoters.
  • An inducible promoter such as a metallothionein promoter can be used.
  • promoters include T7 phage promoter, T3 phage promoter, β-galactosidase promoter, and Sp6 phage promoter.
  • the promoter may be cauliflower mosaic virus 35S promoter, corn Ubi-1 promoter, wheat U6 promoter, rice U3 promoter, corn U3 promoter, rice actin promoter.
  • “Operably linked” refers to the connection of regulatory elements (for example, but not limited to, promoter sequences, transcription termination sequences, etc.) to nucleic acid sequences (for example, coding sequences or open reading frames) such that the transcription of the nucleotide sequence is controlled and regulated by the transcription regulatory element.
  • Introducing a nucleic acid molecule (such as a plasmid, linear nucleic acid fragment, RNA, etc.) or protein into an organism refers to transforming the cell of the organism with the nucleic acid or protein so that the nucleic acid or protein can function in the cell.
  • the "transformation” used in the present invention includes stable transformation and transient transformation.
  • Stable transformation refers to the introduction of exogenous nucleotide sequences into the genome, resulting in stable inheritance of the exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any successive generations thereof.
  • Transient transformation refers to the introduction of nucleic acid molecules or proteins into cells to perform functions without stable inheritance of foreign genes. In transient transformation, the foreign nucleic acid sequence is not integrated into the genome.
  • "Traits" refer to the physiological, morphological, biochemical or physical characteristics of cells or organisms.
  • “Agronomic traits” especially refer to the measurable index parameters of crop plants, including but not limited to: leaf greenness, grain yield, growth rate, total biomass or accumulation rate, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, plant total nitrogen content, fruit nitrogen content, seed nitrogen content, plant vegetative tissue nitrogen content, plant total free amino acid content, fruit free amino acid content, seed free amino acid content, plant vegetative tissue free amino acid content, plant total protein content, fruit protein content, seed protein content, plant vegetative tissue protein content, herbicide resistance, drought resistance, nitrogen absorption, root lodging, harvest index, stem lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt resistance and tiller number.
  • the present invention provides a base editing fusion protein comprising a nucleic acid targeting domain, a cytosine deamination domain, and an adenine deamination domain.
  • base editing fusion protein and “base editor” are used interchangeably, and refer to a protein that can mediate the substitution of one or more nucleotides of a target sequence in the genome in a sequence-specific manner.
  • nucleic acid targeting domain refers to a domain capable of mediating the attachment of the base editing fusion protein to a specific target sequence in the genome in a sequence-specific manner (for example, through a guide RNA).
  • the nucleic acid targeting domain comprises at least one (e.g., one) CRISPR effector polypeptide.
  • CRISPR effector protein generally refers to a nuclease (CRISPR nuclease) or a functional variant thereof found in a naturally occurring CRISPR system.
  • the term covers any effector protein based on the CRISPR system that can achieve sequence-specific targeting in cells.
  • a "functional variant" in terms of CRISPR nuclease means that it retains at least the guide RNA-mediated sequence-specific targeting ability.
  • the functional variant is a nuclease-inactivated variant, that is, it lacks double-stranded nucleic acid cleavage activity.
  • CRISPR nucleases lacking double-stranded nucleic acid cleavage activity also encompass nickases, which form nicks in double-stranded nucleic acid molecules, but do not completely cut double-stranded nucleic acids.
  • the CRISPR effector protein of the present invention has nickase activity.
  • the functional variant recognizes a different PAM (protospacer adjacent motif) sequence relative to the wild-type nuclease.
  • the "CRISPR effector protein” can be derived from Cas9 nuclease, including Cas9 nuclease or functional variants thereof.
  • the Cas9 nuclease may be a Cas9 nuclease from different species, such as SpCas9 from Streptococcus pyogenes (S. pyogenes) or SaCas9 from Staphylococcus aureus (S. aureus).
  • Cas9 nuclease and Cas9 are used interchangeably herein, and refer to an RNA-guided nuclease comprising a Cas9 protein or a fragment thereof (for example, a protein containing the active DNA cleavage domain of Cas9 and/or the gRNA binding domain of Cas9).
  • Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated systems) genome editing system, which can target and cut DNA target sequences under the guidance of a guide RNA to form DNA double-strand breaks (DSBs).
  • An exemplary amino acid sequence of wild-type spCas9 is shown in SEQ ID NO:1.
  • CRISPR effector protein can also be derived from Cpf1 nuclease, including Cpf1 nuclease or functional variants thereof.
  • the Cpf1 nuclease may be Cpf1 nuclease from different species, for example, Cpf1 nuclease from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006.
  • CRISPR effector proteins can also be derived from Cas3, Cas8a, Cas5, Cas8b, Cas8c, Cas10d, Cse1, Cse2, Csy1, Csy2, Csy3, GSU0054, Cas10, Csm2, Cmr5, Csx11, Csx10, Csf1, Csn2, Cas4, C2c1, C2c3 or C2c2 nucleases; for example, they include these nucleases or functional variants thereof.
  • the CRISPR effector protein is nuclease-inactivated Cas9.
  • the DNA cleavage domain of Cas9 nuclease is known to contain two subdomains: HNH nuclease subdomain and RuvC subdomain.
  • the HNH subdomain cleaves the strand complementary to gRNA, while the RuvC subdomain cleaves the non-complementary strand. Mutations in these subdomains can inactivate the nuclease activity of Cas9, forming "nuclease-inactivated Cas9".
  • the nuclease-inactivated Cas9 still retains gRNA-guided DNA binding ability.
  • the nuclease-inactivated Cas9 of the present invention can be derived from Cas9 of different species, for example, from S. pyogenes Cas9 (SpCas9) or from Staphylococcus aureus (S. aureus) Cas9 (SaCas9). Simultaneously mutating the HNH nuclease subdomain and the RuvC subdomain of Cas9 (for example, with the mutations D10A and H840A) inactivates the nuclease activity of Cas9, yielding nuclease-dead Cas9 (dCas9). Mutating and inactivating only one of the subdomains gives Cas9 nickase activity, that is, yields Cas9 nickase (nCas9), for example, nCas9 carrying only the mutation D10A.
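  • The relationship between subdomain mutations and the resulting Cas9 variant can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the assignment of D10A to the RuvC subdomain and H840A to the HNH subdomain is general SpCas9 knowledge assumed here rather than something the text states explicitly.

```python
def classify_cas9(mutations):
    """Classify a Cas9 variant from its point mutations (SEQ ID NO:1 numbering).

    Assumption for this sketch: D10A inactivates the RuvC subdomain and
    H840A inactivates the HNH subdomain.
    """
    muts = set(mutations)
    ruvc_inactive = "D10A" in muts
    hnh_inactive = "H840A" in muts
    if ruvc_inactive and hnh_inactive:
        return "dCas9"   # both subdomains inactive: no strand is cut
    if ruvc_inactive or hnh_inactive:
        return "nCas9"   # one subdomain inactive: nickase, one strand is cut
    return "Cas9"        # both subdomains active: double-strand break


print(classify_cas9(["D10A"]))            # nCas9
print(classify_cas9(["D10A", "H840A"]))   # dCas9
```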
  • the nuclease-inactivated Cas9 variant of the present invention contains amino acid substitution D10A and/or H840A relative to wild-type Cas9, wherein the amino acid number refers to SEQ ID NO:1.
  • the nuclease-inactivated Cas9 contains the amino acid substitution D10A relative to the wild-type Cas9, wherein the amino acid number refers to SEQ ID NO:1.
  • the nuclease-inactivated Cas9 comprises the amino acid sequence shown in SEQ ID NO: 2 (nCas9(D10A)).
  • When Cas9 nuclease is used for gene editing, it usually requires the target sequence to have a 5'-NGG-3' PAM (protospacer adjacent motif) sequence at its 3' end.
  • CRISPR effector proteins that recognize different PAM sequences are preferably used in the present invention, for example, functional variants of Cas9 nuclease with different PAM sequences.
  • the CRISPR effector protein is a Cas9 variant that recognizes the PAM sequence 5'-NG-3'.
  • the Cas9 variant that recognizes the PAM sequence 5'-NG-3' is also referred to herein as Cas9-NG.
  • Cas9-NG includes the following amino acid substitutions R1335V, L1111R, D1135V, G1218R, E1219F, A1322R, T1337R relative to wild-type Cas9, wherein the amino acid number refers to SEQ ID NO:1.
  • the CRISPR effector protein is a nuclease-inactivated Cas9 variant that recognizes the PAM sequence 5'-NG-3'.
  • the nuclease-inactivated Cas9 variant that recognizes the PAM sequence 5'-NG-3' contains the following amino acid substitutions D10A, R1335V, L1111R, D1135V, G1218R, E1219F, A1322R, T1337R relative to wild-type Cas9, wherein the amino acid number refers to SEQ ID NO:1.
  • the nuclease-inactivated Cas9 variant that recognizes the PAM sequence 5'-NG-3' comprises the amino acid sequence shown in SEQ ID NO: 3 (nCas9-NG(D10A)).
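  • Substitutions written in the form D10A or R1335V (wild-type residue, 1-based position, new residue) can be applied to a reference sequence mechanically. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the 15-residue reference is a toy peptide used only for illustration, not SEQ ID NO:1.

```python
import re


def apply_substitutions(seq, substitutions):
    """Apply substitutions such as 'D10A' (1-based numbering) to a protein sequence."""
    residues = list(seq)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"cannot parse substitution: {sub}")
        wild_type, pos, new = match.group(1), int(match.group(2)), match.group(3)
        # Refuse to edit if the reference does not carry the stated wild-type residue.
        if pos < 1 or pos > len(residues) or residues[pos - 1] != wild_type:
            raise ValueError(f"{sub}: reference residue at position {pos} is not {wild_type}")
        residues[pos - 1] = new
    return "".join(residues)


toy_reference = "MDKKYSIGLDIGTNS"   # toy peptide with D at position 10 (illustration only)
print(apply_substitutions(toy_reference, ["D10A"]))
```

  • Checking the wild-type residue before substituting guards against applying a mutation list to a reference with different numbering.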
  • cytosine deamination domain refers to a domain that can accept single-stranded DNA as a substrate and catalyze the deamination of cytidine or deoxycytidine into uridine or deoxyuridine, respectively.
  • the cytosine deaminase domain comprises at least one (eg, one or two) cytosine deaminase polypeptides.
  • the cytosine deamination domain in the fusion protein can deaminate cytidine in the single-stranded DNA produced when the fusion protein-guide RNA-DNA complex forms, converting it into U, so that a C to T base substitution is then achieved through base mismatch repair.
  • examples of cytosine deaminases include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, or functional variants thereof.
  • the cytosine deaminase is human APOBEC3A deaminase or a functional variant thereof.
  • the human APOBEC3A deaminase comprises the amino acid sequence shown in SEQ ID NO:4.
  • adenine deamination domain refers to a domain that can accept single-stranded DNA as a substrate and catalyze the formation of inosine (I) from adenosine or deoxyadenosine (A).
  • the adenine deaminase domain comprises at least one (eg, one) DNA-dependent adenine deaminase polypeptide.
  • the adenine deamination domain in the fusion protein can deaminate adenosine in the single-stranded DNA generated when the CRISPR effector protein-guide RNA-DNA complex forms, converting it into inosine (I). Because DNA polymerase treats inosine (I) as guanine (G), an A to G substitution can be achieved through base mismatch repair.
  • the DNA-dependent adenine deaminase is a variant of E. coli tRNA adenine deaminase TadA (ecTadA).
  • An exemplary wild-type ecTadA amino acid sequence is shown in SEQ ID NO: 5.
  • the wild-type ecTadA amino acid sequence may not include the N-terminal methionine in SEQ ID NO: 5.
  • the DNA-dependent adenine deaminase comprises one or more sets of mutations selected from the following relative to wild-type ecTadA:
  • the amino acid number refers to SEQ ID NO: 5.
  • the DNA-dependent adenine deaminase contains the following mutations relative to wild-type ecTadA: W23R, H36L, R51L, S146C, K157N, A106V, D108N, P48A, L84F, H123Y, I156F, D147Y, E155V and R152P, wherein the amino acid number refers to SEQ ID NO: 5.
  • the DNA-dependent adenine deaminase comprises the amino acid sequence shown in SEQ ID NO:6.
  • Because E. coli tRNA adenine deaminase usually functions as a dimer, it is expected that dimer formation between two DNA-dependent adenine deaminases, or between a DNA-dependent adenine deaminase and a wild-type adenine deaminase, can significantly increase the A to G editing activity of the fusion protein.
  • the adenine deaminase domain comprises two of the DNA-dependent adenine deaminase.
  • the adenine deaminase domain further comprises a corresponding wild-type adenine deaminase (such as E. coli tRNA adenine deaminase TadA) fused to the DNA-dependent adenine deaminase (such as a DNA-dependent variant of E. coli tRNA adenine deaminase TadA).
  • the DNA-dependent adenine deaminase (such as a DNA-dependent variant of E. coli tRNA adenine deaminase TadA) is fused to the C-terminus of a corresponding wild-type adenine deaminase (such as E. coli tRNA adenine deaminase TadA).
  • the adenine deamination domain comprises the amino acid sequence shown in SEQ ID NO: 7 or 8.
  • the nucleic acid targeting domain, the cytosine deamination domain and the adenine deamination domain are fused via a linker.
  • the linker can be a non-functional amino acid sequence, without secondary or higher-order structure, that is 1-50 (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 20-25, 25-50) or more amino acids long.
  • the linker may be a flexible linker, such as GGGGS, GS, GAP, (GGGGS)x3, GGS and (GGS)x7, etc.
  • the linker is 32 amino acids long, for example, the linker comprises the amino acid sequence shown in SEQ ID NO:9.
  • the linker is 48 amino acids long, for example, the linker comprises the amino acid sequence shown in SEQ ID NO:10.
  • the linker is an XTEN linker comprising the amino acid sequence shown in SEQ ID NO: 11.
  • the base editing fusion protein comprises, in the following order from the N-terminus to the C-terminus: a cytosine deamination domain, an adenine deamination domain, and a nucleic acid targeting domain. In some embodiments, the base editing fusion protein comprises, in the following order from the N-terminus to the C-terminus: an adenine deamination domain, a cytosine deamination domain, and a nucleic acid targeting domain.
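  • The domain arrangement and linker fusion described above can be sketched as a simple concatenation that also records where each domain ends up. This is an illustrative sketch only: the function name and the short placeholder sequences are hypothetical, and the default linker is the (GGGGS)x3 flexible linker, one of several options the text lists.

```python
def assemble_fusion(domains, linker="GGGGS" * 3):
    """Join (name, sequence) domains from N- to C-terminus with a linker between neighbours.

    Returns the fused sequence and a layout of (name, first, last) 1-based coordinates.
    """
    sequence, layout = "", []
    for index, (name, seq) in enumerate(domains):
        if index > 0:
            sequence += linker            # a linker sits between adjacent domains only
        layout.append((name, len(sequence) + 1, len(sequence) + len(seq)))
        sequence += seq
    return sequence, layout


# Placeholder sequences, for illustration only (not real domain sequences).
order = [("cytosine deamination domain", "AAAA"),
         ("adenine deamination domain", "CCC"),
         ("nucleic acid targeting domain", "DD")]
fused, layout = assemble_fusion(order, linker="GS")
print(fused)
print(layout)
```

  • Swapping the first two entries of `order` gives the alternative arrangement in which the adenine deamination domain is N-terminal.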
  • uracil DNA glycosylase catalyzes the removal of U from DNA and initiates base excision repair (BER), resulting in the repair of U:G to C:G. Therefore, without being limited by any theory, the combination of the base editing fusion protein of the present invention and Uracil DNA Glycosylase Inhibitor (UGI) will increase the efficiency of C to T base editing.
  • the base editing fusion protein is co-expressed with uracil DNA glycosylase inhibitor (UGI).
  • the base editing fusion protein further comprises Uracil DNA Glycosylase Inhibitor (UGI).
  • UGI is connected to other parts of the base editing fusion protein through a linker.
  • UGI is connected to other parts of the base editing fusion protein through a "self-cleaving peptide”.
  • self-cleaving peptide means a peptide that can achieve self-cleavage within a cell.
  • the self-cleaving peptide may include a protease recognition site, so that it can be recognized and specifically cleaved by the protease in the cell.
  • the self-cleaving peptide may be a 2A polypeptide.
  • the 2A polypeptide is a type of short peptide derived from viruses, and its self-cleavage occurs during translation. When a 2A polypeptide is used to connect two different target polypeptides that are expressed in the same reading frame, the two target polypeptides are produced at a ratio of almost 1:1.
  • 2A polypeptides can be P2A from porcine teschovirus-1, T2A from Thosea asigna virus, E2A from equine rhinitis A virus, and F2A from foot-and-mouth disease virus.
  • a variety of functional variants of these 2A polypeptides are also known in the art, and these variants can also be used in the present invention.
  • the self-cleavable peptide does not exist between or within the nucleic acid targeting domain, the cytosine deamination domain and the adenine deamination domain.
  • UGI is located at the N-terminus or C-terminus of the base editing fusion protein, preferably the C-terminus.
  • the uracil DNA glycosylase inhibitor comprises the amino acid sequence shown in SEQ ID NO: 12.
  • the fusion protein of the present invention may also include a nuclear localization sequence (NLS).
  • one or more NLS in the fusion protein should have sufficient strength to drive the accumulation of the fusion protein in an amount that can achieve its base editing function in the nucleus of the cell.
  • the strength of nuclear localization activity is determined by the number and location of NLS in the fusion protein, one or more specific NLS used, or a combination of these factors.
  • the NLS of the fusion protein of the present invention may be located at the N-terminal and/or C-terminal. In some embodiments of the present invention, the NLS of the fusion protein of the present invention may be located between the adenine deamination domain, cytosine deamination domain, nucleic acid targeting domain and/or UGI. In some embodiments, the fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLS. In some embodiments, the fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLS at or near the N-terminus. In some embodiments, the fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLS at or near the C-terminus.
  • the polypeptide includes a combination of these, such as one or more NLS at the N-terminus and one or more NLS at the C-terminus. When there is more than one NLS, each can be selected independently of the others.
  • NLS consists of one or more short sequences of positively charged lysine or arginine exposed on the surface of the protein, but other types of NLS are also known.
  • Non-limiting examples of NLS include: KKRKV, PKKKRKV, or KRPAATKKAGQAKKKK.
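  • The number and position of NLS motifs in a candidate fusion protein can be checked with a simple substring scan. The sketch below uses only the three example motifs listed above; the function name and the toy input sequence are hypothetical.

```python
NLS_MOTIFS = ["PKKKRKV", "KKRKV", "KRPAATKKAGQAKKKK"]


def find_nls(protein):
    """Return sorted (start, motif) pairs (0-based start) for every listed NLS motif found.

    KKRKV is contained in PKKKRKV, so one site can be reported under both motifs.
    """
    hits = []
    for motif in NLS_MOTIFS:
        start = protein.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = protein.find(motif, start + 1)
    return sorted(hits)


# Toy sequence with one N-terminal motif and one C-terminal motif.
print(find_nls("MPKKKRKVGSGSGSKRPAATKKAGQAKKKK"))
```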
  • the fusion protein of the present invention may also include other localization sequences, such as cytoplasmic localization sequences, chloroplast localization sequences, mitochondrial localization sequences and the like.
  • the base editing fusion protein comprises the amino acid sequence shown in any one of SEQ ID NO: 13-19.
  • the present invention provides a base editing system for modifying target nucleic acid regions in the genome, which comprises:
  • the at least one guide RNA is directed to at least one target sequence in the target nucleic acid region.
  • base editing system refers to a combination of components required for base editing of the genome of a cell or organism.
  • the various components of the system such as base editing fusion protein, one or more guide RNAs, can exist independently of each other, or can exist in any combination as a composition.
  • guide RNA and “gRNA” are used interchangeably, and refer to an RNA molecule that can form a complex with the CRISPR effector protein and can target the complex to the target sequence because it has a certain identity with the target sequence.
  • the guide RNA targets the target sequence by base pairing with the complementary strand of the target sequence.
  • the gRNA used by Cas9 nuclease or its functional variants is usually composed of crRNA and tracrRNA molecules that are partially complementary and form a complex, wherein the crRNA contains a guide sequence (also called a seed sequence) that has sufficient identity with the target sequence to hybridize with the complementary strand of the target sequence and to guide the CRISPR complex (Cas9+crRNA+tracrRNA) to bind specifically to the target sequence.
  • sgRNA stands for single guide RNA.
  • the gRNA used by Cpf1 nuclease or its functional variants is usually composed of mature crRNA molecules only, which can also be called sgRNA. Designing a suitable gRNA based on the CRISPR nuclease used and the target sequence to be edited is within the abilities of those skilled in the art.
  • the sequence of sgRNA may include the following scaffold sequence:
  • the target sequence targeted by the guide RNA usually contains 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30, preferably 20, nucleotides.
  • the guide sequence in the guide RNA usually includes 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30, preferably 20, nucleotides.
  • the target sequence targeted by the guide RNA also requires a PAM sequence at one end, such as the 3' end, to be recognized by the CRISPR effector protein (or fusion protein)-guide RNA complex.
  • the location, type, and length of the PAM required for the target sequence depends on the CRISPR effector protein used.
  • the PAM sequence is 5'-NGG-3' located at the 3'end of the target sequence.
  • the PAM sequence is 5'-NG-3' located at the 3'end of the target sequence.
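  • Candidate target sequences that satisfy the PAM requirement can be enumerated by scanning both strands for a 20-nucleotide target immediately followed, at its 3' end, by the PAM. This is a minimal sketch under stated assumptions: the function names are hypothetical, the input contains only A, C, G and T, and reverse-strand hits are reported in reverse-complement coordinates.

```python
import re


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(seq, pam="NGG", length=20):
    """Return (strand, start, target, pam) for every target followed at its 3' end by the PAM.

    start is 0-based on the scanned strand ('+' is seq, '-' is its reverse complement).
    """
    pam_regex = pam.replace("N", "[ACGT]")
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})({pam_regex}))")
    hits = []
    for strand, scanned in (("+", seq), ("-", revcomp(seq))):
        for match in pattern.finditer(scanned):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


example = "A" * 20 + "TGG"
print(len(find_targets(example, pam="NGG")))  # sites available to an NGG-restricted Cas9
print(len(find_targets(example, pam="NG")))   # sites available to a Cas9-NG-type variant
```

  • Relaxing the PAM from NGG to NG can only add candidate sites, which is the motivation the text gives for using variants that recognize different PAM sequences.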
  • the base editing fusion protein and the guide RNA can form a complex, and the complex specifically targets the target sequence under the guidance of the guide RNA and causes one or more Cs to be replaced by T and/or one or more As to be replaced by G in the target sequence.
  • the C to T base editing window of the base editing fusion protein of the present invention is located at positions 1-17 of the target sequence. That is to say, the base editing fusion protein of the present invention can cause one or more Cs within positions 1-17, counted from the 5' end of the target sequence, to be replaced by T.
  • the A to G base editing window of the base editing fusion protein of the present invention is located at positions 4-8 of the target sequence. That is to say, the base editing fusion protein of the present invention can cause one or more As within positions 4-8, counted from the 5' end of the target sequence, to be replaced by G.
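  • The two editing windows can be made concrete with a short sketch that lists the editable positions of a target sequence (1-based, counted from the 5' end) and shows the sequence that would result if every editable base were converted. The function names are hypothetical, and the all-sites-edited outcome is an idealization for illustration; in practice only some of the bases in a window may be edited.

```python
C_WINDOW = range(1, 18)   # C to T window: positions 1-17 from the 5' end
A_WINDOW = range(4, 9)    # A to G window: positions 4-8 from the 5' end


def editable_positions(target):
    """Return (C positions in the C window, A positions in the A window), 1-based."""
    c_sites = [p for p in C_WINDOW if p <= len(target) and target[p - 1] == "C"]
    a_sites = [p for p in A_WINDOW if p <= len(target) and target[p - 1] == "A"]
    return c_sites, a_sites


def fully_edited(target):
    """Idealized outcome in which every editable C becomes T and every editable A becomes G."""
    c_sites, a_sites = editable_positions(target)
    bases = list(target)
    for p in c_sites:
        bases[p - 1] = "T"
    for p in a_sites:
        bases[p - 1] = "G"
    return "".join(bases)


target = "ACGTACGTACGTACGTACGT"
print(editable_positions(target))
print(fully_edited(target))
```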
  • the at least one guide RNA may be directed to a target sequence located on the sense strand (such as the protein coding strand) and/or the antisense strand within the target nucleic acid region of the genome.
  • the base editing composition of the present invention can cause one or more Cs in the target sequence on the sense strand (e.g., protein coding strand) to be replaced by T and/or one or more As to be replaced by G.
  • the base editing composition of the present invention can cause one or more Gs in the target sequence on the sense strand (for example, protein coding strand) to be replaced by A and/or one or more Ts to be replaced by C.
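  • The G to A and T to C outcomes on the sense strand follow from base pairing: a C to T or A to G edit made on the antisense strand reads as the complementary change when viewed on the sense strand. The sketch below illustrates this with a toy four-base sequence; the function names are hypothetical and editing windows are ignored for simplicity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
DEAMINATION = str.maketrans("CA", "TG")   # C to T and A to G on the edited strand


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sense_view_of_antisense_edit(sense):
    """Edit every C and A on the antisense strand and return the resulting sense strand."""
    antisense = revcomp(sense)
    edited_antisense = antisense.translate(DEAMINATION)
    return revcomp(edited_antisense)


def substitutions(before, after):
    """List (position, old base, new base) differences, 1-based."""
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]


sense = "GGTT"
print(substitutions(sense, sense_view_of_antisense_edit(sense)))
```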
  • the nucleotide sequence encoding the base editing fusion protein is codon optimized for the organism whose genome is to be modified.
  • Codon optimization refers to modifying a nucleic acid sequence to enhance expression in the host cell of interest by replacing at least one codon (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) of the natural sequence with a codon that is used more frequently or most frequently in the genes of the host cell, while maintaining the natural amino acid sequence.
  • Codon preference (the difference in codon usage between organisms) is often related to the translation efficiency of messenger RNA (mRNA), which in turn is considered to depend on, among other things, the nature of the codon being translated and the availability of the corresponding transfer RNA (tRNA) molecules.
  • genes can therefore be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example the "Codon Usage Database" available at www.kazusa.orjp/codon/, and these tables can be adapted in different ways. See Nakamura Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res., 28:292 (2000).
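  • The idea of codon optimization can be sketched as a synonymous codon swap driven by a usage table. This is an illustrative sketch only: the function name is hypothetical, the usage values in the example are invented placeholders rather than data from any real codon usage table, and the translation table is the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks stop codons.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_optimize(cds, usage):
    """Replace each codon with its most-used synonym; the encoded protein is unchanged.

    usage maps codon -> relative usage in the host (codons not listed count as 0).
    When usage values tie, the original codon is kept.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        optimized.append(max(synonyms, key=lambda c: (usage.get(c, 0.0), c == codon)))
    return "".join(optimized)


hypothetical_usage = {"CTG": 0.5, "AAG": 0.6}   # placeholder values, not real data
print(codon_optimize("ATGTTAAAA", hypothetical_usage))
```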
  • the base editing fusion protein of the present invention is encoded by any one of the nucleotide sequences shown in SEQ ID NO: 20-26.
  • Organisms whose genome can be modified by the base editing system of the present invention include any organisms suitable for base editing, preferably eukaryotes.
  • organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, and cats; poultry such as chickens, ducks, and geese; plants, including monocots and dicots
  • the plant is a crop plant, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava, and potato.
  • the organism is a plant. More preferably, the organism is rice.
  • the present invention also provides a method for producing at least one genetically modified cell, comprising introducing the base editing system of the present invention into at least one cell, thereby causing one or more nucleotide substitutions within a target nucleic acid region in the at least one cell.
  • the method further includes the step of selecting cells having the desired one or more nucleotide substitutions from the at least one cell.
  • the methods of the invention are performed in vitro.
  • the cell is an isolated cell, or a cell in an isolated tissue or organ.
  • the present invention also provides a genetically modified organism, which comprises a genetically modified cell or its progeny cells produced by the method of the present invention.
  • the genetically modified cell or its progeny cells have the desired one or more nucleotide substitutions.
  • the target nucleic acid region to be modified can be located anywhere in the genome, for example, within a functional gene such as a protein coding gene, or, for example, in a gene expression regulatory region such as a promoter region or an enhancer region, so as to achieve the modification of gene function or the modification of gene expression.
  • the desired nucleotide substitution results in a desired gene function modification or gene expression modification.
  • the target nucleic acid region is related to the trait of the cell or organism. In some embodiments, the mutation in the target nucleic acid region causes a change in the trait of the cell or organism. In some embodiments, the target nucleic acid region is located in the coding region of the protein. In some embodiments, the target nucleic acid region encodes a functionally related motif or domain of the protein. In some preferred embodiments, one or more nucleotide substitutions in the target nucleic acid region result in amino acid substitutions in the amino acid sequence of the protein. In some embodiments, the one or more nucleotide substitutions result in a change in the function of the protein.
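  • Whether a given nucleotide substitution in a coding region leads to an amino acid substitution can be checked by translating the affected codon before and after the change. The sketch below is illustrative only: the function name and the toy coding sequence are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks stop codons.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def amino_acid_change(cds, pos, new_base):
    """Return e.g. 'P2L' for a substitution at 1-based CDS position pos, or None if synonymous."""
    codon_start = (pos - 1) // 3 * 3
    offset = (pos - 1) % 3
    old_codon = cds[codon_start:codon_start + 3]
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return None
    return f"{old_aa}{codon_start // 3 + 1}{new_aa}"


toy_cds = "ATGCCTGAA"                       # toy reading frame: M P E
print(amino_acid_change(toy_cds, 5, "T"))   # C to T at the second base of codon 2
print(amino_acid_change(toy_cds, 6, "C"))   # change at the third base of codon 2
```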
  • the base editing system can be introduced into cells by various methods well known to those skilled in the art.
  • Methods that can be used to introduce the base editing system of the present invention into cells include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, liposome transfection, microinjection, viral infection (such as baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), the gene gun method, PEG-mediated transformation of protoplasts, and Agrobacterium-mediated transformation.
  • Cells that can be base-edited by the method of the present invention can be derived from, for example, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, and cats; poultry such as chickens, ducks, and geese; plants, including monocotyledonous plants and dicotyledonous plants, preferably crop plants, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato.
  • the base editing fusion protein, the base editing system and the method for producing genetically modified cells of the present invention are particularly suitable for genetic modification of plants.
  • the plant is a crop plant, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato. More preferably, the plant is rice.
  • the present invention provides a method for producing a genetically modified plant, comprising introducing the base editing system of the present invention into at least one plant, thereby causing one or more nucleotide substitutions within a target nucleic acid region in the genome of the at least one plant.
  • the method further includes screening the at least one plant for plants having the desired one or more nucleotide substitutions.
  • the base editing composition can be introduced into plants by various methods well known to those skilled in the art.
  • Methods that can be used to introduce the base editing system of the present invention into plants include, but are not limited to: the gene gun method, PEG-mediated transformation of protoplasts, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen tube pathway method, and the ovary injection method.
  • the base editing composition is introduced into the plant by transient transformation.
  • the target sequence can be modified simply by introducing the base editing fusion protein and guide RNA into plant cells, or producing them there, and the modification can be inherited stably, without the need to stably transform the plant with exogenous polynucleotides encoding the components of the base editing system. This avoids the potential off-target effects of a stably present (continuously produced) base editing composition, and also avoids the integration of foreign nucleotide sequences into the plant genome, thereby providing higher biological safety.
  • the introduction is carried out in the absence of selective pressure, so as to avoid the integration of foreign nucleotide sequences in the plant genome.
  • the introduction includes transforming the base editing system of the present invention into an isolated plant cell or tissue, and then regenerating the transformed plant cell or tissue into a whole plant.
  • the regeneration is performed in the absence of selective pressure, that is, no selective agent for the selective gene carried on the expression vector is used in the tissue culture process.
  • not using a selection agent can improve the regeneration efficiency of plants and yield modified plants without exogenous nucleotide sequences.
  • the base editing system of the present invention can be transformed to a specific part of the whole plant, such as leaves, stem tips, pollen tubes, young ears or hypocotyls. This is particularly suitable for the transformation of plants that are difficult to undergo tissue culture regeneration.
  • the protein expressed in vitro and/or the RNA molecule transcribed in vitro (for example, the expression construct is an RNA molecule transcribed in vitro) is directly transformed into the plant.
  • the protein and/or RNA molecule can realize base editing in plant cells and then be degraded by the cell, avoiding the integration of foreign nucleotide sequences in the plant genome.
  • genetic modification and breeding of plants using the method of the present invention can obtain plants whose genomes are not integrated with foreign polynucleotides, that is, transgene-free modified plants.
  • the modified target nucleic acid region is related to plant traits such as agronomic traits, whereby the one or more nucleotide substitutions result in the plant having altered, preferably improved, traits such as agronomic traits relative to wild-type plants.
  • the method further includes the step of screening for plants having desired one or more nucleotide substitutions and/or desired traits such as agronomic traits.
  • the method further includes obtaining progeny of the genetically modified plant.
  • the genetically modified plant or its progeny has desired one or more nucleotide substitutions and/or desired traits such as agronomic traits.
  • the present invention also provides a genetically modified plant or its progeny or part thereof, wherein the plant is obtained by the above-mentioned method of the present invention.
  • the genetically modified plant or progeny or part thereof is non-transgenic.
  • the genetically modified plant or its progeny have desired genetic modification and/or desired traits such as agronomic traits.
  • the present invention also provides a plant breeding method, which comprises crossing a genetically modified first plant, obtained by the above-mentioned method of the present invention and containing one or more nucleotide substitutions in the target nucleic acid region, with a second plant that does not contain the one or more nucleotide substitutions, thereby introducing the one or more nucleotide substitutions into the second plant.
  • the genetically modified first plant has desired traits such as agronomic traits.
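  • the present invention provides a method for in situ saturation mutagenesis of an endogenous target nucleic acid region in a cell or organism to obtain a mutation of interest in the target nucleic acid region, comprising: i) providing a population of the cells or organisms; ii) introducing the base editing system of the present invention into the population, resulting in one or more mutations in the endogenous target nucleic acid region of the cells or organisms of the population; iii) screening the population for cells or organisms containing the mutation of interest; and optionally iv) identifying the mutation of interest.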
  • the methods of the invention are performed in vitro.
  • the cell is an isolated cell, or a cell in an isolated tissue or organ.
  • the base editing system includes multiple guide RNAs and/or multiple expression constructs containing nucleotide sequences encoding the multiple guide RNAs.
  • the multiple guide RNAs are directed to different target sequences in the target nucleic acid region.
  • the plurality of guide RNAs may be 2 to 250 or more, for example 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 75, 100, 150, 200, 250, 300 or more guide RNAs.
  • the target sequences and/or complementary sequences targeted by at least some of the plurality of guide RNAs partially overlap each other and/or are adjacent to each other.
  • the base editing system of the present invention can realize base editing of a longer target nucleic acid region.
  • the target nucleic acid region may have a length of about 20bp to about 10000bp or longer, such as about 20bp, about 40bp, about 60bp, about 80bp, about 100bp, about 120bp, about 140bp, about 160bp, about 180bp, about 200bp, about 300bp, about 400bp, about 500bp, about 1000bp, about 1500bp, about 2000bp, about 3000bp, about 4000bp, about 5000bp, about 6000bp or longer.
  • the target nucleic acid sequence may encode an amino acid sequence of about 5 to about 2000 amino acids in length, for example about 5, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 60, about 70, about 80, about 90, about 100, about 125, about 150, about 200, about 250, about 500, about 750, about 1000, about 1500, about 2000 or more amino acids.
  • the target sequence targeted by the plurality of guide RNAs substantially covers the target nucleic acid region.
  • At least a portion of the plurality of guide RNAs target the sense strand of the target nucleic acid region.
  • At least a portion of the plurality of guide RNAs target the antisense strand of the target nucleic acid region.
  • the plurality of guide RNAs and/or a plurality of expression constructs containing nucleotide sequences encoding the plurality of guide RNAs can each be independently introduced into the population of the cell or organism. In some embodiments, the plurality of guide RNAs and/or a plurality of expression constructs containing nucleotide sequences encoding the plurality of guide RNAs can be introduced into the population of the cell or organism in combination with each other.
  • each guide RNA or its expression construct is introduced into a separate subpopulation of the cells or organisms, and all the subpopulations together constitute the population of cells or organisms into which the base editing system has been introduced; or, a mixture of every two guide RNAs or their expression constructs is introduced into a subpopulation of the cells or organisms, and all the subpopulations together constitute the population of cells or organisms into which the base editing system has been introduced; and so on.
  • the mutation is a nucleotide substitution, such as a C to T substitution, A to G substitution, G to A substitution, or T to C substitution.
  • the target nucleic acid region is located in the coding region of the protein. In some embodiments, the target nucleic acid region encodes a functionally related motif or domain of the protein. In some embodiments, the mutations in the target nucleic acid region may be silent mutations, missense mutations, or nonsense mutations. In some preferred embodiments, mutations in the target nucleic acid region result in amino acid substitutions in the amino acid sequence of the protein. In some embodiments, the mutation results in a change in the function of the protein.
  • “saturation” mutation does not necessarily mean that the population of cells or organisms contains every nucleotide mutation in the target nucleic acid region or every amino acid mutation in the amino acid sequence encoded by the target nucleic acid region. “Near saturation” is also encompassed, for example where the population of cells or organisms contains mutations of more than 50% of the nucleotides in the target nucleic acid region, or more than 50% of the amino acid mutations in the amino acid sequence encoded by the target nucleic acid region.
  • the target nucleic acid region is related to the trait of the cell or organism.
  • the mutation in the target nucleic acid region causes a change in the trait of the cell or organism. Therefore, in some embodiments, mutations of interest can be screened for changes in the traits of cells or organisms.
  • “mutation of interest” generally refers to a mutation that causes a change in a trait of interest of a cell or organism. Therefore, in step iii), cells or organisms with mutations of interest can be identified by screening for cells or organisms with changes in the trait of interest.
  • the cells can be derived from, for example, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, and cats; poultry such as chickens, ducks, and geese; or plants, including monocots and dicots, preferably crop plants, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava, and potato.
  • the cell is a plant cell, more preferably a crop plant cell, more preferably a rice cell.
  • the organism may be, for example, a mammal such as a human, mouse, rat, monkey, dog, pig, sheep, cow, or cat; poultry such as a chicken, duck, or goose; or a plant, including monocots and dicots, preferably a crop plant, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava, and potato.
  • the organism is a plant, more preferably a crop plant, more preferably rice.
  • the traits of interest include improved agronomic traits, including but not limited to increased growth rate, increased yield, increased nutrient content, increased cold resistance, increased drought resistance, increased insect resistance, increased disease resistance, increased herbicide resistance, etc.
  • the traits of interest include but are not limited to drug resistance.
  • the present invention also covers the application of the base editing system of the present invention in the treatment of diseases.
  • Modification of disease-related genes by the base editing system of the present invention can realize up-regulation, down-regulation, inactivation, activation or mutation correction of disease-related genes, etc., thereby achieving disease prevention and/or treatment.
  • the target nucleic acid region in the present invention can be located in the protein coding region of a disease-related gene, or, for example, in a gene expression regulatory region such as a promoter region or an enhancer region, so that functional modification of the disease-related gene or modification of its expression can be achieved. Therefore, modifying disease-related genes as described herein includes modification of the disease-related genes themselves (such as protein coding regions), as well as modification of their expression regulatory regions (such as promoters, enhancers, introns, etc.).
  • a “disease-related” gene refers to any gene that produces transcription or translation products at abnormal levels or in an abnormal form in cells derived from tissues affected by the disease, compared with non-diseased control tissues or cells. Where the altered expression is related to the appearance and/or progression of the disease, it may be a gene expressed at an abnormally high level or a gene expressed at an abnormally low level.
  • Disease-related genes also refer to genes that have one or more mutations or genetic variants that are directly responsible for the etiology of the disease, or that are in linkage disequilibrium with one or more genes responsible for the etiology of the disease.
  • the mutation or genetic variation is, for example, a single nucleotide variation (SNV).
  • the present invention also provides a method of treating a disease in a subject in need thereof, comprising delivering to the subject an effective amount of the base editing system of the present invention to modify genes related to the disease.
  • the present invention also provides the use of the base editing system of the present invention in preparing a pharmaceutical composition for treating diseases in a subject in need, wherein the base editing system is used to modify genes related to the disease.
  • the present invention also provides a pharmaceutical composition for treating diseases in a subject in need, which comprises the base editing system of the present invention, and optionally a pharmaceutically acceptable carrier, wherein the base editing system is used to modify genes related to the disease.
  • the subject is a mammal, such as a human.
  • diseases include, but are not limited to, tumors, inflammation, Parkinson's disease, cardiovascular diseases, Alzheimer's disease, autism, drug addiction, age-related macular degeneration, schizophrenia, genetic diseases and the like.
  • the present invention also includes a kit for the method of the present invention, which includes the base editing fusion protein of the present invention and/or an expression construct containing a nucleotide sequence encoding the base editing fusion protein, or The base editing system of the present invention.
  • the kit generally includes a label indicating the intended use and/or method of use of the contents of the kit.
  • the term label includes any written or recorded material provided on or with the kit or otherwise provided with the kit.
  • the kit of the present invention may also contain suitable materials for constructing the expression vector in the base editing system of the present invention.
  • the kit of the present invention may also include reagents suitable for transforming the base editing fusion protein or base editing composition of the present invention into cells.
  • the cytidine deaminase, adenosine deaminase, nCas9 (D10A) and UGI parts of STEME-1, STEME-2, STEME-3 and STEME-4 were codon-optimized for cereal plants and synthesized commercially (GENEWIZ, Suzhou, China).
  • the Cas9 variant nCas9-NG (D10A), containing the R1335V/L1111R/D1135V/G1218R/E1219F/A1322R/T1337R mutations, was obtained by mutating nCas9 (D10A) using the Mut Express MultiS Fast Mutagenesis Kit (Vazyme, Nanjing, China) and the Gibson assembly method.
  • the japonica rice variety Nipponbare was used to prepare the protoplasts used in this study. Protoplast isolation and transformation were performed as described previously (reference 3). 10 μg each of base editor and sgRNA plasmid DNA was introduced into protoplasts by PEG-mediated transfection, and the average transformation efficiency was measured to be 40-55%. The transfected protoplasts were incubated at 23°C. At 60 hours after transfection, protoplasts were collected and genomic DNA was extracted for deep sequencing of amplicons.
  • PCR was directly performed to amplify the protospacer of the binary vector from the transgenic callus.
  • the barcode is added to both ends of the PCR product through primers.
  • two rounds of PCR were performed. In the first round of PCR, site-specific primers are used to amplify the target region.
  • forward and reverse barcodes were added to the ends of the PCR products for library construction. Equal amounts of PCR products were pooled and purified by gel DNA extraction, and the samples were sequenced commercially on the Illumina NextSeq 500 platform (Genewiz, Suzhou, China). The protospacer sequences in the sequencing reads were examined for C>T and/or A>G substitutions and indels. Amplicon sequencing of each target sequence was repeated three times, using genomic DNA extracted from three independent protoplast samples.
  • CBE cytosine base editor
  • ABE adenine base editor
  • a new type of base editor was designed that can simultaneously generate C:G>T:A and A:T>G:C mutations at the same target site with only one sgRNA, in order to carry out saturated targeted endogenous mutagenesis (STEM) on a selected target gene.
  • the inventors fused a cytosine deaminase and an adenine deaminase into a new deaminase, and developed the saturated targeted endogenous mutagenesis editor (STEME) for targeting endogenous sequences.
  • the components of STEME also include nCas9 (D10A) and uracil DNA glycosylase inhibitor (UGI) (Figure 1a).
  • the fused deaminase can deaminate C and/or A within the deamination window, nCas9 can stimulate the cellular mismatch repair (MMR) mechanism, and UGI is used to inhibit uracil DNA glycosylase (UDG), so that the nicked DNA strand is repaired using the deaminated strand as the template.
  • PABE-7 is composed of artificially evolved ecTadA-ecTadA7.10 heterodimer and nCas9 with 3 NLS at the N-terminal.
  • the inventors designed two forms of fusion deaminase, APOBEC3A-ecTadA-ecTadA7.10 and ecTadA-ecTadA7.10-APOBEC3A, fused each of them to the N-terminus of nCas9 (D10A), and fused either one UGI, or two UGIs expressed as free proteins (through a T2A linker peptide), to the C-terminus of nCas9 (D10A), to construct four vectors: STEME-1, STEME-2, STEME-3 and STEME-4 (Figure 1b). All STEME vectors were codon-optimized for crop plants and expressed from the maize Ubiquitin-1 (Ubi-1) promoter.
  • the 20nt sgRNA spacer sequence was cloned into the esgRNA vector driven by the OsU3 promoter (Figure 2).
  • A3A-PBE and PABE-7 were used as controls for C>T and A>G editing, respectively, and wild-type Cas9 was used as a control for indel production.
  • amplicon sequencing was performed on each sample, yielding about 30,000-310,000 sequence reads per sample, and the base editing efficiency was analyzed.
  • the results show that all four STEME vectors can produce efficient C>T and/or A>G base transitions in rice protoplasts.
  • STEME-1 has the highest C>T efficiency (0.1%-61.61%) (Figure 3a and Figure 4).
  • the C>T base editing window of STEME is the same as that of A3A-PBE, namely C1 to C17 (Figure 3a and Figure 4).
  • the editing efficiency of STEME-1 from C5 to C14 averaged 29.59%, which was 1.3 times that of A3A-PBE (Figure 3a and Figure 4).
  • STEME-1 also has the highest A>G efficiency (0.69%-15.5%) among the four vectors (Figure 3b and Figure 4).
  • the A>G base editing window of STEME is the same as that of PABE-7, ranging from A4 to A8, but the A>G editing efficiencies of all four STEMEs (0.07%-15.5%) are lower than that of PABE-7 (1.74%-21.54%) (Figure 3b and Figure 4).
  • According to the amplicon sequencing data of rice protoplasts, STEME has high product purity at the six target sites tested (Figure 4), and its indel efficiency is comparable to that of the untreated group and much lower than that of Cas9 (6.3%-15.61%) (Figure 6).
  • STEME, especially STEME-1, can simultaneously achieve C:G>T:A and/or A:T>G:C base transitions using only one sgRNA.
  • STEME-1 produces higher C:G>T:A base editing efficiency from C5 to C14 than A3A-PBE, and the additional A:T>G:C base transitions increase the diversity of mutation types available for directed evolution by saturation mutagenesis.
  • Cas9 derived from Streptococcus pyogenes requires NGG PAM in the target sequence, which limits the number of sgRNAs available on the rice genome.
  • bioinformatic analysis of the rice reference genome showed that using Cas9-NG (VRVRFRR) extends the editable range to 79% of the rice genome, whereas Cas9 can target only 19% (Figure 7a). Therefore, in order to expand the editing scope of STEME, nCas9 (D10A) in STEME-1 was replaced with nCas9-NG (D10A) to construct STEME-NG (Figure 7b).
  • A3A-PBE-NG, PABE7-NG and pCas9-NG were also constructed (Figure 8a).
  • 20nt target sequences with NGA, NGT, NGC and NGG PAMs were designed so as to reduce the impact of chromatin state (Figure 8b and c), and the target sequences were cloned into the pOsU3-esgRNA vector.
  • STEME-NG has editing activity on target sequences with PAM as NGA, NGT, NGC and NGG ( Figure 9 and Figure 10).
  • STEME-NG can edit the cytosine in C1 to C17 and the adenine in A4 to A8, but like STEME-1, the base editing efficiency of A>G is lower than that of C>T ( Figure 9).
  • the activity of STEME-NG on NGG PAM targets is lower than that of STEME-1 (Figure 9 and Figure 10), while it has higher editing activity on NGA and NGT PAM targets (Figure 9 and Figure 10).
  • STEME-NG has relatively high editing activity on the OsODEV-NGC target, and relatively low editing activity on the other three NGC targets (Figure 9 and Figure 10).
  • The activities of A3A-PBE-NG and PABE7-NG at target sites with NGA, NGT, NGC and NGG PAMs are consistent with those of STEME-NG (Figure 11).
  • STEME-NG, A3A-PBE-NG and PABE7-NG all have lower indel frequencies than pCas9-NG.
  • STEME-NG greatly expands the range of the genome accessible to C>T and/or A>G base editing, and facilitates saturation mutagenesis and directed evolution in plants.
  • This example uses OsACC as an example to illustrate the ability of STEME-mediated saturation de novo mutation to produce directed evolution in protoplasts.
  • Acetyl-coenzyme A carboxylase is a key enzyme in the lipid synthesis pathway, and the carboxyltransferase domain (CT) is the herbicide-resistant active site of this enzyme ( Figure 12a).
  • these 20 target sites can cover 90.32% of C, 40.43% of A, 77.78% of G and 38.89% of T on the coding strand, covering a total of 61.31% of the bases on the coding strand (Figure 12a, Table 3).
  • STEME-NG produced a total of 212 mutation types, which is 2.7 times that of A3A-PBE-NG.
  • Of the 212 mutation types, 18.4% were caused by simultaneous C:G>T:A and A:T>G:C mutations.
  • pCas9-NG still has a higher indel efficiency (0.32%-39.72%) than STEME-NG and A3A-PBE-NG.
  • STEME-NG can generate a variety of mutation types in the coding strand. Unlike recently reported directed evolution methods that rely on bacteria and yeasts (such as PACE, EvolvR, CREATE, CHAnGE, etc.), STEME can directly generate saturated de novo mutation types in situ and can be used for directed evolution of plant proteins.
  • STEME-1 (APOBEC3A-48aa linker-ecTadA-32aa linker-ecTadA7.10-32aa linker-nCas9(D10A)-NLS-UGI-NLS) coding sequence
  • STEME-NG (APOBEC3A-48aa linker-ecTadA-32aa linker-ecTadA7.10-32aa linker-nCas9-NG(D10A)-NLS-UGI-NLS) coding sequence
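The editing windows reported above (C1 to C17 for C>T, A4 to A8 for A>G) and the use of sgRNAs on both strands can be turned into a small coverage calculation. The following is a minimal Python sketch, not the inventors' analysis pipeline. It assumes that protospacer positions are numbered 1 to 20 starting from the PAM-distal end, and the function names and the toy sequence are hypothetical.

```python
# Editing windows as reported in the text: C>T at C1-C17, A>G at A4-A8.
# ASSUMPTION: positions are counted 1..20 from the PAM-distal (5') end
# of the protospacer; this numbering is not stated in this section.
C_WINDOW = range(1, 18)
A_WINDOW = range(4, 9)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def editable_positions(region, start, strand):
    """Map one 20-nt protospacer to the coding-strand bases it could edit.

    region: coding-strand sequence, written 5'->3'
    start:  0-based index in `region` of the leftmost base of the site
    strand: '+' if the protospacer lies on the coding strand, '-' otherwise
    Returns {index_in_region: (reference_base, edited_base)}.
    """
    site = region[start:start + 20]
    if len(site) != 20:
        raise ValueError("protospacer must be 20 nt long")
    proto = site if strand == "+" else reverse_complement(site)
    edits = {}
    for pos in range(1, 21):
        base = proto[pos - 1]
        if base == "C" and pos in C_WINDOW:
            new = "T"
        elif base == "A" and pos in A_WINDOW:
            new = "G"
        else:
            continue
        if strand == "+":
            edits[start + pos - 1] = (base, new)
        else:
            # An edit on the opposite strand appears as its complement on the
            # coding strand: C>T becomes G>A, and A>G becomes T>C.
            edits[start + 20 - pos] = (base.translate(COMPLEMENT),
                                       new.translate(COMPLEMENT))
    return edits


def coverage(region, sites):
    """Fraction of coding-strand bases reachable by a set of (start, strand) sites."""
    covered = {}
    for start, strand in sites:
        covered.update(editable_positions(region, start, strand))
    return len(covered) / len(region), covered
```

Calling `coverage(region, [(0, "+"), (0, "-")])` on a 20-nt toy region returns the fraction of coding-strand bases inside either window together with the expected substitution at each base, which is the kind of figure quoted for the 20 OsACC target sites.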

Abstract

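Provided is a base editing fusion protein comprising a nucleic acid targeting domain, a cytosine deaminase domain and an adenine deaminase domain. Also provided are a base editing system and a method for producing at least one genetically modified cell.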

Description

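A base editing system and method of use thereof
Technical Field
The present invention belongs to the field of genetic engineering. Specifically, the present invention relates to a base editing system and methods of using it. More specifically, the present invention relates to a base editing system and method capable of generating de novo mutations, such as saturation mutations, in situ in an organism, the base editing system and method being based on a base editing fusion protein comprising a deaminase and a CRISPR effector protein.
Background of the Invention
Heredity and variation are the basis of biological evolution, and variation helps organisms adapt to changing environments. Over the past few decades, creating random or targeted mutants by physical (e.g., ultraviolet light) or chemical (e.g., TILLING) methods has therefore become a practical way to screen for novel agronomic traits. These methods, however, are time-consuming and laborious. Most directed evolution methods were developed in bacteria or yeast, for example error-prone PCR (ep-PCR), DNA recombination technology, DNA synthesis, and PACE. For higher organisms, especially plants, the gene to be evolved is usually transformed into bacteria or yeast for manipulation. Once such genes leave their original genomic context, the functional variants that are selected may not fully reflect their properties in the original species. The clustered regularly interspaced short palindromic repeats and associated protein (CRISPR/Cas) system can generate double strand breaks (DSBs) at endogenous target sites, and its rise has provided a simple and efficient way to generate targeted mutations. However, most mutations produced by CRISPR/Cas are insertions and deletions (indels), which cause many frameshift mutations or premature termination of translation. Because many functional mutations are caused by the change of a single base, this tendency to produce indels means that the CRISPR/Cas9 system yields many non-productive mutations, especially when a gain-of-function endogenous gene has to be created by de novo mutation.
Therefore, there remains a need in the art for new gene mutagenesis methods that can generate de novo mutations, such as saturation mutations, in situ.
Summary of the Invention
The present application covers at least the embodiments set out in the following items: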
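Item 1. A base editing fusion protein comprising a nucleic acid targeting domain, a cytosine deamination domain and an adenine deamination domain.
Item 2. The base editing fusion protein of item 1, wherein the nucleic acid targeting domain comprises at least one CRISPR effector protein polypeptide.
Item 3. The base editing fusion protein of item 2, wherein the CRISPR effector protein is a Cas9 nuclease or a functional variant thereof; preferably, the CRISPR effector protein is a nuclease-inactivated Cas9; more preferably, the nuclease-inactivated Cas9 comprises the amino acid sequence shown in SEQ ID NO: 2; most preferably, the nuclease-inactivated Cas9 comprises the amino acid sequence shown in SEQ ID NO: 3.
Item 4. The base editing fusion protein of any one of items 1-3, wherein the cytosine deamination domain comprises at least one cytosine deaminase polypeptide.
Item 5. The base editing fusion protein of item 4, wherein the cytosine deaminase is selected from APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, or functional variants thereof.
Item 6. The base editing fusion protein of item 5, wherein the cytosine deaminase is human APOBEC3A deaminase or a functional variant thereof; for example, the human APOBEC3A deaminase comprises the amino acid sequence shown in SEQ ID NO: 4.
Item 7. The base editing fusion protein of any one of items 1-6, wherein the adenine deamination domain comprises at least one DNA-dependent adenine deaminase polypeptide.
Item 8. The base editing fusion protein of item 7, wherein the DNA-dependent adenine deaminase is derived from the wild-type E. coli tRNA adenine deaminase TadA (ecTadA); for example, the DNA-dependent adenine deaminase comprises the amino acid sequence shown in SEQ ID NO: 6.
Item 9. The base editing fusion protein of item 7 or 8, wherein the adenine deamination domain comprises two DNA-dependent adenine deaminases.
Item 10. The base editing fusion protein of item 7, wherein the adenine deamination domain further comprises a wild-type E. coli tRNA adenine deaminase TadA (ecTadA) fused to the DNA-dependent adenine deaminase; preferably, the DNA-dependent adenine deaminase is fused to the C-terminus of the wild-type E. coli tRNA adenine deaminase TadA (ecTadA).
Item 11. The base editing fusion protein of item 7 or 8, wherein the adenine deamination domain comprises the amino acid sequence shown in SEQ ID NO: 7 or 8.
Item 12. The base editing fusion protein of any one of items 1-11, wherein the nucleic acid targeting domain, the cytosine deamination domain and the adenine deamination domain are fused via linkers; for example, the linker comprises an amino acid sequence selected from SEQ ID NOs: 9-11.
Item 13. The base editing fusion protein of any one of items 1-12, wherein the base editing fusion protein comprises, in the N-terminal to C-terminal direction, in the following order: the cytosine deamination domain, the adenine deamination domain and the nucleic acid targeting domain; or, the base editing fusion protein comprises, in the N-terminal to C-terminal direction, in the following order: the adenine deamination domain, the cytosine deamination domain and the nucleic acid targeting domain.
Item 14. The base editing fusion protein of any one of items 1-13, wherein the base editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI); for example, the uracil DNA glycosylase inhibitor (UGI) comprises the amino acid sequence shown in SEQ ID NO: 12.
Item 15. The base editing fusion protein of any one of items 1-14, wherein the base editing fusion protein further comprises one or more nuclear localization sequences (NLS).
Item 16. The base editing fusion protein of item 1, wherein the base editing fusion protein comprises the amino acid sequence shown in any one of SEQ ID NOs: 13-19.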
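Item 17. A base editing system for modifying a target nucleic acid region in a genome, comprising:
i) the base editing fusion protein of any one of items 1-16 and/or an expression construct containing a nucleotide sequence encoding the base editing fusion protein; and/or
ii) at least one guide RNA and/or at least one expression construct containing a nucleotide sequence encoding the at least one guide RNA,
wherein the at least one guide RNA is directed to at least one target sequence within the target nucleic acid region.
Item 18. The base editing system of item 17, wherein the guide RNA is an sgRNA; for example, the sgRNA comprises the scaffold sequence shown in SEQ ID NO: 27 or SEQ ID NO: 28.
Item 19. The base editing system of item 17 or 18, wherein the target sequence to which the guide RNA is directed has a PAM sequence at its 3' end, such as 5'-NGG-3' or 5'-NG-3'.
Item 20. The base editing system of any one of items 17-19, wherein the at least one guide RNA is directed to a target sequence located on the sense strand and/or the antisense strand within the target nucleic acid region of the cell genome.
Item 21. The base editing system of any one of items 17-20, wherein the nucleotide sequence encoding the base editing fusion protein is codon-optimized for the organism whose genome is to be modified.
Item 22. The base editing system of item 21, wherein the base editing fusion protein is encoded by the nucleotide sequence shown in any one of SEQ ID NOs: 20-26.
Item 23. A method for producing at least one genetically modified cell, comprising introducing the base editing system of any one of items 17-22 into at least one cell, thereby resulting in one or more nucleotide substitutions within the target nucleic acid region in the at least one cell.
Item 24. The method of item 23, further comprising the step of screening the at least one cell for cells having the desired one or more nucleotide substitutions.
Item 25. The method of item 23 or 24, wherein the base editing system is introduced into the cell by a method selected from: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (such as baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), the gene gun method, PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.
Item 26. The method of any one of items 23-25, wherein the cell is from a mammal such as a human, mouse, rat, monkey, dog, pig, sheep, cow or cat; poultry such as a chicken, duck or goose; or a plant, including monocots and dicots, preferably a crop plant, for example wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato.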
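Item 27. A method for producing a genetically modified plant, comprising introducing the base editing system of any one of items 17-22 into at least one plant, thereby resulting in one or more nucleotide substitutions within the target nucleic acid region in the genome of the at least one plant.
Item 28. The method of item 27, further comprising screening the at least one plant for plants having the desired one or more nucleotide substitutions.
Item 29. The method of item 27 or 28, wherein the base editing system is introduced into the plant by a method selected from: the gene gun method, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen tube pathway method and the ovary injection method.
Item 30. The method of item 28, wherein the introduction is carried out in the absence of selection pressure.
Item 31. The method of any one of items 27-30, wherein the introduction comprises transforming the base editing system into isolated plant cells or tissues, and then regenerating the transformed plant cells or tissues into whole plants; preferably, the regeneration is carried out in the absence of selection pressure.
Item 32. The method of any one of items 27-30, wherein the introduction comprises transforming the base editing system into leaves, shoot apices, pollen tubes, young panicles or hypocotyls of an intact plant.
Item 33. The method of any one of items 27-30, wherein the expression construct is an in vitro transcribed RNA molecule.
Item 34. The method of any one of items 27-33, wherein the genetically modified plant does not contain an exogenous polynucleotide integrated into its genome.
Item 35. The method of any one of items 27-34, wherein the modified target nucleic acid region is related to plant traits such as agronomic traits.
Item 36. The method of any one of items 27-35, further comprising the step of screening for plants having desired traits such as agronomic traits.
Item 37. The method of any one of items 27-36, further comprising obtaining progeny of the genetically modified plant.
Item 38. A plant breeding method, comprising crossing a genetically modified first plant, obtained by the method of any one of items 27-37 and containing one or more nucleotide substitutions in the target nucleic acid region, with a second plant that does not contain the one or more nucleotide substitutions, thereby introducing the one or more nucleotide substitutions into the second plant; preferably, the genetically modified first plant has desired traits such as agronomic traits.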
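Item 39. A method for in situ saturation mutagenesis of an endogenous target nucleic acid region in a cell or organism to obtain a mutation of interest within the target nucleic acid region, comprising
i) providing a population of the cells or organisms;
ii) introducing the base editing system of any one of items 17-22 into the population of cells or organisms, resulting in one or more mutations within the endogenous target nucleic acid region of the cells or organisms of the population;
iii) screening the population of cells or organisms for cells or organisms containing the mutation of interest; and optionally
iv) identifying the mutation of interest.
Item 40. The method of item 39, wherein the base editing system comprises multiple guide RNAs and/or multiple expression constructs containing nucleotide sequences encoding the multiple guide RNAs; preferably, the multiple guide RNAs are directed to different target sequences within the target nucleic acid region.
Item 41. The method of item 39 or 40, wherein the multiple guide RNAs comprise 2-250 or more, for example 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 75, 100, 150, 200, 250, 300 or more guide RNAs.
Item 42. The method of any one of items 39-41, wherein the target sequences to which at least some of the multiple guide RNAs are directed, and/or their complementary sequences, partially overlap each other and/or are adjacent to each other.
Item 43. The method of any one of items 39-42, wherein the target nucleic acid region has a length of about 20bp to about 10000bp or longer, for example about 20bp, about 40bp, about 60bp, about 80bp, about 100bp, about 120bp, about 140bp, about 160bp, about 180bp, about 200bp, about 300bp, about 400bp, about 500bp, about 1000bp, about 1500bp, about 2000bp, about 3000bp, about 4000bp, about 5000bp, about 6000bp or longer; or, wherein the target nucleic acid sequence encodes an amino acid sequence of about 5 to about 2000 amino acids in length, for example about 5, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 60, about 70, about 80, about 90, about 100, about 125, about 150, about 200, about 250, about 500, about 750, about 1000, about 1500, about 2000 or more amino acids.
Item 44. The method of any one of items 39-43, wherein the target sequences to which the multiple guide RNAs are directed substantially cover the target nucleic acid region.
Item 45. The method of any one of items 39-44, wherein at least a portion of the multiple guide RNAs target the sense strand of the target nucleic acid region.
Item 46. The method of any one of items 39-45, wherein at least a portion of the multiple guide RNAs target the antisense strand of the target nucleic acid region.
Item 47. The method of any one of items 39-46, wherein the multiple guide RNAs and/or the multiple expression constructs containing nucleotide sequences encoding the multiple guide RNAs are each independently introduced into the population of cells or organisms; or the multiple guide RNAs and/or the multiple expression constructs containing nucleotide sequences encoding the multiple guide RNAs are introduced into the population of cells or organisms in combination with each other.
Item 48. The method of any one of items 39-47, wherein the mutation is a nucleotide substitution, for example a C to T substitution, an A to G substitution, a G to A substitution, or a T to C substitution.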
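Item 49. The method of any one of items 39-48, wherein the target nucleic acid region is located in the coding region of a protein; for example, the target nucleic acid region encodes a functionally relevant motif or domain of the protein.
Item 50. The method of item 49, wherein the mutation in the target nucleic acid region results in an amino acid substitution in the amino acid sequence of the protein; preferably, the mutation results in a change in the function of the protein.
Item 51. The method of any one of items 39-50, wherein the target nucleic acid region is related to a trait of the cell or organism; for example, the mutation in the target nucleic acid region results in a change in the trait of the cell or organism.
Item 52. The method of any one of items 39-51, wherein in step iii), cells or organisms having the mutation of interest are screened by screening for cells or organisms having a change in a trait of interest; for example, the change in the trait of interest is selected from increased growth rate, increased yield, increased nutrient content, increased cold resistance, increased drought resistance, increased insect resistance, increased disease resistance, and increased herbicide resistance.
Item 53. The method of any one of items 39-52, wherein the cell is from a mammal such as a human, mouse, rat, monkey, dog, pig, sheep, cow or cat; poultry such as a chicken, duck or goose; or a plant, including monocots and dicots, preferably a crop plant, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato,
preferably, the cell is a plant cell, more preferably a crop plant cell, more preferably a rice cell.
Item 54. The method of any one of items 39-52, wherein the organism is selected from a mammal such as a human, mouse, rat, monkey, dog, pig, sheep, cow or cat; poultry such as a chicken, duck or goose; or a plant, including monocots and dicots, preferably a crop plant, including but not limited to wheat, rice, corn, soybean, sunflower, sorghum, rape, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato,
preferably, the organism is a plant, more preferably a crop plant, more preferably rice.
Item 55. A method of treating a disease in a subject in need thereof, comprising delivering to the subject an effective amount of the base editing system of any one of items 17-22 to modify a gene related to the disease.
Item 56. Use of the base editing system of any one of items 17-22 in the preparation of a pharmaceutical composition for treating a disease in a subject in need thereof, wherein the base editing system is used to modify a gene related to the disease.
Item 57. A pharmaceutical composition for treating a disease in a subject in need thereof, comprising the base editing system of any one of items 17-22 and optionally a pharmaceutically acceptable carrier, wherein the base editing system is used to modify a gene related to the disease.
Item 58. The method of item 55, the use of item 56 or the pharmaceutical composition of item 57, wherein the subject is a mammal, for example a human.
Item 59. The method of item 55, the use of item 56 or the pharmaceutical composition of item 57, wherein the disease is selected from tumors, inflammation, Parkinson's disease, cardiovascular diseases, Alzheimer's disease, autism, drug addiction, age-related macular degeneration, schizophrenia, and genetic diseases.
Item 60. A kit comprising the base editing fusion protein of any one of items 1-16 and/or an expression construct containing a nucleotide sequence encoding the base editing fusion protein, or comprising the base editing system of any one of items 17-22.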
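Items 19 and 40 to 46 describe guide RNAs whose target sequences carry a PAM at the 3' end (5'-NGG-3' or 5'-NG-3') and lie on either strand of the target nucleic acid region. The following minimal Python sketch shows one way candidate sites of this kind could be enumerated. The function names, the default protospacer length of 20 nt and the toy sequences are illustrative assumptions and are not taken from the disclosure.

```python
import re

# Only the symbols needed for the two PAMs named in item 19.
PAM_SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_targets(seq, pam="NGG", length=20):
    """List (strand, start, protospacer) for every site followed by `pam`.

    `seq` is the sense strand written 5'->3'. `start` is the 0-based index,
    in sense-strand coordinates, of the leftmost base covered by the site.
    A lookahead is used so that overlapping sites are all reported.
    """
    pam_regex = "".join(PAM_SYMBOLS[base] for base in pam)
    pattern = re.compile(r"(?=([ACGT]{%d})%s)" % (length, pam_regex))
    hits = []
    for strand, text in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(text):
            if strand == "+":
                start = match.start()
            else:
                # Convert a reverse-complement offset to sense coordinates.
                start = len(seq) - match.start() - length
            hits.append((strand, start, match.group(1)))
    return hits
```

Running `find_targets` on the same sequence with `pam="NGG"` and then `pam="NG"` gives a direct comparison of how many sites each PAM requirement makes available.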
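Brief Description of the Drawings
Figure 1. STEME performs base editing through the fusion of cytidine and adenine deaminases. (a) Strategy for STEME-mediated C:G>T:A and/or A:T>G:C base editing. (b) Structures of STEME-1, STEME-2, STEME-3 and STEME-4. ecTadA7.10: evolved E. coli TadA; aa, amino acid; XTEN: 16-amino-acid linker.
Figure 2. pOsU3-esgRNA expression vector. (a) Construct of pOsU3-esgRNA. (b) Comparison of sequence and stem-loop between the native sgRNA scaffold and the esgRNA scaffold. esgRNA is preferably used in the present invention.
Figure 3. Editing efficiency of STEME-1, STEME-2, STEME-3 and STEME-4 in protoplasts. (a) Comparison of C>T base editing frequencies between A3A-PBE and the four STEME constructs (n=3). (b) Comparison of A>G base editing frequencies between PABE-7 and the four STEME constructs (n=3). Untreated protoplast samples were used as controls. Values and error bars reflect the mean ± s.e.m. of three independent biological replicates.
Figure 4. Activity and product purity of STEME-1, STEME-2, STEME-3 and STEME-4 in rice protoplasts. (a)-(f) are the OsAAT, OsACC, OsCDC48, OsDEP1, OsEV and OsOD target sequences, respectively.
Figure 5. Comparison of base editing efficiency of STEME-1, STEME-5 and STEME-6 in rice protoplasts. (a) Structures of STEME-5 and STEME-6. ecTadA7.10: evolved E. coli TadA; aa, amino acid; XTEN: 16-amino-acid linker. (b) Comparison of C>T base editing frequencies of the A3A-PBE, STEME-1, STEME-5 and STEME-6 constructs. (c) Comparison of A>G base editing frequencies of the PABE-7, STEME-1, STEME-5 and STEME-6 constructs. Untreated protoplast samples were used as controls. Values and error bars reflect the mean ± s.e.m. of three independent biological replicates.
Figure 6. Insertion and deletion (indel) efficiency of STEME in rice protoplasts compared with Cas9. (a) Comparison of indel efficiency of the A3A-PBE, PABE-7, STEME-1, STEME-2, STEME-3 and STEME-4 constructs in rice protoplasts. (b) Comparison of indel efficiency of the A3A-PBE, PABE-7, STEME-1, STEME-5 and STEME-6 constructs in rice protoplasts. Untreated protoplast samples were used as controls. Values and error bars reflect the mean ± s.e.m. of three independent biological replicates.
Figure 7. STEME-NG for saturation de novo mutagenesis in rice protoplasts. (a) Percentage of 20nt target sequences with NGG PAM and NG PAM in the rice genome. (b) Structure of STEME-NG.
Figure 8. Design of NG PAM targets at different rice loci. (a) Structures of A3A-PBE-NG, PABE7-NG and pCas9-NG. ecTadA7.10: evolved E. coli TadA; aa, amino acid; XTEN: 16-amino-acid linker. (b) Sequences of 16 NG PAM targets from four loci in the rice genome. (c) Diagram of the OsAAT, OsCDC48, OsDEP1 and OsODEV loci and their NG PAM target sequences.
Figure 9. STEME-NG can broadly edit target sequences with NGA, NGT, NGC or NGG PAMs. STEME-1 with NGG PAM and untreated protoplast samples were used as controls. Values and error bars reflect the mean ± s.e.m. of three independent biological replicates.
Figure 10. Activity and product purity of STEME-NG in rice protoplasts. (a)-(p) NG PAM target sequences from the OsAAT, OsCDC48, OsDEP1 and OsEVOD loci, respectively.
Figure 11. Base editing efficiency of C>T and A>G in rice protoplasts by A3A-PBE-NG and PABE7-NG, respectively. Both A3A-PBE-NG (a) and PABE7-NG (b) have a broad ability to edit NGA, NGT, NGC or NGG PAM targets. Untreated protoplast samples were used as controls. Values and error bars reflect the mean ± s.e.m. of three independent biological replicates.
Figure 12. Saturation de novo mutagenesis of a key domain of the OsACC protein with STEME-NG. (a) Overview of the OsACC protein domains from Pfam. Target sequences were designed using NGD (D = A, T or G) PAMs and HCN (H = A, T or C) PAMs. (b) Saturation mutagenesis of 168bp on the coding strand of the OsACC CT domain in rice protoplasts. When different sgRNAs converted the same cytidine, adenosine, thymidine or guanosine, the maximum value was used. Values and error bars reflect the mean ± s.e.m. of three independent biological replicates. (c) Heat map showing the conversion saturation achieved by STEME-NG in rice protoplasts over the 56 amino acids involved in (b). Silent, missense and nonsense mutations were all collected and counted.
Figure 13. Saturation mutagenesis of 168bp of the CT domain of OsACC by A3A-PBE-NG in rice protoplasts. A3A-PBE-NG (a) and the untreated control (b) are shown. When different sgRNAs converted the same cytidine or guanosine, the maximum value was used. Values and error bars reflect the mean ± s.e.m. of three independent biological replicates.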
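The description of Figure 12(c) groups the observed changes into silent, missense and nonsense mutations. The following minimal Python sketch shows how a single codon change could be assigned to one of these three classes using the standard genetic code. The function name is hypothetical, and the sketch does not reproduce the analysis behind the figure.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a codon change as 'silent', 'missense' or 'nonsense'.

    A change of a stop codon into a sense codon is reported as 'missense'
    here for simplicity; that simplification is an assumption of this sketch.
    """
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"
```

For instance, a C>T transition that turns CAA (Gln) into TAA creates a stop codon and is classified as nonsense, whereas CTC to CTT leaves leucine unchanged and is silent.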
Detailed Description of the Invention
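I. Definitions
In the present invention, unless otherwise specified, the scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. In addition, the terms and laboratory procedures relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology used herein are terms and routine procedures widely used in the corresponding fields. For example, the standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are described more fully in: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter “Sambrook”). For a better understanding of the present invention, definitions and explanations of related terms are provided below.
As used herein, the term “and/or” covers all combinations of the items connected by the term, and each combination should be regarded as having been individually listed herein. For example, “A and/or B” covers “A”, “A and B”, and “B”. For example, “A, B and/or C” covers “A”, “B”, “C”, “A and B”, “A and C”, “B and C”, and “A and B and C”.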
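The combinations covered by “and/or” are simply all non-empty subsets of the listed items, so n items give 2^n - 1 combinations. A minimal Python sketch of that enumeration follows; the function name is hypothetical.

```python
from itertools import combinations


def and_or_combinations(items):
    """Return every non-empty combination of the items joined by 'and/or'."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]
```

For the three items A, B and C this yields the seven combinations listed in the definition above.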
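When the word “comprising” is used herein to describe the sequence of a protein or nucleic acid, the protein or nucleic acid may consist of the sequence, or may have additional amino acids or nucleotides at one or both ends while still having the activity described in the present invention. In addition, those skilled in the art know that the methionine encoded by the start codon at the N-terminus of a polypeptide is retained in some practical situations (for example, when expressed in a particular expression system) without substantially affecting the function of the polypeptide. Therefore, when a specific polypeptide amino acid sequence is described in the specification and claims of the present application, although it may not include the N-terminal methionine encoded by the start codon, sequences including this methionine are also covered, and correspondingly the encoding nucleotide sequence may also include a start codon; and vice versa.
“Genome” as used herein covers not only the chromosomal DNA present in the cell nucleus, but also the organelle DNA present in subcellular components of the cell (such as mitochondria and plastids).
“Genetically modified organism” or “genetically modified cell” means an organism or cell that contains an exogenous polynucleotide, or a modified gene or expression regulatory sequence, in its genome. For example, the exogenous polynucleotide can be stably integrated into the genome of the organism or cell and inherited over successive generations. The exogenous polynucleotide can be integrated into the genome alone or as part of a recombinant DNA construct. A modified gene or expression regulatory sequence is one that contains one or more deoxynucleotide substitutions, deletions or additions in the genome of the organism or cell.
“Exogenous”, with respect to a sequence, means a sequence from a foreign species, or, if from the same species, a sequence whose composition and/or locus has been significantly altered from its native form by deliberate human intervention.
“Polynucleotide”, “nucleic acid sequence”, “nucleotide sequence” and “nucleic acid fragment” are used interchangeably and refer to a single-stranded or double-stranded RNA or DNA polymer, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single-letter names as follows: “A” is adenosine or deoxyadenosine (for RNA or DNA, respectively), “C” is cytidine or deoxycytidine, “G” is guanosine or deoxyguanosine, “U” is uridine, “T” is deoxythymidine, “R” is a purine (A or G), “Y” is a pyrimidine (C or T), “K” is G or T, “H” is A or C or T, “D” is A, T or G, “I” is inosine, and “N” is any nucleotide.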
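The degenerate single-letter codes defined above are what make PAM notations such as NGD and HCN (used in the description of Figure 12) readable. As an illustration, the following minimal Python sketch turns a motif written with these codes into a regular expression. Only the DNA codes listed in the definition are included, and the function names are hypothetical.

```python
import re

# DNA codes exactly as listed in the definition; U and I are omitted
# because this sketch handles DNA motifs only (an assumption).
NUCLEOTIDE_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "D": "AGT",
    "N": "ACGT",  # any nucleotide
}


def motif_to_regex(motif):
    """Translate a degenerate motif such as 'NGD' into a regex string."""
    return "".join("[%s]" % NUCLEOTIDE_CODES[symbol] for symbol in motif)


def matches_motif(motif, sequence):
    """Return True if `sequence` is a concrete instance of `motif`."""
    return re.fullmatch(motif_to_regex(motif), sequence) is not None
```

With this helper, `matches_motif("NGD", "TGA")` is true, while `matches_motif("NGD", "TGC")` is false because D excludes C.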
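“Polypeptide”, “peptide” and “protein” are used interchangeably in the present invention and refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogs of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers. The terms “polypeptide”, “peptide”, “amino acid sequence” and “protein” may also include modified forms, including but not limited to glycosylation, lipid attachment, sulfation, γ-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.
As used in the present invention, “expression construct” refers to a vector, such as a recombinant vector, suitable for expressing a nucleotide sequence of interest in an organism. “Expression” refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to transcription of the nucleotide sequence (such as transcription to produce mRNA or functional RNA) and/or translation of RNA into a precursor or mature protein.
The “expression construct” of the present invention may be a linear nucleic acid fragment, a circular plasmid or a viral vector, or, in some embodiments, may be an RNA that can be translated (such as mRNA), for example an RNA produced by in vitro transcription.
The “expression construct” of the present invention may contain a regulatory sequence and a nucleotide sequence of interest from different sources, or a regulatory sequence and a nucleotide sequence of interest from the same source but arranged in a manner different from that which normally occurs in nature.
“Regulatory sequence” and “regulatory element” are used interchangeably and refer to a nucleotide sequence that is located upstream of (5' non-coding sequence), within, or downstream of (3' non-coding sequence) a coding sequence and that affects the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns and polyadenylation recognition sequences.
“Promoter” refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the present invention, the promoter is a promoter capable of controlling gene transcription in a cell, whether or not it is derived from that cell. The promoter may be a constitutive promoter, a tissue-specific promoter, a developmentally regulated promoter or an inducible promoter.
“Constitutive promoter” refers to a promoter that generally causes a gene to be expressed in most cell types under most conditions. “Tissue-specific promoter” and “tissue-preferred promoter” are used interchangeably and refer to a promoter that is expressed mainly, but not necessarily exclusively, in one tissue or organ, and that may also be expressed in one specific cell or cell type. “Developmentally regulated promoter” refers to a promoter whose activity is determined by developmental events. An “inducible promoter” selectively expresses an operably linked DNA sequence in response to an endogenous or exogenous stimulus (environmental, hormonal, chemical signals, etc.).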
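Examples of promoters include, but are not limited to, polymerase (pol) I, pol II or pol III promoters. Examples of pol I promoters include the chicken RNA pol I promoter. Examples of pol II promoters include, but are not limited to, the cytomegalovirus immediate early (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter and the simian virus 40 (SV40) immediate early promoter. Examples of pol III promoters include the U6 and H1 promoters. An inducible promoter such as the metallothionein promoter may be used. Other examples of promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter and the Sp6 phage promoter. When used in plants, the promoter may be the cauliflower mosaic virus 35S promoter, the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, the maize U3 promoter or the rice actin promoter.
As used herein, the term “operably linked” means that a regulatory element (for example, but not limited to, a promoter sequence, a transcription termination sequence, etc.) is linked to a nucleic acid sequence (for example, a coding sequence or an open reading frame) such that the transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking a regulatory element region to a nucleic acid molecule are known in the art.
“Introducing” a nucleic acid molecule (such as a plasmid, a linear nucleic acid fragment, RNA, etc.) or a protein into an organism means transforming the cells of the organism with the nucleic acid or protein so that the nucleic acid or protein can function in the cells. “Transformation” as used in the present invention includes stable transformation and transient transformation. “Stable transformation” refers to introducing an exogenous nucleotide sequence into the genome, resulting in stable inheritance of the exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and of any of its successive generations. “Transient transformation” refers to introducing a nucleic acid molecule or protein into a cell, where it performs its function without stable inheritance of an exogenous gene. In transient transformation, the exogenous nucleic acid sequence is not integrated into the genome.
“Trait” refers to a physiological, morphological, biochemical or physical characteristic of a cell or organism.
“Agronomic trait” refers in particular to a measurable indicator parameter of a crop plant, including but not limited to: leaf greenness, grain yield, growth rate, total biomass or rate of accumulation, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, nitrogen content of plant vegetative tissue, total plant free amino acid content, fruit free amino acid content, seed free amino acid content, free amino acid content of plant vegetative tissue, total plant protein content, fruit protein content, seed protein content, protein content of plant vegetative tissue, herbicide resistance, drought resistance, nitrogen uptake, root lodging, harvest index, stem lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt resistance, tiller number, and the like.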
二、碱基编辑融合蛋白
在一方面,本发明提供一种碱基编辑融合蛋白,其包含核酸靶向结构域、胞嘧啶脱氨结构域和腺嘌呤脱氨结构域。
在本文实施方案中,“碱基编辑融合蛋白”和“碱基编辑器”可互换使用,指的是可以以序列特异性方式介导基因组中靶序列的一或多个核苷酸取代的蛋白。
如本文所用,“核酸靶向结构域”指的是能够介导所述碱基编辑融合蛋白以序列特异性方式(例如通过向导RNA)附着至基因组中特定靶序列处的结构域。在一些实施方案中,所述核酸靶向结构域包含至少一个(例如一个)CRISPR效应蛋白(CRISPR effector)多肽。
如本文所用,术语“CRISPR效应蛋白”通常指在天然存在的CRISPR***中存在的核酸酶(CRISPR核酸酶)或其功能性变体。该术语涵盖基于CRISPR***的能够在细胞内实现序列特异性靶向的任何效应蛋白。
如本文所用,就CRISPR核酸酶而言的“功能性变体”意指其至少保留向导RNA介导的序列特异性靶向能力。优选地,所述功能性变体是核酸酶失活的变体,即其缺失双链核酸切割活性。然而,缺失双链核酸切割活性的CRISPR核酸酶也涵盖切口酶(nickase),其在双链核酸分子形成切口(nick),但不完全切断双链核酸。在本发明的一些优选的实施方案中,本发明所述CRISPR效应蛋白具有切口酶活性。在一些实施方案中,所述功能性变体相对于野生型核酸酶识别不同的PAM(前间区序列邻近基序)序列。
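The “CRISPR effector protein” may be derived from Cas9 nuclease, including Cas9 nuclease or a functional variant thereof. The Cas9 nuclease may come from different species, for example SpCas9 from Streptococcus pyogenes (S. pyogenes) or SaCas9 from Staphylococcus aureus (S. aureus). “Cas9 nuclease” and “Cas9” are used interchangeably herein and refer to an RNA-guided nuclease comprising a Cas9 protein or a fragment thereof (for example, a protein comprising the active DNA cleavage domain of Cas9 and/or the gRNA binding domain of Cas9). Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated system) genome editing system; guided by a guide RNA, it targets and cleaves a DNA target sequence to form a DNA double-strand break (DSB). An exemplary amino acid sequence of wild-type SpCas9 is shown in SEQ ID NO:1.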
“CRISPR效应蛋白”还可以衍生自Cpf1核酸酶,包括Cpf1核酸酶或其功能性变体。所述Cpf1核酸酶可以是来自不同物种的Cpf1核酸酶,例如来自Francisella novicida U112、Acidaminococcus sp.BV3L6和Lachnospiraceae bacterium ND2006的Cpf1核酸酶。
可用的“CRISPR效应蛋白”还可以衍生自Cas3、Cas8a、Cas5、Cas8b、Cas8c、Cas10d、Cse1、Cse2、Csy1、Csy2、Csy3、GSU0054、Cas10、Csm2、Cmr5、Cas10、Csx11、Csx10、Csf1、Csn2、Cas4、C2c1、C2c3或C2c2核酸酶,例如包括这些核酸酶或其功能性变体。
在一些实施方案中,所述CRISPR效应蛋白是核酸酶失活的Cas9。Cas9核酸酶的DNA切割结构域已知包含两个亚结构域:HNH核酸酶亚结构域和RuvC亚结构域。HNH亚结构域切割与gRNA互补的链,而RuvC亚结构域切割非互补的链。在这些亚结构域中的突变可以使Cas9的核酸酶活性失活,形成“核酸酶失活的Cas9”。所述核酸酶失活的Cas9仍然保留gRNA指导的DNA结合能力。
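The nuclease-inactivated Cas9 of the invention may be derived from Cas9 of different species, for example from S. pyogenes Cas9 (SpCas9) or from S. aureus Cas9 (SaCas9). Mutating both the HNH nuclease subdomain and the RuvC subdomain of Cas9 (for example, with mutations D10A and H840A) abolishes the nuclease activity of Cas9 and yields nuclease-dead Cas9 (dCas9). Inactivating only one of the subdomains by mutation gives Cas9 nickase activity, that is, a Cas9 nickase (nCas9), for example an nCas9 carrying only the D10A mutation.
Accordingly, in some embodiments of the invention, the nuclease-inactivated Cas9 variant comprises the amino acid substitutions D10A and/or H840A relative to wild-type Cas9, where amino acid numbering refers to SEQ ID NO:1. In some preferred embodiments, the nuclease-inactivated Cas9 comprises the amino acid substitution D10A relative to wild-type Cas9, where amino acid numbering refers to SEQ ID NO:1. In some embodiments, the nuclease-inactivated Cas9 comprises the amino acid sequence shown in SEQ ID NO:2 (nCas9(D10A)).
When used for gene editing, Cas9 nuclease generally requires the target sequence to carry the PAM (protospacer adjacent motif) sequence 5'-NGG-3' at its 3' end. However, the inventors surprisingly found that this PAM sequence occurs at very low frequency in some species such as rice, which greatly limits gene editing in those species. For this reason, the invention preferably uses a CRISPR effector protein that recognizes a different PAM sequence, for example a functional Cas9 nuclease variant with a different PAM specificity.
In some preferred embodiments, the CRISPR effector protein is a Cas9 variant that recognizes the PAM sequence 5'-NG-3'. A Cas9 variant that recognizes the PAM sequence 5'-NG-3' is also referred to herein as Cas9-NG. In some embodiments, Cas9-NG comprises the following amino acid substitutions relative to wild-type Cas9: R1335V, L1111R, D1135V, G1218R, E1219F, A1322R and T1337R, where amino acid numbering refers to SEQ ID NO:1.
In some preferred embodiments, the CRISPR effector protein is a Cas9 variant that is nuclease-inactivated and recognizes the PAM sequence 5'-NG-3'. In some embodiments, this variant comprises the following amino acid substitutions relative to wild-type Cas9: D10A, R1335V, L1111R, D1135V, G1218R, E1219F, A1322R and T1337R, where amino acid numbering refers to SEQ ID NO:1. In some embodiments, the nuclease-inactivated Cas9 variant recognizing the PAM sequence 5'-NG-3' comprises the amino acid sequence shown in SEQ ID NO:3 (nCas9-NG(D10A)).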
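The substitution notation used above (for example D10A: aspartate at position 10 replaced by alanine, numbered against a reference sequence) can be applied mechanically. The following is a minimal sketch, not part of the disclosure; the function name `apply_substitutions` and the short toy sequence are assumptions made for illustration, and the toy sequence is not SEQ ID NO:1.

```python
import re


def apply_substitutions(protein: str, substitutions: list[str]) -> str:
    """Apply point substitutions such as 'D10A' (1-based numbering) to a protein.

    Each substitution is checked against the reference residue so that a
    numbering mismatch is reported instead of silently producing a wrong variant.
    """
    residues = list(protein)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognised substitution: {sub}")
        wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{sub}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{sub}: reference has {residues[position - 1]} at {position}")
        residues[position - 1] = mutant
    return "".join(residues)


# Hypothetical 15-residue toy sequence with D at position 10.
toy_reference = "MDKKYSIGLDIGTNS"
toy_nickase = apply_substitutions(toy_reference, ["D10A"])
```

With a full-length reference, the same call could take the whole Cas9-NG list (D10A, R1335V, L1111R, D1135V, G1218R, E1219F, A1322R, T1337R); the residue check then guards against using a reference whose numbering differs from SEQ ID NO:1.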
如本文所用,“胞嘧啶脱氨结构域”指的是能够接受单链DNA作为底物,催化胞苷或脱氧胞苷分别脱氨化为尿嘧啶或脱氧尿嘧啶的结构域。在一些实施方案中,所述胞嘧啶脱氨结构域包含至少一个(例如一个或两个)胞嘧啶脱氨酶多肽。
在本发明中,融合蛋白中的胞苷脱氨结构域能够将融合蛋白-向导RNA-DNA复合物形成中产生的单链DNA的胞苷脱氨转换成U,再通过碱基错配修复实现C至T的碱基替换。
可用于本发明的胞嘧啶脱氨酶的实例包括但不限于例如APOBEC1脱氨酶、激活诱导的胞苷脱氨酶(AID)、APOBEC3G、CDA1、人APOBEC3A脱氨酶,或它们的功能性变体。在一些实施方式中,所述胞嘧啶脱氨酶是人APOBEC3A脱氨酶或其功能性变体。在一些实施方案中,人APOBEC3A脱氨酶包含SEQ ID NO:4所示氨基酸序列。
如本文所用,“腺嘌呤脱氨结构域”是指能够接受单链DNA作为底物,催化腺苷或脱氧腺苷(A)形成肌苷(I)的结构域。在一些实施方案中,所述腺嘌呤脱氨结构域包含至少一个(例如一个)DNA依赖型腺嘌呤脱氨酶多肽。
在本发明中,融合蛋白中的腺嘌呤脱氨结构域能够将CRISPR效应蛋白-向导RNA-DNA复合物形成中产生的单链DNA的腺苷脱氨转换成肌苷(I),由于DNA聚合酶会将肌苷(I)当做鸟嘌呤(G)处理,因此通过碱基错配修复可以实现A至G的取代。
在一些实施方案中,所述DNA依赖型腺嘌呤脱氨酶是大肠杆菌tRNA腺嘌呤脱氨酶TadA(ecTadA)的变体。示例性的野生型ecTadA氨基酸序列如SEQ ID NO:5所示,然而野生型ecTadA氨基酸序列也可以不包含SEQ ID NO:5中N末端的甲硫氨酸。
在一些实施方案中,所述DNA依赖型腺嘌呤脱氨酶相对于野生型ecTadA包含一或多组选自以下的突变:
1)A106V和D108N;
2)D147Y和E155V;
3)L84F、H123Y和I156F;
4)A142N;
5)H36L、R51L、S146C和K157N;
6)P48S/T/A;
7)A142N;
8)W23L/R;
9)R152H/P,
其中氨基酸编号参照SEQ ID NO:5。
在本发明一些优选实施方式中,所述DNA依赖型腺嘌呤脱氨酶相对于野生型ecTadA包含以下突变:W23R、H36L、R51L、S146C、K157N、A106V、D108N、P48A、L84F、H123Y、I156F、D147Y、E155V和R152P,其中氨基酸编号参照SEQ ID NO:5。
在本发明一些优选实施方式中,所述DNA依赖型腺嘌呤脱氨酶包含如SEQ ID NO:6所示的氨基酸序列。
由于大肠杆菌tRNA腺嘌呤脱氨酶(ecTadA)通常以二聚体发挥功能,因此预期两个DNA依赖型腺嘌呤脱氨酶形成二聚体或DNA依赖型腺嘌呤脱氨酶与野生型腺嘌呤脱氨酶形成二聚体可以显著提高融合蛋白A至G的编辑活性。
在一些优选实施方案中,所述腺嘌呤脱氨结构域包含两个所述DNA依赖型腺嘌呤脱氨酶。
在一些优选实施方案中,所述腺嘌呤脱氨结构域还包含与所述DNA依赖型腺嘌呤脱氨酶(例如大肠杆菌tRNA腺嘌呤脱氨酶TadA的DNA依赖型变体)融合的对应的野生型腺嘌呤脱氨酶(例如大肠杆菌tRNA腺嘌呤脱氨酶TadA)。在一些优选实施方案中,所述DNA依赖型腺嘌呤脱氨酶(例如大肠杆菌tRNA腺嘌呤脱氨酶TadA的DNA依赖型变体)融合至对应的野生型腺嘌呤脱氨酶(例如大肠杆菌tRNA腺嘌呤脱氨酶TadA)的C端。
在一些实施方案中,所述两个DNA依赖型腺嘌呤脱氨酶(例如大肠杆菌tRNA腺嘌呤脱氨酶TadA的DNA依赖型变体)之间或所述DNA依赖型腺嘌呤脱氨酶(例如大肠杆菌tRNA腺嘌呤脱氨酶TadA的DNA依赖型变体)与所述对应的野生型腺嘌呤脱氨酶(例如大肠杆菌tRNA腺嘌呤脱氨酶TadA)之间通过接头融合。
在一些优选实施方案中,所述腺嘌呤脱氨结构域包含SEQ ID NO:7或8所示的氨基酸序列。
在本发明的一些实施方案中,所述核酸靶向结构域、所述胞嘧啶脱氨结构域和所述腺嘌呤脱氨结构域通过接头融合。
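As used herein, a “linker” may be a non-functional amino acid sequence of 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 20-25, 25-50) or more amino acids that has no secondary or higher-order structure. For example, the linker may be a flexible linker such as GGGGS, GS, GAP, (GGGGS)x3, GGS or (GGS)x7. In some embodiments, the linker is 32 amino acids long, for example a linker comprising the amino acid sequence shown in SEQ ID NO:9. In some embodiments, the linker is 48 amino acids long, for example a linker comprising the amino acid sequence shown in SEQ ID NO:10. In some preferred embodiments, the linker is an XTEN linker comprising the amino acid sequence shown in SEQ ID NO:11.
In some embodiments, the base editing fusion protein comprises, in the N-terminal to C-terminal direction and in the following order: the cytosine deamination domain, the adenine deamination domain and the nucleic acid targeting domain. In some embodiments, the base editing fusion protein comprises, in the N-terminal to C-terminal direction and in the following order: the adenine deamination domain, the cytosine deamination domain and the nucleic acid targeting domain.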
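The domain orders and linkers described above amount to concatenating domain sequences from N-terminus to C-terminus with a linker between neighbours. A minimal sketch follows, assuming placeholder domain strings; `assemble_fusion` and the three-letter "domains" are hypothetical and stand in for the real sequences of the sequence listing.

```python
def assemble_fusion(domains: list[str], linkers: list[str]) -> str:
    """Join domain sequences N-to-C with one linker between each adjacent pair."""
    if len(linkers) != len(domains) - 1:
        raise ValueError("need exactly one linker between each pair of domains")
    pieces = [domains[0]]
    for linker, domain in zip(linkers, domains[1:]):
        pieces.extend([linker, domain])
    return "".join(pieces)


# Placeholder strings only; real domains would come from the sequence listing.
cytosine_deaminase, adenine_deaminase, targeting_domain = "AAA", "CCC", "DDD"
flexible_x3 = "GGGGS" * 3  # the (GGGGS)x3 flexible linker named in the text
fusion = assemble_fusion(
    [cytosine_deaminase, adenine_deaminase, targeting_domain],
    [flexible_x3, "GGS"],
)
```

Swapping the first two list entries gives the alternative order (adenine deamination domain first) without touching the linker logic.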
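In addition, in cells, uracil DNA glycosylase catalyzes the removal of U from DNA and initiates base excision repair (BER), which repairs U:G back to C:G. Therefore, without being bound by any theory, combining the base editing fusion protein of the invention with a uracil DNA glycosylase inhibitor (UGI) is expected to increase the efficiency of C-to-T base editing.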
在一些实施方式中,所述碱基编辑融合蛋白与尿嘧啶DNA糖基化酶抑制剂(UGI)共表达。
在一些实施方式中,所述碱基编辑融合蛋白还包含尿嘧啶DNA糖基化酶抑制剂(UGI)。
在一些实施方式中,UGI通过接头与所述碱基编辑融合蛋白其它部分连接。
在一些实施方式中,UGI通过“自裂解肽”与所述碱基编辑融合蛋白其它部分连接。
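As used herein, a “self-cleaving peptide” means a peptide that can undergo self-cleavage within a cell. For example, the self-cleaving peptide may contain a protease recognition site so that it is recognized and specifically cleaved by a protease in the cell. Alternatively, the self-cleaving peptide may be a 2A polypeptide. 2A polypeptides are a class of short peptides of viral origin whose self-cleavage occurs during translation. When two different polypeptides of interest are linked by a 2A polypeptide and expressed in the same reading frame, the two polypeptides are produced at a ratio of almost 1:1. Commonly used 2A polypeptides include P2A from porcine teschovirus-1, T2A from Thosea asigna virus, E2A from equine rhinitis A virus, and F2A from foot-and-mouth disease virus. Many functional variants of these 2A polypeptides are also known in the art and may likewise be used in the invention.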
优选地,所述核酸靶向结构域、所述胞嘧啶脱氨结构域和所述腺嘌呤脱氨结构域之间或内部不存在所述自裂解肽。在一些实施方式中,UGI位于所述碱基编辑融合蛋白的N末端或C末端,优选C末端。
在一些具体实施方式中,所述尿嘧啶DNA糖基化酶抑制剂(UGI)包含SEQ ID NO:12所示的氨基酸序列。
在本发明的一些实施方案中,本发明的融合蛋白还可以包含核定位序列(NLS)。一般而言,所述融合蛋白中的一个或多个NLS应具有足够的强度,以便在细胞的核中驱动所述融合蛋白以可实现其碱基编辑功能的量积聚。一般而言,核定位活性的强度由所述融合蛋白中NLS的数目、位置、所使用的一个或多个特定的NLS、或这些因素的组合决定。
在本发明的一些实施方案中,本发明的融合蛋白的NLS可以位于N端和/或C端。在本发明的一些实施方案中,本发明的融合蛋白的NLS可以位于所述腺嘌呤脱氨结构域、胞嘧啶脱氨结构域、核酸靶向结构域和/或UGI之间。在一些实施方案中,所述融合蛋白包含约1、2、3、4、5、6、7、8、9、10个或更多个NLS。在一些实施方案中,所述融合蛋白包含在或接近于N端的约1、2、3、4、5、6、7、8、9、10个或更多个NLS。在一些实施方案中,所述融合蛋白包含在或接近于C端约1、2、3、4、5、6、7、8、9、10个或更多个NLS。在一些实施方案中,所述多肽包含这些的组合,如包含在N端的一个或多个NLS以及在C端的一个或多个NLS。当存在多于一个NLS时,每一个可以被选择为不依赖于其他NLS。
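In general, an NLS consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface, although other types of NLS are also known. Non-limiting examples of NLSs include KKRKV, PKKKRKV and KRPAATKKAGQAKKKK.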
此外,根据所需要编辑的DNA位置,本发明的融合蛋白还可以包括其他的定位序列,例如细胞质定位序列、叶绿体定位序列、线粒体定位序列等。
在一些实施方案中,所述碱基编辑融合蛋白包含SEQ ID NO:13-19中任一所示的氨基酸序列。
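III. Base Editing System
In another aspect, the invention provides a base editing system for modifying a target nucleic acid region in a genome, comprising:
i) the base editing fusion protein of the invention and/or an expression construct containing a nucleotide sequence encoding the base editing fusion protein; and/or
ii) at least one guide RNA and/or at least one expression construct containing a nucleotide sequence encoding the at least one guide RNA,
wherein the at least one guide RNA is directed to at least one target sequence within the target nucleic acid region.
As used herein, a “base editing system” refers to the combination of components required for base editing of the genome in a cell or organism. The individual components of the system, for example the base editing fusion protein and one or more guide RNAs, may each be present independently or may be present as a composition in any combination.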
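As used herein, “guide RNA” and “gRNA” are used interchangeably and refer to an RNA molecule that can form a complex with a CRISPR effector protein and, by virtue of a certain degree of identity to a target sequence, direct the complex to that target sequence. The guide RNA targets the target sequence by base pairing with the strand complementary to the target sequence. For example, the gRNA used by Cas9 nuclease or its functional variants usually consists of a crRNA and a tracrRNA that form a complex through partial complementarity, where the crRNA contains a guide sequence (also called a seed sequence) with sufficient identity to the target sequence to hybridize with the complementary strand of the target sequence and direct sequence-specific binding of the CRISPR complex (Cas9+crRNA+tracrRNA) to the target sequence. However, it is known in the art that a single guide RNA (sgRNA) can be designed that combines the features of crRNA and tracrRNA. The gRNA used by Cpf1 nuclease or its functional variants usually consists of a mature crRNA molecule only, which may also be called an sgRNA. Designing a suitable gRNA based on the CRISPR nuclease used and the target sequence to be edited is within the ability of those skilled in the art.
In some specific embodiments of the invention, for example when the CRISPR effector protein is Cas9 or a functional variant thereof, the sgRNA sequence may include one of the following scaffold sequences:
5’-guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuu-3’ (SEQ ID NO:27, corresponding to the sgRNA scaffold in Figure 2), or
5’-guuuaagagcuaugcuggaaacagcauagcaaguuuaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuu-3’ (SEQ ID NO:28, corresponding to the esgRNA scaffold in Figure 2). In the sgRNA, the guide sequence (also called the seed sequence, i.e., the sequence identical to the target sequence) is located at the 5' end of the scaffold sequence.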
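Since the guide sequence is identical to the target sequence and sits 5' of the scaffold, an sgRNA can be written out by converting the DNA target to RNA and prepending it to the scaffold. The sketch below illustrates this under stated assumptions: `build_sgrna` is a hypothetical helper, the 20-nt target is made up, and the scaffold string is SEQ ID NO:27 as quoted above.

```python
# sgRNA scaffold as quoted in the text (SEQ ID NO:27), written 5' to 3'.
SGRNA_SCAFFOLD = (
    "guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuu"
)


def build_sgrna(target_dna: str, scaffold: str = SGRNA_SCAFFOLD) -> str:
    """Return guide sequence + scaffold as lowercase RNA (T is written as u)."""
    target = target_dna.upper()
    if set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G and T")
    guide_rna = target.replace("T", "U").lower()
    return guide_rna + scaffold


# Hypothetical 20-nt target sequence, for illustration only.
example_target = "ACGTACGTACGTACGTACGT"
example_sgrna = build_sgrna(example_target)
```

Passing the esgRNA scaffold (SEQ ID NO:28) as the `scaffold` argument would give the corresponding esgRNA.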
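The target sequence to which the guide RNA is directed usually comprises 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides, preferably 20 nucleotides. Correspondingly, the guide sequence in the guide RNA usually comprises 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides, preferably 20 nucleotides.
In general, the target sequence to which the guide RNA is directed also requires a PAM sequence at one end, for example the 3' end, in order to be recognized by the CRISPR effector protein (or fusion protein)-guide RNA complex. The position, type and length of the PAM required depend on the CRISPR effector protein used. In some preferred embodiments, the PAM sequence is 5'-NGG-3' located at the 3' end of the target sequence. In some preferred embodiments, the PAM sequence is 5'-NG-3' located at the 3' end of the target sequence.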
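The PAM requirement can be turned into a simple search: slide a window along each strand and keep every 20-nt stretch that is immediately followed, on its 3' side, by a sequence matching the PAM pattern. This is an illustrative sketch only; `find_targets` and its return format are assumptions, and real guide design would add further filters (off-target scoring, GC content and so on).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_matches(site: str, pam: str) -> bool:
    """True if site matches the PAM pattern, where N in the pattern matches any base."""
    return len(site) == len(pam) and all(p in ("N", b) for p, b in zip(pam, site))


def find_targets(sequence: str, pam: str = "NGG", length: int = 20):
    """List (strand, start, target, pam_site) for every target followed by a PAM.

    start is 0-based on the strand being read (the input for '+',
    its reverse complement for '-').
    """
    hits = []
    forward = sequence.upper()
    for strand, seq in (("+", forward), ("-", reverse_complement(forward))):
        for start in range(len(seq) - length - len(pam) + 1):
            site = seq[start + length:start + length + len(pam)]
            if pam_matches(site, pam):
                hits.append((strand, start, seq[start:start + length], site))
    return hits


# Toy sequence: twenty A's followed by TGG.
toy = "A" * 20 + "TGG"
ngg_hits = find_targets(toy, pam="NGG")
ng_hits = find_targets(toy, pam="NG")
```

On this toy input the relaxed 5'-NG-3' pattern finds more candidate targets than 5'-NGG-3', which is the motivation the text gives for NG-recognizing variants.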
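After the base editing system of the invention is introduced into the cell, the base editing fusion protein and the guide RNA can form a complex, and the complex, guided by the guide RNA, specifically targets the target sequence and causes one or more C's in the target sequence to be replaced by T and/or one or more A's to be replaced by G.
In some embodiments, the C-to-T base editing window of the base editing fusion protein of the invention is at positions 1-17 of the target sequence. That is, the fusion protein can cause one or more C's within positions 1-17, counted from the 5' end of the target sequence, to be replaced by T.
In some embodiments, the A-to-G base editing window of the base editing fusion protein of the invention is at positions 4-8 of the target sequence. That is, the fusion protein can cause one or more A's within positions 4-8, counted from the 5' end of the target sequence, to be replaced by G.
In some embodiments, the at least one guide RNA may be directed to a target sequence located on the sense strand (e.g., the protein coding strand) and/or the antisense strand within the genomic target nucleic acid region. When the guide RNA targets the sense strand, the base editing composition of the invention can cause one or more C's to be replaced by T and/or one or more A's to be replaced by G within the target sequence on the sense strand. When the guide RNA targets the antisense strand, the base editing composition of the invention can cause one or more G's to be replaced by A and/or one or more T's to be replaced by C within the target sequence on the sense strand.
To obtain efficient expression in cells, in some embodiments of the invention the nucleotide sequence encoding the base editing fusion protein is codon-optimized for the organism whose genome is to be modified.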
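The editing windows and strand rules above can be expressed as a small enumeration: list every C at positions 1-17 and every A at positions 4-8 of the target, then translate the change to the sense strand when the guide targets the antisense strand. This is a sketch under the window values stated in the text; the function names and the toy target are hypothetical.

```python
def possible_edits(target: str, c_window=(1, 17), a_window=(4, 8)):
    """List (position, ref, alt) for editable bases; positions are 1-based from the 5' end."""
    edits = []
    for position, base in enumerate(target.upper(), start=1):
        if base == "C" and c_window[0] <= position <= c_window[1]:
            edits.append((position, "C", "T"))
        elif base == "A" and a_window[0] <= position <= a_window[1]:
            edits.append((position, "A", "G"))
    return edits


# How an edit on the targeted strand reads on the sense strand when the
# guide targets the antisense strand (complement of both ref and alt).
ANTISENSE_TO_SENSE = {("C", "T"): ("G", "A"), ("A", "G"): ("T", "C")}


def as_seen_on_sense_strand(edits, guide_targets_sense: bool):
    """Return (ref, alt) pairs as they appear on the sense strand."""
    if guide_targets_sense:
        return [(ref, alt) for _, ref, alt in edits]
    return [ANTISENSE_TO_SENSE[(ref, alt)] for _, ref, alt in edits]


# Hypothetical 20-nt target: CCC, six A's, eleven C's.
toy_target = "CCC" + "A" * 6 + "C" * 11
toy_edits = possible_edits(toy_target)
```

The enumeration only says which bases are eligible; actual editing frequencies at each position are experimental quantities and are not modeled here.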
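Codon optimization refers to a method of modifying a nucleic acid sequence to enhance expression in a host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with codons that are used more frequently or most frequently in the genes of that host cell, while maintaining the native amino acid sequence. Different species show particular preferences for certain codons of a given amino acid. Codon bias (differences in codon usage between organisms) often correlates with the translation efficiency of messenger RNA (mRNA), which in turn is believed to depend on the nature of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Genes can therefore be tailored for optimal expression in a given organism on the basis of codon optimization. Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in various ways. See Nakamura Y. et al., “Codon usage tabulated from the international DNA sequence databases: status for the year 2000.” Nucl. Acids Res., 28:292 (2000).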
在一些实施方案中,本发明所述碱基编辑融合蛋白由SEQ ID NO:20-26中任一所示核苷酸序列编码。
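Organisms whose genome can be modified by the base editing system of the invention include any organism suitable for base editing, preferably eukaryotes. Examples of organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; and plants, including monocotyledons and dicotyledons, for example crop plants including but not limited to wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato. Preferably, the organism is a plant. More preferably, the organism is rice.
IV. Method for Producing a Genetically Modified Cell
In another aspect, the invention also provides a method for producing at least one genetically modified cell, comprising introducing the base editing system of the invention into at least one such cell, thereby causing one or more nucleotide substitutions within the target nucleic acid region in the at least one cell.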
在一些实施方案中,所述方法还包括从所述至少一个细胞筛选具有期望的一个或多个核苷酸取代的细胞的步骤。
在一些实施方式中,本发明的方法在体外进行。例如,所述细胞是分离的细胞,或在分离的组织或器官中的细胞。
在另一方面,本发明还提供经遗传修饰的生物体,其包含通过本发明的方法产生的经遗传修饰的细胞或其后代细胞。优选地,所述经遗传修饰的细胞或其后代细胞具有期望的一个或多个核苷酸取代。
在本发明中,待进行修饰的靶核酸区域可以位于基因组的任何位置,例如位于功能基因如蛋白编码基因内,或者例如可以位于基因表达调控区如启动子区或增强子区,从而实现对所述基因功能修饰或对基因表达的修饰。在一些实施方案中,所述期望的核苷酸取代导致期望的基因功能修饰或基因表达修饰。
在一些实施方案中,所述靶核酸区域与所述细胞或生物体的性状相关。在一些实施方案中,所述靶核酸区域中的突变导致所述细胞或生物体的性状的改变。在一些实施方案中,所述靶核酸区域位于蛋白的编码区。在一些实施方案中,所述靶核酸区域编码蛋白的功能相关基序或结构域。在一些优选实施方案中,所述靶核酸区域中的一个或多个核苷酸取代导致所述蛋白的氨基酸序列中的氨基酸取代。在一些实施方案中,所述一个或多个核苷酸取代导致蛋白的功能的改变。
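In the methods of the invention, the base editing system can be introduced into cells by various methods well known to those skilled in the art.
Methods that can be used to introduce the base editing system of the invention into cells include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, liposome transfection, microinjection, viral infection (e.g., baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), particle bombardment, PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.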
可以通过本发明的方法进行碱基编辑的细胞可以来自例如,哺乳动物如人、小鼠、大鼠、猴、犬、猪、羊、牛、猫;家禽如鸡、鸭、鹅;植物,包括单子叶植物和双子叶植物,优选作物植物,包括但不限于小麦、水稻、玉米、大豆、向日葵、高粱、油菜、苜蓿、棉花、大麦、粟、甘蔗、番茄、烟草、木薯和马铃薯。
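V. Use in Plants
The base editing fusion protein, the base editing system and the method for producing a genetically modified cell of the invention are particularly suitable for the genetic modification of plants. Preferably, the plant is a crop plant, including but not limited to wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava and potato. More preferably, the plant is rice.
In another aspect, the invention provides a method for producing a genetically modified plant, comprising introducing the base editing system of the invention into at least one such plant, thereby causing one or more nucleotide substitutions within the target nucleic acid region in the genome of the at least one plant.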
在一些实施方案中,所述方法还包括从所述至少一个植物筛选具有期望的一个或多个核苷酸取代的植物。
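In the methods of the invention, the base editing composition can be introduced into plants by various methods well known to those skilled in the art. Methods that can be used to introduce the base editing system of the invention into plants include, but are not limited to: particle bombardment, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, the pollen tube pathway method and the ovary injection method. Preferably, the base editing composition is introduced into the plant by transient transformation.
In the methods of the invention, the target sequence can be modified simply by introducing or producing the base editing fusion protein and the guide RNA in plant cells, and the modification can be stably inherited without stably transforming the plant with exogenous polynucleotides encoding the components of the base editing system. This avoids the potential off-target effects of a stably present (continuously produced) base editing composition and also avoids integration of exogenous nucleotide sequences into the plant genome, and therefore offers higher biosafety.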
在一些优选实施方式中,所述导入在不存在选择压力下进行,从而避免外源核苷酸序列在植物基因组中的整合。
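In some embodiments, the introduction comprises transforming the base editing system of the invention into isolated plant cells or tissues and then regenerating the transformed plant cells or tissues into whole plants. Preferably, the regeneration is carried out in the absence of selection pressure, that is, without using any selection agent against a selectable gene carried on the expression vector during tissue culture. Omitting the selection agent can improve the regeneration efficiency of the plants and yields modified plants free of exogenous nucleotide sequences.
In other embodiments, the base editing system of the invention can be transformed into a particular part of an intact plant, such as a leaf, shoot apex, pollen tube, young ear or hypocotyl. This is particularly suitable for transforming plants that are difficult to regenerate by tissue culture.
In some embodiments of the invention, in vitro expressed protein and/or in vitro transcribed RNA molecules (for example, where the expression construct is an in vitro transcribed RNA molecule) are transformed directly into the plant. The protein and/or RNA molecules can carry out base editing in the plant cells and are subsequently degraded by the cells, which avoids integration of exogenous nucleotide sequences into the plant genome.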
因此,在一些实施方式中,使用本发明的方法对植物进行遗传修饰和育种可以获得其基因组无外源多核苷酸整合的植物,即非转基因(transgene-free)的经修饰的植物。
在本发明的一些实施方式中,其中所述被修饰的靶核酸区域与植物性状如农艺性状相关,由此所述一个或多个核苷酸取代导致所述植物相对于野生型植物具有改变的(优选改善的)性状,例如农艺性状。
在一些实施方式中,所述方法还包括筛选具有期望的一个或多个核苷酸取代和/或期望的性状如农艺性状的植物的步骤。
在本发明的一些实施方式中,所述方法还包括获得所述经遗传修饰的植物的后代。优选地,所述经遗传修饰的植物或其后代具有期望的一个或多个核苷酸取代和/或期望的性状如农艺性状。
在另一方面,本发明还提供了经遗传修饰的植物或其后代或其部分,其中所述植物通过本发明上述的方法获得。在一些实施方式中,所述经遗传修饰的植物或其后代或其部分是非转基因的。优选地,所述经遗传修饰的植物或其后代具有期望的遗传修饰和/或期望的性状如农艺性状。
在另一方面,本发明还提供了一种植物育种方法,包括将通过本发明上述的方法获得的在靶核酸区域包含一个或多个核苷酸取代的经遗传修饰的第一植物与不含有所述一个或多个核苷酸取代的第二植物杂交,从而将所述一个或多个核苷酸取代导入第二植物。优选地,所述经遗传修饰的第一植物具有期望的性状如农艺性状。
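VI. Method for In Situ Saturation Mutagenesis of an Endogenous Target Nucleic Acid Region in a Cell or Organism
In another aspect, the invention provides a method for in situ saturation mutagenesis of an endogenous target nucleic acid region in a cell or organism to obtain a mutation of interest within the target nucleic acid region, comprising:
i) providing a population of the cells or organisms;
ii) introducing the base editing system of the invention into the population of cells or organisms, resulting in one or more mutations within the endogenous target nucleic acid region of the cells or organisms of the population;
iii) screening the population of cells or organisms for cells or organisms containing the mutation of interest; and optionally
iv) identifying the mutation of interest.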
在一些实施方式中,本发明的方法在体外进行。例如,所述细胞是分离的细胞,或在分离的组织或器官中的细胞。
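In some embodiments, the base editing system comprises multiple guide RNAs and/or multiple expression constructs containing nucleotide sequences encoding the multiple guide RNAs. In some embodiments, the multiple guide RNAs are directed to different target sequences within the target nucleic acid region.
In some embodiments, the multiple guide RNAs may number 2-250 or more, for example 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 75, 100, 150, 200, 250, 300 or more.
In some embodiments, the target sequences and/or their complementary sequences to which at least some of the multiple guide RNAs are directed partially overlap and/or adjoin one another. In this way, the base editing system of the invention can achieve base editing of a relatively long target nucleic acid region. For example, the target nucleic acid region may be about 20 bp to about 10000 bp long or longer, for example about 20 bp, about 40 bp, about 60 bp, about 80 bp, about 100 bp, about 120 bp, about 140 bp, about 160 bp, about 180 bp, about 200 bp, about 300 bp, about 400 bp, about 500 bp, about 1000 bp, about 1500 bp, about 2000 bp, about 3000 bp, about 4000 bp, about 5000 bp, about 6000 bp or longer. Where the target nucleic acid region is a protein coding region, it may encode an amino acid sequence of about 5 to about 2000 amino acids, for example about 5, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 60, about 70, about 80, about 90, about 100, about 125, about 150, about 200, about 250, about 500, about 750, about 1000, about 1500, about 2000 or more amino acids.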
在一些实施方案中,所述多种向导RNA的所针对的靶序列基本上覆盖所述靶核酸区域。
在一些实施方案中,所述多种向导RNA中至少一部分靶向所述靶核酸区域的有义链。
在一些实施方案中,所述多种向导RNA中至少一部分靶向所述靶核酸区域的反义链。
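In some embodiments, the multiple guide RNAs and/or the multiple expression constructs containing nucleotide sequences encoding them may each be introduced independently into the population of cells or organisms. In some embodiments, they may be introduced into the population in combination with one another. For example, each guide RNA or its expression construct is introduced separately into a subpopulation of the cells or organisms, and all subpopulations together constitute the population into which the gene editing system has been introduced; or a mixture of every two guide RNAs or their expression constructs is introduced into a subpopulation, and all subpopulations together constitute the population into which the gene editing system has been introduced; and so on.
In some embodiments, the mutation is a nucleotide substitution, for example a C-to-T, A-to-G, G-to-A or T-to-C substitution.
In some embodiments, the target nucleic acid region is located in the coding region of a protein. In some embodiments, the target nucleic acid region encodes a function-related motif or domain of a protein. In some embodiments, the mutations in the target nucleic acid region may be silent mutations, missense mutations or nonsense mutations. In some preferred embodiments, the mutation in the target nucleic acid region results in an amino acid substitution in the amino acid sequence of the protein. In some embodiments, the mutation results in a change in the function of the protein.
As used herein, “saturation” mutagenesis does not necessarily mean that the population of cells or organisms contains mutations of all nucleotides of the target nucleic acid region or mutations of all amino acids of the amino acid sequence encoded by it. The term also covers “near-saturation” mutagenesis, for example where the population contains mutations of more than 50% of the nucleotides of the target nucleic acid region or of more than 50% of the amino acids of the encoded amino acid sequence.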
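The three mutation classes named above (silent, missense, nonsense) follow from comparing the amino acid encoded before and after a codon change. A minimal sketch using the standard genetic code is given below; `classify_codon_change` and the extra "stop-lost" label are assumptions added for completeness and are not terms used in the text.

```python
# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as 'silent', 'missense', 'nonsense' or 'stop-lost'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-lost"  # label added for this sketch only
    return "missense"


# C-to-T at the first base of CAG (Gln) creates the stop codon TAG.
example = classify_codon_change("CAG", "TAG")
```

Applied codon by codon to edited amplicon reads, such a function gives the kind of per-residue substitution summary discussed in the examples.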
在一些实施方案中,所述靶核酸区域与所述细胞或生物体的性状相关。在一些实施方案中,所述靶核酸区域中的突变导致所述细胞或生物体的性状的改变。因此,在一些实施方案中,可以通过细胞或生物体的性状的改变筛选感兴趣的突变。
如本文所用,“感兴趣的突变”通常而言指的是引起细胞或生物体的感兴趣的性状改变的突变。因此,在步骤iii),可以通过筛选具有感兴趣的性状改变的细胞或生物体来筛选具有感兴趣的突变的细胞或生物体。
所述细胞可以来自例如,哺乳动物如人、小鼠、大鼠、猴、犬、猪、羊、牛、猫;家禽如鸡、鸭、鹅;植物,包括单子叶植物和双子叶植物,优选作物植物,包括但不限于小麦、水稻、玉米、大豆、向日葵、高粱、油菜、苜蓿、棉花、大麦、粟、甘蔗、番茄、烟草、木薯和马铃薯。优选地,所述细胞是植物细胞,更优选作物植物细胞,更优选水稻细胞。
所述生物体可以是例如,哺乳动物如人、小鼠、大鼠、猴、犬、猪、羊、牛、猫;家禽如鸡、鸭、鹅;植物,包括单子叶植物和双子叶植物,优选作物植物,包括但不限于小麦、水稻、玉米、大豆、向日葵、高粱、油菜、苜蓿、棉花、大麦、粟、甘蔗、番茄、烟草、木薯和马铃薯。优选地,所述生物体是植物,更优选作物植物,更优选水稻。
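For plants, especially crop plants, the trait change of interest includes improved agronomic traits, including but not limited to increased growth rate, increased yield, increased nutrient content, increased cold resistance, increased drought resistance, increased insect resistance, increased disease resistance and increased herbicide resistance.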
对于动物,特别是哺乳动物例如人而言,所述感兴趣的性状改变包括但不限于药物抗性。
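VII. Therapeutic Applications
The invention also covers the use of the base editing system of the invention in the treatment of disease.
Modifying a disease-related gene with the base editing system of the invention can achieve up-regulation, down-regulation, inactivation, activation or mutation correction of the disease-related gene, and thereby the prevention and/or treatment of the disease. For example, the target nucleic acid region may be located within the protein coding region of the disease-related gene, or it may be located in a gene expression regulatory region such as a promoter or enhancer region, so that the function or the expression of the disease-related gene can be modified. Accordingly, modification of a disease-related gene as described herein includes modification of the gene itself (e.g., the protein coding region) as well as modification of its expression regulatory regions (such as promoters, enhancers and introns).
A “disease-related” gene is any gene that produces a transcription or translation product at an abnormal level or in an abnormal form in cells derived from disease-affected tissue compared with non-disease control tissue or cells. Where altered expression correlates with the appearance and/or progression of the disease, it may be a gene expressed at an abnormally high level or a gene expressed at an abnormally low level. A disease-related gene also refers to a gene having one or more mutations, or a genetic variation, that is directly responsible for the etiology of a disease or is in linkage disequilibrium with one or more genes responsible for it. The mutation or genetic variation is, for example, a single nucleotide variation (SNV). The transcribed or translated product may be known or unknown, and may be at a normal or abnormal level.
Accordingly, the invention also provides a method of treating a disease in a subject in need thereof, comprising delivering to the subject an effective amount of the base editing system of the invention to modify a gene related to the disease.
The invention also provides the use of the base editing system of the invention in the preparation of a pharmaceutical composition for treating a disease in a subject in need thereof, wherein the base editing system is used to modify a gene related to the disease.
The invention also provides a pharmaceutical composition for treating a disease in a subject in need thereof, comprising the base editing system of the invention and optionally a pharmaceutically acceptable carrier, wherein the base editing system is used to modify a gene related to the disease.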
在一些实施方式中,所述对象是哺乳动物,例如人。
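Examples of the disease include, but are not limited to, tumors, inflammation, Parkinson's disease, cardiovascular disease, Alzheimer's disease, autism, drug addiction, age-related macular degeneration, schizophrenia and hereditary diseases.
VIII. Kits
The invention also includes a kit for use in the methods of the invention, the kit comprising the base editing fusion protein of the invention and/or an expression construct containing a nucleotide sequence encoding the base editing fusion protein, or comprising the base editing system of the invention. The kit generally includes a label indicating the intended use and/or method of use of the kit contents. The term label includes any written or recorded material provided on or with the kit or otherwise supplied with the kit. The kit may also contain suitable materials for constructing the expression vectors of the base editing system of the invention, and may also contain reagents suitable for transforming the base editing fusion protein or base editing composition of the invention into cells.
Examples
The invention is further illustrated below by way of examples, which do not limit the invention to the scope of the examples described.
Experimental Materials and Methods
Plasmid construction
The cytidine deaminase, adenosine deaminase, nCas9(D10A) and UGI portions of STEME-1, STEME-2, STEME-3 and STEME-4 were codon-optimized for cereal plants and commercially synthesized (GENEWIZ, Suzhou, China). The Cas9 variant nCas9-NG(D10A), carrying the R1335V/L1111R/D1135V/G1218R/E1219F/A1322R/T1337R mutations, was obtained from the mutant nCas9(D10A) by mutagenesis using the Gibson assembly method (Mut Express MultiS Fast Mutagenesis Kit, Vazyme, Nanjing, China). PCR in this work was performed with TransStart FastPfu DNA polymerase (TransGen Biotech). The various constructs were assembled into the pJIT163 backbone by the Gibson assembly method (Mut Express MultiS Fast Mutagenesis Kit, Vazyme, Nanjing, China). The esgRNA construct pOsU3-esgRNA was as described previously (Reference 1). Annealed target sequences were inserted into BsaI-digested (New England BioLabs) pOsU3-esgRNA. To construct the pH-STEME and pH-STEME-NG binary vectors, STEME-1 or STEME-NG was cloned together with the OsU3-esgRNA expression cassette into the pHUE411 backbone (Reference 2).
Protoplast transfection
The japonica rice cultivar Nipponbare was used to prepare the protoplasts used in this study. Protoplast isolation and transformation were performed as described previously (Reference 3). 10 μg of base editor and sgRNA plasmid DNA was introduced into protoplasts by PEG-mediated transfection; the measured average transformation efficiency was 40-55%. Transfected protoplasts were incubated at 23°C. At 60 hours after transfection, protoplasts were collected for genomic DNA extraction and amplicon deep sequencing.
DNA extraction
Genomic DNA of protoplasts was extracted with the DNA quick Plant System (Tiangen Biotech, Beijing, China). Target sequences were amplified with specific primers, and amplicons were purified with the EasyPure PCR Purification Kit (TransGen Biotech, Beijing, China) and quantified with a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).
Amplicon deep sequencing and data analysis
To assess the coverage of the transformed esgRNAs, PCR was performed directly to amplify the protospacers of the binary vectors from transgenic calli. Barcodes were added to both ends of the PCR products via the primers. For the other amplicons, two rounds of PCR were performed. In the first round, the target region was amplified with site-specific primers. In the second round, forward and reverse barcodes were added to the ends of the PCR products for library construction. Equal amounts of PCR products were pooled and purified by gel DNA extraction, and the samples were sequenced commercially on the Illumina NextSeq 500 platform (Genewiz, Suzhou, China). The protospacer sequences in the sequencing reads were examined to analyze C>T and/or A>G substitutions and insertions/deletions (indels). Amplicon sequencing was repeated three times for each target sequence, using genomic DNA extracted from three independent protoplast samples.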
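The read-level analysis described above comes down to counting, at each reference C (or A) in the protospacer, the fraction of reads that carry T (or G). The sketch below shows that counting step only, assuming reads have already been demultiplexed, trimmed to the protospacer and contain no indels; it is not the pipeline used in the study, and `substitution_frequency` is a hypothetical name.

```python
def substitution_frequency(reference: str, reads: list[str], ref_base: str, alt_base: str):
    """Per-position fraction of reads showing ref_base>alt_base.

    Positions are 1-based from the 5' end of the reference protospacer.
    Reads whose length differs from the reference (possible indels) are skipped.
    """
    usable = [read for read in reads if len(read) == len(reference)]
    frequencies = {}
    for position, base in enumerate(reference, start=1):
        if base != ref_base:
            continue
        if not usable:
            frequencies[position] = 0.0
            continue
        edited = sum(1 for read in usable if read[position - 1] == alt_base)
        frequencies[position] = edited / len(usable)
    return frequencies


# Toy data: four reads over a 4-nt reference, two of them with C>T at position 2.
toy_reads = ["ATGT", "ACGT", "ATGT", "ACGT"]
c_to_t = substitution_frequency("ACGT", toy_reads, "C", "T")
```

Calling the same function with `("A", "G")` gives the A>G profile; averaging the three biological replicates per target would then be done outside this function.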
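Example 1. Construction of saturated targeted endogenous mutagenesis editors (STEME)
Recently, CRISPR-mediated single-base editing strategies, known as cytosine base editors (CBE) and adenine base editors (ABE), have been used to generate C:G>T:A or A:T>G:C base transitions, respectively, within an editing window. Both CBE and ABE fuse a Cas9 variant (nCas9 or dCas9) to the corresponding deaminase and, by targeting the coding and non-coding strands, lay the foundation for saturation mutagenesis of C/G or A/T on the coding strand.
In the present invention, a new type of base editor was designed that can generate C:G>T:A and A:T>G:C mutations simultaneously at the same target site using only one sgRNA, with the aim of carrying out saturated targeted endogenous mutagenesis (STEM) on selected genes of interest. The inventors fused a cytosine deaminase and an adenine deaminase into a new deaminase and developed a saturated targeted endogenous mutagenesis editor (STEME). In addition to the fused deaminase, the components of STEME include nCas9(D10A) and uracil DNA glycosylase inhibitor (UGI) (Figure 1a). The fused deaminase can deaminate C and/or A within the deamination window, nCas9 promotes the cellular mismatch repair (MMR) machinery, and UGI inhibits uracil DNA glycosylase (UDG), so that the damaged DNA strand is repaired according to the deaminated target strand; after replication, C:G>T:A and A:T>G:C mutations are finally obtained (Figure 1a).
In previous work, the inventors developed A3A-PBE, a cytosine base editor based on the human cytosine deaminase APOBEC3A that is highly efficient and has a wide editing window in plants, and PABE-7, an adenine base editor composed of the artificially evolved ecTadA-ecTadA7.10 heterodimer and an nCas9 with three NLSs at its N-terminus. To achieve C:G>T:A and A:T>G:C mutations simultaneously, the inventors designed two forms of fused deaminase, APOBEC3A-ecTadA-ecTadA7.10 and ecTadA-ecTadA7.10-APOBEC3A, fused each to the N-terminus of nCas9(D10A), and fused to the C-terminus of nCas9(D10A) either one UGI or two freely expressed UGIs (linked via the T2A peptide), giving the four vectors STEME-1, STEME-2, STEME-3 and STEME-4 (Figure 1b). All STEME vectors were codon-optimized for crops and their expression was driven by the maize Ubiquitin-1 (Ubi-1) promoter.
To study the base editing properties of STEME, the inventors designed six target sequences in the rice genome, OsAAT, OsACC, OsCDC48, OsDEP1, OsEV and OsOD, shown in Table 1 below, with the PAM sequences marked in bold and underlined.
Table 1
Figure PCTCN2020110207-appb-000001
Each 20-nt sgRNA spacer was cloned into the esgRNA vector driven by the OsU3 promoter (Figure 2). Each sgRNA was co-transformed into rice protoplasts with STEME-1, STEME-2, STEME-3 or STEME-4. A3A-PBE and PABE-7 served as controls for C>T and A>G, respectively, and wild-type Cas9 served as the control for indel formation. Amplicon sequencing was performed for every sample, yielding about 30,000-310,000 reads per sample, and base editing efficiency was analyzed. The results show that all four STEME vectors produced efficient C>T and/or A>G base conversions in rice protoplasts, with STEME-1 showing the highest C>T efficiency (0.1%-61.61%) (Figures 3a and 4).
The C>T editing window of STEME was the same as that of A3A-PBE, spanning C1 to C17 (Figures 3a and 4). The average editing efficiency of STEME-1 at C5 to C14 was 29.59%, 1.3 times that of A3A-PBE (Figures 3a and 4). A>G conversions were also observed at all six target sequences, and STEME-1 again had the highest A>G efficiency among the four vectors (0.69%-15.5%) (Figures 3b and 4). The A>G editing window of STEME was the same as that of PABE-7, A4 to A8, but all four STEMEs (0.07%-15.5%) gave lower A>G editing efficiencies than PABE-7 (1.74%-21.54%) (Figures 3b and 4).
Because ecTadA7.10 was evolved in E. coli in monomeric form, it was hypothesized that an ecTadA7.10 monomer or homodimer might increase the A>G activity of STEME. APOBEC3A-ecTadA-ecTadA7.10 in the STEME-1 vector was replaced with the APOBEC3A-ecTadA7.10 or APOBEC3A-ecTadA7.10-ecTadA7.10 fusion protein to construct STEME-5 and STEME-6 (Figure 5a), which were tested on the six target sequences by the method of Example 1.
Amplicon sequencing analysis of the STEME-5 and STEME-6 groups showed that their C>T editing efficiencies were close to that of STEME-1 (Figure 5b), whereas their A>G editing efficiencies remained lower than those of STEME-1 and PABE-7 (Figure 5c). The reduced A>G editing efficiency of the STEME vectors is consistent with earlier studies showing that extra residues at the N-terminus of the ecTadA-ecTadA7.10 heterodimer reduce adenine deaminase activity. According to the amplicon sequencing data from rice protoplasts, STEME showed high product purity at all six target sites tested (Figure 4), and its indel frequency was the same as that of the untreated group and far below that of Cas9 (6.3%-15.61%) (Figure 6).
These results show that STEME, and STEME-1 in particular, can achieve C:G>T:A and/or A:T>G:C base conversions simultaneously using a single sgRNA. STEME-1 produces higher C:G>T:A editing efficiency than A3A-PBE at C5 to C14, and the additional A:T>G:C conversions increase the range of targeted mutation types available for directed evolution by saturation mutagenesis.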
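Example 2. Optimization of STEME
Cas9 from Streptococcus pyogenes requires an NGG PAM at the target sequence, which limits the number of sgRNAs available in the rice genome. Bioinformatic analysis of the rice reference genome showed that using Cas9-NG (VRVRFRR) extends the editable range to 79%, whereas Cas9 can target only 19% of the rice genome (Figure 7a). Therefore, to extend the editing range of STEME, nCas9(D10A) in STEME-1 was replaced with nCas9-NG(D10A) to construct STEME-NG (Figure 7b).
To study the activity of the Cas9-NG variant in rice, A3A-PBE-NG, PABE7-NG and pCas9-NG were also constructed (Figure 8a). To reduce the influence of chromatin state, 20-nt target sequences with NGA, NGT, NGC and NGG PAMs were designed within an 80-bp region at each of the OsAAT, OsCDC48, OsDEP1 and OsODEV loci (Figures 8b and 8c) and cloned into the pOsU3-esgRNA vector.
sgRNA vectors targeting NG PAMs were co-transformed into rice protoplasts with STEME-NG, with STEME-1 plus sgRNAs targeting NGG PAMs as the control. Amplicon sequencing showed that STEME-NG had editing activity at target sequences with NGA, NGT, NGC and NGG PAMs (Figures 9 and 10). STEME-NG edited cytosines within C1 to C17 and adenines within A4 to A8 but, as with STEME-1, A>G editing efficiency was lower than C>T efficiency (Figure 9). Like the Cas9-NG (VRVRFRR) variant, STEME-NG showed reduced activity at NGG PAM targets compared with STEME-1 (Figures 9 and 10) and relatively high editing activity at NGA and NGT PAMs (Figures 9 and 10). Among the NGC target sites tested, STEME-NG showed relatively high editing activity only at the OsODEV-NGC target and low activity at the other three NGC targets (Figures 9 and 10).
The activities of A3A-PBE-NG and PABE7-NG at NGA, NGT, NGC and NGG PAM target sites were consistent with those of STEME-NG (Figure 11). In addition, like the untreated group, STEME-NG, A3A-PBE-NG and PABE7-NG all showed low indel frequencies compared with pCas9-NG. In summary, the activity of STEME-NG, A3A-PBE-NG and PABE7-NG at NG PAMs depends mainly on the activity of Cas9-NG. All of them prefer NGD PAMs (D = A, T or G) over NGC PAMs. STEME-NG greatly expands the range of C>T and/or A>G base editing in the genome and facilitates the generation of variation by saturation mutagenesis and directed evolution in plants.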
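Example 3. STEME-mediated saturated de novo mutagenesis
This example uses OsACC to illustrate the capacity of STEME-mediated saturated de novo mutagenesis in protoplasts to support directed evolution.
Acetyl-coenzyme A carboxylase (ACC) is a key enzyme in the lipid synthesis pathway, and its carboxyltransferase (CT) domain is the site of action relevant to herbicide resistance (Figure 12a). Amino acid substitutions in the CT domain of ACC have been reported to confer herbicide resistance on weeds. The inventors therefore designed 20 target sequences in a 168-bp stretch of the CT domain encoding 56 amino acids, comprising 11 target sequences with NGD (D = A, T or G) PAMs (targeting the sense strand) and 9 target sequences with HCN (H = A, C or T) PAMs (targeting the antisense strand) (Figure 12a, Table 2). Targeting the sense strand results in C-to-T and/or A-to-G substitutions on the coding strand, whereas targeting the antisense strand results in G-to-A and/or T-to-C substitutions on the coding strand.
Table 2
Figure PCTCN2020110207-appb-000002
Figure PCTCN2020110207-appb-000003
With STEME-NG, these 20 target sites can cover 90.32% of the C's, 40.43% of the A's, 77.78% of the G's and 38.89% of the T's on the coding strand, for a total coverage of 61.31% of the bases on the coding strand (Figure 12a, Table 3).
Table 3
Figure PCTCN2020110207-appb-000004
The 20 target sequences were cloned into the pOsU3-esgRNA vector and each was co-transformed into rice protoplasts with STEME-NG; A3A-PBE-NG and pCas9-NG served as controls for C>T conversion and indels, respectively. Amplicon sequencing showed that STEME-NG converted 96.43% of the covered C's to T, 63.16% of the covered A's to G, 92.86% of the covered G's to A and 42.86% of the covered T's to C, with average editing efficiencies of 11.5%, 0.35%, 13.33% and 0.45%, respectively (Figure 12b, Table 3). By comparison, with the same 20 target sites A3A-PBE-NG converted only 89.29% of the covered C's to T and 92.86% of the covered G's to A (Figure 13a, Table 3). No base conversions were detected in the untreated group (Figure 13b). At the level of individual amplicons, with the same 20 target sequences STEME-NG generated 212 mutation types in total, 2.7 times as many as A3A-PBE-NG. Of the 212 mutation types, 18.4% resulted from simultaneous C:G>T:A and A:T>G:C mutations. At these 20 target sequences, pCas9-NG still showed higher indel frequencies (0.32%-39.72%) than STEME-NG and A3A-PBE-NG.
Because amino acid changes mediated by base editors are important for directed protein evolution, the inventors analyzed the substitution efficiency at the 56 targeted amino acids. A total of 41 amino acids were substituted, with silent mutations, missense mutations and nonsense mutations all represented (Figure 12c). Near-saturation mutagenesis of the 56 amino acids (73.2%) was achieved with only 20 sgRNAs. Twenty-four amino acid positions had one type of amino acid substitution, 12 positions had two types, and 5 positions had three types (Figure 12c). A3A-PBE-NG, in contrast, substituted 33 amino acids: 26 positions with one type of substitution, 6 positions with two types and 1 position with three types (Figure 13c). These results show that, by combining NG PAMs and CN PAMs in rice protoplasts, STEME-NG can generate a wide variety of mutation types on the coding strand. Unlike recently reported directed evolution methods that rely on bacteria and yeast (e.g., PACE, EvolvR, CREATE and CHAnGE), STEME can generate saturated de novo mutations directly in situ and can be used for the directed evolution of plant proteins.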
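Coverage figures of the kind quoted for the 20 target sites can be computed by marking every coding-strand base that falls inside an editing window of at least one target. The sketch below is one possible way to do this, assuming the C window (positions 1-17) and A window (positions 4-8) stated earlier; the input format (index of the target's 5' base on the coding strand, plus strand) and all names are assumptions, and the toy region is not the OsACC sequence.

```python
def covered_positions(region: str, targets, c_window=(1, 17), a_window=(4, 8), length=20):
    """Return coding-strand indices (0-based) that lie in an editing window.

    targets: iterable of (first_index, strand), where first_index is the
    coding-strand index of the target's 5'-most base and strand is '+' for
    sense-strand targets or '-' for antisense-strand targets.
    """
    covered = set()
    for first_index, strand in targets:
        # On the sense strand, an antisense C reads as G and an antisense A as T.
        c_like, a_like = ("C", "A") if strand == "+" else ("G", "T")
        step = 1 if strand == "+" else -1
        for position in range(1, length + 1):
            index = first_index + step * (position - 1)
            if not 0 <= index < len(region):
                continue
            base = region[index]
            in_c = base == c_like and c_window[0] <= position <= c_window[1]
            in_a = base == a_like and a_window[0] <= position <= a_window[1]
            if in_c or in_a:
                covered.add(index)
    return covered


def coverage_by_base(region: str, targets):
    """Fraction of each base type on the coding strand that is covered."""
    covered = covered_positions(region, targets)
    result = {}
    for base in "ACGT":
        indices = [i for i, b in enumerate(region) if b == base]
        if indices:
            result[base] = sum(1 for i in indices if i in covered) / len(indices)
    return result


# Toy region with a single sense-strand target starting at index 0.
toy_region = "CCC" + "A" * 6 + "C" * 11
toy_coverage = coverage_by_base(toy_region, [(0, "+")])
```

Coverage computed this way only states which bases are reachable; whether a covered base is actually edited is what the amplicon sequencing measures.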
References
1.Li,C.et al.Expanded base editing in rice and wheat using a Cas9-adenosine deaminase fusion.Genome Biol.19,59(2018).
2.Xing,H.L.et al.A CRISPR/Cas9 toolkit for multiplex genome editing in plants.BMC Plant Biol.14,327(2014).
3.Shan,Q.et al.Rapid and efficient gene modification in rice and Brachypodium using TALENs.Mol.Plant 6,1365-1368(2013).
Sequence Description
>SEQ ID NO:1 wild-type SpCas9 amino acid sequence
>SEQ ID NO:2 nCas9(D10A) amino acid sequence
>SEQ ID NO:3 nCas9-NG(D10A) amino acid sequence
>SEQ ID NO:4 human APOBEC3A deaminase amino acid sequence
>SEQ ID NO:5 wild-type ecTadA amino acid sequence
>SEQ ID NO:6 DNA-dependent adenine deaminase ecTadA7.10 amino acid sequence
>SEQ ID NO:7 adenine deamination domain ecTadA7.10-ecTadA7.10 amino acid sequence
>SEQ ID NO:8 adenine deamination domain ecTadA-ecTadA7.10 amino acid sequence
>SEQ ID NO:9 32aa linker
>SEQ ID NO:10 48aa linker
>SEQ ID NO:11 XTEN linker
>SEQ ID NO:12 uracil DNA glycosylase inhibitor (UGI) amino acid sequence
>SEQ ID NO:13 STEME-1 amino acid sequence
>SEQ ID NO:14 STEME-2 amino acid sequence
>SEQ ID NO:15 STEME-3 amino acid sequence
>SEQ ID NO:16 STEME-4 amino acid sequence
>SEQ ID NO:17 STEME-5 amino acid sequence
>SEQ ID NO:18 STEME-6 amino acid sequence
>SEQ ID NO:19 STEME-NG amino acid sequence
>SEQ ID NO:20 STEME-1 (APOBEC3A-48aa linker-ecTadA-32aa linker-ecTadA7.10-32aa linker-nCas9(D10A)-NLS-UGI-NLS) coding sequence
>SEQ ID NO:21 STEME-2 (ecTadA-32aa linker-ecTadA7.10-32aa linker-APOBEC3A-16aa linker-nCas9(D10A)-NLS-UGI-NLS) coding sequence
>SEQ ID NO:22 STEME-3 (APOBEC3A-48aa linker-ecTadA-32aa linker-ecTadA7.10-32aa linker-nCas9(D10A)-NLS-T2A-UGI-NLS-T2A-UGI-NLS) coding sequence
>SEQ ID NO:23 STEME-4 (ecTadA-32aa linker-ecTadA7.10-32aa linker-APOBEC3A-16aa linker-nCas9(D10A)-NLS-T2A-UGI-NLS-T2A-UGI-NLS) coding sequence
>SEQ ID NO:24 STEME-5 (APOBEC3A-32aa linker-ecTadA7.10-32aa linker-nCas9(D10A)-NLS-UGI-NLS) coding sequence
>SEQ ID NO:25 STEME-6 (APOBEC3A-32aa linker-ecTadA7.10-32aa linker-ecTadA7.10-32aa linker-nCas9(D10A)-NLS-UGI-NLS) coding sequence
>SEQ ID NO:26 STEME-NG (APOBEC3A-48aa linker-ecTadA-32aa linker-ecTadA7.10-32aa linker-nCas9-NG(D10A)-NLS-UGI-NLS) coding sequence
>SEQ ID NO:27 sgRNA scaffold sequence
>SEQ ID NO:28 esgRNA scaffold sequence
>SEQ ID NO:29 A3A-PBE-NG (APOBEC3A-16aa linker-nCas9(D10A)-NLS-UGI-NLS) amino acid sequence
>SEQ ID NO:30 PABE7-NG (ecTadA-32aa linker-ecTadA7.10-32aa linker-nCas9-NG(D10A)-NLS-NLS-NLS) amino acid sequence
>SEQ ID NO:31 pCas9-NG (NLS-nCas9-NG-NLS) amino acid sequence

Claims (26)

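  1. A base editing fusion protein comprising a nucleic acid targeting domain, a cytosine deamination domain and an adenine deamination domain.
  2. The base editing fusion protein of claim 1, wherein the nucleic acid targeting domain comprises at least one CRISPR effector protein polypeptide.
  3. The base editing fusion protein of claim 2, wherein the CRISPR effector protein is Cas9 nuclease or a functional variant thereof; preferably, the CRISPR effector protein is a nuclease-inactivated Cas9; more preferably, the nuclease-inactivated Cas9 comprises the amino acid sequence shown in SEQ ID NO:2; most preferably, the nuclease-inactivated Cas9 comprises the amino acid sequence shown in SEQ ID NO:3.
  4. The base editing fusion protein of any one of claims 1-3, wherein the cytosine deamination domain comprises at least one cytosine deaminase polypeptide.
  5. The base editing fusion protein of claim 4, wherein the cytosine deaminase is selected from APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, and functional variants thereof.
  6. The base editing fusion protein of claim 5, wherein the cytosine deaminase is human APOBEC3A deaminase or a functional variant thereof, for example the human APOBEC3A deaminase comprises the amino acid sequence shown in SEQ ID NO:4.
  7. The base editing fusion protein of any one of claims 1-6, wherein the adenine deamination domain comprises at least one DNA-dependent adenine deaminase polypeptide.
  8. The base editing fusion protein of claim 7, wherein the DNA-dependent adenine deaminase is derived from wild-type E. coli tRNA adenine deaminase TadA (ecTadA), for example the DNA-dependent adenine deaminase comprises the amino acid sequence shown in SEQ ID NO:6.
  9. The base editing fusion protein of claim 7 or 8, wherein the adenine deamination domain comprises two DNA-dependent adenine deaminases.
  10. The base editing fusion protein of claim 7, wherein the adenine deamination domain further comprises wild-type E. coli tRNA adenine deaminase TadA (ecTadA) fused to the DNA-dependent adenine deaminase; preferably, the DNA-dependent adenine deaminase is fused to the C-terminus of the wild-type E. coli tRNA adenine deaminase TadA (ecTadA).
  11. The base editing fusion protein of claim 7 or 8, wherein the adenine deamination domain comprises the amino acid sequence shown in SEQ ID NO:7 or 8.
  12. The base editing fusion protein of any one of claims 1-11, wherein the nucleic acid targeting domain, the cytosine deamination domain and the adenine deamination domain are fused via linkers, for example linkers comprising an amino acid sequence selected from SEQ ID NO:9-11.
  13. The base editing fusion protein of any one of claims 1-12, wherein the base editing fusion protein comprises, in the N-terminal to C-terminal direction and in the following order: the cytosine deamination domain, the adenine deamination domain and the nucleic acid targeting domain; or the base editing fusion protein comprises, in the N-terminal to C-terminal direction and in the following order: the adenine deamination domain, the cytosine deamination domain and the nucleic acid targeting domain.
  14. The base editing fusion protein of any one of claims 1-13, wherein the base editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI), for example a UGI comprising the amino acid sequence shown in SEQ ID NO:12.
  15. The base editing fusion protein of any one of claims 1-14, wherein the base editing fusion protein further comprises one or more nuclear localization sequences (NLS).
  16. The base editing fusion protein of claim 1, wherein the base editing fusion protein comprises the amino acid sequence shown in any one of SEQ ID NO:13-19.
  17. A base editing system for modifying a target nucleic acid region in a genome, comprising:
    i) the base editing fusion protein of any one of claims 1-16 and/or an expression construct containing a nucleotide sequence encoding the base editing fusion protein; and/or
    ii) at least one guide RNA and/or at least one expression construct containing a nucleotide sequence encoding the at least one guide RNA,
    wherein the at least one guide RNA is directed to at least one target sequence within the target nucleic acid region.
  18. The base editing system of claim 17, wherein the guide RNA is an sgRNA, for example an sgRNA comprising the scaffold sequence shown in SEQ ID NO:27 or SEQ ID NO:28.
  19. The base editing system of claim 17 or 18, wherein the target sequence to which the guide RNA is directed comprises a PAM sequence at its 3' end, for example 5'-NGG-3' or 5'-NG-3'.
  20. The base editing system of any one of claims 17-19, wherein the at least one guide RNA is directed to a target sequence located on the sense strand and/or the antisense strand within the target nucleic acid region of the cell genome.
  21. The base editing system of any one of claims 17-20, wherein the nucleotide sequence encoding the base editing fusion protein is codon-optimized for the organism whose genome is to be modified.
  22. The base editing system of claim 21, wherein the base editing fusion protein is encoded by the nucleotide sequence shown in any one of SEQ ID NO:20-26.
  23. A method for producing at least one genetically modified cell, comprising introducing the base editing system of any one of claims 17-22 into at least one such cell, thereby causing one or more nucleotide substitutions within the target nucleic acid region in the at least one cell.
  24. The method of claim 23, further comprising the step of screening the at least one cell for cells having the desired one or more nucleotide substitutions.
  25. The method of claim 23 or 24, wherein the base editing system is introduced into the cell by a method selected from: calcium phosphate transfection, protoplast fusion, electroporation, liposome transfection, microinjection, viral infection (e.g., baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), particle bombardment, PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.
  26. The method of any one of claims 23-25, wherein the cell is from a mammal such as a human, mouse, rat, monkey, dog, pig, sheep, cow or cat; poultry such as a chicken, duck or goose; or a plant, including monocotyledons and dicotyledons, preferably a crop plant, for example wheat, rice, maize, soybean, sunflower, sorghum, rapeseed, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava or potato.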
PCT/CN2020/110207 2019-08-20 2020-08-20 Base editing system and method of use thereof WO2021032155A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202080059623.XA CN114945670A (zh) 2019-08-20 2020-08-20 Base editing system and method of use thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910767418 2019-08-20
CN201910767418.8 2019-08-20

Publications (1)

Publication Number Publication Date
WO2021032155A1 true WO2021032155A1 (zh) 2021-02-25

Family

ID=74659768

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110207 WO2021032155A1 (zh) 2019-08-20 2020-08-20 Base editing system and method of use thereof

Country Status (2)

Country Link
CN (1) CN114945670A (zh)
WO (1) WO2021032155A1 (zh)


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20854097

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20854097

Country of ref document: EP

Kind code of ref document: A1